Computer Diary

    "For to be free is not merely to cast off one's chains, but to live in a way that respects and enhances the freedom of others."

    Nelson Mandela
    from "Long Walk to Freedom" 1995

    Here is my little rant-and-praise place, where I write up the daily experiences of my programming work. I publish them in the hope that others might find them useful.

    Tag <Geonames>

    Check also other posts with other tags.

    Automatically Geotag Photos without GPS
    last edited 2009/05/18 11:30 (*)

    Update 2009/04/26: Since the script has grown beyond a quick hack, I moved it to my technology site: GeoTag, Automatic Geotagging Photos without GPS.

    I took a couple of hundred photos during my past bicycle travels, wrote diaries, and added descriptions to many of the photos.

    I considered geotagging the photos (finding the proper location and its latitude and longitude coordinates), but postponed it after my first attempts. Now I made another attempt with a database from GeoNames, using cities1000.txt, which lists approx. 85,000 cities with a population over 1,000 (allCountries.txt has 8,000,000 entries, which I will test next). The most useful data in this dataset are the aliases, which list city names in different languages and variations; that made my second attempt a success.

    Finding Location

    First I take cities1000.txt and fill a sqlite database with two tables, geocities and geoalias.

    • geocities holds one entry per city: name, alias, lat, long, district and country code
    • geoalias holds all aliases, each pointing to a geocities entry

    The geoalias table speeds up the lookup, at least in theory.
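
    A minimal sketch of such a two-table layout, written in Python rather than the perl of the actual script. The column types, the index name, the helper names add_city and lookup, and the sample aliases are assumptions for illustration, not the script's real schema:

```python
import sqlite3

# Hypothetical schema: the column names follow the description above,
# the types, keys and index are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geocities (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        alias    TEXT,
        lat      REAL,
        long     REAL,
        district TEXT,
        country  TEXT
    );
    CREATE TABLE geoalias (
        alias   TEXT,
        city_id INTEGER REFERENCES geocities(id)
    );
    CREATE INDEX geoalias_alias ON geoalias(alias);
""")

def add_city(city_id, name, aliases, lat, lon, district, country):
    """Insert one city plus one lowercased geoalias row per name variant."""
    conn.execute(
        "INSERT INTO geocities VALUES (?, ?, ?, ?, ?, ?, ?)",
        (city_id, name, ",".join(aliases), lat, lon, district, country),
    )
    for variant in {name.lower(), *(a.lower() for a in aliases)}:
        conn.execute("INSERT INTO geoalias VALUES (?, ?)", (variant, city_id))

def lookup(term):
    """Return (name, district, country, lat, long) rows matching an alias."""
    return conn.execute(
        "SELECT c.name, c.district, c.country, c.lat, c.long "
        "FROM geoalias a JOIN geocities c ON c.id = a.city_id "
        "WHERE a.alias = ?",
        (term.lower(),),
    ).fetchall()

# Sample row; the alias list here is illustrative only.
add_city(2761369, "Wien", ["Vienna", "Vienne"],
         48.2084877601653, 16.3720750808716, "09", "AT")
print(lookup("vienna"))
```

    The point of the separate alias table is that every spelling becomes an indexed exact-match key, so a lookup never has to scan the alias column of each city.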

    Applying Heuristics

    Finding a location in a text isn't as easy as one might think, so I came up with some assumptions (aka heuristics) to find one.

    Anatomy of City Names

    I assumed a city name always starts with an uppercase letter, followed by lowercase characters. Italian and French city names often consist of multiple terms, where the middle terms may be lowercase, but the first and last term start with an uppercase letter.

    So I ended up with the following pattern matching:

    [A-Z][a-z]+ [A-Za-z]+ [A-Za-z]+ [A-Z][a-z]+
    [A-Z][a-z]+ [A-Za-z]+ [A-Z][a-z]+
    [A-Z][a-z]+ [A-Z][a-z]+
    in that order.
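
    The ordering can be sketched in Python as follows. The three multi-term patterns are the ones listed above; the single-term fallback, the function name candidates and the use of a lookahead to allow overlapping matches are assumptions of this sketch, not necessarily what the perl script does:

```python
import re

# The multi-term patterns from the text, longest first.
# The last, single-term pattern is an assumption added for this sketch.
PATTERNS = [
    r"[A-Z][a-z]+ [A-Za-z]+ [A-Za-z]+ [A-Z][a-z]+",
    r"[A-Z][a-z]+ [A-Za-z]+ [A-Z][a-z]+",
    r"[A-Z][a-z]+ [A-Z][a-z]+",
    r"[A-Z][a-z]+",
]

def candidates(text):
    """Return candidate city names, longest patterns first, no duplicates."""
    found = []
    for pattern in PATTERNS:
        # The lookahead lets candidates overlap, so a name is not lost
        # because an earlier word already consumed its first term.
        for match in re.finditer(r"(?=\b(" + pattern + r")\b)", text):
            name = match.group(1)
            if name not in found:
                found.append(name)
    return found

print(candidates("Castel del Monte in the morning"))
print(candidates("Rapperswil in the morning, after long night"))
```

    Each candidate would then be looked up in the database in this order, so the longest plausible name wins. Note that these ASCII-only character classes do not match names with umlauts or accents.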

    Additionally, when multiple locations with the same name are found, I sort them by distance to the last found location. How does this help? E.g. when I make a tour and travel from Prague/Praha to Vienna, and look up Vienna, I get 5 entries:

    • Vienna (VA,US) 38.9012225,-77.2652604 (#4791160)
    • Vienna (WV,US) 39.3270191,-81.5484578 (#4825976)
    • Vienna (GA,US) 32.0915577,-83.7954518 (#4228440)
    • Vienna (IL,US) 37.4153295,-88.8978435 (#4252025)
    • Wien (09,AT) 48.2084877601653,16.3720750808716 (#2761369)

    but the entry I want is "Wien", and it is the one geographically closest to the previously looked-up "Prag". This small enhancement helped a lot in determining locations correctly.
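
    A minimal sketch of this sorting step in Python, using the five entries above and the Praha coordinates from the lookup examples. The haversine formula and the function names are assumptions; the text does not say how the script computes distance:

```python
from math import radians, sin, cos, asin, sqrt

def distance_km(a, b):
    """Great-circle distance between two (lat, long) pairs (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius

def sort_by_distance(matches, last_location):
    """Order same-name matches by distance to the last found location."""
    return sorted(matches, key=lambda m: distance_km(m[1], last_location))

praha = (50.0878367932108, 14.4241322001241)
vienna_matches = [
    ("Vienna (VA,US)", (38.9012225, -77.2652604)),
    ("Vienna (WV,US)", (39.3270191, -81.5484578)),
    ("Vienna (GA,US)", (32.0915577, -83.7954518)),
    ("Vienna (IL,US)", (37.4153295, -88.8978435)),
    ("Wien (09,AT)", (48.2084877601653, 16.3720750808716)),
]

best = sort_by_distance(vienna_matches, praha)[0]
print(best[0])  # Wien (09,AT)
```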

    Geotagging Photos

    For my web-site I internally defined a file called "list", which resides in the folder of the photos and lists every file with its description, like this:

    0001.jpg    Zurich by night
    0002.jpg    Rapperswil in the morning, after long night

    My little perl-script geotag accepts either locations or filenames. If the argument is a file, it tries to find the location names in it; if it is a 'list' file, it handles it accordingly and prints out a similar file I call 'list.geo', which looks like this:

    0001.jpg    geo:name=Zurich (ZH,CH),\
    0002.jpg    geo:name=Rapperswil (SG,CH),\

    the time is the timestamp of the photo.

    Interpolation of Locations

    Since I have many photos, not all of them have a description, let alone a location in it. For those I optionally interpolate the location: I take the found locations before and after, and interpolate linearly between them according to the timestamp.

    This works quite well for me, since I usually stop and take a few photos within 1-2 minutes, then ride on to the next location and take photos there again, and I only put a description on the first photo of a sequence. Since it takes me 1-2 hours to reach the next location by bicycle, the timestamp gives a good guess: photos taken in quick succession are also near the location of the first photo, the one with the location description.
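
    The linear interpolation can be sketched in Python like this. The function name, the tuple layout and the use of epoch seconds are assumptions of the sketch; it also ignores longitude wrap-around at +/-180 degrees, which does not matter for short bicycle stages:

```python
def interpolate(t, before, after):
    """Linearly interpolate (lat, long) at time t between two known fixes.

    before and after are (timestamp, lat, long) tuples with
    before[0] <= t <= after[0]; timestamps are seconds.
    """
    t0, lat0, lon0 = before
    t1, lat1, lon1 = after
    if t1 == t0:
        # Both fixes share a timestamp: nothing to interpolate.
        return lat0, lon0
    f = (t - t0) / (t1 - t0)  # fraction of the way from 'before' to 'after'
    return lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0)

# Hypothetical example: a photo taken one minute after a described photo,
# with the next described photo two hours later.
described_first = (0, 50.0878367932108, 14.4241322001241)
described_next = (7200, 48.2084877601653, 16.3720750808716)
print(interpolate(60, described_first, described_next))
```

    With a 60 second offset out of 7200 the result stays within a few kilometres of the first fix, which matches the observation that photos taken in quick succession belong to the same place.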



    • sqlite-3.x, install via local package manager
    • perl module DBD::SQLite
    • perl module Time::HiRes

    Install the perl modules either with your local package manager, or via CPAN:

    % perl -MCPAN -e 'install DBD::SQLite'
    % perl -MCPAN -e 'install Time::HiRes'


    Copy geotag into /usr/local/bin (as root) or keep it locally. First decompress cities1000.txt.gz:

    % gzip -d cities1000.txt.gz

    Next, run geotag with cities1000.txt in the same directory:

    % ./geotag

    It will create ~/DB/ and populate ~/DB/geotag.db. This takes a while: about 20 minutes on a Pentium4 2.4GHz to create the sqlite database geotag.db. After that, lookups respond instantly.


    % ./geotag prag
    Praha (52,CZ) lat=50.0878367932108,long=14.4241322001241
    % ./geotag vienna
    Wien (09,AT) lat=48.2084877601653,long=16.3720750808716
    % ./geotag boulder
    Boulder (CO,US) lat=40.0149856,long=-105.2705456
    % ./geotag vienna
    Vienna (IL,US) lat=37.4153295,long=-88.8978435
    % ./geotag paris
    Paris (TN,US) lat=36.3020023,long=-88.3267107
    % ./geotag munich
    München (02,DE) lat=48.1376831438553,long=11.5743541717529
    % ./geotag paris
    Paris (A8,FR) lat=48.85341,long=2.3488
    % ./geotag paris,il,us
    Paris (IL,US) lat=39.611146,long=-87.6961374
    % ./geotag vienna,at
    Wien (09,AT) lat=48.2084877601653,long=16.3720750808716
    % ./geotag -f gpx diary.txt > list.gpx

    So it behaves as I wanted: previously found matches determine the perimeter in which the next location is searched.

    I made a test-run based on my ./list file with the photo descriptions of my Europe 2008 Tour:

    % ./geotag list > list.geo
    % ./geotag -f gpx list > list.gpx

    and it made 1-2 errors, which I corrected by hand, and I added one waypoint (Rapperswil) so the path doesn't go over a lake - this is the result:

    Note: I had no GPS coordinates to start with; I solely used the descriptions of my photos to derive the waypoints. I used OpenLayers for the map, adapted some of their examples, and referenced list.gpx within the javascript code:

       // GPX layer that loads list.gpx and draws the track as a red line
       var lgml = new OpenLayers.Layer.GML("GPX", "list.gpx", {
          format: OpenLayers.Format.GPX,
          style: {
              strokeColor: 'red', strokeWidth: 5,
              strokeOpacity: 0.5 },
          projection: new OpenLayers.Projection("EPSG:4326")
       });
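
    For reference, the kind of file such a layer reads can be sketched in Python. This is only an illustration of a GPX 1.1 track; whether geotag -f gpx emits a track, a route or plain waypoints is not stated here, and the function name and creator string are made up:

```python
import xml.etree.ElementTree as ET

def to_gpx(points):
    """Build a GPX 1.1 document with one track from (name, lat, long) tuples."""
    gpx = ET.Element("gpx", version="1.1", creator="geotag-sketch",
                     xmlns="http://www.topografix.com/GPX/1/1")
    segment = ET.SubElement(ET.SubElement(gpx, "trk"), "trkseg")
    for name, lat, lon in points:
        # GPX stores coordinates as lat/lon attributes in WGS84 degrees,
        # which is what EPSG:4326 refers to.
        point = ET.SubElement(segment, "trkpt", lat=str(lat), lon=str(lon))
        ET.SubElement(point, "name").text = name
    return ET.tostring(gpx, encoding="unicode")

print(to_gpx([
    ("Praha (52,CZ)", 50.0878367932108, 14.4241322001241),
    ("Wien (09,AT)", 48.2084877601653, 16.3720750808716),
]))
```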

    I added some verbosity, printed to stderr, such as the distance between the looked-up locations, whereas the location data goes to stdout (so you can redirect it via > file):

    % ./geotag berlin paris london
    Berlin (16,DE) lat=52.5166667,long=13.4
    Paris (A8,FR) lat=48.85341,long=2.3488
    London (ENG,GB) lat=51.5084152563931,long=-0.125532746315002
            3 locations looked up, 3 successes, 0 failed (0.0%)
            1222.499km cumulative distance
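
    The cumulative figure can be cross-checked with a short Python sketch that sums the great-circle distances between consecutive locations. The haversine formula and the Earth radius used here are assumptions (the script's own formula and constant are not stated), so the result only lands close to the reported value rather than reproducing it digit for digit:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; the script's constant is not stated

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

# Coordinates as printed by the lookup above.
stops = [
    ("Berlin", 52.5166667, 13.4),
    ("Paris", 48.85341, 2.3488),
    ("London", 51.5084152563931, -0.125532746315002),
]

# Sum the legs Berlin -> Paris and Paris -> London.
total = sum(
    haversine_km(a[1], a[2], b[1], b[2])
    for a, b in zip(stops, stops[1:])
)
print(f"{total:.3f}km cumulative distance")
```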


    Since version 0.012, a tcp-based client/server mode is also built in:

    % ./geotag -server

    if this machine has as IP, and then go on a client, and do this:

    % ./geotag -s 'new york'

    You can create a ~/.geotagrc where you can define the defaults:


    and then call

    % ./geotag 'new york'

    and it will use the server to look up the locations, via tcp on port 10102.

    SQL vs GREP with 230K lines (12MB) GeoLite
    last edited 2009/04/23 08:17 (*)

    I would like to index all my texts (articles, emails) according to 'geonames', a database of locations. For that purpose I found a CSV from GeoliteCity, started to create a database with DBD::SQLite, and finally made this comparison.

    The dataset is 12MB, with 235,000 lines:

    % wc GeoLiteCity-Location.csv 
      235422  277043 12450133 GeoLiteCity-Location.csv

    % time grep \"Marseille\" GeoLiteCity-Location.csv
    0.045u 0.090s 0:00.13 100.0%    105+1040k 0+0io 0pf+0w

    versus perl with DBD::SQLite, where geonames.db is 14MB in size and is queried in a script with

    select city,long,lat from cities where city == 'New York'

    with an index created on the city column. The command line call:

    % time ./geotag Marseille
            Marseille, B8, FR: 43.3, 5.4
    0.271u 0.062s 0:00.33 100.0%    10+2054k 0+0io 0pf+0w

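    The system time is 0.090s for grep vs 0.062s for sql, which is about 1/3 less, but the user time was 6 times longer (0.271s vs 0.045s), which is explained by the overhead of loading perl and the required modules. In wall-clock time, grep finished in 0.13s and the perl script in 0.33s.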
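
    The indexed lookup itself can be sketched in Python with the sqlite3 module. The table layout beyond the city, long and lat columns, and the index name, are assumptions; the single row is the Marseille result shown above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout; only city, long and lat appear in the query above.
conn.execute(
    "CREATE TABLE cities (city TEXT, region TEXT, country TEXT, lat REAL, long REAL)"
)
conn.execute("CREATE INDEX idx_cities_city ON cities(city)")
conn.execute("INSERT INTO cities VALUES ('Marseille', 'B8', 'FR', 43.3, 5.4)")

query = "SELECT city, long, lat FROM cities WHERE city = ?"
rows = conn.execute(query, ("Marseille",)).fetchall()

# EXPLAIN QUERY PLAN shows whether sqlite searches via the index
# instead of scanning the whole table; the detail text is the last column.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Marseille",)).fetchall()

print(rows)
print(plan[-1][-1])
```

    With the index in place the query is a tree search rather than a scan over all 235,422 rows, so for a one-off command line call the startup cost of the interpreter and modules dominates, as the timings suggest.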

    Update: GeoNames seems to have better data, with aliases of city names; I used it for a small tool named "geotag", see my post.


    Copyright 2007-2016, 2020-2023 © by René K. Müller
    Illustrations and graphics made with Inkscape, GIMP and Tgif