Update 2009/04/26: Since the script has grown beyond a quick hack, I moved it to my technology site
The-Labs.com: GeoTag, Automatic Geotagging Photos without GPS.
I took a couple of hundred photos during my past bicycle travels, wrote diaries and added descriptions to many of the photos.
I considered geotagging the photos (finding the proper location and its latitude and longitude), but I postponed it after my first attempts. Now I made another attempt with a database from Geonames.org, using cities1000.txt, which lists approx. 85,000 cities with a population over 1,000 (allCountries.txt has 8,000,000 entries, which I will test next). The most useful data in this dataset are the aliases, which list city names in different languages and variations; that made my second attempt a success.
First I take cities1000.txt and fill a sqlite database with two tables, geocities and geoalias:
- geocities holds one entry per city: name, alias, lat, long, district and country code
- geoalias holds all aliases, each pointing to a geocities entry
The geoalias table speeds up the lookup, at least in theory.
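To make the two-table layout concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is not the actual perl code: the column types, the index, the lowercasing of aliases and the helper names (create_schema, add_city, lookup) are all assumptions for illustration. Only the table names and the listed fields come from the description above.

```python
import sqlite3

def create_schema(conn):
    # Table and field names follow the text; types and the index are assumptions.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS geocities (
            id       INTEGER PRIMARY KEY,  -- Geonames id
            name     TEXT NOT NULL,
            alias    TEXT,                 -- raw comma-separated alias list
            lat      REAL NOT NULL,
            long     REAL NOT NULL,
            district TEXT,
            country  TEXT
        );
        CREATE TABLE IF NOT EXISTS geoalias (
            alias   TEXT NOT NULL,         -- stored lowercased for lookups
            city_id INTEGER NOT NULL REFERENCES geocities(id)
        );
        CREATE INDEX IF NOT EXISTS geoalias_alias ON geoalias(alias);
    """)

def add_city(conn, city_id, name, aliases, lat, lon, district, country):
    conn.execute(
        "INSERT INTO geocities VALUES (?,?,?,?,?,?,?)",
        (city_id, name, ",".join(aliases), lat, lon, district, country),
    )
    # The primary name is stored as an alias too, so one query covers both.
    names = {name.lower()} | {a.lower() for a in aliases}
    conn.executemany(
        "INSERT INTO geoalias VALUES (?,?)",
        [(n, city_id) for n in sorted(names)],
    )

def lookup(conn, query):
    # Indexed lookup on geoalias, then a join to fetch the city record.
    return conn.execute(
        """SELECT c.name, c.district, c.country, c.lat, c.long
           FROM geoalias a JOIN geocities c ON c.id = a.city_id
           WHERE a.alias = ? ORDER BY c.id""",
        (query.lower(),),
    ).fetchall()

conn = sqlite3.connect(":memory:")
create_schema(conn)
add_city(conn, 2761369, "Wien", ["Vienna"], 48.2084877601653, 16.3720750808716, "09", "AT")
add_city(conn, 4791160, "Vienna", [], 38.9012225, -77.2652604, "VA", "US")
matches = lookup(conn, "vienna")
for row in matches:
    print(row)
```

With the index on geoalias.alias, a lookup by any spelling is a single indexed query, which is the speed-up the separate alias table is meant to provide.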
Finding a location in a text isn't as easy as one might think, so I came up with some assumptions (aka heuristics).
I assumed a city name always starts with an uppercase letter, followed by lowercase characters. Italian and French city names often consist of multiple terms, where the middle terms may be lowercase but the first and last terms start with an uppercase letter.
So I ended up with the following patterns:
[A-Z][a-z]+ [A-Za-z]+ [A-Za-z]+ [A-Z][a-z]+
[A-Z][a-z]+ [A-Za-z]+ [A-Z][a-z]+
[A-Z][a-z]+ [A-Z][a-z]+
[A-Z][a-z]+
in that order.
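The longest-pattern-first idea can be sketched in Python as follows. This is an illustration, not the perl implementation: the function name candidates, the word-boundary anchors and the use of a lookahead to also catch overlapping matches are assumptions. Each candidate would then be looked up in the database in this order.

```python
import re

CAP = r"[A-Z][a-z]+"   # term starting with an uppercase letter
ANY = r"[A-Za-z]+"     # middle term, may be lowercase

# The four patterns from the text, longest first.
BODIES = [
    f"{CAP} {ANY} {ANY} {CAP}",
    f"{CAP} {ANY} {CAP}",
    f"{CAP} {CAP}",
    CAP,
]
# A lookahead with a capture group lets finditer report overlapping matches.
PATTERNS = [re.compile(rf"(?=\b({body})\b)") for body in BODIES]

def candidates(text):
    """Return possible place names, longest patterns first, without duplicates."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            name = match.group(1)
            if name not in found:
                found.append(name)
    return found

print(candidates("Zurich by night"))
print(candidates("From San Bernardino"))  # made-up description for illustration
```

Trying the longer patterns first means a multi-term name is offered to the lookup before its individual terms are.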
Additionally, when multiple locations with the same name are found, I sort them by distance to the last found location. How does this help? E.g. when I make a tour and travel from Prague/Praha to Vienna and look up Vienna, I get 5 entries:
- Vienna (VA,US) 38.9012225,-77.2652604 (#4791160)
- Vienna (WV,US) 39.3270191,-81.5484578 (#4825976)
- Vienna (GA,US) 32.0915577,-83.7954518 (#4228440)
- Vienna (IL,US) 37.4153295,-88.8978435 (#4252025)
- Wien (09,AT) 48.2084877601653,16.3720750808716 (#2761369)
but the entry I want is "Wien", and this one is geographically closest to the previously looked up "Prag". This small enhancement helped a lot to determine locations correctly.
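A small Python sketch of this ranking, using the five Vienna entries and the Praha coordinates from this post. The haversine formula, the Earth radius of 6371 km and the names distance_km and rank_by_proximity are assumptions; the script may compute distances differently.

```python
import math

def distance_km(a, b, radius_km=6371.0):
    """Great-circle (haversine) distance between two (lat, long) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def rank_by_proximity(candidates, last_position):
    """Sort same-named candidates by distance to the last found location."""
    if last_position is None:      # nothing found yet: keep the given order
        return list(candidates)
    return sorted(candidates,
                  key=lambda c: distance_km((c["lat"], c["long"]), last_position))

vienna = [
    {"name": "Vienna (VA,US)", "lat": 38.9012225, "long": -77.2652604},
    {"name": "Vienna (WV,US)", "lat": 39.3270191, "long": -81.5484578},
    {"name": "Vienna (GA,US)", "lat": 32.0915577, "long": -83.7954518},
    {"name": "Vienna (IL,US)", "lat": 37.4153295, "long": -88.8978435},
    {"name": "Wien (09,AT)", "lat": 48.2084877601653, "long": 16.3720750808716},
]
praha = (50.0878367932108, 14.4241322001241)

best = rank_by_proximity(vienna, praha)[0]
print(best["name"])
```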
For the renekmueller.com web-site I internally defined a file called "list" which resides in the folder of the photos; it lists every file with its description, like this:
0001.jpg Zurich by night
0002.jpg Rapperswil in the morning, after long night
...
My little perl-script geotag accepts either locations or filenames. If it's a file, it tries to find the location names in it; if it's a 'list' file, it handles it accordingly and prints out a similar 'list' file I call 'list.geo', which looks like this:
0001.jpg geo:name=Zurich (ZH,CH),\
geo:long=8.55,lat=47.3666667,time=123238128
0002.jpg geo:name=Rapperswil (SG,CH),\
geo:long=8.82227897644043,geo:lat=47.2255721988597,\
time=123239228
...
The time is the timestamp of the photo.
Since I have many photos, not all of them have a description or a location in the description; for those I optionally try to interpolate the location:
I use the found locations before and after, and interpolate the location linearly according to the timestamp.
This works quite well for me, since I usually stop and take a few photos within 1-2 mins, then ride on to the next location and take photos there again, and only put a description on the first photo of a sequence. Since it takes me 1-2 hours to reach the next location, as I ride the bicycle, the timestamp gives a good guess that photos taken in quick succession are also near the location of the first photo with the location description.
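The timestamp-based interpolation can be sketched in Python like this. The function name interpolate and the behaviour outside the known range (clamping to the first or last known location) are assumptions, and interpolating latitude and longitude separately is only a reasonable approximation over short distances. The two known points are the Zurich and Rapperswil entries from the list.geo example.

```python
from bisect import bisect_right

def interpolate(known, t):
    """Estimate (lat, long) at time t.

    known: list of (time, lat, long) tuples with strictly increasing times,
    i.e. the photos whose location was found from their description.
    """
    times = [k[0] for k in known]
    i = bisect_right(times, t)
    if i == 0:                      # before the first known location
        return known[0][1:]
    if i == len(known):             # at or after the last known location
        return known[-1][1:]
    t0, lat0, lon0 = known[i - 1]
    t1, lat1, lon1 = known[i]
    f = (t - t0) / (t1 - t0)        # fraction of the way from t0 to t1
    return (lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0))

known = [
    (123238128, 47.3666667, 8.55),                     # Zurich
    (123239228, 47.2255721988597, 8.82227897644043),   # Rapperswil
]
# A photo taken 60 seconds after the Zurich photo stays close to Zurich.
print(interpolate(known, 123238188))
# A photo taken exactly halfway in time lands halfway between the two.
midpoint = interpolate(known, 123238678)
print(midpoint)
```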
Requirements
- sqlite-3.x , install via local package manager
- perl module DBD::SQLite
- perl module Time::HiRes
Install the perl modules either with your local package manager, or
% perl -MCPAN -e 'install DBD::SQLite'
% perl -MCPAN -e 'install Time::HiRes'
Copy geotag into /usr/local/bin (as root) or keep it locally. First unzip cities1000.txt.gz or cities1000.zip:
% gzip -d cities1000.txt.gz
Next run geotag with cities1000.txt in the same directory:
% ./geotag
It will create ~/DB/ and populate ~/DB/geotag.db. This takes a while: on a Pentium4 2.4GHz about 20 mins to create the sqlite database geotag.db. After that, lookups respond instantly.
% ./geotag prag
Praha (52,CZ) lat=50.0878367932108,long=14.4241322001241
% ./geotag vienna
Wien (09,AT) lat=48.2084877601653,long=16.3720750808716
% ./geotag boulder
Boulder (CO,US) lat=40.0149856,long=-105.2705456
% ./geotag vienna
Vienna (IL,US) lat=37.4153295,long=-88.8978435
% ./geotag paris
Paris (TN,US) lat=36.3020023,long=-88.3267107
% ./geotag munich
München (02,DE) lat=48.1376831438553,long=11.5743541717529
% ./geotag paris
Paris (A8,FR) lat=48.85341,long=2.3488
% ./geotag paris,il,us
Paris (IL,US) lat=39.611146,long=-87.6961374
% ./geotag vienna,at
Wien (09,AT) lat=48.2084877601653,long=16.3720750808716
% ./geotag -f gpx diary.txt > list.gpx
so it behaves as I wanted: previously found matches determine the perimeter for the next found location.
I made a test-run based on my ./list file with the photo descriptions of my Europe 2008 Tour:
% ./geotag list > list.geo
% ./geotag -f gpx list > list.gpx
and it made 1-2 errors which I corrected by hand, and I added one waypoint (Rapperswil) so the path doesn't go over a lake - this is the result:
Note: I had no GPS coordinates to start with, I solely used the descriptions of my photos to derive the waypoints. I used OpenStreetMap.org for the map, used some of their examples, and referenced list.gpx within the javascript code:
...
var lgml = new OpenLayers.Layer.GML("GPX", "list.gpx", {
   format: OpenLayers.Format.GPX,
   style: {
      strokeColor: 'red', strokeWidth: 5,
      strokeOpacity: 0.5 },
   projection: new OpenLayers.Projection("EPSG:4326")
});
map.addLayer(lgml);
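For reference, a GPX file that such a layer can load is plain XML. The Python sketch below writes the found locations as a single GPX 1.1 track. Whether geotag emits a track, a route or individual waypoints is not stated here, so the track layout, the creator string and the function name to_gpx are assumptions.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"   # GPX 1.1 namespace

def to_gpx(points, creator="geotag-sketch"):
    """Serialize (name, lat, long) tuples as one GPX track segment."""
    ET.register_namespace("", GPX_NS)           # emit GPX as the default namespace
    gpx = ET.Element(f"{{{GPX_NS}}}gpx", {"version": "1.1", "creator": creator})
    trk = ET.SubElement(gpx, f"{{{GPX_NS}}}trk")
    seg = ET.SubElement(trk, f"{{{GPX_NS}}}trkseg")
    for name, lat, lon in points:
        # GPX uses the attribute names lat and lon, in decimal degrees (WGS84).
        pt = ET.SubElement(seg, f"{{{GPX_NS}}}trkpt",
                           {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"})
        ET.SubElement(pt, f"{{{GPX_NS}}}name").text = name
    return ET.tostring(gpx, encoding="unicode")

points = [
    ("Zurich (ZH,CH)", 47.3666667, 8.55),
    ("Rapperswil (SG,CH)", 47.2255721988597, 8.82227897644043),
]
gpx_text = to_gpx(points)
print(gpx_text)
```

GPX coordinates are WGS84 latitude and longitude, which matches the EPSG:4326 projection given to the layer.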
I added some verbosity which is printed to stderr, like the distance between the looked up locations, whereas the location data goes to stdout (so you can redirect it via > file):
% ./geotag berlin paris london
Berlin (16,DE) lat=52.5166667,long=13.4
Paris (A8,FR) lat=48.85341,long=2.3488
London (ENG,GB) lat=51.5084152563931,long=-0.125532746315002
statistics:
3 locations looked up, 3 successes, 0 failed (0.0%)
1222.499km cumulative distance
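The cumulative distance is the sum of the leg distances between consecutive locations. The Python sketch below recomputes it for the three cities above with the haversine formula and a mean Earth radius of 6371 km. Both are assumptions, so the result comes out near the reported figure, not identical to it.

```python
import math

def distance_km(a, b, radius_km=6371.0):
    """Great-circle (haversine) distance between two (lat, long) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def cumulative_km(route):
    """Sum the distances of consecutive legs along the route."""
    return sum(distance_km(a, b) for a, b in zip(route, route[1:]))

route = [
    (52.5166667, 13.4),                       # Berlin
    (48.85341, 2.3488),                       # Paris
    (51.5084152563931, -0.125532746315002),   # London
]
total = cumulative_km(route)
print(f"{total:.3f}km cumulative distance")
```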
Since version 0.012 a tcp-based client/server is also built in:
% ./geotag -server
if this machine has 192.168.1.8 as IP, then go to a client and do this:
% ./geotag -s 192.168.1.8 'new york'
You can create a ~/.geotagrc where you can define the defaults:
server: 192.168.1.8
and then call
% ./geotag 'new york'
and it will use the server to look up the locations, via tcp on port 10102.