last edited 2009/04/23 08:17
I like to index all my texts (articles, emails) against 'geonames', a database of locations. For that purpose I found a CSV from MaxMind.com, GeoLiteCity, created a database with DBD::SQLite, and finally made this comparison.
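For reference, a minimal sketch of how such a database could be built; the table and column names (cities, city, lat, long) are assumptions matching the query used below, the real script may differ:

#!/usr/bin/perl
# build_geonames.pl -- sketch: load GeoLiteCity-Location.csv into SQLite
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=geonames.db", "", "",
    { RaiseError => 1, AutoCommit => 0 });   # one big transaction is much faster

$dbh->do("CREATE TABLE cities (id INTEGER, country TEXT, region TEXT,
          city TEXT, postal TEXT, lat REAL, long REAL)");

my $ins = $dbh->prepare("INSERT INTO cities VALUES (?,?,?,?,?,?,?)");

open my $fh, '<', 'GeoLiteCity-Location.csv' or die "open: $!";
while (my $line = <$fh>) {
    chomp $line;
    # naive CSV split -- fine for this file's simple quoting,
    # Text::CSV would be more robust
    my @f = split /,/, $line;
    next unless @f >= 7 and $f[0] =~ /^\d+$/;   # skip header/copyright lines
    s/^"//, s/"$// for @f;                      # strip surrounding quotes
    $ins->execute(@f[0..6]);
}
close $fh;

$dbh->do("CREATE INDEX cities_city ON cities (city)");  # index for the lookup
$dbh->commit;
$dbh->disconnect;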
The dataset is 12MB, with 235,000 lines:
% wc GeoLiteCity-Location.csv
235422 277043 12450133 GeoLiteCity-Location.csv
% time grep \"Marseille\" GeoLiteCity-Location.csv
49739,"FR","B8","Marseille","",43.3000,5.4000,,
0.045u 0.090s 0:00.13 100.0% 105+1040k 0+0io 0pf+0w
vs. Perl with DBD::SQLite, where the geonames.db is 14MB in size and is queried in a script with
select city,long,lat from cities where city == 'New York'
with an index on the city column. The command-line call:
% time ./geotag Marseille
Marseille:
Marseille, B8, FR: 43.3, 5.4
0.271u 0.062s 0:00.33 100.0% 10+2054k 0+0io 0pf+0w
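The geotag script itself isn't shown here, but a minimal sketch of the lookup side could look like this (the region and country columns are assumptions derived from the output above):

#!/usr/bin/perl
# geotag -- sketch: look up a city in geonames.db
use strict;
use warnings;
use DBI;

my $name = shift or die "usage: $0 <city>\n";

my $dbh = DBI->connect("dbi:SQLite:dbname=geonames.db", "", "",
    { RaiseError => 1 });

# a placeholder (?) keeps quoting safe and lets SQLite use the city index
my $sth = $dbh->prepare(
    "SELECT city, region, country, lat, long FROM cities WHERE city == ?");
$sth->execute($name);

print "$name:\n";
while (my ($city, $region, $country, $lat, $long) = $sth->fetchrow_array) {
    print "$city, $region, $country: $lat, $long\n";
}

Binding the city name as a placeholder instead of interpolating it into the SQL also means the prepared statement can be reused for many lookups in one run.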
The system time is 0.090s for grep vs. 0.062s for SQL, so SQL is about a third faster there, but the user time (0.271s vs. 0.045s) was six times longer, which is explained by the overhead of loading perl and the required modules.
Update: Geonames.org seems to have better data, with aliases for city names; I used it for a small tool named "geotag", see my post.