last edited 2009/04/23 08:17
I like to index all my texts (articles, emails) against 'geonames', a database of locations. For that purpose I found a CSV from MaxMind.com, GeoLiteCity, created a database with DBD::SQLite, and finally made this comparison.
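For reference, a minimal sketch of how such a database could be built; the table and column names (cities, city, lat, long) are assumptions matching the query used below, the real script may differ:

#!/usr/bin/perl
# build_geonames.pl -- sketch: load GeoLiteCity-Location.csv into SQLite
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=geonames.db", "", "",
    { RaiseError => 1, AutoCommit => 0 });   # one big transaction is much faster

$dbh->do("CREATE TABLE cities (id INTEGER, country TEXT, region TEXT,
          city TEXT, postal TEXT, lat REAL, long REAL)");

my $ins = $dbh->prepare("INSERT INTO cities VALUES (?,?,?,?,?,?,?)");

open my $fh, '<', 'GeoLiteCity-Location.csv' or die "open: $!";
while (my $line = <$fh>) {
    chomp $line;
    # naive CSV split -- fine for this file's simple quoting,
    # Text::CSV would be more robust
    my @f = split /,/, $line;
    next unless @f >= 7 and $f[0] =~ /^\d+$/;   # skip header/copyright lines
    s/^"//, s/"$// for @f;                      # strip surrounding quotes
    $ins->execute(@f[0..6]);
}
close $fh;

$dbh->do("CREATE INDEX cities_city ON cities (city)");  # index for the lookup
$dbh->commit;
$dbh->disconnect;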
The dataset is 12MB, with 235,000 lines:
% wc GeoLiteCity-Location.csv
235422 277043 12450133 GeoLiteCity-Location.csv
% time grep \"Marseille\" GeoLiteCity-Location.csv
49739,"FR","B8","Marseille","",43.3000,5.4000,,
0.045u 0.090s 0:00.13 100.0% 105+1040k 0+0io 0pf+0w
vs. Perl with DBD::SQLite, where the geonames.db is 14MB in size and is queried in a script with
select city,long,lat from cities where city == 'New York'
with an index on the city column. The command-line call:
% time ./geotag Marseille
Marseille:
Marseille, B8, FR: 43.3, 5.4
0.271u 0.062s 0:00.33 100.0% 10+2054k 0+0io 0pf+0w
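The geotag script itself isn't shown here, but a minimal sketch of the lookup side could look like this (the region and country columns are assumptions derived from the output above):

#!/usr/bin/perl
# geotag -- sketch: look up a city in geonames.db
use strict;
use warnings;
use DBI;

my $name = shift or die "usage: $0 <city>\n";

my $dbh = DBI->connect("dbi:SQLite:dbname=geonames.db", "", "",
    { RaiseError => 1 });

# a placeholder (?) keeps quoting safe and lets SQLite use the city index
my $sth = $dbh->prepare(
    "SELECT city, region, country, lat, long FROM cities WHERE city == ?");
$sth->execute($name);

print "$name:\n";
while (my ($city, $region, $country, $lat, $long) = $sth->fetchrow_array) {
    print "$city, $region, $country: $lat, $long\n";
}

Binding the city name as a placeholder instead of interpolating it into the SQL also means the prepared statement can be reused for many lookups in one run.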
The system time is 0.090s for grep vs. 0.062s for SQL, so SQL is about a third faster there, but the user time (0.271s vs. 0.045s) was six times longer, which is explained by the overhead of loading perl and the required modules.
Update: Geonames.org seems to have better data, with aliases for city names; I used it for a small tool named "geotag", see my post.