Computer
Computer Diary
    RepRap


    "God is not external to anyone, but is present with all things, though they are ignorant that he is so."

    Plotinus
    204-270 AD

    Here my little rant and praise place, where the daily experiences of my programming work are expressed. I publish them with the idea that others might find it useful and benefit from it.

    2009/02/23
    SQL vs GREP with 230K lines (12MB) GeoLite
    last edited 2009/04/23 08:17 (*)

    I like to index all my texts (articles, emails) according 'geonames', a database of locations, for that purpose I found a CSV from MaxMind.com: GeoliteCity , and started to create a database with DBD::SQLite, and finally made this comparison.

    The dataset is 12MB, with 235,000 lines:

    % wc GeoLiteCity-Location.csv 
      235422  277043 12450133 GeoLiteCity-Location.csv
    

    % time grep \"Marseille\" GeoLiteCity-Location.csv
    49739,"FR","B8","Marseille","",43.3000,5.4000,,
    0.045u 0.090s 0:00.13 100.0%    105+1040k 0+0io 0pf+0w
    

    vs perl with DBD::SQLite, whereas the geonames.db is 14MB in size, and used in a script with

    select city,long,lat from cities where city == 'New York'
    

    where as an index was made for city column, then the command line call:

    % time ./geotag Marseille
    Marseille:
            Marseille, B8, FR: 43.3, 5.4
    0.271u 0.062s 0:00.33 100.0%    10+2054k 0+0io 0pf+0w
    

    The computing time is grep 0.090 vs sql 0.062, which is 1/3 faster, but user time was 6 times longer, which is explainable by the overhead to load perl and the required modules.

    Update: Geonames.org seems to have better data with aliases of city names, I used it for a small tool named "geotag", see my post.



    All posts or individual posts:

  • MetaFS - Dealing With Metadata the Proper Way (2013/12/14 01:00)
  • My Cellphones & Smartphone (2010-2013) (2013/09/02 23:15)
  • KDE / Kubuntu 12.04: 10+ years terrible GUI, A Systemic Problem of OSS (2013/04/27 12:18)
  • UNIX Man on Windows 7: VirtualBox + Ubuntu + LXC (2012/11/27 19:24)
  • Metadata - The Unresolved Mess (2012/07/10 20:59)
  • Cellphone Networks: Thieves, Insanity & Crap (2010/01/26 13:57)
  • MacOS-X for a UNIX Man with a PC (2009/09/26 20:43)
  • Windows XP for a UNIX Man (2009/09/22 18:28)
  • Server Counting (2009/05/18 11:50)
  • Automatically Geotag Photos without GPS (2009/04/22 08:28)
  • Rebirth of FastCGI (2009/04/15 17:17)
  • Online Advertisement & Income for Web-Site Owners (2009/03/18 22:10)
  • iPhone JavaScript Frameworks (aka Avoiding Objective-C) (2009/03/14 22:08)
  • Google - The Almighty Tracker & Advertising Blocking (2009/03/12 22:09)
  • How To Save 300MB RAM (2009/03/07 22:07)
  • Verbosity of Programming Languages (2009/03/06 22:06)
  • Problems with MacOSX (2009/03/03 22:03)
  • MacOSX: My First Steps (2009/02/24 09:57)
  • Catch 22 with HDD/DVD Recorder Medion Life (2009/02/24 09:27)
  • Kubuntu 8.1 as guest on VirtualBox MacOSX host (2009/02/24 01:33)
  • VirtualBox vs VMWare Fusion on MacOSX (2009/02/24 01:19)
  • SQL vs GREP with 230K lines (12MB) GeoLite (2009/02/23 20:10)
  • Kubuntu 8.1: Eye-Candy & Memory Waste (2009/01/24 09:57)
  • Firefox 2.0.x / 3.0.x - Memory Waste (2009/01/22 19:34)


    [ post new entry ] (only for administators)

    Title:

    Text:

    Tags: (separated by commas)

    Date (optional):

    Password:
  •  


    .:.




    Copyright 2007-2016, 2020-2024 © by René K. Müller <spiritdude@gmail.com>
    Illustrations and graphics made with Inkscape, GIMP and Tgif