Computer
Computer Diary
    RepRap


    "Yes, we can."

    Barack Obama
    4th November 2008, Chicago, USA

    Here my little rant and praise place, where the daily experiences of my programming work are expressed. I publish them with the idea that others might find it useful and benefit from it.

    Tag <Grep>

    Check also other posts with other tags.

    2009/02/23
    SQL vs GREP with 230K lines (12MB) GeoLite
    last edited 2009/04/23 08:17 (*)

    I like to index all my texts (articles, emails) according 'geonames', a database of locations, for that purpose I found a CSV from MaxMind.com: GeoliteCity , and started to create a database with DBD::SQLite, and finally made this comparison.

    The dataset is 12MB, with 235,000 lines:

    % wc GeoLiteCity-Location.csv 
      235422  277043 12450133 GeoLiteCity-Location.csv
    

    % time grep \"Marseille\" GeoLiteCity-Location.csv
    49739,"FR","B8","Marseille","",43.3000,5.4000,,
    0.045u 0.090s 0:00.13 100.0%    105+1040k 0+0io 0pf+0w
    

    vs perl with DBD::SQLite, whereas the geonames.db is 14MB in size, and used in a script with

    select city,long,lat from cities where city == 'New York'
    

    where as an index was made for city column, then the command line call:

    % time ./geotag Marseille
    Marseille:
            Marseille, B8, FR: 43.3, 5.4
    0.271u 0.062s 0:00.33 100.0%    10+2054k 0+0io 0pf+0w
    

    The computing time is grep 0.090 vs sql 0.062, which is 1/3 faster, but user time was 6 times longer, which is explainable by the overhead to load perl and the required modules.

    Update: Geonames.org seems to have better data with aliases of city names, I used it for a small tool named "geotag", see my post.



    Check also other posts with other tags.


    [ post new entry ] (only for administators)

    Title:

    Text:

    Tags: (separated by commas)

    Date (optional):

    Password:
     


    .:.




    Copyright 2007-2016, 2020-2022 © by René K. Müller <spiritdude@gmail.com>
    Illustrations and graphics made with Inkscape, GIMP and Tgif