Computer
Computer Diary
    RepRap


    "You are not a human being having a spiritual experience, but a spiritual being having a human experience."

    Pierre Teilhard de Chardin
    1881-1955

    Here my little rant and praise place, where the daily experiences of my programming work are expressed. I publish them with the idea that others might find it useful and benefit from it.

    Tag <Grep>

    Check also other posts with other tags.

    2009/02/23
    SQL vs GREP with 230K lines (12MB) GeoLite
    last edited 2009/04/23 08:17 (*)

    I like to index all my texts (articles, emails) according 'geonames', a database of locations, for that purpose I found a CSV from MaxMind.com: GeoliteCity , and started to create a database with DBD::SQLite, and finally made this comparison.

    The dataset is 12MB, with 235,000 lines:

    % wc GeoLiteCity-Location.csv 
      235422  277043 12450133 GeoLiteCity-Location.csv
    

    % time grep \"Marseille\" GeoLiteCity-Location.csv
    49739,"FR","B8","Marseille","",43.3000,5.4000,,
    0.045u 0.090s 0:00.13 100.0%    105+1040k 0+0io 0pf+0w
    

    vs perl with DBD::SQLite, whereas the geonames.db is 14MB in size, and used in a script with

    select city,long,lat from cities where city == 'New York'
    

    where as an index was made for city column, then the command line call:

    % time ./geotag Marseille
    Marseille:
            Marseille, B8, FR: 43.3, 5.4
    0.271u 0.062s 0:00.33 100.0%    10+2054k 0+0io 0pf+0w
    

    The computing time is grep 0.090 vs sql 0.062, which is 1/3 faster, but user time was 6 times longer, which is explainable by the overhead to load perl and the required modules.

    Update: Geonames.org seems to have better data with aliases of city names, I used it for a small tool named "geotag", see my post.



    Check also other posts with other tags.


    [ post new entry ] (only for administators)

    Title:

    Text:

    Tags: (separated by commas)

    Date (optional):

    Password:
     


    .:.




    Copyright 2007-2016, 2020-2024 © by René K. Müller <spiritdude@gmail.com>
    Illustrations and graphics made with Inkscape, GIMP and Tgif