Computer Diary

    "You are not a human being having a spiritual experience, but a spiritual being having a human experience."

    Pierre Teilhard de Chardin

    Here is my little rant-and-praise place, where I write up the daily experiences of my programming work. I publish them in the hope that others might find them useful and benefit from them.

    Tag <Grep>

    Check also other posts with other tags.

    SQL vs GREP with 230K lines (12MB) GeoLite
    last edited 2009/04/23 08:17 (*)

    I would like to index all my texts (articles, emails) according to 'geonames', a database of locations. For that purpose I found a CSV from GeoLiteCity, started to create a database with DBD::SQLite, and finally made this comparison.

    The dataset is 12MB, with 235,000 lines:

    % wc GeoLiteCity-Location.csv 
      235422  277043 12450133 GeoLiteCity-Location.csv

    % time grep \"Marseille\" GeoLiteCity-Location.csv
    0.045u 0.090s 0:00.13 100.0%    105+1040k 0+0io 0pf+0w

    versus Perl with DBD::SQLite, where geonames.db is 14MB in size and is queried in a script with

    select city,long,lat from cities where city == 'New York'

    with an index created on the city column. The command-line call:

    % time ./geotag Marseille
            Marseille, B8, FR: 43.3, 5.4
    0.271u 0.062s 0:00.33 100.0%    10+2054k 0+0io 0pf+0w
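    The lookup described above can be sketched roughly as follows. This is not the original geotag script (which is Perl with DBD::SQLite and is not shown here) but a minimal Python stand-in using the standard sqlite3 module. The table layout, the index name idx_cities_city, the helper names and the sample rows are all assumptions made for illustration; only the Marseille row is taken from the output above.

```python
import sqlite3

# Illustrative sample rows (city, region, country, lat, long). The Marseille
# row mirrors the geotag output shown above; the others are placeholders.
SAMPLE_ROWS = [
    ("Marseille", "B8", "FR", 43.3, 5.4),
    ("New York", "NY", "US", 40.7, -74.0),
    ("Paris", "A8", "FR", 48.9, 2.3),
]


def build_db(rows, path=":memory:"):
    """Create a hypothetical 'cities' table and index its city column."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE cities "
        "(city TEXT, region TEXT, country TEXT, lat REAL, long REAL)"
    )
    db.executemany("INSERT INTO cities VALUES (?, ?, ?, ?, ?)", rows)
    # The index is what lets the exact-match lookup avoid a full table scan.
    db.execute("CREATE INDEX idx_cities_city ON cities (city)")
    db.commit()
    return db


def geotag(db, name):
    """Look up a city by exact name, passing the name as a bound parameter."""
    cur = db.execute(
        "SELECT city, region, country, lat, long FROM cities WHERE city = ?",
        (name,),
    )
    return [f"{c}, {r}, {k}: {lat}, {lon}" for c, r, k, lat, lon in cur]


def uses_index(db, name):
    """Return True if SQLite's query plan mentions the city index."""
    plan = db.execute(
        "EXPLAIN QUERY PLAN SELECT city FROM cities WHERE city = ?", (name,)
    ).fetchall()
    # The human-readable plan detail is the last column of each row.
    return any("idx_cities_city" in row[-1] for row in plan)


if __name__ == "__main__":
    db = build_db(SAMPLE_ROWS)
    for line in geotag(db, "Marseille"):
        print(line)
```

    Binding the city name with a placeholder instead of pasting it into the SQL string keeps names containing quotes from breaking the query, and uses_index gives a quick way to confirm that the lookup really goes through the index.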

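    The system time is 0.090s for grep vs 0.062s for SQL, which is about one third less, but the user time was 6 times longer (0.271s vs 0.045s), which is explained by the overhead of loading perl and the required modules. In elapsed time grep still finishes first, 0.13s vs 0.33s.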
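    The ratios can be checked with a few lines of arithmetic. The sketch below copies the figures from the two time outputs above and assumes the csh time layout, where the first field is user time, the second is system time and the third is elapsed time; the variable names are only for illustration.

```python
# Figures copied from the two `time` lines above
# (assumed csh layout: user, system, elapsed, all in seconds).
grep_t = {"user": 0.045, "system": 0.090, "elapsed": 0.13}
sql_t = {"user": 0.271, "system": 0.062, "elapsed": 0.33}

# Fraction of system time saved by the SQLite lookup relative to grep.
system_saving = 1 - sql_t["system"] / grep_t["system"]

# How many times more user time the Perl/SQLite call needs.
user_factor = sql_t["user"] / grep_t["user"]

# How many times longer the Perl/SQLite call takes in wall-clock terms.
elapsed_factor = sql_t["elapsed"] / grep_t["elapsed"]

if __name__ == "__main__":
    print(f"system time saved: {system_saving:.0%}")
    print(f"user time factor:  {user_factor:.1f}x")
    print(f"elapsed factor:    {elapsed_factor:.1f}x")
```

    These are single runs with very small absolute times, so the ratios are indicative rather than precise.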

    Update: seems to have better data with aliases of city names, I used it for a small tool named "geotag", see my post.


    Copyright 2007-2016, 2020-2024 © by René K. Müller
    Illustrations and graphics made with Inkscape, GIMP and Tgif