My script looks like this:

    with open('toy.json', 'rb') as inpt:
        lines = [json.loads(line) for line in inpt]

    for line in lines:
        records = [item['hash'] for item in lines]

    for item in records:
        print item

The script reads data where each line is valid JSON, but the file as a whole is not, because it's an aggregated dump from a web service.

The data looks, more or less, like this:

    {"record":"value0","block":"0x79"}
    {"record":"value1","block":"0x80"}

The code works, it allows me to interact with the data as JSON, but it's so slow that it's essentially useless. Is there a good way to speed up this process?


    1 Answer

    It's slow because of this loop:

        for line in lines:
            records = [item['hash'] for item in lines]

    Notice that records is overwritten on every iteration, and always with the exact same value. The whole loop can therefore be replaced with this single line:

    records = [item['hash'] for item in lines] 

    That alone should fix the slowness. But I would go one step further and build records in a single step, extracting hash directly during the first pass over the input lines:

        with open('toy.json', 'rb') as input:
            records = [json.loads(line)['hash'] for line in input]

        for item in records:
            print item
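
    To see why the overwriting matters, here is a small, self-contained sketch (in Python 3, unlike the Python 2 code above, and using made-up data with a hypothetical hash field) that times the quadratic loop against the single-pass version:

```python
import json
import time
from io import StringIO

# Made-up sample data: 2000 lines, each a JSON object with a 'hash' key,
# mirroring the line-per-record format described in the question.
raw = "\n".join(json.dumps({"hash": "h%d" % i}) for i in range(2000))

# Original approach: parse every line, then rebuild `records` once per
# line, so the loop does O(n^2) work in total.
start = time.perf_counter()
lines = [json.loads(line) for line in StringIO(raw)]
for line in lines:
    records = [item['hash'] for item in lines]
slow = time.perf_counter() - start

# Suggested approach: extract 'hash' once while parsing each line, O(n).
start = time.perf_counter()
records_fast = [json.loads(line)['hash'] for line in StringIO(raw)]
fast = time.perf_counter() - start

assert records == records_fast  # same result, far less work
print("quadratic loop: %.4fs, single pass: %.4fs" % (slow, fast))
```

    With 2000 lines the quadratic loop already performs four million list accesses, while the single pass parses each line exactly once.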
    • He might want to keep lines around to get other fields from the JSON later in the code.
      – Barmar, Sep 22, 2017 at 22:08
