My script looks like this:

    with open('toy.json', 'rb') as inpt:
        lines = [json.loads(line) for line in inpt]

    for line in lines:
        records = [item['hash'] for item in lines]

    for item in records:
        print item

The script reads data where each line is valid JSON, but the file as a whole is not, because it's an aggregated dump from a web service.

The data looks, more or less, like this:

    {"record":"value0","block":"0x79"}
    {"record":"value1","block":"0x80"}

The code works, it allows me to interact with the data as JSON, but it's so slow that it's essentially useless. Is there a good way to speed up this process?


    1 Answer

    It's slow because of this loop:

        for line in lines:
            records = [item['hash'] for item in lines]

    Notice that records is overwritten on every iteration, and always with the exact same value. The whole loop can therefore be replaced with this single line:

    records = [item['hash'] for item in lines] 

    That alone should fix the slowness. But I would go one step further and build records in a single step, extracting hash directly during the first pass over the input lines:

        with open('toy.json', 'rb') as input:
            records = [json.loads(line)['hash'] for line in input]

        for item in records:
            print item
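
    To see why the overwriting matters, here is a small, self-contained sketch (in Python 3, unlike the Python 2 code above, and using made-up data with a hypothetical hash field) that times the quadratic loop against the single-pass version:

```python
import json
import time
from io import StringIO

# Made-up sample data: 2000 lines, each a JSON object with a 'hash' key,
# mirroring the line-per-record format described in the question.
raw = "\n".join(json.dumps({"hash": "h%d" % i}) for i in range(2000))

# Original approach: parse every line, then rebuild `records` once per
# line, so the loop does O(n^2) work in total.
start = time.perf_counter()
lines = [json.loads(line) for line in StringIO(raw)]
for line in lines:
    records = [item['hash'] for item in lines]
slow = time.perf_counter() - start

# Suggested approach: extract 'hash' once while parsing each line, O(n).
start = time.perf_counter()
records_fast = [json.loads(line)['hash'] for line in StringIO(raw)]
fast = time.perf_counter() - start

assert records == records_fast  # same result, far less work
print("quadratic loop: %.4fs, single pass: %.4fs" % (slow, fast))
```

    With 2000 lines the quadratic loop already performs four million list accesses, while the single pass parses each line exactly once.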
    • He might want to keep lines around to get other fields from the JSON later in the code.
      – Barmar, Sep 22, 2017 at 22:08
