Python file reader/writer, with processing pipeline and context manager

Question

The following code does three things:

The filename is determined by the RawData class
The reader and writer send data through a processing pipeline
(Hopefully) makes changes and extensions easy

Regarding (2), the processing pipeline consists of "processors", that can process and deprocess data. For example, if you run the code below and look in ortho_animals.txt, you'll see the JSON version of a dict with (keys, values) reversed.

I left comments out to keep the size of the code to a minimum. Please let me know what you think, with a particular eye on the class designs and interactions.

import contextlib import json class Processor(object): @classmethod def process(cls, data): raise NotImplementedError @classmethod def deprocess(cls, data): raise NotImplementedError class JsonProcessor(Processor): @classmethod def process(cls, data): return json.dumps(data) @classmethod def deprocess(cls, data): return json.loads(data) class SwapProcessor(Processor): @classmethod def process(cls, data): return cls.swap(data) @classmethod def deprocess(cls, data): return cls.swap(data) @classmethod def swap(cls, data): return {v: k for k, v in data.iteritems()} class AggregateProcessor(Processor): def __init__(self, processors): self.processors = processors def process(self, data): return reduce(lambda d, kls: kls.process(d), self.processors, data) def deprocess(self, data): return reduce(lambda d, kls: kls.deprocess(d), self.processors[::-1], data) class RawReader(object): def __init__(self, file_obj, processor): self._file = file_obj self._processor = processor def read(self): data = self._file.read() return self._processor.deprocess(data) class RawWriter(object): def __init__(self, file_obj, processor): self._file = file_obj self._processor = processor def write(self, data): new_data = self._processor.process(data) self._file.write(new_data) class RawData(object): processors = [SwapProcessor, JsonProcessor] aggregator = AggregateProcessor(processors) def __init__(self, obj_type, key): self.obj_type = obj_type self.key = key def _build_filename(self): return self.obj_type + '_' + self.key + '.txt' @contextlib.contextmanager def writer(self): filename = self._build_filename() with open(filename, 'w') as f: yield RawWriter(f, self.aggregator) @contextlib.contextmanager def reader(self): filename = self._build_filename() with open(filename, 'r') as f: yield RawReader(f, self.aggregator) if __name__ == '__main__': data = {'cats': 'suck', 'dogs': 'rule'} rd = RawData('ortho', key='animals') with rd.writer() as writer: writer.write(data) with rd.reader() as reader: new_data = reader.read() assert(new_data['cats'] == data['cats'])

Projski · Accepted Answer · 2016-05-17 22:44:05Z

This code looks very nice! I admire how clean and organized your code is. It appears that you tried to think through how to arrange it, and I really appreciate that.

There are really only a few areas that I see things could be improved: the speed of execution, the robustness of the code when a user attempts to supply invalid data, and the verbiage for readability when others must see this code.

Efficiency

I had to look it up, but I remember reading about how reducing all of your code that follows if __name__ == '__main__': into a single main() function is good practice. After searching around I remember where I read it, it was in the Python Cookbook (3rd edition) by O'Reilly Media. The authors David Beazley and Brian K. Jones wrote:

A little-known fact is that code defined in the global scope like this runs slower than code defined in a function. The speed difference has to do with the implementation of local versus global variables (operations involving locals are faster) ... The speed difference depends heavily on the processing being performed, but in our experience, speedups of 15-30% are not uncommon

Robustness

I can see in a couple places where someone using this code may (whether accidentally or with malicious intent) break it and cause an error, or perhaps not cause an error but make it do something not designed. You may want to consider adding checks throughout your code, so that if say a value in a dict is of a type other than a string, then you know something is wrong and you shouldn't continue with the swap function. I realize adding Exception handling throughout or if/then branches will make for less clean and readable syntax, but it is an area where you may need to take this next if you plan to put this into a production environment.

Readability

Think about how this code will look if someone else has to get up in the middle of the night when something breaks, and half conscious and under duress, go through and figure out where a problem might be. (I think my friend and QA Guru Jon Gluck called it "Drunk Proofing" your code, so even the intoxicated who get a late night emergency call can fix the problem.)

As an example, consider the word "processing." This is a general enough word that you could use it almost anywhere in your code, and across multiple classes and functions even though different uses of this word may be completely unrelated. It reminds me of "college speak" for the word "do." Imagine replacing the word "process" (including "processing" and "processor" and "deprocessor" with the words "doing", "doer" and "undoer", respectively). I don't have a suggestion for what to replace the word process with, and maybe it truly is the best and only word to use here (just as 'handler' is often the best word to use in many cases), but just be careful. Be especially careful if you have used these same terms in other parts of your code that could cause significantly more confusion if someone attempts to wade through a stack trace to troubleshoot an error.

Caveat

Please understand that I think your code is quite beautiful, and frankly I'm not sure my code would look this good if I were tasked with the same project. Regardless, since you posted in code review asking for suggestions, I am trying to poke some holes in it, and make suggestions of where to go from here.

Great job, and keep it up!

Stack Exchange Network

Python file reader/writer, with processing pipeline and context manager

1 Answer 1

Efficiency

Robustness

Readability

Caveat

Hot Network Questions

Python file reader/writer, with processing pipeline and context manager

1 Answer 1

Efficiency

Robustness

Readability

Caveat

Related

Hot Network Questions