I have a 'processing' function and a 'serializing' function. Currently the processor returns 4 different types of data structures, each of which needs to be serialized in a different way.
Looking for the best practice on what to do here.
```python
def process(input):
    ...
    return a, b, c, d

def serialize(a, b, c):
    ...
    # Different serialization patterns for each of a-c.

a, b, c, d = process(input)
serialize(a, b, c)
go_on_to_do_other_things(d)
```
That feels janky. Should I instead use a class where `a`, `b`, `c`, and `d` are member variables?
```python
class VeryImportantDataProcessor:
    def process(self, input):
        self.a = ...
        self.b = ...
        ...

    def serialize(self):
        s3.write(self.a)
        convoluted_serialize(self.b)
        ...

vipd = VeryImportantDataProcessor()
vipd.process(input)
vipd.serialize()
```
Keen to hear your thoughts on what is best here!
Note: after processing and serializing, the code goes on to use variable `d` for further unrelated shenanigans. Not sure if that changes anything.
Do `a`, `b` and `c` get used outside of `process` and `serialize`? Or is the "point" of this code to return `d`, with serialization of some values as a side effect, and `a`, `b` and `c` migrated to the API by necessity of implementation rather than by design?

`a`, `b`, and `c` are processed products of a raw data stream, serialized for other live APIs to pull down for use in their different tasks. The `process` function here is essentially the SQL-like data manipulation in Spark. After this stage we're done with Spark processing. `d` is another related subset of the data, but it goes on to additional steps (ML model training).
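Putting that together, the overall flow looks roughly like the sketch below; `raw_stream`, `train_model`, and all of the function bodies are placeholders I'm using for illustration, not the real code:

```python
# Rough shape of the pipeline; every name and body here is a placeholder.

def process(raw_stream):
    # Spark SQL-like manipulation of the raw data stream; Spark work ends here.
    a, b, c, d = {}, {}, {}, {}
    return a, b, c, d

def serialize(a, b, c):
    # a, b and c each get their own serialization so that other live APIs
    # can pull them down for their different tasks.
    print("writing a, b and c to their respective stores")

def train_model(d):
    # d is a related subset of the data that continues into ML model training.
    print("training on d")

a, b, c, d = process(raw_stream={"example": "raw input"})
serialize(a, b, c)
train_model(d)
```

So serialization of `a`, `b` and `c` is the end of the line for those values; only `d` flows onward into later steps.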