
I'm struggling to settle on a preferred way to organize a sequence of asynchronous tasks that can be applied in parallel, for example when parsing data from many files. In my case I'm using JavaScript and promises, but this could apply to almost any language (hence the odd combination of the "javascript" and "language-agnostic" tags).

Option A: Parallelize at the end

1) First, create the chain of tasks for a single file / stream, e.g.

    function readAndParseAndConvert(filename) {
      return read(filename)
        .then((body) => parse(body))
        .then((parsed) => convert(parsed));
    }

2) Then, put it all together

    Promise.all(theArrayOfFilenames.map(readAndParseAndConvert));

Option B: Parallelize each step

1) Create the steps

    function readFiles(filenames) {
      return Promise.all(filenames.map((filename) => read(filename)));
    }

    function parseBodies(bodies) {
      return Promise.all(bodies.map((body) => parse(body)));
    }

    function convertAll(parsed) {
      return Promise.all(parsed.map((item) => convert(item)));
    }

2) Put them together

    readFiles(filenames)
      .then(parseBodies)
      .then(convertAll);

This may ultimately get flagged as "opinion-based", but are there any objective arguments either way? Keep in mind that real code would also have try/catch, file closing, and so on.
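To make the "try/catch, closing files" remark concrete, here is a minimal sketch of how cleanup and error handling might fit into Option A. Everything here is a hypothetical stand-in rather than a real file API: `openFile`, the handle's `readAll`/`close` methods, the file names, and the toy `parse`/`convert` bodies are assumptions made only so the example runs on its own. The point is that `try`/`finally` stays local to one file's chain, while `Promise.allSettled` collects per-file outcomes instead of failing fast the way `Promise.all` does.

```javascript
// Records which hypothetical files were closed, so cleanup is observable.
const closedFiles = [];

// Hypothetical stand-in for opening a file; not a real file API.
async function openFile(filename) {
  return {
    async readAll() {
      // Assumption for the demo: names starting with "bad" fail to read.
      if (filename.startsWith("bad")) {
        throw new Error(`cannot read ${filename}`);
      }
      return "1,2,3";
    },
    async close() {
      closedFiles.push(filename);
    },
  };
}

// Toy stages: parse a comma-separated body, then sum the numbers.
const parse = (body) => body.split(",").map(Number);
const convert = (numbers) => numbers.reduce((sum, n) => sum + n, 0);

async function readAndParseAndConvert(filename) {
  const handle = await openFile(filename);
  try {
    return convert(parse(await handle.readAll()));
  } finally {
    // Runs whether the chain above succeeded or threw.
    await handle.close();
  }
}

// allSettled waits for every file and reports each outcome separately.
const settled = Promise.allSettled(
  ["good.csv", "bad.csv"].map(readAndParseAndConvert)
);

settled.then((outcomes) => {
  for (const outcome of outcomes) {
    const detail =
      outcome.status === "fulfilled" ? outcome.value : outcome.reason.message;
    console.log(outcome.status, detail);
  }
});
```

With Option B the same cleanup is harder to place, because a handle opened in the read stage would have to stay open (or be tracked) across the stage boundaries.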

    There is really no objective way to compare those two approaches. If you have a particular case, implement both and measure. Note that the general wisdom is that "parallelization of I/O-bound tasks rarely helps". Make sure to consider some sort of consumer-producer architecture too, especially if the sizes of your files vary a lot. Commented Mar 7, 2020 at 1:10
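One simple reading of the consumer-producer suggestion is a small pool of workers pulling file names from a shared queue, so that only a fixed number of files are in flight at once. The sketch below is an assumption about what that could look like, not something taken from the thread: `mapWithLimit` is a made-up helper name, and `readAndParseAndConvert` is replaced by a timer-based stand-in so the example is self-contained.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `task` over `items` with at most `limit`
// tasks in flight, returning results in input order.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // shared cursor that plays the role of the work queue

  async function worker() {
    while (next < items.length) {
      // No await between the check and the increment, so two workers
      // can never claim the same index.
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in for the per-file chain; it also tracks how many are running.
let inFlight = 0;
let peakInFlight = 0;
async function readAndParseAndConvert(filename) {
  inFlight++;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await sleep(10); // simulated I/O
  inFlight--;
  return filename.toUpperCase();
}

const pooled = mapWithLimit(
  ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"],
  2,
  readAndParseAndConvert
);

pooled.then((results) => {
  console.log(results, "peak in flight:", peakInFlight);
});
```

Because each worker picks up the next file as soon as it finishes its current one, a single large file occupies one worker without stalling the rest, which is where this helps when file sizes vary a lot. Whether it beats a plain `Promise.all` still has to be measured for the workload at hand.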

1 Answer

Promise-based JavaScript does not run your code in parallel: callbacks execute one at a time on a single thread (worker threads aside). What you can do is work on something while waiting for something else, that is, concurrency. In your scenario, you might process the contents of one file while waiting for the next file to be loaded.

Your second solution, using intermediate Promise.all() steps, prevents this because each stage waits until every file has completed the previous stage: files are loaded concurrently with each other, but no processing happens concurrently with loading. Concurrent operations that contend for the same resource have little benefit, whether that resource is disk I/O or CPU, so the gain comes from overlapping work on different resources, which is exactly what the stage barriers rule out.
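A rough way to see the cost of the barrier is to simulate both options with timers. This is only a sketch under stated assumptions: the stage durations are invented, both stages are modelled as pure waiting (as if each one awaited I/O), and `perFileChains`/`stageBarriers` are names chosen here for your Option A and Option B. Real CPU-bound parsing would not overlap this neatly.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Invented stage durations: one file is slow to read, the other slow to parse.
const jobs = [
  { name: "a", readMs: 80, parseMs: 20 },
  { name: "b", readMs: 20, parseMs: 80 },
];

const read = async (job) => {
  await sleep(job.readMs);
  return job;
};
const parse = async (job) => {
  await sleep(job.parseMs);
  return job.name;
};

// Option A: each file runs its own chain; only the final result is joined.
function perFileChains(list) {
  return Promise.all(list.map((job) => read(job).then(parse)));
}

// Option B: no file is parsed until every file has been read.
function stageBarriers(list) {
  return Promise.all(list.map((job) => read(job))).then((bodies) =>
    Promise.all(bodies.map((body) => parse(body)))
  );
}

async function timed(run) {
  const start = Date.now();
  const result = await run(jobs);
  return { result, elapsedMs: Date.now() - start };
}

// Run the two options one after the other so they do not overlap.
const comparison = (async () => {
  const a = await timed(perFileChains);
  const b = await timed(stageBarriers);
  console.log("per-file chains:", a.elapsedMs, "ms");
  console.log("stage barriers: ", b.elapsedMs, "ms");
  return { a, b };
})();
```

With these numbers the per-file chains should finish in roughly the longest single chain (about 100 ms), while the barrier version needs roughly the slowest read plus the slowest parse (about 160 ms). Exact timings will vary from run to run.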

So your first approach, which creates no dependencies between jobs, is strictly better.

You can still write the processing stages as separate steps, as in your second solution, but each stage should map across an array of promises instead of waiting for all of them:

    Promise.all(
      filenames
        .map(read)
        .map(body => body.then(parse))
        .map(parsed => parsed.then(convert)))
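Here is a small runnable version of that pipeline, as a sketch with made-up stand-ins: the file names, delays, and the bodies of `read`, `parse`, and `convert` are assumptions for illustration. It records the order in which stages run, to show that the fast file is fully processed while the slow file is still being read.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Order in which stages complete, recorded for inspection.
const events = [];

// Invented read delays per hypothetical file.
const delays = { "slow.txt": 60, "fast.txt": 10 };

async function read(filename) {
  await sleep(delays[filename]); // simulated I/O
  events.push(`read ${filename}`);
  return { filename, text: "some words here" };
}

function parse({ filename, text }) {
  events.push(`parse ${filename}`);
  return { filename, words: text.split(" ") };
}

function convert({ filename, words }) {
  events.push(`convert ${filename}`);
  return { filename, wordCount: words.length };
}

function processAll(filenames) {
  return Promise.all(
    filenames
      .map((filename) => read(filename))
      .map((body) => body.then(parse))
      .map((parsed) => parsed.then(convert))
  );
}

const pipeline = processAll(["slow.txt", "fast.txt"]);

pipeline.then((results) => {
  console.log(events); // all three "fast.txt" events come before "read slow.txt"
  console.log(results); // results keep the input order: slow.txt, then fast.txt
});
```

One caveat if you write `.map(read)` directly: `Array.prototype.map` also passes the index and the whole array to the callback, so that form is only safe when `read` ignores extra arguments. Wrapping it as `(filename) => read(filename)`, as above, avoids the issue.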
