I'm designing a system for a client where the requirements are:
- they upload a JSON file (one object per line; sample below)
- make a call to an API with the JSON object as the payload
- record the state (success/failure) of each API call in a database
- make one retry if there's a failure.
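For reference, each line of the uploaded file is a standalone JSON object, something like this (field names made up, just to show the format):

```
{"customer_id": 17, "amount": 12.50}
{"customer_id": 18, "amount": 7.25}
```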
I decided to build it out using celery and a sqlite database as the backend. The number of JSON lines is not large — perhaps a couple million at most — which will fit in memory. I have all the individual components working fine (can upload file, can read file, can call API, can write to db, etc.), but I am not sure about the overall architecture of dispatching tasks using celery.
Assuming there are N lines in the file, should I:
Option A:
- Create N objects in the database with a `result` column (initially null).
- Create N celery tasks, passing each one an object id and its payload as parameters.
- Make each task call the API and update the object's `result` field to success/failure.
- Let celery's retry feature attempt the API call again in case of failure (rough sketch below).
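To make option A concrete, here is roughly what I have in mind. This is only a sketch: `DB_PATH`, `API_URL`, the `records(id, payload, result)` table, and the helper names are placeholders I made up, and error handling is trimmed.

```python
import json
import sqlite3

import requests
from celery import Celery

app = Celery("ingest", broker="amqp://localhost//")

DB_PATH = "records.db"               # placeholder
API_URL = "https://example.com/api"  # placeholder for the real endpoint


def update_result(record_id, result):
    # each task opens its own connection; assumes a records(id, payload, result) table exists
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("UPDATE records SET result = ? WHERE id = ?", (result, record_id))


@app.task(bind=True, max_retries=1)
def process_line(self, record_id, payload):
    try:
        resp = requests.post(API_URL, json=payload, timeout=30)
        resp.raise_for_status()
    except Exception as exc:
        update_result(record_id, "failure")
        # on the first failure, let celery re-queue this task exactly once
        if self.request.retries < self.max_retries:
            raise self.retry(exc=exc, countdown=10)
        return
    update_result(record_id, "success")


def dispatch(path):
    # one row and one task per line of the uploaded file
    conn = sqlite3.connect(DB_PATH)
    with open(path) as fh:
        for line in fh:
            payload = json.loads(line)
            cur = conn.execute(
                "INSERT INTO records (payload, result) VALUES (?, NULL)", (line,)
            )
            conn.commit()  # row must exist before a worker can pick up the task
            process_line.delay(cur.lastrowid, payload)
```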
Option B:
- Create N objects in the database with a `result` column (initially null).
- Create 1 celery task and pass it the entire list of N object ids and N payloads.
- Loop through all N objects, calling the API and updating the database with the result at each step.
- When that task finishes, have it fire a one-time follow-up celery task that reads the database for all objects with a failure result and retries them (rough sketch below).
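Option B would look more or less like this, reusing `app`, `API_URL`, `DB_PATH`, and `update_result` from the sketch above (again, placeholder names, not real code):

```python
@app.task
def process_all(items):
    # items is the full list of (record_id, payload) pairs for the whole file
    for record_id, payload in items:
        update_result(record_id, "success" if call_api(payload) else "failure")
    # once the first pass is done, fire the one-time retry pass
    retry_failures.delay()


@app.task
def retry_failures():
    # re-read the failures from the database and try each of them once more
    with sqlite3.connect(DB_PATH) as conn:
        rows = conn.execute(
            "SELECT id, payload FROM records WHERE result = 'failure'"
        ).fetchall()
    for record_id, raw in rows:
        if call_api(json.loads(raw)):
            update_result(record_id, "success")


def call_api(payload):
    try:
        resp = requests.post(API_URL, json=payload, timeout=30)
        resp.raise_for_status()
        return True
    except Exception:
        return False
```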
I'm favoring option A because of its simplicity, but I don't know what the limits are on the number of celery tasks that can be scheduled, or whether the broker (RabbitMQ) can handle that many. With option B, the big risk is that if the celery task gets terminated for any reason at some line M, then none of the following objects will ever be attempted.
Any thoughts on these two or if there's a third better alternative?