How to implement time triggered batch processing in a K8s/Java Spring Boot environment?

Question

Environment: Kubernetes Cluster with Spring Boot microservices. One microservice contains a database with a date in one column of a read-only table.

Problem/Requirement: When the date is reached, an event shall be triggered:

some business logic is done
containing a call to a third party system
and the processing is logged in another database table (success or failure)

I see several ways for implementing this:

a Kubernetes CronJob which accesses the database, checks the date and executes the logic.
a Spring Boot cron job which ...
a Spring Boot service with a Quartz job
a cron job which first writes all found entries into a queue and following cron jobs which process the queue?
Spring Batch ?
???

The problems I see:

Performance: There can be up to 100,000 entries in the database which are to be processed on the same day (even better, within one hour). Each entry is handled in 1-2 seconds (100,000 × 2 s ≈ 55 h if done sequentially), so this requires parallel execution.
Several long running jobs which query the same data from the table (select * where date = today) will certainly conflict with each other.
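The performance estimate above can be turned into a rough sizing calculation. This is a minimal sketch using only the numbers from the question (100,000 entries, a worst case of 2 s each); the class and method names are made up for illustration, and it ignores startup time, retries and third-party rate limits.

```java
import java.time.Duration;

// Back-of-the-envelope sizing with the numbers from the question.
// WorkerEstimate and requiredWorkers are hypothetical names.
final class WorkerEstimate {

    // Minimum number of parallel workers needed so that `entries` items,
    // each taking `perEntry`, finish within `window` (ceiling division).
    static long requiredWorkers(long entries, Duration perEntry, Duration window) {
        long totalMillis = Math.multiplyExact(entries, perEntry.toMillis());
        long windowMillis = window.toMillis();
        return (totalMillis + windowMillis - 1) / windowMillis;
    }

    public static void main(String[] args) {
        long entries = 100_000;
        Duration perEntry = Duration.ofSeconds(2); // worst case from the question

        double sequentialHours = entries * perEntry.getSeconds() / 3600.0;
        System.out.println("Sequential run time in hours: " + sequentialHours);
        System.out.println("Workers needed for a 1 h window: "
                + requiredWorkers(entries, perEntry, Duration.ofHours(1)));
        System.out.println("Workers needed for a 24 h window: "
                + requiredWorkers(entries, perEntry, Duration.ofHours(24)));
    }
}
```

With these inputs the one-hour target needs at least 56 entries in flight at any time, whether as threads, pods or a mix of both.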

What architecture would you suggest to solve this problem? What architectures have proved successful for this kind of problem?

null · Accepted Answer · 2020-08-27 16:20:30Z

The architecture all pivots on the extraction of the data.

Extracting and splitting the data in one micro-service that then spawns worker threads will only scale vertically. Not the best if you have a cluster.

So, the question is: how do you have multiple micro-services that each know which subset of the data to extract and process? If the date comes with time information, then maybe each micro-service only works on data within a certain time period. Or maybe each service only operates on a certain percentage of the records: 10 micro-services each process 10% of the data.
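One way to sketch the "each service takes 10%" idea is a modulo split on a numeric record id, so that no two workers ever select the same row and nothing has to be written to the read-only table. The numeric id, the class name and the SQL in the comment are assumptions for illustration, not something stated in the question. In Kubernetes the worker index could come from, for example, the `JOB_COMPLETION_INDEX` environment variable of an Indexed Job or from a StatefulSet pod ordinal; that wiring is not shown.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Static partitioning: worker i of n owns every record whose id mod n equals i.
// ModuloPartitioner is a hypothetical name; a numeric primary key is assumed.
final class ModuloPartitioner {

    // True if the record belongs to the given worker's slice.
    static boolean ownedBy(long recordId, int workerIndex, int workerCount) {
        return Math.floorMod(recordId, (long) workerCount) == workerIndex;
    }

    // In-memory equivalent of a query such as (hypothetical table and columns):
    //   SELECT * FROM items WHERE due_date = ? AND MOD(id, ?) = ?
    static List<Long> slice(List<Long> ids, int workerIndex, int workerCount) {
        return ids.stream()
                .filter(id -> ownedBy(id, workerIndex, workerCount))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> dueToday = LongStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());
        int workerCount = 10;
        for (int worker = 0; worker < workerCount; worker++) {
            System.out.println("worker " + worker + " -> " + slice(dueToday, worker, workerCount));
        }
    }
}
```

The split is only as even as the distribution of ids, and a slice whose worker crashes stays unprocessed until that worker is restarted.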

But, how do the micro-services know which data to work on?

Maybe that can be done with configuration. Or maybe there is a micro-service whose job it is to delegate the records out to other micro-services that do the actual processing. That should be the micro-service that already handles the data. So, as the batch-processing micro-services spin up, they ask the service you have for the next 1000 records and process them. Once all records are processed, a return value of 0 records tells the micro-service to stop, and potentially die.
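The pull model described above can be sketched in plain Java, with threads standing in for the batch-processing micro-services and an in-memory list standing in for the data-owning service. All class and method names are hypothetical, and in a real deployment `nextBatch` would be a remote call (REST or similar) rather than a local method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Stand-in for the service that owns the data: hands out disjoint batches.
final class BatchDispenser {
    private final List<Long> dueIds;
    private int cursor = 0;

    BatchDispenser(List<Long> dueIds) {
        this.dueIds = List.copyOf(dueIds);
    }

    // Returns up to `size` ids not handed out before; empty means "all done".
    synchronized List<Long> nextBatch(int size) {
        int end = Math.min(cursor + size, dueIds.size());
        List<Long> batch = dueIds.subList(cursor, end);
        cursor = end;
        return batch;
    }
}

final class PullWorkerDemo {

    // Worker loop: keep asking for work until an empty batch comes back.
    static int drain(BatchDispenser dispenser, int batchSize) {
        int processed = 0;
        while (true) {
            List<Long> batch = dispenser.nextBatch(batchSize);
            if (batch.isEmpty()) {
                return processed; // 0 records returned: stop (and potentially die)
            }
            for (long id : batch) {
                // business logic, third-party call and result logging would go here
                processed++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> ids = LongStream.rangeClosed(1, 2_500).boxed().collect(Collectors.toList());
        BatchDispenser dispenser = new BatchDispenser(ids);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            results.add(pool.submit(() -> drain(dispenser, 1_000)));
        }
        int total = 0;
        for (Future<Integer> result : results) {
            total += result.get();
        }
        pool.shutdown();
        System.out.println("Processed " + total + " records");
    }
}
```

This sketch has no lease or acknowledgement: a batch handed to a worker that then crashes is never handed out again. The success/failure log table mentioned in the question could be used to detect and re-dispatch such entries.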

The question now is why are these records only processed after the end of the day, why is this a batch? Can the records be processed throughout the day?

If so, then this isn't a batch; it's another micro-service, or one or two, that polls for data to process.

Thanks for your ideas. To answer your question, all items "shall" be processed at 0:00, because the day before is too early, and the information must be available on the given day :-/ That's why I need to process all data around midnight. I think I will evaluate two approaches: a) a fast-running service which puts all relevant items into a queue, from which several batch-processing services can fetch them, and b) a service which always provides the next n items when a batch-processing service asks for them. - Datz, commented Aug 28, 2020 at 6:26
