Environment: Kubernetes Cluster with Spring Boot microservices. One microservice contains a database with a date in one column of a read-only table.

Problem/Requirement: When the date is reached, an event shall be triggered:

  • some business logic is done
  • containing a call to a third party system
  • and the processing is logged in another database table (success or failure)

I see several ways for implementing this:

  • a Kubernetes cron job which accesses the database, checks the date and executes the logic.
  • a Spring Boot cron job which ...
  • a Spring Boot service with a Quartz job
  • a cron job which first writes all found entries into a queue, and subsequent cron jobs which process the queue?
  • Spring Batch ?
  • ???

The problems I see:

  • Performance: There can be up to 100,000 entries in the database that must be processed on the same day (or, even better, within one hour). Each entry is handled in 1-2 seconds (100,000 * 2 s = 200,000 s, roughly 55.6 h if run sequentially), so this requires parallel execution.
  • Several long-running jobs which query the same data from the table (select * where date = today) will certainly conflict with each other.
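
To make the performance figure concrete, the sketch below estimates how many entries would have to be in flight at once to finish within a given window. It assumes the figures stated above (100,000 entries, 1-2 s each, a one-hour target) and ignores startup and database overhead; the class and method names are made up for illustration.

```java
/** Rough sizing sketch: how many parallel workers are needed to finish in time. */
final class WorkerEstimate {

    /** Minimum number of parallel workers so that all entries finish within the window. */
    static long workersNeeded(long entries, double secondsPerEntry, long windowSeconds) {
        double totalSeconds = entries * secondsPerEntry; // sequential processing time
        return (long) Math.ceil(totalSeconds / windowSeconds);
    }

    public static void main(String[] args) {
        // Worst case from the question: 2 s per entry, one-hour window.
        System.out.println(workersNeeded(100_000, 2.0, 3_600));
        // Best case from the question: 1 s per entry, one-hour window.
        System.out.println(workersNeeded(100_000, 1.0, 3_600));
    }
}
```

With these assumptions, somewhere between roughly 28 and 56 entries must be processed concurrently, whether as threads, pods, or a mix of both.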

What architecture would you suggest to solve this problem? What architectures have proved successful for this kind of problem?

    1 Answer

    The whole architecture hinges on how the data is extracted.

    Extracting and splitting the data in one micro-service that then spawns worker threads scales only vertically, which is not ideal if you have a cluster.

    So the question is how to have multiple micro-services that each know which subset of the data to extract and process. If the date column also carries time information, a micro-service could work only on data within a certain time period. Alternatively, each service could operate on a fixed share of the records, for example 10 micro-services that each process 10% of the data.
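
    One way to give each service a fixed share is to partition by record id modulo the number of workers. The sketch below is only an illustration: it assumes each record has a numeric id and that every worker knows its own index and the total worker count; `ModuloPartition`, `ownedBy` and `select` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: static partitioning of due records across a fixed number of workers. */
final class ModuloPartition {

    /** True if the worker with the given 0-based index is responsible for the record. */
    static boolean ownedBy(long recordId, int workerIndex, int workerCount) {
        // floorMod keeps the result in [0, workerCount) even for negative ids.
        return Math.floorMod(recordId, workerCount) == workerIndex;
    }

    /** Filters the ids this worker should process; a real service would filter in SQL instead. */
    static List<Long> select(List<Long> dueIds, int workerIndex, int workerCount) {
        return dueIds.stream()
                .filter(id -> ownedBy(id, workerIndex, workerCount))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> dueIds = List.of(1L, 2L, 3L, 10L, 11L, 20L, 25L);
        for (int worker = 0; worker < 3; worker++) {
            System.out.println("worker " + worker + " -> " + select(dueIds, worker, 3));
        }
    }
}
```

    Every record belongs to exactly one worker, so the workers never compete for the same rows. The price is that the worker count is fixed up front, and the share of a crashed worker is not picked up by anyone else.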

    But, how do the micro-services know which data to work on?

    Maybe that can be done with configuration. Or maybe there is a micro-service whose job is to delegate the records to the other micro-services that do the actual processing. That should be the micro-service that already owns the data. As the batch-processing micro-services spin up, they ask that service for the next 1000 records and process them. Once all records are processed, a return value of 0 records tells the micro-service to stop, and potentially die.
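
    The "ask for the next 1000 records until you get 0 back" protocol can be sketched in memory as follows. This is a simplified illustration, not a full design: the pending ids live in a thread-safe queue instead of a database, the workers would be separate services calling over the network rather than a local loop, and `BatchDispenser`, `nextBatch` and `drain` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Sketch: a delegating service that hands out batches of due record ids. */
final class BatchDispenser {
    private final Queue<Long> pending;

    BatchDispenser(Collection<Long> dueIds) {
        this.pending = new ConcurrentLinkedQueue<>(dueIds);
    }

    /** Hands out up to maxSize ids; an empty list means nothing is left. */
    List<Long> nextBatch(int maxSize) {
        List<Long> batch = new ArrayList<>();
        Long id;
        // poll() removes one id atomically, so two callers never receive the same id.
        while (batch.size() < maxSize && (id = pending.poll()) != null) {
            batch.add(id);
        }
        return batch;
    }

    /** Worker loop: keep asking for batches until an empty one comes back. */
    static int drain(BatchDispenser dispenser, int batchSize, Consumer<Long> processor) {
        int processed = 0;
        List<Long> batch;
        while (!(batch = dispenser.nextBatch(batchSize)).isEmpty()) {
            batch.forEach(processor);
            processed += batch.size();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Long> dueIds = new ArrayList<>();
        for (long id = 1; id <= 2_500; id++) dueIds.add(id);
        BatchDispenser dispenser = new BatchDispenser(dueIds);
        int processed = drain(dispenser, 1_000, id -> { /* business logic + third-party call */ });
        System.out.println("processed " + processed + " records");
    }
}
```

    Because the workers pull work instead of being assigned a fixed share, more of them can be started when the backlog is large. A real implementation would also have to decide what happens to a batch whose worker dies before finishing it.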

    The question now is why these records are processed only at the end of the day. Why is this a batch? Can the records be processed throughout the day?

    If so, then this isn't a batch; it's another micro-service, or one or two of them, that polls for data to process.

    • Thanks for your ideas. To answer your question: all items "shall" be processed at 0:00, because the day before is too early and the information must be available on the given day :-/ That's why I need to process all data around midnight. I think I will evaluate two approaches: a) a fast-running service that puts all relevant items into a queue, from which several batch-processing services can fetch them, and b) a service that always provides the next n items when a batch-processing service asks for them.
      – Datz
      Commented Aug 28, 2020 at 6:26