
We have a design challenge here for a project we are working on and I wonder if folks from the community can provide some guidance:

Our product is built on a microservices architecture, so different services are responsible for different aspects of the solution. Each service has its own database; both databases involved here run on Aurora PostgreSQL.

There is one service called users which records all users, teams and the relationships between users and teams.

There is also another service which handles gamification. In this second service we have the concept of "games", and those games can be associated with different teams by their team_ids. In the gamification database we also have a points table, which associates a user with points they may have won for performing some action within the system.

So far so good. But here is where it gets tricky:

Imagine we are trying to list a leaderboard with all users from a given game and their current points, and we'd like to also include users with zero points in this list.

If all tables were together in the same database, this would be easy. We would just inner join teams and users, then left join with points, and we would have what we need: a list of users that belong to those teams, together with the sum of their points.
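For reference, the single-database version might look like the following Python sketch, which uses an in-memory SQLite database as a stand-in for Aurora PostgreSQL. The schema (users, team_members, game_teams, points) is hypothetical. The CTE de-duplicates members first, so a user who belongs to two of the game's teams is not counted twice, and the left join keeps users with zero points:

```python
import sqlite3

# Hypothetical schema, all in one database (the situation the question wishes it had).
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE team_members (team_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
CREATE TABLE game_teams (game_id INTEGER NOT NULL, team_id INTEGER NOT NULL);
CREATE TABLE points (user_id INTEGER NOT NULL, game_id INTEGER NOT NULL,
                     amount INTEGER NOT NULL);
"""

# Join the game's teams to their users, then left join points so zero-point users survive.
LEADERBOARD_SQL = """
WITH members AS (
    SELECT DISTINCT tm.user_id
    FROM game_teams gt
    JOIN team_members tm ON tm.team_id = gt.team_id
    WHERE gt.game_id = :game_id
)
SELECT u.id, u.name, COALESCE(SUM(p.amount), 0) AS total
FROM members m
JOIN users u ON u.id = m.user_id
LEFT JOIN points p ON p.user_id = u.id AND p.game_id = :game_id
GROUP BY u.id, u.name
ORDER BY total DESC, u.id
"""


def leaderboard(conn: sqlite3.Connection, game_id: int) -> list[tuple[int, str, int]]:
    return conn.execute(LEADERBOARD_SQL, {"game_id": game_id}).fetchall()


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "Ana"), (2, "Ben"), (3, "Cy"), (4, "Dee")])
    # Ana is on both of game 1's teams; Dee's team (30) is not part of game 1.
    conn.executemany("INSERT INTO team_members VALUES (?, ?)",
                     [(10, 1), (10, 2), (20, 3), (20, 1), (30, 4)])
    conn.executemany("INSERT INTO game_teams VALUES (?, ?)", [(1, 10), (1, 20)])
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)",
                     [(1, 1, 5), (1, 1, 3), (3, 1, 2), (4, 1, 7)])
    return conn


if __name__ == "__main__":
    print(leaderboard(demo_connection(), 1))
    # [(1, 'Ana', 8), (3, 'Cy', 2), (2, 'Ben', 0)]
```

Ben appears with 0 thanks to the left join, and Dee is excluded because her team is not in the game.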

Since this data is not in the same database, it is a little more complicated. Our team has come up with a few different ideas for solving this:

  1. Use an events architecture: every time a user joins or gets removed from a team, we would fire an event and the gamification microservice would use those events to keep its own copy of the user/team relationships.

    Pros: seems like a clean way to handle this within microservices infrastructure
    Cons: we'll duplicate data, which could lead to sync issues if we fail to process events

  2. Make an API call to users service every time we are building the leaderboard, grab all users from those teams and join that data with the points data in application runtime.

    Pros: keep everything where it belongs
    Cons: could be resource-expensive to run constantly

  3. Use something like federated queries to query one database from the other

    Pros: keep everything where it belongs
    Cons: will only work for PostgreSQL; if we have other services on other databases, this solution wouldn't work
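For option 2, the "join" becomes a few lines of application code once both data sets are in hand. Here is a minimal Python sketch, assuming the users service can return the member ids of the game's teams and the gamification database can return (user_id, amount) point rows; the function and type names are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LeaderboardRow:
    user_id: int
    total: int


def build_leaderboard(
    member_ids: Iterable[int],              # e.g. from a hypothetical users-service endpoint
    point_rows: Iterable[tuple[int, int]],  # (user_id, amount) rows from the gamification DB
) -> list[LeaderboardRow]:
    """Left join members with points in application code; members without points get 0."""
    totals: dict[int, int] = {user_id: 0 for user_id in member_ids}  # also de-duplicates
    for user_id, amount in point_rows:
        if user_id in totals:  # ignore points of users who are not on the game's teams
            totals[user_id] += amount
    rows = [LeaderboardRow(user_id, total) for user_id, total in totals.items()]
    rows.sort(key=lambda row: (-row.total, row.user_id))
    return rows


if __name__ == "__main__":
    # User 1 appears twice because they are on two of the game's teams.
    print(build_leaderboard([1, 2, 3, 1], [(1, 5), (1, 3), (3, 2), (4, 7)]))
```

Each build costs one users-service call plus one points query, and both grow with team size, which is where the "resource-expensive" concern comes from; caching the compiled result helps.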

In your experience, what are the right ways to handle this challenge? Or which angles might we not be exploring correctly here?

Thanks very much

  • Seems like you've done the right thing, which is to evaluate the pros and cons of each approach. The "correct" way is the one that most effectively meets your project's functional and non-functional requirements.
    Commented May 6, 2022 at 18:48
  • It sounds like your intended solution is either not scalable, or you are not expecting "too many" users if you are considering fetching "all users" for the leaderboard. I assume no one is looking at the entire leaderboard all the time, and it does not need to be kept up to date in real time? One solution may be to go with option 2 but design the service communication to fetch users in batches on demand, while maintaining a local cache and limiting cache updates to some reasonable rate.
    – J.R.
    Commented May 6, 2022 at 21:21

2 Answers


Because there are many ways of looking at this and as many ways of solving your challenge, and because anyone would need answers to myriad questions before making any sort of truly informed decision, here are some personal, off-the-top-of-my-head thoughts:

  • Data duplication in a microservices architecture is unavoidable and even desirable, depending on your situation. In terms of performance, it might be preferable to have data come from a single service rather than having to query multiple APIs. Nothing's stopping you from querying several APIs, of course; it just forces you to deal with possible inconsistency on the consuming side, without a clear single source of truth. Don't worry too much about duplication; storage is cheap. Whether you keep an id in sync or an entire record makes little to no difference, but I would certainly not deal with inconsistencies at application runtime. It's simply not an application's responsibility.

  • One of the standard ways of providing (eventual) consistency across microservice databases is some sort of enterprise service bus (ESB). Each service can publish data onto the bus, and interested parties can subscribe to channels on the ESB. I'd avoid point-to-point communication; it doesn't scale well because it's too tightly coupled. Keep your services oblivious of one another. In that regard, I'd personally refrain from federated queries at the PSQL level. Mind you, setting up a good service bus is no mean feat. Synchronization is never trivial, however you decide to implement it, but it does get easier once you've done the groundwork.

  • You could keep your individual microservices communicating amongst themselves over the ESB in the background, and offer a search provider like Elasticsearch that does the aggregation and indexing and serves as a data storefront. That alleviates many of your microservice data duplication concerns and keeps the services "cleaner". It also integrates with most garden-variety databases.

  • If you need a leaderboard and you worry about querying multiple APIs, then why not simply make a leaderboard microservice?
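To make the bus and leaderboard-service ideas concrete, here is a toy Python sketch. InMemoryBus is only an in-process stand-in for a real broker, and the topic names (team.member_added, team.member_removed, points.awarded) and event payloads are invented for illustration. The hypothetical leaderboard service subscribes to events from both the users and gamification services, keeps its own read model, and skips redelivered events by id:

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Event = dict[str, Any]
Handler = Callable[[Event], None]


class InMemoryBus:
    """Toy stand-in for a real message bus: topic name -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # The publisher does not know who, if anyone, is listening.
        for handler in self._subscribers[topic]:
            handler(event)


class LeaderboardProjection:
    """Hypothetical leaderboard service that builds its own read model from events."""

    def __init__(self, bus: InMemoryBus) -> None:
        self.members: dict[int, set[int]] = defaultdict(set)        # team_id -> user_ids
        self.points: dict[tuple[int, int], int] = defaultdict(int)  # (game_id, user_id) -> total
        self._seen: set[str] = set()                                # ids of applied events
        bus.subscribe("team.member_added", self._once(self._on_member_added))
        bus.subscribe("team.member_removed", self._once(self._on_member_removed))
        bus.subscribe("points.awarded", self._once(self._on_points_awarded))

    def _once(self, handler: Handler) -> Handler:
        # Buses typically deliver at least once, so skip events we have already applied.
        def wrapper(event: Event) -> None:
            if event["event_id"] in self._seen:
                return
            handler(event)
            self._seen.add(event["event_id"])
        return wrapper

    def _on_member_added(self, e: Event) -> None:
        self.members[e["team_id"]].add(e["user_id"])

    def _on_member_removed(self, e: Event) -> None:
        self.members[e["team_id"]].discard(e["user_id"])

    def _on_points_awarded(self, e: Event) -> None:
        self.points[(e["game_id"], e["user_id"])] += e["amount"]

    def leaderboard(self, game_id: int, team_ids: Iterable[int]) -> list[tuple[int, int]]:
        users = set().union(*(self.members.get(t, set()) for t in team_ids))
        # .get() keeps zero-point users without mutating the read model.
        rows = [(u, self.points.get((game_id, u), 0)) for u in users]
        return sorted(rows, key=lambda r: (-r[1], r[0]))


def demo() -> LeaderboardProjection:
    bus = InMemoryBus()
    projection = LeaderboardProjection(bus)
    bus.publish("team.member_added", {"event_id": "e1", "team_id": 10, "user_id": 1})
    bus.publish("team.member_added", {"event_id": "e2", "team_id": 10, "user_id": 2})
    award = {"event_id": "e3", "game_id": 1, "user_id": 1, "amount": 5}
    bus.publish("points.awarded", award)
    bus.publish("points.awarded", award)  # redelivery: ignored
    bus.publish("team.member_added", {"event_id": "e4", "team_id": 20, "user_id": 3})
    bus.publish("team.member_removed", {"event_id": "e5", "team_id": 20, "user_id": 3})
    return projection


if __name__ == "__main__":
    print(demo().leaderboard(1, [10, 20]))
```

A real setup would also need durable storage for the read model and the seen-ids set, and a way to rebuild from history after a missed event.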

Of course, none of us can judge whether all this is a good candidate for your case. Much will depend on team, time, budget, need for scalability and any number of other questions that you and you alone can answer. These are all just personal musings.


    There isn't a problem here.

    • Call the game service to get the scores and userIds for the LeaderBoard
    • Loop through the userIds and call the user service to get the username

    Optimising:

    • Paginate the LeaderBoard
    • Add a GetUsersByListOfUserIds method to the user service
    • Cache the compiled LeaderBoard data
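The optimised version might look roughly like this Python sketch. fetch_scores and get_users_by_ids are hypothetical client callables (the latter playing the role of the GetUsersByListOfUserIds method suggested above), and the sketch assumes the game service's score list already includes participants with zero points. Each page costs at most one score query and one batched user lookup, and compiled pages are cached for ttl_seconds:

```python
import time
from typing import Callable, Sequence

ScoreRow = tuple[int, int]             # (user_id, score) from the game service
LeaderboardRow = tuple[int, str, int]  # (user_id, username, score)


class LeaderboardPager:
    """Compiles leaderboard pages from two services and caches each page for a while."""

    def __init__(
        self,
        fetch_scores: Callable[..., Sequence[ScoreRow]],
        get_users_by_ids: Callable[[list[int]], dict[int, str]],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_scores = fetch_scores          # hypothetical game-service client
        self._get_users_by_ids = get_users_by_ids  # hypothetical batched user-service call
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[tuple[int, int, int], tuple[float, list[LeaderboardRow]]] = {}

    def page(self, game_id: int, page: int, page_size: int) -> list[LeaderboardRow]:
        key = (game_id, page, page_size)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: no remote calls at all
        scores = self._fetch_scores(game_id, offset=page * page_size, limit=page_size)
        # One batched call per page instead of one call per user.
        names = self._get_users_by_ids([user_id for user_id, _ in scores])
        rows = [(uid, names.get(uid, "<unknown>"), score) for uid, score in scores]
        self._cache[key] = (now, rows)
        return rows
```

The TTL trades freshness for fewer calls; a leaderboard that is a few seconds stale is usually acceptable.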

    You might worry that this is a slow solution, but even if you are writing a monolith with all the data in one database and a single object graph, you are still going to have outside calls for data enrichment. This is no different.

    Also, I would challenge the idea that this is much slower than a single query. You have the same volume of data, perhaps even less than with a monolith, and one extra call per page, which you can cache in any case.
