I have to create a reporting system based on hundreds of different tables in a database (which means a huge volume) and I'd like to know the best practices and/or your wild ideas about it.

Here are the details:

  • I have to generate many different reports (around 50 report types are defined at the moment, but this number will grow), such as engine parameter changes, prices, profit, etc. over a preselected time period, for different brands.
  • Each brand has a different number of tables in the database, with different field names for the same thing. For example: BMW has a table Motor with a field named 'cc' (for engine displacement), but Ford stores the same thing in a field named 'cu' in a table named Engine. On the other hand, prices and sales are stored in one table in the case of Toyota but in five (foreign-key-linked) tables in the case of Mazda.

The easiest solution would be to implement each report individually for each brand, but this would be time-consuming and highly repetitive, so I'd like to avoid this approach.

I am thinking of a 'pipeline' style architecture (see attached image), where I just define some rules for each segment and the same algorithm calculates the result regardless of the input parameters. This way, if I have to display a new report for Rolls Royce, I just define some rules and I get the results. As for the rules, I was thinking about table definitions per brand, field mappings, etc.

[Image: diagram of the proposed pipeline architecture]

My problem with this solution is that I have to implement some kind of interpreter for each step to analyse and execute the rule sets, which is risky and hard to maintain, and I'm not sure it is scalable at all. Are there any other approaches or best practices to avoid repetitive work?

  • "based on hundreds of different tables in a database" Why does the number of tables in the database matter? Software architecture is not about how many parts there are, but about what the parts of the system are.
    – jimjim
    Commented Aug 9, 2021 at 10:22
  • Welcome to SE! Questions asking for ideas are too broad and generally opinion-based. I have therefore slightly reworded your question. It would be good, however, if you could narrow it down further and clarify why you worry that your pipeline approach might not work.
    Commented Aug 9, 2021 at 11:12
  • @jimjim: I guess the OP did not mean just "hundreds of different tables", but "a huge number of different reports". For only one report, implementing it manually can be acceptable. For many reports, providing a report generator can make sense. Not looking at the numbers can easily lead to overengineering, hence I tend to disagree with what you wrote: software architecture is also about picking suitable parts for the requirements within a system.
    – Doc Brown
    Commented Aug 9, 2021 at 11:27
  • What about using an off-the-shelf reporting engine? For all major relational DB vendors, there are several products available.
    – Doc Brown
    Commented Aug 9, 2021 at 11:31
  • @Doc Brown: thanks for helping me out, you're right. I just wanted to emphasise the volume. The exact number is not what matters, but implementing each report manually takes a lot of time, and it is not a solution you can learn from and improve on.
    Commented Aug 9, 2021 at 11:36

1 Answer

Based on the presented information, a simple interface will cover your reusability needs.

Note: One could argue about using a base class (abstract or not) instead of an interface. I suggest erring on the side of interfaces in languages which don't support multiple inheritance or when the needed contract does not require a base implementation.

"For example: BMW has a table Motor with a field named 'cc' (for engine displacement), but Ford stores the same thing in a field named 'cu' in a table named Engine."

You've established here that while the data is stored differently, you expect to handle the same data (once retrieved). The retrieved data is not specific to a given manufacturer and you can therefore create a single data entity to represent it:

    public class Car
    {
        public string Make { get; set; }
        public string Model { get; set; }
        public int CC { get; set; }
    }

This is just a basic example.

However, the fetching of the data is different per manufacturer, so your repositories will be distinctly different. Nonetheless, they share the same reusable expectation: retrieving cars. This is why we define the interface:

    public interface ICarRepository
    {
        IEnumerable<Car> GetAll();
    }

You can of course expand this interface, e.g. GetByEngineCC(int minCC). This depends on what you need.

This can then be implemented by your manufacturer-specific repositories, which individually have different private logic to fetch the needed data.

    public class FordCarRepository : ICarRepository
    {
        public IEnumerable<Car> GetAll()
        {
            // fetch from the Ford tables
        }
    }

    public class BmwCarRepository : ICarRepository
    {
        public IEnumerable<Car> GetAll()
        {
            // fetch from the BMW tables
        }
    }
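
To make the "different private logic" concrete, here is a minimal sketch of how the two repositories might map differently named fields onto the same Car entity. It assumes the rows have already been read into dictionaries keyed by column name, that a "model" column exists (a hypothetical name, not mentioned in the question), and that 'cc' and 'cu' hold the same unit, as the question states. A real implementation would query the database instead.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public string Make { get; set; }
    public string Model { get; set; }
    public int CC { get; set; }
}

public interface ICarRepository
{
    IEnumerable<Car> GetAll();
}

// BMW: displacement lives in the 'cc' field of the Motor table.
// Rows are simulated as dictionaries; "model" is an assumed column name.
public class BmwCarRepository : ICarRepository
{
    private readonly IEnumerable<IDictionary<string, object>> _motorRows;

    public BmwCarRepository(IEnumerable<IDictionary<string, object>> motorRows)
    {
        _motorRows = motorRows;
    }

    public IEnumerable<Car> GetAll() =>
        _motorRows.Select(r => new Car
        {
            Make = "BMW",
            Model = (string)r["model"],
            CC = (int)r["cc"]
        });
}

// Ford: the same value lives in the 'cu' field of the Engine table.
public class FordCarRepository : ICarRepository
{
    private readonly IEnumerable<IDictionary<string, object>> _engineRows;

    public FordCarRepository(IEnumerable<IDictionary<string, object>> engineRows)
    {
        _engineRows = engineRows;
    }

    public IEnumerable<Car> GetAll() =>
        _engineRows.Select(r => new Car
        {
            Make = "Ford",
            Model = (string)r["model"],
            CC = (int)r["cu"]
        });
}
```

The field-name difference is confined to one line in each repository; everything downstream only ever sees Car.CC.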

This ensures that you can write report generator logic that works for all possible manufacturers. For example:

    public void PrintReport(ICarRepository repo)
    {
        var cars = repo.GetAll();

        foreach (var car in cars)
        {
            Console.WriteLine($"{car.Make} - {car.Model} - {car.CC}cc");
        }
    }

Notice how this reporting logic works for all manufacturers, provided that these manufacturers' repositories implement the needed interface.

This is the core principle of how to handle internally different implementations (i.e. unique data fetching) with externally common handling (i.e. generalized reports). You can now add new manufacturers to the mix by creating additional concrete repositories, and no other code needs to change, which is the desired goal.

"On the other hand the prices and sales are stored in one table in case of Toyota but in five (foreign key linked) tables in case of Mazda."

Same story here:

  1. Assuming the sales data is the same (once retrieved), create a reusable SalesFigure DTO
  2. Define an ISalesRepository interface with appropriate methods (e.g. GetSalesForYear(int year))
  3. Create manufacturer-specific sales repositories which implement ISalesRepository
  4. Write your common logic so that it works with ISalesRepository objects, not manufacturer-specific repository types.
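
The four steps above could be sketched roughly as follows. The SalesFigure fields, the in-memory sample data, and the SalesReport.TotalForYear helper are all assumptions made for illustration; the single list stands in for Toyota's one table, and the two dictionaries stand in for Mazda's linked tables.

```csharp
using System.Collections.Generic;
using System.Linq;

// Step 1: a reusable DTO (the fields are hypothetical).
public class SalesFigure
{
    public string Make { get; set; }
    public int Year { get; set; }
    public decimal Amount { get; set; }
}

// Step 2: the shared contract.
public interface ISalesRepository
{
    IEnumerable<SalesFigure> GetSalesForYear(int year);
}

// Step 3a: Toyota keeps everything in one table (simulated by one list).
public class ToyotaSalesRepository : ISalesRepository
{
    private readonly List<SalesFigure> _rows = new List<SalesFigure>
    {
        new SalesFigure { Make = "Toyota", Year = 2020, Amount = 100m },
        new SalesFigure { Make = "Toyota", Year = 2021, Amount = 150m }
    };

    public IEnumerable<SalesFigure> GetSalesForYear(int year) =>
        _rows.Where(r => r.Year == year);
}

// Step 3b: Mazda spreads the data over linked tables
// (simulated by two dictionaries keyed by a sale id).
public class MazdaSalesRepository : ISalesRepository
{
    private readonly Dictionary<int, int> _saleYears =
        new Dictionary<int, int> { { 1, 2021 }, { 2, 2021 }, { 3, 2020 } };
    private readonly Dictionary<int, decimal> _saleAmounts =
        new Dictionary<int, decimal> { { 1, 40m }, { 2, 60m }, { 3, 10m } };

    public IEnumerable<SalesFigure> GetSalesForYear(int year) =>
        _saleYears
            .Where(kv => kv.Value == year)
            .Select(kv => new SalesFigure
            {
                Make = "Mazda",
                Year = kv.Value,
                Amount = _saleAmounts[kv.Key] // the "join" on sale id
            });
}

// Step 4: common logic depends only on the interface.
public static class SalesReport
{
    public static decimal TotalForYear(ISalesRepository repo, int year) =>
        repo.GetSalesForYear(year).Sum(s => s.Amount);
}
```

SalesReport.TotalForYear never learns how many tables sit behind a repository, so the same call works for both brands.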

"This way if I have to display a new report for Rolls Royce I just define some rules and I get the results."

This is not a "rule". This is a different implementation. Your underlying idea is correct but I suggest refraining from calling it a rule, because people will misunderstand your intention.

Essentially, all you have to do is add a Rolls-Royce-specific repository which implements the same interface (cf. step 3 above).
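
As a rough illustration of that point, the sketch below adds a hypothetical RollsRoyceCarRepository next to an unchanged report routine. The hard-coded row is sample data standing in for a real query against the Rolls-Royce tables, and CarReport.BuildLines is an assumed variant of the report logic that returns its lines instead of printing them.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public string Make { get; set; }
    public string Model { get; set; }
    public int CC { get; set; }
}

public interface ICarRepository
{
    IEnumerable<Car> GetAll();
}

// The only new code needed for the new brand.
// The row below is sample data, not a real database query.
public class RollsRoyceCarRepository : ICarRepository
{
    public IEnumerable<Car> GetAll()
    {
        yield return new Car { Make = "Rolls-Royce", Model = "Phantom", CC = 6749 };
    }
}

// Existing report logic: written once against the interface,
// and not modified when a manufacturer is added.
public static class CarReport
{
    public static List<string> BuildLines(ICarRepository repo) =>
        repo.GetAll()
            .Select(car => $"{car.Make} - {car.Model} - {car.CC}cc")
            .ToList();
}
```

Calling CarReport.BuildLines(new RollsRoyceCarRepository()) produces the report for the new brand without any change to CarReport.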

Note: Because you talked about database tables, this example used different concrete repositories using the same interface. In other cases, it might make more sense to implement this interface on the BLL rather than the DAL. But it's the same principle at play regardless of what layer you implement it on.
