12

I maintain a large legacy app working with SQL database. The app uses raw SQL queries to query the database. Together with app rewrite I plan to introduce ORM to work with the database.

The simplest solution would be to keep the database in current state and build ORM mappings based on the existing database. It seems to be quite common approach, and many tools exist on different platforms to automatically reverse-engineer the database into the ORM (e.g. Django - inspectdb, MikroORM - Entity Generator, Entity Framework Scaffolding).

Another opinion however says that one should always design ORM entities first and create database schema based on them and that the reverse engineering idea is perverted and leads to enormous problems. I don't quite understand this. When the existing database is logically designed and normalized, how would it conceptually differ from ORM entities?

Some ideas that come to my mind:

  • ORM may use its own naming conventions in the db. schema
  • ORM may implement table relations (1:N, M:N) differently
  • ORM may offer some functionality for which the existing schema isn't designed (e.g. entities inheritance)
  • existing schema relies on database functionalities that ORM doesn't support (unusual data types, stored functions, triggers...)
  • existing database tables aren't designed with OOP in mind, i.e. database entities don't correspond to programming language objects and their behaviour (but how? this is still vague for me)
  • design of the existing database may assume that the data will be queried using raw SQL queries rather than much simpler ORM manipulation

I understand that ORM creates db. schema automatically and that the resulting schema is often a "black-box" for the developer. However building ORM on top of existing schema is still possible (the complications mentioned above aren't very common) and it seems to be better solution than break the whole database structure of existing app.

Is preserving the existing database worth even when doing complete app rewrite?

(The question is platform independent, however I am personally interested mainly in Node.js and PHP.)

10
  • 4
    I don't know if it's an "anti-pattern" but my belief, based on my own experiences and discussions with other people who have attempted this, that ORMs work much better when the database is designed to be used with them.CommentedDec 6, 2023 at 20:07
  • 4
    I'll add that I regretted using one of the most popular ORMs with a DB that was not designed for it. It was challenging and I had to disable pretty much everything that using an ORM was supposed to buy me. It just added more problems to solve and complicated troubleshooting.CommentedDec 6, 2023 at 20:13
  • 5
    "Is preserving the existing database worth even when doing complete app rewrite" is nothing strangers from the internet can objectively answer for you. The question you should ask yourself first, however, is why you are planning for a complete app rewrite, regardless of using an ORM or not - and if you are sure you are not underestimating the task. DId you ever do a large rewrite like this before? Did you read Spolsky's famous article why it may be a really bad idea?
    – Doc Brown
    CommentedDec 6, 2023 at 22:02
  • 4
    A bit of practical advice, rather than an answer. If you're trying to use an ORM on top of a pre-existing DB, you're better off using a micro-ORM like Dapper that just does the object mapping than something fancier like Entity FrameworkCommentedDec 7, 2023 at 14:41
  • 8
    The questions you should be asking yourself before you get asked and don't have a good answer is "what business value does introducing an ORM provide and does it outweigh both the initial cost of adoption and the ongoing maintenance costs" and you should have convincing data to back up your answer. N.B. the default answer to "should I introduce X to my tech stack" should always be no: the burden of proof lies with the thing being added to justify it's value.CommentedDec 7, 2023 at 15:51

5 Answers 5

22

Reverse engineering was created for precisely your kind of situation. It was meant to be a time-saving tool when using an ORM with an existing database. The tool can do in a few minutes what a human (or team of humans) can do in hours or days, depending on the number of tables being mapped.

I prefer the database-first approach (reverse-engineering) to get that initial blob of data mapping code for existing databases. I will sometimes attempt retrofitting some model and mapping changes after the initial import for certain entities, but only if it would make the rest of my job easier.

I don't typically prefer continuing to reverse-engineer after the initial import. Existing databases tend to have their own idioms and patterns that don't necessarily fit the conventions that most ORMs come with. You can decide to spend time customizing the mappings or conventions. For example, FluentNHibernate in the .NET ecosystem provides a means to customize table and column names, types, etc. so you don't need to hard code "foo_id" in your primary key mappings when the ORM assumes the primary key is named simply id. As the database evolves, I prefer changing the mappings by hand, but that is just personal preference.

Ultimately the code-first or database-first approaches are just tools. You are free to weigh the benefits and drawbacks of using each one and decide for yourself on a database-by-database basis.


To be honest, I don't like code-first approaches, because the database tends to reflect an OO behavioral model rather than a relational data model. If I use the reverse-engineer tool, I tend to add new mappings by hand as the database structure evolves, but I'm a SQL guy. I speak just as fluently in SQL as I do in C#. Your experience might be different, though.

9
  • 9
    +1 for the OO vs Relational comment. Object Oriented code plus an ORM can generate some truly appalling SQL that makes the application run like a slug.
    – Phill W.
    CommentedDec 7, 2023 at 7:27
  • 1
    @PhillW. that would be the ORM n+1 problem, and yes, it can be horrendous. Some of that can be mitigated with configuration or calling additional ORM methods to perform necessary joins, etc.CommentedDec 7, 2023 at 14:33
  • 2
    @GregBurghardt However I think the n+1 problem is caused by data manipulation done by ORM rather than inefficient database structure created by ORM.CommentedDec 7, 2023 at 14:56
  • 4
    "I don't like code-first approaches, because the database tends to reflect an OO behavioral model rather than a relational data model." Truer words have never been spoken on this issue. I tend to want the database put together by human experts in the craft, taking into account current and future business needs as well as performance needs, all based on the good design principles data architects were taught. And thus, reverse engineering.CommentedDec 8, 2023 at 1:09
  • 1
    Even Entity Framework wants to create an anemic domain model without major configuration changes. You've got to be careful with most ORMs. Creating a true rich domain model with them requires a lot of customization to adapt your relational model to your behavioral model. It is easier to create new design patterns like CQRS, or take the architectural hit and move all business logic into "service" classes than it is to spend the time to customize mappings for complex ORMs. It's all a balance of time spent and developer knowledge. Not ideal, but that's the lay of the land.CommentedDec 11, 2023 at 13:10
6

Both patterns are supported. Code first versus Model first. If you have an opiniated database administrator(s) or strong SQL skills one may choose Model first. If one has less SQL expertise, usually Code First is preferred since there is less interest in the database side of things.

If one already has a database, an analysis on the existing database schema is needed to see if it modeled properly from an entity standpoint as well as current business requirement support. This would be necessary before throwing it away. If it was not modeled properly to begin with, then starting over may be the best course of action.

As @JimmyJames has pointed out, most of these ORMs work best when the entities are defined in code and then the schema is "auto-magically" generated. Depending on how complex your existing model is, one may run into issues with your existing schema.

7
  • 1
    The issue I ran into that really made me feel ORMs are 'opinionated' was that I spent a lot of time dealing with the fact that the ORM I was using would try to commit changes before the necessary dependencies were there. The only way I could get it to stop doing so was to manage commits myself. Not a huge deal but that and some other issues that I can't remember made almost everything else about using the ORM overhead. The worst part was that things seemed good up till the end of the project.CommentedDec 6, 2023 at 22:45
  • @JimmyJames By "committing changes" you mean reflecting changes in the ORM entity into the database schema?CommentedDec 7, 2023 at 12:50
  • @user14967413 Changes to the tables. Come to think of it, it was actually executing inserts prior to inserting the parent records. Random failures, or at least I couldn't figure out why it was 'deciding' to insert when it did. IIRC, it didn't happen every time.CommentedDec 7, 2023 at 14:54
  • @JimmyJames Odds are you did a mapping of the database model, and your object model (the interface you use in programming) looked nearly like the database model. Mapping databases was meant to be an easy way to "get started" but it never understands the data, it understands the tables. That means each table is treated like a top-level object. If you restructure you object model, such that you have fewer db backed objects, when you change one object that spans two or three tables, ORM will update the tables in the correct order. It requires a bit more effort in configuration. It's worth itCommentedDec 9, 2023 at 14:14
  • 1
    Orms typically function as an abstraction and de-coupling of the data layer. For example, I can always work with Datareaders and IDataRecords (C#) directly. But then someone said, you can't intermingle business logic with database logic! So, lets return a list of objects back to the business layer. Mapping all this manually stinks, let's create a way to do this faster! Orms! Then someone said you can't return business objects to the client! So, we have another layer to convert business objects to DTOs. Self-imposed complexity for the sake of "layered" architecture.CommentedDec 12, 2023 at 15:44
4

Neither is an anti pattern, they are just two different mindsets regarding database integration:

The application centric: The database is just a service for persisting application state. The role of the ORM is to make this persistence as transparent as possible for the application. The database schema reflect tha data structures of the application. The database is only accessed through the ORM.

The database centric: The database is the core of the business and multiple applications are built on top of this database. The database tend to live longer than any individual application. Each application may have a different data access strategy, and multiple different ORM's may be in use by different applications.

In the application-centric mindset, you adapt the database to the application. In the database-centric mindset, you adapt the application to the database.

ORM's tend fall into one of the mindsets. The ORM's which have grown out of web application frameworks tend to be application-centric, while the ORM's coming from the enterprise world tend to be database-centric.

5
  • 2
    ORMs like Entity Framework treat entities as a DTO for the database. I don't see people really creating rich domain models from these kinds of ORM entities. Libraries like Hibernate for Java and NHibernate for .NET allow mapping to a rich domain model, providing you can be happy with protected class members rather than private for encapsulation.CommentedDec 6, 2023 at 22:08
  • So, I guess I don't view code-first or database-first approaches as adapting the application to the database (or vice-versa). I see them as different tools for generating mappings, rather than adapting the application architecture.CommentedDec 6, 2023 at 22:10
  • 1
    @GregBurghardt One of the places where it really matters is when your model needs multiple database tables for one "model" object. If you go database-centric, you'll effectively have an object per table; but, if you go model-centric, you'll have the ability to combine queries against multiple tables, joining them in the database layer, and returning a single object. So, with database-centric, you might need to "store a new billing address in an invoice" by storing a billing-address, while in application centric, you update the invoice and store the invoice (which then updates joined tables)CommentedDec 8, 2023 at 22:08
  • @EdwinBuck, that is a valid point, however that requires advanced knowledge to configure the ORM appropriately, which in my experience very few people have. Most people accept the default conventions because they are fast to create. Even so, updates to joined tables result in separate DML statements, which is not the N+1 problem. That's simply how relational databases work (which I'm sure you know). The N+1 problem occurs when fetching data from the DB, and mappings are not configured to do JOINS correctly.CommentedDec 8, 2023 at 23:13
  • 1
    @GregBurghardt It requires more knowledge, but since it's how the ORM is made to be configured, I wouldn't call it advanced knowledge. Many people, even professionals, don't bother to read much of the ORM's documentation. When is bores them, they google search to find examples. Follow the easiest examples and you now have an object model of your database, not your domain. Doc's are there, as they were 20 years ago, when I read them.CommentedDec 9, 2023 at 14:00
4

Who "owns" the database? Is it your app? Or some other business? Do you have a continuity plan to support the old app while you develop new one?

Unless the database is for your app only, and you have easy way to migrate, do not touch the database schema.
Use an ORM, and decouple from database using Repository pattern.
What you return from repository is "your" app objects that should be oblivious to database existence. They may reflect database rows 1:1 or be totally different. It is a job of repository to persist them to database and populate them from database.
Usually, these objects will be saved to database 1:1, but sometimes you may need to do some 'magic' under the hood to make things easier, like gluing data together from 3 tables into one object, then persisting said object into three tables back.
The rest of your application should not be aware of the ORM - it should be contained in repository.

9
  • 1
    @Thomas thank you very much for not giving in and saying the only correct thing!
    – Basilevs
    CommentedDec 7, 2023 at 21:09
  • 1
    @user14967413 It is normal practice to have repository per business entity. So UsersRepository, BooksRepository and so on. If an entity is an aggregate root, say Order contains OrderItems and the latter cannot exist alone, you keep them in single repository. Depending on size of application, you will have to decide whether return "dumb" items, or take advantage of eg EntityFramework change tracking to persist them back - you have to be aware of chosen way and cater to that.
    – Thomas
    CommentedDec 8, 2023 at 9:51
  • 1
    @user14967413 Repository is usually written by hand - it should be a thin layer, basically a place to keep your queries. Linq, sql, whatever you use to interact with a db. Please do not return objects that allow further queries to database be realized, but finalized collections. It is absolutely possible to glue three tables together using an orm ;-). You just make three queries, or leverage the existing connections between entities for a single query, and map everything into single object per row. On persisting, you map back into three separate entities.
    – Thomas
    CommentedDec 8, 2023 at 9:54
  • 1
    @user14967413 Fun fact about repository pattern - it does not have to talk to database. I have app where repository reads files from hard drive. Its small app and I was too lazy to setup a database. I worked on application where repository made a web request to another application to load customer data. Latency was surprisingly good. Repository can read and write from anything, as long as it supports the contract of 'give me thing with this keys'. And the app should be none the wiser what it is reading from.
    – Thomas
    CommentedDec 8, 2023 at 10:06
  • 1
    @user14967413 EntityFramework for example. Automatically - Table-Per-Type will split entities into multiple tables based on inheritance. If your schema is different, you can do it manually by querying multiple tables in repository and returning new instance of your model based on data you got (aka mapping). On persisting to database you do inwerse - map the entity into separate entities and persist those to database. You will need to fetch them again to do so. There might be a way to do the second thing automatically too, but I never handled such scenario personally, so can't confrim.
    – Thomas
    CommentedDec 11, 2023 at 7:45
0

Starting with an implementation is an anti-pattern for software engineering.

Engineer your software to solve the business problem. If you are redesigning, redesign by starting with user stories and so on. Build your business objects from a pure data modelling standpoint. When you're done, choose how to implement that solution.

An ORM may be a very good way to do this (implement, that is), but the time to decide that is after you've determined the project specifications. Selecting the implementation before defining the model is premature optimization.

If you are simply retrofitting an ORM into an existing, working DB-based solution, that's not a rewrite. That's a retrofit and you may want to consider why you'd do that. In real-world (engineering) terms, that will cost a lot of money with relatively little return.

As for your related point about re-use of the existing data model, you absolutely should start with the existing data model in requirements gathering. Unless the business process that you're modelling has changed significantly, it will provide the purest source of truth of the actual object model. It is essentially, the best single source of truth for how a system supports the business.

2
  • 1
    ORM never models a domain even remotely appropriately. Every project where ORM model is visible outside of Repository is a disaster. This is has empirical and rational evidence - business logic should have nothing to do with persistence.
    – Basilevs
    CommentedDec 7, 2023 at 21:02
  • 1
    @Basilevs yes, absolutely. "This," refers to implement, not model. Edited to reflect this.CommentedDec 8, 2023 at 12:25

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.