Database design - how to model database tables for similar objects - booleans / enums / inheritance / archived vs active

Question

I only recently started to work with backends. My database is supposed to store information about maven-like artifacts. There are going to be applications and configuration artifacts. Now those artifacts are pretty much alike - they all have group name, artifact name, versions etc. However many times I want my logic to treat config and regular artifacts differently. How do I store this in the database? In programming I've learnt that passing boolean to a function is a bad idea - just make 2 functions. On the other hand code duplication is bad and I can imagine it is the same when desining a database.

Should I keep all those artifacts in 1 table with a boolean / enum column? Should I make separate tables with duplicated columns? Is there anything inheritance-like in databases? How to model it? How does that affect queries performance? Is there any book / article / course that you would recommend and/or is valued among software developers that explains this and many other databases modeling scenarios?

Another scenario is where I want to keep a list of (e.g.) users but also keep an archive of users that no longer exist. Should I keep those records with active users and just put a different enum/boolean? Should I have a separate table for archived users?

The OO paradigm is primarily focused on organising code based on behaviour so is therefore generally unhelpful for relational database modelling, which is focused on data integrity. Try to avoid thinking in 'OO' terms or attempting to draw analogies between relational DB entities with classes/objects; the concept you need is Normalisation where databases in 3rd normal form frequently don't map neatly across to classes or objects in an OO application. — Ben Cottrell, CommentedJun 15, 2022 at 17:17
Thanks @BenCottrell I still think I cant grasp it. How would you model a database of artifacts where: There are many groups. Each group has many projects (artifactName). Each project has many versions/tags over time. Some artifacts are applications and some are configuration. Configuration artifacts are developede by the same group as regular artifacts, they belong to a given artifact, name - could be the same as artifact or have extra "config" in its name and versions are versioned separately from regular artifact versions). — Marcin K., CommentedJun 15, 2022 at 17:52
You're not explaining it very clearly, but if you're saying that configuration artifacts are associated to a given regular artifact, and thus with the data attributes of that artifact, maybe your configuration artifacts just need a foreign key (FK) to a row in the regular artifacts table. You don't actually need to copy the columns, just do a join. What would you say, how familiar (or unfamiliar) are you with database concepts? Do you understand how to relate tables via FKs, or what happens when you do a join? Have you heard of normal forms, etc? — Filip Milovanović, CommentedJun 15, 2022 at 19:21
@FilipMilovanović To be on the same page with naming lets use maven naming convention: groups, artifactIds (which I called "projects") and versions. Config artifacts are associated with one given non-config artifactId. It doesn't make sense to use config with different non-config artifact. Any version of config artifact can be paired with any version of non-config and vice versa - it is entirely user responsibility which combination makes sense. — Marcin K., CommentedJun 16, 2022 at 9:49
The idea for config to have FK to non-config is good but as a separate information, since the only thing that they would always have in common is group. Config artifacts have their own ArtifactId as it is used in each artifacts path in Nexus, where they are "physically" stored. On the other hand config ArtifactId is the non-config's ArtifactId with a postfix. — Marcin K., CommentedJun 16, 2022 at 9:49

Jon Raynor · Accepted Answer · 2022-06-15 17:48:08Z

For your first question, it best to model that as an column on a table. Consider students; there are undergrads and grads. But they live in the same table because they share data attributes. In your code, you would look at that attribute and maybe have logic that only applies to undergrads versus grads. Example: if studenttype === grad

For your second question, there are two approaches, a hard delete and a soft one. A soft one leaves the record but marks it as deleted. This is useful if that user is tied to other data in the system and you still want to maintain that data relation. If the user doesn't have any relations to any data and you don't expect them to ever come back, maybe a hard delete would be better as that would delete the record for good. Before deleting, one can always push that data to an archive for reporting or data analytics purposes.

What would that archive be? A different table in the same database? Different database? I'm sorry for asking so many questions but as I mentioned I'm new to database design despite programming for some time already — Marcin K., CommentedJun 15, 2022 at 18:09
For most systems it's usually sent somewhere else because the requirements are different than your normal transactional system. The data that is archived is not queried or needed as much as your active data. It could also be as simple as an archive table in the same database as you have mentioned. Not "right" answer here. — Jon Raynor, CommentedJun 17, 2022 at 14:41
Even if the archive is a simple as another table in the same database, it has the massive advantage over a soft-deletion flag that it's very difficult to read it by accident. It's easy to mistakenly query data including deleted when you want to exclude deleted if you have to test a flag with every query. — bdsl, CommentedJul 15, 2022 at 18:14
@bdsl, an indexed/materialised view of live/active records might also be an alternative, and a naming scheme that gives primacy to these views. As you say, it can otherwise become quite a burden and constant hazard when querying. — Steve, CommentedJul 23, 2022 at 7:22
"But they live in the same table because they share data attributes." This may be a simple and generally working solution but it is not the only way to reason about things. Just because two conceptually different classes have some overlap does not mean that they should therefore be merged into a single table (= TPH or Table-Per-Hierarchy) Even if there is a clear inheritance between them, there are other approaches that can make more sense (TPT or Table-Per-Type; TPC or Table-Per-Concrete-Type). How you structure your logic and how you store your data are separate considerations. — Flater, CommentedJul 25, 2022 at 9:55

AngleWyrm · Accepted Answer · 2022-07-24 17:18:37Z

A good subject to study for answering this question is database normalization. It will help with what isn't clearly expressed in the question: What uniquely identifies a record. I suggest the primary key to the database table is a composite that includes two columns [ID, isConfiguration]

Here's a video series on database normalization

kiwiron · Accepted Answer · 2022-07-23 07:07:45Z

As you are obviously more at home with class modelling more than database design, I suggest the following approach:

Design your classes, complete with inheritance, lists of pointers to other classes etc.
Read up on Object relational mapping theory and especially the three standard methods (see below) of handling inheritance.
Design the database
Have an experienced database designer review your database design. They will check topics like cardinality, indexing, normalisation, primary key generation etc. This is a great learning opportunity.
Consider using a simple ORM (Object Relational Mapping) product to do most of the mapping work for you. There is a list of ORMS for various languages on Wikipedia If using Java, I would add MyBatis to the list. Java has the JPA ORM inbuilt, and C# has Entity Framework (I've never used it as I program in Java).

Suppose we have three classes Cat, Dog, Snake that inherit from a base class Animal. The three standard methods of object relational mapping are:

One database table Animal: with fields for all of the Cat, Dog and Snake properties. You will need to add a column that defines the class type of the record.
Three database tables: Cat, Dog and Snake. Common fields are repeated in each table.
Four database tales: One table for Animal (containing common fields) plus a table each for Cat, Dog and Snake, containing only the fields personal to each animal.

Typically more than one of these methods will be used in any one large database.

Downvote: I would strongly warn against this approach to database design - it may appear to work for trivial examples such as the one described here, but the approach of applying OO concepts such as inheritance/composition as a means to design a relational data model generally falls apart in most non-trivial data modelling scenarios where object-orientation is a counter-productive mindset for relational modelling. Unless the application has really trivial data requirements, there's no viable alternative to taking the time to learn relational database concepts and normalisation. — Ben Cottrell, CommentedJul 16, 2022 at 9:43
@BenCotterell. First, thanks for explaining your downvote. I used your feedback to add a step for peer review of the database. — kiwiron, CommentedJul 23, 2022 at 6:59
Like Ben I think this is an interesting answer in the capacity of being a cautionary tale. For a start, no application working with real databases (that is, in a typical industrial setting) can assume the right to load the entire contents of the database at once, or to have exclusive access to the entire data at once. How do you even load a Cat object, if it inherits from Animal, and you have no right to retrieve the Animal-related data for the action you're performing on the data? Database records can map to whole objects, but they do not have to, and for good reasons usually won't. — Steve, CommentedJul 23, 2022 at 10:24
@Steve I don't understand why you think that I am advocating loading the entire contents. That would be an extremely silly thing to do. As to your comments on permissions, an application needs to be given access to the data it needs. This applies to both ORM and traditional database access. — kiwiron, CommentedJul 24, 2022 at 2:40
@kiwiron, then how do you load only the Cat fields, when you've already inherited from Animal and wrote loads of code assuming the presence of all fields? My point is that for many operations concerning Cats, you won't need the Animal data in principle. You might need it as a result of having designed your application to require additional data to be loaded unnecessarily - but your database and network infrastructure will quickly cease to scale if you are extravagant like this. This is one, but certainly not all, of the hazards. (1/2) — Steve, CommentedJul 24, 2022 at 7:54

Steve · Accepted Answer · 2022-07-23 09:23:39Z

I often think it would pay, before people are confused by OO programming, for them to sit down and think about how they would store data if they only had pen and paper, and if they had to perform all operations on the data themselves with the pen.

"Objects" tend to encourage a mental model where everything is a little machine of its own. These quickly compose into Rube-Goldberg machines. They can be fascinatingly intricate, but also extremely inefficient with design effort, computer power, and amenability to change.

By contrast, data on paper is a dead thing. It doesn't have buttons you can press. But it is also there in its most essential form, free from the clutter of all machinery. And people tend to be familiar with the culture of paper records, or with skeuomorphic representations of it (a text document, a spreadsheet, a file, a folder, a link, etc).

A database is effectively a computerisation of structured paper record-keeping, and a rigorous and scientific implementation of many of the concepts and practices that were developed for paper record-keeping.

If you wonder how to design the database and don't know where to start, a good place to start would be to simply write down on paper the data you want to store (or at least a subset that appears to you to be a whole example of what you'd be storing).

At least, having done that much, you can also show other people, or describe something you already have.

There are specific transforms available from paper to (relational) database. For example, if on paper you have a heading and list 5 items under, to express some sort of hierarchy or grouping of those items, then in the database you have a repeated value which represents the key for the group - and potentially, a link to another table which stores things about the group.

People tend naturally using paper to minimise redundancy and avoid all unnecessary work, within their ability to do so.

Sometimes, the trick with people who are very familiar with paper but not computers, is explaining how visual relationships between fields on paper, or widgets that hold many sheets together like treasury tags, staples, and folders, are often represented in the database by shared values - that appear to be redundant repeated data, but are in fact the minimum repetition necessary to express certain kinds of common connection. And it's very easy to think of justifying examples where this actually happens on paper.

But it's easier for anyone to talk about transforming a specific case, rather than consider every possible case.

Stack Exchange Network

Database design - how to model database tables for similar objects - booleans / enums / inheritance / archived vs active

4 Answers 4

Hot Network Questions

Database design - how to model database tables for similar objects - booleans / enums / inheritance / archived vs active

4 Answers 4

Related

Hot Network Questions