I want to understand what is considered best practice for applying OOP when handling relational databases. I cannot find any online examples where classes and a more maintainable/re-usable design are implemented for relational databases.

Say we have a few different relational database (CSV table) files:

  • Environment: primary_id (PK), event_id (FK), datetime (SK), weather, temperature, temperature_unit
  • Incident: primary_id (PK), event_id (FK), datetime (SK), incident_category, severity, reported_via
  • Reporter: primary_id (PK), event_id (FK), name, sex, occupation

where PK: primary keys, FK: foreign keys, SK: secondary keys.

A snapshot of the CSV table files where I display a few records per table as an example:

Environment:

  • 1001678125, 10016781, 202401251123, sunny, 19, celsius
  • 1024692335, 10246923, 202406120958, cloudy, 15, celsius

Incident:

  • 1001678125, 10016781, 202401251123, main_entrance_breach, B, serious, police
  • 1024692335, 10246923, 202406120958, side_entrance_breach, C, mild, reception

Reporter:

  • 1001678125, 10016781, 202401251123, Joe, M, police officer
  • 1024692335, 10246923, 202406120958, Dave, M, reception staff

The goal

The application would read each table, load it in an appropriate dataframe (e.g. pandas in Python) and have a variety of methods to exercise on a selection of the data records and carry out calculations on it. For example, one calculation method could be to calculate the number of serious incidents that occurred after 11pm where police was involved.
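As a rough illustration of the kind of calculation meant here, the sketch below counts serious incidents reported via police at or after 23:00. It assumes the datetime column is formatted as YYYYMMDDHHMM (as in the snapshot above) and that "after 11pm" means the hour is 23 or later. The rows and the function name `count_serious_police_after` are made up for the example.

```python
import pandas as pd

# Illustrative rows only; column names follow the Incident table above.
incidents = pd.DataFrame(
    {
        "primary_id": [1, 2, 3, 4],
        "event_id": [10, 11, 12, 13],
        "datetime": ["202401252315", "202401251123", "202406122340", "202406122359"],
        "incident_category": ["B", "B", "C", "A"],
        "severity": ["serious", "serious", "mild", "serious"],
        "reported_via": ["police", "police", "police", "reception"],
    }
)


def count_serious_police_after(df: pd.DataFrame, hour: int = 23) -> int:
    """Count serious incidents reported via police at or after `hour` (24h clock)."""
    # Assumption: datetime values look like YYYYMMDDHHMM.
    when = pd.to_datetime(df["datetime"].astype(str), format="%Y%m%d%H%M")
    mask = (
        (df["severity"] == "serious")
        & (df["reported_via"] == "police")
        & (when.dt.hour >= hour)
    )
    return int(mask.sum())


late_serious_police = count_serious_police_after(incidents)
```

Only the first illustrative row satisfies all three conditions, so the count for this sample is 1.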

My idea of having each element (column header) as its own class was to enable the application to present more detailed information and carry out extra calculations appropriate for each element. For example, the temperature (in combination with the temperature_unit) could have class methods to calculate the average temperature, have methods to categorise values (below 0 with a celsius unit to be labelled as "freezing"), etc.
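A minimal sketch of what such a per-column class might look like for temperature is given below. The class name `TemperatureColumn`, the "not freezing" label, and the Fahrenheit conversion are assumptions added for illustration; only the "freezing" label and the celsius unit come from the description above.

```python
from statistics import mean


class TemperatureColumn:
    """Hypothetical wrapper around the temperature and temperature_unit columns."""

    def __init__(self, values, units):
        if len(values) != len(units):
            raise ValueError("values and units must have the same length")
        # Normalise everything to Celsius once, so later methods can ignore units.
        self._celsius = [self._to_celsius(v, u) for v, u in zip(values, units)]

    @staticmethod
    def _to_celsius(value, unit):
        unit = unit.lower()
        if unit == "celsius":
            return float(value)
        if unit == "fahrenheit":  # assumed second unit, not in the sample data
            return (float(value) - 32.0) * 5.0 / 9.0
        raise ValueError(f"unsupported temperature unit: {unit!r}")

    def average_celsius(self):
        """Average of all readings, expressed in Celsius."""
        return mean(self._celsius)

    def categories(self):
        """Label each reading; below 0 Celsius counts as 'freezing'."""
        return ["freezing" if c < 0 else "not freezing" for c in self._celsius]


column = TemperatureColumn([19, 15, 31], ["celsius", "celsius", "fahrenheit"])
average = column.average_celsius()
labels = column.categories()
```

Here the third reading (31 fahrenheit) converts to just under 0 celsius, so it is the only one labelled "freezing".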

But also, using OOP should enable the application to extend its functionality (more specific methods/calculations) or adapt to cases where the database file might have an extra column added to it.

This is why I would like to use proper Object-Oriented-Programming design and define classes for my key objects.

--

The OOP design I thought of is the following:

An Element parent (base) class to capture the high-level details of what information each element (column header) would hold: name, abbreviated_name, value, description, data_file_used_in, column_pos_in_data_file, notes.

Create more low-level specific child (derived) classes, one for each specific element (column header), that inherit from the Element class. For example, a primary_id class, an event_id class, a datetime class, etc. each providing specific values to the Element class fields.

Where appropriate, create classes for specific data types. For example, the incident_category element can be one of the following values, A, B, or C.

Then, create a DatabaseFile parent (base) class to capture the details of each database file: filename, size, element(s), where element(s) is all of the element column headers found in a database file.

Finally, create child (derived) classes for each of the three specific database files: Environment, Incident, and Reporter. Each will hold specific values for that file as well as list all elements (column headers) found in it.

Initially the data from each CSV file will be loaded into an appropriate data type, for example using the pandas package in Python: df = pandas.read_csv('environment.csv'), and an instance of the Environment class will be initialised. The intention is that each child class will hold specific methods to manipulate data.

--

Using Python pseudocode the above would look something like:

    class Element:
        """Metadata shared by every element (column header)."""

        def __init__(self, name, abbreviated_name, value, description,
                     data_file_used_in, column_pos_in_data_file, notes=None):
            self.name = name
            self.abbreviated_name = abbreviated_name
            self.value = value
            self.description = description
            self.data_file_used_in = data_file_used_in
            self.column_pos_in_data_file = column_pos_in_data_file
            self.notes = notes


    class Temperature(Element):
        def __init__(self, name, abbreviated_name, value, description,
                     data_file_used_in, column_pos_in_data_file, notes=None):
            super().__init__(name, abbreviated_name, value, description,
                             data_file_used_in, column_pos_in_data_file, notes)


    class IncidentCategoryCode:
        """Data type restricting the category code to A, B, or C."""

        VALID_CODES = {"A", "B", "C"}

        def __init__(self, code):
            if code not in self.VALID_CODES:
                raise ValueError(f"invalid incident category code: {code!r}")
            self.code = code


    class IncidentCategory(Element):
        def __init__(self, name, abbreviated_name, value, description,
                     data_file_used_in, column_pos_in_data_file, category_code,
                     notes=None):
            self.category_code = IncidentCategoryCode(category_code)
            super().__init__(name, abbreviated_name, value, description,
                             data_file_used_in, column_pos_in_data_file, notes)

    # (...)

    class DatabaseFile:
        def __init__(self, filename, size):
            self.filename = filename
            self.size = size


    class Environment(DatabaseFile):
        def __init__(self, filename, date, size, *elementArgs):
            # DatabaseFile only accepts filename and size, so date is kept here.
            super().__init__(filename, size)
            self.date = date
            self.primary_id = elementArgs[0]
            self.event_id = elementArgs[1]
            self.datetime = elementArgs[2]
            self.weather = elementArgs[3]
            self.temperature = elementArgs[4]
            self.temperature_unit = elementArgs[5]

    # (...)

--

Summary

In short, below is a mapping of the classes to the databases/elements referenced in the Python pseudocode example:

  • Element is a parent class not mapped to a database file.
  • Temperature is a child class of the Element class that maps to the temperature column header of the Environment database.
  • IncidentCategoryCode is a data type for the IncidentCategory class.
  • IncidentCategory is a child class of the Element class that maps to the incident_category column header of the Incident database.
  • DatabaseFile is a parent class not mapped to a database file.
  • Environment is a child class of the DatabaseFile class that maps to the environment database.

--

Questions

Is the above design/implementation sensible - is there a better way (best practice) to design this? Also, should the primary/secondary/foreign key details be captured in the design?

For example, is it better to read all the tables and concatenate all the data (using the primary key) into a single dataframe, or should one maintain the separation and create three dataframes, each with the corresponding data found in its CSV table file?
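To make the second option concrete, here is a minimal sketch that keeps the tables as separate dataframes and joins them only when a calculation needs columns from both. It assumes event_id is the shared key and that each event appears at most once per table. The values are taken from the snapshot above, with the ambiguous Incident columns reduced to severity and reported_via.

```python
import pandas as pd

# One dataframe per CSV table, trimmed to a few columns for brevity.
environment = pd.DataFrame(
    {
        "primary_id": [1001678125, 1024692335],
        "event_id": [10016781, 10246923],
        "weather": ["sunny", "cloudy"],
        "temperature": [19, 15],
    }
)
incident = pd.DataFrame(
    {
        "primary_id": [1001678125, 1024692335],
        "event_id": [10016781, 10246923],
        "severity": ["serious", "mild"],
        "reported_via": ["police", "reception"],
    }
)

# Join on demand; validate raises if event_id is not unique on either side.
combined = incident.merge(
    environment,
    on="event_id",
    how="left",
    suffixes=("_incident", "_environment"),
    validate="one_to_one",
)

weather_for_serious = combined.loc[combined["severity"] == "serious", "weather"].tolist()
```

The `suffixes` argument keeps the two primary_id columns distinguishable after the join, since both tables carry a column with that name.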

  • 2
    I removed the last sentence requesting offsite references or sources. That is off-topic for this community, and attracts unnecessary down-votes and close-votes.
    Commented May 31, 2024 at 19:16
  • 2
    AFAIK, a candidate key not selected as a primary key is called a secondary key.
    – Yannis
    Commented May 31, 2024 at 20:08
  • 1
    Can you edit your question to include some pseudo code for how your proposed classes will be used? The code in a class exists to serve the code using the class. I'm utterly confused about what you are trying to accomplish, because from an OOP perspective the code using your classes is what counts.
    Commented May 31, 2024 at 20:17
  • 1
    The question mentions "element" column headers, but I see no indication of what the column headers actually are or what kinds of values they contain.
    Commented May 31, 2024 at 20:39
  • 3
    BTW, I voted to reopen your question. Thank you for spending the time to clarify this.
    Commented Jun 1, 2024 at 12:16

1 Answer

(Note: The question had been substantially re-written after this answer was posted. The original question seemed to be more about using OO to design a relational database, so this answer is focused on Entity-Relationship modelling, contrasted against OO Design).

Entities in a relational database are not "OO" objects

Don't fall into the trap of trying to mix OO design with Relational Database design. OO and Relational Databases have fundamentally different goals, and are intended for completely different contexts from each other.

  • Relational Database design is based on Entity-Relationship Modelling, for data stored outside of your program in a relational structure.
  • OO Design evolved from procedural programming for modelling a program's behaviour and internal state.
  • One goal for Entity-Relationship modelling (relational database design) is to establish a set of rules which ensure data integrity whenever any data is Inserted/Updated/Deleted.
  • Another goal for Entity-Relationship modelling is to minimise the impact of future changes to the schema when requirements change. Once real data is in the database, the cost, risk and complexity of migrating to a new structure tends to increase a lot - particularly changes which might change the semantics or integrity of existing data.
  • One goal for OO design is to use abstraction to express a program's behaviour and state at a higher level than traditional procedural code.
  • Another goal for OO design is to have a program structure which is really easy to change as your users' requirements evolve.
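To illustrate the data-integrity goal in the list above, here is a small sketch using Python's built-in sqlite3 module. The `event` table, the column subset, and the allowed severity values are assumptions made for the example, since the question does not define them. The point is only that the rules live in the schema, so the database rejects bad rows no matter which program tries to write them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical parent table that incident.event_id refers to.
conn.execute("CREATE TABLE event (event_id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE incident (
        primary_id INTEGER PRIMARY KEY,
        event_id   INTEGER NOT NULL REFERENCES event(event_id),
        severity   TEXT NOT NULL CHECK (severity IN ('mild', 'serious'))
    )
    """
)
conn.execute("INSERT INTO event VALUES (10016781)")
conn.execute("INSERT INTO incident VALUES (1001678125, 10016781, 'serious')")


def try_insert(row):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO incident VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = try_insert((1, 99999999, "mild"))  # no matching event row
bad_severity_accepted = try_insert((2, 10016781, "catastrophic"))  # fails CHECK
row_count = conn.execute("SELECT COUNT(*) FROM incident").fetchone()[0]
```

Both rejected inserts leave the table untouched, so only the one valid row remains.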

OO, Behaviour and Code Flexibility

When designing behaviour, people typically focus on users' needs and requirements, such as the functionality they expect and the problems they want to solve. For example, they think about UX, interactivity, 'journeys' through the system, and the outcomes of those journeys.

Ideally the code would be expressed in a way which allows someone who understands user requirements to be able to follow and understand how the code relates back to something that users/stakeholders understand from the 'real-world' business/problem domain.

Code is something which developers are likely going to be changing quite a lot as users' needs evolve. Developers generally want as much freedom as possible to change it, even completely changing its structure or throwing everything away and starting again.

Well-designed abstractions following OO principles should make the code easier to change.

Entity-Relationship-Modelling, Normalisation and Minimising Change

Unlike code, change to existing entities/relationships is something that you ideally want to avoid in a database. Redesigning a database containing your users' data is inherently risky, as it generally involves a one-way, irreversible migration process. If that process goes wrong then it's a bad day, so the best way to handle it is not to do it in the first place unless you have no other choice.

When designing a database, forget Object-orientation and focus instead on normalisation and modelling your data in 3rd Normal Form (3NF). Getting the data into 3NF should also help to minimise the need for any potentially-destructive changes in future as your users' needs evolve.

Object-Relational Mappers (ORMs)

Of course, there are ORM libraries which use programming language features such as classes to help programmers bridge the divide between the programming language and the database. Database tables can often conveniently be represented using classes; however, this is merely a way of using a programming language feature for convenience when working with databases.

The ORM 'data layer' in a program is usually the only place within the codebase where I would expect to see structures that mirror the database; otherwise the code that models behaviour is most likely going to look completely different, since it exists for completely different reasons, with different concerns and different goals.

These 'entity' classes usually only represent fields/data without any behaviour (no validation, no calculations, etc.), as the data they represent doesn't "belong" to the program but to the database. There's typically a clean separation between classes containing this externally-owned data and the objects which model the program's behaviour/state.
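As a rough sketch of that separation, the example below uses a plain dataclass for the entity and a separate class for behaviour. The field names follow the Incident table in the question; `EscalationPolicy` and its rule are hypothetical, invented only to show where behaviour would live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentRecord:
    """Plain data holder mirroring one row of the Incident table; no behaviour."""

    primary_id: int
    event_id: int
    datetime: str
    incident_category: str
    severity: str
    reported_via: str


class EscalationPolicy:
    """Hypothetical behaviour class; it reads entity data but does not own it."""

    def __init__(self, escalate_severities=("serious",)):
        self._escalate = frozenset(escalate_severities)

    def needs_escalation(self, record: IncidentRecord) -> bool:
        return record.severity in self._escalate


record = IncidentRecord(1001678125, 10016781, "202401251123", "B", "serious", "police")
policy = EscalationPolicy()
escalate = policy.needs_escalation(record)
```

If the escalation rule changes, only `EscalationPolicy` changes; the entity class keeps mirroring the table and nothing else.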

  • "... if that process goes wrong then it's a bad day...": this reminds me of the old joke about what DBA stands for: Done Best After-hours.
    Commented Jun 1, 2024 at 0:07
  • 1
    "Relational Database design is about modelling persisted data" - I wouldn't say it's as simple as whether data is durably stored or not. SQL engines are primarily designed for processing (a) high volumes of (b) shared data. What we mean by (a) are volumes that potentially strain the best hardware economically available. And what we mean by (b) is hundreds or thousands of users making simultaneous queries on and changes to the same data. And it's designed to make programming these things trivial, by comparison to a fully-bespoke implementation in a comparatively low-level OO language.
    – Steve
    Commented Jun 1, 2024 at 7:38
  • 1
    @Steve Just to clarify, this answer is intentionally limited to looking at relational database design (Entity-relationship modelling) from a conceptual point of view. I'm intentionally ignoring all DBMS-specific implementation detail here, as I don't see anything in the question which seems to relate to concerns about data/user volumes, performance, economy, or hardware. Am I missing something? All of that seems completely out-of-scope for the question to me.
    Commented Jun 2, 2024 at 8:36
  • 1
    @Steve The answer didn't ever make the claim "SQL is not designed to model runtime states and behaviour". I intentionally avoided making any reference to SQL (as that is a separate topic out of scope of the question). However I made an edit earlier to remove the specific mention about persistence and clarified that I'm referring to relational database design being based on Entity-Relationship Modelling. The answer also does not make any claim that OO is "more high level" than a relational database. Nor does it make any claims about features of database engines.
    Commented Jun 2, 2024 at 18:32
  • 1
    It's simply that I don't see these topics even being hinted at anywhere in the original question (even before it was rewritten), and I just find it unhelpful for Q+A answer threads to go too far into unnecessary tangents. I view the core of the originally-written question as being about modelling entities and relationships, with the OP looking to use OO methods (though the updated question doesn't read this way any more), so the answer deliberately focuses just on covering why E-R modelling and Normalisation are used for this rather than OO methods.
    Commented Jun 3, 2024 at 12:41
