\$\begingroup\$

Motivated by the first commenter's suggestion on my MSAccess question: Python SQL insert to MSAccess with VBScript

I am moving to use SQLite databases for a python application and I created this query builder class to help implement the SQL logic. One constraint is having to use Python 3.6 so that is why I used .format() instead of the more modern F-string for string formatting. I'm looking for an overall review of my code quality, etc. Any suggested improvements to the code...

import sqlite3
from typing import Union, List, Optional
import logging


class QueryBuilder:
    '''
    Query Builder for SQLite databases

    Usage:
        query_builder = QueryBuilder("database.db")
        query_builder.insert("users", {"username": "John", "password": "johnspassword"})
        query_builder.update("users", "password = ?", "username = ?")
        query_builder.delete("users", {"username": "John"})
        query_builder.select("users", ["username", "password"], "username = ?")
    '''

    def __init__(self, db_name: str, log_file_path: str = 'querybuilder.log'):
        '''
        Establish database connection and cursor creation and configure logging.

        Arguments:
        - db_name: The name of the SQLite database

        Usage:
            query_builder = QueryBuilder("database.db")
        '''
        logging.basicConfig(filename=log_file_path, level=logging.INFO,
                            format='%(asctime)s:%(levelname)s:%(message)s')
        self.conn = sqlite3.connect(db_name, isolation_level=None)
        self.cursor = self.conn.cursor()

    def __execute_query(self, query: str, params: Union[tuple, list] = None):
        '''
        Execute SQL query

        Arguments:
        - query: SQL query as a string
        - params: Parameters for the SQL query
        '''
        try:
            if params:
                self.cursor.execute(query, params)
            else:
                self.cursor.execute(query)
            return self.cursor.fetchall()
        except sqlite3.Error as error:
            logging.error("Error while executing SQLite Script: {}".format(error))
            raise

    def select(self, table: str, fields: List[str] = ['*'], where: Optional[str] = None,
               join: Optional[str] = None, group_by: Optional[str] = None,
               order_by: Optional[str] = None):
        '''
        Select records from a table with various clauses

        Arguments:
        - table: Table to select from
        - fields: List of fields to be selected
        - where: WHERE clause
        - join: JOIN clause
        - group_by: GROUP BY clause
        - order_by: ORDER BY clause

        Usage:
            result = query_builder.select("users", ["username", "password"], "username = ?",
                                          "LEFT JOIN orders ON users.id = orders.user_id",
                                          "username", "orders.id DESC")
        '''
        selected_fields = ', '.join(fields)
        base_query = "SELECT {} FROM {}".format(selected_fields, table)
        if join:
            base_query += " {}".format(join)
        if where:
            base_query += " WHERE {}".format(where)
        if group_by:
            base_query += " GROUP BY {}".format(group_by)
        if order_by:
            base_query += " ORDER BY {}".format(order_by)
        return self.__execute_query(base_query)

    def insert(self, table: str, data: dict):
        '''
        Insert a new record into a table

        Arguments:
        - table: Table to insert into
        - data: Dictionary of columns and values to insert

        Usage:
            query_builder.insert("users", {"username": "John", "password": "johnspassword"})

        This will execute:
            INSERT INTO users (username,password) VALUES (?,?)
        with parameters ('John', 'johnspassword')
        '''
        fields = ', '.join(data.keys())
        placeholder = ', '.join('?' * len(data))
        query = "INSERT INTO {} ({}) VALUES ({})".format(table, fields, placeholder)
        return self.__execute_query(query, list(data.values()))

    def update(self, table: str, data: Optional[dict] = None, where: Optional[dict] = None):
        '''
        Update an existing record in a table

        Arguments:
        - table: Table to update
        - data: Dictionary of new data for the record
        - where: Dictionary for WHERE clause to select the record

        Usage:
            query_builder.update("users", {"password": "new_password"}, {"username": "John"})

        This will execute:
            UPDATE users SET password = ? WHERE username = ?
        with parameters ('new_password', 'John')
        '''
        if data:
            set_query = ', '.join(["{} = ?".format(k) for k in data.keys()])
            query = "UPDATE {} SET {}".format(table, set_query)
        else:
            print("Error: No data provided to update.")
            return
        if where:
            where_query = ' AND '.join(["{} = ?".format(k) for k in where.keys()])
            query += " WHERE {}".format(where_query)
        params = tuple(list(data.values()) + list(where.values() if where else []))
        return self.__execute_query(query, params)

    def delete(self, table: str, where: Optional[dict] = None):
        '''
        Delete an existing record from a table

        Arguments:
        - table: Table to delete from
        - where: WHERE clause to select the record

        Usage:
            query_builder.delete("users", {"username": "John"})

        This will execute:
            DELETE FROM users WHERE username = ?
        with parameter ('John')
        '''
        query = "DELETE FROM {}".format(table)
        if where is not None:
            condition = ' AND '.join(["{} = ?".format(key) for key in where.keys()])
            query += " WHERE {}".format(condition)
            params = tuple(where.values())
        self.__execute_query(query, params)
        return self.__execute_query(query, params)

    def close_connection(self):
        '''
        Close the connection to the database

        Usage:
            query_builder.close_connection()
        '''
        self.conn.close()

    def __del__(self):
        ''' Class destructor. Closes the connection to the database '''
        self.conn.close()
\$\endgroup\$
    \$\begingroup\$Please tag Python question with python so the syntax highlighting is correct.\$\endgroup\$
    – Peilonrayz
    Commented Feb 13, 2024 at 17:07

2 Answers

\$\begingroup\$

It's good that there is some logging built-in, but this is not the right way to do it. It is inflexible: the output goes to a file (console is useful for debugging) and the format cannot even be customized (by way of optional arguments).

This is all simple, really. Do logging in your code, but let the caller decide on destination and formatting. After all, this class, like any other class, is meant to be used by another piece of code.

It is enough to do this:

import logging

class QueryBuilder:
    def __init__(self, db_name: str):
        self.logger = logging.getLogger(__name__)

And decide in the upstream routines what you want to do with the log messages emitted by your libs, or any other third-party libs doing logging. Imagine if they were all doing their own thing, and logging to different files for example. That negates the benefits of logging, which is to consolidate messages from multiple sources. At times, you will want to mute them. At other times you will want to decrease or increase verbosity level.
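A minimal sketch of that split (the literal logger name "querybuilder" stands in for the library's `__name__`):

```python
import logging

# Library side: emit records on a module-level logger; configure nothing.
logger = logging.getLogger("querybuilder")  # in the real module this would be __name__

def run_query():
    logger.info("executing query")

# Application side: the caller alone picks destination, level and format.
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s:%(name)s:%(levelname)s:%(message)s')
    run_query()  # the record goes wherever basicConfig pointed it
```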

That being said, the logging is underutilized at the moment. In update, instead of:

print("Error: No data provided to update.")
return

you could do logger.exception and even raise a ValueError or a custom exception. At the very least you should return something meaningful, even return False to signal a problem. You have passed on this opportunity. There is no easy way for the caller routine to find out if the call went through as expected. This is problematic.
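A standalone sketch of that change; the hypothetical helper only builds the SQL, so the fail-loudly behavior stays visible:

```python
import logging

logger = logging.getLogger(__name__)

def build_update(table, data=None, where=None):
    # Fail loudly instead of print-and-return-None: the caller is notified.
    if not data:
        logger.error("update called with no data for table %r", table)
        raise ValueError("No data provided to update.")
    set_clause = ', '.join("{} = ?".format(k) for k in data)
    return "UPDATE {} SET {}".format(table, set_clause)
```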

As for the insert function, what happens if I pass an empty dict? I haven't tested but I expect the code to crash. You don't seem to be testing the data much in depth here.
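With an empty dict the current code would build `INSERT INTO users () VALUES ()`, which SQLite rejects as a syntax error. A guard sketch (standalone functions, not the original class):

```python
import sqlite3

def insert(conn, table, data):
    # Reject empty data before it degenerates into invalid SQL.
    if not data:
        raise ValueError("insert() needs at least one column/value pair")
    fields = ', '.join(data)
    placeholders = ', '.join('?' * len(data))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, fields, placeholders)
    return conn.execute(sql, tuple(data.values()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
insert(conn, "users", {"username": "John", "password": "johnspassword"})
```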

close_connection is not in use at the moment and essentially repeats what __del__ already does.

This implementation looks too basic to be useful at the moment. Of course it's light when compared to SQLAlchemy, but it is still very limited and handles only SQLite.

There is no support for bulk operations or transactions either. If an insert/update/delete operation crashes in the middle, then you should roll back to preserve data integrity.
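The sqlite3 module's connection context manager gives exactly that rollback behavior for free (note this assumes the default isolation level, not the `isolation_level=None` autocommit mode the class currently uses):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # default isolation: implicit transactions
conn.execute("CREATE TABLE users (username TEXT)")

try:
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("INSERT INTO users VALUES (?)", ("John",))
        raise RuntimeError("simulated crash mid-operation")
except RuntimeError:
    pass

# The half-finished insert was rolled back, preserving data integrity.
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```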

There could be a limited use case for bulk operations, provided that you make the code more robust. The code is not bad, but does this class help you very much in your day to day development?

This could work as long as the queries are very basic. Say I want to do a select with a join of two tables or more, I'll have to examine your code to figure out the proper syntax to use. If that is even possible. If I need to do a group by, then I will probably go back to SQLAlchemy or plain SQL.

\$\endgroup\$
    \$\begingroup\$

    version

    use Python 3.6

    I'm still not believing the constraint, given that the interpreter should come from a venv rather than from /usr/bin, but fine, let's roll with it.

    from typing import Union, List, Optional 

    We would typically use list[int] rather than List[int], and float | str | None rather than Union and Optional, but those spellings aren't available in 3.6. Adding this line just after the imports would help set expectations and explain our intent:

    assert sys.version_info[:2] >= (3, 6) 

    credentials

    Thank you for using bind parameters.

     query_builder.update("users", "password = ?", "username = ?") 

    Storing a plaintext password in the table would be Bad. Prefer to store an argon2id hash.

    Kudos on adding those optional type annotations.

    name mangling

    I don't understand this signature:

     def __execute_query(self, query ... 

    Typically mangling is not desired -- recommend you call it def execute_query.

    If it is desired, a docstring or comment should explain the reason mangling is beneficial here. Usually that would involve setting out some rules for maintenance engineers to follow, perhaps restrictions on how they interact with inheritance.

    Maybe what you really wanted was just a single _ underscore to show it's private?

    bind params

    if params:
        self.cursor.execute(query, params)

    Good, no injection!

    We know that params is a list at this point. If you're able to support it being a dict, consider preferring that, or using it exclusively.

    It's a debugging concern. By the time there are five or six ? placeholders in there, it can become slightly challenging for a maintenance engineer to insert a seventh parameter in the middle and get everything to line up properly. Sometimes swapped parameters may "look right" and be silently ignored. In contrast, a line of source code like "age": age, is hard to get wrong; we won't accidentally mistake it for weight.
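    sqlite3 supports exactly that via named-style placeholders; a quick sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# Named placeholders read like the '"age": age' line described above:
conn.execute("INSERT INTO people (name, age) VALUES (:name, :age)",
             {"name": "Ada", "age": 36})
row = conn.execute("SELECT age FROM people WHERE name = :name",
                   {"name": "Ada"}).fetchone()
# row == (36,)
```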

    Hmmm, your def insert() actually goes in the opposite direction, converting a dict by extracting its .values(), interesting.

    logging

    except sqlite3.Error as error:
        logging.error("Error while executing SQLite Script: {}".format(error))
        raise

    Excellent, the raise re-throws the original exception, so the caller still sees a proper stack trace.

    Someone reading the log won't see that. And there might be multiple candidate call sites. Consider separately logging the query, or even the params, for the benefit of a maintenance engineer trying to diagnose a hard-to-reproduce issue.

    Modern interpreters make it easy to inspect the stack trace and reveal the call site for the benefit of a logger. I imagine that even back in 3.6 days it wasn't too difficult to obtain such details -- consider revealing the caller location in the log.
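    One way to surface the call site, even on 3.6, is the inspect module; the frame depth below is an assumption about a two-level call chain, not a general rule:

```python
import inspect

def _caller_location():
    # Two frames up: past this helper and past the library method that calls it.
    frame = inspect.stack()[2]
    return "{}:{}".format(frame.function, frame.lineno)

def execute_query(sql):
    # A library method would log this alongside the error message.
    return "query {!r} requested from {}".format(sql, _caller_location())

def application_code():
    return execute_query("SELECT 1")
```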

    str vs None

    Clearly this works:

    def select(self, table: str, fields: List[str] = ['*'], where: Optional[str] = None,
               join: Optional[str] = None, group_by: Optional[str] = None,
               order_by: Optional[str] = None):

    IDK, for Optional[str] maybe just use str and default it to the empty string? Also, there's no trouble with unconditionally tacking on a default empty-string to the evolving query.

    mutable default

    For that fields parameter, you might prefer that it defaults to None. And then we can assign ... = ', '.join(fields or ['*']) to accomplish the defaulting.

    Creating the list just once at class definition time, rather than N times for N calls, can yield surprising behavior. SO supplies answers describing the effect. It's the sort of thing we try to habitually avoid, even if it causes no trouble in this code, out of kindness to future maintenance engineers.
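    The effect in miniature; harmless for the read-only use in select, but easy to trip over later:

```python
def risky(fields=['*']):
    # The default list is created once, at definition time, and shared by all calls.
    fields.append('oops')
    return fields

def safe(fields=None):
    fields = fields or ['*']  # a fresh list on every call
    return ', '.join(fields)
```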

    table alias

    It's pretty common for queries to look like this:
    SELECT * FROM my_wonderful_dataset mwd .... Especially when JOINing several tables.

    As it stands, I anticipate callers will slightly abuse your API by sneaking the occasional alias into what is supposed to be a table. Maybe use some vague terminology like table_clause to bless such usage?

    The join parameter is a bit looser, but maybe call it join_clause?

    Similarly I would expect caller to sometimes tack on HAVING to a group_by.

    Consider adding a limit parameter.

    injection

    base_query += " WHERE {}".format(where)
    ...
    return self.__execute_query(base_query)

    I don't see any way for Little Bobby Tables' mom or any other caller to offer bind parameters and have them passed along to the helper.

    Your mention of "username = ?" seemed promising, so I'm sure it's just an oversight.
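    A sketch of threading the parameters through; the extra params argument is the only change from the original shape:

```python
import sqlite3

def select(conn, table, fields=None, where=None, params=()):
    sql = "SELECT {} FROM {}".format(', '.join(fields or ['*']), table)
    if where:
        sql += " WHERE {}".format(where)
    # Bind parameters travel with the query instead of being lost.
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("John",), ("Bobby",)])
rows = select(conn, "users", ["username"], "username = ?", ("John",))
# rows == [('John',)]
```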

    queries don't have side effects

    def insert(self, ...):
        ...
        query = "INSERT INTO ...

    Consider calling that dml, command, or just sql.

    Similarly for update() and delete().

    WHERE clause

    Defining different Public APIs for "same thing" is slightly odd. update() handles it differently from select():

    if where:
        where_query = ' AND '.join(["{} = ?".format(k) for k in where.keys()])
        query += " WHERE {}".format(where_query)

    Let's think about how caller would specify a blind query. That's right, pass in a condition that is true for every row:

     result = query_builder.select("users", ["username"], "TRUE") 

    Let's apply that insight to the update() WHERE clause. We could unconditionally start with WHERE true. Then tack on zero or more AND conjuncts. Any way you slice it we obtain valid SQL syntax. No biggie, it's just a little more convenient.

    High level item is to offer consistent behavior to caller across the various methods that support WHERE clauses.

    You might want to write a single, tiny helper for this, and have {select, update, delete} call it.
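    One possible shape for that helper; it uses 1=1 rather than true as the always-satisfied anchor, since older SQLite builds predate the TRUE literal:

```python
def build_where(where=None):
    # Hypothetical shared helper for select/update/delete.
    # "1=1" is the always-true anchor; each dict entry adds an AND conjunct.
    clause = "WHERE 1=1"
    params = []
    for column, value in (where or {}).items():
        clause += " AND {} = ?".format(column)
        params.append(value)
    return clause, tuple(params)
```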

    exceptions

     print("Error: No data provided to update.") 

    Please raise a ValueError here. Caller messed up, and should be notified. A silent return makes it too easy for a caller bug to remain in the source repo without being repaired.

    Also, why do we have , data: Optional[dict] = None, in the signature? We intend it should be mandatory, right? And then the check is looking for an empty dict.

    repeat delete

    I don't understand this.

    def delete(self, ...):
        ...
        self.__execute_query(query, params)
        return self.__execute_query(query, params)

    That first one snuck in by accident, right?

    closing

    I guess this is pretty OK.

    def close_connection(self):
        ...
        self.conn.close()

    def __del__(self):
        ''' Class destructor. Closes the connection to the database '''
        self.conn.close()

    I have my doubts about that __del__, but maybe it's needed.

    Let's take a step back and think about design of Public API. First let's think about dynamic allocation in C, which gave us the infamous {malloc, free} API. Now of course, it's a wonderful API. But it turns out humans are incapable of matching up a bunch of free()'s with the corresponding malloc()'s. Thus were born various OO languages to clean up the garbage automatically.

    Returning to your query builder, we want to give caller convenient access in a way that they're unlikely to mess up. Soooo, maybe we don't even offer a public close() method? That is, maybe we want to strongly encourage caller to always use a with context handler when obtaining a DB connection. Then there is simply nothing to think about later, it always gets dealt with.

    Think about how often you lately have done f = open(...) and subsequently f.close(). Very few times, right? Because with is just too convenient. Same notion here.
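    A sketch of what that could look like on this class; the commit-on-success, rollback-on-error behavior is my assumption, not something the original code does:

```python
import sqlite3

class QueryBuilder:
    def __init__(self, db_name):
        self.conn = sqlite3.connect(db_name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        self.conn.close()
        return False  # propagate any exception

with QueryBuilder(":memory:") as qb:
    qb.conn.execute("SELECT 1")
# The connection is closed here; nothing for the caller to remember.
```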

    BTW, maybe issue a helpful COMMIT just before .close()? More generally, caller is going to need to be able to BEGIN ... COMMIT transactions that are small or large. Consider only offering a session (transaction) interface. Possibly you'd care to offer a .rollback() method, but wait for the need to arise in calling code before you implement it.

    raw SQL

    It's not clear that you already have calling code which exercises all these {select, insert, update, ...} verbs. Consider saving yourself some work by offering raw access to the underlying DB connection. Then wait for a bunch of app code to be authored, with lots of calls into this library. Look at "why was the raw interface needed here?", "what should the library have offered?", in order to guide subsequent library development.

    Be sure to write new test functions when you enhance the library. Think of each test function as a "stand in" for the application code which motivated the enhancement. If the tests still show Green bar, then the app code is winning!

    elevator pitch

    Why use this library?

    There are competing ORMs and similar middlewares, including SQLAlchemy. Add a module docstring describing the use case where this code wins.

    I suspect that "support for 3.6" may figure into such an argument.


    code coverage

    When you run your automated test suite against this target code, use pytest --cov --cov-report=term-missing so the coverage plugin runs. It will tell you which of those many ifs your test suite is not yet exercising.

    Also, maybe you want to offer two layers:

    1. Synthesize a DML statement from args, and return it.
    2. Synthesize and execute (with bind parameters) some DML.

    A test suite could take advantage of the _private interface offered by layer 1. Applications would just call public layer 2.


    This evolving codebase benefits from the careful design that clearly went into it. No doubt it will continue to improve. It achieves most of its design goals.

    I would be reluctant to support an unmaintained 3.6 setup in a production setting, but if, say, 3.8 were used then I would be willing to delegate or accept maintenance tasks on this code base.

    \$\endgroup\$
