5
\$\begingroup\$

Feedback Request

  • Is the manner in which I use attrs adequate (i.e. designed well)?
  • Should I not expect to be able to use argument ordering to order the joining of groups, given my current setup (see def label(self): below)?
  • Are there any general Pythonic suggestions for the code I have?

Context

I am working on creating labels for specimens in paleontological collections.

At the moment, I have a poetry package called paleo_utils. __init__.py contains code that takes Label, CollectionsLabel (subclassed Label), and SystematicsLabel (subclassed Label) from label.py. For the sake of this question, the code in Label and SystematicsLabel does not matter.

I would prefer to not have to use __init__ while also using attrs in the code below but at present I am not sure I am able to get {key: kwargs[key] for key in kwargs} using just __attrs_post_init__.

My wish is to be able to have these two cases work as follows:

import paleo_utils label = paleo_utils.CollectionsLabel( save_directory="../assets/saved_images/", id_number="3244", collection="AMNH", collector="Dr. Montague", location="North America", formation="Navesink", coordinates=(40.7128, -74.0060), date_found="2024-01-01", title_overrides={"collection": "Museum: "}, ) print(label.label()) # ID Number: 3244 # Museum: AMNH # Collector: Dr. Montague # Location: North America # Formation: Navesink # Coordinates: (40.7128, -74.006) # Date Found: 2024-01-01 

and

import paleo_utils label = paleo_utils.CollectionsLabel( save_directory="../assets/saved_images/", date_found="2024-01-01", id_number="3244", formation="Navesink", collection="AMNH", collector="Dr. Montague", location="North America", coordinates=(40.7128, -74.0060), title_overrides={ "date_found": "Date Collected: ", "location": "Locality: "}, ) print(label.label()) # Date Collected: 2024-01-01 # ID Number: 3244 # Formation: Navesink # Collection: AMNH # Collector: Dr. Montague # Locality: North America # Coordinates: (40.7128, -74.006) 

Code For CollectionsLabel

@attrs.define(kw_only=True) class CollectionsLabel(Label): collection: str | None = attrs.field( default=None ) id_number: str | None = attrs.field( default=None ) collector: str | None = attrs.field( default=None ) species: str | None = attrs.field( default=None ) species_author: str | None = attrs.field( default=None ) common_name: str | None = attrs.field( default=None ) location: str | None = attrs.field( default=None ) coordinates: tuple[float, float] | None = ( attrs.field(default=None) ) coordinates_separate: bool = attrs.field( default=False ) date_found: str | None = attrs.field( default=None ) date_cataloged: str | None = attrs.field( default=None ) formation: str | None = attrs.field( default=None ) formation_author: str | None = attrs.field( default=None ) chrono_age: str | None = attrs.field( default=None ) chrono_age_author: str | None = attrs.field( default=None ) size: str | None = attrs.field(default=None) link: str | None = attrs.field(default=None) default_titles = { "collection": "Collection: ", "id_number": "ID Number: ", "collector": "Collector: ", "species": "Scientific Name: ", "species_author": "Species Author: ", "common_name": "Common Name: ", "location": "Location: ", "coordinates": "Coordinates: ", "date_found": "Date Found: ", "date_cataloged": "Date Cataloged: ", "formation": "Formation: ", "formation_author": "Formation Author: ", "chrono_age": "Age: ", "chrono_age_author": "Age Author: ", "size": "Size: ", "link": "Link: ", } title_overrides: dict[str, str] = attrs.field( factory=dict ) # empty by default _ordered_kwargs: dict = attrs.field(init=False) def __init__(self, **kwargs): self._ordered_kwargs = {key: kwargs[key] for key in kwargs} def __attrs_post_init__(self): # update title_overrides with any user-provided overrides if self.title_overrides: # merge user-provided titles, overriding defaults for ( key, value, ) in self.title_overrides.items(): if key in self.default_titles: self.default_titles[key] = ( value ) def _get_collections_attrs(self): label_attrs = { attr.name for attr in Label.__attrs_attrs__ } # collections_attrs = { # attr.name: getattr(self, attr.name) # for attr in self.__attrs_attrs__ # if attr.name not in label_attrs # } # print(self.__attrs_attrs__) collections_attrs = { key: value for key, value in self._ordered_kwargs.items() if key not in label_attrs } return collections_attrs def label(self): # empty list for parts of the final label parts = [] # collections label exclusive attrs collections_attrs = ( self._get_collections_attrs() ) # iterative over collections attrs for ( key, value, ) in collections_attrs.items(): # for all non-None collections attrs, proceed if ( value is not None and not isinstance(value, dict) ): # edit title with spaces and capitalized title = self.default_titles.get( key, f"{key.replace('_', ' ').capitalize()}: ", ) # add the group parts.append(f"{title}{value}") # consolidate to multiline label return "\n".join(parts) 
\$\endgroup\$
1
  • \$\begingroup\$Do you feel your class has a benefit over a TypedDict with a helper static class?\$\endgroup\$
    – Peilonrayz
    CommentedNov 3, 2024 at 6:25

2 Answers 2

6
\$\begingroup\$

use pathlib

label = paleo_utils.CollectionsLabel( save_directory="../assets/saved_images/", date_found="2024-01-01", 

Rather than str, consider storing that folder name as a Path. Then it will be significantly more convenient for clients to make use of.

Kudos on using ISO8601, so lexical order matches chronological order.

optional values

You consistently admit of None for each attribute. This resembles a relational database table lacking even a single NOT NULL clause in its DDL.

I encourage you to go back and reconsider whether a CollectionsLabel truly could have e.g. no collection and no id_number.

data type

Several date fields curiously use str rather than a datetime.

I imagine chrono_age: str gives you the flexibility to store human-friendly descriptions like "1.2 Bya" or "66 Mya", which would lead to unfortunate lexical ordering. And I'm concerned that someone might try to jam "late Devonian" in there. Consider using one or two numeric fields to store age, and worry about human-friendly formatting when outputting a retrieved record.

Giving size a type of str seems mind boggling to me, but I imagine you have your reasons. Perhaps it is a {small, medium, large} enum? Or some sizes are in square meters while others are in cubic meters?

idiom

Certainly this works,

 ... {key: kwargs[key] for key in kwargs} 

but it is more usual to iterate over kwargs.items(). And in this particularly simple case you should use kwargs.copy() to produce the desired shallow copy.

But it's not clear why you'd want to make a copy at all, given that each function call makes a shallow copy already. The OP contains no unit test which would turn Red if we stop copying. If you are trying to conform to some documented attrs advice, then please add a # comment which cites the docs. If there is another concern at work here, please comment on it, and write a unit test which highlights the effect of interest. I claim that any valid calling code one might write wouldn't care about whether a copy had happened, that is, one could not demonstrate a Green test which turns Red when we remove that copying operation. If inheritance plays into this, show that in your unit test. As written the code is obscure -- I cannot think of a good reason for making a copy of the args.

(In an inheritance situation is common enough for a V2 child class to destructively .pop() off V2 args that it knows about, so they won't bother the V1 parent class. But that doesn't appear to be relevant here.)

zero items

You have declared that title_overrides cannot be None, which is a good choice.

 def __attrs_post_init__(self): ... if self.title_overrides: 

No need for the guard, there. Just iterate over zero items in an empty dict, no harm done.

Please phrase the for as for key, value in self.title_overrides.items():, and use similar single-line formatting down in label().

Wouldn't a simple .update() suffice here?

meaningful identifier

 label_attrs = { attr.name ... 

That looks more like you're creating a new names variable, to me.

Also, rather than merge that code fragment into the main branch, just delete it, along with the half dozen related lines of commented code that follow it.

isinstance()

 if ( value is not None and not isinstance(value, dict) ): 

Please simplify to

 if not isinstance(value, (dict, NoneType)): 

(Your IDE will help you add from types import NoneType.)

embedded newlines

 return "\n".join(parts) 

I note in passing that you apparently prohibit certain characters in all those fields, such as NEWLINE, but that wasn't enforced anywhere. If one crept in, it would corrupt this returned value, turning a single record into apparently multiple records.


Overall this seems a reasonable design.

expect to be able to use argument ordering ...

You didn't mention your target environment, but I'm willing to assume it does not include e.g. Jython, and will be restricted to just modern cPython. The python language distinguishes between unordered dict and the OrderedDict class which makes stronger guarantees. The cPython interpreter now imposes ordering on dict. The old promise was "an arbitrary order which is non-random", but that changed several years ago.

Starting with interpreter 3.7, dict iteration promises to preserve insertion order. Also, PEP 468 ensures that kwargs order shall match the arg order seen in the source code.

So, yes, your expectation is reasonable.

\$\endgroup\$
1
  • 1
    \$\begingroup\$+1. Thank you very much for this thoughtful comment. I will likely approve your answer once I've gone through my code again (in the next 36 hours) with the above considerations. I will provide an update with how I've modified my code.\$\endgroup\$
    – R509
    CommentedNov 4, 2024 at 2:41
3
\$\begingroup\$

Attrs

You need to code your own __init__ function to be able to use **kwargs and get the argument order. Otherwise, attrs will use the order the fields are defined. But you can still use attrs to initialize the class instance by calling __attrs_init__ from your __init__:

def __init__(self, **kwargs): self._ordered_kwargs = kwargs.copy() self.__attrs_init__(**kwargs) 

But should you?

This seems like an overly complicated way to provide different label formats. In the future, a poor hapless programmer tasked with adding a new label format would need to figure out that the order of the output depends on the order of the arguments in the call to the constructor.

Instead, use the attrs library (or dataclasses) to create the class, including __init__ and other useful dunder methods. And use string.format to get your desired output. For example:

# use *string.format* to create functions that return a formatted label AMNH = """\ ID Number: {0.id_number} Museum: {0.collection} Collector: {0.collector} Location: {0.location} Formation: {0.formation} Coordinates: {0.coordinates} Date Found: {0.date_found}""" formation = """\ Date Collected: {0.date_found} ID Number: {0.id_number} Formation: {0.formation} Collection: {0.collection} Collector: {0.collector} Locality: {0.location} Coordinates: {0.coordinates}""" # create an instance of the label label = paleo_utils.CollectionsLabel( save_directory="../assets/saved_images/", id_number="3244", collection="AMNH", collector="Dr. Montague", location="North America", formation="Navesink", coordinates=(40.7128, -74.0060), date_found="2024-01-01", ) # print the label in any format print(AMNH.format(label)) print(formation.format(label)) 

Because formats defined this way are just strings, it makes it much easier to select formats at run-time. For example, a number of format strings can be loaded from a configuration file (e.g., toml or ini) and one selected for use based on a command line argument.

\$\endgroup\$
1
  • \$\begingroup\$Thank you for this and my apologies for the delay in getting back with an acknowledgement! This is simpler and only occurred to me in such vague mental terms that I did not have enough confidence to shift paths.\$\endgroup\$
    – R509
    CommentedNov 27, 2024 at 5:59

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.