Code structure for tree representation with multiple types (classes) of nodes and circular dependencies

Question

I am developing a piece of code around a tree structure. The tree can do a few things - one of them being its ability to serialize & de-serialize its data. There are multiple different types of nodes, e.g. NodeA and NodeB, each represented by a class. Different types of nodes may have significantly different functionality and may hold different types of data. As a common base, all node classes inherit from a "base" Node class. Any given type of node can be at the root of the tree. A simplified example of the envisioned structure looks as follows:

    from typing import Dict, List
    from abc import ABC

    class NodeABC(ABC):
        pass

    class Node(NodeABC):

        subtype = None

        def __init__(self, nodes: List[NodeABC]):
            self._nodes = nodes

        def as_serialized(self) -> Dict:
            return {"nodes": [node.as_serialized() for node in self._nodes], "subtype": self.subtype}

        @classmethod
        def from_serialized(cls, subtype: str, nodes: List[Dict], **kwargs):
            nodes = [cls.from_serialized(**node) for node in nodes]
            if subtype == NodeA.subtype:
                return NodeA(**kwargs, nodes=nodes)
            elif subtype == NodeB.subtype:
                return NodeB(**kwargs, nodes=nodes)

    class NodeA(Node):

        subtype = "A"

        def __init__(self, foo: int, **kwargs):
            super().__init__(**kwargs)
            self._foo = foo * 2

        def as_serialized(self) -> Dict:
            return {"foo": self._foo // 2, **super().as_serialized()}

    class NodeB(Node):

        subtype = "B"

        def __init__(self, bar: str, **kwargs):
            super().__init__(**kwargs)
            self._bar = bar + "!"

        def as_serialized(self) -> Dict:
            return {"bar": self._bar[:-1], **super().as_serialized()}

    demo = {
        "subtype": "A",
        "foo": 3,
        "nodes": [
            {
                "subtype": "B",
                "bar": "ghj",
                "nodes": []
            },
            {
                "subtype": "A",
                "foo": 7,
                "nodes": []
            },
        ]
    }

    assert demo == Node.from_serialized(**demo).as_serialized()

Bottom line: It works. The problem: There are "circular" dependencies between the actual node types NodeA/NodeB and the base Node class. If all of this code resides in a single Python file, it works fine. However, if I try to move each class to a separate file, the Python interpreter will become unhappy because of (theoretically required) circular imports. The actual classes are really big, so I would like to structure my code a bit.

Question: A common wisdom says that if circular imports / dependencies become a topic of debate, then the code's design / structure sucks and is at fault in the first place. I'd agree with that but I really do not have many good ideas of how to improve the above.

I am aware that I could eliminate the circular import "limitation" by doing a "manual run-time import" at least for one part of the circle plus some botching, but this is something that I'd like to avoid ...

CONTEXT

I have been developing the zugbruecke Python module.

It allows calling routines in Windows DLLs from Python code running on Unices / Unix-like systems such as Linux, macOS or BSD. zugbruecke is designed as a drop-in replacement for the ctypes module from Python's standard library. zugbruecke is built on top of Wine. A stand-alone Windows Python interpreter launched in the background is used to execute the called DLL routines.

Its code for synchronizing both ctypes datatypes and ctypes data has become a bit old and dusty and could use some serious refactoring. Its current form, which really is not object-oriented, can be found here (data type definitions), here (actual data, mostly), here (pointer synchronization definitions) and here (actual pointer synchronization). An early sketch of how I'd like to introduce proper object orientation (and some sort of a proper file structure) can be found here. The above example is an oversimplified version of my actual sketch.

Ted Brownlow · Accepted Answer · 2021-01-24 22:39:18Z

You could separate your definition of a Node from the serialization process, which allows for the dependency order of (node serialization) -> (node implementation) -> (node base). This also reduces the amount of super juggling that you need to perform in each subclass.

If you wanted to further lighten up the specific node types, you could also remove the need for **kwargs by instantiating the objects using a dictionary that doesn't contain the entries for "subtype" or "nodes".

    from typing import Dict, List, Optional
    from abc import ABC

    # node.py
    class Node(ABC):
        nodes: List['Node']
        subtype: Optional[str] = None

        def serialize(self) -> Dict:
            raise NotImplementedError()

    # node_a.py (depends on node)
    class NodeA(Node):
        subtype = "A"

        def __init__(self, foo: int, **kwargs):
            self._foo = foo * 2

        def serialize(self) -> Dict:
            return {"foo": self._foo // 2}

    # node_b.py (depends on node)
    class NodeB(Node):
        subtype = "B"

        def __init__(self, bar: str, **kwargs):
            self._bar = bar + "!"

        def serialize(self) -> Dict:
            return {"bar": self._bar[:-1]}

    # node_serialization.py (depends on node_a, node_b)
    NODE_TYPES = {
        nodeclass.subtype: nodeclass
        for nodeclass in [NodeA, NodeB]
    }

    def deserialize(serialized_node: Dict) -> Node:
        node = NODE_TYPES[serialized_node['subtype']](**serialized_node)
        node.nodes = [
            deserialize(node)
            for node in serialized_node['nodes']
        ]
        return node

    def serialize(node: Node) -> Dict:
        return {
            **node.serialize(),
            'nodes': [
                serialize(node)
                for node in node.nodes
            ],
            'subtype': node.subtype
        }

    # usage
    original = {
        "subtype": "A",
        "foo": 3,
        "nodes": [
            {
                "subtype": "B",
                "bar": "ghj",
                "nodes": []
            },
            {
                "subtype": "A",
                "foo": 7,
                "nodes": []
            },
        ]
    }

    assert original == serialize(deserialize(original))
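The second suggestion above (dropping `**kwargs` by filtering out the structural entries before instantiating) is not spelled out in the answer's code. A minimal sketch of how it might look, assuming a hypothetical `STRUCTURAL_KEYS` constant and a single-file layout for brevity:

```python
from typing import Any, Dict, List


class Node:
    subtype: str = ""
    nodes: List["Node"]

    def serialize(self) -> Dict[str, Any]:
        raise NotImplementedError


class NodeA(Node):
    subtype = "A"

    def __init__(self, foo: int):  # no **kwargs needed any more
        self.nodes = []
        self._foo = foo * 2

    def serialize(self) -> Dict[str, Any]:
        return {"foo": self._foo // 2}


class NodeB(Node):
    subtype = "B"

    def __init__(self, bar: str):
        self.nodes = []
        self._bar = bar + "!"

    def serialize(self) -> Dict[str, Any]:
        return {"bar": self._bar[:-1]}


NODE_TYPES = {cls.subtype: cls for cls in (NodeA, NodeB)}

# Hypothetical name: keys owned by the tree structure, not by a node type.
STRUCTURAL_KEYS = ("subtype", "nodes")


def deserialize(data: Dict[str, Any]) -> Node:
    # Pass only the node-specific entries to the constructor.
    payload = {k: v for k, v in data.items() if k not in STRUCTURAL_KEYS}
    node = NODE_TYPES[data["subtype"]](**payload)
    node.nodes = [deserialize(child) for child in data["nodes"]]
    return node


def serialize(node: Node) -> Dict[str, Any]:
    return {
        **node.serialize(),
        "subtype": node.subtype,
        "nodes": [serialize(child) for child in node.nodes],
    }
```

With this variant, a misspelled or unexpected key in the serialized data raises a `TypeError` in the constructor instead of being silently swallowed by `**kwargs`.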

Thanks a lot. Hmm, simply moving the deserialize method outside of the class and into a sort of independent function kind of solves the issue, yes. I honestly did not like that ... it kind of feels wrong because in my real code, it's not only one single method like this that would have to be moved outside of the class. — s-m-e, Commented Jan 25, 2021 at 19:36
FYI, I just added a context section to my question for further details. See the link to my sketch. Maybe this helps to shed some additional light on the topic. — s-m-e, Commented Jan 25, 2021 at 19:47
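For the concern raised in the comment above (keeping `from_serialized` on the class), one option that is not part of this answer might be a class registry filled by `__init_subclass__`. The base class then never names its subclasses, so `node.py` would not need to import `node_a.py` or `node_b.py`. A minimal sketch, assuming a hypothetical `_registry` attribute and the class layout from the question:

```python
from typing import Any, Dict, List, Type


class Node:
    # Hypothetical registry: subtype string -> class, filled as subclasses are defined.
    _registry: Dict[str, Type["Node"]] = {}
    subtype: str = ""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if cls.subtype:  # skip intermediate classes without their own subtype
            Node._registry[cls.subtype] = cls

    def __init__(self, nodes: List["Node"]):
        self._nodes = nodes

    def as_serialized(self) -> Dict[str, Any]:
        return {
            "nodes": [node.as_serialized() for node in self._nodes],
            "subtype": self.subtype,
        }

    @classmethod
    def from_serialized(cls, subtype: str, nodes: List[Dict[str, Any]], **kwargs: Any) -> "Node":
        children = [cls.from_serialized(**node) for node in nodes]
        # Look the class up by name instead of referencing NodeA / NodeB directly.
        return Node._registry[subtype](**kwargs, nodes=children)


class NodeA(Node):
    subtype = "A"

    def __init__(self, foo: int, **kwargs: Any):
        super().__init__(**kwargs)
        self._foo = foo * 2

    def as_serialized(self) -> Dict[str, Any]:
        return {"foo": self._foo // 2, **super().as_serialized()}


class NodeB(Node):
    subtype = "B"

    def __init__(self, bar: str, **kwargs: Any):
        super().__init__(**kwargs)
        self._bar = bar + "!"

    def as_serialized(self) -> Dict[str, Any]:
        return {"bar": self._bar[:-1], **super().as_serialized()}
```

The caveat is that a subclass only registers itself once its module has been imported, so something (for example the package's `__init__.py`) still has to import every node module before deserializing.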

RootTwo · Accepted Answer · 2021-01-25 11:07:58Z

Python class instances include the attribute __class__. Node.as_serialized() uses self.__class__.__module__ and self.__class__.__name__ to serialize a node's subtype and to recreate the node when deserializing. Now, Node.from_serialized() doesn't need to reference the other Node classes, so there isn't a circular import problem.

    import sys

    class Node:

        def __init__(self, nodes):
            self._nodes = nodes

        def as_serialized(self):
            return {"nodes": [node.as_serialized() for node in self._nodes],
                    "subtype": (self.__class__.__module__, self.__class__.__name__)}

        @classmethod
        def from_serialized(cls, subtype, nodes, **kwargs):
            nodes = [cls.from_serialized(**node) for node in nodes]
            module, klass = subtype
            return getattr(sys.modules[module], klass)(**kwargs, nodes=nodes)

    class NodeA(Node):

        def __init__(self, foo: int, **kwargs):
            super().__init__(**kwargs)
            self._foo = foo * 2

        def as_serialized(self):
            return {"foo": self._foo // 2, **super().as_serialized()}

    class NodeB(Node):

        def __init__(self, bar: str, **kwargs):
            super().__init__(**kwargs)
            self._bar = bar + "!"

        def as_serialized(self):
            return {"bar": self._bar[:-1], **super().as_serialized()}

Note the "subtype" field has changed:

    demo = {
        "subtype": ("__main__", "NodeA"),
        "foo": 3,
        "nodes": [
            {
                "subtype": ("__main__", "NodeB"),
                "bar": "ghj",
                "nodes": []
            },
            {
                "subtype": ("__main__", "NodeA"),
                "foo": 7,
                "nodes": []
            },
        ]
    }

Thanks for the answer. Actually, your code still has a circular dependency. NodeA is derived from Node. In addition, NodeA also has to be imported in the context of Node for your getattr function call to work. Maybe I am missing something, but how would you put each of the three classes into a separate .py file, and how would you then solve the circular import issue that arises (and that in my opinion still remains)? — s-m-e, Commented Jan 25, 2021 at 19:19
@s-m-e, how would you use Node, NodeA, and NodeB if you didn't import them somewhere? Your main (or other) code imports them so it can build a tree. When you import a module, it gets cached in sys.modules. That's why my code looks up the module in sys.modules and then gets the class from the module. If you imported NodeA from "fileA.py", then getattr(sys.modules["fileA"], "NodeA") would get the appropriate class. If you want, Node.from_serialized() could also load the module if it wasn't already loaded. Look at the code for _Loader in the pickle module. — RootTwo, Commented Jan 25, 2021 at 19:29
Yeah, that's the "run-time loading" / bodging I was referring to. It's kind of ugly and has a tendency to fail in fascinating ways, but yes, I have done this before. It's a valid option after all. — s-m-e, Commented Jan 25, 2021 at 19:35
Thanks. FYI, I just added a context section to my question for further details. See the link to my sketch. Maybe this helps to shed some additional light on the topic. — s-m-e, Commented Jan 25, 2021 at 19:46
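The "load the module if it wasn't already loaded" idea from the comments could be sketched with `importlib.import_module`, which returns the cached entry from `sys.modules` when the module is already imported and imports it otherwise. The helper name `resolve_class` is an assumption made for illustration, not part of the answer:

```python
import importlib
from typing import Sequence


def resolve_class(subtype: Sequence[str]) -> type:
    """Return the class named by a (module name, class name) pair."""
    module_name, class_name = subtype
    # Imports the module on first use; later calls hit the sys.modules cache.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Usage with a standard-library class standing in for a node class:
fraction_cls = resolve_class(("fractions", "Fraction"))
print(fraction_cls(1, 2))
```

Inside `Node.from_serialized()`, the `getattr(sys.modules[module], klass)` lookup could then become `resolve_class(subtype)`. As with pickle, serialized data that names arbitrary modules should only be loaded from trusted sources, since importing a module executes its top-level code.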
