Motivation
Python's Abstract Base Classes in the collections.abc module work as mixins and also define abstract interfaces that invoke common functionality in Python's objects.
While reading the actual source is quite enlightening, viewing the hierarchy as a graph would be a useful thing for Python programmers.
Unified Modeling Language (UML) is a way of visualizing system design.
Libraries
Graphviz is a natural choice for this kind of output. One can find many examples of people generating these kinds of outputs using Graphviz.
But, to ensure correctness and completeness, and easy tweakability of the logic used to create them, we need a way to feed the Python classes into Graphviz.
A natural choice is python-graphviz, but we'll need to pipeline the Python into the graphviz interface.
Our grammar of graphics
UML communicates features that Python does not have. Using a smaller subset of the relevant features will give us a graph that is cleaner and easier for those less familiar with UML to understand.
Our subset will be class diagrams where:
- Abstract classes' names, data members, and methods are italicized; concrete implementations are not.
- Edges (lines) are dashed when the subclass is abstract, and not dashed when the relationship is a specialization.
- We only show is-a relationships (class hierarchy), not has-a relationships (composition), because composition doesn't really apply to our use case and because the abc module has no annotations that describe it (if we did have annotations showing this relationship we could support it, but I do not here).
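The italic and dashed rules above both come down to one question: is the class abstract? A minimal sketch of that test, using `inspect.isabstract`, might look like the following. The helper names `name_markup` and `edge_style` are illustrative only and are not part of the code in this answer; note that the edge rule here tests the base class, which is what the `generate_dot` function later in this answer does.

```python
import collections.abc
from inspect import isabstract


def name_markup(cls):
    """Italicize the name of an abstract class, leave a concrete one plain."""
    return f"<i>{cls.__name__}</i>" if isabstract(cls) else cls.__name__


def edge_style(base):
    """Dashed edge when the base class is abstract, solid otherwise."""
    return "dashed" if isabstract(base) else "solid"


# Sized declares the abstract method __len__, so it counts as abstract.
print(name_markup(collections.abc.Sized), edge_style(collections.abc.Sized))
# list is a concrete class.
print(name_markup(list), edge_style(list))
```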
Avoiding information overload in the output
We exclude most of the default methods and members in object. We also specifically hide other members that relate to class relationships and implementations that don't directly affect users.
We do not show return types (e.g. : type) because we lack the annotations in the module.
We do not show + (public), - (private), and # (protected) symbols, because in Python everything is accessible, and it is merely convention that indicates users should avoid interfaces prefixed with underscores (_).
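As a rough illustration of this filtering, the sketch below runs `inspect.classify_class_attrs` over a tiny class and drops anything whose name starts with an underscore or whose object is one of the built-in descriptor types inherited from object. The class `Demo` and the tuple name `DEFAULT_TYPES` are made up for this example; the real exclusion lists appear in the code section.

```python
from inspect import classify_class_attrs
from types import (
    BuiltinFunctionType,
    MethodDescriptorType,
    WrapperDescriptorType,
)

# Types of the default implementations inherited from object.
DEFAULT_TYPES = (WrapperDescriptorType, MethodDescriptorType, BuiltinFunctionType)


class Demo:
    """Hypothetical class with a single user-defined method."""

    def visible(self):
        return 1


all_names = [attr.name for attr in classify_class_attrs(Demo)]
kept = [
    attr.name
    for attr in classify_class_attrs(Demo)
    if not attr.name.startswith("_")
    and not isinstance(attr.object, DEFAULT_TYPES)
]

print(len(all_names), "attributes before filtering")
print(kept)
```

Only the method the class defines itself survives, which is the effect we want in the diagram.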
The code
I set up this environment with Anaconda. Starting from a fresh install, it required:

    $ conda install graphviz python-graphviz

I did my construction in a Jupyter Lab notebook.
First we import the python-graphviz Digraph object, some functions from the standard library, the collections.abc module for the abstract base classes that we want to study, and some types we'll want to use to discriminate between objects.
We also put the types of methods and members that we want to exclude in a tuple, and put the names of methods we also intentionally hide in a set.
    from graphviz import Digraph
    from inspect import getclasstree, isabstract, classify_class_attrs, signature
    import collections.abc
    from html import escape
    from abc import ABCMeta
    from types import WrapperDescriptorType, MethodDescriptorType, BuiltinFunctionType

    # usually want to hide default (usually unimplemented) implementations:
    # ClassMethodDescriptorType includes dict.fromkeys - interesting/rare- enough?
    EXCLUDED_TYPES = (
        WrapperDescriptorType,  # e.g. __repr__ and __lt__
        MethodDescriptorType,   # e.g. __dir__, __format__, __reduce__, __reduce_ex__
        BuiltinFunctionType,    # e.g. __new__
    )
    HIDE_METHODS = {
        '__init_subclass__',    # error warning, can't get signature
        '_abc_impl',
        '__subclasshook__',     # why see this?
        '__abstractmethods__',
    }
Now we must create the table (labels) for the graphviz nodes. Graphviz uses a syntax that looks like html (but isn't).
Here we create a function that takes the class and returns the table from the information that we can derive from the class.
    def node_label(cls, show_all=False, hide_override=set()):
        italic_format = '<i>{}</i>'.format
        # abstract classes get an italic name, concrete ones a plain name
        name_format = italic_format if isabstract(cls) else format
        attributes = []
        methods = []
        abstractmethods = getattr(cls, "__abstractmethods__", ())
        for attr in classify_class_attrs(cls):
            if ((show_all or attr.name[0] != '_' or attr.name in abstractmethods)
                and not isinstance(attr.object, EXCLUDED_TYPES)
                and attr.name not in hide_override):
                # abstract members are italicized, like abstract class names
                if attr.name in abstractmethods:
                    name = italic_format(attr.name)
                else:
                    name = attr.name
                if attr.kind in {'property', 'data'}:
                    attributes.append(name)
                else:
                    try:
                        # escape the signature so it is safe inside the label
                        args = escape(str(signature(attr.object)))
                    except (ValueError, TypeError) as e:
                        print(f'was not able to get signature for {attr}, {repr(e)}')
                        args = '()'
                    methods.append(name + args)
        td_align = '<td align="left" balign="left">'
        line_join = '<br/>'.join
        attr_section = f"<hr/><tr>{td_align}{line_join(attributes)}</td></tr>"
        method_section = f"<hr/><tr>{td_align}{line_join(methods)}</td></tr>"
        return f"""<
        <table border="1" cellborder="0" cellpadding="2" cellspacing="0" align="left">
        <tr><td align="center">
          <b>{name_format(cls.__name__)}</b>
        </td></tr>
        {attr_section}
        {method_section}
        </table>>"""
The code above gives us a way to create the tables for the nodes.
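To make the label format a little more concrete, here is a stripped-down sketch of the same idea that needs only the standard library. The function `tiny_label` and the hand-written DOT string are assumptions made for illustration; they are not part of python-graphviz. The point is that signature text can contain `<` and `>`, which must be escaped before it goes inside the HTML-like label.

```python
from html import escape


def tiny_label(title, rows):
    """Build a minimal Graphviz HTML-like label: a bold title row and one body cell."""
    body = "<br/>".join(escape(row) for row in rows)
    return (
        '<<table border="1" cellborder="0">'
        f"<tr><td><b>{escape(title)}</b></td></tr>"
        f'<hr/><tr><td align="left">{body}</td></tr>'
        "</table>>"
    )


# A made-up signature containing characters that would break the markup.
label = tiny_label("Demo", ["run(self, n=<default>) -> int"])

# The label already carries its outer angle brackets, so it can be dropped
# into DOT source as-is.
dot_source = f"digraph {{ node [shape=none]; Demo [label={label}]; }}"
print(dot_source)
```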
Now we define a function that takes a classtree (the kind returned by inspect.getclasstree(classes)) and returns a fully instantiated Digraph object suitable to display an image.
    def generate_dot(
            classtree, show_all=False,
            hide_override=HIDE_METHODS, show_object=False):
        """recurse through classtree structure
        and return a Digraph object
        """
        dot = Digraph(
            name=None, comment=None,
            filename=None, directory=None, format='svg',
            engine=None, encoding='utf-8',
            graph_attr=None,
            node_attr=dict(shape='none'),
            edge_attr=dict(
                arrowtail='onormal',
                dir='back'),
            body=None, strict=False)

        def recurse(classtree):
            for classobjs in classtree:
                if isinstance(classobjs, tuple):
                    cls, bases = classobjs
                    if show_object or cls is not object:
                        dot.node(
                            cls.__name__,
                            label=node_label(cls, show_all=show_all,
                                             hide_override=hide_override))
                    for base in bases:
                        if show_object or base is not object:
                            dot.edge(base.__name__, cls.__name__,
                                     style="dashed" if isabstract(base) else 'solid')
                if isinstance(classobjs, list):
                    recurse(classobjs)

        recurse(classtree)
        return dot
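The nested structure that recurse walks can be hard to picture, so here is a small sketch of it on two throwaway classes. `Base`, `Child`, and the generator `walk` are hypothetical names used only for this illustration. Each entry of the tree is either a `(class, bases)` tuple or a list holding the subclasses of the entry just before it.

```python
from inspect import getclasstree


class Base:
    pass


class Child(Base):
    pass


def walk(tree):
    """Yield every (class, bases) tuple, descending into nested lists."""
    for entry in tree:
        if isinstance(entry, tuple):
            yield entry
        else:
            yield from walk(entry)


tree = getclasstree([Base, Child], unique=True)

# One node per class, one edge per (base, class) pair.
names = [cls.__name__ for cls, _ in walk(tree)]
edges = [(base.__name__, cls.__name__) for cls, bases in walk(tree) for base in bases]

print(names)  # ['object', 'Base', 'Child']
print(edges)  # [('object', 'Base'), ('Base', 'Child')]
```

Note that object appears as the root even though it was not passed in, which is why generate_dot has a show_object switch.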
And the usage:
    classes = [c for c in vars(collections.abc).values() if isinstance(c, ABCMeta)]
    classtree = getclasstree(classes, unique=True)
    dot = generate_dot(classtree, show_all=True, show_object=True)
    #print(dot.source) # far too verbose...
    dot
We pass show_object=True here, but since everything in Python is an object, the object node is a little redundant: it sits at the top of every class hierarchy. I think it's safe to use the default show_object=False for regular usage.
    dot.render('abcs', format='png')

Gives us the following PNG file (since Stack Overflow does not let us show SVGs):
Suggestions for further work
An expert in Python or Graphviz might find important things we've missed here.
We could factor out some functionality and write some unit tests.
We could also look for and handle annotations for data members normally held in __dict__.
We might also not be handling all ways of creating Python objects as we should.
I mostly used the following sample code to model the classes for building the node table:
    from abc import abstractmethod, ABCMeta
    from inspect import isabstract, isfunction, classify_class_attrs, signature

    class NodeBase(metaclass=ABCMeta):
        __slots__ = 'slota', 'slotb'
        @abstractmethod
        def abstract_method(self, bar, baz=True):
            raise NotImplementedError
        def implemented_method(self, bar, baz=True):
            return True
        @property
        @abstractmethod
        def abstract_property(self):
            raise NotImplementedError
        @property
        def property(self):
            return False

    class NodeExample(NodeBase):
        __slots__ = 'slotc'

    NE = NodeExample
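As a sketch of how a class like this exercises the node table, the snippet below classifies the attributes of a cut-down variant and checks which ones are abstract. `SampleBase` is a hypothetical stand-in, trimmed so that the example is self-contained. The `kind` values are what decide whether a member lands in the attribute section or the method section of the label.

```python
from abc import ABCMeta, abstractmethod
from inspect import classify_class_attrs, isabstract


class SampleBase(metaclass=ABCMeta):
    """Hypothetical, trimmed-down version of the sample class."""

    __slots__ = ("slota",)

    @abstractmethod
    def abstract_method(self, bar, baz=True):
        raise NotImplementedError

    def implemented_method(self, bar, baz=True):
        return True

    @property
    @abstractmethod
    def abstract_property(self):
        raise NotImplementedError


# Map each attribute name to the kind that classify_class_attrs reports.
kinds = {attr.name: attr.kind for attr in classify_class_attrs(SampleBase)}

for name in ("slota", "abstract_method", "implemented_method", "abstract_property"):
    is_abstract = name in SampleBase.__abstractmethods__
    print(f"{name}: kind={kinds[name]!r}, abstract={is_abstract}")
```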