Draw a UML-style class hierarchy of Python's Abstract Base Classes with Python, python-graphviz, and graphviz

Question

Motivation

Python's Abstract Base Classes in the collections.abc module work as mixins and also define abstract interfaces that invoke common functionality in Python's objects.
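As a minimal sketch of what "work as mixins" means here: the hypothetical Countdown class below (not part of collections.abc) implements only the two abstract methods of Sequence and inherits the rest of the behavior from the ABC.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical sequence counting down from `start` to 1."""

    def __init__(self, start):
        self._start = start

    # The two abstract methods that Sequence requires:
    def __len__(self):
        return self._start

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._start - index


c = Countdown(3)
# __iter__, __contains__, __reversed__, index and count are mixin
# methods supplied by Sequence, not written in Countdown.
print(list(c))            # [3, 2, 1]
print(2 in c)             # True
print(list(reversed(c)))  # [1, 2, 3]
```

The abstract methods are the ones the diagram italicizes; the inherited mixin methods are the ones it shows in regular type.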

While reading the actual source is quite enlightening, viewing the hierarchy as a graph would be a useful thing for Python programmers.

Unified Modeling Language (UML) is a way of visualizing system design.

Libraries

Graphviz is a natural choice for this kind of output. One can find many examples of people generating these kinds of outputs using Graphviz.

But, to ensure correctness and completeness, and easy tweakability of the logic used to create them, we need a way to feed the Python classes into Graphviz.

A natural choice is python-graphviz, but we'll need to pipeline the Python into the graphviz interface.

Our grammar of graphics

UML communicates features that Python does not have. Using a smaller subset of the relevant features will give us a graph that is cleaner and easier for those less familiar with UML to understand.

Our subset will be class diagrams where:

Abstract classes' names, data members, and methods are italicized, concrete implementations are not.
Edges (lines) are dashed when the subclass is abstract, and not dashed when the relationship is a specialization.
We show only is-a relationships (class hierarchy), not has-a relationships (composition), because composition does not really apply to our use case and because the abc module has no annotations describing it. (If annotations showing this relationship existed we could support it, but I do not here.)

Avoiding information overload in the output

We exclude most of the default methods and members in object. We also specifically hide other members that relate to class relationships and implementations that don't directly affect users.

We do not show return types (e.g. : type) because we lack the annotations in the module.

We do not show + (public), - (private), and # (protected) symbols, because in Python everything is accessible, and it is merely convention that indicates users should avoid interfaces prefixed with underscores (_).

The code

I set up this environment with a fresh Anaconda install, which required:

    $ conda install graphviz python-graphviz

And I did my construction in a notebook using Jupyter Lab.

First we import the python-graphviz Digraph object, some functions from the standard library, the collections.abc module for the abstract base classes that we want to study, and some types we'll want to use to discriminate between objects.

We also put the types of methods and members that we want to exclude in a tuple, and put the names of methods we intentionally hide in a set.

    from graphviz import Digraph
    from inspect import getclasstree, isabstract, classify_class_attrs, signature
    import collections.abc
    from html import escape
    from abc import ABCMeta
    from types import WrapperDescriptorType, MethodDescriptorType, BuiltinFunctionType

    # usually want to hide default (usually unimplemented) implementations:
    # ClassMethodDescriptorType includes dict.fromkeys - interesting/rare-enough?
    EXCLUDED_TYPES = (
        WrapperDescriptorType, # e.g. __repr__ and __lt__
        MethodDescriptorType, # e.g. __dir__, __format__, __reduce__, __reduce_ex__
        BuiltinFunctionType, # e.g. __new__
                     )
    HIDE_METHODS = {
        '__init_subclass__', # error warning, can't get signature
        '_abc_impl',
        '__subclasshook__', # why see this?
        '__abstractmethods__',
                   }

Now we must create the table (labels) for the graphviz nodes. Graphviz uses a syntax that looks like html (but isn't).

Here we create a function that takes the class and returns the table from the information that we can derive from the class.

    def node_label(cls, show_all=False, hide_override=set()):
        italic_format = '<i>{}</i>'.format
        name_format = italic_format if isabstract(cls) else format
        attributes = []
        methods = []
        abstractmethods = getattr(cls, "__abstractmethods__", ())
        for attr in classify_class_attrs(cls):
            if ((show_all or attr.name[0] != '_' or attr.name in abstractmethods)
                and not isinstance(attr.object, EXCLUDED_TYPES)
                and attr.name not in hide_override):
                if name in abstractmethods:
                    name = italic_format(attr.name)
                else:
                    name = attr.name
                if attr.kind in {'property', 'data'}:
                    attributes.append(name)
                else:
                    try:
                        args = escape(str(signature(attr.object)))
                    except (ValueError, TypeError) as e:
                        print(f'was not able to get signature for {attr}, {repr(e)}')
                        args = '()'
                    methods.append(name + args)
        td_align = '<td align="left" balign="left">'
        line_join = '<br/>'.join
        attr_section = f"<hr/><tr>{td_align}{line_join(attributes)}</td></tr>"
        method_section = f"<hr/><tr>{td_align}{line_join(methods)}</td></tr>"
        return f"""<
        <table border="1" cellborder="0" cellpadding="2" cellspacing="0" align="left">
        <tr><td align="center">
          <b>{name_format(cls.__name__)}</b>
        </td></tr>
        {attr_section}
        {method_section}
        </table>>"""

The code above gives us a way to create the tables for the nodes.

Now we define a function that takes a classtree (the kind returned by inspect.getclasstree(classes)) and returns a fully instantiated Digraph object suitable to display an image.

    def generate_dot(
        classtree, show_all=False,
        hide_override=HIDE_METHODS, show_object=False):
        """recurse through classtree structure
        and return a Digraph object
        """
        dot = Digraph(
            name=None, comment=None,
            filename=None, directory=None, format='svg',
            engine=None, encoding='utf-8',
            graph_attr=None,
            node_attr=dict(shape='none'),
            edge_attr=dict(
                arrowtail='onormal',
                dir='back'),
            body=None, strict=False)
        def recurse(classtree):
            for classobjs in classtree:
                if isinstance(classobjs, tuple):
                    cls, bases = classobjs
                    if show_object or cls is not object:
                        dot.node(
                            cls.__name__,
                            label=node_label(cls, show_all=show_all,
                                             hide_override=hide_override))
                    for base in bases:
                        if show_object or base is not object:
                            dot.edge(base.__name__, cls.__name__,
                                     style="dashed" if isabstract(base) else 'solid')
                if isinstance(classobjs, list):
                    recurse(classobjs)
        recurse(classtree)
        return dot
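To make the recursion over the classtree easier to follow, here is a minimal sketch of the nested structure that inspect.getclasstree returns. The classes Base and Child and the helper walk are hypothetical names used only for illustration; they are not part of the code above.

```python
from inspect import getclasstree


class Base:
    pass


class Child(Base):
    pass


def walk(classtree, depth=0):
    """Yield (depth, class name, base names) for every entry in a classtree.

    A tuple entry is (class, bases); a list entry holds the subclasses of
    the class in the tuple that immediately precedes it.
    """
    for entry in classtree:
        if isinstance(entry, tuple):
            cls, bases = entry
            yield depth, cls.__name__, tuple(b.__name__ for b in bases)
        else:
            yield from walk(entry, depth + 1)


tree = getclasstree([Base, Child], unique=True)
rows = list(walk(tree))
for depth, name, bases in rows:
    print('    ' * depth + name, bases)
```

Each tuple becomes a node and each of its bases becomes an edge, which is the same split that the tuple and list branches make in the recursive function.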

And the usage:

    classes = [c for c in vars(collections.abc).values() if isinstance(c, ABCMeta)]
    classtree = getclasstree(classes, unique=True)
    dot = generate_dot(classtree, show_all=True, show_object=True)
    #print(dot.source) # far too verbose...
    dot

We do show_object=True here, but since everything is an object in Python, it's a little redundant and will be at the top of every class hierarchy, so I think it's safe to use the default show_object=False for regular usage.

    dot.render('abcs', format='png')

This gives us the following PNG file (since Stack Overflow does not let us show SVGs):

Suggestions for further work

An expert in Python or Graphviz might find important things we've missed here.

We could factor out some functionality and write some unit tests.

We could also look for and handle annotations for data members normally held in __dict__.

We might also not be handling all ways of creating Python objects as we should.

I mostly used the following sample code to model the classes for building the node table:

    from abc import abstractmethod, ABCMeta
    from inspect import isabstract, isfunction, classify_class_attrs, signature

    class NodeBase(metaclass=ABCMeta):
        __slots__ = 'slota', 'slotb'
        @abstractmethod
        def abstract_method(self, bar, baz=True):
            raise NotImplementedError
        def implemented_method(self, bar, baz=True):
            return True
        @property
        @abstractmethod
        def abstract_property(self):
            raise NotImplementedError
        @property
        def property(self):
            return False

    class NodeExample(NodeBase):
        __slots__ = 'slotc'

    NE = NodeExample

Could you provide your code in one code block? I'm interested in this, but I don't copy multiple code blocks. Also do you think there are class attributes, not instance attributes, that are defined in __dict__ but not on the type? - Peilonrayz, Commented Dec 8, 2019 at 12:56

Reinderien · Accepted Answer · 2020-04-15 04:30:38Z

Formatting

There are some statements with non-standard format in here that I'm not too bothered by, but this one makes my eye twitch:

    EXCLUDED_TYPES = (
        WrapperDescriptorType, # e.g. __repr__ and __lt__
        MethodDescriptorType, # e.g. __dir__, __format__, __reduce__, __reduce_ex__
        BuiltinFunctionType, # e.g. __new__
                     )
    HIDE_METHODS = {
        '__init_subclass__', # error warning, can't get signature
        '_abc_impl',
        '__subclasshook__', # why see this?
        '__abstractmethods__',
                   }

Ending parens and braces should be at the level of indentation of the beginning of the first line of the statement, not the level of indentation of the opening paren/brace. If you run pylint on your code, it will produce this:

    C0330: Wrong hanging indentation.
                     )
    |   |            ^ (bad-continuation)

In other words,

    EXCLUDED_TYPES = (
        WrapperDescriptorType, # e.g. __repr__ and __lt__
        MethodDescriptorType, # e.g. __dir__, __format__, __reduce__, __reduce_ex__
        BuiltinFunctionType, # e.g. __new__
    )
    HIDE_METHODS = {
        '__init_subclass__', # error warning, can't get signature
        '_abc_impl',
        '__subclasshook__', # why see this?
        '__abstractmethods__',
    }

Bug

This line:

    if name in abstractmethods:

references name before it has been defined. Below I assume that you meant attr.name.
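A reduced sketch of the failure, assuming the loop is boiled down to the name handling only. The functions italicize_buggy and italicize_fixed are hypothetical names for illustration, not functions from your code.

```python
def italicize_buggy(attr_names, abstract_names):
    """Mirrors the original logic: `name` is read before it is assigned."""
    labels = []
    for attr_name in attr_names:
        if name in abstract_names:  # `name` is a local with no value yet
            name = f'<i>{attr_name}</i>'
        else:
            name = attr_name
        labels.append(name)
    return labels


def italicize_fixed(attr_names, abstract_names):
    """Tests the attribute's own name, as presumably intended."""
    labels = []
    for attr_name in attr_names:
        if attr_name in abstract_names:
            name = f'<i>{attr_name}</i>'
        else:
            name = attr_name
        labels.append(name)
    return labels


try:
    italicize_buggy(['keys', 'get'], {'keys'})
except UnboundLocalError as e:
    print('buggy version failed:', e)

print(italicize_fixed(['keys', 'get'], {'keys'}))  # ['<i>keys</i>', 'get']
```

Because name is assigned later in the same function, Python treats it as a local everywhere in that function, so a global called name would not rescue the first read.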

False positives

I do not think that MethodDescriptorType should be included in EXCLUDED_TYPES. When I passed str in, all but one of its instance methods were this type and were erroneously excluded.
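You can check this yourself with a short sketch like the one below. It assumes CPython, where most of str's methods are implemented in C; split_by_method_descriptor is a hypothetical helper name, not something from your code.

```python
from inspect import classify_class_attrs
from types import MethodDescriptorType


def split_by_method_descriptor(cls):
    """Return (excluded, kept) public attribute names of cls.

    'excluded' holds the names whose object is a MethodDescriptorType,
    i.e. the ones the original EXCLUDED_TYPES filter would drop.
    """
    excluded, kept = [], []
    for attr in classify_class_attrs(cls):
        if attr.name.startswith('_'):
            continue
        if isinstance(attr.object, MethodDescriptorType):
            excluded.append(attr.name)
        else:
            kept.append(attr.name)
    return excluded, kept


excluded, kept = split_by_method_descriptor(str)
print(len(excluded), 'public names excluded, for example:', excluded[:3])
print('kept:', kept)
```

Ordinary methods such as upper land in the excluded list, which is why the filter is too broad for classes implemented in C.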

Confusing signature defaults

When you fail to get a signature you display it as (), but that's misleading because it is a valid signature. Instead consider something like (?).
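A minimal sketch of that fallback, pulled out into a hypothetical helper called signature_text (the name and the default marker are assumptions for illustration):

```python
from inspect import signature


def signature_text(obj, fallback='(?)'):
    """Return the signature of obj as text, or `fallback` if unavailable."""
    try:
        return str(signature(obj))
    except (ValueError, TypeError):
        return fallback


def no_args():
    return None


def sample(bar, baz=True):
    return baz


print(signature_text(no_args))  # ()   a real, valid signature
print(signature_text(sample))   # (bar, baz=True)
print(signature_text(42))       # (?)  not callable, so no signature
```

With this marker a reader can tell a genuine zero-argument callable apart from one whose signature could not be determined.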

Mutable defaults

Don't assign hide_override a default of set(), because that does not create a new set every time; it reuses the same set. If your function were to modify it, it would contaminate future calls. For this reason many linters flag this and instead recommend that you give it a default of None and assign an empty set in the function itself.
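A small sketch of the difference. The functions hide_shared and hide_fresh are hypothetical and, unlike your node_label, deliberately mutate the argument to make the effect visible.

```python
def hide_shared(name, hide_override=set()):
    """The default set is created once, when the function is defined."""
    hide_override.add(name)
    return hide_override


def hide_fresh(name, hide_override=None):
    """A new set is created on every call that omits the argument."""
    if hide_override is None:
        hide_override = set()
    hide_override.add(name)
    return hide_override


first = hide_shared('__init_subclass__')
second = hide_shared('_abc_impl')
print(first is second)  # True: both calls returned the same default set
print(sorted(second))   # ['__init_subclass__', '_abc_impl']

a = hide_fresh('__init_subclass__')
b = hide_fresh('_abc_impl')
print(a is b)           # False
print(sorted(b))        # ['_abc_impl']
```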

Templating

Looking at your node_label, you could get a lot of mileage out of a templating engine like Jinja. From their documentation,

Jinja is a general purpose template engine and not only used for HTML/XML generation. For example you may generate LaTeX, emails, CSS, JavaScript, or configuration files.

So you should be able to handle GraphViz markup just fine. This will provide you with a cleaner way to separate your presentation layer.

Here is some example code of what this could look like. First, the Python:

    from html import escape
    from inspect import (
        isabstract,
        classify_class_attrs,
        signature,
    )
    from jinja2 import Template, FileSystemLoader, Environment
    from types import (
        WrapperDescriptorType,
        BuiltinFunctionType,
    )
    from typing import Type, Optional, Set

    EXCLUDED_TYPES = (
        WrapperDescriptorType,  # e.g. __repr__ and __lt__
        # This does NOT only apply to functions like __dir__, __format__, __reduce__, __reduce_ex__
        # MethodDescriptorType,
        BuiltinFunctionType,  # e.g. __new__
    )


    def load_template(filename: str) -> Template:
        # See https://stackoverflow.com/a/38642558/313768
        loader = FileSystemLoader(searchpath='./')
        env = Environment(loader=loader, autoescape=True)
        return env.get_template(filename)


    TEMPLATE = load_template('class.jinja')


    def node_label(
        cls: Type,
        show_all: bool = False,
        hide_override: Optional[Set[str]] = None,
    ) -> str:
        if hide_override is None:
            hide_override = set()

        attributes, methods = [], []
        abstract_methods = set(getattr(cls, "__abstractmethods__", ()))

        for attr in classify_class_attrs(cls):
            if (
                (
                    show_all
                    or attr.name[0] != '_'
                    or attr.name in abstract_methods
                )
                and not isinstance(attr.object, EXCLUDED_TYPES)
                and attr.name not in hide_override
            ):
                is_abstract = attr.name in abstract_methods
                if attr.kind in {'property', 'data'}:
                    attributes.append((attr.name, is_abstract))
                else:
                    try:
                        args = escape(str(signature(attr.object)))
                    except (ValueError, TypeError) as e:
                        print(f'unable to get signature for {attr}, {repr(e)}')
                        args = '(?)'
                    methods.append((attr.name, args, is_abstract))

        attributes.sort()
        methods.sort()

        return TEMPLATE.render(
            name=cls.__name__,
            is_abstract=isabstract(cls),
            attributes=attributes,
            methods=methods,
        )


    from abc import ABC, abstractmethod


    class NodeBase(ABC):
        __slots__ = 'slota', 'slotb'

        @abstractmethod
        def abstract_method(self, bar, baz=True):
            raise NotImplementedError

        def implemented_method(self, bar, baz=True):
            return True

        @property
        @abstractmethod
        def abstract_property(self):
            raise NotImplementedError

        @property
        def property(self):
            return False


    # The sample class defined above is NodeBase, matching the output shown below.
    print(node_label(NodeBase))

And the template:

    <
    <table border="1" cellborder="0" cellpadding="2" cellspacing="0" align="left">
        <tr>
            <td align="center">
                <b>
                    {%- if is_abstract -%}
                        <i>{{name}}</i>
                    {%- else -%}
                        {{name}}
                    {%- endif -%}
                </b>
            </td>
        </tr>
        <hr/>
        <tr>
            <td align="left" balign="left">
                {% for a_name, a_abstract in attributes %}
                    {%- if a_abstract -%}
                        <i>{{a_name}}</i>
                    {%- else -%}
                        {{a_name}}
                    {%- endif -%}
                    <br/>
                {% endfor %}
            </td>
        </tr>
        <hr/>
        <tr>
            <td align="left" balign="left">
                {% for m_name, m_args, m_abstract in methods %}
                    {%- if m_abstract -%}
                        <i>{{m_name}}{{m_args}}</i>
                    {%- else -%}
                        {{m_name}}{{m_args}}
                    {%- endif -%}
                    <br/>
                {% endfor %}
            </td>
        </tr>
        <hr/>
    </table>
    >

This outputs:

    <
    <table border="1" cellborder="0" cellpadding="2" cellspacing="0" align="left">
        <tr>
            <td align="center">
                <b><i>NodeBase</i></b>
            </td>
        </tr>
        <hr/>
        <tr>
            <td align="left" balign="left">
                <i>abstract_property</i><br/>
                property<br/>
                slota<br/>
                slotb<br/>
            </td>
        </tr>
        <hr/>
        <tr>
            <td align="left" balign="left">
                <i>abstract_method(self, bar, baz=True)</i><br/>
                implemented_method(self, bar, baz=True)<br/>
            </td>
        </tr>
        <hr/>
    </table>
    >

The template has been written in a simple, "dumb" fashion so there is some repetition, for example in the td definitions and the conditional italics. There are ways to reduce this that I will leave as an exercise to you.

As I said - "Graphviz uses a syntax that looks like html (but isn't)." - maybe it works for this - can you give an example? - Aaron Hall, Commented Apr 14, 2020 at 22:49
Sure; edited. The suggested code is not a perfect one-to-one mapping of the output of your code (e.g. trailing break tags, whitespace, etc.). - Reinderien, Commented Apr 15, 2020 at 4:31
P.S. In the process of creating the suggested code I uncovered a small handful of other issues. In particular, name not being defined at the right time surprised me. - Reinderien, Commented Apr 18, 2020 at 17:48
Since the code is supposed to be in a working state, should I fix it from my working copy? - Aaron Hall, Commented Apr 18, 2020 at 18:00
Based on codereview.meta.stackexchange.com/questions/9078 the answer appears to be no - If the question has been answered, you must not edit it in a way that would invalidate the answers; that is not permitted on Code Review. - Reinderien, Commented Apr 18, 2020 at 18:02
