1
\$\begingroup\$

Background:

I was working on a project were I needed to write some rules for text-processing. After working on this project for a couple of days and implementing some rules, I realized I needed to determine the order of the rules. No problem; we have topological sorting to help. But then I realized that I can't expect the graph to always be full. So I came up with this idea, that given a single rule with a set of dependencies (or a single dependency) I need to check the dependencies of the dependencies. Sounds familiar? Yes. This subject is very similar to depth-first-search of a graph. I am not a mathematician, nor did I study C.S. Hence, Graph Theory is a new field for me. Nevertheless, I implemented something below which works (inefficiently, I suspect).

The code:

This is my search and yield algorithm. If you run it on the examples below, you will see it visits some nodes more than once. Hence, the speculated inefficiency. A word about the input. The rules I wrote are basically Python classes, which have a class property depends. I was criticized for not using inspect.getmro- But this would complicate thing terribly because the class would need to inherit from each other (See example here).

def _yield_name_dep(rules_deps): global recursion_counter recursion_counter = recursion_counter +1 # yield all rules by their named and dependencies for rule, dep in rules_deps.items(): if not dep: yield rule, dep continue else: yield rule, dep for ii in dep: i = getattr(rules, ii) instance = i() if instance.depends: new_dep={str(instance): instance.depends} for dep in _yield_name_dep(new_dep): yield dep else: yield str(instance), instance.depends 

Input to test:

demo_class_content =""" class A(object): depends = 'B' def __str__(self): return self.__class__.__name__ class B(object): depends = ('C','F') def __str__(self): return self.__class__.__name__ class C(object): depends = ('D', 'E') def __str__(self): return self.__class__.__name__ class D(object): depends = None def __str__(self): return self.__class__.__name__ class F(object): depends = 'E' def __str__(self): return self.__class__.__name__ class E(object): depends = None def __str__(self): return self.__class__.__name__ """ with open('demo_classes.py', 'w') as clsdemo: clsdemo.write(demo_class_content) import demo_classes as rules rule_start={'A': 'B'} def _yield_name_dep(rules_deps): # yield all rules by their named and dependencies for rule, dep in rules_deps.items(): if not dep: yield rule, dep continue else: yield rule, dep for ii in dep: i = getattr(rules, ii) instance = i() if instance.depends: new_dep={str(instance): instance.depends} for dep in _yield_name_dep(new_dep): yield dep else: yield str(instance), instance.depends if __name__ == '__main__': # this is yielding nodes visited multiple times, # list(_yield_name_dep(rule_start)) # hence, my work around was to use set() ... rule_dependencies = list(set(_yield_name_dep(rule_start))) print rule_dependencies 

The questions:

  • I tried classifying my work, and I think what I did is similar to DFS. Can you really classify it like this?
  • How can I improve this function to skip visited nodes, and still use generators?

answers ...

Using the feedback from Gareth and other kind users of Stackoverflow, here is what I came up with. It is clearer, and also more general:

def _dfs(start_nodes, rules, visited): """ Depth First Search start_nodes - Dictionary of Rule with dependencies (as Tuples): start_nodes = {'A': ('B','C')} rules - Dictionary of Rules with dependencies (as Tuples): e.g. rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 'D':(), 'E':(), 'F':()} The above rules describe the following DAG: A / \ B C / \ / \ D E F usage: >>> rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 'D':(), 'E':(), 'F':()} >>> visited = [] >>> list(_dfs({'A': ('B','C')}, rules, visited)) [('A', ('B', 'C')), ('B', ('D', 'E')), ('D', ()), ('E', ()), ('C', ('E', 'F')), ('F', ())] """ for rule, dep in start_nodes.items(): if rule not in visited: yield rule, dep visited.append(rule) for ii in dep: new_dep={ ii : rules[ii]} for dep in _dfs(new_dep, rules, visited): if dep not in visited: yield dep 
\$\endgroup\$
0

    2 Answers 2

    3
    \$\begingroup\$

    This code is not ready for review, as it's not working—it visits some rules more than once. If you need help writing a topological sort algorithm, Stack Overflow is the right place, not Code Review.

    But I'll make one important observation about your code: there is no documentation. It is not clear from your code what it is supposed to do and how to use it. When you have a function whose behaviour is complex, it's important to start by writing a specification for it: what inputs does it take, and what output does it produce?

    Once you've written the specification, stop and think. Is this specification clear? Is it easy for people to understand and use? If the answers are "no", then think about how you could change the specification.

    If you try to do this, then you'll immediately see the problem. I had a go myself and this is what I came up with:

    def _yield_name_dep(rules_deps): """Generate rules from rules_deps, together with their dependencies, in topological order: that is, each rule is generated before any of the rules it depends on. The argument rules_deps must be a dictionary mapping a rule object R to either (i) the name of the (single) rule that R depends on; or (ii) a sequence of names of rules that R depends on; or (iii) None if R has no dependencies. The names of the dependencies must be names of objects in the rules module, each of which must have a "depends" attribute containing the names of the rule's dependencies (as above). The generated output consists of pairs (rule, dependencies), with rule being a rule object (or the rule name, if the rule has no dependencies), and dependencies as above. """ 

    Did I get this right? Who knows? This is a terrible specification. Imagine being a programmer trying to read the documentation and figure out how to use the function!

    So you must start by simplifying and clarifying the specification. Design the interface so that is easy to understand and easy to use, and then fixing the code will be straightforward.

    \$\endgroup\$
    0
      1
      \$\begingroup\$

      Variable names

      You mean the dependencies of a rule with dep, so use that as variable name. Variable names are a very important part of the documentation of the code. dep and ii are not so good variable names

      visited

      To keep the visited nodes, I would use a set, since this is the container meant to hold distinct unordered entries.

      I would also make this a parameter with default value None, and initialize it to set() if it isn't passed in the function. This simplifies the initial calling of the method.

      start_nodes

      Why do you include the dependencies in this parameter? Just the node should suffice. The dependencies can be looked up in rules.

      Since you call this function recursively, the nodes you pass in on the second call will no longer be start nodes, so I renamed this parameter to nodes

      Control flow

      To limit the number of indents, instead of if rule not in visited, I would use if rule in visited: continue

      The second check if dep not in visited: is already incorporated in the cal to _dfs, so doesn't need to be repeated here.

      The fact you don't have to look up the dependent roles to call _dfs anymore also greatly simplifies the recursive calls.

      def _dfs(nodes, rules, visited=None): if visited is None: visited = set() for rule in nodes: if rule in visited: continue dependencies = rules[rule] yield rule, dependencies visited.add(rule) for item in _dfs(dependencies, rules, visited): yield item 
      \$\endgroup\$

        Start asking to get answers

        Find the answer to your question by asking.

        Ask question

        Explore related questions

        See similar questions with these tags.