How to extract the active code path from a complex algorithm

Question

I have been puzzled lately by an intriguing idea.

I wonder if there is a (known) method to extract the executed source code from a large complex algorithm. I will try to elaborate on this question:

Scenario: There is a complex algorithm that a large number of people have worked on for many years. The algorithm creates measurement descriptions for a complex measurement device.

The input for the algorithm is a large set of input parameters; let's call this the recipe. Based on this recipe, the algorithm is executed, and the recipe determines which functions, loops and if-then-else constructions are followed within the algorithm. When the algorithm is finished, a set of calculated measurement parameters forms the output, and with these output measurement parameters the device can perform its measurement.

Now, there is a problem. Since the algorithm has become so complex and large over time, it is very difficult to find your way around it when you want to add new functionality for the recipes. Basically, a person wants to modify only the functions and code blocks that are affected by his or her recipe, but he/she has to dig through the whole algorithm and analyze the code to see which code is relevant for that recipe, and only after that process can new functionality be added in the right place. Even for simple additions, people tend to get lost in the huge amount of complex code.

Solution: Extract the active code path? I have been brainstorming on this problem, and I think it would be great if there was a way to process the algorithm with the input parameters (the recipe), and to extract only the active functions and code blocks into a new set of source files or code structure. I'm actually talking about extracting real source code here.

When the active code is extracted and isolated, this will result in a subset of source code that is only a fraction of the original source code structure, and it will be much easier for the person to analyze the code, understand the code, and make his or her modifications. Eventually the changes could be merged back into the original source code of the algorithm, or maybe the modified extracted source code could also be executed on its own, as if it were a 'lite' version of the original algorithm.

Extra information: We are talking about an algorithm with C and C++ code, about 200 files, and maybe 100K lines of code. The code is compiled and built with a custom Visual Studio based build environment.

So...: I really don't know if this idea is just naive and stupid, or if it is feasible with the right amount of software engineering. I can imagine that there have been similar situations in the world of software engineering, but I just don't know.

I have quite some experience with software engineering, but definitely not on the level of designing large and complex systems.

I would appreciate any kind of answer, suggestion or comment.

Thanks in advance!

You need a code coverage tool. A good one will instrument your code and allow you to run a test (or a bunch of tests) and show you all of the code that was exercised by the test. There are a number of commercial and open source code coverage tools available. Personally, I would suggest you look for one that gives you good visual feedback on what was covered -- this usually means paying for the tool. - BobDalgleish, commented Mar 13, 2014 at 21:38
Not completely a duplicate, but I guess there will be no better answer than the answers to this question: programmers.stackexchange.com/questions/155488/… - Doc Brown, commented Mar 14, 2014 at 6:18
As far as I know, there's no magic way to do this. Michael Feathers wrote a book on the subject a while ago: amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/… His strategies mostly amount to "block-and-tackle" sorts of things. Basically, re-envision what the code would look like if it had been written with unit tests, and start taking steps toward refactoring it in that direction. The real secret here is PROPER unit tests. Any code that doesn't have proper unit tests will rot. Unit tests tend to force you to write more maintainable code. - Calphool, commented Apr 15, 2014 at 20:48
I fear that if you find a code path that is exercised by one input (recipe), changing only that code path will break many others; convoluted code often has obnoxious inter-dependencies. There is also a whiff of undecidability: an analysis for recipe A will not reveal all of the code paths needed for recipe B. - msw, commented Jul 5, 2015 at 14:51

Community · Accepted Answer · 2017-05-23 12:40:24Z

I think it depends on what you want to achieve... Do you want to improve the code? parallelize the code? clean it? just understand it?

Besides the great comment given by @Calphool, what I've done in similar cases (but not with 100K lines of code, to be honest) is this:

I looked for whoever wrote the code, or has used it, and asked them what I needed to know. That saves a lot of time. This may seem stupid, but it is not.
I made a graph of the dependencies. Take a look at this for an example.
Depending on what you need to do, you could measure the execution time of some (or all) functions.
I started playing with it, but with modern tools like git. If possible, I started to add some tests.
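As a rough illustration of the timing step in the list above, here is a minimal C++ sketch, assuming you can add one line to each function you care about. The names ScopedTimer, timing_table and compute_exposure are hypothetical and do not come from the original code base.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper: accumulated wall-clock time and call count per function.
struct TimingTable {
    std::map<std::string, std::chrono::steady_clock::duration> total;
    std::map<std::string, long> calls;
};

inline TimingTable& timing_table() {
    static TimingTable table;
    return table;
}

// RAII timer: starts on construction, records the elapsed time on destruction.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(name != nullptr);
    }
    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        timing_table().total[name_] += elapsed;
        timing_table().calls[name_] += 1;
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for one function of the real algorithm (hypothetical).
double compute_exposure(int samples) {
    ScopedTimer timer(__func__);  // the only line added to the function
    double sum = 0.0;
    for (int i = 1; i <= samples; ++i) sum += 1.0 / i;
    return sum;
}
```

After a run, timing_table() holds one entry per instrumented function, which can be sorted or printed to see where a given recipe spends its time. A real profiler gives the same information without touching the code, so this is only worth doing for a handful of functions.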

If you want to see which functions get called, you could just print the functions that are called (take a look at this question). You could add a printf to each function using a script, but I don't think that is a good idea. Also, you have to think about how you are going to go through the generated output.
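One way to keep the generated output manageable is to record each function name once instead of printing on every call. The following is a minimal sketch of that idea, not a recommendation for 200 files; TRACE_FUNCTION, called_functions, Recipe and run_algorithm are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical registry: the set of function names seen during this run.
inline std::set<std::string>& called_functions() {
    static std::set<std::string> names;
    return names;
}

// Place at the top of each function; repeated calls collapse into one entry.
#define TRACE_FUNCTION() called_functions().insert(__func__)

struct Recipe { bool has_option_x; };  // hypothetical stand-in for the recipe

int foo() { TRACE_FUNCTION(); return 1; }
int bar() { TRACE_FUNCTION(); return 2; }

int run_algorithm(const Recipe& recipe) {
    TRACE_FUNCTION();
    int result = 0;
    // Called 1000 times, but each callee is recorded only once.
    for (int i = 0; i < 1000; ++i) result += recipe.has_option_x ? foo() : bar();
    return result;
}

// Usage sketch: run one recipe, then inspect the de-duplicated list.
void demo_trace() {
    called_functions().clear();
    run_algorithm(Recipe{false});
    assert(called_functions().count("bar") == 1);
    assert(called_functions().count("foo") == 0);
}
```

Because the registry is a set, a loop that calls bar() a thousand times still produces a single line of output, and the list for one recipe can be compared directly with the list for another.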

Once I know what I want, and before implementing anything, I try to isolate the part I need to work on. That is, I clean up the code a little, put it in a different file if needed, compile it and test it. Only then do I proceed to actually modify the code, adding functionality or whatever needs to be done. This may also include porting the code to modern build tools if needed.

My two cents.

Gregor Ophey · Accepted Answer · 2015-07-07 16:58:34Z

The issue seems to be similar to writing test cases with good code coverage. There are tools for automatic test case generation that are based on code analysis.

Here is a link to a paper on a tool for C programs:

Klee paper (also as pdf)

The tool generates test input data that is supposed to cover different branches in the code.

I have never worked with it, but maybe it can be tweaked to do what you are looking for. I am not sure how this fits with Visual Studio ...

Community · Accepted Answer · 2017-05-23 12:40:13Z

Dynamic program slicing is a possible way, though not necessarily practical with 100K lines of code.

For example, if your original code is:

if (recipe.has_option_x) { foo(); } else { bar(); }

And you slice your program assuming that recipe.has_option_x is false, you can reduce the code that is effectively called to bar(). As usual, theoretical limits will prevent a tool from always knowing with certainty whether a branch of your program can be safely removed.

The Wikipedia page has some links to existing tools. See also Dynamic slicing in C/C++.
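To make the example concrete, here is a hand-written sketch of what the result of such a slice could look like. It is not the output of any slicing tool; the gain field, the function bodies and the names compute and compute_sliced are assumptions added only so that the code compiles and can be checked.

```cpp
#include <cassert>

// Hypothetical recipe: has_option_x comes from the example, gain is invented.
struct Recipe {
    bool has_option_x;
    int gain;
};

int foo(int gain) { return gain * 2; }  // placeholder body
int bar(int gain) { return gain + 1; }  // placeholder body

// Original code: both branches are present.
int compute(const Recipe& recipe) {
    if (recipe.has_option_x) {
        return foo(recipe.gain);
    } else {
        return bar(recipe.gain);
    }
}

// Slice for recipes where has_option_x is false: only the bar() path remains.
int compute_sliced(const Recipe& recipe) {
    assert(!recipe.has_option_x);  // the assumption the slice depends on
    return bar(recipe.gain);
}
```

The assert documents the condition under which the slice is valid: compute_sliced agrees with compute only for recipes that satisfy it, which is why a slice made for one recipe cannot be assumed to work for another.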
