
Given the abstract syntax tree (AST) of each line of a function's code, I am asked to generate code for that function's corresponding unit tests, similar to what Microsoft's IntelliTest tool does here: https://docs.microsoft.com/en-us/visualstudio/test/generate-unit-tests-for-your-code-with-intellitest?view=vs-2019.

The issue is that I have to implement this from scratch rather than using built-in tools: my project is an ABAP-to-C# interpreter (ABAP is the programming language used for SAP's enterprise software) that executes ABAP code in C#, so I cannot simply use an IDE's unit test generation feature.

I so far have decided that the first part of the algorithm is to generate the function's parameters, but I'm not sure how I'm going to generate the function's parameters exactly. Furthermore, I don't know how the unit test generator is going to compute the expected outputs of the function.

Note: This algorithm doesn't need to work as well as an IDE's automatic unit test generation; I just need an initial working prototype. But so far I am stumped, and I haven't found any good online resources on this topic.

  • The question is a little vague. Generating code is a wide subject, can you be a bit more specific about what the problem is and what you have tried? If the unit tests are going to capture the existing behavior of the code (rather than a separate specification), then the expected output would be the actual output of running the code, so you calculate that by running the code. – JacquesB, Jul 27, 2021 at 8:22
  • Do you only want to generate unit tests which pass? – Jul 27, 2021 at 8:39
  • "I'm not sure how I'm going to generate the function's parameters exactly" is a quite vague problem description. Do you mean you don't know which values to pick for those parameters, or you don't know how to generate the parameters in a syntactically correct way? Moreover, I think your main question title is missing the magic word useful - there is surely a way to generate trivial but useless unit tests, but I guess that is not what you are after. – Doc Brown, Jul 27, 2021 at 9:15
  • In my opinion ... "no." In fact, I think that tests ought to be written by a different programmer than the one(s) who originally [designed and ...] wrote the code. I think that there is a very important pure-human(!) aspect to this step. The test should, as much as possible, treat the target as "a black box." But the actual complement of tests that may be required probably have to do with much more than "the source-code of this particular unit." An algorithm cannot be expected to "know" that. (Reliance on "the idea that it ever could" could be dangerous.) – Jul 27, 2021 at 19:48

2 Answers


With automatically generating test cases, there are three central problems:

  • Do the automatically generated tests capture the intent of the programmer?
  • How can input values be selected?
  • How can the correct output be determined?

The answer to most of these questions will be quite unsatisfying. The generated tests cannot infer programmer intent and just characterize the code as it is. Therefore, the outputs/behaviours will generally just be the current outputs/behaviours but will not detect mistakes in the code! This is useful for generating regression tests (e.g. prior to a refactoring), but is not very useful as part of quality assurance.

To derive interesting input values, consider two strategies.

First, you might consider symbolic execution of the unit under test. For example, when hitting a statement if (x < 5) { ... } then you have two input classes {x < 5, x ≥ 5} for which you can continue symbolic execution. Later, you can select representative values for each input class. Tricky aspects include dealing with the complex semantics of the target language, handling loops and recursion, handling relationships between variables, and other problems with the exponential explosion of the state space. There is substantial academic literature on symbolic execution. As a simple strategy, consider sampling random paths through the control flow graph.
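As a concrete illustration, here is a minimal sketch of path enumeration over a toy statement AST. The node types (If, Simple) and the string "constraints" are hypothetical stand-ins, not the real ABAP AST or solver terms; a real implementation would hand the collected conditions to a constraint solver (or simple heuristics) to pick one representative value per path.

    // Toy path enumeration: every If node splits the current path condition in two.
    // Node types are hypothetical stand-ins for the real ABAP AST.
    using System.Collections.Generic;

    abstract class Stmt { }
    class Simple : Stmt { public string Text; }
    class If : Stmt
    {
        public string Condition;
        public List<Stmt> Then = new List<Stmt>();
        public List<Stmt> Else = new List<Stmt>();
    }

    static class PathEnumerator
    {
        // Returns one list of branch conditions per path through the statement list.
        public static List<List<string>> Paths(List<Stmt> body)
        {
            var paths = new List<List<string>> { new List<string>() };
            foreach (var stmt in body)
            {
                if (stmt is If branch)
                {
                    var next = new List<List<string>>();
                    foreach (var prefix in paths)
                    {
                        foreach (var sub in Paths(branch.Then))   // take the then-branch
                            next.Add(Extend(prefix, branch.Condition, sub));
                        foreach (var sub in Paths(branch.Else))   // take the else-branch
                            next.Add(Extend(prefix, "NOT(" + branch.Condition + ")", sub));
                    }
                    paths = next;
                }
                // Simple statements do not change the path condition in this toy model.
            }
            return paths;
        }

        static List<string> Extend(List<string> prefix, string condition, List<string> rest)
        {
            var p = new List<string>(prefix) { condition };
            p.AddRange(rest);
            return p;
        }
    }

For a body containing if (x < 5) { ... } this yields the two constraint sets {x < 5} and {NOT(x < 5)}; picking e.g. x = 0 and x = 5 then covers both paths.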

Second, you could treat the unit under test as a black box and generate values without direct feedback from the code, e.g. purely at random. This is expensive, but typically works quite well as long as the code doesn't have paths that only trigger for certain magic numbers: if (x == 30752474) { ... } is difficult to cover by chance. If coverage data can be used as feedback, this turns into a search/optimization problem. There is substantial academic literature on generating interesting inputs, for example using evolutionary algorithms. There are also existing tools to exercise the input space of some code, e.g. libraries like QuickCheck or fuzzing tools like AFL.
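A black-box generator can be as simple as a fixed list of boundary values followed by random draws. The sketch below assumes parameters are plain integers and strings; the actual ABAP type system (packed numbers, fixed-length character fields, internal tables) would need its own generators.

    // Boundary values plus random draws, per parameter type (sketch; the type
    // coverage here is hypothetical and far smaller than ABAP's real type system).
    using System;
    using System.Collections.Generic;

    static class InputGenerator
    {
        static readonly Random Rng = new Random(42); // fixed seed keeps generated tests reproducible

        public static IEnumerable<int> Ints(int randomCount)
        {
            foreach (var v in new[] { 0, 1, -1, int.MinValue, int.MaxValue })
                yield return v;                                    // classic boundary values first
            for (int i = 0; i < randomCount; i++)
                yield return Rng.Next(int.MinValue, int.MaxValue); // then uniform random values
        }

        public static IEnumerable<string> Strings(int randomCount)
        {
            foreach (var v in new[] { "", " ", "0", new string('A', 255) })
                yield return v;
            for (int i = 0; i < randomCount; i++)
            {
                var chars = new char[Rng.Next(0, 20)];
                for (int j = 0; j < chars.Length; j++)
                    chars[j] = (char)Rng.Next('A', 'Z' + 1);
                yield return new string(chars);
            }
        }
    }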

As mentioned, the automatically generated tests will be largely useless for QA purposes since they can just verify that the code continues to do what it previously did. Things become more interesting if we define behaviour that should not occur, e.g. null pointer exceptions. Fuzz testing is a fairly brute force approach for looking at illegal states. With property-based testing (e.g. QuickCheck), the tester no longer provides specific examples to test, but states properties about the behaviour that should always hold. The testing library then uses a black-box strategy to generate values, with the goal of finding a counter-example to the stated property. For example, we might have a system under test and a property test as follows (pseudocode):

    // doubles its argument
    int SystemUnderTest(int a) { return a * 2; }

    void Test(int a) { Assert(SystemUnderTest(a) / 2 == a); }

A property-based testing library that knows how to generate interesting values for an int type will quickly find counterexamples such as a = int.MaxValue.
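A hand-rolled version of that search, without any library, could look like the following sketch (the overflow behaviour assumes C#'s default unchecked integer arithmetic):

    // Naive property-based check for the doubling example: boundary values plus
    // random ints, stop at the first counterexample. With unchecked arithmetic,
    // a = int.MaxValue wraps on a * 2 and fails the property.
    using System;

    static class DoublingPropertyCheck
    {
        static int SystemUnderTest(int a) => a * 2;
        static bool Property(int a) => SystemUnderTest(a) / 2 == a;

        static void Main()
        {
            var rng = new Random(1);
            var candidates = new[] { 0, 1, -1, int.MinValue, int.MaxValue };
            foreach (var a in candidates)
            {
                if (!Property(a)) { Console.WriteLine($"Counterexample: a = {a}"); return; }
            }
            for (int i = 0; i < 1000; i++)
            {
                int a = rng.Next(int.MinValue, int.MaxValue);
                if (!Property(a)) { Console.WriteLine($"Counterexample: a = {a}"); return; }
            }
            Console.WriteLine("No counterexample found.");
        }
    }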

In some cases, you might be heuristically able to infer the programmer's intent and then programmatically state interesting properties for which you can then generate test inputs.

But in general, a cooperative approach between tooling and programmer might be more beneficial:

  • make property-based testing easy
  • use automated tools such as fuzzers to find example inputs that hit uncovered code
  • if necessary, use automatically generated test cases to characterize the status quo behaviour of the software

    I like to think of test case generation as two challenges:

    • Input generation - how do you generate inputs to run the method/module/program on?
    • The test oracle - how do you tell whether the program's behavior on that input was acceptable or problematic?

    When a human developer writes a test case, normally their test case specifies both the input and the correct output, which solves both problems. In contrast, when we ask a computer to automate test case generation, there are a number of techniques to generate random inputs, but finding a test oracle is much harder.

    There are a number of approaches I've seen in the research literature for dealing with the "test oracle problem":

    • Does it crash/throw an exception? - The most basic test oracle is to run the code on the input, and if the code crashes or throws an uncaught/unexpected exception, then you have found a bug, otherwise you assume all is well. If your code contains copious assertions, this becomes more powerful, as it also detects an input that triggers an assertion violation.

    • Manual property specification - Another approach is to ask the developer to write a property or invariant that the code is supposed to satisfy. Then, the test oracle checks whether the output from the code satisfies that property or invariant. For instance, the invariant for a sorting algorithm might specify that the output is in sorted order and is a permutation of the input (see the first sketch after this list). This can lead to more powerful tests, but also requires more effort from the developer to specify these properties, and in some cases, writing down such a property can be as hard (and as error-prone) as implementing the routine in the first place. If this style of approach interests you, see e.g. QuickCheck and successors.

    • Coverage - A third approach is to ignore checking for correctness, and just generate a bunch of inputs that hopefully achieve high coverage on the program. For instance, if you believe the current version of the program works correctly, you might generate a high-coverage suite of inputs, record the output of the program on each, treat those as the "golden" correct outputs for those inputs, and then your tests assert that the code continues to produce those same outputs on those inputs (the second sketch after this list shows the idea). This might be helpful for regression testing. There are many techniques for achieving high coverage, including random test case generation, grey-box test case generation (see, e.g., AFL), and white-box generation (e.g., symbolic execution, concolic testing).
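The sorting invariant from the second bullet, written out as code (a sketch; the method name and the int[] types are placeholders):

    // Invariant for a sorting routine: result is non-decreasing and a permutation
    // of the input (checked by comparing sorted copies).
    using System.Linq;

    static class SortInvariant
    {
        public static bool Holds(int[] input, int[] output)
        {
            bool sorted = output.Zip(output.Skip(1), (a, b) => a <= b).All(ok => ok);
            bool permutation = input.OrderBy(x => x).SequenceEqual(output.OrderBy(x => x));
            return sorted && permutation;
        }
    }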

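And a sketch of the golden-output idea from the third bullet: record the current outputs once, then replay and diff on a later version. Func<int, int> is a stand-in for whatever signature your generated C# functions actually have.

    // Characterization ("golden output") testing: record current behaviour, then
    // report any input whose output later changes.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class GoldenTests
    {
        public static Dictionary<int, int> Record(Func<int, int> unit, IEnumerable<int> inputs) =>
            inputs.Distinct().ToDictionary(i => i, i => unit(i));

        public static IEnumerable<int> Replay(Func<int, int> unit, Dictionary<int, int> golden) =>
            golden.Where(kv => unit(kv.Key) != kv.Value).Select(kv => kv.Key); // inputs whose output changed
    }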
    On the "input generation" problem. There is a long line of work in the research literature on input generation:

    • If you are interested in generating random inputs to the entire program, and the inputs take the form of a stream of bytes or the contents of a file, I suggest looking at work on fuzzing. Fuzzers solve exactly this problem. You might be especially interested in AFL.

    • If you are interested in generating random unit tests, which invoke a few APIs or methods, then there's a lot of work on this subject for Java. I suggest learning about tools that do this for Java, and then translating them to your setting. For Java, the core challenge is that we need to generate random instances of objects. This is needed so that we can use them as parameters to method calls, and so that we can call random methods on them. I suggest looking at Randoop (paper). This line of work will be particularly relevant if ABAP is an object-oriented language. You might also be interested in this case study: Applying Automated Test Case Generation in Industry: A Retrospective.
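To give a flavour of the Randoop-style approach, here is a very small reflection-based sketch: it builds short random call sequences against a class, feeding earlier results back into a value pool. Everything here (static methods only, int/string parameters only) is a simplifying assumption; real tools add contracts, pruning, and minimization of the generated sequences.

    // Tiny Randoop-flavoured sketch (hypothetical): random call sequences over the
    // public static methods of a class, reusing earlier results as arguments.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;

    static class RandomCallSequences
    {
        public static void Explore(Type target, int steps, int seed = 0)
        {
            var rng = new Random(seed);
            var pool = new List<object> { 0, 1, -1, "", "A" };            // seed value pool
            var methods = target
                .GetMethods(BindingFlags.Public | BindingFlags.Static)
                .Where(m => m.GetParameters().All(p =>
                    p.ParameterType == typeof(int) || p.ParameterType == typeof(string)))
                .ToArray();

            for (int i = 0; i < steps && methods.Length > 0; i++)
            {
                var method = methods[rng.Next(methods.Length)];
                var args = method.GetParameters()
                    .Select(p => pool.Where(v => p.ParameterType.IsInstanceOfType(v))
                                     .OrderBy(_ => rng.Next())
                                     .First())
                    .ToArray();
                try
                {
                    var result = method.Invoke(null, args);               // static call
                    if (result != null) pool.Add(result);                 // grow the pool
                    Console.WriteLine($"{method.Name}({string.Join(", ", args)}) => {result}");
                }
                catch (TargetInvocationException ex)
                {
                    // An uncaught exception in the unit under test is a candidate bug / test case.
                    Console.WriteLine($"{method.Name} threw {ex.InnerException?.GetType().Name}");
                }
            }
        }
    }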
