If I understand it correctly, Rice's theorem implies that the equivalence of two programs is, in general, undecidable.
Nevertheless, there is a wide variety of testing and debugging techniques that modify the program for their purpose - in fact, I can barely think of a technique that does not. So you are never actually working with the program you want to investigate: in the best case it is very similar, but in principle it could be an arbitrarily different program.
The usual justification I've heard for this goes something like: "This is just statement xy; it does nothing besides help with testing/debugging." But especially for typically hard-to-find bugs such as race conditions and other timing-related issues, any alteration of the program's execution can matter substantially.
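To make that point concrete, here is a minimal sketch (hypothetical, in Go; the shared counter, the iteration count, and the debug print are purely illustrative and not taken from any standard or paper) of a lost-update race that a "harmless" debug statement tends to mask:

```go
package main

// Hypothetical sketch of a "Heisenbug": an unsynchronized counter update
// whose symptom largely disappears once a debug print is added.

import (
	"fmt"
	"sync"
)

func main() {
	const iterations = 100000
	var counter int // shared, unprotected state: a data race
	var wg sync.WaitGroup

	for g := 0; g < 2; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				counter++ // racy read-modify-write
				// Uncommenting the next line ("just a debug statement")
				// slows each iteration down so much that concurrent
				// increments become rare, and the lost updates the test
				// was meant to expose mostly vanish.
				// fmt.Println("debug: i =", i)
			}
		}()
	}
	wg.Wait()

	// Without the debug print, this typically prints well below 200000;
	// with it, the result is usually (but not reliably) correct.
	fmt.Println("counter =", counter)
}
```

The bug is still present in the instrumented version; only its observable symptom changes, which is exactly the kind of divergence between the tested program and the shipped program I am asking about.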
While I'm aware that such considerations may be overkill for many applications, they become significant as soon as the application is deployed in an environment where lives depend on it, such as medical equipment. The standards I know of, like IEC 62304, sidestep this issue by requiring more thorough testing for higher safety classifications, but that seems to be merely best practice.
Shouldn't there be a theoretical basis for arguing about the extent of testing necessary, despite Rice's theorem, to make as sure as possible that the testing and debugging techniques used do not hide existing bugs? The scientific literature in this area seems extremely scarce, so what do the committees that design these standards base their requirements on?