
It is quite common to test JSON/XML-producing methods against file-stored expected output (at least in the Java world, but probably in other environments, too).

For instance, there's a method that produces an RSS feed of new offers in the service, with different formats (for different RSS consumers) and different options (query params, specific dates, short/long format, etc.).

After a while there are 20-30 files that are validated against the generated output in unit tests. When a new request comes in from the backlog, a new file is created to capture the expected output and a new unit test is written.
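For illustration, such a test typically boils down to something like the sketch below (JUnit 5 is assumed; RssFeedGenerator, its method, and the resource path are made-up names standing in for the real code):

    import org.junit.jupiter.api.Test;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    class RssFeedGeneratorTest {

        // Compares the full generated feed against a file stored in the test resources.
        @Test
        void shortFormatFeedMatchesStoredFile() throws Exception {
            String expected = Files.readString(Path.of("src/test/resources/expected/offers-short.xml"));
            String actual = new RssFeedGenerator().generateFeed("short");
            assertEquals(expected, actual);
        }
    }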

When the XML output changes (a new field mandatory for all formats appears, or the date format changes), the change must be applied to every file stored for the tests, which makes the files useless and painful to maintain in the long run.

My question is: how to mitigate that?

Should unit tests be written only for fragments of the generated XML? For instance, separate unit tests for the date field, separate ones for the new mandatory field, and so on?

How should different outputs/formats/query params of the XML-producing method be tested? How should the tests be organised to make sure the resulting XML is in a valid format, while a change to a single property doesn't require updating all of the unit tests?

  • It depends. If schemas change infrequently, an external file, maintained in your code repository, is a good way to proceed. If schemas change frequently, you may want a different strategy. It depends. – Commented Dec 24, 2019 at 20:44
  • This question is in no way Java specific, and if you have seen this only in a Java context, it was only coincidence. – Doc Brown, Dec 24, 2019 at 22:11

3 Answers

5

Tests of the kind you describe are brittle. As you've correctly pointed out, small changes are likely to break a significant number of the existing tests. But that is by design; tests like these are meant to break on the smallest change to your code. They are the proverbial "canary in the coal mine," meant to detect breaking changes in your code before your users do.

Such tests are relatively easy to create, and you shouldn't need a lot of them, because they cover a broad swath. All you have to do is look at the output of the code under test (given a particular input), and if you like it, paste it into the unit test assertion (or your files). This still holds true if you break your tests with new code; simply review all of the outputs, and re-paste them into your unit tests.
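One way to keep that re-pasting cheap is to let the test overwrite its own expectation file on demand, for example behind a system property. This is a sketch only; the class name, the property name, and the generator call are invented:

    import org.junit.jupiter.api.Test;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    class OffersFeedGoldenTest {

        @Test
        void feedMatchesGoldenFile() throws Exception {
            Path golden = Path.of("src/test/resources/golden/offers-feed.xml");
            String actual = new RssFeedGenerator().generateFeed("long");

            // Run with -Dupdate.golden=true to accept the new output after reviewing it.
            if (Boolean.getBoolean("update.golden")) {
                Files.writeString(golden, actual);
            }
            assertEquals(Files.readString(golden), actual);
        }
    }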

But such tests don't tell you anything about what might specifically have broken, only that something did, in fact, break. It's now up to you to figure out what changed, how it affected the output, and whether or not the new output is desirable.

The alternative is to find a way to break your code and your unit tests into smaller, more independent chunks, in such a way that, if you make a change to the code, it only breaks one or a few unit tests. To do that, you will have to invest a significant amount of time and effort into rewriting your code and your tests. Ultimately, this is probably a better, more maintainable approach.
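A sketch of what such smaller tests might look like, assuming the date formatting and the per-item writing have been extracted into their own collaborators (all class and method names here are invented):

    import org.junit.jupiter.api.Test;
    import java.time.LocalDate;
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    class OfferXmlFragmentTest {

        // Each test pins down one concern, so a change to the date format
        // breaks only this test instead of every stored expectation file.
        @Test
        void publicationDateUsesRfc822Format() {
            String formatted = new OfferDateFormatter().format(LocalDate.of(2019, 12, 24));
            assertEquals("Tue, 24 Dec 2019 00:00:00 GMT", formatted);
        }

        @Test
        void mandatoryCategoryElementIsAlwaysPresent() {
            String itemXml = new OfferItemWriter().write(Offer.sample());
            assertTrue(itemXml.contains("<category>"));
        }
    }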

Is it worth the trouble? Only you can answer that.

2

I have been working with such automated tests for several years, and though we have several hundred complex XML files in one test suite, they are very maintainable, even if some of the tests break from time to time.

Our team accomplishes this in the following way:

1. We use custom "diff" algorithms to detect differences between the actual and the expected file. These diff algorithms are usually tailored to the specific file format so that they do not signal differences for "acceptable" changes (like a change in a time stamp, or a floating point value which changes only by some very small amount); see the sketch after this list.

2. For each group of tests, we keep the expected files in one folder under version control (let's call it ExpectedData), and generate the actual output into a different folder ActualData which is not under version control. When files in those two folders differ, we check whether the kind of difference was intended. If it was, we simply copy the changed files from ActualData into ExpectedData.
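As a greatly reduced illustration of point 1, a tailored diff can be as simple as a line-by-line comparison that skips timestamp elements and tolerates tiny numeric drift. The element name, regex, and tolerance below are invented examples, not the format-specific algorithms the answer refers to:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.regex.Pattern;

    class TolerantFileDiff {

        private static final Pattern TIMESTAMP = Pattern.compile(".*<generatedAt>.*</generatedAt>.*");
        private static final Pattern NUMBER = Pattern.compile("-?\\d+\\.\\d+");
        private static final double EPSILON = 1e-6;

        static boolean sameIgnoringAcceptableChanges(Path expected, Path actual) throws Exception {
            List<String> exp = Files.readAllLines(expected);
            List<String> act = Files.readAllLines(actual);
            if (exp.size() != act.size()) return false;

            for (int i = 0; i < exp.size(); i++) {
                String e = exp.get(i), a = act.get(i);
                if (e.equals(a)) continue;
                // Lines holding only a generation timestamp are always considered equal.
                if (TIMESTAMP.matcher(e).matches() && TIMESTAMP.matcher(a).matches()) continue;
                if (!numbersClose(e, a)) return false;
            }
            return true;
        }

        private static boolean numbersClose(String e, String a) {
            // Lines count as equal if they differ only in floating point values
            // that are within EPSILON of each other.
            if (!e.replaceAll(NUMBER.pattern(), "#").equals(a.replaceAll(NUMBER.pattern(), "#"))) return false;
            var em = NUMBER.matcher(e);
            var am = NUMBER.matcher(a);
            while (em.find() && am.find()) {
                if (Math.abs(Double.parseDouble(em.group()) - Double.parseDouble(am.group())) > EPSILON) return false;
            }
            return true;
        }
    }

A real implementation would more likely compare parsed XML nodes than raw lines, but the shape is the same: decide per difference whether it is acceptable.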

And yes, if a single property changes, we may have to update some test data files, maybe all within a group, but the effort for this is pretty low, and we rely on version control to store the new data files efficiently as deltas against their predecessors. This works pretty well for us.

Let me add a final note: though we use our unit-testing framework for running the tests, I would not call them unit tests. Tests of this kind are better called "automated regression tests" or something similar. One can have these tests as well as real unit tests, which work without external files and at a finer granularity; both kinds of tests are useful.

0

What is the Requirement you are testing?

If the requirement is that your XML fits a specific format, say, SOME-SPEC-1234, then yes, you'll need to test against that specific format.

If the requirement is that what you write can be read back (mainly by your own software) with no loss of data, then, IMO, a better test is to do exactly that: create the data, write it as XML or JSON to a temp file, read it back in, and compare the appropriate fields. This is, strictly speaking, not a "unit" test and you'll get some blowback. :-) OTOH, it is much simpler to write and maintain.
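A minimal version of that round-trip test might look like the sketch below, using Jackson for JSON purely as an example mapper; the Offer class and its fields are placeholders:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.junit.jupiter.api.Test;
    import java.io.File;
    import java.nio.file.Files;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    class OfferRoundTripTest {

        @Test
        void offerSurvivesWriteAndReadBack() throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            Offer original = new Offer("12345", "Winter sale", 19.99);

            // Write to a temp file and read it back, as described above.
            File temp = Files.createTempFile("offer", ".json").toFile();
            mapper.writeValue(temp, original);
            Offer readBack = mapper.readValue(temp, Offer.class);

            // Compare the fields that matter rather than the raw file contents.
            assertEquals(original.getId(), readBack.getId());
            assertEquals(original.getTitle(), readBack.getTitle());
            assertEquals(original.getPrice(), readBack.getPrice(), 0.0001);
        }
    }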
