IEA - A file-based testing strategy for document transformations

by Douglas Campbell, Principal Engineer

I/E/A - Input/Expected/Actual

If you can provide implementations for the set of functions below,

// defines how to go from file to an instance of IN
abstract IN loadInput(File input);

// defines how to go from a file to an instance of OUT;
// throw FileNotFoundException whenever there’s no expected file.
abstract OUT loadOutput(File expected) throws FileNotFoundException;

// invoked upon missing expected file or if actual != expected
abstract void storeResult(OUT result, File resultFile);

a lot can be done to take the drudgery out of testing document transformations and maintain a robust set of tests.

In short, it becomes trivial to write test cases as files rather than functions.
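The post doesn't show concrete implementations of the three functions. For plain-text documents, where IN and OUT are both String, here is a minimal sketch. The methods are static on a hypothetical TextDocs class, and it needs Java 11+ for Files.readString/writeString:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper: the three IEA functions for plain-text documents,
// where both IN and OUT are String. Requires Java 11+ (Files.readString).
class TextDocs {

    // loadInput: read the whole test file as UTF-8 text
    static String loadInput(File input) {
        try {
            return Files.readString(input.toPath(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // loadOutput: like loadInput, but report a missing expected file
    // with FileNotFoundException so the harness can record the result
    static String loadOutput(File expected) throws FileNotFoundException {
        if (!expected.isFile()) {
            throw new FileNotFoundException(expected.getPath());
        }
        return loadInput(expected);
    }

    // storeResult: write the actual output, replacing any earlier run's file
    static void storeResult(String result, File resultFile) {
        try {
            Files.writeString(resultFile.toPath(), result, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping String for a parsed type (a JSON tree, an Avro record) would only change these three methods. The code that walks the test folder stays the same.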

I vastly prefer this approach to testing document transformations over mucking with test code. First, a little background.


So much of the code I’ve written over the years has, at its core, the job of changing documents from one format into another. By documents, I mean any variety of JSON, XML, Avro, tab-delimited, YAML, whatever: anything and everything to anything and everything, and back again.

When asked to code stuff like this up, many devs’ eyes simply roll over at the horror of writing something as trivial as mapping one field to another, upper-casing it, fusing it with another value read from some external data source, and then outputting the result in another representation.

That new representation may have little resemblance to the initial doc. That’s the nature of the beast.

In short, the requirements driving data transformations are completely arbitrary!

I think this is the reason some devs loathe this stuff. They take shortcuts and throw things into test code that look like this…

private static final String CLICK = "a horrendously ugly line of input from an access log";

And when it’s time to test an “impression”, guess what happens? Ugh: test input inlined as string variables. Spidey sense immediately activated upon suppression of Checkstyle!

One thing I’ve learned is that wherever there’s dread and avoidance, that’s exactly where some excessively energetic laziness is needed. Step back. Make it dead simple and get rid of reams and reams of duplicate logic and/or code. Write a function to run over an entire folder full of tests:

// run every test in the folder and fail the entire run if any fail.
public void testFolder(File testFolder, Function<IN, OUT> converter) {
    boolean allPassed = true;
    for (File test : testFolder.listFiles(testFileFilter)) {
        // keep going so every failing test gets its .actual file
        if (!testSingle(test, converter)) {
            allPassed = false;
        }
    }
    if (!allPassed) {
        fail("expected != actual - actual " +
             "results saved in .actual files");
    }
}

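testFolder relies on a testFileFilter that the post doesn't define. Presumably it has to skip the .expected and .actual files that sit alongside each input. Here is a minimal sketch under that assumption, using a hypothetical TestFiles holder class:

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical: the post doesn't define testFileFilter. This version
// treats every regular file as a test input except the .expected and
// .actual files that the harness keeps next to each input.
class TestFiles {
    static final FileFilter testFileFilter = file ->
            file.isFile()
            && !file.getName().endsWith(".expected")
            && !file.getName().endsWith(".actual");
}
```

File.listFiles returns null when the folder isn't a directory (or can't be read). A real harness may want to check for that before looping.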
So what about this testSingle function? It’s very straightforward given the three functions defined at the start of this post.

// test a single test
private boolean testSingle(File test, Function<IN, OUT> converter) {
    IN input = loadInput(test);
    OUT actual = converter.apply(input);
    OUT expected = null;

    // only written if there's no expected file or the result differs.
    File actualFile = new File(test.getParent(),
                               test.getName() + ".actual");

    try {
        expected = loadOutput(new File(test.getParent(),
                                       test.getName() + ".expected"));
    } catch (FileNotFoundException ex) {
        // we haven't got an expected file yet; no biggie.
        // we can turn this into an expected file once
        // satisfied with it.
        storeResult(actual, actualFile);
        return false;
    }

    // we've got something to compare
    if (!actual.equals(expected)) {
        // fail and save the actual file for command-line
        // diffing
        storeResult(actual, actualFile);
        return false;
    }

    return true;
}
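testSingle compares with actual.equals(expected), so OUT needs value-based equality. A type that doesn't override equals() falls back to Object's identity check, and every test fails. Here is a minimal sketch, assuming Java 16+ records and a hypothetical Impression type for a parsed access-log record:

```java
import java.util.List;

// Hypothetical OUT type for a transformed access-log record. A Java
// record gets equals() and hashCode() generated from its components,
// so actual.equals(expected) compares field by field.
record Impression(String userId, String page, List<String> tags) {}

// A plain class that doesn't override equals() falls back to
// Object.equals(), i.e. identity, so two separately built instances
// with the same data never compare equal.
class ImpressionNoEquals {
    final String userId;

    ImpressionNoEquals(String userId) {
        this.userId = userId;
    }
}
```

With the record, an Impression loaded from the .expected file and one produced by the converter compare equal whenever their fields match, which is exactly what testSingle needs.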

Miscellaneous benefits of this approach

For text and/or ASCII-based documents, the diff utility is immediately available for comparing expected and actual output when a test fails.

You don’t have to write test code to generate your expected output. Just add your sample record to the tests folder and run the tests. The actual output is written to a new file with an .actual extension, and once you’re satisfied with it, you can turn it into the .expected file.

You have an immediate way to run your code over records which have caused you trouble in production. Again, drop the problematic record into your test folder and run the test.

If your team decides to build a brand new converter or switch to a different JSON library, all your tests are expressed as language-independent files. You can carry them forward.
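The .actual benefit above leaves one manual step: once an .actual file looks right, it becomes the .expected file. Here is a small sketch of that promotion step. It assumes the name.actual and name.expected naming used in testSingle, and Promote is a hypothetical helper:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: once you've reviewed an .actual file and are
// happy with it, rename it to .expected so future runs compare against it.
class Promote {
    static Path promote(Path actualFile) throws IOException {
        String name = actualFile.getFileName().toString();
        if (!name.endsWith(".actual")) {
            throw new IllegalArgumentException("not an .actual file: " + actualFile);
        }
        // click.log.actual -> click.log.expected
        String base = name.substring(0, name.length() - ".actual".length());
        Path expectedFile = actualFile.resolveSibling(base + ".expected");
        // overwrite any previous expected output (e.g. after a deliberate change)
        return Files.move(actualFile, expectedFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```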

Stuff to get right up front

First, be careful to make your transformation logic stateless. Essentially, document transformations that involve out-of-process calls need to be reworked to separate the retrieval of raw data from what’s done to it.

Second, start doing this early in the lifecycle of a code module or app. Appetites are usually pretty meager for circling back and reworking old, icky tests.
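To illustrate the statelessness point above, here is a sketch of separating retrieval from transformation. The lookup data is fetched before conversion and passed in, so the converter is a pure Function that testFolder can run over files without any network access. The example rests on several assumptions: Click and EnrichedClick are hypothetical record types (Java 16+), and the lookup data is a hypothetical user-to-country map.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical example of a stateless transformation. Retrieval (e.g.
// calling a user service or database) happens before this code runs;
// the converter only works on data it has been handed.
class ClickEnricher {

    // raw input: a click parsed from an access log (hypothetical shape)
    record Click(String userId, String page) {}

    // output: the click, with the page upper-cased and fused with the user's country
    record EnrichedClick(String userId, String page, String country) {}

    // Builds a pure Function<IN, OUT> from pre-fetched lookup data:
    // the same click and the same map always yield the same output.
    static Function<Click, EnrichedClick> converter(Map<String, String> countryByUser) {
        return click -> new EnrichedClick(
                click.userId(),
                click.page().toUpperCase(Locale.ROOT),
                countryByUser.getOrDefault(click.userId(), "UNKNOWN"));
    }
}
```

In a test, the map could itself be loaded from a fixture file in the test folder, so the "external" data is pinned down just like the inputs.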

Next step

A next step for us is to package this up as an open-source test package. Who knows, it may even happen sooner if we get a little interest :)