Getting started with HUnit

Leif Frenzel, March 2007

(Again, I would rather have posted this to my blog at http://cohatoe.blogspot.com, but that is too inconvenient with all the code listings. There is an associated post, however, where you can contact me or comment on this entry. All feedback is welcome :-)

Test-driven development is a programming technique where you write small programs (test cases) before coding the actual piece of software you want to write. If done well, this helps to clarify the various combinations of input/output/environment for your code, and makes the implementation easier. As a nice extra, you speed up testing - all you have to do is run all those test cases again, and they will point out whether everything still works as expected.

So far so good. (And if you're coming from Java development, you know that song by heart anyway ;-). But can we do this in Haskell? Sure. There are actually two well-supported approaches to test-driven development in Haskell. One is more oriented towards functional programming, focusing on declaring properties of functions - this one is supported by QuickCheck (and not covered in this post); the other is aligned with the 'classical' JUnit tool. There is an implementation of this approach in the form of HUnit. In this post, I'll give an introduction to unit testing with HUnit.

Basic Haskell programming skills are assumed in this post. All of this can be picked up from one of the many excellent tutorials or textbooks that are around. Familiarity with unit testing, on the other hand, is not assumed. (In fact, if you have done unit testing in another language already, most of this may sound a bit basic to you. In that case, just skim over the code listings to see how it is done in Haskell in particular.)

The scenario

Imagine the following scenario: we get a String as input, which is taken to contain the code of a Haskell module (it was probably read from a source file). As additional input we get a pair of integers that is interpreted as the 'cursor position' on the source string (that is, the line number and the column number). Now the task is to determine on which identifier (if any) the cursor is positioned, and return that identifier. So we want something like

findIdentifier :: String -> (Int, Int) -> Maybe String

Of course we don't want to just find words - we want to find identifiers. This means we want to return Nothing if the cursor is in a comment, inside a string literal, or on a keyword.
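To make this concrete, here are a few calls and the results we expect from them (anticipating the test cases we will write below):

-- illustrative examples only; these mirror the tests below
-- findIdentifier "main = print 42" (1, 2)  ==>  Just "main"
-- findIdentifier "-- a" (1, 3)             ==>  Nothing (cursor in a comment)
-- findIdentifier "" (1, 1)                 ==>  Nothing (empty input)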

A basic test setup

A scenario like this suggests a basic skeleton for test cases: provide some input string (tailored to test a specific sort of input), run the tested code over it (in this case, our findIdentifier function), and then inspect the result and make some assertions. Assertions are statements (not in the programming-language sense of 'statement') that express what we expect to see in the result, usually together with a message that indicates what we expected. The unit test framework will check whether the statement holds with respect to the actual results, and throw a failure, including our message, if it doesn't. In addition to failures, errors may occur in the tested code, and that is of course also a sort of failure. (We will see later how to test for expected errors.)
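In HUnit, that skeleton looks roughly as follows (a schematic sketch only - someTest, testedCode, input and expected are placeholders, not real names from our example):

someTest = TestCase $ assertEqual
  "a message saying what we expected" expected ( testedCode input )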

Typically, test case code will reside in separate modules (often in a source tree of their own). For our example, then, let us suppose that the tested code (that is, the findIdentifier function) is in a module called FindIdentifier in a file called FindIdentifier.hs, and we put the test cases into a second module FindIdentifier_Test in the correspondingly named file FindIdentifier_Test.hs.

The first thing we have to do, then, is of course to import the tested code, and also the HUnit API (which is in Test.HUnit).

-- file FindIdentifier.hs
module FindIdentifier( findIdentifier ) where

findIdentifier :: String -> (Int, Int) -> Maybe String
findIdentifier = undefined -- let's leave it undefined for the moment

-- file FindIdentifier_Test.hs
module FindIdentifier_Test where

import FindIdentifier( findIdentifier )
import Test.HUnit
-- continued below

Next, we add a single test case. As I already said, the structure of a test case is always this: create some input, run the tested code on that input, make some assertions over the results. For the simplest possible case, then, let us use an empty string and the (1, 1) position as input - and let us state that we expect Nothing.

testEmpty = TestCase $ assertEqual 
  "Should get Nothing from an empty string" Nothing ( findIdentifier "" (1, 1) ) 

The TestCase constructor accepts an Assertion (which is an IO action, and has the type IO ()). The assertion which we make here is an assertion of equality - we tell HUnit that we expect the result of ( findIdentifier "" (1, 1) ) to equal Nothing.
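For reference, the relevant pieces of the Test.HUnit API look like this (slightly simplified from the HUnit documentation):

type Assertion = IO ()

assertEqual :: (Eq a, Show a)
            => String    -- message, reported when the assertion fails
            -> a         -- the expected value
            -> a         -- the actual value
            -> Assertion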

The only thing missing now in our basic test scenario is to run the test case. In order to do so, we can use HUnit's text test runner, runTestTT, to execute it in the main function of our testing module:

main = runTestTT testEmpty
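(The text runner's signature, by the way, is runTestTT :: Test -> IO Counts - it reports progress and failures on the terminal and returns a summary of the counts, the same numbers we will see in the output below.)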

The FindIdentifier_Test module is now complete. You can load it into an interpreter and execute the main function:

C:\>ghci FindIdentifier_Test.hs
[omitting some ghci output]
Prelude FindIdentifier_Test> main
### Error in:   0
Prelude.undefined
Cases: 1  Tried: 1  Errors: 1  Failures: 0
Prelude FindIdentifier_Test>

I have highlighted the HUnit output in the session log above. (From now on, I will only give the passages that are actual HUnit output and omit the rest.) It tells us that it ran one test case and encountered one error - caused, of course, by the function that we left undefined. Right, so then let us 'implement' it ;-)

-- file FindIdentifier.hs
module FindIdentifier( findIdentifier ) where

findIdentifier :: String -> (Int, Int) -> Maybe String
findIdentifier _ _ = Nothing -- TODO this is not yet the desired functionality

-- file FindIdentifier_Test.hs
module FindIdentifier_Test where

import FindIdentifier( findIdentifier )
import Test.HUnit

testEmpty = TestCase $ assertEqual 
  "Should get Nothing from an empty string"
  Nothing 
  ( findIdentifier "" (1, 1 ) ) 

main = runTestTT testEmpty

Now if we run this, we get

Cases: 1  Tried: 1  Errors: 0  Failures: 0

which looks quite good.

Labeling and grouping tests

Some cases that you will probably want to include in your test suite are border cases - for instance an empty input string, a negative cursor position, and similar things. Let's do that now, and along the way I'll show how to label and group test cases.

-- file FindIdentifier_Test.hs
module FindIdentifier_Test where

import FindIdentifier( findIdentifier )
import Test.HUnit

testEmpty = TestCase $ assertEqual 
  "Should get Nothing from an empty string" Nothing ( findIdentifier "" (1, 1) ) 
testNegCursor = TestCase $ assertEqual 
  "Should get Nothing when cursor is negative" Nothing ( findIdentifier "a" (-1, -1) ) 
testComment = TestCase $ assertEqual 
  "Should get Nothing on comment" Nothing ( findIdentifier "-- a" (1, 3) )
testMinimal = TestCase $ assertEqual 
  "Minimal program" (Just "main") ( findIdentifier "main = print 42" (1, 2) ) 

main = runTestTT $ TestList [testEmpty, testNegCursor, testComment, testMinimal]

You see that we have added some more test cases, and stuffed them into a test list before throwing them to the test runner. Apart from that detail, this is no different from what we did before. But note that the last test in our list fails with our current 'implementation' (which always returns Nothing) - it should return the identifier "main" now.

user error (HUnit:Minimal program
expected: Just "main"
 but got: Nothing)
Cases: 4  Tried: 4  Errors: 1  Failures: 0

Actually, we have now left the field of border cases (such as empty input strings or cursors that are out of range) and started to test some simple, but serious cases. Sometimes it makes sense to capture a difference like this by grouping test cases. Let's make one group for border test cases and one for simple valid cases. In the code listing below, you can see how this is done using the TestList constructor.

borderCases = TestList [ testEmpty, testNegCursor, testComment ]

testEmpty = TestCase $ assertEqual 
  "Should get Nothing from an empty string"
  Nothing 
  ( findIdentifier "" (1, 1) ) 
-- ... omitting the other test cases

simpleCases = TestList [ testMinimal ] 

testMinimal = TestCase $ assertEqual 
  "Minimal program"
  ( Just "main" ) 
  ( findIdentifier "main = print 42" (1, 2) ) 

main = runTestTT $ TestList [ borderCases, simpleCases ]

TestList is (in addition to TestCase) another constructor of the Test type. It constructs a composite test case, i.e. one that consists of several other test cases (those which we have stuffed into the list). But the result is again a Test, meaning that we can run it with the test runner exactly the same way as we ran single test cases, and we can also put such a composite test case into a TestList again.
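Just to illustrate the nesting, a (made-up) grouping like the following would be equally valid:

-- a TestList nested inside another TestList - the result is again a Test
nestedExample = TestList [ TestList [ testEmpty, testNegCursor ]
                         , TestList [ testMinimal ] ]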

If you run these grouped tests, you will find an output like this:

### Error in:   1:0
user error (HUnit:Minimal program
expected: Just "main"
 but got: Nothing)
Cases: 4  Tried: 4  Errors: 1  Failures: 0

The grouping is now visible in the source code of the tests, which already helps when reading it; but we can do better with the test output. We have seen that we can give test cases a message string, which is helpful in the test output. Is there a similar way to label test groups? Indeed there is. HUnit provides a third constructor for Tests that can be used to attach a label string to any Test (and therefore to any TestList). Here's how it works:

borderCases = TestLabel "Border test cases" ( TestList [ 
    testEmpty, testNegCursor, testComment 
  ] )

testEmpty = TestCase $ assertEqual 
  "Should get Nothing from an empty string" 
  Nothing 
  ( findIdentifier "" (1, 1) ) 
-- ... omitting the other test cases

simpleCases = TestLabel "Simple, but serious cases" ( TestList [ 
    testMinimal 
  ] )

testMinimal = TestCase $ assertEqual 
  "Minimal program"
  ( Just "main" ) 
  ( findIdentifier "main = print 42" (1, 2) )
   
main = runTestTT $ TestList [ borderCases, simpleCases ]

Now the output reflects our grouping in the source code:

### Error in:   1:Simple, but serious cases:0
user error (HUnit:Minimal program
expected: Just "main"
 but got: Nothing)
Cases: 4  Tried: 4  Errors: 1  Failures: 0

To sum up the various ways to construct test cases, group them into test lists, and label them, here is the complete Test data type from HUnit:

data Test = TestCase Assertion
          | TestList [Test]
          | TestLabel String Test
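(HUnit also provides overloaded operators as a more compact way to build such Tests - the user guide has the details. Just as a taste, the following should be equivalent to a labeled version of testEmpty: (~:) attaches a label, and (~=?) builds a test case comparing an expected against an actual value.)

testEmpty' = "Should get Nothing from an empty string"
               ~: Nothing ~=? findIdentifier "" (1, 1)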

The implementation of FindIdentifier

Well, sooner or later we will have to implement the actual code that makes our test cases happy. I will not describe this directly in this post, but if you are interested you can download a bunch of source files with an implementation of the FindIdentifier module. In what follows, I will assume an implementation like that. (The download file also contains the final testing module with all the code snippets quoted in this post.)

Multiple assertions in one test case

Each of our test cases so far consisted of just a single assertion. It does not necessarily have to be so; you can also put several assertions into a single test case. Typically, you will want to do so if you want to assert several things about a result from a computation. For example, if you are testing a parser, you may well want to run it over different inputs, and that makes for different test cases. But after a parse you may want to assert many things about the result, e.g. that the parsed data tree has a certain size, that some list has a given number of elements and that its first element is equal to a certain value, etc.

Another scenario is where you want to execute a sequence of steps during the test case, each of which produces an intermediate result that you want to make assertions about. In that case, too, you would want to make multiple assertions inside one test case.

In our simple scenario, I couldn't find a really good example for this, so I'm just making something up. (By this I don't mean that the test case and assertions I'm going to describe in this section are useless; however, in reality one would probably have preferred to write multiple test cases here instead of just one with multiple assertions. I'm doing the latter only because multiple assertions are just what I intend to demonstrate :-).

Have a look at the following test case. We have a small module with a data type declaration here. The specified cursor position is such that the cursor is located in the middle of the identifier. We thus expect to get the identifier back as the result.

testData = TestCase $ assertEqual 
    "Data declaration" 
    ( Just "Bli" ) 
    ( findIdentifier "main = print 42\ndata Bli = Bla | Blubb" (2, 7) )

Now we are certainly not interested in testing this for each possible cursor position, but there are two cases that seem worth checking. One is a cursor that is positioned right before the identifier, and the other is one that is positioned right after it. So in this case, we could make multiple assertions in one test case, thereby re-using the input string.

testData = TestCase $ do
  let code = "main = print 42\ndata Bli = Bla | Blubb"
  assertEqual 
    "Data declaration - on identifier" 
    ( Just "Bli" ) 
    ( findIdentifier code (2, 7) ) 
  assertEqual 
    "Data declaration - before identifier" 
    ( Just "Bli" ) 
    ( findIdentifier code (2, 6) ) 
  assertEqual 
    "Data declaration - after identifier" 
    ( Just "Bli" ) 
    ( findIdentifier code (2, 9) )

Testing for expected errors

In some cases you might want to assert that a certain function call must not succeed, but throw an error. Remember that we have specified, via one of our test cases above, that passing a negative cursor position should result in Nothing? Suppose we wanted instead to make sure that a specific exception is thrown from the findIdentifier function.

There are two things we have to make sure of here. First, we must catch the exception that we expect, so that it does not create a test-failure-by-error. (If tested code throws an error, then the test is naturally considered failed by HUnit - it can't know that the error is desired in this case :-). So if we bring our tested code into a situation where it correctly fails, our test case must actually succeed. Conversely (and this is the second point), we must make sure that we get a test failure when the code should break but doesn't. In our example, if we let findIdentifier run with a negative cursor position and no error is thrown, then our implementation does not behave as it should, and we therefore want to see a test failure.

Here's how we do it:

-- add to the imports of FindIdentifier_Test.hs:
import Control.Exception ( errorCalls, evaluate, handleJust )

testNegCursor = TestCase $
  handleJust errorCalls (\_ -> return ()) performCall
  where
    performCall = do
      evaluate ( findIdentifier "a" (-1,-1) )
      assertFailure "Cursor position (-1,-1) must throw an error"

Our call to findIdentifier is forced to be evaluated using the evaluate function from Control.Exception (hence the import at the top of the listing - without it, this test case won't compile). In the case where an exception occurs, it will be 'handled' by handleJust. The handling itself is trivial - we do nothing, because we just wanted to make sure that the exception occurs. (By the way, the code as it stands here will only handle calls to the error function from the Prelude, as you can tell from the occurrence of errorCalls. Any other exception will still get through to HUnit, which will count it as breaking the test.)

On the other hand, if there is no exception during the execution of findIdentifier, then nothing will be handled. In that case, the next statement will be executed. Since we expected an exception, we know then that the test has to fail - so we call assertFailure. This will just cause the test case to fail unconditionally.
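(A side note: errorCalls and this use of handleJust belong to the older Control.Exception API. If you are on a newer version of the base library where errorCalls is no longer available, a sketch of the same test could use try together with the ErrorCall exception type instead - same idea, different vocabulary:)

-- hypothetical variant for newer versions of the base library
import Control.Exception ( ErrorCall, evaluate, try )

testNegCursor2 = TestCase $ do
  result <- try ( evaluate ( findIdentifier "a" (-1,-1) ) )
              :: IO ( Either ErrorCall ( Maybe String ) )
  case result of
    Left _  -> return () -- the expected error occurred; nothing more to do
    Right _ -> assertFailure "Cursor position (-1,-1) must throw an error"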

Further reading

The HUnit user guide makes a good companion (and describes a few details that I have left out, for instance some overloaded operators that can be used to write assertions in a very compact notation). For more general information about unit testing, junit.org is a good starting point.

As I already mentioned above, there is a second standard testing framework for Haskell: QuickCheck. You'll find more information about QuickCheck on its homepage.

Concluding remarks

I'd like to add a few remarks about my own practice and views here. I have been using test-driven development for years, although not exclusively. I consider it one coding style (out of many), and I think it is a good skill to be able to work in that style. It generally makes me feel much better about my code if I have developed it that way. True, the code quality (maintainability) is probably better that way, but for me, the psychological effect is perhaps even more important as a motivator :-)

But there are situations where a different way of getting code written (or getting old code fixed) is in order - usually when there is not much time before a deadline, or when the code in question is old and its parts strongly entangled, or when the code runs only after complicated initialisations are performed. In these situations, writing unit tests is more costly than would be acceptable. Making it a condition to have them will either impose unreasonable (for that situation) costs, or it will lead to token tests - tests which create the impression of well-tested code but don't actually test anything sensible (e.g. they cover hundreds of unimportant, but easy-to-produce, peripheral situations). Unfortunately, there is a trend to make a certain test coverage a condition of delivering any developed code - which makes writing test cases a hated, necessary task, and results (unsurprisingly) in bad testing code, adding to the amount of code but only seemingly increasing the overall quality.

As always, good judgement is key to writing test cases that are worth the investment. Good judgement, however, is hardly encoded in test coverage percentages or in policies that force developers to produce test cases by number. Instead, it makes more sense to view writing test cases as an additional means to communicate something about your program. (Additional, because there should already be some communication going on in the code itself - the more self-explaining it is, the better.) It can be read, by others who have to deal with your code later, or even by your own later self, as a straightforward explanation of how your code is expected to behave in certain situations; it also says something about the sort of situations one should keep an eye on.

I found, years ago, a nice formulation somewhere (I've forgotten where): in running a computer program, we are letting a piece of the programmer's past mind do things for us that we would not know how to do. If we extend this to things that we have forgotten how to do, things that are too tedious to be done over and over again, and things that are so complicated that we are prone to forget a detail or two of them often enough, then this applies wonderfully to test cases. It is often our own past mind that we employ here; and indeed, isn't this a great re-use that we can make of our past minds?