Leif Frenzel, March 2007
(Again, I would have rather posted this to my blog at http://cohatoe.blogspot.com, but that is too inconvenient with all the code listings. There is an associated post, however, where you can contact me or comment on this entry. All feedback is welcome :-)
Test-driven development is a programming technique where you write small programs (test cases) before coding the actual piece of software you want to write. If done well, this helps to clarify the various combinations of input, output and environment for your code, and makes the code easier to implement. As a nice extra, you speed up testing - all you have to do is run all those test cases again, and they will point out whether everything still works as expected.
So far so good. (And if you're coming from Java development, you know that song by heart anyway ;-). But can we do this in Haskell? Sure. There are actually two different well-supported approaches in Haskell to do test-driven development. One is more oriented towards functional programming, focusing on declaring properties of functions - this one is supported by QuickCheck (and not covered in this post); the other is aligned with the 'classical' JUnit tool. There is an implementation of this approach in the form of HUnit. In this post, I'll give an introduction to unit testing with HUnit.
Basic Haskell programming skills are assumed in this post. All of this can be picked up from one of the many excellent tutorials or textbooks that are around. Familiarity with unit testing, on the other hand, is not assumed. (In fact, if you have done unit testing in another language already, most of this may sound a bit basic to you. In that case, just skim over the code listings to see how it is done in Haskell in particular.)
Imagine the following scenario: we get a String as input, which is taken to contain the code of a Haskell module (it was probably read from a source file). As additional input we get a pair of integers that is interpreted as the 'cursor position' on the source string (that is, the line number and the column number). Now the task is to determine on which identifier (if any) the cursor is positioned, and return that identifier. So we want something like
findIdentifier :: String -> (Int, Int) -> Maybe String
Of course we don't want to just find words - we want to find
identifiers. This means we want to return Nothing if the
cursor is in comments, or inside a string literal, or on a keyword.
A scenario like this suggests a basic skeleton for test cases:
provide some input string (tailored to test a specific sort of input), run the
tested code over it (in this case, our findIdentifier function),
and then inspect the result and make some assertions. Assertions are
statements (not used in the programming-language sense of 'statement') that
express what we expect to see in the result, usually together with a message
that indicates what we expected. The unit test framework will check whether the
statement holds with respect to the actual results, and throw a failure,
including our message, if it doesn't. In addition to failures, errors may
occur in the tested code, and that is of course also a sort of failure. (We
will see later how to test for expected errors.)
Typically test case code will reside in modules of its own (often in a separate
source tree). For our example, then, let us suppose that the tested code (that
is, the findIdentifier function) is in a module called
FindIdentifier in a file called FindIdentifier.hs,
and we put the test cases into a second module FindIdentifier_Test
in the correspondingly named file.
The first thing we have to do then, is of course to import the tested code,
and also the HUnit API (which is in Test.HUnit).
-- file FindIdentifier.hs
module FindIdentifier( findIdentifier ) where
findIdentifier :: String -> (Int, Int) -> Maybe String
findIdentifier = undefined -- let's leave it undefined for the moment
-- file FindIdentifier_Test.hs
module FindIdentifier_Test where
import FindIdentifier( findIdentifier )
import Test.HUnit
-- continued below
Next, we add a single test case. As I already said, the structure of a test
case is always this: create some input, run the tested code on that input, make
some assertions over the results. For the simplest case, then, let us use an
empty string and the (1, 1) position as input - and let us state that we expect
Nothing.
testEmpty = TestCase $ assertEqual
  "Should get Nothing from an empty string" Nothing ( findIdentifier "" (1, 1) )
The TestCase constructor accepts an Assertion
(which is an IO action, and has the type IO ()). The assertion
which we make here is an assertion of equality - we tell HUnit that we expect
the result of ( findIdentifier "" (1, 1) ) to equal
Nothing.
The only thing missing now in our basic test scenario is to run the test
case. In order to do so, we can use HUnit's text test runner,
runTestTT, to execute it in the main function of our testing
module:
main = runTestTT testEmpty
The FindIdentifier_Test module is now complete. You can load it
into an interpreter and execute the main function:
C:\>ghci FindIdentifier_Test.hs
[omitting some ghci output]
Prelude FindIdentifier_Test> main
### Error in: 0
Prelude.undefined
Cases: 1 Tried: 1 Errors: 1 Failures: 0
Prelude FindIdentifier_Test>
The lines following the call to main in the session log above are the HUnit output. (From now on, I will only give the passages that are actual HUnit output and omit the rest.) They tell us that HUnit ran one test case and encountered one error - the function that we left undefined, of course. Right, so then let us 'implement' it ;-)
-- file FindIdentifier.hs
module FindIdentifier( findIdentifier ) where
findIdentifier :: String -> (Int, Int) -> Maybe String
findIdentifier _ _ = Nothing -- TODO this is not yet the desired functionality
-- file FindIdentifier_Test.hs
module FindIdentifier_Test where
import FindIdentifier( findIdentifier )
import Test.HUnit
testEmpty = TestCase $ assertEqual
  "Should get Nothing from an empty string"
  Nothing
  ( findIdentifier "" (1, 1) )
main = runTestTT testEmpty
Now if we run this, we get
Cases: 1 Tried: 1 Errors: 0 Failures: 0
which looks quite good.
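When the tests are run from a build script rather than from an interpreter, it helps to turn the result into a process exit code. runTestTT returns a Counts record with the fields cases, tried, errors and failures, so a minimal sketch could look like this. The test case here is only a placeholder, and the helper name allPassed is made up for this example; it is not part of HUnit.

```haskell
module Main where

import System.Exit (exitFailure, exitSuccess)
import Test.HUnit

-- Placeholder test; in the running example this would be testEmpty.
placeholder :: Test
placeholder = TestCase $ assertEqual "1 + 1 should be 2" (2 :: Int) (1 + 1)

-- Hypothetical helper: True when the run had neither errors nor failures.
allPassed :: Counts -> Bool
allPassed counts = errors counts == 0 && failures counts == 0

main :: IO ()
main = do
  counts <- runTestTT placeholder
  -- Signal the outcome to the calling shell or build tool.
  if allPassed counts then exitSuccess else exitFailure
```

In an interpreter session the plain main = runTestTT ... form from above is usually more convenient, because the counts are simply shown.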
Some cases that you probably want to include in your test suite are border cases - for instance an empty input string, a negative cursor position and similar things. Let's do that now; along the way, I'll show how to label and group test cases.
-- file FindIdentifier_Test.hs
module FindIdentifier_Test where
import FindIdentifier( findIdentifier )
import Test.HUnit
testEmpty = TestCase $ assertEqual
  "Should get Nothing from an empty string" Nothing ( findIdentifier "" (1, 1) )
testNegCursor = TestCase $ assertEqual
  "Should get Nothing when cursor is negative" Nothing ( findIdentifier "a" (-1, -1) )
testComment = TestCase $ assertEqual
  "Should get Nothing on comment" Nothing ( findIdentifier "-- a" (1, 3) )
testMinimal = TestCase $ assertEqual
  "Minimal program" (Just "main") ( findIdentifier "main = print 42" (1, 2) )
main = runTestTT $ TestList [testEmpty, testNegCursor, testComment, testMinimal]
You see that we have added some more test cases, and stuffed them into a
test list before throwing them to the test runner. Apart from that detail,
this is not different from what we did before. But note that the last test
in our list fails with our current 'implementation' (which always returns
Nothing) - it should return the identifier "main" now.
user error (HUnit:Minimal program
expected: Just "main"
but got: Nothing)
Cases: 4 Tried: 4 Errors: 1 Failures: 0
Actually, we have now left the field of border cases (such
as empty input strings or cursors that are out of range) and started to test
some simple, but serious cases. Sometimes it makes sense to capture a
difference like this in grouping test cases. Let's make one group for border
test cases and one for simple valid cases. In the code listing below, you
can see how this is done using the TestList constructor.
borderCases = TestList [ testEmpty, testNegCursor, testComment ]
testEmpty = TestCase $ assertEqual
  "Should get Nothing from an empty string"
  Nothing
  ( findIdentifier "" (1, 1) )
-- ... omitting the other test cases
simpleCases = TestList [ testMinimal ]
testMinimal = TestCase $ assertEqual
  "Minimal program"
  ( Just "main" )
  ( findIdentifier "main = print 42" (1, 2) )
main = runTestTT $ TestList [ borderCases, simpleCases ]
TestList is (in addition to TestCase) another
constructor of the Test type. A test built this way is a composite test
case, i.e. one that consists of several other test cases (those which we
have stuffed into the list). But the result is
a Test again, meaning that we can run it with the test runner
exactly the same way as we were running single test cases, and we can also
put such composite test cases into a TestList again.
If you run these grouped tests, you will find an output like this:
### Error in: 1:0
user error (HUnit:Minimal program
expected: Just "main"
but got: Nothing)
Cases: 4 Tried: 4 Errors: 1 Failures: 0
The grouping is now visible in the source code of the tests, which helps
already when reading it; but we can do better with the test output. We have
seen that we can give test cases a message string, which is helpful in the
test output. Is there a similar way to label test groups? Indeed there is.
HUnit provides a third constructor for Tests that can be used
to attach a label string to any Test (and therefore, to any
TestList). Here's how it works:
borderCases = TestLabel "Border test cases" ( TestList [
    testEmpty, testNegCursor, testComment
  ] )
testEmpty = TestCase $ assertEqual
  "Should get Nothing from an empty string"
  Nothing
  ( findIdentifier "" (1, 1) )
-- ... omitting the other test cases
simpleCases = TestLabel "Simple, but serious cases" ( TestList [
    testMinimal
  ] )
testMinimal = TestCase $ assertEqual
  "Minimal program"
  ( Just "main" )
  ( findIdentifier "main = print 42" (1, 2) )
main = runTestTT $ TestList [ borderCases, simpleCases ]
Now the output reflects our grouping in the source code:
### Error in: 1:Simple, but serious cases:0
user error (HUnit:Minimal program
expected: Just "main"
but got: Nothing)
Cases: 4 Tried: 4 Errors: 1 Failures: 0
To sum up the various ways to construct test cases, group them by test lists,
and label them, here is the complete Test data type from HUnit:
data Test = TestCase Assertion
          | TestList [Test]
          | TestLabel String Test
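Because Test is an ordinary algebraic data type, a test suite is just a tree that can be traversed like any other value. As a small illustration, here is a sketch of a function that counts the test cases in such a tree. The name countCases and the sample tree are made up for this example.

```haskell
module Main where

import Test.HUnit

-- Hypothetical helper: count the TestCase leaves in a test tree.
countCases :: Test -> Int
countCases (TestCase _)    = 1
countCases (TestList ts)   = sum (map countCases ts)
countCases (TestLabel _ t) = countCases t

-- A tree with the same shape as the grouped tests above, but with
-- trivial assertions instead of the real ones.
sample :: Test
sample = TestList
  [ TestLabel "Border test cases" (TestList [trivial, trivial, trivial])
  , TestLabel "Simple, but serious cases" (TestList [trivial])
  ]
  where
    trivial = TestCase (return ())

main :: IO ()
main = print (countCases sample)  -- prints 4
```

The number matches the "Cases: 4" that the test runner reports for the grouped tests.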
Well, sooner or later we will have to implement the actual code that
makes our test cases happy. I will not describe this directly in this post,
but if you are interested you can
download a bunch of source files with an implementation of the
FindIdentifier module. In what follows, I will assume an
implementation like that. (The download file also contains the final
testing module that contains all the code snippets quoted in this post.)
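For readers who want something to run the tests against right away, here is a deliberately small sketch that satisfies the test cases shown in this post. It is not the implementation from the download. It only knows about line comments starting with --, simple string literals without escaped quotes, and a short list of keywords; the helper names identifiersIn and keywords are made up for this example.

```haskell
module Main where

import Data.Char (isAlpha, isAlphaNum)

-- Sketch only: returns the identifier that the cursor touches, if any.
-- A cursor directly before or directly after an identifier counts as a hit.
findIdentifier :: String -> (Int, Int) -> Maybe String
findIdentifier src (line, col)
  | line < 1 || col < 1 = Nothing
  | otherwise =
      case drop (line - 1) (lines src) of
        []         -> Nothing
        (text : _) ->
          case [ w | (start, w) <- identifiersIn text
                   , start <= col, col <= start + length w
                   , w `notElem` keywords ] of
            (w : _) -> Just w
            []      -> Nothing

-- Only a handful of keywords, enough for the examples in this post.
keywords :: [String]
keywords =
  [ "case", "class", "data", "do", "else", "if", "import", "in"
  , "instance", "let", "module", "newtype", "of", "then", "type", "where" ]

-- All identifier-like words of one line with their 1-based start columns.
identifiersIn :: String -> [(Int, String)]
identifiersIn = go 1
  where
    go _ [] = []
    go _ ('-' : '-' : _) = []          -- line comment: ignore the rest
    go c ('"' : rest) =                -- string literal: skip to closing quote
      let (body, after) = break (== '"') rest
      in  go (c + length body + 2) (drop 1 after)
    go c s@(x : xs)
      | isAlpha x || x == '_' =
          let (w, rest) = span isIdentChar s
          in  (c, w) : go (c + length w) rest
      | otherwise = go (c + 1) xs
    isIdentChar ch = isAlphaNum ch || ch == '_' || ch == '\''

main :: IO ()
main = do
  print (findIdentifier "main = print 42" (1, 2))  -- Just "main"
  print (findIdentifier "-- a" (1, 3))             -- Nothing
```

A real implementation would need a proper lexer (block comments, character literals, operators that merely start with two dashes, and so on), which is exactly the sort of thing further test cases would bring to light.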
Each of our test cases so far consisted of just a single assertion. This need not be so; you can also put several assertions into a single test case. Typically, you will want to do so if you want to assert several things about the result of a computation. For example, if you are testing a parser, you may want to run it over different inputs, and that makes different test cases. But after a parse you may want to assert many things about the result, e.g. that the parsed data tree has a certain size, that some list has a given number of elements, that the first one is equal to a certain value etc.
Another scenario is where you want to execute a sequence of steps during the test case, each of which results in an intermediate result that you want to make assertions about. In that case you would also want to do multiple assertions inside one test case.
In our simple scenario, I couldn't find a really good example for this, so I'm just making something up. (By this I don't mean that the test case and assertions I'm going to describe in this section are useless; however, in reality one would probably prefer to write multiple test cases here instead of just one with multiple assertions. I'm doing the latter only because multiple assertions are just what I intend to demonstrate :-).
Have a look at the following test case. The input is a small module with a data type declaration. The specified cursor position is such that the cursor is located in the middle of the identifier. We thus expect to get the identifier back as a string.
testData = TestCase $ assertEqual
  "Data declaration"
  ( Just "Bli" )
  ( findIdentifier "main = print 42\ndata Bli = Bla | Blubb" (2, 7) )
Now we are certainly not interested in testing this for each possible cursor position, but there are two cases that seem to be a good idea to check. One is a cursor that is positioned right before the identifier, and the other is one that is positioned right after it. So in this case, we could do multiple assertions in one test case, thereby re-using the input string.
testData = TestCase $ do
  let code = "main = print 42\ndata Bli = Bla | Blubb"
  assertEqual
    "Data declaration - on identifier"
    ( Just "Bli" )
    ( findIdentifier code (2, 7) )
  assertEqual
    "Data declaration - before identifier"
    ( Just "Bli" )
    ( findIdentifier code (2, 6) )
  assertEqual
    "Data declaration - after identifier"
    ( Just "Bli" )
    ( findIdentifier code (2, 9) )
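One thing to keep in mind with several assertions in one test case: they run in order, and the first one that fails ends the test case, so the later ones are not checked in that run. If every cursor position should be reported separately, the same input can be shared between several small test cases instead. A minimal sketch of that alternative follows; the findIdentifier defined here is only a placeholder so that the block compiles on its own, and the name cursorCases is made up.

```haskell
module Main where

import Test.HUnit

-- Placeholder so that this block is self-contained; the real function
-- lives in the FindIdentifier module.
findIdentifier :: String -> (Int, Int) -> Maybe String
findIdentifier _ _ = Just "Bli"

code :: String
code = "main = print 42\ndata Bli = Bla | Blubb"

-- One labelled test case per cursor position, all sharing the same input.
cursorCases :: Test
cursorCases = TestLabel "Data declaration" $ TestList
  [ TestLabel name $ TestCase $
      assertEqual name (Just "Bli") (findIdentifier code pos)
  | (name, pos) <- [ ("on identifier",     (2, 7))
                   , ("before identifier", (2, 6))
                   , ("after identifier",  (2, 9)) ]
  ]

main :: IO ()
main = do
  _ <- runTestTT cursorCases
  return ()
```

With this layout a wrong result for one position shows up as one failed case out of three, labelled with the position's name.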
In some cases you might want to assert that a certain function call must not
succeed, but throw an error. Remember that we have specified, via one of our
test cases above, that passing a negative cursor position should result in
Nothing? Suppose we wanted instead to make sure that a
specific exception is thrown from the findIdentifier function.
There are two things about this that we have to make sure of. First, we must
catch the exception that we expect so that it does not create a
test-failure-by-error. (If tested code throws an error, then the test is
considered as failed by HUnit, naturally - it can't know that the error is
desired in this case :-). So if we bring our tested code into a situation where
it correctly fails, our test case must actually succeed. Conversely (and this
is the second point) we must make sure that we get a test failure when the code
should break but doesn't. In our example, if we let findIdentifier
run with a negative cursor position and there is no error thrown, then our
implementation does not behave as it should, and we therefore want to see a
test failure.
Here's how we do it:
testNegCursor = TestCase $ do
  handleJust errorCalls (\_ -> return ()) performCall where
    performCall = do
      evaluate ( findIdentifier "a" (-1,-1) )
      assertFailure "Cursor position (-1,-1) must throw an error"
Our call to findIdentifier is forced to be evaluated using the
evaluate function from
Control.Exception (you have to import Control.Exception for
this test case to compile). In the case where an exception occurs, it will
be 'handled' by handleJust. The handling itself is trivial - we
do nothing, because we just wanted to make sure that the exception occurs. (By
the way, the code as it stands here will only handle calls to the
error function from the Prelude, as you can tell from
the occurrence of errorCalls. Any other exceptions will still get
through to HUnit, which will count them as breaking the test.)
On the other hand, if there is no exception during the execution of
findIdentifier, then nothing will be handled. In that case, the next
statement will be executed. Since we expected an exception, we know then that
the test has to fail - so we call assertFailure. This will just
cause the test case to fail unconditionally.
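The errorCalls selector used above belongs to the Control.Exception API as it was when this post was written. As far as I know, later versions of the base library replaced it with typed exceptions such as ErrorCall. A minimal sketch of the same test against that newer API could look as follows. The function findIdentifierStrict is a made-up stand-in that calls error on a cursor position out of range; it is not part of the FindIdentifier module.

```haskell
module Main where

import Control.Exception (ErrorCall, evaluate, try)
import Test.HUnit

-- Hypothetical stand-in for a findIdentifier variant that rejects
-- out-of-range cursor positions with a call to error.
findIdentifierStrict :: String -> (Int, Int) -> Maybe String
findIdentifierStrict _ (line, col)
  | line < 1 || col < 1 = error "cursor position out of range"
  | otherwise           = Nothing

testNegCursor :: Test
testNegCursor = TestCase $ do
  -- evaluate forces the call; try catches only ErrorCall exceptions.
  result <- try (evaluate (findIdentifierStrict "a" (-1, -1)))
  case (result :: Either ErrorCall (Maybe String)) of
    Left _  -> return ()  -- the expected error occurred: test succeeds
    Right _ -> assertFailure "Cursor position (-1,-1) must throw an error"

main :: IO ()
main = do
  _ <- runTestTT testNegCursor
  return ()
```

As in the version above, only calls to error are caught. Any other exception still reaches HUnit, and the assertFailure call sits outside the part that try protects, so a missing error is reported as a failed test.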
The HUnit user guide makes a good companion (and describes a few details that I have left out, for instance some overloaded operators that can be used to write assertions in a very compact notation). For more general information about unit testing, junit.org is a good starting point.
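To give an impression of the compact operator notation: ~: attaches a label to a test, ~?= builds a test that compares an actual value (on the left) with an expected value (on the right), and @?= is the corresponding assertion. The following is only a small sketch with made-up example values, not a replacement for the user guide.

```haskell
module Main where

import Test.HUnit

tests :: Test
tests = "operator notation" ~: TestList
  [ "addition" ~: (1 + 1 :: Int) ~?= 2
  , "reverse"  ~: reverse "abc" ~?= "cba"
    -- @?= yields an Assertion, which is wrapped in a TestCase as before.
  , "assertion inside a test case" ~: TestCase (length "abc" @?= 3)
  ]

main :: IO ()
main = do
  _ <- runTestTT tests
  return ()
```

The result is an ordinary Test value again, so it can be mixed freely with tests written using the TestCase, TestList and TestLabel constructors.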
As I already mentioned above, there is a second standard testing framework for Haskell: QuickCheck . You'll find more information about QuickCheck on its homepage.
I'd like to add a few remarks about my own practice and views here. I have been using test-driven development for years, although not exclusively. I consider it as one coding style (out of many), and I think it is a good skill to be able to work in that style. It generally makes me feel much better about my code if I have developed it that way. True, probably the code quality (maintainability) is better that way, but for me, the psychological effect is perhaps even more important as motivator :-)
But there are situations where a different way of getting code written (or getting old code fixed) is in order - usually when there is not much time before a deadline, or when the code in question is old and its parts strongly entangled, or the code runs only after complicated initialisations are performed. In these situations writing unit tests is more costly than would be acceptable. Making it a condition to have them will either impose unreasonable (for that situation) costs, or it will lead to token tests - tests which create the impression of well-tested code but don't actually do sensible tests (e.g. they cover hundreds of unimportant, but easy to produce, peripheral situations). Unfortunately there is a trend to make a certain test coverage a condition of delivering any developed code - which makes writing test cases a hated, necessary task, and results (unsurprisingly) in bad testing code, adding to the amount of code, but only seemingly increasing the overall quality.
As always, good judgement is a key to writing test cases that are worth the investment. Good judgement, however, is hardly encoded in test coverage percentages or policies that force developers to produce test cases by number. Instead, it makes more sense to view writing test cases as an additional means to communicate something about your program. (Additional, because there should already be some communication going on in the code itself - the more self-explaining it is, the better.) A test case can be read, by others who have to deal with your code later, or even by your own later self, as a straightforward explanation of how your code is expected to behave in certain situations; it also says something about the sort of situations one should keep an eye on.
I found, years ago, a nice formulation somewhere (I've forgotten where): In running a computer program, we are letting a piece of the programmer's past mind do things for us that we would not know how to do. If we extend this to things that we have forgotten how to do, things that are too tedious to be done over and over again and things that are so complicated that we are prone to forget a detail or two of them often enough, then this applies wonderfully to test cases. It is often our own past mind that we employ here; and indeed, isn't this a great re-use that we can make of our past minds?