Command line options in Haskell

Leif Frenzel, March 2007

(This was originally intended as a post to my blog at http://cohatoe.blogspot.com, but I couldn't trick their in-browser WYSIWYG editor into formatting the Haskell code as I wanted it. So I put it on a separate page. There is still a post, however, where you can contact me or comment on this entry. Any feedback is welcome :-)

I'm currently writing a small command-line application, and one of the tasks that needs to be done in that context is the handling of command line options. So I've learned the basics of handling command line options in Haskell, and I'm logging my results here. Nothing in this post is my own idea; I have just collected what I found when looking around :-)

(If this seems all too basic, you can just scroll down to the last passages; the snippets are cumulative, and the last few contain pretty much all that one needs.)

Getting the program arguments

First of all, how can we get the arguments that were passed to our program? They come as a list of Strings from the standard getArgs function in the System.Environment module, thus

module Main( main ) where

import System.Environment( getArgs )

main = do
  args <- getArgs
  print args

gives us an output like:

>main.exe bli bla blubb
["bli","bla","blubb"]

(Note that the name of the executable itself is not part of this list - there is a separate function System.Environment.getProgName for this, if you need it.)
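To illustrate the note above, here is a minimal sketch that prints the program name next to the arguments. The helper describe is a made-up name for this example; getArgs and getProgName are taken from System.Environment, which is where current GHC versions provide them.

```haskell
import System.Environment( getArgs, getProgName )

-- hypothetical helper: renders the program name and its arguments as one line
describe :: String -> [String] -> String
describe name args =
  name ++ " was called with " ++ show (length args)
       ++ " argument(s): " ++ unwords args

main :: IO ()
main = do
  name <- getProgName   -- the executable's name, not part of getArgs
  args <- getArgs       -- everything the user typed after the name
  putStrLn $ describe name args
```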

Recognizing program arguments as options

Now we want to make something out of the program arguments, i.e. recognize them as valid options, such as -V or --version, or even as options that require additional arguments, e.g. --outfile with some file name after it. This is supported by a library called System.Console.GetOpt in the base package of the standard libs. It provides

getOpt :: ArgOrder a -> [OptDescr a] -> [String] -> ([a], [String], [String])

Looks complicated? Let's go through it one-by-one.

The function takes three arguments and returns a triple. The interesting bits are those which the type parameter a stands for - these represent the options as recognized by us. The lists of strings in the result, on the other hand, hold what could not be recognized as an option, plus any error messages.

By the first argument, of type ArgOrder a, we specify how getOpt should treat arguments that are not options (basically: may options and other arguments be mixed freely, or does option processing stop at the first argument that is not an option?). To keep things simple, let us just pass RequireOrder, which means that nothing after the first non-option argument is treated as an option anymore.

As the second argument we have to pass a description of the options we want to be recognized. That's a list of OptDescrs.

The third argument to getOpt, finally, is the list of strings that we received from getArgs, that is, the list of actual program parameters that the user has entered on the command line.

In the returned triple, the first element is the list of options that were actually recognized. We'll look closer at it in a moment; for now, let us just take the length of that list as an indicator of what happens in our program.

Fine, so let us create a skeleton that compiles:

module Main( main ) where

import System.Environment( getArgs )
import System.Console.GetOpt

main = do
  args <- getArgs
  let ( flags, nonOpts, msgs ) = getOpt RequireOrder options args
  print $ length flags

-- as a placeholder, we have an empty list of option descriptions here
options :: [OptDescr a]
options = []

When we run this, we get the length of the list of recognized options, that is

>main.exe
0

>main.exe bla
0

>main.exe -X
0

because we don't have any options that can be recognized so far ;-) So let's add one in the next step. The simplest case would be something like the --version option. So we introduce a data declaration and an option description for it:

module Main( main ) where

import System.Environment( getArgs )
import System.Console.GetOpt

main = do
  args <- getArgs
  let (flags, nonOpts, msgs) = getOpt RequireOrder options args
  print $ length flags
  
data Flag = Version

options :: [OptDescr Flag] 
options = [ Option ['V'] ["version"] (NoArg Version) "show version number" ]

First of all, note that we have specified [OptDescr Flag] as the type of our options list now. That way, we have made our options handling code more type-safe. We will now get from getOpt a list of Flags, not a 'list of whatever' anymore. We have also introduced a single constructor Version for Flag. In the options list, we have specified the Version option, with its short and long option name, a descriptive text, and with NoArg, which is an argument description that means: 'there is no further argument to this option'. Voila.

>main.exe
0

>main.exe bla
0

>main.exe -V
1

>main.exe --version
1

>main.exe -X
0
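In getOpt, the ArgOrder argument decides what happens to options that come after the first non-option argument. The following sketch contrasts RequireOrder with Permute on the same input. The helper parseWith is a made-up name for this illustration, and the deriving clause is only added so that the result can be printed.

```haskell
import System.Console.GetOpt

data Flag = Version deriving (Eq, Show)

options :: [OptDescr Flag]
options = [ Option ['V'] ["version"] (NoArg Version) "show version number" ]

-- hypothetical helper: run getOpt with the given ordering and keep only
-- the recognized flags and the non-option arguments
parseWith :: ArgOrder Flag -> [String] -> ([Flag], [String])
parseWith order args =
  let (flags, nonOpts, _msgs) = getOpt order options args
  in  (flags, nonOpts)

main :: IO ()
main = do
  let args = ["bla", "-V"]
  -- RequireOrder stops at "bla", so "-V" comes back as a plain argument
  print $ parseWith RequireOrder args   -- ([],["bla","-V"])
  -- Permute keeps looking for options after "bla"
  print $ parseWith Permute args        -- ([Version],["bla"])
```

If your program should accept options anywhere on the command line, Permute is probably the ordering you want; the examples in this post keep using RequireOrder.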

Handling invalid input

We should now handle the cases where the user supplies wrong options and give a friendly message. In the example above, calling main.exe with an argument that is not an option should give us an error message, and calling it with an unrecognized option should give us an error message followed by a usage message.

Remember that we have not yet really discussed what all the stuff in the triple returned by getOpt actually is? The time for doing that has come now. Have a look once more at the signature of getOpt:

getOpt :: ArgOrder a -> [OptDescr a] -> [String] -> ([a], [String], [String])

In the returned triple, [a] is the list of recognized options. Since we are actually passing our options list to getOpt, which is of type [OptDescr Flag], this means that we have the actual type Flag substituted for the type variable a here, and therefore we get [Flag] as first element in that triple.

The second element is the list of command line arguments that are not options. So in our example, when we pass 'bla' to our program, we should expect a one-element list here as second element.

The third element is a list of error messages. Errors happen for instance if the user enters something that looks like an option, but could not be recognized, such as -X in our example.

module Main( main ) where

import System.Environment( getArgs )
import System.Console.GetOpt

main = do
  args <- getArgs
  case getOpt RequireOrder options args of
    (flags, [],      [])     -> print $ length flags
    (_,     nonOpts, [])     -> error $ "unrecognized arguments: " ++ unwords nonOpts
    (_,     _,       msgs)   -> error $ concat msgs ++ usageInfo header options

data Flag = Version

options :: [OptDescr Flag]
options = [ Option ['V'] ["version"] (NoArg Version) "show version number" ]

header = "Usage: main [OPTION...]"

Now we get:

>main.exe
0

>main.exe bli bla blubb
main.exe: unrecognized arguments: bli bla blubb

>main.exe -V
1

>main.exe --version
1

>main.exe -X
main.exe: unrecognized option `-X'
Usage: main [OPTION...]
-V  --version  show version number
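Calling error works for a small tool, but it goes through the exception machinery. One possible alternative, sketched here under the assumption that a plain message on stderr and a non-zero exit code are what we want, is to keep the parsing pure and leave the reporting to main. The name parseArgs is made up for this sketch.

```haskell
import System.Console.GetOpt
import System.Environment( getArgs )
import System.Exit( exitFailure )
import System.IO( hPutStr, stderr )

data Flag = Version deriving (Eq, Show)

options :: [OptDescr Flag]
options = [ Option ['V'] ["version"] (NoArg Version) "show version number" ]

header :: String
header = "Usage: main [OPTION...]"

-- hypothetical helper: Left carries a message for the user, Right the flags
parseArgs :: [String] -> Either String [Flag]
parseArgs args =
  case getOpt RequireOrder options args of
    (flags, [],      [])   -> Right flags
    (_,     nonOpts, [])   -> Left $ "unrecognized arguments: " ++ unwords nonOpts ++ "\n"
    (_,     _,       msgs) -> Left $ concat msgs ++ usageInfo header options

main :: IO ()
main = do
  args <- getArgs
  case parseArgs args of
    Right flags -> print $ length flags
    Left msg    -> hPutStr stderr msg >> exitFailure
```

A side effect of this split is that parseArgs can be tried out in GHCi without touching the real command line.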

More than one option

In a more realistic program, we would have more than just one option, and probably some of them require (and some of them optionally accept) option parameters, typically file names. So let us add two more options to our Flag data type and our options list:

module Main( main ) where

import System.Environment( getArgs )
import System.Console.GetOpt
import Data.Maybe( fromMaybe )

main = do
  args <- getArgs
  case getOpt RequireOrder options args of
    (flags, [],      [])     -> print $ length flags
    (_,     nonOpts, [])     -> error $ "unrecognized arguments: " ++ unwords nonOpts
    (_,     _,       msgs)   -> error $ concat msgs ++ usageInfo header options

data Flag = Version | Input String | Output String

options :: [OptDescr Flag]
options = [
    Option ['V'] ["version"] (NoArg Version)            "show version number",
    Option ['i'] ["input"]   (ReqArg Input "FILE")      "some option that requires a file argument",
    Option ['o'] ["output"]  (OptArg makeOutput "FILE") "some option with an optional file argument"
  ]

-- one possibility for handling optional file args:
-- if no file is provided as argument, write to stdout
makeOutput :: Maybe String -> Flag
makeOutput ms = Output ( fromMaybe "stdout" ms )

header = "Usage: main [OPTION...]"

I'm skipping the output of the main.exe for the various inputs this time - you can try it yourself :-)
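For those who would rather read than try: the following sketch feeds a few hand-written argument lists to getOpt and prints what comes back. It assumes the three options from above; the helper parse, the deriving clause and the placeholder name "stdout" for a missing output file are choices made for this illustration only.

```haskell
import System.Console.GetOpt
import Data.Maybe( fromMaybe )

data Flag = Version | Input String | Output String
  deriving (Eq, Show)

options :: [OptDescr Flag]
options =
  [ Option ['V'] ["version"] (NoArg Version)            "show version number"
  , Option ['i'] ["input"]   (ReqArg Input "FILE")      "input file"
  , Option ['o'] ["output"]  (OptArg makeOutput "FILE") "output file"
  ]

makeOutput :: Maybe String -> Flag
makeOutput ms = Output (fromMaybe "stdout" ms)

-- hypothetical helper: recognized flags and leftover non-option arguments
parse :: [String] -> ([Flag], [String])
parse args =
  let (flags, nonOpts, _msgs) = getOpt RequireOrder options args
  in  (flags, nonOpts)

main :: IO ()
main = mapM_ (print . parse)
  [ ["-i", "in.txt"]      -- required argument as a separate word
  , ["--input=in.txt"]    -- required argument attached with '='
  , ["-oout.txt"]         -- optional argument attached to the short option
  , ["--output=out.txt"]  -- optional argument attached with '='
  , ["-o", "out.txt"]     -- "out.txt" is NOT taken as the argument of -o
  ]
```

The last case is the one to watch out for: an optional argument has to be attached directly to its option (-oFILE or --output=FILE), otherwise the next word is treated as an ordinary argument.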

Towards a higher level

So far we have been dealing with pretty low-level stuff. We still end up with a list of recognized options that we have to process. How are we going to do that? So far we have just printed the length of that list, but clearly we would want to do something more useful. Are we on our own from that point onwards? Or are there some more clever techniques that can help us further?

One suggestion comes from a post by Tomasz Zielonka to the Haskell mailing list (see also this tutorial). What follows is based on his suggestion in that post and the follow-up discussion.

Instead of modeling the known options as in the Flag data type, what we would like to use in the main program is rather some easily accessible, well-typed data structure that holds the actual values, or even some functions or actions based on them. So we first re-work the Flag data type into an Options type with record fields:

data Flag = Version | Input String | Output String

becomes

data Options = Options  {
    optInput  :: IO String,
    optOutput :: String -> IO ()
  }

This will hold the recognized options. It does not contain a field for the -V parameter, btw., because that is not needed - if this parameter is specified, we will not store the command line parameter value at all, but just print a version and quit. (We'll see in a minute where that is done.)

Next, we specify a record of default options:

defaultOptions :: Options
defaultOptions = Options {
    optInput  = getContents,
    optOutput = putStr
  }

This is where it starts to become really cool. We have specified a data type for our options as a record with two members: one for the -i option, which represents some input file, and one for the -o option, which represents an output file. (The example has changed a bit here: in the variant above the parameter for -o was optional, from now on both options require a file argument.) In the default options, we have some sensible functionality that is executed by default: reading input from the standard input (that is what getContents does), and writing to standard output (putStr). So for the case where no options are supplied, the record already holds the actions that we can use in our program.

What if the user specifies these options on the command line? In this case they are recognized by getOpt, and all we have to do is to supply some actions for that case too:

options :: [OptDescr (Options -> IO Options)]
options = [
    Option ['V'] ["version"] (NoArg showVersion)         "show version number",
    Option ['i'] ["input"]   (ReqArg readInput "FILE")   "some option that requires a file argument",
    Option ['o'] ["output"]  (ReqArg writeOutput "FILE") "another option that requires a file argument"
  ]

showVersion :: Options -> IO Options
showVersion _ = do
  putStrLn "Commandline example 0.1"
  exitWith ExitSuccess

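-- the String is the file name that getOpt found after -i
readInput :: String -> Options -> IO Options
readInput arg opt = return opt { optInput = readFile arg }

-- the String is the file name that getOpt found after -o
writeOutput :: String -> Options -> IO Options
writeOutput arg opt = return opt { optOutput = writeFile arg }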

Thus if we recognize the -V option, we supply an action that just prints a version and exits (the showVersion function). If an input file name is given, we create an action (readInput) and put it into the options record, where it can then be used in the main program. If no input file is given, that is, if the -i option has not been used by the user, then our options record will fall back to the default action, which is, as we have seen above, reading from standard input. For writing to the output, we provide writeOutput in an analogous manner.

The only thing that remains to be done is to tie the two ends together. We have our program to start out with the default options, recognize the actually provided options and apply them to our options record accordingly, and then do something useful with them. Since we have an action associated with each of the options, we can just execute them as their corresponding option is recognized from the command line. Here is the complete example:

module Main( main ) where

import System.Environment( getArgs )
import System.Exit( exitWith, ExitCode(..) )
import System.Console.GetOpt
import Data.Maybe( fromMaybe )

main = do
  args <- getArgs
  let ( actions, nonOpts, msgs ) = getOpt RequireOrder options args
  opts <- foldl (>>=) (return defaultOptions) actions
  let Options { optInput = input,
                optOutput = output } = opts
  input >>= output

data Options = Options  {
    optInput  :: IO String,
    optOutput :: String -> IO ()
  }

defaultOptions :: Options
defaultOptions = Options {
    optInput  = getContents,
    optOutput = putStr
  }

options :: [OptDescr (Options -> IO Options)]
options = [
    Option ['V'] ["version"] (NoArg showVersion)         "show version number",
    Option ['i'] ["input"]   (ReqArg readInput "FILE")   "input file to read",
    Option ['o'] ["output"]  (ReqArg writeOutput "FILE") "output file to write"
  ]

showVersion _ = do
  putStrLn "Commandline example 0.1"
  exitWith ExitSuccess

readInput arg opt = return opt { optInput = readFile arg }
writeOutput arg opt = return opt { optOutput = writeFile arg }

What happens here is that our record of default options is filled in, one by one, with the actions that belong to the options actually supplied. After this, everything we need is contained in opts. In this example, the input is just streamed into the output: if you run the executable, passing some file names, the input file content is copied into the output file. Of course, that's where your logic has to come in and do something, presumably something different :-)
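Because the Options record holds IO actions, it is hard to look at what the fold has produced. The following sketch shows the same idea with a simplified stand-in: a record of plain values and pure update functions, folded with foldl (flip id) instead of foldl (>>=). The names Settings, defaultSettings and settingsFrom are made up for this illustration. Unlike the program above, the sketch also looks at the non-options and error messages instead of dropping them.

```haskell
import System.Console.GetOpt

-- hypothetical, simplified stand-in for the Options record: plain values
-- instead of IO actions, so that the result can be printed
data Settings = Settings
  { setInput  :: Maybe FilePath
  , setOutput :: Maybe FilePath
  } deriving (Eq, Show)

defaultSettings :: Settings
defaultSettings = Settings { setInput = Nothing, setOutput = Nothing }

-- each option contributes a function that updates the record
options :: [OptDescr (Settings -> Settings)]
options =
  [ Option ['i'] ["input"]  (ReqArg (\f s -> s { setInput  = Just f }) "FILE") "input file to read"
  , Option ['o'] ["output"] (ReqArg (\f s -> s { setOutput = Just f }) "FILE") "output file to write"
  ]

-- apply all recognized update functions to the defaults, left to right;
-- anything getOpt could not handle is reported as Left
settingsFrom :: [String] -> Either String Settings
settingsFrom args =
  case getOpt RequireOrder options args of
    (updates, [],      [])   -> Right $ foldl (flip id) defaultSettings updates
    (_,       nonOpts, [])   -> Left $ "unrecognized arguments: " ++ unwords nonOpts
    (_,       _,       msgs) -> Left $ concat msgs

main :: IO ()
main = do
  print $ settingsFrom []
  print $ settingsFrom ["-i", "a.txt", "-o", "b.txt"]
  print $ settingsFrom ["-i", "a.txt", "-i", "c.txt"]  -- the later -i wins
  print $ settingsFrom ["-X"]                          -- reported as Left
```

The fold starts with the defaults and hands the record from one update function to the next, which is what foldl (>>=) does in the IO version as well.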