
GEONius.com
22-May-2016

Enhanced Command Line Processing

Published in The C Users Journal, June 1991


Command-line processing doesn't seem like a terribly important subject for an article, but option handling was often done in an ad hoc manner back then. Several people contacted me by phone to request a copy of getopt.c on floppy disk, and the function was used in a COBOL-to-C translator (Wayback Machine).

The article was probably written in 1989 or 1990. I remember finding a bookstore that had the AT&T System V manuals and studying up, in the store, on the AT&T Command Syntax Standard—an impressive title for such a minimal standard!

I had a companion function, sgetopt(), that parsed the command-line options from a string. Our software ran on both UNIX and VxWorks. From the VxWorks command line, it was simplest to spawn a program and pass it the command-line arguments in a string, 'sp program, "option(s)"', and then use sgetopt() to parse the options. In 1992, all this was superseded by my opt_util package, which handles full-word (possibly abbreviated) command-line options as well as single-letter options, and can also break a string into an argc/argv[] array of arguments.
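
Breaking a string into words for an argc/argv-style scan can be sketched roughly as follows. This is not the sgetopt() or opt_util code, which is not shown here; the function name split_args and its parameters are hypothetical, and the sketch assumes whitespace-separated words with no quoting.

```c
#include <string.h>

/* Hypothetical helper, for illustration only.  Split LINE in place into
   whitespace-separated words.  Pointers to the words are stored in WORDS
   (at most MAX_WORDS of them) and the number of words is returned.
   Quoted strings are not handled in this sketch. */
int split_args(char *line, char *words[], int max_words)
{
    int count = 0;
    char *word = strtok(line, " \t");       /* first word, if any */

    while (word != NULL && count < max_words) {
        words[count++] = word;
        word = strtok(NULL, " \t");         /* next word */
    }
    return count;
}
```

Because strtok(3) writes NUL terminators into the string, the caller must pass a modifiable buffer rather than a string literal. A caller that wants a conventional argv[] would also put the program name in slot 0 ahead of these words.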

UNIX command lines, with all their dots and dashes, sometimes approach Morse code in their unreadability; nevertheless, UNIX's concise method of specifying and passing parameters to programs has proven very user-friendly to frequent UNIX users. Unfortunately, the proliferation of UNIX programmers produced a multitude of command line processing styles.

The AT&T Command Syntax Standard and the Standard C Library function, getopt(3)*, attempted to bring some order into this chaos by providing a consistent, easily-programmed, command line structure. Even more generalized and powerful command line processing can be achieved by broadening the command syntax standard and implementing an enhanced, portable version of getopt(3).

(Note: The "3" in getopt(3) indicates the UNIX manual, volume 3 in this case, that documents the getopt function. In this article, getopt(3) refers to the standard UNIX implementation of getopt; getopt() with no number refers to my enhanced version of the same function.)

The new getopt() should be useable on any system, UNIX or non-UNIX, that supports C and the argc/argv program interface. It has been successfully compiled and tested on a Sun workstation (UNIX), an IBM PC RT (AIX), and an IBM PC XT (MS-DOS and Turbo-C).

AT&T Command Syntax Standard

The AT&T Command Syntax Standard specifies that commands will adhere to the following basic format:

             command_name  [options]  [other_arguments]

Command_name identifies the program to be executed to the operating system; options and other_arguments are both optional.

Options are introduced by hyphens ("-") and can be one of two types: single-letter options and argument options. Single-letter options are simply that; for example, our C compiler has options "-g" for debug, "-c" for object module generation, etc.

An argument option is a single-letter option followed by exactly one argument; the argument that follows the option letter is not optional! Our C compiler also uses argument options, e.g., a "-o filename" option to specify an output file name and a "-Ddefinition" option that defines symbols for the C Pre-Processor. The argument may be flush up against the option letter or separated from it by white space (blanks and tabs); the latter is preferred (and mandated by the standard).

Options may be grouped after a single hyphen and their order is not important, so

                      -a -l -e -x -o filename

can be rewritten as "-alex -o filename", "-xeo filename -la", and so forth.

Other_arguments are non-option arguments, i.e., those arguments not prefixed with a hyphen and not associated with an argument option. (If you need to specify arguments beginning with hyphens, the special end-of-options indicator, "--", can be used to separate options from other_arguments.) Examples of non-option arguments include filenames, such as the list of ".o" object modules passed to the C compiler:

    cc -g -DCPU=4004 -o prog  main.o func1.o func2.o func3.o ...

and positional arguments, such as the source and destination files in a copy command:

    cp  source  destination

getopt(3)

The Standard C Library function, getopt(3), provides a simple mechanism for processing command line options. The inputs to getopt(3) are the argc count of command line arguments, the argv array of arguments, and an options string. The options string is composed of all the legal options recognized by a program; a colon (":") following an option letter indicates an argument option that expects an argument. The grouping/ordering example given earlier, "-alex -o filename", would be coded as "aelo:x"; "o" expects a filename and the others are single-letter options.

Each call to getopt(3) returns the next option letter from the command line, the index of the current argument (in global variable optind), and, for argument options, the option's argument (in global variable optarg). '?' is the option letter returned in the case of an illegal option; -1 is returned when there are no more options.

Programs that follow the AT&T Command Syntax Standard typically scan the command line options using getopt(3) and then manually increment optind through the non-option arguments that remain:

    while ((option = getopt (argc, argv, optstring)) != -1) {
        switch (option) {
        case 'a': ... ;  break ;		/* Options */
        case 'b': ... ;  break ;
        ...
        case '?': ... error ...
        default:  break ;
        }
    }
    while (optind < argc) {			/* Non-option arguments */
        ... process argv[optind++] ...
    }

Command Syntax Standard II

While the manual pages for getopt(3) don't explicitly reference the AT&T Command Syntax Standard, the usage notes constrain the programmer to complying with the standard and limit the utility of the function. The restrictions imposed by getopt(3)'s major shortcoming, the inability to manipulate global variable optind, point up the need for relaxing the command standard and writing a new, enhanced getopt().

The effect of modifying optind in between calls to getopt(3) is implementation-dependent and such changes are strongly discouraged. Consequently, the programmer has little freedom in altering getopt(3)'s argument scan. In particular, the user cannot alternate options and non-option arguments in a command and a program cannot make multiple passes over its arguments when processing a command line.

Allowing the user to mix options and arguments on the command line adds to the user-friendliness of a program. In the simplest case, it lets you easily recall a previous command and append a forgotten option. I frequently type in an option-laden command line to compile a program,

    cc -g -I/usr/alex/prog/include -o prog  prog.c

only to realize that I forgot a "-I" option for the project's include directory. Fortunately, our C compiler never heard of AT&T's Command Syntax Standard, so I don't need to retype the whole line. In UNIX,

    !! -I/usr/proj/include

will recall the previous command, append the missing include option, and resubmit the command to the operating system.

A more complex situation occurs with the UNIX ipcrm(1) command. ipcrm(1) has three options for deleting interprocess communication (IPC) resources: "-m id" for shared memories, "-q id" for message queues, and "-s id" for semaphores. Thanks to its strict adherence to the Command Syntax Standard, ipcrm(1) requires a lot of awkward typing to clean up a trail of IPC objects:

    ipcrm -m 200 -m 205 -m 307 -q 201 -q 11 -s 430 -s 431 -s 432

An improved ipcrm(1) would also have three options: "-m", "-q", and "-s". Each option specifies the type of the zero or more IPC identifiers that follow, resulting in a concise, easier-to-type command line:

    ipcrm -m 200 205 307 -q 201 11 -s 430 431 432
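
The scanning rule behind this improved syntax can be sketched with a hand-rolled loop. This is not the real ipcrm(1) source and it does not use getopt(); the ipc_request structure and the function name scan_ipc_ids are hypothetical, and identifiers are assumed to be plain decimal numbers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of one IPC object to be removed. */
struct ipc_request {
    char type;                  /* 'm', 'q', or 's'. */
    int id;                     /* IPC identifier. */
};

/* Each "-m", "-q", or "-s" sets the type that applies to the identifiers
   that follow it.  Returns the number of requests stored in REQ, or -1
   on a syntax error or if more than MAX requests are given. */
int scan_ipc_ids(int argc, char *argv[], struct ipc_request req[], int max)
{
    char type = '\0';           /* No type selected yet. */
    int i, count = 0;

    for (i = 1; i < argc; i++) {
        const char *arg = argv[i];

        if (arg[0] == '-') {    /* An option: must be -m, -q, or -s. */
            if (strlen(arg) != 2 || strchr("mqs", arg[1]) == NULL)
                return -1;
            type = arg[1];
        } else {                /* An identifier of the current type. */
            if (type == '\0' || count >= max)
                return -1;
            req[count].type = type;
            req[count].id = atoi(arg);
            count++;
        }
    }
    return count;
}
```

For the command line above, the function returns 8 requests: three shared memories, two message queues, and three semaphores.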

Applications that find it necessary to scan their command lines more than once are probably few and far between. One example that comes to mind is that of a print job spooler. A novelist, faced with the task of printing out 10 copies of his/her 30-chapter, 30-file book, would waste little time choosing between: (i) a print command that scanned the command line once, making 10 copies of each individual file it encountered, and (ii) a print command that scanned its command line 10 times to print out 10 collated copies of the entire set of files.
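
The difference between the two print commands comes down to loop order, as in the sketch below. Plain loops over a file list stand in for repeated scans of the command line; the function name print_order and its parameters are hypothetical.

```c
#include <stddef.h>

/* Fill ORDER with the sequence in which the files would be printed and
   return the number of print jobs.  ORDER must have room for
   NUM_FILES * COPIES entries. */
size_t print_order(const char *files[], size_t num_files, int copies,
                   int collate, const char *order[])
{
    size_t n = 0, f;
    int c;

    if (collate) {
        for (c = 0; c < copies; c++)            /* One pass per copy. */
            for (f = 0; f < num_files; f++)
                order[n++] = files[f];
    } else {
        for (f = 0; f < num_files; f++)         /* A single pass. */
            for (c = 0; c < copies; c++)
                order[n++] = files[f];
    }
    return n;
}
```

With two copies of files ch1, ch2, and ch3, the single-pass order is ch1 ch1 ch2 ch2 ch3 ch3, while the multi-pass order is ch1 ch2 ch3 ch1 ch2 ch3.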

Enhanced getopt()

Writing a new and improved version of getopt(3) reinforced two basic lessons of programming. First, a given function is usually less trivial than it first appears to be. Coding up the enhanced getopt() was not difficult, but close attention had to be paid to detail. Second, hindsight is an unflagging source of "better" ideas. Suggestions for improvement kept cropping up while programming getopt(), but things always look easier the second time around. Furthermore, compatibility considerations limited the extent of any changes.

The new getopt() is fully compatible with the original getopt(3), so no changes to existing software are required. New applications can take advantage of the getopt.h header file, which provides the external definitions for getopt() and its global variables. Also defined is a constant, NONOPT, for the non-option or end-of-options flag; this value is hardcoded in getopt(3) as -1.

The inputs to getopt() are identical to those of getopt(3); changes to global variable optind, however, will advance or reset the argument scan. The outputs of getopt() are functionally equivalent to those of getopt(3), although global variable optarg has acquired some additional meanings in certain cases. Table 1 compares the outputs of getopt(3) and getopt(). Optind, not shown, indexes the current command line argument in both versions of getopt.

                     Table 1: Outputs of getopt(3) and getopt()

            getopt(3)         getopt()
         option  optarg    option  optarg    Interpretation

         letter            letter   NULL     Single-letter option
         letter  string    letter  string    Option plus its argument
          '?'               '?'    error     Illegal option/missing argument
          -1               NONOPT  string    Non-option argument
          -1               NONOPT   NULL     Command line scan completed

In the case of the question mark ('?') option, getopt()'s optarg returns the trailing portion of the command line argument that contains the offending option. For example, if illegal option 'Q' is detected in "prog -abQcde", getopt() returns '?' with optarg set to "Qcde".
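
The getopt() column of Table 1 can be read as a decision on the returned option and optarg pair. The sketch below encodes that reading; the enum and the function name classify are hypothetical, and NONOPT is defined locally with the same value the article's getopt.h gives it.

```c
#include <stddef.h>

#define NONOPT  (-1)            /* Same value as in getopt.h. */

/* Hypothetical names for the five rows of Table 1. */
enum result {
    SINGLE_LETTER,              /* Letter, optarg is NULL. */
    OPTION_WITH_ARGUMENT,       /* Letter, optarg is the argument. */
    BAD_OPTION,                 /* '?', optarg is the offending text. */
    NON_OPTION,                 /* NONOPT, optarg is the argument. */
    SCAN_DONE                   /* NONOPT, optarg is NULL. */
};

/* Interpret one (option, optarg) pair as returned by the enhanced
   getopt(). */
enum result classify(int option, const char *arg)
{
    if (option == NONOPT)
        return (arg != NULL) ? NON_OPTION : SCAN_DONE;
    if (option == '?')
        return BAD_OPTION;
    return (arg != NULL) ? OPTION_WITH_ARGUMENT : SINGLE_LETTER;
}
```

The NONOPT test comes first because NONOPT with a NULL optarg is the only combination that ends the scan.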

The traditional getopt(3) approach to command line processing handled options and non-options in separate sections of code. The new method of command line processing uses the enhanced getopt() to scan the entire command line:

    while (((option = getopt (argc, argv, optstring)) != NONOPT) ||
           (optarg != NULL)) {
        switch (option) {
        case 'a': ...
        case 'b': ...
        ...
        case '?': ... error ...
        case NONOPT: ... process optarg ...
        default:  break ;
        }
    }

Thanks to the expanded role assumed by optarg, options and non-options alike can be processed within a single loop. A new NONOPT case in the switch statement picks up the non-option arguments returned in optarg. As an added bonus, optind is now only a vestige of the former getopt(3); unless you're doing multi-pass command line processing, optind can be dispensed with.

Example Usage

Using getopt() is fairly simple the first time and extremely simple afterwards - just cut and paste your original "template" and delete or add the appropriate options. To start you off, Listing 5 contains the command line processing code from ffc (Format File in Columns), a program that outputs one or more files in multiple columns. ffc is invoked as follows:

    ffc  [-c num] [-d] [-h num] [-l num]
         [-o output_file] [-p]  [input_file(s)]

The meanings of the options are explained in the prolog in Listing 5.

Options "-c", "-h", and "-l" each take a numeric argument; library function atoi(3) performs the text-to-integer conversion on the option argument returned in optarg. The "-o" option expects a file name; the character string pointer returned in optarg is simply saved in a local variable. "-d" and "-p" are switches that set boolean flags in the program. The non-option arguments are the input files; each file name encountered is added to the list of files to be processed. Note that the various page parameters, the flags, and the file table have to be properly initialized (e.g., in the variable declarations) before the command line is scanned.
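
One caveat is that atoi(3) gives no indication when the text is not a number. Where that matters, a stricter conversion built on strtol(3) could be substituted, as in the sketch below. This is not part of ffc; the function name parse_number is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert TEXT to an int and store it in VALUE.  Returns 0 on success,
   or -1 if TEXT is empty, has trailing junk, or is out of range;
   VALUE is left unchanged on failure. */
int parse_number(const char *text, int *value)
{
    char *end;
    long number;

    if (text == NULL || *text == '\0')
        return -1;

    errno = 0;
    number = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || number < INT_MIN || number > INT_MAX)
        return -1;

    *value = (int) number;
    return 0;
}
```

In the option switch, a failed conversion could be counted as an error in the same way as the '?' case.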

End-of-Options

Computer users and programmers alike should be indebted to the UNIX designers for developing an easy-to-use and easy-to-program command line interface. The enhancements suggested in this article detract in no way from the simplicity and power of the original Command Syntax Standard and getopt(3). They do, however, provide a portable, well-defined, command line processing function and, if you choose to use it, a more user-oriented command line syntax.

 

Function getopt()

/**************************************************************************

    getopt ()


    Function GETOPT gets the next option letter from the command line.
    GETOPT is an enhanced version of the C Library function, GETOPT(3).


    Invocation:

        option = getopt (argc, argv, optstring) ;

    where

        <argc>
            is the number of arguments in the argument value array.
        <argv>
            is the argument value array, i.e., an array of pointers to
            the "words" extracted from the command line.
        <optstring>
            is the set of recognized options.  Each character in the
            string is a legal option; any other character encountered
            as an option in the command line is an illegal option and
            an error message is displayed.  If a character is followed
            by a colon in OPTSTRING, the option expects an argument.
        <option>
            returns the next option letter from the command line.  If
            the option expects an argument, OPTARG is set to point to
            the argument.  '?' is returned in the cases of an illegal
            option letter or a missing option argument.  Constant NONOPT
            is returned if a non-option argument is encountered or the
            command line scan is completed (also see OPTARG below for
            both cases).


    Public Variables:

        OPTARG - returns the text of an option's argument or of a
            non-option argument.  NULL is returned if an option
            has no argument or if the command line scan is complete.
            For illegal options or missing option arguments, OPTARG
            returns a pointer to the trailing portion of the defective
            ARGV.

        OPTERR - controls whether or not GETOPT prints out an
            error message upon detecting an illegal option or
            a missing option argument.  A non-zero value enables
            error messages; zero disables them.

        OPTIND - is the index in ARGV of the command line argument
            that GETOPT will examine next.  GETOPT recognizes changes
            to this variable.  Arguments can be skipped by incrementing
            OPTIND outside of GETOPT and the command line scan can be
            restarted by resetting OPTIND to either 0 or 1.

**************************************************************************/


#include  <stdio.h>                     /* Standard I/O definitions. */
#define  USE_INDEX  0                   /* Set to 1 if your C Library uses
                                           "index" instead of "strchr". */
#if  USE_INDEX
#    include  <strings.h>               /* C Library string functions. */
#    define  strchr  index
#else
#    include  <string.h>                /* C Library string functions. */
#endif
#include  "getopt.h"                    /* GETOPT(3) definitions. */

                                        /* Public variables. */
char  *optarg = NULL ;
int  opterr = -1 ;
int  optind = 0 ;
                                        /* Private variables. */
static  int  end_optind = 0 ;
static  int  last_optind = 0 ;
static  int  offset_in_group = 1 ;



int  getopt (argc, argv, optstring)

    int  argc ;
    char  **argv ;
    char  *optstring ;

{    /* Local variables. */

    char  *group, option, *s ;



/* Did the caller restart or advance the scan by modifying OPTIND? */

    if (optind <= 0) {
        end_optind = 0 ;  last_optind = 0 ;  optind = 1 ;
    }
    if (optind != last_optind)  offset_in_group = 1 ;


/**************************************************************************

    Scan the command line and return the next option or, if none, the
    next non-option argument.  At the start of each loop iteration,
    OPTIND is the index of the command line argument currently under
    examination and OFFSET_IN_GROUP is the offset within the current
    ARGV string of the next option (i.e., to be examined in this
    iteration).

**************************************************************************/


    for (option = ' ', optarg = NULL ;
         optind < argc ;
         optind++, offset_in_group = 1, option = ' ') {

        group = argv[optind] ;

/* Is this a non-option argument?  If it is and it's the same one
   GETOPT returned on the last call, then loop and try the next
   command line argument.  If it's a new, non-option argument,
   then return the argument to the calling routine. */

        if ((group[0] != '-') ||
            ((end_optind > 0) && (optind > end_optind))) {
            if (optind == last_optind)  continue ;
            optarg = group ;        /* Return NONOPT and argument. */
            break ;
        }

/* Are we at the end of the current options group?  If so, loop and
   try the next command line argument. */

        if (offset_in_group >= strlen (group))  continue ;

/* If the current option is the end-of-options indicator, remember
   its position and move on to the next command line argument. */

        option = group[offset_in_group++] ;
        if (option == '-') {
            end_optind = optind ;   /* Mark end-of-options position. */
            continue ;
        }

/* If the current option is an illegal option, print an error message
   and return '?' to the calling routine. */

        s = strchr (optstring, option) ;
        if (s == NULL) {
            if (opterr)
                (void) fprintf (stderr, "%s: illegal option -- %c\n",
                                argv[0], option) ;
            option = '?' ;  optarg = &group[offset_in_group-1] ;
            break ;
        }

/* Does the option expect an argument?  If yes, return the option and
   its argument to the calling routine.  The option's argument may be
   flush up against the option (i.e., the argument is the remainder of
   the current ARGV) or it may be separated from the option by white
   space (i.e., the argument is the whole of the next ARGV). */

        if (*++s == ':') {
            if (offset_in_group < strlen (group)) {
                optarg = &group[offset_in_group] ;
                offset_in_group = strlen (group) ;
            }
            else {
                if ((++optind < argc) && (*argv[optind] != '-')) {
                    optarg = argv[optind] ;
                } else {
                    if (opterr)
                        (void) fprintf (stderr,
                              "%s: option requires an argument -- %c\n",
                                        argv[0], option) ;
                    option = '?' ;  optarg = &group[offset_in_group-1] ;
                    offset_in_group = 1 ;
                }
            }
            break ;
        }

/* It must be a single-letter option without an argument. */

        break ;

    }


/* Return the option and (optionally) its argument. */

    last_optind = optind ;
    return ((option == ' ') ? NONOPT : (int) option) ;

}
 

Header File getopt.h

#ifndef getopt_h_DEFINED
#define getopt_h_DEFINED


/**************************************************************************
    This INCLUDE file contains the external definitions for the GETOPT(3)
    function and its global variables.
**************************************************************************/


extern  int  getopt () ;        /* Function to get command line options. */

extern  char  *optarg ;         /* Set by GETOPT for options expecting
                                   arguments. */
extern  int  optind ;           /* Set by GETOPT: index of next ARGV to
                                   be processed. */
extern  int  opterr ;           /* Disable (== 0) or enable (!= 0) error
                                   messages written to standard error. */

#define  NONOPT  (-1)           /* Non-Option - returned by GETOPT when
                                   it encounters a non-option argument. */

#endif  /* getopt_h_DEFINED */
 

Command Line Processing Example

/**************************************************************************

    ffc.c

    Format File in Columns.

    Invocation:

        % ffc [-c num] [-d] [-h num] [-l num]
              [-o output_file] [-p]  [input_file(s)]

    where

        "-c num"
            specifies the number of columns (default = 2).
        "-d"
            turns debug output on.
        "-h num"
            specifies the number of blank lines at the top of each
            page (default = 0).
        "-l num"
            specifies the number of lines per page (default = 66).
        "-o output_file"
            specifies the name of the output file (default =
            standard output).
        "-p"
            invokes page numbering on output.
        input_file(s)
            are the files to be read and formatted in columns on
            output (default = standard input).


    Compilation:

        The program should be compiled and linked with the
        GETOPT function:

            % cc ffc.c getopt.c -o ffc

**************************************************************************/


#include  <stdio.h>                     /* Standard I/O definitions. */
#include  <stdlib.h>                    /* atoi(3) and exit(3). */
#include  "getopt.h"                    /* GETOPT(3) definitions. */

                                        /* List of input file names. */
#define  MAX_FILES  1024
static  char  *file_table[MAX_FILES] ;
static  int  num_input_files = 0 ;

                                        /* Page dimensions, etc. */
static  int  debug = 0 ;                /* 0 = no, -1 = yes. */
static  int  num_columns = 2 ;
static  int  num_header_lines = 0 ;
static  int  page_length = 66 ;
static  int  page_numbering = 0 ;       /* 0 = no, -1 = yes. */



main (argc, argv)

    int  argc ;
    char  *argv[] ;

{  /* Local variables. */

    char  *output_file = NULL ;
    int  errflg, option ;



/* Scan the command line arguments. */

    errflg = 0 ;

    while (((option = getopt (argc, argv, "c:dh:l:o:p")) != NONOPT) ||
           (optarg != NULL)) {

        switch (option) {
        case 'c':  num_columns = atoi (optarg) ;  break ;
        case 'd':  debug = -1 ;  break ;
        case 'h':  num_header_lines = atoi (optarg) ;  break ;
        case 'l':  page_length = atoi (optarg) ;  break ;
        case 'o':  output_file = optarg ;  break ;
        case 'p':  page_numbering = -1 ;  break ;
        case '?':  errflg++ ;  break ;
        case NONOPT:
            if (num_input_files < MAX_FILES) {
                file_table[num_input_files++] = optarg ;
            }
            break ;
        default :  break ;
        }

    }


/* If an invalid option was detected, print out a command usage message. */

    if (errflg) {
        fprintf (stderr, "Usage:  ffc  [-c num] [-d] [-h num]\n") ;
        fprintf (stderr, "             [-l num] [-o output_file]\n") ;
        fprintf (stderr, "             [-p]  [input_file(s)]\n") ;
        exit (-1) ;
    }


/* Print out the files in multiple columns. */

    ... the remainder of the program ...

}

©1991  /  Charles A. Measday