Phil's C Course


Version 1.1 © Phil Ottewell 1995,1998


These notes formed part of an internal course on the C programming language which I was asked to give to my colleagues at Yezerski Roper Ltd.

§0 Aims of this Course

This course is intended to help a good programmer (pause for mass exodus), particularly someone familiar with DEC Fortran, start programming in C. There are a number of program examples, copies of which you can download as phils_c_examples.zip. A detached PGP signature file (which it is not necessary to download unless you know what it's for) is provided so you can be sure the archive has not been altered. When you unzip the archive on a Windows machine, note the workspace file phils_c_examples.dsw. You should open this with Microsoft Visual Studio 98 and Microsoft Visual C++ 6.0, then "batch build" everything. VMS users should use unzip -a phils_c_examples.zip to get the correct file attributes for the text files, like the .c source files and MAKE.COM, the VMS command file which you can use to build the programs.

There are also several programming challenges. Have a go at these, nicking as much code as you can from the examples ! Using C is the best way to learn it, and making mistakes is definitely the best way to find out how it really works. I mention the ANSI C standard, ANSI/ISO 9899-1990, a lot in this document. Always try to adhere to the standard; experience has shown that it pays off in the long term. Some of the points I make are stylistic. However, many of these suggestions are made for one of two reasons: either the majority of the C programming world has reached consensus that the style is good (which will make it easier for you to read and learn from other people's code) or I have found that you can avoid errors by doing things in a particular way. I reckon that you can "learn" C in about an hour, then spend the next year wishing you hadn't done things in a particular way the hour after that. This course should help you avoid some of the pitfalls that are so easy to fall into (and, in fact, dig for yourself) because of the total control, power, and 0 to 60 ACCVIOS in under 10 seconds that C can deliver to the programmer.


#define SAYS =
char * Clarkson SAYS "It's sexy enough to snap knicker elastic at 50 paces";

§1 Why is the language called C ?

C was developed under Unix on the PDP-11 in 1972 by Dennis Ritchie, building on the language B, written by Ken Thompson in 1970 on a PDP-7, also running Unix. B was, in turn, based on BCPL, which was developed in 1967 by Martin Richards (and which is still available for the BBC Micro).

BCPL and B were typeless languages - variables were all multiples of byte or word sized bits of memory. C is more strongly typed than B, BCPL or Fortran. Its basic types are char, int, float and double, which are characters, integers and single and double precision floating point numbers. An important addition, compared to Fortran, is the pointer type, which points to the other types (including other pointers). All these types can be combined in structures or unions to provide composite types.

The main shock to Fortran programmers is the fact that C has no built-in string type, and consequently you have to make a function call to compare two strings, or assign one string to another. Luckily, the ANSI standard describes a set of string manipulation routines that MUST be present if an implementation is described as ANSI C. Similarly, a good set of standard IO, time manipulation and even sorting routines exist. HELP CC RUN-TIME_FUNCTIONS will give you information on all of these, and even tell you which header files you should include to use them. For example HELP CC RUN PRINTF will inform you that you need the header file stdio.h .

In the early days of C, different compiler vendors all had their own flavours of C, usually based on the book, The C Programming Language, by Brian Kernighan and Dennis Ritchie. These older dialects are often referred to as "Classic C" or "K&R C". As C gained in popularity, the need to standardize certain features became apparent, and in 1983 the American National Standards Institute established the X3J11 technical committee, which produced the ANSI C standard, ratified in 1989 as ANSI X3.159-1989.

If you only buy one book on C, get the second edition of the K&R book. If you want to buy two books add Expert C Programming: Deep C Secrets by Peter van der Linden. If you really want to be a language lawyer and contribute to threads like "Is i = i++ + --i legal ?" in the comp.lang.c newsgroup, then get "The Annotated ANSI C Standard", annotated by Herbert Schildt. Personally I think a line of code like i = i++ + --i should be taken out and shot.

The DEC C compiler is a good ANSI compiler, and any code you write should pass through this compiler (with its default qualifiers) without so much as an informational murmur. If it doesn't, you are storing up big trouble and intermittent bugs for the future. Even if you decide to do nonstandard things, there are techniques to do them in a standard way (!), which will be explained later.

OK, enough waffle. Let's look at a "Hello World" program in C.


/*---- Hello World C Example ("hello.c") -------------------------------------*/

/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int i;
/*  End of declarations ... */

    for ( i = 0; i < 10; i++ ) {
      printf("%d Hello World !\n",i);
    }

    exit(EXIT_SUCCESS);
}

As you have probably gathered, comments in C are delimited by /* and */, and comments must NOT be nested, or you will get some very interesting bugs. The perceived need for nested comments is usually for commenting out (say) a debug piece of code, and this can be done in a better way, which will be explained later. Some C compilers let you use the trailing C++ style comments //, which are like a trailing ! in DEC Fortran. NEVER USE THESE IN C PROGRAMS. They are not ANSI standard, and immediately confuse people as to whether they are looking at C or C++ code (and some meanings can subtly change).

To compile this program under DEC C (both Alphas and VAXes should be using DEC C now; VAX C was retired around 1993, and you really should switch to DEC C for both platforms):

$ CC HELLO
$ LINK HELLO

Alternatively, you can use the MAKE.COM DCL command file, as shown below. On Alphas the resulting executable will have file type .EXE_ALPHA, and on VAX machines it will be .EXE.

$ @MAKE HELLO
  DEV$DISK:[PHIL.PHILS_C_EXAMPLES]
CC/PREFIX=ALL HELLO.C  -> HELLO.OBJ_ALPHA
LINK HELLO  -> HELLO.EXE_ALPHA
Exiting

If you must use VAX C (and you mustn't :-) the link step will whinge about unresolved symbols, so change the line to

$ LINK HELLO, VAXCRTL/OPT

where the VAXCRTL.OPT options file contains the line

SYS$SHARE:VAXCRTL/SHARE

You are now ready to RUN HELLO . Not too many surprises there. Note the form of the code. The main entry point in a standard C program is always called main, though you can override this on VMS platforms, as we will discover. The main program in C is declared as int main(some funny stuff). This is because the main program should always return a value (usually to DCL or the Unix shell) indicating how things went. This is done by the call to exit(EXIT_SUCCESS). There are two ANSI standard return codes, EXIT_SUCCESS and EXIT_FAILURE, both defined in <stdlib.h> . Always use these values, and don't do what a lot of Unix programmers do, which is exit(0) or some other magic number just because "everybody knows that exit(0) means success". You can return VMS condition codes, e.g. exit(SS$_NORMAL), but this should be avoided unless really necessary, and even then there are ways to fall back to the standard return codes if your code is compiled on a non-VMS machine.

The (some funny stuff) is the argument list, or the "formal parameters" of function main. Imagine main as a function called from your command shell (DCL on VMS, or the DOS window on Windows NT). The declaration int main( int argc, char *argv[] ) means that main is a function returning an integer, which takes two arguments. The first is an integer, and is the number of arguments passed to main by DCL, and the second is an array of pointers to character strings. The latter are, in fact, any command line arguments, as will be demonstrated in args.c, a demo program coming soon to a disk near you. The body of a function is delimited by { and }. Because C is largely a free format language, the whole function can be on one line if you really want, but that tends to be unreadable and confusing. I like to start the function with a { in column one, just after the function declaration (which I can then nick for prototyping), and end the function with a } in the same column.

Notice how each statement ends with a semicolon. The ";" is known as a statement terminator. The end of each complete expression statement is also a "sequence point", as are the comma operator, the logical operators && and ||, and the ?: conditional operator, and the standard guarantees that side effects of expressions will be over once a sequence point is reached. This basically means that all the things you made happen in one statement will have happened by the time you start on the next statement or expression.

The printf(...) statement is a call to a routine defined in <stdio.h>, and enables formatted output to the stdout stream. In C, three default IO streams are defined. These are stdin, stdout and stderr, and they correspond to SYS$INPUT, SYS$OUTPUT and SYS$ERROR under VMS. The first argument is a format string containing conversion characters, each preceded by the % sign (use %% if you actually want a % sign), which tell the routine how to interpret the variable number of arguments to be printed. In this case the integer i is to be printed in decimal, so "%d" is used. There are corresponding functions, sprintf to write directly into a character array, and fprintf to write to a file. Similar formatted input routines, scanf, sscanf and fscanf, are also available. The table below, nicked off the network, summarizes the conversion characters:


                    Clive Feather's Excellent Table:
 
     Types of arguments for the various fprintf and fscanf conversions
      
         Conversion        fprintf          fscanf
         ----------------------------------------------------
         d  i              int              int *
         o  u  x  X        unsigned int     unsigned int *
         hd hi             int              short *
         ho hu hx hX       [see note 1]     unsigned short *
         ld li             long             long *
         lo lu lx lX       unsigned long    unsigned long *
         e  E  f  g  G     double           float *
         le lE lf lg lG    [invalid]        double *
         Le LE Lf Lg LG    long double      long double *
         c                 int              [see note 2]
         s                 [see note 2]     [see note 2]
         p                 void *           void **
         n                 int *            int *
         hn                short *          short *
         ln                long *           long *
         [                 [invalid]        [see note 2]
 
     Note 1: the type that (unsigned short) is promoted to by the integral
             promotions. This is (int) if USHRT_MAX <= INT_MAX, and
             (unsigned int) otherwise.
     Note 2: any of (char *), (signed char *), or (unsigned char *).

Don't worry about the "*"s for now. They can be read as "pointer to thing named before them", so int * means pointer to int. Similar summaries can be found in K&R II pages 154 and 158.

      Programming Challenge 1
      _______________________
      
        Have a  go at adapting "hello.c"  to print out the value of i in
      hexadecimal.  Fiddle about with the format string -  remove the "\n"
      for example, and see what happens to your output.

Unlike Fortran, whitespace is significant in C, and there are reserved keywords. These reserved keywords should not appear as any type of identifier, even a structure member (YRL people - don't forget LID files). The list below shows both C and C++ reserved keywords.


    asm*        continue    float       new*        signed      try*
    auto        default     for         operator*   sizeof      typedef
    break       delete*     friend*     private*    static      union
    case        do          goto        protected*  struct      unsigned
    catch*      double      if          public*     switch      virtual*
    char        else        inline*     register    template*   void
    class*      enum        int         return      this*       volatile
    const       extern      long        short       throw*      while

The items marked with a trailing asterisk, like this*, are C++, not C keywords, but it makes sense to avoid both. Avoid using a language name like Fortran too.

§2 Variables, Types and Functions

Variables are like little boxes with numbers on them, a bit like houses, and inside the boxes ... naaahh ! Just kidding. You all know what variables are, and I think we all understand that x = x + 1 isn't a contradictory algebraic statement.

C, unlike Fortran, has case sensitive variable and other identifier names. Therefore the variable NextPage is completely different to nextpage. The same is true for functions. Some people like to use the capitalized-first-letter form of naming, others prefer underbars, e.g. GetNextPage() or get_next_page() . Many professional library packages tend towards TheCapitalizedFormat. Some people like Microsoft's Hungarian Notation which involves prefixing variable names with their type, e.g. uiCount for an unsigned int counter variable . It all depends how good you are with the Shift key :-) Whatever method you choose, try and be clear and consistent.

In C, local variable definitions can be at the start of any {block}, and aren't restricted to the top of the module as in Fortran. Be careful if you take advantage of this feature, because you may run into scoping problems where the innermost variable definition hides an outer one. If you are used to C++, remember that the variable definitions can only be at the start of the {block} before the first statement (e.g. expression, function call or flow control statement). If you try and intersperse definitions, C++ style, the C compiler will issue some sort of "bad statement" warning.

Variables declared at the beginning of the {function body} are local to the function; variables declared at the "top" of the file (or compilation unit to be pedantic), in the header files, or outside any function bodies, are global to the compilation unit (and are externally visible symbols, unless declared as static). More will be said about this later. For now, suffice it to say that you should avoid using global variables wherever possible.

A brief example will illustrate the scope of variables:


/*---- Variable Scope Example ( "scope.c" ) ----------------------------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Global variables, visible externally too (i.e. to things linked  */
/* against this) Generally they should be avoided as far as */
/* possible, because it can be very difficult to discover which */
/* routine changes their value, and they introduce "hidden" dependencies */
int some_counter;
double double_result;

/* Function prototypes */
void set_double_result(void);

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int j;
    int i_am_local; /* .. to main */
/*  End of declarations ... */

    i_am_local = 1;
    printf("i_am_local = %d (in main)\n\n", i_am_local );
    
    for ( j = 0; j < 10; j++ ) {
      int i_am_local; /* .. to this loop - Not necessarily a good idea */
                      /* because it can cause confusion as to which */
                      /* variable we actually want to access */
      i_am_local = j;
      printf("i_am_local = %d (inside loop)\n", i_am_local );
    }

    printf("\ni_am_local = %d (in main)\n\n", i_am_local );

    /* Now let's look at the default initialization values of the globals */
    printf("some_counter = %d (in main)\n", some_counter);
    printf("double_result = %f (in main)\n\n", double_result);

    /* Call a function that changes the global variables .. */
    set_double_result(); 

    /* .. and look at them again */
    printf("some_counter = %d (in main)\n", some_counter);
    printf("double_result = %f (in main)\n", double_result);

    exit(EXIT_SUCCESS);
}

void set_double_result(void)
{
    ++some_counter;
    double_result = 3.141;
    printf("some_counter = %d (in set_double_result)\n", some_counter);
    printf("double_result = %f (in set_double_result)\n\n", double_result);
}

The basic types in C are:


     char - this defines a byte, which must be able to hold one character
            in the local character set (normally, but not necessarily 8 bits);
      int - holds an integer, usually in the machine's natural size.
            They are 32 bits on both VAX and Alpha.
    float - holds a single precision floating point number.
            They are 32 bits on both VAX and Alpha.
   double - Double precision floating point number, 64 bits on the VAX
            and Alpha.

These bit sizes are just to give you an idea. They should not be relied on, and you should code independently of them, unless you are addressing hardware registers or some equally hardware-specific task.

Some of these basic types can be modified with various qualifiers:


     char - can be signed or unsigned;
      int - can be long or short, signed or unsigned;
   double - can be long, for (possibly) even more precision;

The long modifier normally gives larger integers, but the compiler vendor is free to ignore it provided that


             short <= int <= long
             16 bits <= short/int 
             32 bits <= long

Assigning initial values to variables of these basic types is fairly intuitive, and can be done in the definition, rather like using the DATA statement in Fortran or the DEC Fortran extension shown here:


/*  C Example */                  |*     DEC  Fortran Example
                                  |
    int x, y; /* Not initialized*/|      INTEGER     X, Y
    int counter = 0;              |      INTEGER     COUNTER /0/
    float total = 0.0;            |      REAL        TOTAL   /0.0/
    char c = 'A';                 |      CHARACTER*1 C       /'A'/

Note that C uses single quote ' for character constants. The double quotes are used for strings. There are escape sequences for getting nonprintable characters. These are listed on page 38 of K&R II. A few useful ones are '\n' to get a new line (C doesn't automatically add line feeds when you use printf() ), '\a' to get a bell (alert) sound, and '\0' to get the null character (which is NOT THE SAME as the NULL pointer) used to terminate strings (arrays of characters). The initialization of non-static (discussed later) int and float variables is necessary before use. It doesn't have to be done in the definition, but you can't rely on their value being anything sensible, so whilst the initialization of COUNTER and TOTAL is redundant in the Fortran example (assuming non-recursive compilation), you do need to initialize the variables before use in C.


      .
{
    int i;
    int j;
      .
    i = 0; /* i's value could be anything up to this point */
      .
    j = i*OFFSET; /* j's value could be anything up to this point */
      .
}

Global variables are guaranteed to be initialized to 0 (or 0.0 if floating type) but you can override this by specifying an initial value.

Similar rules apply to float, double, and long double. There are two standard header files, <limits.h> and <float.h> which tell you the maximum and minimum values that can be stored in a particular type; for example INT_MAX is 2147483647, and FLT_MAX is 1.7014117e+38 on the VAX.

The signed or unsigned modifiers are fairly self explanatory. The default for int is signed, so it is rarely specified. Signed integer arithmetic is usually done in Two's Complement form, but this need not be the case. Characters can be signed or unsigned by default - it is implementation defined. I find it best just to use char with no qualifiers, and let the compiler do what it will.

This is probably a good point to introduce the sizeof(thing) operator. It is an operator, not a function, and is evaluated at compile time. It returns the size of the argument, where the size of char is defined to be 1. To be pedantic, it returns an unsigned integer type, size_t, defined in <stddef.h>, but is not often used in a way that requires a size_t declaration. Here are some examples of its use (this is a "programming fragment" not a complete program).


    size_t s;
    int fred;        /* Integer */
    char bob;        /* Character */
    char *c_ptr;     /* Pointer to character */
    char bloggs[6];  /* Array of 6 characters */
         .
    s = sizeof( fred );
    s = sizeof( bob );
    s = sizeof( c_ptr );
    s = sizeof( long double ); /* Allowed to use types instead of variables */
         .
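/*  Safe string copy: copies at most sizeof(bloggs)-1 characters, leaving room */
/*  for the terminating null character (not to be confused with the NULL */
/*  pointer discussed later). strncpy() does not add the terminator itself */
/*  when the source is too long, as here, so set it explicitly */
    strncpy( bloggs, "Bloggs", sizeof(bloggs)-1 ); 
    bloggs[sizeof(bloggs)-1] = '\0';  /* bloggs now holds "Blogg" */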
         .

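You can leave the brackets off when the operand is a variable or expression, e.g. sizeof fred is quite legal, but they are required around a type name, so sizeof int will not compile. I think that the bracketed form is clearer in any case.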

      Programming Challenge 2
      _______________________
      
        Have a  go at adapting "hello.c"  to print  out  the size  of some
      commonly  used types, e.g. int,  short int, long  int, float, double
      and  so  on.  Try  some arithmetic to familiarize yourself  with the
      basic  operators,  +, -,  *,  /,  and  one  that  doesn't appear  in
      Fortran,  the modulus operator, %,  which acts on  integer types  to
      yield the  remainder after division.  Use this  to determine whether
      the year 2000 is a leap year.  The rule is that it is a leap year if
      the year is divisible by 4, except if it is a multiple of 100 years,
      unless it is also divisible by 400.

In addition to the integer and floating point types, there is a type called void. The meaning of void changes according to context ! If you declare a function returning void, you mean that it returns no value, like a Fortran subroutine. A void in the argument list means that the function takes no arguments (you can have a void function that does take arguments by declaring arguments in the usual way, and you can have a function that does return a value but takes no arguments). Below is an example of a Fortran subroutine and C function:


/*     C Version */                    |*        Fortran Version
                                       |
void initialize_things( void )         |      SUBROUTINE INITIALIZE_THINGS
{                                      |*
/*  Do cunning setup procedure     */  |*     Do cunning setup procedure  
/*  No need for a return statement */  |*
}                                      |      END
      .                                |       .
/*  Call it */                         |*     Call it
    initialize_things(); /* Note () */ |      CALL INITIALIZE_THINGS
      .                                |       .

The void qualifier also has yet another meaning, which will be discussed when we look at pointers.

The void function above demonstrates the general form of functions in C. They have a function definition with the formal parameters, then a {body} enclosed by the {} brackets. Function arguments are always passed by value in C. The actual arguments are copied to the (local) function formal arguments, as if by assignment. The arguments may be expressions, or even calls to other functions. The order of evaluation of arguments is unspecified, so don't rely on it ! Here is a C function example, with a similar Fortran routine for comparison.


/*     C Version */                    |*     Fortran Version
                                       |
int funcy( int i )                     |      INTEGER FUNCTION FUNCY( I )
{                                      |      INTEGER I
                                       |*
    int j;                             |      INTEGER J
/*  End of declarations ... */         |*     End of declarations ...
    j = i;                             |      J = I
    i = i + 1;/* Only local i changed*/|      I = I + 1 ! Calling arg changed
    j = i*j;                           |      J = I*J
                                       |      FUNCY = J
    return( j );                       |      RETURN
}                                      |      END
      .                                |       .
/*  Call it */                         |*     Call it
    k = 3;                             |      K = 3
    ival = funcy(k); /* ival is 12 */  |      IVAL = FUNCY( K ) ! IVAL is 12
      .              /* k is still 3 */|       .                ! K is 4

Notice that changing the function parameter in the C function does not alter the actual argument, only the local copy. To change an actual argument, you would pass it by address, using the address operator, &, and declare the function argument as a pointer to type int. More will be said about this in the pointers section. Generally, you should avoid writing functions in C that change the actual arguments. It is better to return a function value instead, where possible.


/*     C Version */                    |*     Fortran Version
                                       |
    myval = funcy( gibbon );           |      CALL FUNCY( GIBBON, MYVAL )

      Programming Challenge 3
      _______________________
      
        Hack  your copy of the "hello.c" to call some sort  of  arithmetic
      function,  perhaps to return the square  of the argument.  Write the
      function, and add a "prototype"  (these are discussed  later) for it
      before the main program, e.g.


            .
      /*  Function prototype */
      int funcy( int myarg ); /* semicolon where function body would be */
            .
      /* Main Program starts here */
      int main( int argc, char *argv[] )
      {
            .
      }

      /* The real McCoy - "Dammit Jim, I'm a function not a prototype" */
      int funcy( int myarg )
      {
            .
        /*  Do something and return() an int value */
            .
      }


      If  you are  feeling  really  cocky,  write a recursive  factorial()
      function that calls itself. Hint:


               .
          if ( n > 0) {
            result = n * factorial( n-1 );
          } else {
            result = 1;
          }
               .


      Call it from your main program and step through with the debugger to
      convince yourself that it really is recursive.

When you write your own functions, try to avoid interpositioning, i.e. naming your function with the same name as a standard library (or system/Motif/X11/Xt library) function. Use

$ HELP CC RUN-TIME_FUNCTIONS your_function_name

to check for the existence of a similarly named DEC C RTL function. Or look in a book. It is a very bad idea to replace a standard function. If you need to write something with the same purpose as a standard function, but maybe with better accuracy or speed, call it something different, e.g. my_fast_qsort() .

Three other modifiers I haven't yet explained are static, const and extern. The static modifier is another one that changes meaning depending on its context. If you declare a global variable or function as static, it will still be visible throughout the same compilation unit (file to us), but will NOT be visible externally to programs linked against our routines. This is often used as a neat way of storing data that has to be visible to a number of related functions, but must not be accessible from outside. Some code fragments below illustrate this.


/*---- C Fragments -----------------------------------------------------------*/
/* Global Vars, NOT visible externally (i.e. to things linked against this)   */
static int number_of_things;

int AddToThings( int a_thing )
{
        .
    number_of_things = number_of_things + 1;
    return( number_of_things );
}

int GetNumberOfThings(void)
{
    return( number_of_things );
}

int RemoveThing( int a_thing )
{
        .
    number_of_things = number_of_things - 1;
    return( number_of_things );
}



*      Fortran (sort of) Equivalent
*-----------------------------------------------------------------------
      INTEGER FUNCTION ADD_TO_THINGS( A_THING )
         .
      INTEGER NUMBER_OF_THINGS
      SAVE    NUMBER_OF_THINGS
         .
      NUMBER_OF_THINGS = NUMBER_OF_THINGS + 1
      ADD_TO_THINGS = NUMBER_OF_THINGS
      RETURN
*
      ENTRY GET_NUMBER_OF_THINGS()
      GET_NUMBER_OF_THINGS = NUMBER_OF_THINGS
      RETURN
*
      ENTRY REMOVE_THING( A_THING )
         .
      NUMBER_OF_THINGS = NUMBER_OF_THINGS - 1
      REMOVE_THING = NUMBER_OF_THINGS
      RETURN
*
      END

Another use of static is with variables that are local to a function. In this case it is similar to the Fortran SAVE statement, i.e. the variable will retain its value across function calls, and WILL BE INITIALIZED to 0 if it is an integer type, or 0.0 if a floating point type (even if the floating point representation of 0 on your machine is not all bits set to 0), or NULL (pointer to nothing) if it is a pointer.


/*---- C Example -------------------------------------------------------------*/
int log_error( int code )
{
    static int total_number_of_errors;
/*  End of declarations ... */

/*  ++ is the same as  total_number_of_errors = total_number_of_errors + 1 */
    return( ++total_number_of_errors );
}



*     Fortran Equivalent
*-----------------------------------------------------------------------
      SUBROUTINE LOG_ERROR( CODE )
        .
      INTEGER TOTAL_NUMBER_OF_ERRORS
*     Not required for non-recursive DEC Fortran, but it documents your intent
      SAVE    TOTAL_NUMBER_OF_ERRORS
        .
      TOTAL_NUMBER_OF_ERRORS = TOTAL_NUMBER_OF_EBRORS + 1
      END

The const modifier is used to flag a read only quantity. For example,


    const double pi = 3.14159265358979;
      .
/*  Arizona ? */
    pi = 3.0;     /* Gives compiler error - try it in your test program */

The const modifier is useful for function prototype arguments which are passed by pointer, where you want to indicate that your function will not change the object pointed to. More will be said about function prototypes later.

      Programming Challenge 4
      _______________________
      
        Look   at   the   Fortran   example  above.  Spot  the  deliberate
      mistake. The compiler would probably flag an error for it, but think
      of  another instance where  perhaps you wanted to increment an array
      element indexed by a  non-trivial expression. Using the ++  operator
      in C  helps  avoid  typographical errors, and looks less clumsy (and
      saves valuable bytes ;-) ).  There is a similar  operator, --, which
      decrements by one. Read K&R II, pages 46-48, and pages 105-106. Make
      sure  you  understand  the  difference  between prefix  and  postfix
      versions  of ++ and --, and  try to rewrite the AddToThings() set of
      functions using these operators. Great - that's saved  me  having to
      explain it all.

The extern qualifier is rather like EXTERNAL in Fortran, and basically gives type information for a reference that is to be resolved by the linker. You DO NOT need to use extern with function declarations - int funcy( int i ); is the same as extern int funcy( int i); . It is usually used when declaring global variables to indicate that they are referenced in the particular compilation unit, but not defined in it.

What is the difference between "definition" and "declaration" ? In short, a definition actually ALLOCATES SPACE for the entity, whereas a declaration tells the compiler what the entity is and what it is called, but leaves it up to the linker to find space for it ! A global variable, structure or function can have many declarations, but only one definition. This is explained in more detail in the "Header Files" section which follows.

Three less commonly used modifiers are volatile, auto and register. The volatile modifier tells the compiler not to perform any optimization tricks with the variable, and is most often used with locations that refer to hardware, like memory-mapped IO, or shared memory regions which might change in a way the compiler cannot predict. The auto qualifier may only be used for variables at function scope (inside {}) and is in fact the default. Auto variables are usually allocated off the stack (but this is up to the implementation). They will certainly not be retained across function calls. NEVER return the ADDRESS of an automatic variable from a function call (once you know about pointers). Because new automatic variables are "created" every time you go into a function, this allows C functions to be called recursively. The register qualifier is really obsolete. It is a hint to the compiler that a variable is frequently used and should be placed in a register. The compiler is quite free to ignore this hint, and frequently does, because it generally knows far more about optimizing than you do (Microsoft Visual C++ or DEC C for example). Don't bother using register.

Enumerated types, enum, are similar to Fortran integer PARAMETERs, but nicer to use. The general form is enum identifier { enumerator_list }, where "identifier" is optional but recommended. The comma-separated list of enumerated values starts at zero by default, but you can override this as shown in the example.


      C Example

/*----------------------------------------------------------------------------*/
    enum timer_state_e { TPending, TExpired, TCancelled};
    enum timer_trn_e { TmrSet=4401, TCancel=4414};
         .
    enum timer_state_e t_state;
    enum timer_trn_e t_trn;
         .
    t_state = TExpired; /* t_state now contains 1 */
    t_trn = TCancel;    /* t_trn now contains 4414 */



*     Fortran Example
*------------------------------------------------------------------------
      INTEGER TPENDING, TEXPIRED, TCANCELLED
      INTEGER TSET, TCANCEL
      PARAMETER (TPENDING = 0, TEXPIRED = 1, TCANCELLED = 2)
      PARAMETER (TSET = 4401, TCANCEL = 4414)
          .
      INTEGER T_STATE, T_TRN
          .
      T_STATE = TEXPIRED
      T_TRN = TCANCEL

When examining t_state or t_trn in the C program with the DEC debugger, the integer value will be converted to a name, e.g.

DBG> EXAMINE t_trn
PROG\main\t_trn:   TCancel

which is handy. Unfortunately, because the enumerated types are really type int, you can assign any integer value to t_trn without a compiler whinge ! Types and storage class modifiers are discussed in more detail in K&R II, page 209 onwards, if you still thirst for knowledge.

§3 Loop and Flow Control Constructs

C has three basic loop constructs. These are for loops, while loops and do loops. An example is worth a thousand words:

*     Fortran Loops Example
           .
      INTEGER I
      LOGICAL FIRST
           .
      DO I = 1, LIMIT
        PRINT *, I
      ENDDO
*
      I = 0
      DO WHILE ( I .LT. LIMIT )
        I = I + 1
        PRINT *, I
      ENDDO
*
      FIRST = .TRUE.
      DO WHILE ( FIRST  .OR.  I .LT. LIMIT )
        IF ( FIRST ) FIRST = .FALSE.
        PRINT *, I
        I = I + 1
      ENDDO

/*---- C Loops Example ("loops.c") -------------------------------------------*/

/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Defines and Macros */
#define LMT 5

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int i;
/*  End of declarations ... */
    printf("LMT = %d\n", LMT);

    printf("\n'for' loop - for ( i = 1; i <= LMT; i++ ) {...}\n");
    for ( i = 1; i <= LMT; i++ ) { /* More usual in C would be  i = 0; i < LMT; i++ */
      printf("%d\n", i );
    }

    printf("\ni = 0\n");
    printf("'while' loop - while ( i++ < LMT ) {...}\n");
    i = 0;
    while ( i++ < LMT ) {
      printf("%d\n", i );
    }

    printf("\ni = LMT\n");
    printf("'do' loop - do {...} while ( ++i < LMT ); - always executes at least once\n");
    i = LMT;
    do {
      printf("%d\n", i );
    } while ( ++i < LMT );

    exit(EXIT_SUCCESS);
}

All these constructs are explained in detail in K&R II, chapter 3. The for loop has the following general form:


    for ( expression1; terminate_if_false_expression2; expression3 ) {
      .
    }

If "terminate_if_false_expression2" is missed out it is taken as being true, so an infinite loop results, for (;;) {ever}. The "expression1" is evaluated once before the loop starts and is most often used to initialize the loop count, whereas "expression3" is evaluated on every pass through the loop, just before starting the next loop, and is frequently used to modify the loop counter. It is quite legal, in C, to modify the loop counter within the loop, and the loop control variable retains its value when the loop terminates. Obviously "terminate_if_false_expression2" causes the loop to end if it is false, and is used to test the termination condition.

The "while" looks like this:


    while ( expression ) {
      .
    }

and keeps going for as long as "expression" is true. It zero trips (that is, the code in it is never executed) if "expression" is false on the first encounter. The for loop above could be written using while.

    expression1;
    while ( terminate_if_false_expression2 ) {
      .
      expression3;
    }

It isn't a good idea to do this though, because someone will spend ages looking at your code wondering why you didn't write a for loop, expecting some cunning algorithm.

Finally, before time, the old enemy, makes us leave Loopsville City Limits, let's look at the "do-while" construct. The loop body is always executed at least once


    do {
      .
    } while ( expression ); /* Semicolon needed */

and the loop will be repeated if "expression" is true at the end of the current loop. There is a keyword, break, which lets you leave the innermost loop early, transferring control to the statement immediately after the loop.


    for ( i = 0; i < strlen(string); i++) {
      if ( string[i] == '$' ) {
        found_dollar = TRUE;
/*      Once we've found the dollar no need to search rest of string */
        break;
      }
    }
/*  Jump to here on "break" */
     .

A related keyword, continue, skips to the end of the loop and continues with the next loop iteration.


    for ( i = 0; i < strlen(string); i++) {
/*    Don't bother trying to upcase spaces */
      if ( string[i] == ' ' ) continue; /* Move on to next character */
/*    It wasn't a space so have a go */
      string[i] = toupper( string[i] );
    }
/*  Carry on here when the loop has finished */
     .

This is most often used to avoid complex indenting and "if" tests. Don't use it like I just did, which was a silly example.

You have already met the "if" construct. Here it is again, with the "else if" demonstrated too.


    if ( expression ) {
       .
/*    Do something */
       .
    } else if ( other_expression ) {    
       .
/*    Do something else */
       .
    } else if ( final_expression ) {    
       .
/*    Do something different */
       .
    } else {
       .
/*    Catch all if none of above expressions are true */
       .
    }

It is legal to write this kind of thing

    if ( expression )       /* Avoid this form */
      i = 1;
    else
      i = 2;

The problem arises if you do this

    if ( expression )       /* This is probably not what was intended */
      i = 1;
    else
      i = 2;
      dont_forget_this = 3;

You might think that if "expression" is true (i.e. non-zero) then you would set i to 1, and if it were false you would set i to 2 and dont_forget_this to 3. In fact you will always set dont_forget_this to 3, because only the first statement after the "else" is grouped with the "else". I never use this form, other than for a one liner like


    if ( expression ) expression_was_true = TRUE;

where the meaning is clear. Use the bracketed form which makes it totally unambiguous, and is easier to use with the debugger.

C provides an alternative to lots of if - else if tests. This is the "switch" statement. The "expression_yielding_integer" is calculated, and matched against the "case" "const-int-expression"s. When one matches, the statements following are executed, or if none match, the statements following "default" are executed

    switch ( expression_yielding_integer ) {
      case const-int-expression1:
        statements1;
      case const-int-expression2:
        statements2;
      case const-int-expression3:
        statements3;
          .
          .
      default:
        statementsN;
    }

Unfortunately a bad default behaviour was chosen for this. Each "case" drops through to the next one by default, so if, say, "expression_yielding_integer" matched "const-int-expression2", then "statements2" through to "statementsN" would ALL be executed. This is solved by using "break" again.

    switch ( expression_yielding_integer ) {
      case const-int-expression1:
        statements1;
        break;                      /* Always use break by default */
      case const-int-expression2:
        statements2;
        break;
      case const-int-expression3:
        statements3;
        break;
          .
          .
      default:
        statementsN;
        break;
    }

The default behaviour is rarely what is required in practice, and it would have been far better to have a default "break" before each case, and maybe use "continue" to indicate fall-through. Remember that chars can be used as small integers, so the following is quite legal.


    char command_line_option;
     .
    switch ( command_line_option ) {
      case 'v':
        verbose_mode = TRUE;
        break;
      case 'l':
        produce_listing = TRUE;
        break;
      case '?':               /* Following two cases deliberately fall thru */
      case 'h':
        display_help = TRUE;
        break;
      default:
        use_default_options = TRUE;
        break;
    }

§4 Arrays

Arrays in C always start at the 0 element rather than 1, and there is NO ARRAY BOUND CHECKING (gasps of horror). Here is a one-dimensional example array:

    int job[20]; /* job[0], job[1] .. job[19] */

and the dimension must be an integer greater than zero. This is how to declare a two-dimensional array [rows][columns]


    int job[4][20]; /* Like 4 job[20] 's, job[0][0], job[0][1] .. job[3][19] */
          .
    i = job[2][0];  /* Good */
          .
    i = job[2,0];   /* Bad - don't ever do this */
          .

Multi-dimensional arrays are row major; that is, the right-most subscript varies fastest, unlike Fortran, which is column major. Notice that you can't use commas to separate the indices. Separate pairs of square brackets are needed for each index. There is no limit to the number of dimensions other than those imposed by your compiler and the amount of memory available. In practice, multi-dimensional arrays are rarely used. Unfortunately, you can't (in C) use const int's as array bounds. You have to use #define, like this:


#define MAX_SIZE 100  /* Illustrative value - the macro needs one to be an array bound */
      .
    float floaty[MAX_SIZE];

More will be said about #define later. Arrays can be initialized when they are defined:


    int days_in_month[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int matrix[2][3] = { { 0, 1, 2 }, { 3, 4, 5 } };

Remember that uninitialized arrays can contain anything at all, so don't expect them to be full of zeros. In addition, initialized arrays can't be "demand zero compressed". You can leave out the size of an array and have it use the number of initializers, like this


    int array_initialization_pages_in_K_and_R_II[] = { 86, 112, 113, 219 };

which produces an array of 4 integers. You would probably want to use sizeof() to determine the size of the array in this case


     nelements =   sizeof( array_initialization_pages_in_K_and_R_II )    /
/*               -----------------------------------------------------        */
                 sizeof( array_initialization_pages_in_K_and_R_II[0] );
        .
     for ( i = 0; i < nelements; i++) {
        .
     }

Notice that you index up to LESS THAN the number of elements, because the last element is (nelements-1) .

§5 Strings

Character arrays containing a contiguous sequence of characters terminated with the null character, '\0', are known as strings in C. String literals themselves have static storage, but a character array initialized from one inside a function is an ordinary automatic variable, re-initialized on every call, unless you declare it static. The initializer, or "string literal", is delimited by double quotes "like this" . You can split a string initializer over several lines, each part being in " quotes, and the parts will be concatenated together. The resultant string has the null character, represented by the escape sequence '\0', appended to the end of it.

    char random[80]; /* Could contain anything */
    char title[] = "Phil's Ramblings"; /* Takes 17 bytes due to '\0' at end */

    char longer_string[] = "Here is quite a long string split up over "
                           "two lines. VAX C doesn't allow this, though. "
                           "Another good reason to switch to DEC C on "
                           "VAX or Alpha VMS, or Visual C++ for Windows.";

    char string_with_quote[] = "Here is the quote \" character";

The name that you give to an array can be used as a pointer to the zeroth element of the array. More will be said about this in the "Pointers" section. There are many functions in the standard library for manipulating character strings, and these all begin with "str". You will need to include <string.h> to use them. Look in K&R II pages 249-250. These functions expect an array, or a pointer to characters, as their arguments.

Finally, note that an empty string is not really empty.


    char is_it_empty[] = ""; /* No, it contains one character, '\0' */

Always bear in mind that the string functions often copy trailing '\0' characters, so you must ensure that you allow space for this. It is a good idea to always use the "strn" versions of the calls, with sizeof(destination) as the character limit, because that way you will avoid runaway (and hard to detect) memory overwriting. Remember to terminate the destination string, e.g.


    strncpy( destination, source, sizeof(destination) );
    destination[sizeof(destination)-1] = '\0';    

else you'll end up avoiding potential overwrites, but leave a potentially unterminated string to catch you out later !

§6 Pointers

Pointers are declared using the *, or "dereference operator".

    int *i_ptr;

declares a pointer to type int. As declared above, i_ptr is most likely not yet pointing at a valid location. In order to make it point somewhere valid, you generally use the "address operator", &, like this


    int i;
    int j;
    int *i_ptr;
     .
    i_ptr = &i;
     .

You can then change or read the value of i by using the "dereference operator", and change the object pointed to, providing it is an object of the correct type.


    *i_ptr = 3; /* Set the int pointed to by i_ptr to 3 */
    printf("%d\n", i );      /* i will be 3 */
     .
    i_ptr = &j; /* Set the i_ptr to point to j now */
     .
    *i_ptr = 3; /* Set the int pointed to by i_ptr to 3 */
    printf("%d\n", j );  /* j will be 3 */
     .

This is a rather silly example, because you would obviously just use i or j directly. A more realistic use of pointers is with arrays:


    char string[] = "Here is a string with a $ in it";
    char *sptr;
    int contains_dollar;
     .
    sptr = string; /* Remember that the array name is the same as &array[0] */
    contains_dollar = FALSE;

    while ( *sptr ) {  /* While thing pointed to is not 0 i.e. null character */
      if ( *sptr == '$' ) {
        contains_dollar = TRUE;
        break; /* Leave the while loop early and safely */
      }
      ++sptr;
    }    
     .

When you increment pointers, they automatically increment the address they point to by the size of one of the objects to which they point. In the example above, that is one character, i.e. a byte. If the array was an array of int, then the pointer would increment by sizeof(int) bytes. Just to frighten you, this loop could be written


    while ( *sptr && !( contains_dollar = *sptr++ == '$' ) );

      Programming Challenge 5
      _______________________
      
        You  guessed it. Figure out what is happening in the scary "while"
      loop above. Now write your own (differently named) version of strcpy
      using similar techniques to make it as short as possible.

Arrays and pointers are closely related. They can be used in identical ways in many situations. For example:


      .
    char string[80];
    char *sptr;
      .
    sptr = &string[0]; /* This could be written as sptr = string */
      .
    *string       = 'A'; /* Using array name like pointer */
    *(string+10)  = 'B'; /* Using array name like pointer */
      .
    sptr[0]       = 'A'; /* Using pointer like array */
    sptr[10]      = 'B'; /* Using pointer like array */
      .

This is because, in expressions or function calls, arrays and pointers are both converted to the form "*(pointer + index-offset)". The main thing to remember is that pointers are variables, and can be changed to point to different objects, whereas array names are not variables. The index-offset is automatically scaled according to the type of data pointed to. In this case, we are dealing with char which, by definition, has a size of 1, but if the pointers were pointers to int, then on the VAX or Alpha, the index-offset would be automatically scaled by 4.


      .
    int array[20];
    int another_array[20];
    int *i_ptr;
      .
    i_ptr = array;         /* Legal */
    i_ptr[12] = 3;
      .
    i_ptr = another_array; /* Legal */
    i_ptr[2] = 4;
      .
    array = another_array; /* Illegal !  */
      .

Even multi-dimensional arrays get decomposed to the "*(pointer + index-offset)" by the compiler in say, a function call, which gives you no special knowledge of how they fold. Hence if you are using a pointer to a multi-dimensional array where the dimensions could vary, it is up to you to calculate the offset correctly, e.g.


    int mda[ROWS][COLS];
     .
    i = funcy( &mda[0][0], ROWS, COLS ); /* Pass the address of the first int */
     .
int funcy( int *array, int rows, int cols )
{
     .
    for ( i = 0; i < rows; i++) {
      for ( j = 0; j < cols; j++) {
        total += *(array + i*cols + j);
      }
    }
     .
}

Of course, if the function was only expected to deal with arrays of set dimensions, you could just declare those in funcy().


    int mda[ROWS][COLS];
     .
    i = funcy( mda );
     .
int funcy( int array[ROWS][COLS] )
{
     .
    for ( i = 0; i < ROWS; i++) {
      for ( j = 0; j < COLS; j++) {
        total += array[i][j];
      }
    }
     .
}

The strange += assignment operator isn't a misprint. It is shorthand, so that


    x = x + 4;

can be written


    x += 4;

Similarly

    y = y - 10;

becomes


    y -= 10;

There must be NO SPACE between the operator and the = sign, and the operator comes immediately before the =. This notation is handy for more complex expressions, such as


    array[ hash_value[index]*k + offset[i] ] += 4;

so you only need maintain the expression in one place. Many binary operators have a similar assignment operator. Check K&R II page 50 and page 48 for the bitwise operators that can also be used in this way.

The other thing to remember is that whereas arrays allocate space, and hence the array name points to something valid, pointers MUST NEVER BE USED UNTIL THEY HAVE BEEN INITIALIZED TO POINT TO SOMETHING VALID.

There is a special pointer value defined by the standard, called the NULL pointer, which is used to indicate that the pointer doesn't point to anything. Normally, you cannot directly assign integers to pointers, but the constant 0 is an exception. Both the following lines make i_ptr point to "nothing" (well, a guaranteed "not valid location" really).


    i_ptr = 0;    /* Legal but not recommended */    
    i_ptr = NULL; /* Recommended - it is clear that you refer to a pointer */

The NULL macro (see Macros section later on), defined identically in <stddef.h> and <stdio.h> among other places, is often defined as


#define NULL     ((void *) 0)

even though "0" would do. This discourages its use as an integer, which you should never do. People often make the mistake of writing


    string[i] = NULL; /* Never do this - you really want '\0' */
    i = NULL;         /* Never do this if i is integer and you really mean 0 */

when what they actually mean is


    string[i] = '\0'; /* The null character - that's more like it */
    i = 0;            /* Integer zero */

A pointer of type (void *) is a special type of pointer that is guaranteed to be able to point to any type of object, hence the NULL pointer can be assigned to any pointer type. The NULL pointer need not have all bits set to zero, so don't rely on this.

Pointers are very useful as function arguments for routines that manipulate strings of unknown (at compile time) length.


int how_long( const char *s )
{
    int i;
/*  End of declarations ... */
    i = 0;
    while ( *s++ ) {
      i++; /* Increment i until '\0' found */
    }
    return( i );
}

Even though the thing pointed to by s is const, note that it is quite legal to increment the pointer s in the function, because s is a local, variable pointer, pointing to whatever the calling argument to how_long() was. Hence if you call how_long(string), you don't change string, you assign string to s then increment s. Any expression using array subscripting, for example array[index], is exactly the same in C as its pointer equivalent, in this case *(array+index). You have to be careful when using the const modifier with pointers. The following examples should illustrate the point.


    int i;
    const int *ptr_to_const;    /* Points to a const int; the pointer itself can change */
    int * const const_ptr = &i; /* The pointer is const; it points to a variable int */

Another important difference between pointers and arrays relates to the sizeof() operator.


    int array[20];
    int *i_ptr;
    size_t s;
       .
    i_ptr = array;
       .
    s = sizeof(array);  /* s is 20*sizeof(int), which is 80 on the VAX  */
    s = sizeof(i_ptr);  /* s is sizeof(int *), which is 4 on the VAX    */
       .

You can't deduce the size of an array from a pointer, only the size of the pointer. Because arrays as function arguments are treated the same as pointers, then even if you declare the function arguments as "func( int array[10] )" array is still treated like a pointer in the function body, so sizeof(array) in the function will give you the size of pointer to int, not 10 times size of int.

It is quite legal to write a pointer definition like this:


    int* i_ptr; /* Not recommended */

This is best avoided, because it can be confusing. Consider


    int* i_ptr1, i_ptr2; /* Probably not what you intended */

At first glance it looks like you have just declared two pointers to int. In fact, i_ptr1 is a pointer, but i_ptr2 is an int.


    int *i_ptr1, *i_ptr2; /* Better */

The second example keeps the * with the variable to which it relates, and is considered better style (by me at any rate) !

There are two standard library functions often used with pointers. They are declared in <stdlib.h>, and are malloc() and free(). Both are AST reentrant under DEC C. The malloc() function allocates an area of memory specified in bytes, and is declared as


void *malloc(size_t size);

and would be used like this


     .
    int *i_ptr;
     .
    i_ptr = (int *)malloc( sizeof(int)*nelements_wanted );
    if ( i_ptr != NULL ) {
       .
      i_ptr[i] = i;
       .
    } else {
       .
/*    Couldn't get the memory - do some cunning recovery */
       .
    }

It is good practice to "cast" the result of a malloc() to the correct type. This helps the compiler to indirectly check whether you are using the correct type in the sizeof() invocation too. If it complains about your cast, then (assuming the type is the same in the sizeof() ) you are probably using the wrong type in both places, and might have allocated too little memory. There is no check if you wander off the allocated memory, out into memory space no man has seen before ! The memory returned by malloc() can contain any values when you get it, i.e. it is not set to zero.

The free() function frees up the memory obtained from malloc(). It is declared as


void free(void *pointer);

and would be used like this to free the memory obtained in the previous example


    free( i_ptr );
    i_ptr = NULL;  /* Good practice */

I like to set the pointer to NULL immediately upon freeing the memory, because the pointer MUST NOT BE USED again after being free()ed. By setting it to NULL, you will (under VMS or Windows NT) get an ACCVIO if you try and dereference the pointer. This is safer than leaving it, having the memory reused elsewhere, then changing it via the duff pointer. This sort of mistake is very hard to track down. It is very important to always free malloc()-ed memory when you are done with it, or you will cause what is known as a "memory leak".

There are a couple of functions related to malloc(). One is calloc(), which allows you to allocate memory and initialize its contents to zero in one go.


void *calloc(size_t number, size_t size);

The other function is realloc(), which allows you to expand a region of memory obtained by malloc(), whilst retaining its current contents.


void *realloc(void *pointer, size_t size);

The new, expanded region of memory need not be in the same place as the original


    new_i_ptr = (int *)realloc( i_ptr, sizeof(int)*larger_nelements_wanted );
    if ( new_i_ptr ) {
/*    Successfully expanded */
      i_ptr = new_i_ptr; /* Don't free anything here ! */
    } else {
/*    Couldn't get the extra memory, stick with the existing pointer */
    }

so in this example the memory may have changed location, but the original content will have been copied to the new location. Note how I use a new pointer, new_i_ptr, to check that the relocation was successful. This is essential because if you directly assigned to the pointer to the memory you were trying to realloc and the call failed (returning NULL) you would have no way to free the memory originally pointed to by i_ptr.


/*  Never do this - always assign the return value to a different pointer */
    i_ptr = (int *)realloc( i_ptr, sizeof(int)*larger_nelements_wanted );

A final couple of warnings about pointers. Firstly, the [] operator has a higher precedence than the *, so int *array[] means an array of pointers to int, not a pointer to an array of ints. Secondly, the following two statements are not equivalent:


extern int  is[]; /* This declares an int array, defined elsewhere */
extern int *is;   /* This declares a pointer to int, defined elsewhere */

The compiler will actually generate code you did not intend, and probably cause an ACCVIO if you confuse these. This is because an access via a pointer first looks at the address of the pointer, gets the pointer value stored there, and uses that as the base address for lookups. Access via an array name uses the address of the array itself as the base address for lookups. Draw a diagram if you are confused ! Using the EXT and DEFINE_GLOBALS macros, explained later, should stop this ever happening to you.

§7 Structures and Unions

Structures and unions in C are pretty much like their Fortran counterparts. The general form of a structure declaration is
struct optional_structure_identifier {what's in it} optional_instance;

I suggest that you always specify optional_structure_identifier, then declare the instances of the structure later in a manner similar to the way we used enum. Example:


/*     C Example */                    |*     Fortran Example
                                       |
struct oscar_location_s {              |      STRUCTURE /OSCAR_LOCATION_S/
  int x;                               |        INTEGER X
  int y;                               |        INTEGER Y
}; /* Note the semicolon ; */          |      END STRUCTURE
       .                               |           .
int main( int argc, char *argv[] )     |           .
{                                      |           .
    struct oscar_location_s loc;       |      RECORD /OSCAR_LOCATION_S/ LOC
       .                               |           .
    loc.x = 100;                       |      LOC.X = 100
    loc.y =  50;                       |      LOC.Y =  50
       .                               |           .
}                                      |

Similarly with unions, the following trivial example shows how they might be declared and used:


/*     C Example */                    |*     Fortran Example
                                       |
union hat_u {                          |      STRUCTURE /HAT_U/
  int   mileage;                       |        UNION
  float hotel_cost;                    |          MAP
};                                     |            INTEGER MILEAGE
       .                               |          END MAP
int main( int argc, char *argv[] )     |          MAP
{                                      |            REAL HOTEL_COST
    int was_tow;                       |          END MAP
    union hat_u cost;                  |        END UNION
       .                               |      END STRUCTURE
    if ( was_tow ) {                   |           .
      cost.mileage = 100;              |      IF ( WAS_TOW ) THEN
    } else {                           |        COST.MILEAGE = 100
      cost.hotel_cost =  45.50;        |      ELSE
    }                                  |        COST.HOTEL_COST =  45.50
       .                               |      ENDIF
       .                               |           .
}                                      |           .

Notice that you don't need the MAP - END MAP sequence in C that is used in DEC Fortran. Everything in the union { body } acts as though it is sandwiched between MAP - END MAP.

Structures may contain pointer references to themselves, which is very handy for implementing linked lists:


struct list_s {
  struct list_s *prev;
  struct list_s *next;
  void *data_ptr;
};

When you declare a pointer to a structure, let's call it p, there is a potential trap in using the pointer because the binding of the structure member operator, ., is higher than the * dereference operator. Hence *p.thing means lookup the member "thing" of p, and use that as an address for the dereference. What you really want is (*p).thing. This is a bit ugly, so C provides the -> operator.


      .
    struct my_struct_s my_struct;
    struct my_struct_s *struct_ptr;
      .
    struct_ptr = &my_struct;
    (*struct_ptr).thing = 1; /* "thing" = 1 in struct pointed to by struct_ptr*/
    struct_ptr->thing = 1;   /* Same as above */
      .

This is a good place to introduce a program example kindly provided by Rob Cannings. This uses cunning (Cannings ?) pointer manipulation to create a binary sorted tree.


/*---- Illustration of pointer manipulation ("treesort.c") -------------------*/
/* Example provided by Rob Cannings:                                          */
/* (Excess white space removed by Phil O. ;-))                                */
/* We implement a sorting routine with the sorted list stored in a tree.      */

/* ANSI C Headers */
#include <stdlib.h>
#include <stdio.h>

/* Structures */
struct treeNode {
  int data;
  struct treeNode *pLeft;
  struct treeNode *pRight;
};

/* Function prototypes */
void AddNode(struct treeNode **ppNode,struct treeNode *pNewNode);
void Dump(struct treeNode *pNode);

/* Defines and macros */
#define NUMBER_OF_NUMBERS 4

/* Main Program starts here */
int main(int argc,char *argv[])
{
    int i;
    int toBeSorted[NUMBER_OF_NUMBERS] = { 93, 27, 15, 47};
    struct treeNode dataNode[NUMBER_OF_NUMBERS];
    struct treeNode *pSortedTree;
    struct treeNode *pNewNode;
/*  End of declarations ... */

/*  Initialise one node for each item of data */
    for (i = 0; i < NUMBER_OF_NUMBERS; i++) {
      dataNode[i].pLeft = NULL;
      dataNode[i].pRight = NULL;
      dataNode[i].data = toBeSorted[i];
    }

/*  Build a sorted tree out of the data nodes, printing it */
/*  out after each new node is added to the tree           */
    pSortedTree = NULL; /* the tree starts as just a stump */

    for (i = 0; i < NUMBER_OF_NUMBERS; i++) {
      pNewNode = &dataNode[i];
      AddNode(&pSortedTree,pNewNode);
      printf("\nSorted list of %d items:\n",i + 1);
      Dump(pSortedTree);
    }
    exit(EXIT_SUCCESS);
}

void AddNode(struct treeNode **ppSortedTree,struct treeNode *pNewNode)
{
    struct treeNode *pCurrentNode;
/*  End of declarations ... */

    pCurrentNode = *ppSortedTree; /* ppSortedTree is a pointer to a pointer */

/*  Have we reached the end of a branch ? */
    if (pCurrentNode == NULL) {
      *ppSortedTree = pNewNode;
    } else {
/*    We have not reached the end of a branch */
      if (pCurrentNode->data > pNewNode->data) {
        AddNode(&(pCurrentNode->pRight),pNewNode);
      } else {
        AddNode(&(pCurrentNode->pLeft),pNewNode);
      }
    }
}

void Dump(struct treeNode *pNode)
{
/*  End of declarations ... */
    if (pNode != NULL) {
      Dump(pNode->pLeft);
      printf("%d\n",pNode->data);
      Dump(pNode->pRight);
    }
}

      Programming Challenge 6
      _______________________
      
        Compile and link  "treesort.c" with the debugger. Step through and
      experiment with looking at pointers,  and looking at the things they
      point to,  e.g. EXAMINE *pNode . Modify the  program so you  can add
      numbers with a single argument function call.

Sometimes it is useful to know what offset a structure member has from the start of the structure. There is a useful macro defined in <stddef.h> called offsetof which will calculate the offset, in bytes, of a structure member from the start of the structure.


    byte_offset = offsetof(struct my_struct_s, thing);

The first argument to the offsetof macro is a TYPE, not a variable name. An example of this is shown in the "key.c" example program later in the course.

§8 Typedef

The typedef statement lets you define a new name for a pre-existing type. It doesn't create a new type itself. An example should make the usage clear. Imagine you wanted to store coordinates, and initially you thought they could all fit in a short int. You might decide to typedef the coordinate declarations like this:


    typedef  short int  Coordinate_t;
      .
    Coordinate_t x[MAX_POINTS], y[MAX_POINTS];
      .

Later on it might transpire that increased resolution means that you need more than a short int. All you need do then is


    typedef  long int  Coordinate_t;

Be careful and sparing in your use of typedef. Don't use typedef for everything so that no-one can tell the true type of anything. Some people like to use typedef with structures,


    struct coord_s {
      int x;
      int y;
    };
    typedef  struct coord_s  Coordinate_t;
      .
    Coordinate_t points[MAX_POINTS];
        .
      points[i].x = 100;
      points[i].y =  50;
        .

whereas others argue that this masks the fact that coordinates are really structures and that it would be clearer to use


    struct  coord_s  points[MAX_POINTS];

I would suggest that you put all your structure and typedefs in one place, like in a header file, and use whatever makes the code uncluttered and easy to follow. One place where I think typedef does improve clarity is when defining pointers to functions.


typedef int (*verify_cb_func_ptr)( Bodget b, PxPointer cdata, PxCBstruct cbs );

declares verify_cb_func_ptr as a pointer to a function returning an int, with 3 arguments of the types shown. Note that the type returned by the functions themselves is int.


int verify_name( Bodget b, PxPointer cdata, PxCBstruct cbs );
      .
    verify_cb_func_ptr vcb;
      .
    vcb = verify_name;
    i = (*vcb)( b, cdata, cbs); /* Note how to call function thru pointer */
      .

The brackets around the (*vcb) are needed because the function brackets () take precedence over *.

§9 Header Files

I have cunningly tripled up the HELLO program to demonstrate the use of printf, which is a "stdio" function, the for (;;) loop, and the exit(EXIT_SUCCESS) end-your-program function from "stdlib". These functions, or others from these two libraries, are so commonly used that it is a good idea to always include the <stdio.h> and <stdlib.h> ANSI standard header files in all your programs. Header #include files in C can be specified in two ways:

#include <stdio.h>

and


#include "myheader.h"

The quoted "myheader.h" form starts searching in the same directory as the file from which it is included, then goes on to search in an implementation defined way. The angle bracketed <stdio.h> form follows "an implementation defined search path". In practice "implementation defined search path" tends to be the system libraries. Under VAX C, all the header files lived as .h files in SYS$LIBRARY: . Under DEC C, they live in text libraries like DECC$RTLDEF.TLB and SYS$STARLET_C.TLB. On Windows using Visual C++ 6.0 they are in C:\Program Files\DevStudio\VC98\Include , assuming that you installed Visual C++ on to your C: disk. If you want to know the full search rules for VMS, type

$ HELP CC LANGUAGE_TOPICS PREPROCESSOR #INCLUDE

You should always use the angle bracket <> form for ANSI header files, and use the quoted form for your own headers, e.g.


#include "src$par:trntyp.h"

The # symbol is known as the preprocessor operator. When you perform a C compilation, the first stage it goes through is preprocessing, where all the # directives are obeyed, and various inclusions and substitutions are made before the code is compiled. The # sign must always be the first non-whitespace character on the line, and is one of the few exceptions to the general free format of C code. You can have spaces after the #, and these are often useful when using #if constructs.

Another common preprocessor directive is #define . This can be used to define "parameters" which you might want to use as array bounds for example, but in addition it lets you define macros which take arguments and produce inline code using the arguments. For example,


/* Some defines and macro definitions */
#define PI          3.14159265358979
#define MAX(a,b) (((a)>(b))?(a):(b))
#define STRING_SIZE 16
          .
          .
{
    char string[STRING_SIZE]; /* Using a #define'd array bound */
          .
}

Notice that there are no semicolons at the end of the #define lines. Leading and trailing blanks around the "token sequence" (the body of the macro or definition) are discarded, although you can use \ at the end of a line to indicate that there is more of the macro on the next line. In the second form of macro shown above, you cannot have a space between the identifier, MAX, and the first "(", or the preprocessor will not know that the () delimit the parameter list for the macro expansion. Also notice that (if you are a beginner) you haven't got a clue what is going on with that MAX macro !

The #if, #else, #elif and #endif conditional preprocessor directives are used to include code selectively during preprocessing. They can be used to test if a particular macro name has been defined (even as an empty string). A common use for this is stopping the same header file contents being included more than once. For example, imagine you had created a header file called "tla_utils.h".


/*---- My header file for my util routines, called "tla_utils.h" -------------*/
#if !defined( TLA_UTILS_H )   /* Could have used #ifndef TLA_UTILS_H */
#define TLA_UTILS_H
    .
#if defined( __VMS )          /* Could have used #ifdef __VMS */
# include "vms_specific_stuff.h"
#elif defined( UNIX )
# include "inferior_unix_alternative.h"
#else
# include "oh_dear_it_must_be_dos.h"
#endif
    .
/*  Do some stuff that should only be done once */
    .
#ifndef DEFINE_GLOBALS
# define EXT extern
#else
# define EXT                  /* Expands to nothing, giving a definition */
#endif
    .
#define MY_PROGRAM_ARRAY_LIMIT 100
    .
EXT int   tla_global_int;
    .
EXT const float tla_global_pi
#ifdef DEFINE_GLOBALS
 = 3.14159265358979
#endif
;
    .
EXT char  tla_title_string[]
#ifdef DEFINE_GLOBALS
 = "Program Title"
#endif
;
    .
int MyFunction( int meaningful_name );  /* This is not a function definition  */
                                        /* it is a "function prototype" which */
                                        /* allows arg and return val checking */
    .
#endif  /* End of TLA_UTILS_H block */

This technique is widely used to enable selection of the correct code at compile time. Try

$ HELP CC  Language_topics  Predefined_Macros  System_Identification_Macros

which will give you some of the predefined (by the compiler) macros that let you switch code on and off depending on, say, whether you are on a VAX or Alpha. See K&R II pages 91 and 232 for more information on this subject.

The definition of the EXT macro is another useful technique for ensuring that you only DEFINE a variable once (i.e. actually allocate space for, or initialize a variable with a value). Macros are explained in more detail below, but basically the text (if any) associated with the macro name is substituted wherever the macro appears, before compilation proper begins. In your main program, you #define DEFINE_GLOBALS before including the header, and the header file then becomes


       .
int tla_global_int;
       .
const float tla_global_pi = 3.14159265358979;
       .

whereas any files of subroutines which don't #define DEFINE_GLOBALS will process the same header fragment as


       .
extern int tla_global_int;
       .
extern const float tla_global_pi;
       .

so the values are resolved at link time, and won't be contradictory to the main program. This technique saves having to have two versions of your header files (which inevitably get out of step).

§10 Macros and the ? operator

Macros are expanded by the C preprocessor, which substitutes the text of the macro, with any arguments filled in, for the macro invocation itself. Hence if you invoked the MAX macro shown previously, the preprocessor would change the invocation

#define MAX(a,b) (((a)>(b))?(a):(b))
      .
    maxval = MAX( maxval, this);
      .

to this before the compiler proper ever saw it:


       maxval = (((maxval)>(this))?(maxval):(this));

Removing some of the "guard brackets" you get this slightly more readable version


       maxval = (maxval > this) ? maxval : this ;

The brackets around the parameters in the expansion are necessary to keep the meaning correct if, say, one of the arguments is a function call, or complex expression. Sometimes it is advisable to create a temporary variable to avoid "using" the parameters more than once, and this will be explained later. See pages 229 to 231 of K&R II for a fuller explanation of defining macros. Convention dictates that macros should be totally uppercase. This is certainly the style used in the ANSI header files, and it is generally best to make all your macros uppercase.

The ? operator is a ternary operator, i.e. it takes three operands. It should be used sparingly, and is a shorthand as illustrated below:

   value = (expression_1) ? expression_2 : expression_3;

is (more or less) equivalent to

   if ( expression_1 ) {
     value = expression_2;
   } else {
     value = expression_3;
   }

The reason it is handy in macros is that it is best to avoid multiple ; separated statements in a macro, because that could well change the meaning of code. Macros tend to be invoked on the assumption that they are a single statement and code meaning could change if they weren't, e.g. if ( condition ) INVOKE_MACRO( bob );. By using the ? operator you can get a single statement that still has some switching logic in it. There is a trick to get round the single statement restriction, and still behave nicely:


#define MULTI_STATEMENT_MACRO( arg )   do { \
                                           first_thing; \
                                               .
                                           last_thing; \
                                       } while (0)  /* DONT put a ; at end ! */

In C, an expression is TRUE if it is ANY nonzero value, or in the case of pointers, if it doesn't compare equal to NULL. The result of the relational, equality and logical operators is guaranteed to be 0 or 1, so


   i = ( 2 > 1);  /* Sets i to be 1 */
   i = ( 1 > 2);  /* Sets i to be 0 */

So, in our MAX example ((a)>(b)) will be 1, i.e. TRUE, if a is greater than b, 0 otherwise. So "expression_1" is TRUE if a > b. Hence the value of "expression_2" i.e. a will be chosen. Otherwise "expression_3", in this case b will be used.

"Why define MAX as a macro at all ?" you might ask (pause until someone asks). Well the reason is that if you used a function, you would need to write a version for floating point numbers, another for ints, another for long ints and so on. Of course, a macro can circumvent type checking, which some people don't like very much, so in C++ macros have been effectively eliminated for most purposes by "templates" which you can learn about in my STL Course.

When using the #if test mentioned in the "Header Files" section, you can use relational tests on constant expressions. Here is an example of checking that you are using Motif 1.2 or greater


#if (XmVERSION > 1 || (XmVERSION == 1 && XmREVISION >= 2))
    XtSetArg( argl[narg], XmNtearOffModel, XmTEAR_OFF_ENABLED ); narg++;
#endif

The expression following the #if must either use the preprocessing operator defined(identifier) (which returns 1 if identifier has been #defined, else 0) or be a constant expression. This can be handy for defining a number of levels of debugging information. The #if is also the safest way to "comment out" unused code, rather than messing about making sure you haven't illegally nested comments. For example:


#ifdef NEW_CODE_IS_RELIABLE
/*   New code that should be faster but hasn't been tested as much as the old */
   .
#else
/*   Here is the old code that worked - don't want to remove it yet */
   .
#endif

Clearly the #ifdef test will always fail in our lifetime because the macro will never be defined, so the new code will not be compiled and the old code will be used instead. This technique avoids problems caused by inadvertent comment nesting.

Macros can be undefined using the #undef directive.


#define DEBUG 1
.
#ifdef DEBUG
    printf("The value of x is %d in routine Funcy\n",x);/* Print out debug msg*/
#endif
.
#undef DEBUG
.
#ifdef DEBUG
    printf("The value of x is %d in routine Gibbon\n",x); /* Not printed */
#endif
.

You will need to #undef a macro before you can define it again with a different body; a straight redefinition with different text isn't allowed. You can, however, define a macro more than once provided the tokens it expands to are the same, ignoring differences in the amount of whitespace. This is known as a "benign redefinition" and is often used to get identical definitions of the NULL macro in several header files.

Avoid starting your macro names with _ and in particular __ because underbars are reserved for the implementation, and double underbars are used for macros predefined by the standard. For example, the standard reserves __LINE__, __FILE__, __DATE__, __TIME__ and __STDC__. Look in K&R II page 233 for the meanings of these.

Occasionally it is useful to be able to use the macro arguments as strings. This is done by using the # directly in front of the argument.


#define DEBUG_PRINT_INT( x ) (printf("int variable "#x" is %d",x))

#ifdef DEBUG
    DEBUG_PRINT_INT( i ); /* Prints "int variable i is 10" or whatever */
#endif

Concatenation of macro arguments is also possible using the ## operator. Some people like commas in big numbers, so you might use it like this:


#define NICKS_MEGA_INT(a,b,c) a##b##c
     .
    int i;
     .
    i = NICKS_MEGA_INT( 10,000,000 ); /* same as 10000000 after expansion */
     .

Then again, you might not. As a final thought for this section, I will demonstrate a couple of benign uses for the ? operator - it's not just there for the nasty things in life.


    got_space = GetSpace(how_much); /* Returns NULL if it fails */
    printf( got_space ? "Success\n" : "Failure\n");
      .
/*  Avoid ACCVIO if pointer is NULL */
    printf( "Name is %s\n",  name_ptr[i] ? name_ptr[i] : "**Unknown**" );
      .
/*  Handle plurals */
    printf( "Found %d item%s\n", nitems,  (nitems != 1) ? "s" : "" );

The ACCVIO avoidance works because the expression that is NOT selected is guaranteed not to be evaluated, so the NULL pointer is never dereferenced.

Finally, remember my mentioning that it was a good idea to only reference macro arguments once if the macro was to be used like a function ? The X Toolkit Intrinsics macro, XtSetArg, doesn't follow this sound advice. It is defined like this:


#define XtSetArg(arg, n, d) \
    ((void)( (arg).name = (n), (arg).value = (XtArgVal)(d) ))

Notice that (arg) is referenced twice, but only appears once in the macro argument list. Hence the intuitive usage


    XtSetArg( argl[narg++], XmNtearOffModel, XmTEAR_OFF_ENABLED );

actually increments narg by two, not one. It therefore has to be used something like this


    XtSetArg( argl[narg], XmNtearOffModel, XmTEAR_OFF_ENABLED ); narg++;

If they had defined it like this


#define XtSetArg(arg, n, d) \
    do { Arg *_targ = &(arg); \
        (void)( _targ->name = (n), _targ->value = (XtArgVal)(d) ); \
    } while (0)

you would be able to use the argl[narg++] form. This is something to be aware of if your pre or post increments and decrements seem to be behaving strangely. Obviously, you should not actually redefine standard macros, because this can lead to even more confusion. Create your own version, like SETARG if you feel the need.

§11.1 Logical and Relational Operators

At this point I will gratuitously introduce the relational operators.

      C        Means                      Fortran

      >    -  Greater than                ( .GT. )     -
      >=   -  Greater than or equal to    ( .GE. )      | Same precedence
      <    -  Less than                   ( .LT. )      | as each other,
      <=   -  Less than or equal to       ( .LE. )     -  below */+-

      ==   -  Equal                       ( .EQ. )     -  Same as each
                                                        | other, just
      !=   -  Not equal                   ( .NE. )     -  below < etc.

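They are left to right associative and, because they bind less tightly than the arithmetic operators, any arithmetic on either side is done before the comparison is made. Note that they are NOT sequence points, so the order in which the two operands are evaluated is unspecified. E.g.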


    if ( x*3 > y ) {
          .
    }

guarantees that x will have been multiplied by three before comparison with y.

A word of caution about the equality operator, == . It is very easy to miss out the second = and this will still be a legal expression. Example:


    if ( x = 3*12 ) {
          .
    }

will always be true. This is because, in C, assignment expressions have a value, propagated right-to-left. So the value of ( x = 3*12 ), which calculates the right-hand side, 36, and assigns it to x, is 36, which is nonzero and hence always true. So that mistake will cause the if {} body to be always executed, and worse than that you will have unknowingly changed the value of x. To avoid this, some people like to write the test the other way round, e.g.


    if ( 3*12 == x ) {
          .
    }

Now, if you miss off the second = you have an illegal expression, because you cannot assign 3*12 = x: 3*12 is not an lvalue (a modifiable location or symbol, which can be on the left-hand side of the = sign in an expression).

The logical operators are (in decreasing precedence)


      C        Means                      Fortran

      &&   -  Logical AND                ( .AND. )  

      ||   -  Logical OR                 ( .OR. )  

and are below the relational operators in precedence. Hence the expression


    if ( j > 0 && i*3 > 12  ||  i != k ) ...

is the same as


    if (    (  ( j > 0 )  &&  ( (i*3) > 12 )  )    ||    ( i != k )  ) ...

See K&R II page 52 for operator precedence. Most people don't remember these, but use brackets to make the meaning of more complex expressions quite clear. The ! as a unary operator is similar to Fortran .NOT., so ( !x ) is true if x is equal to zero.

§11.2 Bitwise Operators

There are 6 bit manipulation operators in C, which can only be used with integers (signed or unsigned). These operators are

      << - Left shift, bring in zero bits on right.
      >> - Right shift. Bring in 0s on left for unsigned integers,
           implementation defined for signed integers.
      ~  - One's complement. Unary operator, changes 0s to 1s, 1s to 0s
      &  - Bitwise AND, do not confuse with logical &&
      |  - Bitwise INclusive OR, do not confuse with logical ||
      ^  - Bitwise EXclusive OR

Here are some examples:


    i = i << 2; /* Multiply by 4 */
    i <<= 2;    /* Same as above */

    mask |= MSK_RW; /* Set the bits in mask that are set in MSK_RW */

    valuemask = GCForeground|GCBackground; /* Set the bits that are the OR of */
                                           /* GCForeground and GCBackground   */

    mask = ~opposite; /* mask is complementary bit pattern to opposite */

    mask |= 1UL << MSK_R_V; /* Shift unsigned long 1 left MSK_R_V bits */
                            /* and set that bit in mask                */

These are very useful for setting and unsetting flag bits, but you must be aware of the size of object that you are dealing with. By their very nature, bitwise operators can make code less portable.

      Programming Challenge 7
      _______________________
      
        Use the bitwise  operators  to  determine  what your machine  does
      with  a  right   shift   of  a  negative  integer.  Write  some  bit
      manipulation and checking  functions.  Check  the  priority  of  the
      bitwise  operators and see  how this affects the  bracketing of your
      tests and expressions.

§12 Function Prototypes

Here is an example program to determine whether signed char or unsigned char is the default on your machine. This example introduces function prototypes. These are very useful, and whenever you write a set of functions you should ALWAYS create a .h (header) file with the prototypes for those functions. The reason prototypes are so useful is that they allow the compiler to check that you are calling a function with the right number of arguments, and that the arguments themselves are of the correct type. You should NEVER ignore warnings about argument numbers or types, and you should only cast (see later, but briefly, the "cast" (float)3 is like the Fortran FLOAT(3) ) if you are absolutely sure what you are doing !

Notice that the function prototype (for power in this example) is exactly the same as the function header, but with a ; where the body of the function would go. The arguments named in the prototype are optional, so we could have declared "int power( int , int );" . Don't ever do this. Give the arguments either the same names as those in the function definition, or maybe a more verbose name, so that someone looking at your header file with your function prototypes can easily work out how they are meant to be called.


/*---- To sign or not to sign, that is the example ("charsign.c") ------------*/

/* ANSI C Headers */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Function prototypes */
int power( int base, int n );

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    char c;
    unsigned char uc;
/*  End of declarations ... */

/*  Set the top bit of both characters */
    c = power( 2, (CHAR_BIT-1) );
    uc = power( 2, (CHAR_BIT-1) );
 
/*  Shift them both right by one bit, >> is the right shift operator */
    c >>= 1;
    uc >>= 1;
/*  Check for equality - check out the ? ternary operator ! */
    printf("Your computer has %ssigned char.\n", ( c == uc ) ? "un" : "" );

    exit(EXIT_SUCCESS);
}

/*---- Function to raise integer to a power, nicked from K&R II, page 25 -----*/
int power( int base, int n )
{
    int i, p;
/*  End of declarations ... */
    p = 1;
    for ( i = 1; i <= n; i++) {
      p = p * base;
    }
    return( p );
}

In this example, power is a function that returns an int. You can return any type except an array. However, you can return structures (which might contain an array). Similarly, you can pass structures as arguments. In general, it is best to avoid passing or returning structures, because there may be extra overhead due to structures being larger than machine registers, hence they are often passed on the stack. Return or pass a pointer instead. DON'T return a pointer to a function-local, automatic object ! Either make the user pass you a maximum size and some memory into which you can write your structure/array, or malloc() it and return that. In the latter case you should document somewhere that it is up to the user to free() the memory when they are done with it.

If your function doesn't actually return a value, like a Fortran SUBROUTINE, it is declared as void. The void keyword is also used to indicate that a function takes no arguments, for example:


    void initialize_something(void);

would be used like this

    initialize_something();

The brackets are necessary, even though there are no arguments, so the compiler can tell that you intend to call a function.

      Programming Challenge 8
      _______________________
      
        Hack the "charsign.c" example to  try  and call power()  with  the
      wrong type of  argument (you might  declare a float variable and use
      that). See what  compiler message you  get. Call it with  the  wrong
      number of  arguments (but  leave the prototype  unchanged). Create a
      new function,  powerf() that lets you raise a floating  point number
      to  any  power. Try  HELP  CC  RUN-TIME_FUNCTIONS LOG  and  HELP  CC
      RUN-TIME_FUNCTIONS  EXP  for  clues.  The  print  format  conversion
      character  for a floating point  number  in printf  is "%f". Compile
      your program and wonder why you get the error
      
        %CC-I-IMPLICITFUNC,  In  this statement, the  identifier "exp"  is
      implicitly declared as a function.
      
        Remember  that when  you typed HELP CC  RUN-TIME_FUNCTIONS EXP  it
      told you to stick "#include <math.h>" in your program. Put it in and
      the error  should  go  away. If  you  chose  to  make your  function
      something like
      

        float powerf( float base, float exp);

      
        Think about the fact that exp() didn't whinge when you passed it a
      float. This is because when arguments are passed (by value always in
      C), they are (if possible) converted,  AS IF  BY  ASSIGNMENT, to the
      type specified in the function prototype. The order of evaluation of
      arguments is unspecified, so never rely on it. See K&R II  pages  45
      and 201-202 for a  detailed description  of this behaviour.  Finally
      when you have your powerf function  working,  think what  a git I am
      for not mentioning the "double pow(double base, double exp);"  which
      also exists in <math.h>.

§13 Casting (without the couch)

It is possible, as mentioned earlier, to call a function or perform an assignment to "the wrong type". This is called casting, and the general form is
    (type_I_want_to_cast_to) expression_I_want_to_cast
For example

    int index;
    float realval;

    index = (int)realval;

Because, as explained in the example, this is done by default when calling functions for which good prototypes have been declared, it is generally only useful if calling older style "Classic C" functions where the argument types have not been declared. E.g.


    float funcy(); /* We know that this actually takes a double argument */
      .
    float f;
      .
    f = funcy( 100 );         /* Unpredictable result */
    f = funcy( (double)100 ); /* f is 10.0 */

Declarations can be quite complicated, and you should read and understand K&R II, pages 122 to 126. There is a very good set of rules and a diagram for parsing declarations in "Expert C Programming", pages 75 to 78, and I strongly recommend everyone to read this.

Try to avoid casting, except in the circumstances defined above, and possibly when using the RTL function malloc().

§14 File IO Routines and Command Line Arguments

When your C program starts up, it automatically creates three file streams for you. These are known as stdin, stdout and stderr, which are usually the keyboard, the terminal and the terminal again respectively. If you have included <stdio.h> (which you should have) the symbols stdin, stdout and stderr are available for your use. The functions from <stdio.h> that you have seen so far, like printf, write to stdout. Others, like scanf, read from stdin. Here is an example of using scanf to read keyboard input.

/*---- Keyboard Input C Example ("input.c") ----------------------------------*/

/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int i;
    float f;
    char string[80];
/*  End of declarations ... */

    printf("Enter a string, a decimal and a real number separated by spaces\n");
    scanf("%s %d %f", string, &i, &f); /* Not good - no string length check */
    printf("You entered \"%s\", %d and %f\n", string, i, f);

    exit(EXIT_SUCCESS);
}

Compile the program and enter some data. Here is some example input and output.

  Enter a string, a decimal and a real number separated by spaces
  Hello 4 3.14159
  You entered "Hello", 4 and 3.141590

Each item is delimited by whitespace (which includes new lines, of course), but you can use a scanset format specifier to overcome this, "%[characters_wanted]". See the DEC C Run-Time Library Reference Manual, Chapter 2, and Table 2-3, and K&R II page 246 for more information on this. The scanf function actually returns an integer value, which is the number of items successfully read in, or the predefined macro value EOF if end of file or an error occurred before anything could be converted.

Although you can put a maximum field width in the scanf format (e.g. "%79s" for the 80 character array above), the width has to be kept in step with the array size by hand, and if you forget it, entering a string longer than the memory allocated for it will overwrite important memory locations and probably crash your program. It is far better to use fgets().

What you do is read a limited length string with fgets(), the prototype for which is char *fgets( char *str, int maxchar, FILE *file_ptr). So if we had a first argument, destination_string, declared as char destination_string[STRING_SIZE], we would use sizeof(destination_string) for maxchar, and stdin as the input FILE stream.


      .
    char destination_string[STRING_SIZE];
    char *pszResult;
      .
    pszResult = fgets( destination_string, sizeof(destination_string), stdin );
    if ( !pszResult ) {
      /* Error or EOF (End Of File) */
      .
    } else {
      if ( destination_string[strlen(destination_string)-1] != '\n' ) {
        /* Length limit means we didn't get all the input string */
        .
      } else {
        /* Got it all - do sscanf() or whatever */
        .
      }
    }

The fgets() function stops reading after the first newline character is encountered, or, if no newline is found, it reads in at most maxchar-1 characters. In either case the string is terminated with '\0'. You can tell whether the length limit cut in by checking if the last character in the string is '\n'. If it isn't, then you may need to read more input. This will require malloc()ing space enough for all the "segments" of the string and perhaps strncat()ing them together. This is left as an exercise for the reader. The resultant string is then parsed with sscanf(), which takes its input from a string; destination_string in this description.

Failure to limit the length of an input string led to the infamous "finger" bug. There is a function similar to fgets, called gets(), which was used in the original finger daemon, fingerd (the server side of finger, a program which returns information about users on the target system). Never, repeat never, use gets. It reads characters into the string pointed to by its argument, with no length check. This was exploited to overwrite the return address on the stack, and make the "privileged" finger image execute some code (sent as a part of a long string) to create a command shell running with full privileges.

      Programming Challenge 9
      _______________________
      
        Write  a  program  using  scanf,  or  fgets and  sscanf.  Try  out
      different conversion characters,  check the  effects  of giving  bad
      data.

The standard C file system is based on streams, which are rather like unit numbers in Fortran. Streams connect to devices, such as files on disks, terminals or printers. By using file streams, you are shielded from having to do the low level IO. C recognises two types of stream: text streams and binary streams. Binary streams are written to and read from without any mucking about ! Text streams may have implementation defined translation of line feeds and carriage returns applied to them. Binary streams are usually used for writing out data in the machine's internal representation, like an array of structures for example. The type is determined when the file is opened with fopen(). The following table shows the access modes you can use with fopen().


  r   Open text file for reading.
  w   Open or create text file for writing, discard previous contents.
      (creates new version under VMS)
  a   Open or create text file for appending, write at end of existing data.

  r+  Open text file for reading AND writing.
  w+  Open or create text file for update, discard previous contents.
      (creates new version under VMS)
  a+  Open or create text file for update, write at end of existing data.

Add a "b" after the access mode letter, and we are talking binary files. Here is an example program to write and read back an array of data structures, which will make everything remarkably clear.


/*---- File IO Example ("fileio.c") ------------------------------------------*/

/*---- Put all #include files here -------------------------------------------*/
/* ANSI Headers */
#include <ctype.h>    /* Character macros */
#include <errno.h>    /* errno error codes */
#include <stdio.h>    /* Standard I/O */
#include <stdlib.h>   /* Standard Library */
#include <string.h>   /* String Library */
#include <time.h>     /* Time Library */

/*---- Put all #define statements here ---------------------------------------*/
#define PROGRAM_VERSION "1.6"
#define TYPE_OF_FILE "TEST_DATA"
#define NDATA_POINTS 10

/*---- Put all structure definitions here ------------------------------------*/
/* Following structures MUST be packed tightly ie. no member alignment -------*/
#ifdef __DECC
#pragma member_alignment save
# pragma nomember_alignment
#endif

struct file_header_s {
  char type[32];
  char version[8];
  char creator[20];
  time_t time;
};
struct data_s {
  short x;
  short y;
  char name[8];
};

#ifdef __DECC
# pragma member_alignment restore
#endif

int main(int argc, char *argv[])
{
    struct data_s *data_ptr;
    struct file_header_s file_header;
    FILE *outfile, *infile;
    int  i, got_answer;
    char answer[8], yeno[4], node_user[20], filename[128];
    char *c_ptr;
    long int ndata = 0, nitems = 0;
/*  End of declarations ... */

/*  Set up a default name in case user hasn't specified one */
    if (argc < 2) {
      strcpy(filename,"MYDATA.DAT");
    } else {
      strcpy(filename,argv[1]);
    }

/*  Get node name and user name */
#if !defined( _WIN32 )
    sprintf( node_user, "%s%s", getenv("SYS$NODE"), getenv("USER") ); /* VMS */
#else
    sprintf( node_user,"\\\\%s\\%s",getenv("COMPUTERNAME"),getenv("USERNAME"));
#endif
/*  Convert to uppercase */
    c_ptr = node_user;
    /* toupper lives in <ctype.h>; the cast keeps negative char values out */
    while ( (*c_ptr = (char)toupper((unsigned char)*c_ptr)) != '\0' ) ++c_ptr;

/*  Allocate data space and initialize it */
    data_ptr = (struct data_s *)malloc( sizeof(struct data_s)*NDATA_POINTS );
    if ( data_ptr ) {

      ndata = NDATA_POINTS;
      for ( i = 0; i < ndata; i++) {
        data_ptr[i].x = data_ptr[i].y = i;
        sprintf(data_ptr[i].name, "%3.3d,%3.3d", data_ptr[i].x, data_ptr[i].y );
      }

/*    Open the file for writing in binary mode */
      outfile = fopen(filename,"wb");
      if ( outfile != NULL ) {

/*      Set up the header */
        strcpy(file_header.type,TYPE_OF_FILE);
        strcpy(file_header.version,PROGRAM_VERSION);
        sprintf(file_header.creator,"%s",node_user);
        (void)time(&file_header.time);

        printf("Writing out the header\n");

/*      Items Written   Data Pointer   Size in bytes     No. of items  stream */
/*        |                   |             |                |            |   */
/*        v                   v             v                v            v   */
        nitems = fwrite( &file_header, sizeof(file_header),  1,      outfile);
        if ( ferror(outfile) ) {
          fprintf(stderr,"Error writing file 'header':\n%s",strerror(errno));
        }

        printf("Writing out the number of data items, %ld\n", ndata);
        nitems = fwrite( &ndata, sizeof(ndata), 1, outfile);
        if ( ferror(outfile) ) {
          fprintf(stderr,"Error writing number of data items:\n%s",
                  strerror(errno));
        }

        printf("Writing out the actual data all in one chunk\n");
        nitems = fwrite( data_ptr, sizeof(struct data_s)*ndata, 1, outfile);
        if ( ferror(outfile) ) {
          fprintf(stderr,"Error writing data:\n%s",strerror(errno));
        }

        printf("Closing output file\n");
        fclose(outfile);

      } else {
        fprintf(stderr,"Error creating data file %s:\n%s", filename,
                strerror(errno));
      }
    } else {
      fprintf(stderr, "Couldn't allocate space for %ld data structures\n",ndata);
    }    

/*  Now optionally read the data back in and format */

    do {
      printf("\nRead data back in ? [Y/N]: ");
      fgets( yeno, sizeof(yeno) , stdin); /* Reads in sizeof(yeno)-1 chars */
      got_answer = sscanf( yeno, "%[YyNnTtFf]", answer);
    } while ( !got_answer );

    if ( answer[0] == 'Y' ||  answer[0] == 'y' ||
         answer[0] == 'T' ||  answer[0] == 't' ) {
      printf( "Here we go ..\n" );

/*    Zero out the structures just to show there's no cheating */
      ndata = 0;
      memset( &file_header, 0 , sizeof( file_header ) );
      memset( data_ptr, 0 , sizeof(struct data_s)*NDATA_POINTS );

/*    Open the file for reading in binary mode */
      infile = fopen(filename,"rb");
      if ( infile != NULL ) {

        printf("Reading in the header\n");
        nitems = fread(&file_header,sizeof(file_header),1,infile);
        if ( ferror(infile) ) {
          fprintf(stderr,"Error reading file 'header':\n%s",strerror(errno));
        }

        printf("Header information:  file type %s\n", file_header.type ); 
        printf("                       version %s\n", file_header.version ); 
        printf("                    created by %s\n", file_header.creator ); 
        printf("                            on %s\n", ctime( &file_header.time ) ); 

        nitems = fread( &ndata, sizeof(ndata), 1, infile);
        printf("Read in the number of data items, %ld\n", ndata);
        if ( ferror(infile) ) {
          fprintf(stderr,"Error reading number of data items:\n%s",
                  strerror(errno));
        }

        printf("Reading in the actual data all in one chunk\n");
        nitems = fread( data_ptr, sizeof(struct data_s)*ndata, 1, infile);
        if ( ferror(infile) ) {
          fprintf(stderr,"Error reading data:\n%s",strerror(errno));
        }

        printf("Closing input file\n\n");
        fclose(infile);

        printf("Read in %ld data items\n", ndata);
        for ( i = 0; i < ndata; i++) {
          printf("%3d) x:%3d, y:%3d, Label: %s\n",
                 i, data_ptr[i].x, data_ptr[i].y, data_ptr[i].name );
        }

      } else {
        fprintf(stderr,"Error opening data file %s:\n%s", filename,
                strerror(errno));
      }

    } else {
      printf( "OK - be like that\n" );
    }

    exit(EXIT_SUCCESS);
}

The strange #pragma directive is a standard way to do non-standard things, like instruct the compiler to close-pack the data (i.e. don't use natural alignment) and is explained in §17. The "f" routines you can look up yourself in a book. Several DEC system and library routines are used, so the VMS headers <ssdef.h>, <starlet.h> and <lib$routines.h> are included. Notice that the function calls are lowercase, and there are no prototypes defined yet, so you are on your own there if you get an argument wrong ! This should change with future releases of DEC C. The strerror routine is useful for getting a text error message. After many library calls, not just stdio calls, an integer expression, errno, defined in <errno.h>, yields a non-zero value if an error occurs. Be very careful making assumptions about errno, because in many cases it isn't actually a variable but a macro, which allows it, for example, to behave in a thread-safe manner. This means, however, that it isn't safe to treat it as a global integer variable and take its address and so forth.
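
As a small illustration of the errno advice, here is a hedged sketch of a wrapper round fopen(). The name open_or_report is hypothetical. It copies errno into an ordinary int straight away, because any later library call is allowed to change it. Note the assumption: the C standard does not actually promise that fopen() sets errno, although POSIX systems do, so treat the message as a best effort.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open a file, reporting any failure on stderr. Returns NULL on failure. */
FILE *open_or_report( const char *filename, const char *mode )
{
    FILE *fp;
    int saved_errno;

    errno = 0;                 /* Clear any stale value from an earlier call */
    fp = fopen( filename, mode );
    if ( fp == NULL ) {
      saved_errno = errno;     /* Copy it before another call can change it */
      fprintf( stderr, "Error opening %s:\n%s\n", filename,
               strerror(saved_errno) );
    }
    return( fp );
}
```

Only the value of errno is used here; nothing takes its address or assumes it is a plain variable.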

The strerror function from <string.h> converts the error number into a text string, and the function value is a pointer to this string. You must not modify this string, and it will be overwritten by later calls to strerror. This program reads in many bytes at a time with each read, but there is a function, fgetc, to read a single character, and a matching function, ungetc which returns the last read character to the input stream to be read by the next fgetc. This provides a kind of look-ahead function which is exploited by the next example program, "calc.c", provided by Neill Clift. In this program, getchar is used instead of fgetc. It is equivalent to fgetc except that it reads from stdin. This sturdy example is adapted from the very expression parser used by LID (a Y.R.L. replacement for DEC's CDD; contact [email protected] if you are interested).


/*---- Calculator expression evaluator example ("calc.c") --------------------*/
/*
        History:
        Version         Name                    Date
        V01-001         Neill Clift             16-Mar-1995
                        Initial version
*/

/* ANSI Headers */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Function prototypes */
int parse_expression(void);
int parse_expression_factor(void);
int parse_expression_term(void);
int parse_literal(void);
int getnonwhite(void);
int match_token(int tomatch);

/* Start of main program */
int main( int argc, char *argv[])
{
  int val;
/*  End of declarations ... */

  printf("Enter expression terminated by ;\n");
  printf("Calc> ");
  val = parse_expression();
  if (match_token(';')) {
    printf("Result is %d\n",val);
  } else {
    printf("Expression seems bust\n");
  }
  exit(EXIT_SUCCESS);
}

/*---- Get the next character skipping white space ---------------------------*/
int getnonwhite(void)
{
    int c;
/*  End of declarations ... */

    while (1) {
      c = getchar();
      if (c == EOF) {
        break;
      } else if (!isspace(c)) {
        break;
      }
    }
    return( c );
}

/*--- Match single character against next input char. If match then gobble ---*/
/*--- it up. If we don't match it then push it back for future matches -------*/
int match_token(int tomatch)
{
    int c;
/*  End of declarations ... */

    c = getnonwhite();
    if (c == tomatch) {
      return( 1 );
    } else {
      ungetc(c,stdin); /* Put character back on input stream to be read again */
      return( 0 );
    }
}

/*---- Parse a single number from input +/- nnnnn  ---------------------------*/
int parse_literal(void)
{
    int retval,st;
/*  End of declarations ... */

    retval = 0;

    st = scanf("%d",&retval);
    if (st == EOF) {
      printf("Hit EOF looking for literal\n");
    } else if (st == 0) {
      printf("Missing literal\n");
    }

    return( retval );
}

/*---- Syntax is:   Literal    or:   (expression)  ---------------------------*/
int parse_expression_factor(void)
{
  int retval;
/*  End of declarations ... */

  if (match_token('(')) {
    retval = parse_expression();
    if (!match_token(')'))
      printf("Missing close bracket\n");
  } else {
    retval = parse_literal();
  }

  return( retval );
}

/*---- Parse an expression term,  Syntax: <factor>{<multiplying_op><factor>} -*/
int parse_expression_term(void)
{
  int tmp,mul,opr;
/*  End of declarations ... */

  tmp = parse_expression_factor();

  while (1) {
    if (match_token('*')) {
      opr = 1;
    } else if (match_token('/')) {
      opr = -1;
    } else {
      break;
    }

    mul = parse_expression_factor();
    if (opr == 1) {
      tmp = tmp * mul;
    } else if (mul == 0) {
      printf("Division by zero!\n");
    } else {
      tmp = tmp / mul;
    }
  }
  return( tmp );
}

/*---- Parse an expression, Syntax: [+/-]<term>{<adding_op><term>} -----------*/
int parse_expression(void)
{
  int tmp,mul,add;
/*  End of declarations ... */

  /* Check for leading + or -. None means plus */

  if (match_token('+')) {
    mul = 1;
  } else if (match_token('-')) {
    mul = -1;
  } else {
    mul = 1;
  }

  tmp = parse_expression_term();
  tmp = tmp * mul;

  while (1) {
    if (match_token('+')) {
      mul = 1;
    } else if (match_token('-')) {
      mul = -1;
    } else {
      break;
    }
    add = parse_expression_term();
    tmp = tmp + mul * add;
  }

  return( tmp );
}

A few other file routines are worth mentioning. These are fgetpos, fsetpos and fseek, which generally apply to files open in binary mode. They allow you to position to a particular byte within a file, specified from the current position, or the beginning or end of the file. The fgetpos function returns the position in an object of type fpos_t, which is only meaningful when used with fsetpos. See K&R II page 248 for more details. The fflush function allows you to flush cached data on an output stream if you want to do this before fclose, which flushes anyway, as does exit(). To make sure that data is actually written to disk you must call a non-standard function like fsync after fflush - fflush on its own doesn't guarantee that the data has actually been written to permanent storage. Streams can be redirected using freopen, and this is a commonly used method for making stdout get written to a file without having to change printf calls. Testing for end of file is achieved by using the feof function.
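
To make the positioning functions a little more concrete, here is a minimal sketch that fetches the n'th fixed-size record from a file opened in binary mode. The structure point_s and the function read_point_at are invented for this illustration, and it assumes every record in the file is exactly sizeof(struct point_s) bytes.

```c
#include <stdio.h>

/* Hypothetical fixed-size record */
struct point_s {
  short x;
  short y;
};

/* Read record number 'index' (counting from zero) from a binary stream. */
/* Returns 1 if a complete record was read, 0 otherwise.                 */
int read_point_at( FILE *fp, long index, struct point_s *point )
{
    long offset = index * (long)sizeof(*point);

    /* SEEK_SET measures the offset from the beginning of the file */
    if ( fseek( fp, offset, SEEK_SET ) != 0 ) {
      return( 0 );
    }
    return( fread( point, sizeof(*point), 1, fp ) == 1 );
}
```

Asking for a record beyond the end of the file is not an error as far as fseek is concerned; it is the fread that comes back short, which is why the function tests the fread result rather than calling feof.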

A standard method for getting command line arguments is provided in C. These are the arguments to the main program, argc and argv. The integer argc is the number of command line arguments, and must be greater than or equal to zero. The second argument, argv, is an array of pointers to characters. If argc is zero, then argv[0] must be the NULL pointer. On most implementations, it will be greater than zero, and argv[0] points to the program name. On some machines this will be a string like "myprog". On VMS or Windows NT systems it is the full file specification. If the program name is not available, argv[0] must point to the null character, '\0'. The elements argv[1] to argv[argc-1], if they exist, point to strings which are implementation defined. In practice, these are usually the whitespace separated (unless "quoted") arguments supplied on the command line. Under VMS, command line arguments are converted to lowercase, unless quoted. The following example, "args.c", shows how to get the command line arguments.


/*---- Getting Command Line Arguments C Example ("args.c") -------------------*/

/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int i;
/*  End of declarations ... */

    for ( i = 0; i < argc; i++ ) {
      printf("Argument %d = \"%s\"\n", i, argv[i] );
    }

    exit(EXIT_SUCCESS);
}

In order to make it work, you must either run it like this (assuming you are in the directory containing the image)

$ MC SYS$DISK:[]ARGS HELLO WORLD

or define a symbol, and invoke it as a "foreign command"

$ args:==$SYS$DISK:[]ARGS
$ args hello world again

      Programming Challenge 10
      ________________________
      
        Try the args  program.  With your  new-found skills, modify Neill's
      calc program to take a command line argument expression, or to behave
      in the existing manner if one is not supplied.

On VMS systems, we often want to access keyed indexed files. This is slightly more difficult than using the standard file functions, because you have to set up the RMS (Record Management Services) structures yourself. If you want to do this, you should really read the DEC C documentation about using RMS from C. Alternatively you can ask your friendly local Clift for an example. Here's one we prepared before the course, "key.c".


/*---- Keyed Index File C Demonstration Program ("key.c") --------------------*/
/*
        History:
        Version         Name                    Date
        V01-001         Neill Clift             09-Mar-1995
                        Initial version
*/

/* ANSI Headers */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* VMS Headers */
#include <rab.h>                /* RMS RAB */
#include <rms.h>                /* RMS access blocks etc */
#include <ssdef.h>              /* System service completion codes */
#include <starlet.h>
#include <lib$routines.h>

/* Defines and Macros */
#define NAME_SIZE  32        /* Size of person field */
#define PHONE_SIZE 20        /* Size of the phone number field */

/* Structure declarations */

struct phone_r {
/* Define the structure that will be the record of the keyed file. */
/* It is indexed with two keys, one for each of the structure's fields. */
   char name[NAME_SIZE];
   char phone[PHONE_SIZE];
};

/* Global Variables - not externally visible */
static struct FAB    fab;        /* FAB for file  */
static struct RAB    rab;        /* RAB for file */
static struct NAM    nam;        /* NAM block to report I/O errors nicely */
static struct XABKEY xabkey1, xabkey2;     /* XAB to define keys structure */

static char essbuf[NAM$C_MAXRSS];   /* Expanded file name */
static char rssbuf[NAM$C_MAXRSS];   /* Resultant file name */

static char keyname1[32] = "Person"; /* Name of first key */
static char keyname2[32] = "Phone";  /* Name of second key */

/*---- Routine to close the RMS file -----------------------------------------*/
int close_file( void )
{
   long int status;
/* End of declarations ... */

   status = sys$close( &fab );
   return( status );
}

/*---- Open/Create the keyed index file --------------------------------------*/
int create_file( char *filename )
{
   long int status, status1;
/* End of declarations ... */

   fab = cc$rms_fab;                    /* initialise the FAB */
   fab.fab$l_alq = 100;                 /* Preallocate space */
   fab.fab$w_deq = 100;
   fab.fab$b_fac = FAB$M_PUT|FAB$M_DEL|FAB$M_UPD;
   fab.fab$l_fop = FAB$M_DFW|FAB$M_CIF;
   fab.fab$b_org = FAB$C_IDX;
   fab.fab$b_rfm = FAB$C_VAR;
   fab.fab$l_fna = filename;
   fab.fab$b_fns = strlen(filename);
   fab.fab$b_rat = FAB$M_CR;
   fab.fab$l_xab = (char *) &xabkey1;

   /* Init XABKEY to define key for name key */
   xabkey1 = cc$rms_xabkey;             /* Initialise XABKEY structure */
   xabkey1.xab$b_bln  = XAB$C_KEYLEN;
   xabkey1.xab$b_cod  = XAB$C_KEY;
   xabkey1.xab$b_dtp  = XAB$C_STG;
   xabkey1.xab$b_ref  = 0;              /* Key zero */
   xabkey1.xab$l_knm  = (char *) &keyname1; /* Key name */
   xabkey1.xab$l_nxt  = (char *) &xabkey2;  /* Next XAB in chain */
   /*
      The next two fields describe the section of the record that contains
      the key.
   */
   xabkey1.xab$w_pos0 = offsetof(struct phone_r, name);
   xabkey1.xab$b_siz0 = NAME_SIZE;

   /* Init XABKEY to define key for phone key */
   xabkey2 = cc$rms_xabkey;             /* Initialise XABKEY structure */
   xabkey2.xab$b_bln  = XAB$C_KEYLEN;
   xabkey2.xab$b_cod  = XAB$C_KEY;
   xabkey2.xab$b_dtp  = XAB$C_STG;
   xabkey2.xab$b_ref  = 1;              /* Key one */
   xabkey2.xab$l_knm  = (char *) &keyname2;
   xabkey2.xab$l_nxt  = 0;
   xabkey2.xab$w_pos0 = offsetof(struct phone_r, phone);
   xabkey2.xab$b_siz0 = PHONE_SIZE;

   /*
      Init NAM block just for good file I/O error reporting. We won't use it
      though!
   */

   nam = cc$rms_nam;
   fab.fab$l_nam = &nam;
   nam.nam$b_rss = sizeof( rssbuf );
   nam.nam$l_rsa = (char *) &rssbuf;
   nam.nam$b_ess = sizeof( essbuf );
   nam.nam$l_esa = (char *) &essbuf;
   status = sys$create( &fab );
   if (!(status&SS$_NORMAL))
      return( status );

   rab = cc$rms_rab;                    /* initialise the RAB */
   rab.rab$b_mbf = 127;
   rab.rab$b_mbc = 127;
   rab.rab$l_rop = RAB$M_WBH|RAB$M_RAH;
   rab.rab$l_fab = &fab;
   status1 = sys$connect( &rab );
   if (!(status1&SS$_NORMAL)) {
      status = status1;
      sys$close( &fab );
   }
   return( status );
}

/*---- Write a record to the file --------------------------------------------*/
int put_record( char *name, char *phone )
{
   long int status;
   struct phone_r phonerec;
/* End of declarations ... */

   strncpy( phonerec.name,  name,  sizeof(phonerec.name) );
   strncpy( phonerec.phone, phone, sizeof(phonerec.phone) );
   rab.rab$w_rsz = sizeof( phonerec );
   rab.rab$l_rbf = (char *) &phonerec;
   rab.rab$b_rac = RAB$C_KEY;
   rab.rab$l_rop |= RAB$M_UIF;

   status = sys$put( &rab );

   return( status );
}

/*---- Look a record up by name ----------------------------------------------*/
int get_record( char *name, struct phone_r *phonerec )
{
   long int status;
/* End of declarations ... */

   rab.rab$w_usz = sizeof( *phonerec );
   rab.rab$l_ubf = (char *) phonerec;
   rab.rab$b_ksz = strlen( name );
   rab.rab$l_kbf = (char *) name;
   rab.rab$b_krf = 0;
   rab.rab$b_rac = RAB$C_KEY;
   rab.rab$l_rop |= RAB$M_UIF;

   status = sys$get( &rab );

   return( status );
}

/*---- Main Program starts here ----------------------------------------------*/
int main( int argc, char *argv[] )
{
   long int status;
   struct phone_r phn;
/* End of declarations ... */
   
   printf("Creating phone.dat ...\n");
   status = create_file("phone.dat");
   if (!(status&SS$_NORMAL)) lib$signal( status );

   printf("Add record NEILL - 555 555 1417\n");
   status = put_record ("NEILL", "555 555 1417");
   if (!(status&SS$_NORMAL)) lib$signal( status );

   printf("Add record PHIL - 555 555 6506\n");
   status = put_record ("PHIL", "555 555 6506");
   if (!(status&SS$_NORMAL)) lib$signal( status );

   printf("Look up record for PHIL\n");
   status = get_record ("PHIL", &phn);
   if (!(status&SS$_NORMAL)) {
      lib$signal( status );
   } else {
      printf("Found %s - %s\n", phn.name, phn.phone );
   }

   status = close_file();
   if (!(status&SS$_NORMAL)) lib$signal (status);

   exit(EXIT_SUCCESS);
}

This will write out a data file, PHONE.DAT, then do a lookup and find a record keyed on the name. Try expanding the file with a few more records, and experiment with the lookup. Use this file to create your own database, with different types of keys.

§15 Miscellaneous Library Routines

The C standard specifies that the following headers and their related library routines must be available:

       <assert.h>        <locale.h>        <stddef.h>
       <ctype.h>         <math.h>          <stdio.h>
       <errno.h>         <setjmp.h>        <stdlib.h>
       <float.h>         <signal.h>        <string.h>
       <limits.h>        <stdarg.h>        <time.h>

Obviously you should avoid using these names for your own header files. In addition, C++ has <new.h> and <iostream.h>, so don't use these either. There are far too many routines in the standard libraries for me to describe them all, which is as good an excuse as any not to bother. I will, however, present a few program examples or program fragments for the more commonly used routines. You should refer to the DEC C Run Time Library Manual, using BookReader or MGBOOK to get the latest version, and familiarize yourself with what is available. VMS provides several Unix style functions in <unixlib.h> and <unixio.h>. To be strictly accurate, these are nonportable, but they are available on many Unix systems. One such function is getenv, which, in Unix land, gets the string value of "environment variables", which roughly correspond to DCL symbols, or logical names. On the VAX or Alpha, the getenv function first looks for a logical name match, and returns the translation if it finds one, else it looks for a local symbol with the same name and returns the definition, or, if that wasn't found it looks for a global symbol. Compile and run "symbols.c" and try defining MYSYM as a symbol, and as a logical name, and see what output you get.


/*---- Getting Symbols or Logical Names C Example ("symbols.c") --------------*/

/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    char *s_ptr;
/*  End of declarations ... */

    printf("MYSYM = \"%s\"\n", ( s_ptr = getenv("MYSYM")) ? s_ptr : "" );

    exit(EXIT_SUCCESS);
}

Getting the time is another commonly required function, and C provides a number of standard routines for this purpose. An example program demonstrates the use of various time routines, including strftime, which is very flexible in letting you form a formatted time string.


/*---- Getting the time ("time.c") -------------------------------------------*/

/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    time_t c_time;
    struct tm *c_time_rec;
    char string[80];
/*  End of declarations ... */

/*  Get the current time */
    time(&c_time); /* c_time now contains seconds since January 1, 1970 */

/*  Convert this to a string using the ctime() function */
    printf("%s\n", ctime( &c_time ) );

/*  Split the time into its components - hours, minutes, seconds etc. */
    c_time_rec = localtime( &c_time );

/*  Selectively copy day of week into string */
    strftime( string, sizeof(string)-1, "Today is %A", c_time_rec );

/*  Print out the day of the week */
    printf("%s\n", string );

    exit(EXIT_SUCCESS);
}

It is important to use the standard defined types for time variables, like time_t, and not use, say, unsigned int because "you know that's what it is". I say this because some systems are going to have a 2038 bug, a bit like the millennium bug, caused by the number of seconds since January 1, 1970 exceeding the storage capacity of the type currently used for time_t. Compiler vendors might well change time_t to be something completely different in future, like a 64 bit quantity or perhaps a structure. If you have used time_t throughout your code, a simple recompilation and relink will be all that you need to do, and this will avoid your name being cursed by later generations of programmers, or your being ejected into space by HAL ;-)
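
One way of keeping your hands off the insides of time_t is to let mktime and difftime do the arithmetic. The sketch below is only an illustration: the names noon_on and days_between are made up, and it assumes that mktime succeeds for the dates you give it (a real program should check for a return value of (time_t)-1).

```c
#include <time.h>

/* Build a time_t for noon, local time, on the given calendar date */
static time_t noon_on( int year, int month, int day )
{
    struct tm when = { 0 };

    when.tm_year  = year - 1900;  /* tm_year counts from 1900 */
    when.tm_mon   = month - 1;    /* tm_mon runs from 0 to 11 */
    when.tm_mday  = day;
    when.tm_hour  = 12;           /* Noon keeps clear of daylight saving shifts */
    when.tm_isdst = -1;           /* Let the library work out daylight saving */
    return( mktime( &when ) );
}

/* Whole days from the first date to the second (negative if it is earlier) */
long days_between( int y1, int m1, int d1, int y2, int m2, int d2 )
{
    /* difftime returns seconds as a double, whatever time_t really is */
    double days = difftime( noon_on(y2,m2,d2), noon_on(y1,m1,d1) ) / 86400.0;

    return( (long)( days >= 0.0 ? days + 0.5 : days - 0.5 ) ); /* Round */
}
```

Nothing here subtracts one time_t from another or stores one in an int, so the code should not care if time_t changes size or type on a later compiler.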

String-to-number conversion is a topic that frequently crops up in the comp.lang.c Usenet news group. You have already met sprintf, which is the number-to-string converter. Other functions, like atoi and atof, convert strings to int and double, and strtod and strtol convert strings to double and long respectively, with the added benefit that they let you detect bad input. Random integer numbers, seeded using srand, can be obtained using the rand function, and if you want floating point numbers, cast the result of rand to float and divide by the macro value RAND_MAX, also cast to float. Searching and sorting routines, bsearch and qsort, are also provided. They expect you to pass a pointer to a function which is used to determine whether objects compare equal, greater than or less than, then they do the sorting or searching, using your function for the comparison.
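
Here is a rough sketch of two of these in use: strtol with some checking of its result, and a comparison function shared by qsort and bsearch. The function names parse_long, compare_ints and sort_and_find are invented for this example, and the choice to reject trailing characters is just one possible policy.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert text to a long. Returns 1 on success, 0 for bad input */
int parse_long( const char *text, long *value )
{
    char *end;

    errno = 0;
    *value = strtol( text, &end, 10 );       /* Base 10 */
    if ( end == text )     return( 0 );      /* No digits found at all */
    if ( errno == ERANGE ) return( 0 );      /* Too big or small for a long */
    if ( *end != '\0' )    return( 0 );      /* Rubbish after the number */
    return( 1 );
}

/* Comparison function in the form qsort and bsearch expect: returns */
/* negative, zero or positive for less than, equal or greater than   */
int compare_ints( const void *a, const void *b )
{
    int ia = *(const int *)a;
    int ib = *(const int *)b;

    return( (ia > ib) - (ia < ib) );  /* Avoids the overflow risk of ia-ib */
}

/* Sort the array into ascending order, then look for 'wanted' in it. */
/* Returns a pointer to the matching element, or NULL if not found.   */
int *sort_and_find( int *values, size_t count, int wanted )
{
    qsort( values, count, sizeof(values[0]), compare_ints );
    return( (int *)bsearch( &wanted, values, count, sizeof(values[0]),
                            compare_ints ) );
}
```

Note that bsearch only works on data that is already sorted with the same comparison function, which is why the sketch sorts first.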

The equivalent of LIB$SPAWN is the system() function. The argument is either a command shell command, or NULL to test whether a command shell is available.


     if ( system(NULL) ) {
/*     Do VMS command */
       system("DIRECTORY");
     } else {
       fprintf(stderr,"Sorry - no command shell available !");
     }

The atexit function lets you register exit handlers, which are called in FILO (first in, last out) order when the program exits. These are useful for tidying up resources, even if some deeply nested subroutine calls exit(). Handlers for other types of condition can be registered using the signal() function. In Unix, signals are a bit like AST notifications. They range from SIGALRM, which lets you know when an alarm set by the alarm function has gone off, to SIGINT which can be used to trap Ctrl C. A pair of functions, setjmp and longjmp, provide one of the nearest things I have seen to the mythical "comefrom" statement ! An example showing you how to trap Ctrl C will demonstrate. This program includes <signal.h> and <setjmp.h>, and stores the position to which we want to return in saved_position, which is of type jmp_buf. You should never actually "look" at this, because it is only meaningful as an argument to longjmp. The second parameter to longjmp is a non-zero integer, which will be returned as the value of setjmp when we come back to it from the longjmp. Called directly, setjmp returns zero. The example "signal.c" should make signal handling and longjmping clearer.


/*---- Signal and longjmp Example ("signal.c") -------------------------------*/

/* ANSI C Headers */
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#if defined( _WIN32 )
# include "windows.h"
#endif

/* Defines and macros */
#define MAX_COUNT 5

/* Global variables */
jmp_buf saved_position;

/* Function prototypes */
void ctrl_c_handler( int scode );

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int icount;
/*  End of declarations ... */

    icount = 0;

    signal( SIGINT, ctrl_c_handler );

    if ( !setjmp(saved_position) ) {
      for ( ; icount < MAX_COUNT; ++icount ) {
        printf("At main line - looping #%d - enter Ctrl/C\n", icount);
#if !defined( _WIN32 )
        sleep( 10 );
#else
        Sleep( 10000 );
#endif
      }
    } else {
      printf("Returned from Ctrl C handler - exiting early\n");
    }

    exit(EXIT_SUCCESS);
}

void ctrl_c_handler( int scode )
{
/*  End of declarations ... */

    if ( scode == SIGINT ) {
      printf("Handling Ctrl C - return to position saved by setjmp()\n");
      longjmp( saved_position, 1 ); /* Use any non-zero number */
    } else {
      printf("Strange - Ctrl C handler called with wrong signal code !\n");
    }
}

To get "signal.c" to link properly, you must compile it with /PREFIX=ALL to properly prefix the nonstandard function call to sleep, a <unixlib.h> function. All the recognised RTL functions are actually prefixed with "DECC$" by the compiler, and this allows the linker to find them automatically at link time. It is a good idea to get into the habit of using /PREFIX=ALL because this will cause warnings to be issued at link time if you inadvertently name any of your own functions so as to clash with inbuilt ones. This example works slightly differently under Windows, because although the Ctrl C handler is called, the program then exits (this is the documented behaviour).
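
Going back for a moment to atexit, which was mentioned before the signal example, here is a minimal sketch of registering two exit handlers. The handler names and the wrapper register_exit_handlers are hypothetical, and the handlers only print a message where a real program would release its resources.

```c
#include <stdio.h>
#include <stdlib.h>

/* Registered first, so it is called last */
static void close_log( void )
{
    printf("close_log: called last\n");
}

/* Registered second, so it is called first */
static void remove_scratch_files( void )
{
    printf("remove_scratch_files: called first\n");
}

/* Register both handlers. Returns 1 if both were accepted, else 0. */
int register_exit_handlers( void )
{
    /* atexit returns zero on success */
    if ( atexit( close_log ) != 0 )            return( 0 );
    if ( atexit( remove_scratch_files ) != 0 ) return( 0 );
    return( 1 );
}
```

Call register_exit_handlers() early in main. The handlers then run whenever the program leaves through exit() or by returning from main, but not if it is killed by abort() or an unhandled signal.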

Finally, I will present an example of a function that can be called with variable arguments. The C syntax for variable arguments is to use three dots, ..., to represent the variable arguments. You must, however, specify at least one named argument at the start of the parameter list. In one way, <stdarg.h> routines are rather inferior to the nonstandard <varargs.h> routines, available on Unix and VMS, because the latter can tell you how many arguments were passed, whereas the former make you tell the function in some way, as the "%" conversion characters in a printf format string do. Y.R.L. programmers - for some examples of the nonstandard varargs mechanism (which you should avoid using if at all possible) see SRC$OLB:PRINTFILE.C and FIFO.C in GNRC. Here is an example program using the standard stdarg mechanism, called "vargs.c".


/*---- Variable Arguments C Example ("vargs.c") ------------------------------*/

/* ANSI C Headers */
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Defines and macros */
#define END 0

/* Function prototypes */
int add_some_ints( int first, ...);

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int total;
/*  End of declarations ... */

    total = add_some_ints( 1, 2, 3, 4, END);

    printf("Total was %d\n", total );

    exit(EXIT_SUCCESS);
}

/*---- Rather pathetic function to add up some integers ----------------------*/
int add_some_ints( int first, ...)
{
    va_list ap; /* Use to point to each argument in turn */
    int inext, sum, icount;
/*  End of declarations ... */

    icount = 1;
    sum = first;
    va_start( ap, first ); /* Use last named argument - first in this case */

/*  End of args is marked by END, which is zero */

    while ( ( inext = va_arg( ap, int ) ) != END )  { /* 2nd arg is the type */
      sum += inext;
      ++icount;
    }
    va_end( ap ); /* Tidy up */

    printf("add_some_ints added %d integers\n", icount );

    return( sum );
}

The best way to find out about the run time library functions is to look them up in K&R, or with VMS HELP, then have a go at using them.

§16 Obfuscation

Obfuscation is the art of writing C so that no-one, including the author, has a clue what is going on. Many people have been practising this for years in Fortran, but C's #define definitely gives you the extra scope to really muddy the water. Here are some examples taken from comp.lang.c, compiled by Peter Conrad, of the contest to find the shortest C program to count from a given number i1 up to or down to a second number, i2. To compile these you will need to use CC/DEFINE="o(arg)=printf(""%d"",arg)", which is a bit of a cheat.

In no particular order (they are all 69 bytes):

[01] main(a,y,d)int*y;{for(a=atoi(y[1]);o(a),d=atoi(y[2])-a;)a+=(d|1)%2;}
[02] main(c,v,x)int*v;{for(c=atoi(v[1]);o(c),x=atoi(v[2])-c;c+=(x|1)%2);}
[03] main(a,v)int*v;{for(a=atoi(*++v);o(a),*v=atoi(v[1])-a;a+=*v>>31|1);}
[04] main(i,v,j)int*v;{for(i=atoi(v[1]);o(i),j=atoi(v[2])-i;i+=(j|1)%2);}
[05] main(c,d){int*v=d;for(c=atoi(v[1]);o(c),d=atoi(v[2])-c;c+=d>>31|1);}
[06] main(d,O,_)int*O;{for(_=atoi(O[1]);o(_),d=_-atoi(O[2]);_-=d>>-1|1);}
[07] main(a,v)int*v;{for(a=atoi(v[1]);o(a),*v=atoi(v[2])-a;a+=*v>>31|1);}
[08] main(a,b)int*b;{for(a=atoi(*++b);o(a),*b=atoi(b[1])-a;a+=(*b|1)%2);}
[09] main(a,b)int*b;{for(a=atoi(b[1]);o(a),*b=atoi(b[2])-a;a+=*b>>31|1);}
[10] main(J,_)int*_;{for(J=atoi(*++_);o(J),*_=atoi(_[1])-J;J+=(*_|1)%2);}
[11] main(i,b,j)int*b;{for(i=atoi(b[1]);o(i),j=i-atoi(b[2]);i-=j>>31|1);}
[12] main(k,j,i)int*j;{for(k=atoi(j[1]);o(k),i=atoi(j[2])-k;k+=(i|1)%2);}
[13] main(n,a,e)int*a;{for(n=atoi(a[1]);o(n),e=atoi(a[2])-n;n+=e>>31|1);}
[14] a;main(c,d)int*d;{for(a=atoi(d[1]);o(a),c=atoi(d[2])-a;a+=(c|1)%2);}
[15] main(n,c,d)int*c;{for(n=atoi(c[1]);o(n),d=atoi(c[2])-n;n+=d>>31|1);}
[16] main(d,a)int*a;{for(d=atoi(a[1]);o(d),*a=atoi(a[2])-d;d+=(*a|1)%2);}
[17] *p;main(i,x){for(i=atoi((p=x)[1]);o(i),x=atoi(p[2])-i;x>0?i++:i--);}
[18] main(c,v,d)int*v;{for(c=atoi(v[1]);o(c),d=atoi(v[2])-c;c+=d>>31|1);}
[19] main(c,v,d)int*v;{for(c=atoi(v[1]);d;o(d>0?c++:c--))d=atoi(v[2])-c;}

These are the winners of the 1st Int'l Kaiserslautern Shortest C Contest (the numbers refer to the programs above):

[01] Lars C. Hassing
[02] Stefan Bock
[03] Heather Downs
[04] Patrick Seemann
[05] Roland Nagel
[06] Klaus Singvogel,
     Michael Schroeder,
     Markus Kuhn
[07] Markus Simmer
[08] Willy Seibert
[09] Oliver Bianzano
[10] Jens Schweikhardt
[11] Thomas Omerzu,
     Matthias Sachs,
     Udo Salewski
[12] Jahn Rentmeister
[13] Gregor Hoffleit
[14] John Rochester
[15] Markus Siegert
[16] Siegmar Zaeske
[17] Arnd Gerns,
     Dirk Eiden,
     Steffen Moeller
[18] James C. Hu
[19] Frank Neblung

I think you would agree that these would not be much fun to maintain ! Another favourite pastime for obfuscators is the self-reproducing program, which prints its own source code when run. Here is one of my favourites - strictly ANSI compliant, including all headers, by Ashley Roll.


#include <stdio.h> /* Ashley Roll [email protected] */
main(){char *s,*a,*t;for(t=s="7}|wzarq6*gbr}{<~,6;@6Ug~zqm6H{zz6uh{zzVsaw}g<\\\
w}b<sa<qra<ua6@; yu}|>=ow~uh6@g:@u:@b/t{h>b)g)8_8/@g/g??= }t>@g))3K3=t{h>u)b\\\
/@u/u??=o@u))3JJ3+fh}|bt>8JJJJJJJ|8=0fabw~uh>@u=/i qzgq@g5)3JJ3+fabw~uh>@g9%\\\
L(%=0%/i6;@6Sh}tt}b~6A|}dqhg}bm:6Xh}gxu|q6Uagbhuz}u6@; ";*s;s++)
if(*s=='_')for(a=t;*a;a++){*a=='\\'?printf("\\\\\\\n"):putchar(*a);}
else*s!='\\'?putchar(*s-1^21):1;} /* Griffith University, Brisbane Australia */

This is in the course directory, called "self.c". As a final treat, here is a festive Christmas program, "xmas.c", by Brendan Hassett, based on the more traditional, but equally obfuscated program by Ian Phillipps.


/*
From: [email protected] (Brendan Hassett)
+------------------------------------------------------------------+
| Brendan Hassett      Tel +353-902-74601 ext 1109    ECN 830-1109 |
| [email protected]   [email protected]     ~~~   |
| Ericsson Systems Expertise Ltd,  Athlone,  Ireland,  EU. ( o o ) |
+--------------------------------------------------------ooO-(_)-Ooo
#include <disclaimer.h>
*/

/*
  Based on an original program by Ian Phillipps,
  Cambridge Consultants Ltd., Cambridge, England
*/

#include <stdio.h>
#define __ main

__(t,_,a)
char
*
a;
{
	return!

0<t?
t<3?

__(-79,-13,a+
__(-87,1-_,
__(-86, 0, a+1 )


+a)):

1,
t<_?
__(t+1, _, a )
:3,

__ ( -94, -27+t, a )
&&t == 2 ?_
<13 ?

__ ( 2, _+1, "%s %d %d\n" )

:9:16:
t<0?
t<-72?
__( _, t,
"k,#n'+,#'/\
*{}w+/\
w#cdnr/\
+,{}r/\
*de}+,/\
*{*+,/\
w{%+,/\
w#q#n+,/\
#{l,+,/\
n{n+,/\
+#n+,/#;\
#q#n+,/\
+k#;*+,/\
'-el' ))# }#r]'K:'K n l#}'w {r'+d'K#!/\
+#;;'+,#K'{+w' '*# +e}#]!/\
w :'{+w'nd+'we))d}+#r]!/\
c, nl#'+,#'rdceK#n+ +{dn]!/\
-; K#'{+'dn'+,#', }rk }#]!/\
*{nr' 'k :' }denr'{+]!/\
w :'+,#:'n##r' n'e)l} r#]!/\
}#{nw+ ;;'+,#'wd*+k }#]!/\
 w['*d}' 'reK)]!/\
ew#' 'r#-ell#}]!/\
+}:'+d'}#)}drec#'{+]!/\
 w['+,#K',dk'+,#:'r{r'{+]!/\
w##'{*'{+', ))#nw' l {n(!!/")
:
t<-50?
_==*a ?
putchar(a[31]):

__(-65,_,a+1)
:
__((*a == '/') + t, _, a + 1 )
:

0<t?

__ ( 2, 2 , "%s")
:*a=='/'||

__(0,

__(-61,*a,
"!ek;dc i@bK'(q)-[w]*\
%n+r3#l,{}:\nuwloca-O\
;m .vpbks,fxntdCeghiry")

,a+1);}

Beat that. This evokes a slight whinge in DEC C's strictest ANSI mode, because main takes more than two parameters, but compile, link, run and enjoy !

§17 Changing from VAX C or K&R C to ANSI Compliant DEC C

The main change with DEC C for VAX or Alpha is that it is much stricter than K&R/VAX C. I have found this to be a very good thing (apart from the pain of bug fixing !) because it has brought to light hitherto undiscovered errors - the type of thing that causes intermittent faults ! You tend to get a lot of informational messages about "implicitly declared functions" (like exit) where people have been lazy and not included <stdlib.h> and <stdio.h>. You can always turn informational warnings off, but this is a bit suspect because you then don't get alerted to other problems that might be important. I usually just ensure that every bit of code has

#include <stdlib.h>
#include <stdio.h>

at the top. Make sure that all exit()s are exit(EXIT_SUCCESS) or exit(EXIT_FAILURE). Avoid "magic number" exit codes. Unix programmers often think that everyone understands that exit(0) means no problems, and often don't use the ANSI exit(EXIT_SUCCESS), which will be correct for any operating system. If you specifically want to return a VMS status (other than normal successful completion) then use #ifdef __VMS for the VMS specific part where possible. In a nutshell, make sure that you write ANSI C. All the programs that I have converted to ANSI C have still worked under VAX C, provided I use #pragma in certain places (see later). I prefer #pragma solutions to just adding global qualifiers, because they are closer to the machine-specific code, and flag the fact that you are doing something special.

§17.1 Using Object Libraries

On VMS platforms you might well be putting several main programs in one library and hence want to use the MAIN_PROGRAM macro to show that myprog(...) is really main(...). The use of MAIN_PROGRAM is nonportable, and hence the compiler will whinge. Similarly you might be using variant_unions for, say, VMS item lists, which are so VMS specific that you are happy to use the nonportable extension in that (limited) portion of code because it is convenient. To stop complaints about this sort of thing, I do the following (which works with VAX C and DEC C) ...

#ifdef __VMS
#  ifdef __DECC
#    pragma message save
#    pragma message disable portable
#  endif

int myprog(int argc, char *argv[])
MAIN_PROGRAM /* VMS specific macro to identify main() */

#  ifdef __DECC
#    pragma message restore
#  endif
#else

int main(int argc, char *argv[]) /* Standard C version */

#endif

Here is an example of a VMS item list for system service calls ...


#ifdef __DECC
# pragma message save
# pragma message disable portable
#endif
/*  VMS Item List structure */
struct item_list {
  variant_union {
    variant_struct {
      short int w_buflen;
      short int w_code;
      long int *l_bufptr;
      long int *l_retlenptr;
    } list_structure;
    int end_list;
  } whole_list;
};
#ifdef __DECC
# pragma message restore
#endif

Obviously, completely nonportable constructs should be used sparingly, because although we have turned the error off, they are still nonstandard !

§17.2 Structure Alignment

Watch out for arrays of structures, or structures populated by data reads from disk. DEC C for Alpha likes to have structure members aligned on natural boundaries, which will confuse your code if (say) you read in from disk some data written out by a program built with VAX C, or with DEC C on a VAX. Make sure that structures like this are tightly packed. I tend to do this ...

/* Following structures MUST be packed tightly ie. no member alignment */
#ifdef __DECC
# pragma member_alignment save /* Push current default alignment on to 'stack'*/
# pragma nomember_alignment
#endif

struct my_struct_s    my_struct;
struct other_struct_s other_struct;

#ifdef __DECC
# pragma member_alignment restore /* Restore previous default alignment */
#endif

Don't be tempted to force nomember_alignment for everything. Restrict it to individual structures as shown above. Imposing it globally is likely to cause performance penalties, or mysterious crashes if you call routines that expect natural alignment.

§17.3 Accessing Fortran Common Blocks

External models have changed between VAX C and DEC C, so C code that accesses Fortran common blocks should beware. On the Alpha, the default for Fortran common blocks is now NOSHR. VAX Fortran uses SHR as the default. DEC C, on the other hand, wants RELAXED_REFDEF. Try the following for a "normal" common block, ie. not a global section ...

/* FORTRAN common blocks are extern to everything -----------------------*/
#ifdef __DECC
# pragma extern_model save
#endif

#ifdef __DECC
# ifdef __ALPHA
/* Default if not overridden by options file is NOSHR for Alpha Fortran common*/
#  pragma extern_model common_block noshr
# else
/* Default if not overridden by options file is SHR for VAX Fortran common */
#  pragma extern_model common_block shr
# endif
#endif

extern struct {
  int a;          /* These names don't matter, type/size must be correct */
  int b;
  int c;
  char string[40];
} COMBLK; /* Case matters if linker is case sensitive */

#ifdef __DECC
# pragma extern_model restore
#endif

Then if your Fortran common block looked like this ...


*
*     Common:
*
      INTEGER A
      INTEGER B
      INTEGER C
      CHARACTER*40 STRING
      COMMON /COMBLK/ A,B,C,STRING

... your C function would access the members thus


void SET_FORTRAN_COMMON(void)
{
    COMBLK.a = 1;
    COMBLK.b = 2;
    COMBLK.c = 3;
    sprintf(COMBLK.string,"Hello World");
}

§17.4 Accessing Global Sections

If you were trying to access a global section, named PATSEC in this example, you would have to look at the options file to see the PSECT attributes (or if you were lucky, the attributes might be specified in the .CMN file itself with a CDEC$ compiler directive). For PATSEC, you need the NOSHR option ...

#ifdef __DECC
# pragma extern_model save
# pragma extern_model common_block noshr
#endif
     .
extern struct {struct patsec patsec;} ZZZZ_PATSEC;
     .
#ifdef __DECC
# pragma extern_model restore
#endif

Then you can refer to it like this ...


        num = ZZZZ_PATSEC.patsec.ckp.apt[ndx-1].patrol;

Here the common block _containing_ the patsec structure is called ZZZZ_PATSEC, but we need to specify .patsec even though the common block in that case only contains one item (the previous example had .a, .b, .c and .string).

§17.5 Passing Fortran style strings

Fortran expects strings to be passed by descriptor, or, more precisely, it expects to receive the address of a descriptor, which is a structure containing a pointer to a string, its length and its type and class. A string descriptor structure looks like this ...

struct  dsc$descriptor_s
{
  unsigned short  dsc$w_length;   /* length of data item in bytes,
                                     or if dsc$b_dtype is DSC$K_DTYPE_V, bits,
                                     or if dsc$b_dtype is DSC$K_DTYPE_P, 
                                     digits (4 bits each) */
  unsigned char   dsc$b_dtype;    /* data type code */
  unsigned char   dsc$b_class;    /* descriptor class code = DSC$K_CLASS_S */
  char            *dsc$a_pointer; /* address of first byte of data storage */
};

It is often convenient to use the $DESCRIPTOR macro defined in <descrip.h> to set up a descriptor. This expects the address of the string to be _constant_ so you must declare the string referred to by the descriptor as static to avoid compilation warnings. Example:


    static char user[16];
    $DESCRIPTOR( user_dsc, user );
        .
    istatus = CALL_FORTRAN_ROUTINE( &user_dsc );
        .

Be aware that the $DESCRIPTOR macro will set the string length to be one less than sizeof(string) to allow for the fact that C may have a terminating null character. Hence, in the example above, CALL_FORTRAN_ROUTINE will think it has been passed a CHARACTER*15 string.

§17.6 VMS Message Codes

In Fortran we often use EXTERNAL CARS__ERROPEN to get the linker to pull in the correct message code as a "global value". In C, this is how to do the same thing portably.

#ifdef __DECC
# pragma extern_model save
# pragma extern_model globalvalue
#else
# ifdef VAXC
#  define extern globalvalue
# endif
#endif

/* Put _declarations_ of message parameters here */
extern int cars__erropen, cars__errconn, cars__errread, cars__errwrite;

#ifdef __DECC
# pragma extern_model restore
#else
# ifdef VAXC
#  undef extern
# endif
#endif

If you wanted to use the same codes on a different machine, you could then define the integer error code values in a separate .c file and link against the error code object.

§18 Bibliography



© Phil Ottewell 24-Nov-1998