Version 1.1 © Phil Ottewell 1995,1998
phils_c_examples.dsw. You should open this with Microsoft Visual Studio 98 and Microsoft Visual C++ 6.0, then "batch build" everything. VMS users should use unzip -a phils_c_examples.zip to get the correct file attributes for the text files, like the .c source files and MAKE.COM, the VMS command file which you can use to build the programs.
There are also several programming challenges. Have a go at these,
nicking as much code as you can from the examples ! Using C is the best way to learn it, and making mistakes is
definitely the best way to find out how it really works. I mention the ANSI
C standard, ANSI/ISO 9899-1990, a lot in this document.
Always try to adhere to the standard; experience has shown that it pays off in
the long term. Some of the points I make are stylistic. However, many of these
suggestions are made for one of two reasons: either the majority of the C programming world has reached consensus that the style is
good (which will make it easier for you to read and learn from other people's
code) or I have found that you can avoid errors by doing things in a particular
way. I reckon that you can "learn" C in about an hour,
then spend the next year wishing you hadn't done things in a particular way the
hour after that. This course should help you avoid some of the pitfalls that are
so easy to fall into (and, in fact, dig for yourself) because of the total
control, power, and 0 to 60 ACCVIOS in under 10 seconds that C can deliver to the programmer.
#define SAYS =
char * Clarkson SAYS "It's sexy enough to snap knicker elastic at 50 paces";
BCPL and B were typeless languages - variables were all multiples of byte or
word sized bits of memory. C is more strongly typed than
B, BCPL or Fortran. Its basic types are char, int, float and double, which are characters, integers and single and double precision floating point numbers. An important addition, compared to Fortran, is the pointer type, which points to the other types (including other pointers). All these types can be combined in structures or unions to provide composite types.
The main shock to Fortran programmers is the fact
that C has no built-in string type, and consequently you
have to make a function call to compare two strings, or assign one string to
another. Luckily, the ANSI standard describes a set of string manipulation
routines that MUST be present if an implementation is described as ANSI C. Similarly, a good set of standard IO, time manipulation and even sorting routines exists. HELP CC RUN-TIME_FUNCTIONS will give you information on all of these, and even tell you which header files you should include to use them. For example HELP CC RUN PRINTF will inform you that you need the header file stdio.h.
In the early days of C, different compiler vendors all had their own flavours of C, usually based on the book, The C Programming Language, by Brian Kernighan and Dennis Ritchie. These older compilers are often referred to as "Classic C" or "K&R C". As C gained in popularity, the need to standardize certain features became apparent, and in 1983 the American National Standards Institute established the X3J11 technical committee, whose work was ratified as the ANSI C standard (ANSI X3.159-1989) in 1989.
If you only buy one book on C, get the second edition
of the K&R
book. If you want to buy two books add Expert
C Programming: Deep C Secrets by Peter van der Linden. If you really want to
be a language lawyer and contribute to threads like "Is i = i++ + --i legal ?"
in the comp.lang.c newsgroup, then get "The Annotated ANSI C Standard",
annotated by Herbert Schildt. Personally I think a line of code like i = i++ + --i should be taken out and shot.
The DEC C compiler is a good ANSI compiler, and any code you write should pass through this compiler (with its default qualifiers) without so much as an informational murmur. If it doesn't, you are storing up big trouble and intermittent bugs for the future. Even if you decide to do nonstandard things, there are techniques to do them in a standard way (!), which will be explained later.
OK, enough waffle. Let's look at a "Hello World" program in C.
/*---- Hello World C Example ("hello.c") -------------------------------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int i;
/* End of declarations ... */

    for ( i = 0; i < 10; i++ ) {
        printf("%d Hello World !\n",i);
    }

    exit(EXIT_SUCCESS);
}
As you have probably gathered, comments in C are delimited by /* and */, and comments must NOT be nested, or you will get some very interesting bugs. The perceived need for nested comments is usually for commenting out (say) a debug piece of code, and this can be done in a better way, which will be explained later. Some C compilers let you use the trailing C++ style comments //, which are like a trailing ! in DEC Fortran. NEVER USE THESE IN C PROGRAMS. They are not ANSI standard, and immediately confuse people as to whether they are looking at C or C++ code (and some meanings can subtly change).
To compile this program under DEC C (both Alphas and VAXes should be using DEC C now; VAX C was retired around 1993, so you really should switch to DEC C on both platforms):
$ CC HELLO
$ LINK HELLO
Alternatively, you can use the MAKE.COM DCL command file, as shown below. On Alphas the resulting executable will have file type .EXE_ALPHA, and on VAX machines it will be .EXE.
$ @MAKE HELLO
DEV$DISK:[PHIL.PHILS_C_EXAMPLES]
CC/PREFIX=ALL HELLO.C -> HELLO.OBJ_ALPHA
LINK HELLO -> HELLO.EXE_ALPHA
Exiting
If you must use VAX C (and you mustn't :-) the link step will whinge about unresolved symbols, so change the line to
$ LINK HELLO, VAXCRTL/OPT
where the VAXCRTL.OPT options file contains the line
SYS$SHARE:VAXCRTL/SHARE
You are now ready to RUN HELLO . Not too many surprises there. Note the form of the code. The main entry point in standard C is always called main, though you can override this on VMS platforms, as we will discover.
The main program in C is declared as int main(some funny stuff). This is because the main program should always return a value (usually to DCL or the Unix shell) indicating how things went. This is done by the call to exit(EXIT_SUCCESS). There are two ANSI standard return codes, EXIT_SUCCESS and EXIT_FAILURE, both defined in <stdlib.h>. Always use these values, and don't do what a lot of Unix programmers do, which is exit(0) or some other magic number just because "everybody knows that exit(0) means success". You can return VMS condition codes, e.g. exit(SS$_NORMAL), but this should be avoided unless really necessary, and even then there are ways to fall back to the standard return codes if your code is compiled on a non-VMS machine.
The (some funny stuff) is the argument list, or the "formal parameters" of function main. Imagine main as a function called from your command shell (DCL on VMS, or the DOS window on Windows NT). The declaration int main( int argc, char *argv[] ) means that main is a function returning an integer, which takes two arguments. The first is an integer, and is the number of arguments passed to main by DCL, and the second is an array of pointers to character strings. The latter are, in fact, any command line arguments, as will be demonstrated in args.c, a demo program coming soon to a disk near you. The body of a function is delimited by { and }. Because C is largely a free format language, the whole function can be on one line if you really want, but that tends to be unreadable and confusing. I like to start the function with a { in column one, just after the function declaration (which I can then nick for prototyping), and end the function with a } in the same column.
Notice how each statement ends with a semicolon. The ";" is known as a statement terminator. The end of a complete expression is also a "sequence point", as are the comma operator, the logical operators && and ||, and the conditional operator ?:. The standard guarantees that side effects of expressions will be over once a sequence point is reached. This basically means that all the things you made happen in one statement will have happened by the time you start on the next statement or expression.
The printf(...) statement is a call to a routine defined in <stdio.h>, and enables formatted output to the stdout stream. In C, three standard streams are defined. These are stdin, stdout and stderr, and they correspond to SYS$INPUT, SYS$OUTPUT and SYS$ERROR under VMS. The first argument is a format string containing conversion characters, each preceded by the % sign (use %% if you actually want a % sign), which tell the routine how to interpret the variable number of arguments to be printed. In this case the integer i is to be printed in decimal, so "%d" is used. There are corresponding functions, sprintf to write directly into a character string array, and fprintf to write to a file. Similar formatted input routines, sscanf and fscanf, are also available. The table below, nicked off the network, summarizes the conversion characters:
Clive Feather's Excellent Table:
Types of arguments for the various fprintf and fscanf conversions

Conversion        fprintf          fscanf
----------------------------------------------------
d i               int              int *
o u x X           unsigned int     unsigned int *
hd hi             int              short *
ho hu hx hX       [see note 1]     unsigned short *
ld li             long             long *
lo lu lx lX       unsigned long    unsigned long *
e E f g G         double           float *
le lE lf lg lG    [invalid]        double *
Le LE Lf Lg LG    long double      long double *
c                 int              [see note 2]
s                 [see note 2]     [see note 2]
p                 void *           void **
n                 int *            int *
hn                short *          short *
ln                long *           long *
[                 [invalid]        [see note 2]

Note 1: the type that (unsigned short) is promoted to by the integral
        promotions. This is (int) if USHRT_MAX <= INT_MAX, and
        (unsigned int) otherwise.
Note 2: any of (char *), (signed char *), or (unsigned char *).
Don't worry about the "*"s for now. They can be read as "pointer to thing named before them", so int * means pointer to int. Similar summaries can be found in K&R II pages 154 and 158.
Programming Challenge 1
_______________________
Have a go at adapting "hello.c" to print out the value of i in hexadecimal. Fiddle about with the format string - remove the "\n" for example, and see what happens to your output.
Unlike Fortran, whitespace is significant in C, and there are reserved keywords. These reserved keywords should not appear as any type of identifier, even a structure member (YRL people - don't forget LID files). The list below shows both C and C++ reserved keywords.
asm[1]     continue    float       new[1]        signed        try[1]
auto       default     for         operator[1]   sizeof        typedef
break      delete[1]   friend[1]   private[1]    static        union
case       do          goto        protected[1]  struct        unsigned
catch[1]   double      if          public[1]     switch        virtual[1]
char       else        inline[1]   register      template[1]   void
class[1]   enum        int         return        this[1]       volatile
const      extern      long        short         throw[1]      while

The items marked like this[1] are C++, not C keywords, but it makes sense to avoid both. Avoid using a language name like Fortran too.
x = x + 1
isn't a contradictory algebraic statement.
C, unlike Fortran, has case sensitive variable and other identifier names. Therefore the variable NextPage is completely different to nextpage. The same is true for functions. Some people like to use the capitalized-first-letter form of naming, others prefer underbars, e.g. GetNextPage() or get_next_page() . Many professional library packages tend towards TheCapitalizedFormat. Some people like Microsoft's Hungarian Notation which involves prefixing variable names with their type, e.g. uiCount for an unsigned int counter variable. It all depends how good you are with the Shift key :-) Whatever method you choose, try and be clear and consistent.
In C, local variable definitions can be at the start of any {block}, and aren't restricted to the top of the module as in Fortran. Be careful if you take advantage of this feature, because you may run into scoping problems where the innermost variable definition hides an outer one. If you are used to C++, remember that the variable definitions can only be at the start of the {block} before the first statement (e.g. expression, function call or flow control statement). If you try and intersperse definitions, C++ style, the C compiler will issue some sort of "bad statement" warning.
Variables declared at the beginning of the {function body} are local to the function, variables declared at the "top" of the file (or compilation unit to be pedantic), in the header files, or outside any function bodies, are global to the compilation unit (and are externally visible symbols, unless declared as static). More will be said about this later. For now, suffice it to say that you should avoid using global variables wherever possible.
A brief example will illustrate the scope of variables:
/*---- Variable Scope Example ( "scope.c" ) ----------------------------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Global variables, visible externally too (i.e. to things linked        */
/* against this) Generally they should be avoided as far as               */
/* possible, because it can be very difficult to discover which           */
/* routine changes their value, and they introduce "hidden" dependencies  */
int some_counter;
double double_result;

/* Function prototypes */
void set_double_result(void);

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int j;
    int i_am_local; /* .. to main */
/* End of declarations ... */

    i_am_local = 1;
    printf("i_am_local = %d (in main)\n\n", i_am_local );

    for ( j = 0; j < 10; j++ ) {
        int i_am_local; /* .. to this loop - Not necessarily a good idea */
                        /* because it can cause confusion as to which    */
                        /* variable we actually want to access           */
        i_am_local = j;
        printf("i_am_local = %d (inside loop)\n", i_am_local );
    }
    printf("\ni_am_local = %d (in main)\n\n", i_am_local );

/*  Now let's look at the default initialization values of the globals */
    printf("some_counter = %d (in main)\n", some_counter);
    printf("double_result = %f (in main)\n\n", double_result);

/*  Call a function that changes the global variables .. */
    set_double_result();

/*  .. and look at them again */
    printf("some_counter = %d (in main)\n", some_counter);
    printf("double_result = %f (in main)\n", double_result);

    exit(EXIT_SUCCESS);
}

void set_double_result(void)
{
    ++some_counter;
    double_result = 3.141;
    printf("some_counter = %d (in set_double_result)\n", some_counter);
    printf("double_result = %f (in set_double_result)\n\n", double_result);
}
The basic types in C are:
char   - this defines a byte, which must be able to hold one character in the local character set (normally, but not necessarily 8 bits);
int    - holds an integer, usually in the machine's natural size. They are 32 bits on both VAX and Alpha.
float  - holds a single precision floating point number. They are 32 bits on both VAX and Alpha.
double - Double precision floating point number, 64 bits on the VAX and Alpha.
These bit sizes are just to give you an idea. They should not be relied on, and you should code independently of them, unless you are addressing hardware registers or some equally hardware-specific task.
Some of these basic types can be modified with various qualifiers:
char   - can be signed or unsigned;
int    - can be long or short, signed or unsigned;
double - can be long, for (possibly) even more precision;
The long modifier normally gives larger integers, but the compiler vendor is free to ignore it provided that
short <= int <= long
16 bits <= short/int
32 bits <= long
Assignment to variables of these basic types is fairly intuitive, and can be done in the definition, rather like using the DATA statement in Fortran or the DEC Fortran extension
/* C Example */                        |* DEC Fortran Example
                                       |
int   x, y;    /* Not initialized */   |      INTEGER X, Y
int   counter = 0;                     |      INTEGER COUNTER /0/
float total = 0.0;                     |      REAL TOTAL /0.0/
char  c = 'A';                         |      CHARACTER*1 C /'A'/
Note that C uses single quote ' for character constants. The double quotes are used for strings. There are escape sequences for getting nonprintable characters. These are listed on page 38 of K&R II. A few useful ones are '\n' to get a new line (C doesn't automatically add line feeds when you use printf() ), '\a' to get a bell (alert) sound, and '\0' to get the null character (which is NOT THE SAME as the NULL pointer) used to terminate strings (arrays of characters). The initialization of non-static (discussed later) int and float variables is necessary before use. It doesn't have to be done in the definition, but you can't rely on their value being anything sensible, so whilst the initialization of COUNTER and TOTAL is redundant in the Fortran example (assuming non-recursive compilation), you do need to initialize the variables before use in C.
.
{
int i;
int j;
.
i = 0; /* i's value could be anything up to this point */
.
j = i*OFFSET; /* j's value could be anything up to this point */
.
}
Global variables are guaranteed to be initialized to 0 (or 0.0 if floating type) but you can override this by specifying an initial value.
Similar rules apply to float, double, and long double. There are two standard header files, <limits.h> and <float.h>, which tell you the maximum and minimum values that can be stored in a particular type; for example INT_MAX is 2147483647, and FLT_MAX is 1.7014117e+38 on the VAX.
The signed or unsigned modifiers are fairly self explanatory. The default for
int is signed, so it is rarely specified. Signed integer arithmetic is usually
done in Two's Complement form, but this need not be the case.
Characters can be signed or unsigned by default - it is implementation defined.
I find it best just to use char with no qualifiers, and let the compiler do what it will.
This is probably a good point to introduce the sizeof(thing) operator. It is an operator, not a function, and is evaluated at compile time. It returns the size of the argument, where the size of char is defined to be 1. To be pedantic, it returns an unsigned integer type, size_t, defined in <stddef.h>, but is not often used in a way that requires a size_t declaration. Here are some examples of its use (this is a "programming fragment" not a complete program).
size_t s;
int fred; /* Integer */
char bob; /* Character */
char *c_ptr; /* Pointer to character */
char bloggs[6]; /* Array of 6 characters */
.
s = sizeof( fred );
s = sizeof( bob );
s = sizeof( c_ptr );
s = sizeof( long double ); /* Allowed to use types instead of variables */
.
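/* Safe string copy, checks size of destination and allows for terminating   */
/* null character (not to be confused with the NULL pointer discussed later) */
/* strncpy() does not add the terminator if the source fills the space, so   */
/* set it explicitly ("Bloggs" is truncated to "Blogg" here)                 */
strncpy( bloggs, "Bloggs", sizeof(bloggs)-1 );
bloggs[sizeof(bloggs)-1] = '\0';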
.
You can leave the brackets off when the operand is a variable or expression, e.g. sizeof fred is quite legal, but a type name must always be bracketed, e.g. sizeof(int). I think that the bracketed form is clearer anyway.
Programming Challenge 2
_______________________
Have a go at adapting "hello.c" to print out the size of some commonly used types, e.g. int, short int, long int, float, double and so on. Try some arithmetic to familiarize yourself with the basic operators, +, -, *, /, and one that doesn't appear in Fortran, the modulus operator, %, which acts on integer types to yield the remainder after division. Use this to determine whether the year 2000 is a leap year. The rule is that it is a leap year if the year is divisible by 4, except if it is a multiple of 100 years, unless it is also divisible by 400.
In addition to the integer and floating point types, there is a type called void. The meaning of void changes according to context ! If you declare a function returning void, you mean that it returns no value, like a Fortran subroutine. A void in the argument list means that the function takes no arguments (you can have a void function that does take arguments by declaring arguments in the usual way, and you can have a function that does return a value but takes no arguments). Below is an example of a Fortran subroutine and C function:
/* C Version */                        |* Fortran Version
                                       |
void initialize_things( void )         |      SUBROUTINE INITIALIZE_THINGS
{                                      |*
/* Do cunning setup procedure */       |* Do cunning setup procedure
/* No need for a return statement */   |*
}                                      |      END
.                                      |      .
/* Call it */                          |* Call it
initialize_things(); /* Note () */     |      CALL INITIALIZE_THINGS
.                                      |      .
The void qualifier also has yet another meaning, which will be discussed when we look at pointers.
The void function above demonstrates the general form of functions in C. They have a function definition with the formal parameters, then a {body} enclosed by the {} brackets. Function arguments are always passed by value in C. The actual arguments are copied to the (local) function formal arguments, as if by assignment. The arguments may be expressions, or even calls to other functions. The order of evaluation of arguments is unspecified, so don't rely on it ! Here is a C function example, with a similar Fortran routine for comparison.
/* C Version */                        |* Fortran Version
                                       |
int funcy( int i )                     |      INTEGER FUNCTION FUNCY( I )
{                                      |      INTEGER I
                                       |*
    int j;                             |      INTEGER J
/* End of declarations ... */          |* End of declarations ...
    j = i;                             |      J = I
    i = i + 1;/* Only local i changed*/|      I = I + 1 ! Calling arg changed
    j = i*j;                           |      J = I*J
                                       |      FUNCY = J
    return( j );                       |      RETURN
}                                      |      END
.                                      |      .
/* Call it */                          |* Call it
k = 3;                                 |      K = 3
ival = funcy(k); /* ival is 12 */      |      IVAL = FUNCY( K ) ! IVAL is 12
.                /* k is still 3 */    |      .                 ! K is 4
Notice that changing the function parameter in the C function does not alter the actual argument, only the local copy. To change an actual argument, you would pass it by address, using the address operator, &, and declare the function argument as a pointer to type int. More will be said about this in the pointers section. Generally, you should avoid writing functions in C that change the actual arguments. It is better to return a function value instead, where possible.
/* C Version */                        |* Fortran Version
                                       |
myval = funcy( gibbon );               |      CALL FUNCY( GIBBON, MYVAL )
Programming Challenge 3
_______________________
Hack your copy of the "hello.c" to call some sort of arithmetic function, perhaps to return the square of the argument. Write the function, and add a "prototype" (these are discussed later) for it before the main program, e.g.
.
/* Function prototype */
int funcy( int myarg ); /* semicolon where function body would be */
.
/* Main Program starts here */
int main( int argc, char *argv[] )
{
.
}
/* The real McCoy - "Dammit Jim, I'm a function not a prototype" */
int funcy( int myarg )
{
.
/* Do something and return() an int value */
.
}
If you are feeling really cocky, write a recursive factorial() function that calls itself. Hint:
.
if ( n > 0) {
    factorial = n * factorial( n-1 );
} else {
    factorial = 1;
}
.
Call it from your main program and step through with the debugger to convince yourself that it really is recursive.
When you write your own functions, try to avoid interpositioning, i.e. naming your function with the same name as a standard library (or system/Motif/X11/Xt library) function. Use
$ HELP CC RUN-TIME_FUNCTIONS your_function_name
to check for the existence of a similarly named DEC C RTL function. Or look in a book. It is a very bad idea to replace a standard function. If you need to write something with the same purpose as a standard function, but maybe with better accuracy or speed, call it something different, e.g. my_fast_qsort().
Three other modifiers I haven't yet explained are static, const and extern. The static modifier is another one that changes meaning depending on its context. If you declare a global variable or function as static, it will still be visible throughout the same compilation unit (file to us), but will NOT be visible externally to programs linked against our routines. This is often used as a neat way of storing data that has to be visible to a number of related functions, but must not be accessible from outside. Some code fragments below illustrate this.
/*---- C Fragments -----------------------------------------------------------*/
/* Global Vars, NOT visible externally (i.e. to things linked against this) */
static int number_of_things;

int AddToThings( int a_thing )
{
    .
    number_of_things = number_of_things + 1;
    return( number_of_things );
}

int GetNumberOfThings(void)
{
    return( number_of_things );
}

int RemoveThing( int a_thing )
{
    .
    number_of_things = number_of_things - 1;
    return( number_of_things );
}
* Fortran (sort of) Equivalent
*-----------------------------------------------------------------------
INTEGER FUNCTION ADD_TO_THINGS( A_THING )
.
INTEGER NUMBER_OF_THINGS
SAVE NUMBER_OF_THINGS
.
NUMBER_OF_THINGS = NUMBER_OF_THINGS + 1
ADD_TO_THINGS = NUMBER_OF_THINGS
RETURN
*
ENTRY GET_NUMBER_OF_THINGS()
GET_NUMBER_OF_THINGS = NUMBER_OF_THINGS
RETURN
*
ENTRY REMOVE_THING( A_THING )
.
NUMBER_OF_THINGS = NUMBER_OF_THINGS - 1
REMOVE_THING = NUMBER_OF_THINGS
RETURN
*
END
Another use of static is with variables that are local to a function. In this
case it is similar to the Fortran SAVE statement, i.e.
the variable will retain its value across function calls, and WILL BE
INITIALIZED to 0 if it is an integer type, or 0.0 if a floating point type (even
if the floating point representation of 0 on your machine is not all bits set to
0), or NULL (pointer to nothing) if it is a pointer.
/*---- C Example -------------------------------------------------------------*/
int log_error( int code )
{
    static int total_number_of_errors;
/* End of declarations ... */

/*  ++ is the same as total_number_of_errors = total_number_of_errors + 1 */
    return( ++total_number_of_errors );
}
* Fortran Equivalent
*-----------------------------------------------------------------------
SUBROUTINE LOG_ERROR( CODE )
.
INTEGER TOTAL_NUMBER_OF_ERRORS
* Not required for non-recursive DEC Fortran, but it documents your intent
SAVE TOTAL_NUMBER_OF_ERRORS
.
TOTAL_NUMBER_OF_ERRORS = TOTAL_NUMBER_OF_EBRORS + 1
END
The const modifier is used to flag a read only quantity. For example,
const double pi = 3.14159265358979;
.
/* Arizona ? */
pi = 3.0; /* Gives compiler error - try it in your test program */
The const modifier is useful for function prototype arguments which are passed by pointer, where you want to indicate that your function will not change the object pointed to. More will be said about function prototypes later.
Programming Challenge 4
_______________________
Look at the Fortran example above. Spot the deliberate mistake. The compiler would probably flag an error for it, but think of another instance where perhaps you wanted to increment an array element indexed by a non-trivial expression. Using the ++ operator in C helps avoid typographical errors, and looks less clumsy (and saves valuable bytes ;-) ). There is a similar operator, --, which decrements by one. Read K&R II, pages 46-48, and pages 105-106. Make sure you understand the difference between prefix and postfix versions of ++ and --, and try to rewrite the AddToThings() set of functions using these operators. Great - that's saved me having to explain it all.
The extern qualifier is rather like EXTERNAL in Fortran, and basically gives type information for a reference that is to be resolved by the linker. You DO NOT need to use extern with function declarations - int funcy( int i ); is the same as extern int funcy( int i);. It is usually used when declaring global variables to indicate that they are referenced in the particular compilation unit, but not defined in it.
What is the difference between "definition" and "declaration" ? In short, a definition actually ALLOCATES SPACE for the entity, whereas a declaration tells the compiler what the entity is and what it is called, but leaves it up to the linker to find space for it ! A global variable, structure or function can have many declarations, but only one definition. This is explained in more details in the "Header Files" section which follows.
Three less commonly used modifiers are volatile, auto and register. The volatile modifier tells the compiler not to perform any optimization tricks with the variable, and is most often used with locations that refer to hardware, like memory-mapped IO, or shared memory regions which might change in a way the compiler cannot predict. The auto qualifier may only be used for variables at function scope (inside {}) and is in fact the default. Auto variables are usually allocated off the stack (but this is up to the implementation). They will certainly not be retained across function calls. NEVER return the ADDRESS of an automatic variable from a function call (once you know about pointers). Because new automatic variables are "created" every time you go into a function, this allows C functions to be called recursively. The register qualifier is really obsolete. It is a hint to the compiler that a variable is frequently used and should be placed in a register. The compiler is quite free to ignore this hint, and frequently does, because it generally knows far more about optimizing than you do (Microsoft Visual C++ or DEC C for example). Don't bother using register.
Enumerated types, enum, are similar to Fortran integer PARAMETERs, but nicer to use. The general form is enum identifier { enumerator_list }, where "identifier" is optional but recommended. The comma-separated list of enumerated values starts at zero by default, but you can override this as shown in the example.
C Example
/*----------------------------------------------------------------------------*/
enum timer_state_e { TPending, TExpired, TCancelled};
enum timer_trn_e { TmrSet=4401, TCancel=4414};
.
enum timer_state_e t_state;
enum timer_trn_e t_trn;
.
t_state = TExpired; /* t_state now contains 1 */
t_trn = TCancel; /* t_trn now contains 4414 */
* Fortran Example
*------------------------------------------------------------------------
INTEGER TPENDING, TEXPIRED, TCANCELLED
INTEGER TSET, TCANCEL
PARAMETER (TPENDING = 0, TEXPIRED = 1, TCANCELLED = 2)
PARAMETER (TSET = 4401, TCANCEL = 4414)
.
INTEGER T_STATE, T_TRN
.
T_STATE = TEXPIRED
T_TRN = TCANCEL
When examining t_state or t_trn in the C program with the DEC debugger, the integer value will be converted to a name, e.g.
DBG> EXAMINE t_trn
PROG\main\t_trn: TCancel
which is handy. Unfortunately, because the enumerated types are really type int, you can assign any integer value to t_trn without a compiler whinge ! Types and storage class modifiers are discussed in more detail in K&R II, page 209 onwards, if you still thirst for knowledge.
C has for loops, while loops and do loops. An example is worth a thousand words:
* Fortran Loops Example
.
INTEGER I
LOGICAL FIRST
.
DO I = 1, LIMIT
PRINT *, I
ENDDO
*
I = 0
DO WHILE ( I .LT. LIMIT )
I = I + 1
PRINT *, I
ENDDO
*
FIRST = .TRUE.
DO WHILE ( FIRST .OR. I .LT. LIMIT )
IF ( FIRST ) FIRST = .FALSE.
PRINT *, I
I = I + 1
ENDDO
/*---- C Loops Example ("loops.c") -------------------------------------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>

/* Defines and Macros */
#define LMT 5

/* Main Program starts here */
int main( int argc, char *argv[] )
{
    int i;
/* End of declarations ... */

    printf("LMT = %d\n", LMT);

    printf("\n'for' loop - for ( i = 1; i <= LMT; i++ ) {...}\n");
    for ( i = 1; i <= LMT; i++ ) { /* More usual in C would be i = 0; i < LMT; i++ */
        printf("%d\n", i );
    }

    printf("\ni = 0\n");
    printf("'while' loop - while ( i++ < LMT ) {...}\n");
    i = 0;
    while ( i++ < LMT ) {
        printf("%d\n", i );
    }

    printf("\ni = LMT\n");
    printf("'do' loop - do {...} while ( ++i < LMT ); - always executes at least once\n");
    i = LMT;
    do {
        printf("%d\n", i );
    } while ( ++i < LMT );

    exit(EXIT_SUCCESS);
}
All these constructs are explained in detail in K&R II, chapter 3. The for loop has the following general form:
for ( expression1; terminate_if_false_expression2; expression3 ) {
.
}
If "terminate_if_false_expression2" is missed out it is taken as being true,
so an infinite loop results, for (;;) {ever}. The "expression1" is
evaluated once before the loop starts and is most often used to initialize the
loop count, whereas "expression3" is evaluated on every pass through the loop,
just before starting the next loop, and is frequently used to modify the loop
counter. It is quite legal, in C, to modify the loop
counter within the loop, and the loop control variable retains its value when
the loop terminates. Obviously "terminate_if_false_expression2" causes the loop
to end if it is false, and is used to test the termination condition.
The "while" looks like this:
while ( expression ) {
.
}
and keeps going for as long as "expression" is true. It zero trips
(that is, the code in it is never executed) if "expression" is false on the
first encounter. The for
loop above could be written using
while
.
expression1;
while ( terminate_if_false_expression2 ) {
.
expression3;
}
It isn't a good idea to do this though, because someone will spend ages
looking at your code wondering why you didn't write a for
loop,
expecting some cunning algorithm.
Finally, before time, the old enemy, makes us leave Loopsville City Limits, let's look at the "do-while" construct. The loop body is always executed at least once
do {
.
} while ( expression ); /* Semicolon needed */
and the loop will be repeated if "expression" is true at the end of the
current loop. There is a keyword, break
, which lets you leave the
innermost loop early, transferring control to the statement immediately after
the loop.
for ( i = 0; i < strlen(string); i++) {
if ( string[i] == '$' ) {
found_dollar = TRUE;
/* Once we've found the dollar no need to search rest of string */
break;
}
}
/* Jump to here on "break" */
.
A related keyword, continue
, skips to the end of the loop and
continues with the next loop iteration.
for ( i = 0; i < strlen(string); i++) {
/* Don't bother trying to upcase spaces */
if ( string[i] == ' ' ) continue; /* Move on to next character */
/* It wasn't a space so have a go */
string[i] = toupper( string[i] );
}
/* "continue" never jumps to here; it just starts the next iteration */
.
This is most often used to avoid complex indenting and "if" tests. Don't use it like I just did, which was a silly example.
You have already met the "if" construct. Here it is again, with the "else if" demonstrated too.
if ( expression ) {
.
/* Do something */
.
} else if ( other_expression ) {
.
/* Do something else */
.
} else if ( final_expression ) {
.
/* Do something different */
.
} else {
.
/* Catch all if none of above expressions are true */
.
}
It is legal to write this kind of thing
if ( expression ) /* Avoid this form */
i = 1;
else
i = 2;
The problem arises if you do this
if ( expression ) /* This is probably not what was intended */
    i = 1;
else
    i = 2;
    dont_forget_this = 3;
You might think that if "expression" is true (i.e. non-zero) then you would
set i
to 1, and if it were false you would set i
to 2
and dont_forget_this
to 3. In fact you will always set
dont_forget_this
to 3, because only the first statement after the
"else" is grouped with the "else". I never use this form, other than for a one
liner like
if ( expression ) expression_was_true = TRUE;
where the meaning is clear. Use the bracketed form which makes it totally unambiguous, and is easier to use with the debugger.
C provides an alternative to lots of if - else if tests. This is the "switch" statement. The "expression_yielding_integer" is calculated, and matched against the "case" "const-int-expression"s. When one matches, the statements following are executed, or if none match, the statements following "default" are executed
switch ( expression_yielding_integer ) {
case const-int-expression1:
statements1;
case const-int-expression2:
statements2;
case const-int-expression3:
statements3;
.
.
default:
statementsN;
}
Unfortunately a bad default behaviour was chosen for this. Each "case" drops through to the next one by default, so if, say, "expression_yielding_integer" matched "const-int-expression2", then "statements2" through to "statementsN" would ALL be executed. This is solved by using "break" again.
switch ( expression_yielding_integer ) {
case const-int-expression1:
statements1;
break; /* Always use break by default */
case const-int-expression2:
statements2;
break;
case const-int-expression3:
statements3;
break;
.
.
default:
statementsN;
break;
}
The default behaviour is rarely what is required in practice, and it would have been far better to have a default "break" before each case, and maybe use "continue" to indicate fall-through. Remember that chars can be used as small integers, so the following is quite legal.
char command_line_option;
.
switch ( command_line_option ) {
case 'v':
verbose_mode = TRUE;
break;
case 'l':
produce_listing = TRUE;
break;
case '?': /* Following two cases deliberately fall thru */
case 'h':
display_help = TRUE;
break;
default:
use_default_options = TRUE;
break;
}
int job[20]; /* job[0], job[1] .. job[19] */
and the dimension must be an integer greater than zero. This is how to declare a two-dimensional array [rows][columns]
int job[4][20]; /* Like 4 job[20] 's, job[0][0], job[0][1] .. job[3][19] */
.
i = job[2][0]; /* Good */
.
i = job[2,0]; /* Bad - don't ever do this */
.
Multi-dimensional arrays are row major; that is, the right-most subscript
varies fastest, unlike Fortran, which is column major. Notice that you can't
use commas to separate the indices. Separate pairs of square brackets are needed
for each index. There is no limit to the number of dimensions other than those
imposed by your compiler and the amount of memory available. In practice,
multi-dimensional arrays are rarely used. Unfortunately, you can't (in C) use const int's as array bounds. You have to use
#define
, like this:
#define MAX_SIZE 100 /* Example value - any positive integer constant will do */
.
float floaty[MAX_SIZE];
More will be said about #define
later. Arrays can be initialized
when they are defined:
int days_in_month[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
int matrix[2][3] = { { 0, 1, 2 }, { 3, 4, 5 } };
Remember that uninitialized automatic (local) arrays can contain anything at all, so don't expect them to be full of zeros. In addition, initialized arrays can't be "demand zero compressed". You can leave out the size of an array and have it use the number of initializers, like this
int array_initialization_pages_in_K_and_R_II[] = { 86, 112, 113, 219 };
which produces an array of 4 integers. You would probably want to use
sizeof()
to determine the size of the array in this case
nelements = sizeof( array_initialization_pages_in_K_and_R_II ) /
/* ----------------------------------------------------- */
sizeof( array_initialization_pages_in_K_and_R_II[0] );
.
for ( i = 0; i < nelements; i++) {
.
}
Notice that you index up to LESS THAN the number of elements, because the last element is (nelements-1) .
static
by default, i.e. retain their value across function calls,
unless you change them. The initializer, or "string literal" is delimited by
double quotes "like this" . You can split a string initializer over several
lines, each part being in " quotes, and they will be concatenated together. The
resultant string has the null character, represented by the escape sequence
'\0', appended to the end of it.
char random[80]; /* Could contain anything */
char title[] = "Phil's Ramblings"; /* Takes 17 bytes due to '\0' at end */
char longer_string[] = "Here is quite a long string split up over "
"several lines. VAX C doesn't allow this, though. "
"Another good reason to switch to DEC C on "
"VAX or Alpha VMS, or Visual C++ for Windows.";
char string_with_quote[] = "Here is the quote \" character";
The name that you give to an array can be used as a pointer to the zeroeth
element of the array. More will be said about this in the "Pointers" section.
There are many functions in the standard library for manipulating character
strings, and these all begin with "str". You will need to include
<string.h>
to use them. Look in K&R II pages 249-250.
These functions expect an array, or a pointer to characters, as their arguments.
Finally, note that an empty string is not really empty.
char is_it_empty[] = ""; /* No, it contains one character, '\0' */
Always bear in mind that the string functions often copy trailing '\0'
characters, so you must ensure that you allow space for this. It is a good idea
to always use the "strn" versions of the calls, with
sizeof(destination)
as the character limit, because that way you
will avoid runaway (and hard to detect) memory overwriting. Remember to
terminate the destination string, e.g.
strncpy( destination, source, sizeof(destination) );
destination[sizeof(destination)-1] = '\0';
else you'll end up avoiding potential overwrites, but leave a potentially unterminated string to catch you out later !
int *i_ptr;
declares a pointer to type int. As declared above, i_ptr
is most
likely not yet pointing at a valid location. In order to make it point somewhere
valid, you generally use the "address operator", &, like this
int i;
int j;
int *i_ptr;
.
i_ptr = &i;
.
You can then change or read the value of i
by using the
"dereference operator", and change the object pointed to, providing it is an
object of the correct type.
*i_ptr = 3; /* Set the int pointed to by i_ptr to 3 */
printf("%d\n", i ); /* i will be 3 */
.
i_ptr = &j; /* Set the i_ptr to point to j now */
.
*i_ptr = 3; /* Set the int pointed to by i_ptr to 3 */
printf("%d\n", j ); /* j will be 3 */
.
This is a rather silly example, because you would obviously just use i
or j
directly. A more realistic use of pointers is with
arrays:
char string[] = "Here is a string with a $ in it";
char *sptr;
int contains_dollar;
.
sptr = string; /* Remember that the array name is the same as &array[0] */
contains_dollar = FALSE;
while ( *sptr ) { /* While thing pointed to is not 0 i.e. null character */
if ( *sptr == '$' ) {
contains_dollar = TRUE;
break; /* Leave the while loop early and safely */
}
++sptr;
}
.
When you increment pointers, they automatically increment the address they
point to by the size of one of the objects to which they point. In the example
above, that is one character, i.e. a byte. If the array was an array of int,
then the pointer would increment by sizeof(int)
bytes. Just to
frighten you, this loop could be written
while ( *sptr && !( contains_dollar = *sptr++ == '$' ) );
Programming Challenge 5
_______________________
You guessed it. Figure out what is happening in the scary "while" loop above. Now write your own (differently named) version of strcpy using similar techniques to make it as short as possible.
Arrays and pointers are closely related. They can be used in identical ways in many situations. For example:
.
char string[80];
char *sptr;
.
sptr = &string[0]; /* This could be written as sptr = string */
.
*string = 'A'; /* Using array name like pointer */
*(string+10) = 'B'; /* Using array name like pointer */
.
sptr[0] = 'A'; /* Using pointer like array */
sptr[10] = 'B'; /* Using pointer like array */
.
This is because, in expressions or function calls, arrays and pointers are
both converted to the form "*(pointer + index-offset)". The main thing to
remember is that pointers are variables, and can be changed to point to
different objects, whereas array names are not variables. The index-offset is
automatically scaled according to the type of data pointed to. In this case, we
are dealing with char
which, by definition, has a size of 1, but if
the pointers were pointers to int, then on the VAX or Alpha, the index-offset
would be automatically scaled by 4.
.
int array[20];
int another_array[20];
int *i_ptr;
.
i_ptr = array; /* Legal */
i_ptr[12] = 3;
.
i_ptr = another_array; /* Legal */
i_ptr[2] = 4;
.
array = another_array; /* Illegal ! */
.
Even multi-dimensional arrays get decomposed to the "*(pointer + index-offset)" by the compiler in say, a function call, which gives you no special knowledge of how they fold. Hence if you are using a pointer to a multi-dimensional array where the dimensions could vary, it is up to you to calculate the offset correctly, e.g.
int mda[ROWS][COLS];
.
i = funcy( &mda[0][0], ROWS, COLS ); /* Pass a pointer to the first int */
.
int funcy( int *array, int rows, int cols )
{
.
for ( i = 0; i < rows; i++) {
for ( j = 0; j < cols; j++) {
total += *(array + i*cols + j);
}
}
.
}
Of course, if the function was only expected to deal with arrays of set dimensions, you could just declare those in funcy().
int mda[ROWS][COLS];
.
i = funcy( mda );
.
int funcy( int array[ROWS][COLS] )
{
.
for ( i = 0; i < ROWS; i++) {
for ( j = 0; j < COLS; j++) {
total += array[i][j];
}
}
.
}
The strange += assignment operator isn't a misprint. It is shorthand, so that
x = x + 4;
can be written
x += 4;
Similarly
y = y - 10;
becomes
y -= 10;
There must be NO SPACE between the operator and the = sign, and the operator comes immediately before the =. This notation is handy for more complex expressions, such as
array[ hash_value[index]*k + offset[i] ] += 4;
so you only need maintain the expression in one place. Many binary operators have a similar assignment operator. Check K&R II page 50 and page 48 for the bitwise operators that can also be used in this way.
The other thing to remember is that whereas arrays allocate space, and hence the array name points to something valid, pointers MUST NEVER BE USED UNTIL THEY HAVE BEEN INITIALIZED TO POINT TO SOMETHING VALID.
There is a special pointer value defined by the standard, called the
NULL
pointer, which is used to indicate that the pointer doesn't
point to anything. Normally, you cannot directly assign integers to pointers,
but the NULL
pointer is an exception. Both the following lines make
i_ptr
point to "nothing" (well, a guaranteed "not valid location"
really).
i_ptr = 0; /* Legal but not recommended */
i_ptr = NULL; /* Recommended - it is clear that you refer to a pointer */
The NULL
macro (see Macros section later on), defined
identically in <stddef.h>
and <stdio.h>
among other places, is often defined as
#define NULL ((void *) 0)
even though "0" would do. This discourages its use as an integer, which you should never do. People often make the mistake of writing
string[i] = NULL; /* Never do this - you really want '\0' */
i = NULL; /* Never do this if i is integer and you really mean 0 */
when what they actually mean is
string[i] = '\0'; /* The null character - that's more like it */
i = 0; /* Integer zero */
A pointer of type (void *)
is a special type of pointer that is
guaranteed to be able to point to any type of object, hence the
NULL
pointer can be assigned to any pointer type. The
NULL
pointer need not have all bits set to zero, so don't rely on
this.
Pointers are very useful as function arguments for routines that manipulate strings of unknown (at compile time) length.
int how_long( const char *s )
{
int i;
/* End of declarations ... */
i = 0;
while ( *s++ ) {
i++; /* Increment i until '\0' found */
}
return( i );
}
Even though the thing pointed to by s
is const
,
note that it is quite legal to increment the pointer s
in the
function, because s
is a local, variable pointer, pointing to
whatever the calling argument to how_long()
was. Hence if you call
how_long(string)
, you don't change string, you assign string to
s
then increment s
. Any expression using array
subscripting, for example array[index]
, is exactly the same in
C as its pointer equivalent, in this case
*(array+index)
. You have to be careful when using the
const
modifier with pointers. The following examples should
illustrate the point.
int i;
const int *ptr_to_const; /* Points to a const int; the pointer itself may change */
int * const const_ptr = &i; /* The pointer is const, but points to a variable int */
Another important difference between pointers and arrays relates to the
sizeof()
operator.
int array[20];
int *i_ptr;
size_t s;
.
i_ptr = array;
.
s = sizeof(array); /* s is 20*sizeof(int), which is 80 on the VAX */
s = sizeof(i_ptr); /* s is sizeof(int *), which is 4 on the VAX */
.
You can't deduce the size of an array from a pointer, only the size of the
pointer. Because arrays as function arguments are treated the same as pointers,
then even if you declare the function arguments as "func( int array[10] )" array
is still treated like a pointer in the function body, so
sizeof(array)
in the function will give you the size of pointer to
int
, not 10 times size of int
.
It is quite legal to write a pointer definition like this:
int* i_ptr; /* Not recommended */
This is best avoided, because it can be confusing. Consider
int* i_ptr1, i_ptr2; /* Probably not what you intended */
At first glance it looks like you have just declared two pointers to int. In fact, i_ptr1 is a pointer, but i_ptr2 is an int.
int *i_ptr1, *i_ptr2; /* Better */
The second example keeps the * with the variable to which it relates, and is considered better style (by me at any rate) !
There are two standard library functions often used with pointers. They are
declared in <stdlib.h>
, and are malloc()
and
free()
. Both are AST reentrant under DEC C.
The malloc()
function allocates an area of memory specified in
bytes, and is declared as
void *malloc(size_t size);
and would be used like this
.
int *i_ptr;
.
i_ptr = (int *)malloc( sizeof(int)*nelements_wanted );
if ( i_ptr != NULL ) {
.
i_ptr[i] = i;
.
} else {
.
/* Couldn't get the memory - do some cunning recovery */
.
}
It is good practice to "cast" the result of a malloc()
to the
correct type. This helps the compiler to indirectly check whether you are using
the correct type in the sizeof()
invocation too. If it complains
about your cast, then (assuming the type is the same in the
sizeof()
) you are probably using the wrong type in both places,
and might have allocated too little memory. There is no check if you wander off
the allocated memory, out into memory space no man has seen before ! The memory
returned by malloc()
can contain any values when you get it, i.e.
it is not set to zero.
The free()
function frees up the memory obtained from
malloc()
. It is declared as
void free(void *pointer);
and would be used like this to free the memory obtained in the previous example
free( i_ptr );
i_ptr = NULL; /* Good practice */
I like to set the pointer to NULL
immediately upon freeing the
memory, because the pointer MUST NOT BE USED again after being
free()
ed. By setting it to NULL
, you will (under VMS
or Windows NT) get an ACCVIO if you try and dereference the pointer. This is
safer than leaving it, having the memory reused elsewhere, then changing it via
the duff pointer. This sort of mistake is very hard to track down. It is very
important to always free malloc()
-ed memory when you are done with
it, or you will cause what is known as a "memory leak".
There are a couple of functions related to malloc()
. One is
calloc()
, which allows you to allocate memory and initialize its
contents to zero in one go.
void *calloc(size_t number, size_t size);
The other function is realloc()
, which allows you to expand a
region of memory obtained by malloc()
, whilst retaining its current
contents.
void *realloc(void *pointer, size_t size);
The new, expanded region of memory need not be in the same place as the original
new_i_ptr = (int *)realloc( i_ptr, sizeof(int)*larger_nelements_wanted );
if ( new_i_ptr ) {
/* Successfully expanded */
i_ptr = new_i_ptr; /* Don't free anything here ! */
} else {
/* Couldn't get the extra memory, stick with the existing pointer */
}
so in this example the memory may have changed location, but the original
content will have been copied to the new location. Note how I use a new pointer,
new_i_ptr
, to check that the relocation was successful. This is
essential because if you directly assigned to the pointer to the memory
you were trying to realloc
and the call failed (returning
NULL
) you would have no way to free the memory originally pointed
to by i_ptr
.
/* Never do this - always assign the return value to a different pointer */
i_ptr = (int *)realloc( i_ptr, sizeof(int)*larger_nelements_wanted );
A final couple of warnings about pointers. Firstly, the []
operator has a higher precedence than the *, so int *array[]
means
an array of pointers to int, not a pointer to an array of ints. Secondly, the
following two statements are not equivalent:
extern int is[]; /* This declares an int array, defined elsewhere */
extern int *is; /* This declares a pointer to int, defined elsewhere */
The compiler will actually generate code you did not intend, and probably cause an ACCVIO if you confuse these. This is because an access via a pointer first looks at the address of the pointer, gets the pointer value stored there, and uses that as the base address for lookups. Access via an array name uses the address of the array itself as the base address for lookups. Draw a diagram if you are confused ! Using the EXT and DEFINE_GLOBALS macros, explained later, should stop this ever happening to you.
struct optional_structure_identifier {what's in it} optional_instance;
I suggest that you always specify optional_structure_identifier, then declare
the instances of the structure later in a manner similar to the way we used
enum
. Example:
/* C Example */ |* Fortran Example
|
struct oscar_location_s { | STRUCTURE /OSCAR_LOCATION_S/
int x; | INTEGER X
int y; | INTEGER Y
}; /* Note the semicolon ; */ | END STRUCTURE
. | .
int main( int argc, char *argv[] ) | .
{ | .
struct oscar_location_s loc; | RECORD /OSCAR_LOCATION_S/ LOC
. | .
loc.x = 100; | LOC.X = 100
loc.y = 50; | LOC.Y = 50
. | .
} |
Similarly with unions, the following trivial example shows how they might be declared and used:
/* C Example */ |* Fortran Example
|
union hat_u { | STRUCTURE /HAT_U/
int mileage; | UNION
float hotel_cost; | MAP
}; | INTEGER MILEAGE
. | END MAP
int main( int argc, char *argv[] ) | MAP
{ | REAL HOTEL_COST
int was_tow; | END MAP
union hat_u cost; | END UNION
. | END STRUCTURE
if ( was_tow ) { | .
cost.mileage = 100; | IF ( WAS_TOW ) THEN
} else { | COST.MILEAGE = 100
cost.hotel_cost = 45.50; | ELSE
} | COST.HOTEL_COST = 45.50
. | ENDIF
. | .
} | .
Notice that you don't need the MAP - END MAP
sequence in C that is used in DEC Fortran. Everything in the union { body }
acts
as though it is sandwiched between MAP - END MAP.
Structures may contain pointer references to themselves, which is very handy for implementing linked lists:
struct list_s {
struct list_s *prev;
struct list_s *next;
void *data_ptr;
};
When you declare a pointer to a structure, let's call it p
,
there is a potential trap in using the pointer because the binding of the
structure member operator, .
, is higher than the * dereference
operator. Hence *p.thing
means lookup the member "thing" of
p
, and use that as an address for the dereference. What you really
want is (*p).thing
. This is a bit ugly, so C provides the ->
operator.
.
struct my_struct_s my_struct;
struct my_struct_s *struct_ptr;
.
struct_ptr = &my_struct;
(*struct_ptr).thing = 1; /* "thing" = 1 in struct pointed to by struct_ptr*/
struct_ptr->thing = 1; /* Same as above */
.
This is a good place to introduce a program example kindly provided by Rob Cannings. This uses cunning (Cannings ?) pointer manipulation to create a binary sorted tree.
/*---- Illustration of pointer manipulation ("treesort.c") -------------------*/
/* Example provided by Rob Cannings: */
/* (Excess white space removed by Phil O. ;-)) */
/* We implement a sorting routine with the sorted list stored in a tree. */
/* ANSI C Headers */
#include <stdlib.h>
#include <stdio.h>
/* Structures */
struct treeNode {
int data;
struct treeNode *pLeft;
struct treeNode *pRight;
};
/* Function prototypes */
void AddNode(struct treeNode **ppNode,struct treeNode *pNewNode);
void Dump(struct treeNode *pNode);
/* Defines and macros */
#define NUMBER_OF_NUMBERS 4
/* Main Program starts here */
int main(int argc,char *argv[])
{
int i;
int toBeSorted[NUMBER_OF_NUMBERS] = { 93, 27, 15, 47};
struct treeNode dataNode[NUMBER_OF_NUMBERS];
struct treeNode *pSortedTree;
struct treeNode *pNewNode;
/* End of declarations ... */
/* Initialise one node for each item of data */
for (i = 0; i < NUMBER_OF_NUMBERS; i++) {
dataNode[i].pLeft = NULL;
dataNode[i].pRight = NULL;
dataNode[i].data = toBeSorted[i];
}
/* Build a sorted tree out of the data nodes, printing it */
/* out after each new node is added to the tree */
pSortedTree = NULL; /* the tree starts as just a stump */
for (i = 0; i < NUMBER_OF_NUMBERS; i++) {
pNewNode = &dataNode[i];
AddNode(&pSortedTree,pNewNode);
printf("\nSorted list of %d items:\n",i + 1);
Dump(pSortedTree);
}
exit(EXIT_SUCCESS);
}
void AddNode(struct treeNode **ppSortedTree,struct treeNode *pNewNode)
{
struct treeNode *pCurrentNode;
/* End of declarations ... */
pCurrentNode = *ppSortedTree; /* ppSortedTree is a pointer to a pointer */
/* Have we reached the end of a branch ? */
if (pCurrentNode == NULL) {
*ppSortedTree = pNewNode;
} else {
/* We have not reached the end of a branch */
if (pCurrentNode->data > pNewNode->data) {
AddNode(&(pCurrentNode->pRight),pNewNode);
} else {
AddNode(&(pCurrentNode->pLeft),pNewNode);
}
}
}
void Dump(struct treeNode *pNode)
{
/* End of declarations ... */
if (pNode != NULL) {
Dump(pNode->pLeft);
printf("%d\n",pNode->data);
Dump(pNode->pRight);
}
}
Programming Challenge 6
_______________________
Compile and link "treesort.c" with the debugger. Step through and experiment with looking at pointers, and looking at the things they point to, e.g. EXAMINE *pNode . Modify the program so you can add numbers with a single argument function call.
Sometimes it is useful to know what offset a structure member has from the
start of the structure. There is a useful macro defined in
<stddef.h>
called offsetof
which will calculate
the offset of a structure member from the start of the structure.
byte_offset = offsetof(struct my_struct_s, thing);
The first argument to the offsetof
macro is a TYPE, not a
variable name. An example of this is shown in the "key.c" example program later
in the course.
The typedef
statement lets you define a new name for a
pre-existing type. It doesn't create a new type itself. An example should make
the usage clear. Imagine you wanted to store coordinates, and initially you
thought they could all fit in a short int. You might decide to
typedef
the coordinate declarations like this:
typedef short int Coordinate_t;
.
Coordinate_t x[MAX_POINTS], y[MAX_POINTS];
.
Later on it might transpire that increased resolution means that you need more than a short int. All you need do then is
typedef long int Coordinate_t;
Be careful and sparing in your use of typedef
. Don't use
typedef
for everything so that no-one can tell the true type of
anything. Some people like to use typedef
with structures,
struct coord_s {
int x;
int y;
};
typedef struct coord_s Coordinate_t;
.
Coordinate_t points[MAX_POINTS];
.
points[i].x = 100;
points[i].y = 50;
.
whereas others argue that this masks the fact that coordinates are really structures and that it would be clearer to use
struct coord_s points[MAX_POINTS];
I would suggest that you put all your structure and typedef
s in
one place, like in a header file, and use whatever makes the code uncluttered
and easy to follow. One place where I think typedef
does improve
clarity is when defining pointers to functions.
typedef int (*verify_cb_func_ptr)( Bodget b, PxPointer cdata, PxCBstruct cbs );
declares verify_cb_func_ptr
as a pointer to a function returning
an int
, with 3 arguments of the types shown. Note that the type
returned by the functions themselves is int
.
int verify_name( Bodget b, PxPointer cdata, PxCBstruct cbs );
.
verify_cb_func_ptr vcb;
.
vcb = verify_name;
i = (*vcb)( b, cdata, cbs); /* Note how to call function thru pointer */
.
The brackets around the (*vcb)
are needed because the
function brackets ()
take precedence over *.
printf
, which is a "stdio" function, the
for (;;)
loop, and the exit(EXIT_SUCCESS)
end-your-program function from "stdlib". These functions, or others from these
two libraries, are so commonly used that it is a good idea to always include the
<stdio.h>
and <stdlib.h>
ANSI standard
header files in all your programs. Header #include
files in C can be specified in two ways:
#include <stdio.h>
and
#include "myheader.h"
The quoted "myheader.h" form starts searching in the same directory as the
file from which it is included, then goes on to search in an implementation
defined way. The angle bracketed <stdio.h>
form follows "an
implementation defined search path". In practice "implementation defined search
path" tends to be the system libraries. Under VAX C, all
the header files lived as .h
files in SYS$LIBRARY:
.
Under DEC C, they live in text libraries like
DECC$RTLDEF.TLB and SYS$STARLET_C.TLB
. On Windows using Visual
C++ 6.0 they are in
C:\Program Files\DevStudio\VC98\Include , assuming that you installed
Visual C++ on to your C: disk. If you want to know
the full search rules for VMS, type
$ HELP CC LANGUAGE_TOPICS PREPROCESSOR #INCLUDE
You should always use the angle bracket <> form for ANSI header files, and use the quoted form for your own headers, e.g.
#include "src$par:trntyp.h"
The #
symbol is known as the preprocessor operator. When you
perform a C compilation, the first stage it goes through
is preprocessing, where all the #
directives are obeyed, and
various inclusions and substitutions are made before the code is compiled. The
#
sign must always be the first non-whitespace character on the
line, and is one of the few exceptions to the general free format of C code. You can have spaces after the #
, and
these are often useful when using #if
constructs.
Another common preprocessor directive is #define
. This can
be used to define "parameters" which you might want to use as array bounds for
example, but in addition it lets you define macros which take arguments and
produce inline code using the arguments. For example,
/* Some defines and macro definitions */
#define PI 3.14159265358979
#define MAX(a,b) (((a)>(b))?(a):(b))
#define STRING_SIZE 16
.
.
{
char string[STRING_SIZE]; /* Using a #define'd array bound */
.
}
Notice that there are no semicolons at the end of the #define
lines. Leading and trailing blanks around the "token sequence" (the body of the
macro or definition) are discarded, although you can use \ at the end of a line
to indicate that there is more of the macro on the next line. In the second form
of macro shown above, you cannot have a space between the identifier, MAX, and
the first "(", or the preprocessor will not know that the () delimit the
parameter list for the macro expansion. Also notice that (if you are a beginner)
you haven't got a clue what is going on with that MAX macro !
The #if
, #else
, #elif
and
#endif
conditional preprocessor directives are used to include code
selectively during preprocessing. They can be used to test if a particular macro
name has been defined (even as an empty string). A common use for this is
stopping the same header file contents being included more than once. For
example, imagine you had created a header file called "tla_utils.h".
/*---- My header file for my util routines, called "tla_utils.h" -------------*/
#if !defined( TLA_UTILS_H ) /* Could have used #ifndef TLA_UTILS_H */
#define TLA_UTILS_H
.
#if defined( __VMS ) /* Could have used #ifdef __VMS */
# include "vms_specific_stuff.h"
#elif defined( UNIX )
# include "inferior_unix_alternative.h"
#else
# include "oh_dear_it_must_be_dos.h"
#endif
.
/* Do some stuff that should only be done once */
.
#ifndef DEFINE_GLOBALS
# define EXT extern
#else
# define EXT /* Expands to nothing, so the variable is defined here */
#endif
.
#define MY_PROGRAM_ARRAY_LIMIT 100
.
EXT int tla_global_int;
.
EXT const float tla_global_pi
#ifdef DEFINE_GLOBALS
= 3.14159265358979
#endif
;
.
EXT char tla_title_string[]
#ifdef DEFINE_GLOBALS
= "Program Title"
#endif
;
.
int MyFunction( int meaningful_name ); /* This is not a function definition */
/* it is a "function prototype" which */
/* allows arg and return val checking */
.
#endif /* End of TLA_UTILS_H block */
This technique is widely used to enable selection of the correct code at compile time. Try
$ HELP CC Language_topics Predefined_Macros System_Identification_Macros
which will give you some of the predefined (by the compiler) macros that let you switch code on and off depending on, say, whether you are on a VAX or Alpha. See K&R II pages 91 and 232 for more information on this subject.
The definition of the EXT macro is another useful technique for ensuring that
you only DEFINE a variable once (ie. actually allocate space for, or initialize
a variable with a value). Macros are explained in more detail below, but
basically the text (if any) associated with the macro name is substituted
wherever the macro appears, before compilation proper begins. In your main
program, you #define DEFINE_GLOBALS
and the header file then
becomes
.
int tla_global_int;
.
const float tla_global_pi = 3.14159265358979;
.
whereas any files of subroutines which don't #define
DEFINE_GLOBALS
will process the same header fragment as
.
extern int tla_global_int;
.
extern const float tla_global_pi;
.
so the values are resolved at link time, and won't be contradictory to the main program. This technique saves having to have two versions of your header files (which inevitably get out of step).
#define MAX(a,b) (((a)>(b))?(a):(b))
.
maxval = MAX( maxval, this);
.
The preprocessor would expand the MAX invocation above to this before the compiler proper ever saw it:
maxval = (((maxval)>(this))?(maxval):(this));
Removing some of the "guard brackets" you get this slightly more readable version
maxval = (maxval > this) ? maxval : this ;
The brackets around the parameters in the expansion are necessary to keep the meaning correct if, say, one of the arguments is a function call, or complex expression. Sometimes it is advisable to create a temporary variable to avoid "using" the parameters more than once, and this will be explained later. See pages 229 to 231 of K&R II for a fuller explanation of defining macros. Convention dictates that macro names should be entirely uppercase. This is certainly the style used in the ANSI header files, and it is generally best to make all your macros uppercase.
The ? operator is a ternary operator, i.e. it takes three operands. It should be used sparingly, and is a shorthand as illustrated below:
value = (expression_1) ? expression_2 : expression_3;
is (more or less) equivalent to
if ( expression_1 ) {
value = expression_2;
} else {
value = expression_3;
}
The reason it is handy in macros is that it is best to avoid multiple
;-separated statements in a macro, because that could well change the meaning
of code. Macros tend to be invoked on the assumption that they are a single
statement and code meaning could change if they weren't, e.g.
if ( condition ) INVOKE_MACRO( bob );
By using the ? operator you can get a single statement that still has some
switching logic in it. There is a trick to get round the single statement
restriction, and still behave nicely:
#define MULTI_STATEMENT_MACRO( arg ) do { \
first_thing; \
.
last_thing; \
} while (0) /* DONT put a ; at end ! */
In C, an expression is TRUE if it is ANY nonzero value, or in the case of pointers, if it doesn't compare equal to NULL. The result of a relational, equality or logical operator is guaranteed to be 0 or 1, so
i = ( 2 > 1); /* Sets i to be 1 */
i = ( 1 > 2); /* Sets i to be 0 */
So, in our MAX example ((a)>(b)) will be 1, i.e. TRUE, if a is greater than b,
0 otherwise. So "expression_1" is TRUE if a > b. Hence the value of
"expression_2", i.e. a, will be chosen. Otherwise "expression_3", in this case
b, will be used.
"Why define MAX as a macro at all ?" you might ask (pause until someone asks). Well the reason is that if you used a function, you would need to write a version for floating point numbers, another for ints, another for long ints and so on. Of course, a macro can circumvent type checking, which some people don't like very much, so in C++ macros have been effectively eliminated for most purposes by "templates" which you can learn about in my STL Course.
When using the #if
test mentioned in the "Header Files" section,
you can use relational tests on constant expressions. Here is an example of
checking that you are using Motif 1.2 or greater
#if (XmVERSION > 1 || (XmVERSION == 1 && XmREVISION >= 2))
XtSetArg( argl[narg], XmNtearOffModel, XmTEAR_OFF_ENABLED ); narg++;
#endif
The expression following the #if must either use the preprocessing operator
defined(identifier) (which returns 1 if identifier has been #defined, else 0)
or be a constant expression. This can be handy for defining a number of levels
of debugging information. The #if is also the safest way to "comment out"
unused code, rather than messing about making sure you haven't illegally nested
comments. For example:
#ifdef NEW_CODE_IS_RELIABLE
/* New code that should be faster but hasn't been tested as much as the old */
.
#else
/* Here is the old code that worked - don't want to remove it yet */
.
#endif
Clearly the #ifdef test will always fail in our lifetime because the macro will
never be defined, so the new code will not be compiled and the old code will.
This technique avoids problems caused by inadvertent comment nesting.
Macros can be undefined using the #undef
directive.
#define DEBUG 1
.
#ifdef DEBUG
printf("The value of x is %d in routine Funcy\n",x);/* Print out debug msg*/
#endif
.
#undef DEBUG
.
#ifdef DEBUG
printf("The value of x is %d in routine Gibbon\n",x); /* Not printed */
#endif
.
You will need to #undef a macro if you want to define it again with a different
expansion. Redefining it with different replacement text, without an
intervening #undef, isn't allowed. You can, however, define a macro more than
once provided the tokens it expands to are the same, ignoring whitespace. This
is known as a "benign redefinition" and is often used to get identical
definitions of the NULL macro in several header files.
Avoid starting your macro names with _ and in particular __ because leading
underbars are reserved for the implementation, and double underbars are used
for macros predefined by the standard. For example, the standard reserves
__LINE__, __FILE__, __DATE__, __TIME__ and __STDC__. Look in K&R II page 233
for the meanings of these.
Occasionally it is useful to be able to use the macro arguments as strings. This is done by using the # directly in front of the argument.
#define DEBUG_PRINT_INT( x ) (printf("int variable "#x" is %d",x))
#ifdef DEBUG
DEBUG_PRINT_INT( i ); /* Prints "int variable i is 10" or whatever */
#endif
Concatenation of macro arguments is also possible using the ## operator. Some people like commas in big numbers, so you might use it like this:
#define NICKS_MEGA_INT(a,b,c) a##b##c
.
int i;
.
i = NICKS_MEGA_INT( 10,000,000 ); /* same as 10000000 after expansion */
.
Then again, you might not. As a final thought for this section, I will demonstrate a couple of benign uses for the ? operator - it's not just there for the nasty things in life.
got_space = GetSpace(how_much); /* Returns NULL if it fails */
printf( got_space ? "Success\n" : "Failure\n");
.
/* Avoid ACCVIO if pointer is NULL */
printf( "Name is %s\n", name_ptr[i] ? name_ptr[i] : "**Unknown**" );
.
/* Handle plurals */
printf( "Found %d item%s\n", nitems, (nitems != 1) ? "s" : "" );
The ACCVIO avoidance works because the expression that is NOT selected is
guaranteed not to be evaluated, so the NULL pointer is never dereferenced.
Finally, remember my mentioning that it was a good idea to only reference macro arguments once if the macro was to be used like a function ? The X Toolkit Intrinsics macro, XtSetArg, doesn't follow this sound advice. It is defined like this:
#define XtSetArg(arg, n, d) \
((void)( (arg).name = (n), (arg).value = (XtArgVal)(d) ))
Notice that (arg) is referenced twice, but only appears once in the macro argument list. Hence the intuitive usage
XtSetArg( argl[narg++], XmNtearOffModel, XmTEAR_OFF_ENABLED );
actually increments narg by two, not one. It therefore has to be used something like this
XtSetArg( argl[narg], XmNtearOffModel, XmTEAR_OFF_ENABLED ); narg++;
If they had defined it like this
#define XtSetArg(arg, n, d) \
do { Arg *_targ = &(arg); \
(void)( _targ->name = (n), _targ->value = (XtArgVal)(d) ); \
} while (0)
you would be able to use the argl[narg++] form. This is something to be aware of if your pre or post increments or decrements seem to be behaving strangely. Obviously, you should not actually redefine standard macros, because this can lead to even more confusion. Create your own version, like SETARG, if you feel the need.
C     Means                         Fortran
>   - Greater than                  ( .GT. )   -
>=  - Greater than or equal to      ( .GE. )   |  Same precedence
<   - Less than                     ( .LT. )   |  as each other,
<=  - Less than or equal to         ( .LE. )   -  below */+-
==  - Equal                         ( .EQ. )   -  Same as each
                                               |  other, just
!=  - Not equal                     ( .NE. )   -  below < etc.
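They are left to right associative. Both operands are fully evaluated before the comparison is made, but note that, unlike && and ||, the relational operators are not sequence points, so the order in which the two operands are evaluated is unspecified. E.g.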
if ( x*3 > y ) {
.
}
guarantees that x will have been multiplied by three before comparison with y.
A word of caution about the equality operator, == . It is very easy to miss out the second = and this will still be a legal expression. Example:
if ( x = 3*12 ) {
.
}
will always be true. This is because, in C, an assignment is itself an
expression, whose value is the value assigned. So the value of ( x = 3*12 ),
which calculates the right-hand side, 36, and assigns it to x, is 36, which is
nonzero and hence always true. So that mistake will cause the if {} body to be
always executed, and worse than that you will have unknowingly changed the
value of x. To avoid this, some people like to write the test the other way
round, e.g.
if ( 3*12 == x ) {
.
}
Now, if you miss off the second = you have an illegal expression, because you
cannot assign 3*12 = x: 3*12 is not an lvalue (a modifiable location or
symbol, which can be on the left-hand side of the = sign in an expression).
The logical operators are (in decreasing precedence)
C     Means           Fortran
&&  - Logical AND     ( .AND. )
||  - Logical OR      ( .OR. )
and are below the relational operators in precedence. Hence the expression
if ( j > 0 && i*3 > 12 || i != k ) ...
is the same as
if ( ( ( j > 0 ) && ( (i*3) > 12 ) ) || ( i != k ) ) ...
See K&R II page 52 for operator precedence. Most people don't remember
these, but use brackets to make the meaning of more complex expressions quite
clear. The ! as a unary operator is similar to Fortran
.NOT., so ( !x )
is true if x
is equal to zero.
<<  - Left shift, bring in zero bits on right.
>>  - Right shift. Bring in 0s on left for unsigned integers,
      implementation defined for negative signed integers.
~   - One's complement. Unary operator, changes 0s to 1s, 1s to 0s
&   - Bitwise AND, do not confuse with logical &&
|   - Bitwise INclusive OR, do not confuse with logical ||
^   - Bitwise EXclusive OR
Here are some examples:
i = i << 2; /* Multiply by 4 */
i <<= 2; /* Same as above */
mask |= MSK_RW; /* Set the bits in mask that are set in MSK_RW */
valuemask = GCForeground|GCBackground; /* Set the bits that are the OR of */
/* GCForeground and GCBackground */
mask = ~opposite; /* mask is complementary bit pattern to opposite */
mask |= 1UL << MSK_R_V; /* Shift unsigned long 1 left MSK_R_V bits */
/* and set that bit in mask */
These are very useful for setting and unsetting flag bits, but you must be aware of the size of object that you are dealing with. By their very nature, bitwise operators can make code less portable.
Programming Challenge 7
_______________________
Use the bitwise operators to determine what your machine does with a right
shift of a negative integer. Write some bit manipulation and checking
functions. Check the priority of the bitwise operators and see how this
affects the bracketing of your tests and expressions.
.h
(header) file with
the prototypes for those functions. The reason prototypes are so useful is that
they allow the compiler to check that you are calling a function with the right
number of arguments, and that the arguments themselves are of the correct type.
You should NEVER ignore warnings about argument numbers or types, and you should
only cast (see later, but briefly, the "cast" (float)3 is like the Fortran FLOAT(3) ) if you are absolutely sure what you are
doing !
Notice that the function prototype (for power in this example) is exactly the same as the function header, but with a ; where the body of the function would go. The arguments named in the prototype are optional, so we could have declared "int power( int , int );" . Don't ever do this. Give the arguments either the same names as those in the function definition, or maybe a more verbose name, so that someone looking at your header file with your function prototypes can easily work out how they are meant to be called.
/*---- To sign or not to sign, that is the example ("charsign.c") ------------*/
/* ANSI C Headers */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
/* Function prototypes */
int power( int base, int n );
/* Main Program starts here */
int main( int argc, char *argv[] )
{
char c;
unsigned char uc;
/* End of declarations ... */
/* Set the top bit of both characters */
c = power( 2, (CHAR_BIT-1) );
uc = power( 2, (CHAR_BIT-1) );
/* Shift them both right by one bit, >> is the right shift operator */
c >>= 1;
uc >>= 1;
/* Check for equality - check out the ? ternary operator ! */
printf("Your computer has %ssigned char.\n", ( c == uc ) ? "un" : "" );
exit(EXIT_SUCCESS);
}
/*---- Function to raise integer to a power, nicked from K&R II, page 25 -----*/
int power( int base, int n )
{
int i, p;
/* End of declarations ... */
p = 1;
for ( i = 1; i <= n; i++) {
p = p * base;
}
return( p );
}
In this example, power is a function that returns an int. You can return any
type except an array. However, you can return structures (which might contain
an array). Similarly, you can pass structures as arguments. In general, it is
best to avoid passing or returning structures, because there may be extra
overhead due to structures being larger than machine registers, hence they are
often passed on the stack. Return or pass a pointer instead. DON'T return a
pointer to a function-local, automatic object ! Either make the user pass you a
maximum size and some memory into which you can write your structure/array, or
malloc() it and return that. In the latter case you should document somewhere
that it is up to the user to free() the memory when they are done with it.
If your function doesn't actually return a value, like a Fortran SUBROUTINE, it
is declared as void. The void keyword is also used to indicate that a function
takes no arguments, for example:
void initialize_something(void);
would be used like this
initialize_something();
The brackets are necessary, even though there are no arguments, so the compiler can tell that you intend to call a function.
Programming Challenge 8
_______________________
Hack the "charsign.c" example to try and call power() with the wrong type of
argument (you might declare a float variable and use that). See what compiler
message you get. Call it with the wrong number of arguments (but leave the
prototype unchanged).

Create a new function, powerf() that lets you raise a floating point number to
any power. Try
HELP CC RUN-TIME_FUNCTIONS LOG
and
HELP CC RUN-TIME_FUNCTIONS EXP
for clues. The print format conversion character for a floating point number in
printf is "%f". Compile your program and wonder why you get the error
%CC-I-IMPLICITFUNC, In this statement, the identifier "exp" is implicitly
declared as a function.
Remember that when you typed HELP CC RUN-TIME_FUNCTIONS EXP it told you to
stick "#include <math.h>" in your program. Put it in and the error should go
away.

If you chose to make your function something like
float powerf( float base, float exp);
think about the fact that exp() didn't whinge when you passed it a float. This
is because when arguments are passed (by value always in C), they are (if
possible) converted, AS IF BY ASSIGNMENT, to the type specified in the function
prototype. The order of evaluation of arguments is unspecified, so never rely
on it. See K&R II pages 45 and 201-202 for a detailed description of this
behaviour.

Finally when you have your powerf function working, think what a git I am for
not mentioning the "double pow(double base, double exp);" which also exists in
<math.h>.
(type_I_want_to_cast_to) expression_I_want_to_cast
For example
int index;
float realval;
index = (int)realval;
Because, as explained in the example, this is done by default when calling functions for which good prototypes have been declared, it is generally only useful if calling older style "Classic C" functions where the arguments types have not been declared. E.g.
float funcy(); /* We know that this actually takes a double argument */
.
float f;
.
f = funcy( 100 ); /* Unpredictable result */
f = funcy( (double)100 ); /* f is 10.0 */
Declarations can be quite complicated, and you should read and understand K&R II, pages 122 to 126. There is a very good set of rules and a diagram for parsing declarations in "Expert C Programming", pages 75 to 78, and I strongly recommend everyone to read this.
Try to avoid casting, except in the circumstances defined above, and possibly
when using the RTL function malloc().
stdin, stdout and stderr, which are usually the keyboard, the terminal and the
terminal again respectively. If you have included <stdio.h> (which you should
have) the symbols stdin, stdout and stderr are available for your use. The
functions from <stdio.h> that you have seen so far, like printf, write to
stdout. Others, like scanf, read from stdin. Here is an example of using scanf
to read keyboard input.
/*---- Keyboard Input C Example ("input.c") ----------------------------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>
/* Main Program starts here */
int main( int argc, char *argv[] )
{
int i;
float f;
char string[80];
/* End of declarations ... */
printf("Enter a string, a decimal and a real number separated by spaces\n");
scanf("%s %d %f", string, &i, &f); /* Not good - no string length check */
printf("You entered \"%s\", %d and %f\n", string, i, f);
exit(EXIT_SUCCESS);
}
Compile the program and enter some data. Here is some example input and output.
Enter a string, a decimal and a real number separated by spaces
Hello 4 3.14159
You entered "Hello", 4 and 3.141590
Each item is delimited by whitespace (which includes new lines, of course),
but you can use a scanset format specifier to overcome this,
"%[characters_wanted]". See the DEC C Run-Time Library
Reference Manual, Chapter 2, and Table 2-3, and K&R II page 246 for more
information on this. The scanf function actually returns an integer value,
which is the number of items successfully read in, or the predefined macro
value EOF if an error or end of file occurred before anything could be read in.
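Because it is all too easy to leave string input unlimited with scanf, as the
bare %s above does (which means you could unintentionally overwrite important
memory locations and cause your program to crash by entering a string longer
than the memory allocated for it), it is far better to use fgets(). A field
width such as %79s does limit the read, but it has to be hard-coded in the
format string and kept in step with the array size by hand.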
What you do is read a limited length string with fgets(), the prototype for
which is
char *fgets( char *str, int maxchar, FILE *file_ptr);
So if we had a first argument, destination_string, declared as
char destination_string[STRING_SIZE], we would use sizeof(destination_string)
for maxchar, and stdin as the input FILE stream.
.
char destination_string[STRING_SIZE];
char *pszResult;
.
pszResult = fgets( destination_string, sizeof(destination_string), stdin );
if ( !pszResult ) {
/* Error or EOF (End Of File) */
.
} else {
if ( destination_string[strlen(destination_string)-1] != '\n' ) {
/* Length limit means we didn't get all the input string */
.
} else {
/* Got it all - do sscanf() or whatever */
.
}
}
The fgets() function stops reading after the first newline character is
encountered, or, if no newline is found, it reads in at most maxchar-1
characters. In either case the string is terminated with '\0'. You can tell
whether the length limit cut in by checking if the last character in the string
is '\n'. If it isn't, then you may need to read more input. This will require
malloc()ing space enough for all the "segments" of the string and perhaps
strncat()ing them together. This is left as an exercise for the reader. The
resultant string is then parsed with sscanf(), which takes its input from a
string; destination_string in this description.
Failure to limit the length of an input string led to the infamous "finger"
bug. There is a function similar to fgets, called gets(), which was used in the
original finger daemon (the server program which returns information about
users on the target system). Never, repeat never, use gets. It reads characters
into the string pointed to by its argument, with no length check. This was
exploited to overwrite the return address on the stack, and make the
"privileged" finger image execute some code (sent as a part of a long string)
to create a command shell running with full privileges.
Programming Challenge 9
_______________________
Write a program using scanf, or fgets and sscanf. Try out different conversion
characters, check the effects of giving bad data.
The standard C file system is based on streams, which
are rather like unit numbers in Fortran. Streams
connect to devices, such as files on disks, terminals or printers. By using file
streams, you are shielded from having to do the low level IO. C recognises two types of stream which are text streams and
binary streams. Binary streams are written to and read from without any mucking
about ! Text streams can have, or not have, implementation defined line feeds
and carriage returns. Binary streams are usually used for writing out data in
the machine's internal representation, like an array of structures for example.
The type is determined when the file is opened with fopen(). The following
table shows the access modes you can use with fopen().
r    Open text file for reading.
w    Open or create text file for writing, discard previous contents.
     (creates new version under VMS)
a    Open or create for appending, write at end of existing data.
r+   Open text file for reading AND writing.
w+   Open or create text file for update, discard previous contents.
     (creates new version under VMS)
a+   Open or create for update (reading and appending), writes go at end
     of existing data.
Add a "b" after the access mode letter, and we are talking binary files. Here is an example program to write and read back an array of data structures, which will make everything remarkably clear.
/*---- File IO Example ("fileio.c") ------------------------------------------*/
/*---- Put all #include files here -------------------------------------------*/
/* ANSI Headers */
#include <ctype.h> /* Character macros */
#include <errno.h> /* errno error codes */
#include <stdio.h> /* Standard I/O */
#include <stdlib.h> /* Standard Library */
#include <string.h> /* String Library */
#include <time.h> /* Time Library */
/*---- Put all #define statements here ---------------------------------------*/
#define PROGRAM_VERSION "1.6"
#define TYPE_OF_FILE "TEST_DATA"
#define NDATA_POINTS 10
/*---- Put all structure definitions here ------------------------------------*/
/* Following structures MUST be packed tightly ie. no member alignment -------*/
#ifdef __DECC
#pragma member_alignment save
# pragma nomember_alignment
#endif
struct file_header_s {
char type[32];
char version[8];
char creator[20];
time_t time;
};
struct data_s {
short x;
short y;
char name[8];
};
#ifdef __DECC
# pragma member_alignment restore
#endif
int main(int argc, char *argv[])
{
struct data_s *data_ptr;
struct file_header_s file_header;
FILE *outfile, *infile;
int i, got_answer;
char answer[8], yeno[4], node_user[20], filename[128];
char *c_ptr;
long int ndata = 0, nitems = 0;
/* End of declarations ... */
/* Set up a default name in case user hasn't specified one */
if (argc < 2) {
strcpy(filename,"MYDATA.DAT");
} else {
strcpy(filename,argv[1]);
}
/* Get node name and user name */
#if !defined( _WIN32 )
sprintf( node_user, "%s%s", getenv("SYS$NODE"), getenv("USER") ); /* VMS */
#else
sprintf( node_user,"\\\\%s\\%s",getenv("COMPUTERNAME"),getenv("USERNAME"));
#endif
/* Convert to uppercase */
c_ptr = node_user;
while ( (*c_ptr = (char)toupper( (unsigned char)*c_ptr )) != '\0' ) ++c_ptr; /* <ctype.h> */
/* Allocate data space and initialize it */
data_ptr = (struct data_s *)malloc( sizeof(struct data_s)*NDATA_POINTS );
if ( data_ptr ) {
ndata = NDATA_POINTS;
for ( i = 0; i < ndata; i++) {
data_ptr[i].x = data_ptr[i].y = i;
sprintf(data_ptr[i].name, "%3.3d,%3.3d", data_ptr[i].x, data_ptr[i].y );
}
/* Open the file for writing in binary mode */
outfile = fopen(filename,"wb");
if ( outfile != NULL ) {
/* Set up the header */
strcpy(file_header.type,TYPE_OF_FILE);
strcpy(file_header.version,PROGRAM_VERSION);
sprintf(file_header.creator,"%s",node_user);
(void)time(&file_header.time);
printf("Writing out the header\n");
/* Items Written Data Pointer Size in bytes No. of items stream */
/* | | | | | */
/* v v v v v */
nitems = fwrite( &file_header, sizeof(file_header), 1, outfile);
if ( ferror(outfile) ) {
fprintf(stderr,"Error writing file 'header':\n%s",strerror(errno));
}
printf("Writing out the number of data items, %ld\n", ndata);
nitems = fwrite( &ndata, sizeof(ndata), 1, outfile);
if ( ferror(outfile) ) {
fprintf(stderr,"Error writing number of data items:\n%s",
strerror(errno));
}
printf("Writing out the actual data all in one chunk\n");
nitems = fwrite( data_ptr, sizeof(struct data_s)*ndata, 1, outfile);
if ( ferror(outfile) ) {
fprintf(stderr,"Error writing data:\n%s",strerror(errno));
}
printf("Closing output file\n");
fclose(outfile);
} else {
fprintf(stderr,"Error creating data file %s:\n%s", filename,
strerror(errno));
}
} else {
fprintf(stderr, "Couldn't allocate space for %d data structures\n",NDATA_POINTS);
}
/* Now optionally read the data back in and format */
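do {
printf("\nRead data back in ? [Y/N]: ");
if ( fgets( yeno, sizeof(yeno), stdin ) == NULL ) { /* Reads sizeof(yeno)-1 chars */
strcpy( yeno, "N" ); /* EOF or error on stdin: treat as "no" */
}
got_answer = sscanf( yeno, "%[YyNnTtFf]", answer); /* EOF (-1) if yeno is empty */
} while ( got_answer != 1 );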
if ( answer[0] == 'Y' || answer[0] == 'y' ||
answer[0] == 'T' || answer[0] == 't' ) {
printf( "Here we go ..\n" );
/* Zero out the structures just to show there's no cheating */
ndata = 0;
memset( &file_header, 0 , sizeof( file_header ) );
memset( data_ptr, 0 , sizeof(struct data_s)*NDATA_POINTS );
/* Open the file for reading in binary mode */
infile = fopen(filename,"rb");
if ( infile != NULL ) {
printf("Reading in the header\n");
nitems = fread(&file_header,sizeof(file_header),1,infile);
if ( ferror(infile) ) {
fprintf(stderr,"Error reading file 'header':\n%s",strerror(errno));
}
printf("Header information: file type %s\n", file_header.type );
printf(" version %s\n", file_header.version );
printf(" created by %s\n", file_header.creator );
printf(" on %s\n", ctime( &file_header.time ) );
nitems = fread( &ndata, sizeof(ndata), 1, infile);
printf("Read in the number of data items, %ld\n", ndata);
if ( ferror(infile) ) {
fprintf(stderr,"Error reading number of data items:\n%s",
strerror(errno));
}
printf("Reading in the actual data all in one chunk\n");
nitems = fread( data_ptr, sizeof(struct data_s)*ndata, 1, infile);
if ( ferror(infile) ) {
fprintf(stderr,"Error reading data:\n%s",strerror(errno));
}
printf("Closing input file\n\n");
fclose(infile);
printf("Read in %ld data items\n", ndata);
for ( i = 0; i < ndata; i++) {
printf("%3d) x:%3d, y:%3d, Label: %s\n",
i, data_ptr[i].x, data_ptr[i].y, data_ptr[i].name );
}
} else {
fprintf(stderr,"Error opening data file %s:\n%s", filename,
strerror(errno));
}
} else {
printf( "OK - be like that\n" );
}
exit(EXIT_SUCCESS);
}
The strange #pragma
directive is a standard way to do
non-standard things, like instruct the compiler to close-pack the data (i.e.
don't use natural alignment) and is explained in §17. The "f" routines you can
look up yourself in a book. Several DEC system and library routines are used, so
the VMS headers <ssdef.h>
, <starlet.h>
and
<lib$routines.h>
are included. Notice that the function calls
are lowercase, and there are no prototypes defined yet, so you are on your own
there if you get an argument wrong ! This should change with future releases of
DEC C. The strerror routine is useful for getting a text error message. After
many library calls, not just stdio calls, an integer expression, errno, defined
in <errno.h>, yields a non-zero value if an error occurs. Be very careful
making assumptions about errno, because in many cases it isn't actually a
variable but a macro, which allows it, for example, to behave in a thread-safe
manner. This means, however, that it isn't safe to treat it as a global integer
variable and take its address and so forth.
The strerror function from <string.h> converts the error number into a text
string, and the function value is a pointer to this string. You must not modify
this string, and it will be overwritten by later calls to strerror. This
program reads in many bytes at a time with each read, but there is a function,
fgetc, to read a single character, and a matching function, ungetc, which
returns the last read character to the input stream to be read by the next
fgetc. This provides a kind of look-ahead function which is exploited by the
next example program, "calc.c", provided by Neill Clift. In this program,
getchar is used instead of fgetc. It is equivalent to fgetc except that it
always reads from stdin. This sturdy example is adapted from the very
expression parser used by LID (a Y.R.L. replacement for DEC's CDD).
/*---- Calculator expression evaluator example ("calc.c") --------------------*/
/*
History:
Version Name Date
V01-001 Neill Clift 16-Mar-1995
Initial version
*/
/* ANSI Headers */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
/* Function prototypes */
int parse_expression(void);
int parse_expression_factor(void);
int parse_expression_term(void);
int parse_literal(void);
int getnonwhite(void);
int match_token(int tomatch);
/* Start of main program */
int main( int argc, char *argv[])
{
int val;
/* End of declarations ... */
printf("Enter expression terminated by ;\n");
printf("Calc> ");
val = parse_expression();
if (match_token(';')) {
printf("Result is %d\n",val);
} else {
printf("Expression seems bust\n");
}
exit(EXIT_SUCCESS);
}
/*---- Get the next character skipping white space ---------------------------*/
int getnonwhite(void)
{
int c;
/* End of declarations ... */
while (1) {
c = getchar();
if (c == EOF) {
break;
} else if (!isspace(c)) {
break;
}
}
return( c );
}
/*--- Match single character against next input char. If match then gobble ---*/
/*--- it up. If we don't match it then push it back for future matches -------*/
int match_token(int tomatch)
{
int c;
/* End of declarations ... */
c = getnonwhite();
if (c == tomatch) {
return( 1 );
} else {
ungetc(c,stdin); /* Put character back on input stream to be read again */
return( 0 );
}
}
/*---- Parse a single number from input +/- nnnnn ---------------------------*/
int parse_literal(void)
{
int retval,st;
/* End of declarations ... */
retval = 0;
st = scanf("%d",&retval);
if (st == EOF) {
printf("Hit EOF looking for literal\n");
} else if (st == 0) {
printf("Missing literal\n");
}
return( retval );
}
/*---- Syntax is: Literal or: (expression) ---------------------------*/
int parse_expression_factor(void)
{
int retval;
/* End of declarations ... */
if (match_token('(')) {
retval = parse_expression();
if (!match_token(')'))
printf("Missing close bracket\n");
} else {
retval = parse_literal();
}
return( retval );
}
/*---- Parse an expression term, Syntax: <factor>{<multiplying_op><factor>} -*/
int parse_expression_term(void)
{
int tmp,mul,opr;
/* End of declarations ... */
tmp = parse_expression_factor();
while (1) {
if (match_token('*')) {
opr = 1;
} else if (match_token('/')) {
opr = -1;
} else {
break;
}
mul = parse_expression_factor();
if (opr == 1) {
tmp = tmp * mul;
} else if (mul == 0) {
printf("Division by zero!\n");
} else {
tmp = tmp / mul;
}
}
return( tmp );
}
/*---- Parse an expression, Syntax: [+/-]<term>{<adding_op><term>} -----------*/
int parse_expression(void)
{
int tmp,mul,add;
/* End of declarations ... */
/* Check for leading + or -. None means plus */
if (match_token('+')) {
mul = 1;
} else if (match_token('-')) {
mul = -1;
} else {
mul = 1;
}
tmp = parse_expression_term();
tmp = tmp * mul;
while (1) {
if (match_token('+')) {
mul = 1;
} else if (match_token('-')) {
mul = -1;
} else {
break;
}
add = parse_expression_term();
tmp = tmp + mul * add;
};
return( tmp );
}
A few other file routines are worth mentioning. These are fgetpos, fsetpos and
fseek, which generally apply to files open in binary mode. They allow you to
position to a particular byte within a file; with fseek the position is
specified from the current position, or the beginning or end of the file. The
fgetpos function returns the position in an object of type fpos_t, which is only
meaningful when used with fsetpos. See K&R II page 248 for more details. The
fflush function allows you to flush cached data on an output stream if you want
to do this before fclose, which flushes anyway, as does exit(). To make sure
that data is actually written to disk you must call a non-standard function like
fsync after fflush - fflush on its own doesn't guarantee that the data has
actually been written to permanent storage. Streams can be redirected using
freopen, and this is a commonly used method for making stdout get written to a
file without having to change printf calls. Testing for end of file is achieved
by using the feof function, which only reports true after a read has actually
hit the end of the file.
A standard method for getting command line arguments is provided in C. These are
the arguments to the main program, argc and argv. The integer argc is the number
of command line arguments, and must be greater than or equal to zero. The second
argument, argv, is an array of pointers to characters. If argc is zero, then
argv[0] must be the NULL pointer. On most implementations, argc will be greater
than zero, and argv[0] points to the program name. On some machines this will be
a string like "myprog". On VMS or Windows NT systems it is the full file
specification. If the program name is not available, argv[0] must point to the
null character, '\0'. The elements argv[1] to argv[argc-1], if they exist, point
to strings which are implementation defined. In practice, these are usually the
whitespace separated (unless "quoted") arguments supplied on the command line.
Under VMS, command line arguments are converted to lowercase, unless quoted. The
following example, "args.c", shows how to get the command line arguments.
/*---- Getting Command Line Arguments C Example ("args.c") -------------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>
/* Main Program starts here */
int main( int argc, char *argv[] )
{
int i;
/* End of declarations ... */
for ( i = 0; i < argc; i++ ) {
printf("Argument %d = \"%s\"\n", i, argv[i] );
}
exit(EXIT_SUCCESS);
}
To make it work on VMS, you must either run it like this (assuming you are in the directory containing the image)
$ MC SYS$DISK:[]ARGS HELLO WORLD
or define a symbol, and invoke it as a "foreign command"
$ args:==$SYS$DISK:[]ARGS
$ args hello world again
Programming Challenge 10
________________________
Try the args program. With your new-found skills, modify Neill's calc program to take a command line argument expression, or to behave in the existing manner if one is not supplied.
On VMS systems, we often want to access keyed indexed files. This is slightly more difficult than using the standard file functions, because you have to set up the RMS (Record Management Services) structures yourself. If you want to do this, you should really read the DEC C documentation about using RMS from C. Alternatively you can ask your friendly local Clift for an example. Here's one we prepared before the course, "key.c"
/*---- Keyed Index File C Demonstration Program ("key.c") --------------------*/
/*
History:
Version Name Date
V01-001 Neill Clift 09-Mar-1995
Initial version
*/
/* ANSI Headers */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
/* VMS Headers */
#include <rab.h> /* RMS RAB */
#include <rms.h> /* RMS access blocks etc */
#include <ssdef.h> /* System service completion codes */
#include <starlet.h>
#include <lib$routines.h>
/* Defines and Macros */
#define NAME_SIZE 32 /* Size of person field */
#define PHONE_SIZE 20 /* Size of the phone number field */
/* Structure declarations */
struct phone_r {
/* Define the structure that will be the record of the keyed file. */
/* It is indexed with two keys, one for each of the structure's fields. */
char name[NAME_SIZE];
char phone[PHONE_SIZE];
};
/* Global Variables - not externally visible */
static struct FAB fab; /* FAB for file */
static struct RAB rab; /* RAB for file */
static struct NAM nam; /* NAM block to report I/O errors nicely */
static struct XABKEY xabkey1, xabkey2; /* XAB to define keys structure */
static char essbuf[NAM$C_MAXRSS]; /* Expanded file name */
static char rssbuf[NAM$C_MAXRSS]; /* Resultant file name */
static char keyname1[32] = "Person"; /* Name of first key */
static char keyname2[32] = "Phone"; /* Name of second key */
/*---- Routine to close the RMS file -----------------------------------------*/
int close_file( void )
{
long int status;
/* End of declarations ... */
status = sys$close( &fab );
return( status );
}
/*---- Open/Create the keyed index file --------------------------------------*/
int create_file( char *filename )
{
long int status, status1;
/* End of declarations ... */
fab = cc$rms_fab; /* initialise the FAB */
fab.fab$l_alq = 100; /* Preallocate space */
fab.fab$w_deq = 100;
fab.fab$b_fac = FAB$M_PUT|FAB$M_DEL|FAB$M_UPD;
fab.fab$l_fop = FAB$M_DFW|FAB$M_CIF;
fab.fab$b_org = FAB$C_IDX;
fab.fab$b_rfm = FAB$C_VAR;
fab.fab$l_fna = filename;
fab.fab$b_fns = strlen(filename);
fab.fab$b_rat = FAB$M_CR;
fab.fab$l_xab = (char *) &xabkey1;
/* Init XABKEY to define key for name key */
xabkey1 = cc$rms_xabkey; /* Initialise XABKEY structure */
xabkey1.xab$b_bln = XAB$C_KEYLEN;
xabkey1.xab$b_cod = XAB$C_KEY;
xabkey1.xab$b_dtp = XAB$C_STG;
xabkey1.xab$b_ref = 0; /* Key zero */
xabkey1.xab$l_knm = (char *) &keyname1; /* Key name */
xabkey1.xab$l_nxt = (char *) &xabkey2; /* Next XAB in chain */
/*
The next two fields describe the section of the record that contain the
key.
*/
xabkey1.xab$w_pos0 = offsetof(struct phone_r, name);
xabkey1.xab$b_siz0 = NAME_SIZE;
/* Init XABKEY to define key for phone key */
xabkey2 = cc$rms_xabkey; /* Initialise XABKEY structure */
xabkey2.xab$b_bln = XAB$C_KEYLEN;
xabkey2.xab$b_cod = XAB$C_KEY;
xabkey2.xab$b_dtp = XAB$C_STG;
xabkey2.xab$b_ref = 1; /* Key one */
xabkey2.xab$l_knm = (char *) &keyname2;
xabkey2.xab$l_nxt = 0;
xabkey2.xab$w_pos0 = offsetof(struct phone_r, phone);
xabkey2.xab$b_siz0 = PHONE_SIZE;
/*
Init NAM block just for good file I/O error reporting. We won't use it
though!
*/
nam = cc$rms_nam;
fab.fab$l_nam = &nam;
nam.nam$b_rss = sizeof( rssbuf );
nam.nam$l_rsa = (char *) &rssbuf;
nam.nam$b_ess = sizeof( essbuf );
nam.nam$l_esa = (char *) &essbuf;
status = sys$create( &fab );
if (!(status&SS$_NORMAL))
return( status );
rab = cc$rms_rab; /* initialise the RAB */
rab.rab$b_mbf = 127;
rab.rab$b_mbc = 127;
rab.rab$l_rop = RAB$M_WBH|RAB$M_RAH;
rab.rab$l_fab = &fab;
status1 = sys$connect( &rab );
if (!(status1&SS$_NORMAL)) {
status = status1;
sys$close( &fab );
};
return( status );
}
/*---- Write a record to the file --------------------------------------------*/
int put_record( char *name, char *phone )
{
long int status;
struct phone_r phonerec;
/* End of declarations ... */
strncpy( phonerec.name, name, sizeof(phonerec.name) );
strncpy( phonerec.phone, phone, sizeof(phonerec.phone) );
rab.rab$w_rsz = sizeof( phonerec );
rab.rab$l_rbf = (char *) &phonerec;
rab.rab$b_rac = RAB$C_KEY;
rab.rab$l_rop |= RAB$M_UIF;
status = sys$put( &rab );
return( status );
}
/*---- Look a record up by name ----------------------------------------------*/
int get_record( char *name, struct phone_r *phonerec )
{
long int status;
/* End of declarations ... */
rab.rab$w_usz = sizeof( *phonerec );
rab.rab$l_ubf = (char *) phonerec;
rab.rab$b_ksz = strlen( name );
rab.rab$l_kbf = (char *) name;
rab.rab$b_krf = 0;
rab.rab$b_rac = RAB$C_KEY;
rab.rab$l_rop |= RAB$M_UIF;
status = sys$get( &rab );
return( status );
}
/*---- Main Program starts here ----------------------------------------------*/
int main( int argc, char *argv[] )
{
long int status;
struct phone_r phn;
/* End of declarations ... */
printf("Creating phone.dat ...\n");
status = create_file("phone.dat");
if (!(status&SS$_NORMAL)) lib$signal( status );
printf("Add record NEILL - 555 555 1417\n");
status = put_record ("NEILL", "555 555 1417");
if (!(status&SS$_NORMAL)) lib$signal( status );
printf("Add record PHIL - 555 555 6506\n");
status = put_record ("PHIL", "555 555 6506");
if (!(status&SS$_NORMAL)) lib$signal( status );
printf("Look up record for PHIL\n");
status = get_record ("PHIL", &phn);
if (!(status&SS$_NORMAL)) {
lib$signal( status );
} else {
printf("Found %s - %s\n", phn.name, phn.phone );
}
status = close_file();
if (!(status&SS$_NORMAL)) lib$signal (status);
exit(EXIT_SUCCESS);
}
This will write out a data file, PHONE.DAT, then do a lookup and find a record keyed on the name. Try expanding the file with a few more records, and experiment with the lookup. Use this file to create your own database, with different types of keys.
<assert.h>   <locale.h>   <stddef.h>
<ctype.h>    <math.h>     <stdio.h>
<errno.h>    <setjmp.h>   <stdlib.h>
<float.h>    <signal.h>   <string.h>
<limits.h>   <stdarg.h>   <time.h>
Obviously you should avoid using these names for your own header files. In
addition, C++ also has <new.h> and <iostream.h>, so don't use these either.
There are far too many routines in the standard libraries for me to describe
them all, which is as good an excuse as any not to bother. I will, however,
present a few program examples or program fragments for the more commonly used
routines. You should refer to the DEC C Run Time Library Manual, using
BookReader or MGBOOK to get the latest version, and familiarize yourself with
what is available. VMS provides several Unix style functions in <unixlib.h> and
<unixio.h>. To be strictly accurate, these are nonportable, but they are
available on many Unix systems. One such function is getenv (itself part of ANSI
C, declared in <stdlib.h>), which, in Unix land, gets the string value of
"environment variables", which roughly correspond to DCL symbols, or logical
names. On the VAX or Alpha, the getenv function first looks for a logical name
match, and returns the translation if it finds one, else it looks for a local
symbol with the same name and returns the definition, or, if that wasn't found,
it looks for a global symbol. Compile and run "symbols.c" and try defining MYSYM
as a symbol, and as a logical name, and see what output you get.
/*---- Getting Symbols or Logical Names C Example ("symbols.c") --------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>
/* Main Program starts here */
int main( int argc, char *argv[] )
{
char *s_ptr;
/* End of declarations ... */
printf("MYSYM = \"%s\"\n", ( s_ptr = getenv("MYSYM")) ? s_ptr : "" );
exit(EXIT_SUCCESS);
}
Getting the time is another commonly required function, and C provides a number
of standard routines for this purpose. An example program demonstrates the use
of various time routines, including strftime, which is very flexible in letting
you form a formatted time string.
/*---- Getting the time ("time.c") -------------------------------------------*/
/* ANSI C Headers */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
/* Main Program starts here */
int main( int argc, char *argv[] )
{
time_t c_time;
struct tm *c_time_rec;
char string[80];
/* End of declarations ... */
/* Get the current time */
time(&c_time); /* c_time now contains seconds since January 1, 1970 */
/* Convert this to a string using the ctime() function */
printf("%s\n", ctime( &c_time ) );
/* Split the time into its components - hours, minutes, seconds etc. */
c_time_rec = localtime( &c_time );
/* Selectively copy day of week into string */
strftime( string, sizeof(string)-1, "Today is %A", c_time_rec );
/* Print out the day of the week */
printf("%s\n", string );
exit(EXIT_SUCCESS);
}
It is important to use the standard defined types for time variables, like
time_t, and not use, say, unsigned int because "you know that's what it is". I
say this because some systems are going to have a 2038 bug, a bit like the
millennium bug, caused by the number of seconds since January 1, 1970 exceeding
the storage capacity of the type currently used for time_t. Compiler vendors
might well change time_t to be something completely different in future, like a
64 bit quantity or perhaps a structure. If you have used time_t throughout your
code, a simple recompilation and relink will be all that you need to do, and
this will avoid your name being cursed by later generations of programmers, or
your being ejected into space by HAL ;-)
String-to-number conversion is a topic that frequently crops up in the
comp.lang.c Usenet news group. You have already met sprintf, which is the
number-to-string converter. Other functions, like atoi and atof, convert strings
to int and double, and strtod and strtol convert strings to double and long
respectively, with the added benefit that they tell you where the conversion
stopped. Random integer numbers, seeded using srand, can be obtained using the
rand function, and if you want floating point numbers, cast the result of rand
to float and divide by the macro value RAND_MAX, also cast to float. Searching
and sorting routines, bsearch and qsort, are also provided. They expect you to
pass a pointer to a function which is used to determine whether objects compare
equal, greater than or less than, then they do the sorting or searching, using
your function for the comparison.
The equivalent of LIB$SPAWN is the system() function. The argument is either a
command shell command, or NULL to test whether a command shell is available.
if ( system(NULL) ) {
/* Do VMS command */
system("DIRECTORY");
} else {
fprintf(stderr,"Sorry - no command shell available !");
}
The atexit function lets you register exit handlers, which are called when the
program exits in FILO order, so the most recently registered handler runs first.
These are useful for tidying up resources, even if some deeply nested subroutine
calls exit().
Handlers for other types of condition can be registered using the signal()
function. In Unix, signals are a bit like AST notifications. They range from
SIGALRM, which lets you know when an alarm set by the alarm function has gone
off, to SIGINT which can be used to trap Ctrl C. A pair of functions, setjmp and
longjmp, provide one of the nearest things I have seen to the mythical
"comefrom" statement ! An example showing you how to trap Ctrl C will
demonstrate. This program includes <signal.h> and <setjmp.h>, and stores the
position to which we want to return in saved_position, which is of type jmp_buf.
You should never actually "look" at this, because it is only meaningful as an
argument to longjmp. The second parameter to longjmp is a non-zero integer,
which will be returned as the value of setjmp when we come back to it from the
longjmp. Called directly, setjmp returns zero. The example "signal.c" should
make signal handling and longjmping clearer.
/*---- Signal and longjmp Example ("signal.c") -------------------------------*/
/* ANSI C Headers */
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#if defined( _WIN32 )
# include "windows.h"
#endif
/* Defines and macros */
#define MAX_COUNT 5
/* Global variables */
jmp_buf saved_position;
/* Function prototypes */
void ctrl_c_handler( int scode );
/* Main Program starts here */
int main( int argc, char *argv[] )
{
int icount;
/* End of declarations ... */
icount = 0;
signal( SIGINT, ctrl_c_handler );
if ( !setjmp(saved_position) ) {
for ( ; icount < MAX_COUNT; ++icount ) {
printf("At main line - looping #%d - enter Ctrl/C\n", icount);
#if !defined( _WIN32 )
sleep( 10 );
#else
Sleep( 10000 );
#endif
};
} else {
printf("Returned from Ctrl C handler - exiting early\n");
}
exit(EXIT_SUCCESS);
}
void ctrl_c_handler( int scode )
{
/* End of declarations ... */
if ( scode == SIGINT ) {
printf("Handling Ctrl C - return to position saved by setjmp()\n");
longjmp( saved_position, 1 ); /* Use any non-zero number */
} else {
printf("Strange - Ctrl C handler called with wrong signal code !\n");
}
}
To get "signal.c" to link properly, you must compile it with /PREFIX=ALL to
properly prefix the nonstandard function call to sleep, a <unixlib.h> function.
All the recognised RTL functions are actually prefixed with "DECC$" by the
compiler, and this allows the linker to find them automatically at link time. It
is a good idea to get into the habit of using /PREFIX=ALL because this will
cause warnings to be issued at link time if you inadvertently name any of your
own functions so as to clash with inbuilt ones. This example works slightly
differently under Windows, because although the Ctrl C handler is called, the
program then exits (this is the documented behaviour).
Finally, I will present an example of a function that can be called with
variable arguments. The C syntax for variable arguments is to use three dots,
..., to represent the variable arguments. You must, however, specify at least
one argument at the start of the parameter list. In one way, <stdarg.h> routines
are rather inferior to the nonstandard <varargs.h> routines, available on Unix
and VMS, because the latter can tell you how many arguments were passed, whereas
the former makes you tell the function in some way, like the "%" conversion
characters in a printf format string does. Y.R.L. programmers - for some
examples of the nonstandard varargs mechanism (which you should avoid using if
at all possible) see SRC$OLB:PRINTFILE.C and FIFO.C in GNRC. Here is an example
program using the standard stdarg mechanism, called "vargs.c".
/*---- Variable Arguments C Example ("vargs.c") ------------------------------*/
/* ANSI C Headers */
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
/* Defines and macros */
#define END 0
/* Function prototypes */
int add_some_ints( int first, ...);
/* Main Program starts here */
int main( int argc, char *argv[] )
{
int total;
/* End of declarations ... */
total = add_some_ints( 1, 2, 3, 4, END);
printf("Total was %d\n", total );
exit(EXIT_SUCCESS);
}
/*---- Rather pathetic function to add up some integers ----------------------*/
int add_some_ints( int first, ...)
{
va_list ap; /* Use to point to each argument in turn */
int inext, sum, icount;
/* End of declarations ... */
icount = 1;
sum = first;
va_start( ap, first ); /* Use last named argument - first in this case */
/* End of args is marked by END, which is zero */
while ( (inext = va_arg( ap, int )) != END ) { /* Second arg of va_arg is type */
sum += inext;
++icount;
}
va_end( ap ); /* Tidy up */
printf("add_some_ints added %d integers\n", icount );
return( sum );
}
The best way to find out about the run time library functions is to look them up in K&R, or with VMS HELP, then have a go at using them.
#define definitely gives you the extra scope to really muddy the water. Here are
some examples taken from comp.lang.c, compiled by [email protected] (Peter
Conrad), of the contest to find the shortest C program to count from a given
number i1 up to or down to a second number, i2. To compile these you will need
to use CC/DEFINE="o(arg)=printf(""%d"",arg)" , which is a bit of a cheat.
In no particular order (they are all 69 bytes):
[01] main(a,y,d)int*y;{for(a=atoi(y[1]);o(a),d=atoi(y[2])-a;)a+=(d|1)%2;}
[02] main(c,v,x)int*v;{for(c=atoi(v[1]);o(c),x=atoi(v[2])-c;c+=(x|1)%2);}
[03] main(a,v)int*v;{for(a=atoi(*++v);o(a),*v=atoi(v[1])-a;a+=*v>>31|1);}
[04] main(i,v,j)int*v;{for(i=atoi(v[1]);o(i),j=atoi(v[2])-i;i+=(j|1)%2);}
[05] main(c,d){int*v=d;for(c=atoi(v[1]);o(c),d=atoi(v[2])-c;c+=d>>31|1);}
[06] main(d,O,_)int*O;{for(_=atoi(O[1]);o(_),d=_-atoi(O[2]);_-=d>>-1|1);}
[07] main(a,v)int*v;{for(a=atoi(v[1]);o(a),*v=atoi(v[2])-a;a+=*v>>31|1);}
[08] main(a,b)int*b;{for(a=atoi(*++b);o(a),*b=atoi(b[1])-a;a+=(*b|1)%2);}
[09] main(a,b)int*b;{for(a=atoi(b[1]);o(a),*b=atoi(b[2])-a;a+=*b>>31|1);}
[10] main(J,_)int*_;{for(J=atoi(*++_);o(J),*_=atoi(_[1])-J;J+=(*_|1)%2);}
[11] main(i,b,j)int*b;{for(i=atoi(b[1]);o(i),j=i-atoi(b[2]);i-=j>>31|1);}
[12] main(k,j,i)int*j;{for(k=atoi(j[1]);o(k),i=atoi(j[2])-k;k+=(i|1)%2);}
[13] main(n,a,e)int*a;{for(n=atoi(a[1]);o(n),e=atoi(a[2])-n;n+=e>>31|1);}
[14] a;main(c,d)int*d;{for(a=atoi(d[1]);o(a),c=atoi(d[2])-a;a+=(c|1)%2);}
[15] main(n,c,d)int*c;{for(n=atoi(c[1]);o(n),d=atoi(c[2])-n;n+=d>>31|1);}
[16] main(d,a)int*a;{for(d=atoi(a[1]);o(d),*a=atoi(a[2])-d;d+=(*a|1)%2);}
[17] *p;main(i,x){for(i=atoi((p=x)[1]);o(i),x=atoi(p[2])-i;x>0?i++:i--);}
[18] main(c,v,d)int*v;{for(c=atoi(v[1]);o(c),d=atoi(v[2])-c;c+=d>>31|1);}
[19] main(c,v,d)int*v;{for(c=atoi(v[1]);d;o(d>0?c++:c--))d=atoi(v[2])-c;}
These are the winners of the 1. Int'l Kaiserslautern Shortest C Contest (the numbers refer to the programs above):
[01] Lars C. Hassing <[email protected]>
[02] Stefan Bock <[email protected]>
[03] Heather Downs <[email protected]>
[04] Patrick Seemann <[email protected]>
[05] Roland Nagel <[email protected]>
[06] Klaus Singvogel <[email protected]>, Michael Schroeder <[email protected]>, Markus Kuhn <[email protected]>
[07] Markus Simmer <[email protected]>
[08] Willy Seibert <[email protected]>
[09] Oliver Bianzano <[email protected]>
[10] Jens Schweikhardt <[email protected]>
[11] Thomas Omerzu <[email protected]>, Matthias Sachs <[email protected]>, Udo Salewski <[email protected]>
[12] Jahn Rentmeister <[email protected]>
[13] Gregor Hoffleit <[email protected]>
[14] John Rochester <[email protected]>
[15] Markus Siegert <[email protected]>
[16] Siegmar Zaeske <[email protected]>
[17] Arnd Gerns <[email protected]>, Dirk Eiden <[email protected]>, Steffen Moeller <[email protected]>
[18] James C. Hu <[email protected]>
[19] Frank Neblung <[email protected]>
I think you would agree that these would not be much fun to maintain ! Another favourite pastime for obfuscators is the self-producing program. Here is one of my favourites - strictly ANSI compliant, including all headers, by Ashley Roll.
#include <stdio.h> /* Ashley Roll [email protected] */
main(){char *s,*a,*t;for(t=s="7}|wzarq6*gbr}{<~,6;@6Ug~zqm6H{zz6uh{zzVsaw}g<\\\
w}b<sa<qra<ua6@; yu}|>=ow~uh6@g:@u:@b/t{h>b)g)8_8/@g/g??= }t>@g))3K3=t{h>u)b\\\
/@u/u??=o@u))3JJ3+fh}|bt>8JJJJJJJ|8=0fabw~uh>@u=/i qzgq@g5)3JJ3+fabw~uh>@g9%\\\
L(%=0%/i6;@6Sh}tt}b~6A|}dqhg}bm:6Xh}gxu|q6Uagbhuz}u6@; ";*s;s++)
if(*s=='_')for(a=t;*a;a++){*a=='\\'?printf("\\\\\\\n"):putchar(*a);}
else*s!='\\'?putchar(*s-1^21):1;} /* Griffith University, Brisbane Australia */
This is in the course directory, called "self.c". As a final treat, here is a festive Christmas program, "xmas.c", by [email protected] (Brendan Hassett), based on the more traditional, but equally obfuscated program by Ian Phillipps.
/*
From: [email protected] (Brendan Hassett)
+------------------------------------------------------------------+
| Brendan Hassett Tel +353-902-74601 ext 1109 ECN 830-1109 |
| [email protected] [email protected] ~~~ |
| Ericsson Systems Expertise Ltd, Athlone, Ireland, EU. ( o o ) |
+--------------------------------------------------------ooO-(_)-Ooo
#include <disclaimer.h>
*/
/*
Based on an original program by Ian Phillipps,
Cambridge Consultants Ltd., Cambridge, England
*/
#include <stdio.h>
#define __ main
__(t,_,a)
char
*
a;
{
return!
0<t?
t<3?
__(-79,-13,a+
__(-87,1-_,
__(-86, 0, a+1 )
+a)):
1,
t<_?
__(t+1, _, a )
:3,
__ ( -94, -27+t, a )
&&t == 2 ?_
<13 ?
__ ( 2, _+1, "%s %d %d\n" )
:9:16:
t<0?
t<-72?
__( _, t,
"k,#n'+,#'/\
*{}w+/\
w#cdnr/\
+,{}r/\
*de}+,/\
*{*+,/\
w{%+,/\
w#q#n+,/\
#{l,+,/\
n{n+,/\
+#n+,/#;\
#q#n+,/\
+k#;*+,/\
'-el' ))# }#r]'K:'K n l#}'w {r'+d'K#!/\
+#;;'+,#K'{+w' '*# +e}#]!/\
w :'{+w'nd+'we))d}+#r]!/\
c, nl#'+,#'rdceK#n+ +{dn]!/\
-; K#'{+'dn'+,#', }rk }#]!/\
*{nr' 'k :' }denr'{+]!/\
w :'+,#:'n##r' n'e)l} r#]!/\
}#{nw+ ;;'+,#'wd*+k }#]!/\
w['*d}' 'reK)]!/\
ew#' 'r#-ell#}]!/\
+}:'+d'}#)}drec#'{+]!/\
w['+,#K',dk'+,#:'r{r'{+]!/\
w##'{*'{+', ))#nw' l {n(!!/")
:
t<-50?
_==*a ?
putchar(a[31]):
__(-65,_,a+1)
:
__((*a == '/') + t, _, a + 1 )
:
0<t?
__ ( 2, 2 , "%s")
:*a=='/'||
__(0,
__(-61,*a,
"!ek;dc i@bK'(q)-[w]*\
%n+r3#l,{}:\nuwloca-O\
;m .vpbks,fxntdCeghiry")
,a+1);}
Beat that. This evokes a slight whinge in DEC C's strictest ANSI mode, because
main takes more than two parameters, but compile, link, run and enjoy !
exit) where people have been lazy and not included <stdlib.h> and <stdio.h>. You
can always turn informational warnings off, but this is a bit suspect because
you then don't get alerted to other problems that might be important. I usually
just ensure that every bit of code has
#include <stdlib.h>
#include <stdio.h>
at the top. Make sure that all exit()s are exit(EXIT_SUCCESS) or
exit(EXIT_FAILURE). Avoid "magic number" exit codes. Unix programmers often
think that everyone understands that exit(0) means no problems, and often don't
use the ANSI exit(EXIT_SUCCESS) which will be correct for any O.S. If you
specifically want to return a VMS status (other than normal successful
completion) then use #ifdef __VMS for the VMS specific part where possible. In a
nutshell, make sure that you write ANSI C. All the programs that I have
converted to ANSI C have still worked under VAX C, provided I use #pragma in
certain places (see later). I prefer #pragma solutions to just adding global
qualifiers, because they are closer to the machine-specific code, and flag the
fact that you are doing something special.
main programs in one library and hence want to use the MAIN_PROGRAM macro to
show that myprog(...) is really main(...). The use of MAIN_PROGRAM is
nonportable, and hence the compiler will whinge. Similarly you might be using
variant_unions for, say, VMS item lists, which are so VMS specific that you are
happy to use the nonportable extension in that (limited) portion of code because
it is convenient. To stop complaints about this sort of thing, I do the
following (which works with VAX C and DEC C) ...
#ifdef __VMS
# ifdef __DECC
# pragma message save
# pragma message disable portable
# endif
int myprog(int argc, char *argv[])
MAIN_PROGRAM /* VMS specific macro to identify main() */
# ifdef __DECC
# pragma message restore
# endif
#else
int main(int argc, char *argv[]) /* Standard C version */
#endif
Here is an example of a VMS item list for system service calls ...
#ifdef __DECC
# pragma message save
# pragma message disable portable
#endif
/* VMS Item List structure */
struct item_list {
variant_union {
variant_struct {
short int w_buflen;
short int w_code;
long int *l_bufptr;
long int *l_retlenptr;
} list_structure;
int end_list;
} whole_list;
};
#ifdef __DECC
# pragma message restore
#endif
Obviously, completely nonportable constructs should be used sparingly, because although we have turned the error off, they are still nonstandard !
/* Following structures MUST be packed tightly ie. no member alignment */
#ifdef __DECC
# pragma member_alignment save /* Push current default alignment on to 'stack'*/
# pragma nomember_alignment
#endif
struct my_struct_s my_struct;
struct other_struct_s other_struct;
#ifdef __DECC
# pragma member_alignment restore /* Restore previous default alignment */
#endif
Don't be tempted to force nomember_alignment for everything. Restrict it to
individual structures as shown above. Imposing it globally is likely to cause
performance penalties, or mysterious crashes if you call routines that expect
natural alignment.
/* FORTRAN common blocks are extern to everything -----------------------*/
#ifdef __DECC
# pragma extern_model save
#endif
#ifdef __DECC
# ifdef __ALPHA
/* Default if not overridden by options file is NOSHR for Alpha Fortran common*/
# pragma extern_model common_block noshr
# else
/* Default if not overridden by options file is SHR for VAX Fortran common */
# pragma extern_model common_block shr
# endif
#endif
extern struct {
int a; /* These names don't matter, type/size must be correct */
int b;
int c;
char string[40];
} COMBLK; /* Case matters if linker is case sensitive */
#ifdef __DECC
# pragma extern_model restore
#endif
Then if your Fortran common block looked like this ...
*
* Common:
*
INTEGER A
INTEGER B
INTEGER C
CHARACTER*40 STRING
COMMON /COMBLK/ A,B,C,STRING
... your C function would access the members thus
void SET_FORTRAN_COMMON(void)
{
COMBLK.a = 1;
COMBLK.b = 2;
COMBLK.c = 3;
sprintf(COMBLK.string,"Hello World");
}
#ifdef __DECC
# pragma extern_model common_block noshr
#endif
.
extern struct {struct patsec patsec;} ZZZZ_PATSEC;
.
#ifdef __DECC
# pragma extern_model restore
#endif
Then you can refer to it like this ...
num = ZZZZ_PATSEC.patsec.ckp.apt[ndx-1].patrol;
Here the common block _containing_ the patsec structure is called ZZZZ_PATSEC, but we still need to specify .patsec, even though the common block in this case contains only one item (the previous example had .a, .b, .c and .string).
struct dsc$descriptor_s
{
unsigned short dsc$w_length; /* length of data item in bytes,
or if dsc$b_dtype is DSC$K_DTYPE_V, bits,
or if dsc$b_dtype is DSC$K_DTYPE_P,
digits (4 bits each) */
unsigned char dsc$b_dtype; /* data type code */
unsigned char dsc$b_class; /* descriptor class code = DSC$K_CLASS_S */
char *dsc$a_pointer; /* address of first byte of data storage */
};
It is often convenient to use the $DESCRIPTOR macro defined in <descrip.h> to
set up a descriptor. This expects the address of the string to be _constant_ so
you must declare the string referred to by the descriptor as static to avoid
compilation warnings. Example:
static char user[16];
$DESCRIPTOR( user_dsc, user );
.
istatus = CALL_FORTRAN_ROUTINE( &user_dsc );
.
Be aware that the $DESCRIPTOR macro will set the string length to be one less
than sizeof(string) to allow for the fact that C may have a terminating null
character. Hence, in the example above, CALL_FORTRAN_ROUTINE will think it has
been passed a CHARACTER*15 string.
#ifdef __DECC
# pragma extern_model save
# pragma extern_model globalvalue
#else
# ifdef VAXC
# define extern globalvalue
# endif
#endif
/* Put _declarations_ of message parameters here */
extern cars__erropen, cars__errconn, cars__errread, cars__errwrite;
#ifdef __DECC
# pragma extern_model restore
#else
# ifdef VAXC
# undef extern
# endif
#endif
If you wanted to use the same codes on a different machine, you could then define the integer error code values in a separate .c file and link against the error code object.