A Guide To Tips and Tricks For C Programming
by Jim Hall
We are Opensource.com
Table of Contents
5 common bugs in C programming and how to fix them
Write a guessing game in ncurses
Position text with ncurses
Write a chess game using bit-fields and masks
Short option parsing using getopt
Learn how file input and output works in C
Learn C by writing a simple game
Parsing data with strtok in C
Programming on FreeDOS: Print a Halloween greeting with ASCII art
Get started programming with DOS conio
How to program in C on FreeDOS
5 common bugs in C programming and how to fix them
By Jim Hall
Even the best programmers can create programming bugs. Depending on what your program
does, these bugs could introduce security vulnerabilities, cause the program to crash, or
create unexpected behavior.
The C programming language sometimes gets a bad reputation because it is not memory safe
like more recent programming languages, including Rust. But with a little extra code, you can
avoid the most common and most serious C programming bugs. Here are five bugs that can
break your application and how you can avoid them:
1. Uninitialized variables
When the program starts up, the system assigns it a block of memory that the program uses to store data. That means your local variables will get whatever random value was already in that memory.
Some environments intentionally "zero out" the memory as the program starts up, so every variable starts with a zero value. It can be tempting to assume in your programs that all variables will begin at zero. However, the C programming specification does not promise this. Only global and static variables are guaranteed to start at zero. A local variable inside a function has an indeterminate value until you assign one.
Consider a sample program that uses a few variables and two arrays:
#include <stdio.h>
#include <stdlib.h>
int
main()
{
The program does not initialize the variables, so they start with whatever values the system had in memory at the time. When I compile and run this program on my Linux system, some variables happen to have "zero" values, but others do not:
Compiling the same program on a different system further shows the danger in uninitialized
variables. Don't assume "all the world runs Linux" because one day, your program might run on
a different platform. For example, here's the same program running on FreeDOS:
Always initialize your program's variables. If you assume a variable will start with a zero value,
add the extra code to assign zero to the variable. This extra bit of typing upfront will save you
headaches and debugging later on.
2. Going outside of array bounds
In C, arrays start at element zero, so an array that is five elements long uses the elements 0, 1, 2, 3, and 4.
Some programmers forget this and introduce "off by one" bugs where they reference the array starting at one. In an array that is five elements long, the value the programmer intended to find at array element "5" is not actually the fifth element of the array (that is element 4). Instead, it is some other value in memory, not associated with the array at all.
Here's an example that goes well outside the array bounds. The program starts with an array
that's only five elements long but references array elements from outside that range:
Note that the program initializes all the values of the array, from 0 to 4, but then tries to read
0 to 9 instead of 0 to 4. The first five values are correct, but after that you don’t know what
the values will be:
numbers[0] = 0
numbers[1] = 1
numbers[2] = 2
When referencing an array, always keep track of its size. Store that size in a variable or constant instead of hard-coding the number throughout your program. Otherwise, if you later update the program to use a different array size but forget to change one of the hard-coded lengths, your program might stray outside the array bounds.
3. Overflowing a string
Strings are just arrays of a different kind. In the C programming language, a string is an array of char values, with a null character ('\0', the value zero) to indicate the end of the string.
And so, like arrays, you need to avoid going outside the range of the string. This is sometimes
called overflowing a string.
One easy way to overflow a string is to read data with the gets function. The gets function is very dangerous because it doesn't know how much data it can store in a string, and it naively reads data from the user. This is fine if your user enters short strings like foo, but it can be disastrous when the user enters a value that is too long for your string variable. The function is so unsafe that the C11 standard removed gets from the language.
Here's a sample program that reads a city name using the gets function. In this program, I've
also added a few unused variables to show how string overflow can affect other data:
#include <stdio.h>
#include <string.h>
int
That program works fine when you test for similarly short city names, like Chicago in Illinois
or Raleigh in North Carolina:
var1 = 1; var2 = 2
Where do you live?
Raleigh
<Raleigh> is length 7
var1 = 1; var2 = 2
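Ok
But the Welsh village of Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch has one of the longest place names in the world. Entering that name overflows the string: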
var1 = 1; var2 = 2
Where do you live?
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
<Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch> is length 58
var1 = 2036821625; var2 = 2003266668
Ok
Segmentation fault (core dumped)
Before aborting, the program used the long string to overwrite other parts of memory. Note
that var1 and var2 no longer have their starting values of 1 and 2.
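4. Freeing memory twice
When you allocate memory with the malloc function, you release it with the free function once you no longer need it. However, you should call free only once on that memory. Calling free a second time results in undefined behavior that will probably break your program. Here's a short example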
program to show that. It allocates memory, then immediately releases it. But like a forgetful-
but-methodical programmer, I also freed the memory at the end of the program, resulting in
freeing the same memory twice:
#include <stdio.h>
#include <stdlib.h>

int
main()
{
    int *array;

    puts("malloc an array ...");
    array = malloc(sizeof(int) * 5);

    if (array) {
        puts("malloc succeeded");
        puts("Free the array...");
        free(array);
    }

    puts("Free the array...");
    free(array);

    puts("Ok");
}
Running this program causes a dramatic failure on the second use of the free function:
Avoid calling free more than once on an array or string. One way to avoid freeing memory twice is to call malloc and free in the same function.
For example, a solitaire program might allocate memory for a deck of cards in the main
function, then use that deck in other functions to play the game. Free the memory in the main
function, rather than some other function. Keeping the malloc and free statements
together helps to avoid freeing memory more than once.
5. Invalid file pointers
Files are a common way for programs to get their data, so the ability to read data from a file is important for pretty much all programs. But what if the file you want to read isn't there?
To read a file in C, you first open the file using the fopen function, which returns a stream
pointer to the file. You can use this pointer with other functions to read data, such as fgetc
to read the file one character at a time.
If the file you want to read isn't there or isn't readable by your program, then the fopen
function will return NULL as the file pointer, which is an indication the file pointer is invalid. But
here's a sample program that innocently does not check if fopen returned NULL and tries to
read the file regardless:
#include <stdio.h>

int
main()
{
    FILE *pfile;
    int ch;

    puts("Open the FILE.TXT file ...");
    pfile = fopen("FILE.TXT", "r");

    /* you should check if the file pointer is valid, but we skipped that */

    puts("Now display the contents of FILE.TXT ...");
    while ((ch = fgetc(pfile)) != EOF) {
        printf("<%c>", ch);
    }

    fclose(pfile);
    puts("Ok");
    return 0;
}
When you run this program, the first call to fgetc results in a spectacular failure, and the
program immediately aborts:
Always check the file pointer to ensure it's valid. For example, after calling fopen to open a
file, check the pointer's value with something like if (pfile != NULL) to ensure that the
pointer is something you can use.
We all make mistakes, and programming bugs happen to the best of programmers. But if you
follow these guidelines and add a little extra code to check for these five types of bugs, you
can avoid the most serious C programming mistakes. A few lines of code up front to catch
these errors may save you hours of debugging later.
Write a guessing game in ncurses
By Jim Hall
In the article "Position text with ncurses," I gave a brief introduction to using the ncurses library to write text-mode
interactive applications in C. With ncurses, we can control where and how text gets displayed
on the terminal. If you explore the ncurses library functions by reading the manual pages,
you’ll find there are a ton of different ways to display text, including bold text, colors, blinking
text, windows, borders, graphic characters, and other features to make your application stand
out.
If you’d like to explore a more advanced program that demonstrates a few of these interesting
features, here’s a simple “guess the number” game, updated to use ncurses. The program
picks a random number in a range, then asks the user to make repeated guesses until they
find the secret number. As the user makes their guess, the program lets them know if the
guess was too low or too high.
Note that this program limits the secret number to the range 0 to 7. Keeping the values to single-digit numbers makes it easier to use getch() to read the user's guess as a single keypress. I also used the getrandom kernel system call to generate random bits, masked with the number 7 to pick a random number from 0 (binary 0000) to 7 (binary 0111).
#include <curses.h>
#include <string.h>     /* for strlen */
#include <sys/random.h> /* for getrandom */

int
random0_7()
{
    int num = 0;        /* known starting value in case getrandom fails */
    getrandom(&num, sizeof(int), GRND_NONBLOCK);
    return (num & 7);   /* from 0000 to 0111 */
}
int
read_guess()
{
By using ncurses, we can add some visual interest. Let’s add functions to display important
text at the top of the screen and a message line to display status information at the bottom of
the screen.
void
print_header(const char *text)
{
    move(0, 0);
    clrtoeol();
    attron(A_BOLD);
    mvaddstr(0, (COLS / 2) - (strlen(text) / 2), text);
    attroff(A_BOLD);
    refresh();
}

void
print_status(const char *text)
{
    move(LINES - 1, 0);
    clrtoeol();
    attron(A_REVERSE);
    mvaddstr(LINES - 1, 0, text);
    attroff(A_REVERSE);
    refresh();
}
With these functions, we can construct the main part of our number-guessing game. First, the
program sets up the terminal for ncurses, then picks a random number from 0 to 7. After
displaying a number scale, the program then enters a loop to ask the user for their guess.
As the user makes their guess, the program provides visual feedback. If the guess is too low,
the program prints a left square bracket under the number on the screen. If the guess is too
high, the game prints a right square bracket. This helps the user to narrow their choice until
they guess the correct number.
int
main()
{
int number, guess;
Copy this program and compile it for yourself to try it out. Don’t forget that you need to tell
GCC to link with the ncurses library:
I’ve left the debugging line in there, so you can see the secret number near the upper-right
corner of the screen:
Position text with ncurses
By Jim Hall
Most Linux utilities just scroll text from the bottom of the screen. But what if you wanted to
position text on the screen, such as for a game or a data display? That's where ncurses
comes in.
curses is an old Unix library that supports cursor control on a text terminal screen. The name
curses comes from the term cursor control. Years later, others wrote an improved version of
curses to add new features, called new curses or ncurses. You can find ncurses in every
modern Linux distribution, although the development libraries, header files, and
documentation may not be installed by default. For example, on Fedora, you will need to
install the ncurses-devel package with this command:
The ncurses functions are defined in the curses.h header file, which you'll need to include in your program with:
#include <curses.h>
After initializing the terminal with the initscr() function, you're free to use any of the ncurses functions, some of which we'll explore in a sample program.
For example, if you wanted to move the cursor to line 10 and column 30, you could use the
move function with those coordinates:
move(10, 30);
Any text you display after that will start at that screen location. To display a single character,
use the addch(c) function with a single character. To display a string, use addstr(s) with
your string. For formatted output that's similar to printf, use printw(fmt, …) with the usual
options.
Moving to a screen location and displaying text is such a common thing that ncurses provides
a shortcut to do both at once. The mvaddch(row, col, c) function will display a character at
screen location row,col. And the mvaddstr(row, col, s) function will display a string at that
location. For example, using mvaddstr(10, 30, "Welcome to ncurses"); in a
program will display the text "Welcome to ncurses" starting at row 10 and column 30. And the
line mvaddch(0, 0, '+'); will display a single plus sign in the upper-left corner at row 0 and
column 0.
Drawing text to the terminal screen can have a performance impact on certain systems,
especially on older hardware terminals. So ncurses lets you "stack up" a bunch of text to
display to the screen, then use the refresh() function to make all of those changes visible to
the user.
#include <curses.h>
int
main()
The program starts by initializing the terminal, then prints a plus sign in the upper-left corner,
a minus in the lower-left corner, and the text "press any key to quit" at row 10 and column 30.
The program gets a single character from the keyboard using the getch() function, then uses
endwin() to reset the terminal before the program exits completely.
getch() is a useful function that you could use for many things. I often use it as a way to
pause before I quit the program. And as with most ncurses functions, there's also a version of
getch() called mvgetch(row, col) to move to screen position row,col before waiting for a
character.
Running the new program will print a simple "press any key to quit" message that's more or
less centered on the screen:
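You can learn more about any ncurses function from its manual page. For example, to read about printw and its related functions:
$ man 3x curs_printw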
or just:
$ man curs_printw
With ncurses, you can create more interesting programs. By printing text at specific locations
on the screen, you can create games and advanced utilities to run in the terminal.
Write a chess game using bit-fields and masks
By Jim Hall
Let's say you were writing a chess game in C. One way to track the pieces on the board is to define a structure that describes a piece and its color, then make every square on the board an element of that structure. For example, you might have a structure that looks like this:
struct chess_pc {
    int piece;
    int is_black;
};
With this programming structure, your program will know what piece is in every square and its
color. You can quickly identify if the piece is a pawn, rook, knight, bishop, queen, or king—and
if the piece is black or white. But there's a more straightforward way to track the same
information while using less data and memory. Rather than storing a structure of two int
values for every square on a chessboard, we can store a single int value and use binary bit-
fields and masks to identify the pieces and color in each square.
The board holds six kinds of pieces plus empty squares, which we can number from 0 to 6. To represent all of those values, we only need the three bits that represent (from right to left) the values 1, 2, and 4. For example, the number 6 is binary 110. All of the other bits in the binary representation of 6 are zeroes.
And with a bit of cleverness, we can use one of those extra always-zero bits to track if a piece
is black or white. We can use the number 8 (binary 00001000) to indicate if a piece is black. If
this bit is 1, it's black; if it's 0, it's white. That's called a bit-field, which we can pull out later
using a binary mask.
/* game pieces */
#define EMPTY 0
#define PAWN 1
#define ROOK 2
#define KNIGHT 3
#define BISHOP 4
#define QUEEN 5
#define KING 6
/* piece color (bit-field) */
#define BLACK 8
#define WHITE 0
/* piece only (mask) */
#define PIECE 7
When you assign a value to a square, such as when initializing the chessboard, you can assign
a single int value to track both the piece and its color. For example, to store a black rook in
position 0,0 of an array, you would use this code:
int board[8][8];
..
board[0][0] = BLACK | ROOK;
This stores the value 10 because the binary OR of BLACK (8) and ROOK (2) is 10:
   00001000 = 8
OR 00000010 = 2
   ________
   00001010 = 10
Similarly, to store a white pawn in position 6,0 of the array, you could use this:
board[6][0] = WHITE | PAWN;
This stores the value 1 because the binary OR of WHITE (0) and PAWN (1) is just 1:
   00000000 = 0
OR 00000001 = 1
   ________
   00000001 = 1
For example, the program might need to know the contents of a specific square on the board
during the chess game, such as the array element at board[5][3]. What piece is there, and
is it black or white? To identify the chess piece, combine the element's value with the PIECE
mask using the binary AND:
int board[8][8];
int piece;
..
piece = board[5][3] & PIECE;
The binary AND operator (&) combines two binary values so that for any bit position, if that
bit in both numbers is 1, then the result is also 1. For example, if the value of board[5][3] is 11
(binary 00001011), then the binary AND of 11 and the mask PIECE (7, or binary 00000111) is
binary 00000011, or 3. This is a knight, which also has the value 3.
Separating the piece's color is a simple matter of using binary AND with the value and the
BLACK bit-field. For example, you might write this as a function called is_black to
determine if a piece is either black or white:
int
is_black(int piece)
{
return (piece & BLACK);
}
This works because the value BLACK is 8, or binary 00001000. And in the C programming
language, any non-zero value is treated as True, and zero is always False. So
is_black(board[5][3]) will return a True value (8) if the piece in array element 5,3 is
black and will return a False value (0) if it is white.
Bit fields
Using bit-fields and masks is a common method to combine data without using structures.
They are worth adding to your programmer's "tool kit." While data structures are a valuable
tool for organized programming where you need to track related data, using separate elements
to track single On or Off values (such as the colors of chess pieces) is less efficient. In these
cases, consider using bit-fields and masks to combine your data more efficiently.
Short option parsing using getopt
By Jim Hall
Writing a C program to process files is easy when you already know what files you'll operate on
and what actions to take. If you "hard code" the filename into your program, or if your program
is coded to do things only one way, then your program will always know what to do.
But you can make your program much more flexible if it can respond to the user every time
the program runs. Let your user tell your program what files to use or how to do things
differently. And for that, you need to read the command line.
int main()
That's the simplest way to start a C program. But if you add these standard parameters in the parentheses, your program can read the options given to it on the command line:
int main(int argc, char **argv)
The argc variable is the argument count, or the number of arguments on the command line. This will normally be a number that's at least one.
The argv variable is a double pointer, an array of strings, that contains the arguments from the command line. The first entry in the array, argv[0], is usually the name of the program. The other elements of the argv array contain the rest of the command-line arguments.
I'll write a simple program to echo back the options given to it on the command line. This is
similar to the Linux echo command, except it also prints the name of the program. It also
prints each command-line option on its own line using the puts function:
Compile this program and run it with some command-line options, and you'll see your
command line printed back to you, each item on its own line:
This command line sets the program's argc to 8, and the argv array contains eight entries: the name of the program, plus the seven words the user entered. And as always in C programs, the array starts at zero, so the elements are numbered 0, 1, 2, 3, 4, 5, 6, 7. That's why you can process the command line with a for loop that uses the comparison i < argc.
You can use this to write your own versions of the Linux cat or cp commands. The cat
command's basic functionality displays the contents of one or more files. Here's a simple
version of cat that reads the filenames from the command line:
#include <stdio.h>
void
copyfile(FILE *in, FILE *out)
{
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        fputc(ch, out);
    }
}
int
This simple version of cat reads a list of filenames from the command line and displays the
contents of each file to the standard output, one character at a time. For example, if I have
one file called hello.txt that contains a few lines of text, I can display its contents with my
own cat command:
$ ./cat hello.txt
Hi there!
This is a sample text file.
Using this sample program as a starting point, you can write your own versions of other Linux commands, such as the cp program, by reading only two filenames: one file to read from and another file to write the copy to.
Many commands also accept short options, such as -E or -n, that change how the program behaves. Fortunately, there's an easy way to read these from the command line. All Linux and Unix systems include a C library function called getopt, declared in the unistd.h header file. You can use getopt in your program to read these short options.
Unlike getopt on other Unix systems, the GNU getopt on Linux by default ensures your short options appear at the front of your command line (unless the POSIXLY_CORRECT environment variable is set). For example, say a user types cat -E file -n. The -E option is upfront, but the -n option is after the filename. If you use Linux getopt, your program behaves as though the user typed cat -E -n file. That makes processing a breeze because getopt can parse the short options, leaving you a list of filenames on the command line that your program can read using the argv array.
#include <unistd.h>
int getopt(int argc, char *const argv[], const char *optstring);
The option string optstring contains a list of the valid option characters. If your program
only allows the -E and -n options, you use "En" as your option string.
You usually use getopt in a loop to parse the command line for options. At each getopt call, the function returns the next short option it finds on the command line, or the value '?' for any unrecognized short option. When getopt can't find any more short options, it returns -1 and sets the global variable optind to the index of the next element in argv after all the short options.
Let's look at a simple example. This demo program isn't a full replacement of cat with all the
options, but it can parse its command line. Every time it finds a valid command-line option, it
prints a short message to indicate it was found. In your own programs, you might instead set a
variable or take some other action that responds to that command-line option:
#include <stdio.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
int i;
int option;
If you compile this program as args, you can try out different command lines to see how they
parse the short options and always leave you with the rest of the command line. In the
simplest case, with all the options up front, you get this:
Now try the same command line but combine the two short options into a single option string:
If necessary, getopt can "reorder" the command line to deal with short options that are out
of order:
If you're looking for gentle reminders on the syntax and structure of getopt() and
getopt_long(), download my getopt cheat sheet. One page demonstrates short options,
and the other side demonstrates long options with minimum viable code and a listing of the
global variables you need to know.
Learn how file input and output works in C
By Jim Hall
If you want to learn input and output in C, start by looking at the stdio.h include file. As you
might guess from the name, that file defines all the standard ("std") input and output ("io")
functions.
The first stdio.h function that most people learn is the printf function to print formatted output, or the puts function to print a simple string. Those are great functions to print information to the user, but if you want to do more than that, you'll need to explore other functions.
You can learn about some of these functions and methods by writing a replica of a common
Linux command. The cp command will copy one file to another. If you look at the cp man
page, you'll see that cp supports a broad set of command-line parameters and options. But in
the simplest case, cp supports copying one file to another:
cp infile outfile
You can write your own version of this cp command in C by using only a few basic functions to
read and write files.
Writing the cp command requires accessing files. In C, you open a file using the fopen
function, which takes two arguments: the name of the file and the mode you want to use. The
mode is usually r to read from a file or w to write to a file. The mode supports other options
too, but for this tutorial, just focus on reading and writing.
Copying one file to another then becomes a matter of opening the source and destination
files, then reading one character at a time from the first file, then writing that character to the
second file. The fgetc function returns either the single character read from the input file or
the end of file (EOF) marker when the file is done. Once you've read EOF, you've finished
copying and you can close both files. That code looks like this:
do {
    ch = fgetc(infile);
    if (ch != EOF) {
        fputc(ch, outfile);
    }
} while (ch != EOF);
You can write your own cp program with this loop to read and write one character at a time by
using the fgetc and fputc functions. The cp.c source code looks like this:
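#include <stdio.h>

int
main(int argc, char **argv)
{
    FILE *infile;
    FILE *outfile;
    int ch;

    /* parse the command line */
    /* usage: cp infile outfile */
    if (argc != 3) {
        fprintf(stderr, "Incorrect usage\n");
        fprintf(stderr, "Usage: cp infile outfile\n");
        return 1;
    }

    /* open the input file */
    infile = fopen(argv[1], "r");
    if (infile == NULL) {
        fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
        return 2;
    }

    /* open the output file */
    outfile = fopen(argv[2], "w");
    if (outfile == NULL) {
        fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
        fclose(infile);
        return 3;
    }

    /* copy one character at a time */
    do {
        ch = fgetc(infile);
        if (ch != EOF) {
            fputc(ch, outfile);
        }
    } while (ch != EOF);

    fclose(infile);
    fclose(outfile);
    return 0;
}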
And you can compile that cp.c file into a full executable using the GNU Compiler Collection
(GCC):
The -o cp option tells the compiler to save the compiled program into the cp program file. The -Wall option tells the compiler to turn on warnings for many common problems. If you don't see any warnings, the compiler didn't find any of those problems.
A better way to write this cp command is by reading a chunk of the input into memory (called
a buffer), then writing that collection of data to the second file. This is much faster because
the program can read more of the data at one time, which requires fewer "reads" from the file.
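To read data into a buffer, use the fread function. It takes four options:
* the array or memory buffer to read data into;
* the size of the smallest thing you want to read;
* how many of those things you want to read;
* the file to read from.
It returns the number of things it actually read, which can be fewer than you asked for near the end of the file:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
These options provide quite a bit of flexibility for more advanced file input and output, such as reading and writing files with a certain data structure. But in the simple case of reading data from one file and writing data to another file, you can use a buffer that is an array of characters.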
And you can write the buffer to another file using the fwrite function. This uses a similar set
of options to the fread function: the array or memory buffer to read data from, the size of
the smallest thing you need to write, how many of those things you need to write, and the file
to write to.
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
In the case where the program reads a file into a buffer, then writes that buffer to another file,
the array (ptr) can be an array of a fixed size. For example, you can use a char array called
buffer that is 200 characters long.
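With that assumption, you need to change the loop in your cp program to read data from a file into a buffer, then write that buffer to another file:
while ((buffer_length = fread(buffer, sizeof(char), 200, infile)) > 0) {
    fwrite(buffer, sizeof(char), buffer_length, outfile);
}
The loop stops when fread returns zero. Testing the return value of fread, rather than calling feof, also ends the loop if a read error occurs. A loop that waits for feof would never finish in that case.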
Here's the full source code to your updated cp program, which now uses a buffer to read and
write data:
#include <stdio.h>
int
main(int argc, char **argv)
{
FILE *infile;
FILE *outfile;
char buffer[200];
size_t buffer_length;
Since you want to compare this program to the other program, save this source code as
cp2.c. You can compile that updated program using GCC:
As before, the -o cp2 option tells the compiler to save the compiled program into the cp2 program file. The -Wall option tells the compiler to turn on warnings for many common problems. If you don't see any warnings, the compiler didn't find any of those problems.
I ran a runtime comparison using the Linux time command. This command runs another
program, then tells you how long that program took to complete. For my test, I wanted to see
the difference in time, so I copied a 628MB CD-ROM image file I had on my system.
I first copied the image file using the standard Linux cp command to see how long that takes. Running the Linux cp command first also loaded the file into Linux's built-in file cache. That way, the cache wouldn't give my own programs an unfair performance boost over the standard cp. The test with Linux cp took much less than one second to run:
Copying the same file using my own version of the cp command took significantly longer.
Reading and writing one character at a time took almost five seconds to copy the file:
Reading data from an input into a buffer and then writing that buffer to an output file is much
faster. Copying the file using this method took less than a second:
My demonstration cp program used a buffer that was 200 characters. I'm sure the program
would run much faster if I read more of the file into memory at once. But for this comparison,
you can already see the huge difference in performance, even with a small, 200-character buffer.
Learn C by writing a simple game
By Jim Hall
I taught myself about programming back in elementary school. My first programs were on the
Apple II, but eventually, I learned C by reading books and practicing. And the best way to
practice programming is to write sample programs that help exercise your new knowledge.
One program I like to write in a new language is a simple "guess the number" game. The
computer picks a random number from 1 to 100, and you have to figure it out by making
guesses. In another article, I showed how to write this "Guess the number" game in Bash, and
my fellow Opensource.com authors have written articles about how to write it in Java, Julia,
and other computer languages.
What's great about a "Guess the number" game is that it exercises several programming
concepts: how to use variables, how to compare values, how to print output, and how to read
input.
Over the summer, I recorded a video series to teach people how to write programs in the C
programming language. Since then, I've heard from many people who are learning C
programming by following it. So, I thought I'd follow up by writing a "Guess the number" game
in C.
#include <stdio.h>
#include <sys/random.h>
The randnum function uses the Linux system call getrandom to generate a series of random bits. You
can learn more about this system call on the man page, but note that getrandom fills the
variable with random zeroes and ones. That means the final value could be positive or
negative, so you need to do a test afterward to ensure the result of your randnum function is
a positive value.
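The body of randnum is elided in the listing below. Here is a sketch of what such a function might look like, assuming glibc 2.25 or later (which provides getrandom in sys/random.h). Returning -1 when getrandom fails is an assumption for illustration, not part of the original program.

```c
#include <sys/types.h>
#include <sys/random.h>

/* Pick a random number from 1 to maxval (a sketch of randnum). */
int
randnum(int maxval)
{
    int randval;
    int r;

    /* fill randval with random bits; getrandom returns the number of
       bytes written, or -1 on error */
    if (getrandom(&randval, sizeof randval, 0) != (ssize_t) sizeof randval) {
        return -1;  /* assumption: signal failure to the caller */
    }

    /* randval may be negative, so randval % maxval may be too */
    r = randval % maxval;
    if (r < 0) {
        r = -r;  /* safe: r is strictly between -maxval and maxval */
    }

    return r + 1;  /* shift from 0..maxval-1 to 1..maxval */
}
```

Taking the remainder before flipping the sign keeps the value between -(maxval - 1) and maxval - 1, so negating it can't overflow. Negating the raw random value could overflow if it happened to be INT_MIN.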
#include <stdio.h>
#include <sys/random.h>
int
randnum(int maxval)
{
...
}
int
main(void)
{
int number;
int guess;
number = randnum(100);
puts("Guess a number between 1 and 100");
do {
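if (scanf("%d", &guess) != 1) {
return 1; /* not a number, or end of input */
}
if (guess < number) {
puts("Too low");
}
else if (guess > number) {
puts("Too high");
}
} while (guess != number);
puts("That's right!");
return 0;
}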
The program starts by picking a random number between 1 and 100 using the randnum
function. After printing a prompt to the user, the program enters a do-while loop so the user
can guess the number.
In each iteration of the loop, the program tests the user's guess. If the user's guess is less than
the random number, the program prints "Too low," and if the guess is greater than the random
number, the program prints "Too high." The loop continues until the user's guess is the same
as the random number.
When the loop exits, the program prints "That's right!" and then immediately ends.
Try it out
This "guess the number" game is a great introductory program when learning a new
programming language because it exercises several common programming concepts in a
pretty straightforward way. By implementing this simple game in different programming
languages, you can demonstrate some core concepts and compare details in each language.
By Jim Hall
Some programs can just process an entire file at once, and other programs need to examine
the file line-by-line. In the latter case, you likely need to parse data in each line. Fortunately,
the C programming language has a standard C library function to do just that.
The strtok function breaks up a line of data according to "delimiters" that divide each field. It
provides a streamlined way to parse data from an input string.
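For example, suppose you want to parse this line of data, where semicolons separate the fields:
102*103;K1.2;K0.5
In this example, store that in a string variable. You might have read this string into memory
using any number of methods. Here's the line of code:
char string[] = "102*103;K1.2;K0.5";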
Once you have the line in a string, you can use strtok to pull out "tokens." Each token is part
of the string, up to the next delimiter. The basic call to strtok looks like this:
#include <string.h>
char *strtok(char *string, const char *delim);
The first call to strtok scans the string, replaces the first delimiter with a null (\0) character,
then returns a pointer to the first token. If the string is empty (or contains only delimiters),
strtok returns NULL.
#include <stdio.h>
#include <string.h>
This sample program pulls off the first token in the string, prints it, and exits. If you compile
this program and run it, you should see this output:
102*103
102*103 is the first part of the input string, up to the first semicolon. That's the first token in
the string.
Note that calling strtok modifies the string you are examining. If you want the original string
preserved, make a copy before using strtok.
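Modify the sample program to read the rest of the string as tokens. Use a while loop to call
strtok multiple times until it returns NULL. After the first call, pass NULL as the string
argument, which tells strtok to continue from where it left off in the same string.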
#include <stdio.h>
#include <string.h>
int
main()
{
char string[] = "102*103;K1.2;K0.5";
char *token;
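token = strtok(string, ";");
while (token != NULL) {
puts(token);
token = strtok(NULL, ";"); /* NULL continues with the same string */
}
return 0;
}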
By adding the while loop, you can parse the rest of the string, one token at a time. If you
compile and run this sample program, you should see each token printed on a separate line,
like this:
102*103
K1.2
K0.5
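Because strtok writes null characters into the string it scans, a function that must leave its caller's data intact can tokenize a copy instead, as the earlier note suggests. Here is a sketch, assuming lines fit in a fixed 256-byte buffer (count_tokens and MAXLINE are hypothetical names):

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 256  /* assumed maximum line length for this sketch */

/* Count the tokens in line, separated by any character in delim,
   without modifying line: strtok works on a local copy instead.
   Returns -1 if the line is too long for the buffer. */
int
count_tokens(const char *line, const char *delim)
{
    char copy[MAXLINE];
    char *token;
    int count = 0;

    if (strlen(line) >= sizeof copy) {
        return -1;
    }
    strcpy(copy, line);  /* safe: length checked above */

    /* first call gets the string, later calls pass NULL */
    for (token = strtok(copy, delim); token != NULL;
         token = strtok(NULL, delim)) {
        count++;
    }
    return count;
}
```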
For example, if you were reading CSV data (comma-separated values, such as data from a
spreadsheet), you might expect a list of four numbers to look like this:
1,2,3,4
But if the third "column" in the data was empty, the CSV might instead look like this:
1,2,,4
This is where you need to be careful with strtok. With strtok, multiple delimiters next to each
other are the same as a single delimiter. You can see this by modifying the sample program to
call strtok with a comma delimiter:
#include <stdio.h>
#include <string.h>
If you compile and run this new program, you'll see strtok interprets the ,, as a single
comma and parses the data as three numbers:
1
2
4
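If empty fields matter, strtok is the wrong tool. One alternative, swapped in here and not part of strtok itself, is to find each delimiter with the standard strchr function and split the string by hand. The split_fields function below is a hypothetical sketch that splits on a single delimiter character and keeps empty fields. Like strtok, it writes null characters into the string.

```c
#include <stdio.h>
#include <string.h>

/* Split str at each occurrence of the single character delim, keeping
   empty fields. Stores up to maxfields pointers in fields and returns
   the number of fields found. Like strtok, it writes '\0' into str. */
int
split_fields(char *str, char delim, char *fields[], int maxfields)
{
    int n = 0;
    char *next;

    while (n < maxfields) {
        fields[n++] = str;           /* this field starts here */
        next = strchr(str, delim);   /* find the end of the field */
        if (next == NULL) {
            break;                   /* last field runs to end of string */
        }
        *next = '\0';                /* terminate this field */
        str = next + 1;              /* next field starts after delim */
    }
    return n;
}
```

Called on a copy of 1,2,,4 with a comma delimiter, it returns four fields, and the third is an empty string.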
#include <stdio.h>
#include <string.h>
int
main()
{
char string[] = " hello \t world";
char *token;
token = strtok(string, " \t");
if (token == NULL) {
puts("empty string");
return 1;
}
Each call to strtok uses both a space and tab character as the delimiter string, allowing
strtok to parse the line correctly into two tokens.
Wrap up
The strtok function is a handy way to read and interpret data from strings. Use it in your next
project to simplify how you read data into your program.
By Jim Hall
Full-color ASCII art used to be quite popular on DOS, which could leverage the extended
ASCII character set and its collection of drawing elements. You can add a little visual interest
to your next FreeDOS program by adding ASCII art as a cool “welcome” screen or as a colorful
“exit” screen with more information about the program.
But this style of ASCII art isn’t limited just to FreeDOS applications. You can use the same
method in a Linux terminal-mode program. While Linux uses ncurses to control the screen
instead of DOS’s conio, the related concepts apply well to Linux programs. This article looks
at how to generate colorful ASCII art from a C program.
Here’s part of a sample ASCII art file, saved as C source code. Note that the code snippet
defines a few values: IMAGEDATA_WIDTH and IMAGEDATA_DEPTH define the number of
columns and rows on the screen. In this case, it’s an 80x25 ASCII art “image.”
IMAGEDATA_LENGTH defines the number of entries in the IMAGEDATA array. Each character
in the ASCII art screen can be represented by two bytes of data: The character to display and
a color attribute containing both the foreground and background colors for the character. For
an 80x25 screen, where each character is paired with an attribute, the array contains 4000
entries (that’s 80 * 25 * 2 = 4000).
To display this ASCII art to the screen, you need to write a small program to read the array and
print each character with the right colors.
Character mode systems like ncurses on Linux or conio on DOS can display only sixteen
colors. That’s sixteen possible text colors and eight background colors. Counting sixteen
values (from 0 to 15) in binary requires only four bits:
• 0000 is 0 in binary
• 1111 is 15 in binary
With color pairs, you can encode both the background and foreground colors in a single byte
of eight bits. That’s four bits for the text color (0 to 15 or 0 to F in hexadecimal) and three bits
for the background color (0 to 7, which is the same in hexadecimal). The leftover bit in the byte is not
used here, so we can ignore it.
To convert the color pair or attribute into color values that your program can use, you’ll need
to use a bit mask to specify only the bits used for the text color or background color. Using
the OpenWatcom C Compiler on FreeDOS, you can write this function to set the colors
appropriately from the color attribute:
void
textattr(int newattr)
{
_settextcolor(newattr & 15); /* 0000xxxx */
_setbkcolor((newattr >> 4) & 7); /* 0xxx0000 */
}
The _settextcolor function sets just the text color, and the _setbkcolor function sets
the background color. Both are defined in graph.h. Note that because the color attribute
includes both the background color and the foreground color in a single byte value, the
textattr function uses & (bitwise AND) with the mask 15 (binary 1111) to isolate only the
lowest four bits of the attribute. That's where the color pair stores the values 0 to 15 for the
foreground color.
To get the background color, the function first performs a bit shift to “push” the bits to the
right. This puts the “upper” bits into the “lower” bit range, so any bits like 0xxx0000 become
00000xxx instead. We can use another bit mask with 7 (binary 0111) to pick out the
background color value.
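The same masks and shifts can be checked in portable C, without graph.h. These hypothetical helper functions pull the two colors out of an attribute byte, and make_attr packs them back in, using the bit layout described above:

```c
/* Bit layout of the attribute byte, as described above:
   bits 0-3 = text color (0-15), bits 4-6 = background (0-7),
   bit 7 is ignored here. */
int
attr_fg(int attr)
{
    return attr & 15;          /* keep 0000xxxx */
}

int
attr_bg(int attr)
{
    return (attr >> 4) & 7;    /* move 0xxx0000 down to 00000xxx */
}

/* The reverse: build an attribute from a text and background color. */
int
make_attr(int fg, int bg)
{
    return ((bg & 7) << 4) | (fg & 15);
}
```

For instance, yellow (14) text on a red (4) background is the attribute 0x4E, binary 01001110.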
Let’s leave room at the bottom of the screen for a separate message or prompt to the user.
That means instead of displaying all 25 lines of an 80-column ASCII screen, I only want to
show the first 24 lines.
Inside the for loop, we need to set the colors, then print the character. The OpenWatcom C
Compiler provides a function _outtext to display text with the current color values.
However, this requires passing a string and would be inefficient if we need to process each
character one at a time, in case each character on a line requires a different color.
Instead, OpenWatcom has a similar function called _outmem that allows you to indicate how
many characters to display. For one character at a time, we can provide a pointer to a
character value in the IMAGEDATA array and tell _outmem to show just one character. That
will display the character using the current color attributes, which is what we need.
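for (pos = 0; pos < 3840; pos += 2) {
ch = &IMAGEDATA[pos]; /* pointer to the character */
attr = IMAGEDATA[pos + 1]; /* the attribute that follows it */
textattr(attr);
_outmem(ch, 1);
}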
This updated for loop sets the character ch by assigning a pointer into the IMAGEDATA
array. Next, the loop sets the text attributes, and then displays the character with _outmem.
#include <stdio.h>
#include <conio.h>
#include <graph.h>
#include "imgdata.inc"
void
textattr(int newattr)
{
_settextcolor(newattr & 15); /* 0000xxxx */
_setbkcolor((newattr >> 4) & 7); /* 0xxx0000 */
}
int
main()
{
char *ch;
int attr;
int pos;
if (_setvideomode(_TEXTC80) == 0) {
fputs("Error setting video mode", stderr);
return 1;
}
/* draw the array */
_settextposition(1, 1); /* top left */
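/* print one line less than the 80x25 that's in there:
80 x 24 x 2 = 3840 */
for (pos = 0; pos < 3840; pos += 2) {
ch = &IMAGEDATA[pos]; /* pointer assignment */
attr = IMAGEDATA[pos + 1];
textattr(attr);
_outmem(ch, 1);
}
getch(); /* wait for a key before exiting */
_setvideomode(_DEFAULTMODE);
return 0;
}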
Compile the program using the OpenWatcom C Compiler on FreeDOS, and you’ll get a new
program that displays this holiday message:
By Jim Hall
One of the reasons so many DOS applications sported a text user interface (or TUI) is that it
was so easy to do. For many C programmers, the standard way to control console input and
output (conio) was the conio library. This is a de facto standard library on DOS, which gained
popularity through Borland's proprietary C compiler as conio.h. You can also find a similar
conio implementation in TK Chia's IA-16 DOS port of the GNU C Compiler in the libi86 library
of non-standard routines. The library includes implementations of conio.h functions that mimic
Borland Turbo C++ to set video modes, display colored text, move the cursor, and so on.
#include <conio.h>
#include <graph.h>
int
main()
{
_setvideomode(_TEXTC80);
…
When you're done with your program and ready to exit back to DOS, you should restore the
video mode to whatever it was before. For that, you can use _DEFAULTMODE as the mode.
_setvideomode(_DEFAULTMODE);
return 0;
}
You can set both the text color and the color behind it. Use the _settextcolor function to
set the text "foreground" color and _setbkcolor to set the text "background" color. For
example, to set the colors to yellow text on a red background, you would use this pair of
functions:
_settextcolor(14);
_setbkcolor(4);
Positioning text
In conio, screen coordinates are always row,col and start with 1,1 in the upper-left corner. For
a standard 80-column display with 25 lines, the bottom-right corner is 25,80.
Use the _settextposition function to move the cursor to a specific screen coordinate,
then use _outtext to print the text you want to display. If you've set the colors, your text will
use the colors you last defined, regardless of what's already on the screen.
For example, to print the text "FreeDOS" at line 12 and column 36 (which is more or less
centered on the screen) use these two functions:
_settextposition(12, 36);
_outtext("FreeDOS");
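Column 36 is only approximately centered. To compute the starting column for any string, you could use a small helper like this one. center_col is a hypothetical name, and it assumes an 80-column screen with columns numbered from 1, as in conio:

```c
#include <string.h>

#define SCREEN_COLS 80  /* standard 80-column text mode */

/* Return the 1-based column where text should start so that it is
   centered on an 80-column line. */
int
center_col(const char *text)
{
    int len = (int) strlen(text);

    if (len >= SCREEN_COLS) {
        return 1;  /* too long to center; start at the left edge */
    }
    return (SCREEN_COLS - len) / 2 + 1;
}
```

With this helper, "FreeDOS" (seven characters) starts at column 37, leaving 36 blank columns on the left and 37 on the right. You could then call _settextposition(12, center_col("FreeDOS")).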
#include <conio.h>
#include <graph.h>
int
main()
{
_setvideomode(_TEXTC80);
_settextcolor(14);
_setbkcolor(4);
_settextposition(12, 36);
_outtext("FreeDOS");
getch();
_setvideomode(_DEFAULTMODE);
return 0;
}
A text window is just an area of the screen, defined as a rectangle starting at a particular
row,col and ending at a different row,col. These regions can take up the whole screen or be as
small as a single line. Once you define a window, you can clear it with a background color and
position text in it.
To define a text window starting at row 5 and column 10, and extending to row 15 and column
70, you use the _settextwindow function like this:
_settextwindow(5, 10, 15, 70);
Now that you've defined the window, any text you draw in it uses 1,1 as the upper-left corner of
the text window. Placing text at 1,1 will actually position that text at row 5 and column 10, where
the window starts on the screen.
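The library handles this translation for you, but the arithmetic is worth seeing once. This hypothetical function (not part of graph.h) converts a window-relative position into absolute screen coordinates:

```c
/* An absolute screen position, 1-based, as in conio. */
struct screenpos {
    int row;
    int col;
};

/* Convert a position inside a text window (1-based, relative to the
   window's top-left corner) into absolute screen coordinates. */
struct screenpos
window_to_screen(int win_top, int win_left, int row, int col)
{
    struct screenpos p;

    /* window position 1,1 is the window's own top-left corner */
    p.row = win_top + row - 1;
    p.col = win_left + col - 1;
    return p;
}
```

For the window from 5,10 to 15,70, the window position 11,61 is its bottom-right corner, which lands on screen coordinate 15,70.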
_setbkcolor(3);
_clearscreen(_GCLEARSCREEN);
_settextwindow(5, 10, 15, 70);
_setbkcolor(1);
_clearscreen(_GWINDOW);
This makes it really easy to fill in certain areas of the screen. In fact, defining a window and
filling it with color is such a common thing to do that I often create a function to do both at
once. Many of my conio programs include some variation of these two functions to clear the
screen or window:
#include <conio.h>
#include <graph.h>
void
clear_color(int fg, int bg)
{
_settextcolor(fg);
_setbkcolor(bg);
_clearscreen(_GCLEARSCREEN);
}
void
textwindow_color(int top, int left, int bottom, int right, int fg, int bg)
{
_settextwindow(top, left, bottom, right);
_settextcolor(fg);
_setbkcolor(bg);
_clearscreen(_GWINDOW);
}
A text window can be any size, even a single line. This is handy to define a title bar at the top
of the screen or a status line at the bottom of the screen. Again, I find this to be such a useful
addition to my programs that I'll frequently write functions to do it for me:
#include <conio.h>
#include <graph.h>
#include <string.h> /* for strlen */
These are the basics of many kinds of applications. Placing a text window towards the right of the
screen could be useful if you were writing a "monitor" program, such as part of a control
system, like this:
#include <conio.h>
#include <graph.h>
int
main()
{
_setvideomode(_TEXTC80);
clear_color(7, 1); /* white on blue */
_settextposition(2, 1);
Having already written our own window functions to do most of the repetitive work, this
program becomes very straightforward: clear the screen with a blue background, then print
"test" on the second line. There's a header line and a status line, but the interesting part is in
the middle where the program defines a text window near the right edge of the screen and
prints some sample text. The getch() function waits for the user to press a key on the
keyboard, useful when you need to wait until the user is ready:
#include <conio.h>
#include <graph.h>
int
main()
{
_setvideomode(_TEXTC80);
clear_color(7, 2); /* white on green */
_settextposition(2, 1);
_outtext("test");
print_header(14, 4, "SOLITAIRE"); /* br yellow on red */
textwindow_color(10, 10, 17, 22, 4, 7); /* red on white */
_settextposition(3, 2);
_outtext("hi mom");
print_status(7, 6, "press any key to quit..."); /* white on brown */
getch();
_setvideomode(_DEFAULTMODE);
return 0;
}
You could add other code to this sample program to print card values and suits, place cards on
top of other cards, and other functionality to create a complete game. But for this demo, we'll
just draw a single "card" displaying some text:
#include <conio.h>
#include <graph.h>
int
main()
{
_setvideomode(_TEXTC80);
clear_color(7, 1); /* white on blue */
_settextposition(2, 1);
_outtext("test");
print_header(15, 3, "PROGRAMMING IN CONIO"); /* br white on cyan */
textwindow_color(11, 36, 16, 46, 7, 0); /* shadow */
textwindow_color(10, 35, 15, 45, 7, 4); /* white on red */
_settextposition(3, 2);
_outtext("hi mom");
getch();
_setvideomode(_DEFAULTMODE);
return 0;
}
You often see this "shadow" effect used in DOS programs as a way to add some visual flair:
The DOS conio functions can do much more than I've shown here, but with this introduction
to conio programming, you can create various practical and exciting applications. Direct
screen access means your programs can be more interactive than a simple command-line
utility that scrolls text from the bottom of the screen. Leverage the flexibility of conio
programming and make your next DOS program a great one.
By Jim Hall
When I first started using DOS, I enjoyed writing games and other interesting programs using
BASIC, which DOS included. Much later, I learned the C programming language.
So it's probably not surprising that FreeDOS 1.3 RC4 includes a C compiler—along with other
programming languages. The FreeDOS 1.3 RC4 LiveCD includes two C compilers—Bruce's C
compiler (a simple C compiler) and the OpenWatcom C compiler. On the Bonus CD, you can
also find DJGPP (a 32-bit C compiler based on GNU GCC) and the IA-16 port of GCC
(requires a '386 or better CPU to compile, but the generated programs can run on low-end
systems).
1. You need to remain aware of how much memory you use. Linux allows programs
to use lots of memory, but FreeDOS is more limited. Thus, DOS programs used one of
four memory models (large, medium, compact, and small) depending on how much
memory they needed.
2. You can directly access the console. On Linux, you can create text-mode programs
that draw to the terminal screen using a library like ncurses. But DOS allows programs
to access the console and video hardware directly. This provides a great deal of
flexibility in writing more interesting programs.
I like to write my C programs in the IA-16 port of GCC, or OpenWatcom, depending on what
program I am working on. The OpenWatcom C compiler is easier to install since it's only a
single package. That's why we provide OpenWatcom on the FreeDOS LiveCD, so you can
DOS C programming
You can find documentation and library guides on the OpenWatcom project website to learn
all about the unique DOS C programming libraries provided by the OpenWatcom C compiler.
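To briefly describe a few of the most useful functions:
From conio.h:
• getch(): wait for the user to press a key, and return that key
From graph.h:
• _setvideomode(mode): set the video mode, such as _TEXTC80 for 80-column color text, or
_DEFAULTMODE to restore the original mode
• _settextcolor(color): set the text (foreground) color
• _setbkcolor(color): set the background color
• _settextposition(row, col): move the cursor to row,col
• _outtext(string): print a string at the cursor position, using the current colors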
DOS only supports sixteen text colors and eight background colors. You can use the values 0
(Black) to 15 (Bright White) to specify the text colors, and 0 (Black) to 7 (White) for the
background colors:
• 0—Black
• 1—Blue
• 2—Green
• 3—Cyan
• 4—Red
• 5—Magenta
• 6—Brown
• 7—White
• 8—Bright Black
• 9—Bright Blue
• 10—Bright Green
• 11—Bright Cyan
• 12—Bright Red
• 13—Bright Magenta
• 14—Yellow
• 15—Bright White
In this sample program, we'll iterate through each of the text colors, from 0 (Black) to 15 (Bright White).
As we print each line, we'll indent the next line by one space. When we're done, we'll wait for
the user to press any key, then we'll reset the screen and exit.
You can use any text editor to write your C source code. I like using a few different editors,
including FreeDOS Edit and Freemacs, but more recently I've been using the FED
Before you can compile using OpenWatcom, you'll need to set up the DOS environment
variables so OpenWatcom can find its support files. The OpenWatcom C compiler package
includes a setup batch file that does this for you, as \DEVEL\OW\OWSETENV.BAT. Run this
batch file to automatically set up your environment for OpenWatcom.
Once your environment is ready, you can use the OpenWatcom compiler to compile this "Hello
world" program. I've saved my C source file as TEST.C, so I can type WCL TEST.C to compile
and link the program into a DOS executable, called TEST.EXE. In the output messages from
OpenWatcom, you can see that WCL actually calls the OpenWatcom C Compiler (WCC) to
compile, and the OpenWatcom Linker (WLINK) to perform the object linking stage:
If you don't see any error messages when compiling the C source file, you can now run your
DOS program. This "Hello world" example is TEST.EXE. Enter TEST on the DOS command
line to run the new program, and you should see this very pretty output:
By Seth Kenlon
In 1972, Dennis Ritchie was at Bell Labs, where a few years earlier, he and his fellow team
members invented Unix. After creating an enduring OS (still in use today), he needed a good
way to program those Unix computers so that they could perform new tasks. It seems strange
now, but at the time, there were relatively few programming languages; Fortran, Lisp, Algol,
and B were popular but insufficient for what the Bell Labs researchers wanted to do.
Demonstrating a trait that would become known as a primary characteristic of programmers,
Dennis Ritchie created his own solution. He called it C, and nearly 50 years later, it's still in
widespread use.
First of all, C is a fairly minimal and straightforward language. It has few advanced
concepts beyond the basics of programming, largely because C is literally one of the
foundations of modern programming languages. For instance, C features arrays, but it doesn't
offer a dictionary (unless you write it yourself). When you learn C, you learn the building
blocks of programming that can help you recognize the improved and elaborate designs of
recent languages.
Finally, C is easy to get started with, especially if you're running Linux. You can already run C
code because Linux systems include the GNU C library (glibc). To write and build it, all you
need to do is install a compiler, open a text editor, and start coding.
On Windows, you can install a minimal set of GNU utilities, GCC included, with MinGW.
$ gcc --version
gcc (GCC) x.y.z
Copyright (C) 20XX Free Software Foundation, Inc.
In C, you create functions to carry out your desired task. When you run a C program, it
starts by executing the function named main.
#include <stdio.h>
int main() {
printf("Hello world");
return 0;
}
The first line includes a header file called stdio.h (standard input and output), which gives
your program access to existing, very low-level C code, such as printf, that you can reuse in
your own programs for free. A function called main is created and populated with a
rudimentary print statement. Save this text to a file called hello.c, then compile it with GCC:
$ gcc hello.c -o hello
$ ./hello
Hello world$
Return values
It's part of the Unix philosophy that a function "returns" something to you after it executes:
nothing upon success and something else (an error message, for example) upon failure.
These return codes are often represented with numbers (integers, to be precise): 0
represents nothing (success), and any nonzero value represents some non-successful state.
There's a good reason Unix and Linux are designed to expect silence upon success. It's so
that you can always plan for success by assuming no errors or warnings will get in your way
when executing a series of commands. Similarly, functions in C expect no errors by design.
#include <stdio.h>
int main() {
printf("Hello world");
return 1;
}
Compile it:
Now run it using a built-in Linux test for success. The && operator executes the second half of
a command only upon success. For example:
Now try your program, which does not return 0 upon success; it returns 1 instead:
The program executed successfully, yet did not trigger the second command.
You may also notice there's no string type. Unlike Python and Java and Lua and many others,
C doesn't have a string type and instead sees strings as an array of characters.
Here's some simple code that establishes a char array variable, and then prints it to your
screen using printf along with a short message:
#include <stdio.h>
int main() {
char var[6] = "hello";
printf("Your string is: %s\n",var);
return 0;
}
You may notice that this code sample allows six characters for a five-letter word. This is
because there's a hidden terminator at the end of the string, which takes up one byte in the
array. You can run the code by compiling and executing it:
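To see that hidden terminator, compare strlen, which counts the characters before the terminator, with sizeof, which counts every byte in the array. This short sketch uses the same six-byte array (greeting and show_sizes are hypothetical names):

```c
#include <stdio.h>
#include <string.h>

/* Five letters plus the hidden '\0' terminator need six bytes. */
char greeting[6] = "hello";

void
show_sizes(void)
{
    /* strlen stops at the terminator; sizeof counts the whole array */
    printf("strlen: %zu, sizeof: %zu\n", strlen(greeting), sizeof greeting);
}
```

Calling show_sizes prints strlen: 5, sizeof: 6. If you write char greeting[] = "hello"; instead, the compiler sizes the array for you, terminator included.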
Functions
As with other languages, C functions can take parameters. You can pass parameters from
one function to another by defining the type of data you want a function to accept:
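#include <stdio.h>
/* printmsg accepts a character array (a string) and prints it */
void printmsg(char a[]) {
printf("String is: %s\n", a);
}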
int main() {
char a[6] = "hello";
printmsg(a);
return 0;
}
The way this code sample breaks one function into two isn't very useful, but it demonstrates
that main runs by default and how to pass data between functions.
To make this example program more dynamic, you can include the string.h header file,
which provides functions to examine (as the name implies) strings. Try testing whether the
length of the string passed to the printmsg function is greater than 0 by using the strlen
function from the string.h file:
#include <stdio.h>
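#include <string.h>
/* print the string only if its length is greater than 0 */
void printmsg(char a[]) {
if (strlen(a) > 0) {
printf("String is: %s\n", a);
}
}
int main() {
char a[6] = "hello";
printmsg(a);
return 0;
}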
As implemented in this example, the condition is always true because the string provided is
always "hello," whose length is always greater than 0. The final touch to this
humble re-implementation of the echo command is to accept input from the user.
Command arguments
When a program is launched, its main function can receive two arguments: a count of how
many items are contained in the command (argc) and an array containing each item (argv).
To use them, declare main as int main(int argc, char *argv[]). For example, suppose you
issue this imaginary command:
$ foo -i bar
• argc = 3
• argv[0] = foo
• argv[1] = -i
• argv[2] = bar
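With argc and argv in hand, the echo re-implementation can print whatever the user typed after the command name. Here is one possible sketch. The echo_args function is hypothetical, and it writes to a FILE pointer so the same code can print to stdout or to a file:

```c
#include <stdio.h>

/* A minimal echo: print argv[1] through argv[argc - 1], separated by
   single spaces, followed by a newline. */
void
echo_args(int argc, char *argv[], FILE *out)
{
    int i;

    /* start at 1 to skip argv[0], the program's own name */
    for (i = 1; i < argc; i++) {
        if (i > 1) {
            fputc(' ', out);  /* space between arguments only */
        }
        fputs(argv[i], out);
    }
    fputc('\n', out);
}
```

Called from main as echo_args(argc, argv, stdout);, the imaginary foo -i bar command above would print -i bar.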
Imperative programming
C is an imperative programming language. It isn't object-oriented, and it has no class
structure. Using C can teach you a lot about how data is processed and how to better manage
the data you generate as your code runs. Use C enough, and you'll eventually be able to write
libraries that other languages, such as Python and Lua, can use.
To learn more about C, you need to use it. Look in /usr/include/ for useful C header files,
and see what small tasks you can do to make C useful to you.
Basics
Include header files first, then define your global variables, then write your program.
/* comment to describe the program */
#include <stdio.h>
/* definitions */
int main(int argc, char **argv) {
/* variable declarations */
/* program statements */
}
Variables
Variable names can contain uppercase or lowercase letters (A to Z, or a to z), or numbers
(0 to 9), or an underscore (_). Cannot start with a number.
• int: Integer values (-1, 0, 1, 2, …)
• char: Character values, such as letters
• float: Floating point numbers (0.0, 1.1, 4.5, or 3.141)
• double: Double precision numbers, like float but bigger
Functions
Indicate the function type and name followed by variables inside parentheses. Put your
function statements inside curly braces.
int celsius(int fahr) {
int cel;
cel = (fahr - 32) * 5 / 9;
return cel;
}
Allocate memory with malloc. Resize with realloc. Use free to release.
#include <stdlib.h> /* for malloc, realloc, free */
int *arr;
int *newarray;
arr = (int *) malloc(sizeof(int) * 10);
if (arr == NULL) {
/* fail */
}
newarray = (int *) realloc(arr, sizeof(int) * 20); /* example: grow to 20 ints */
if (newarray == NULL) {
/* fail */
}
arr = newarray;
free(arr);
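In practice, the realloc step usually appears inside a function that grows an array as data arrives. The sketch below, with hypothetical struct and function names, doubles the allocation whenever the array fills up. It assigns realloc's result to a temporary pointer first: if realloc fails, it returns NULL but leaves the original block allocated, so overwriting the only pointer would leak it.

```c
#include <stdlib.h>

/* A growable array of ints (hypothetical names). */
struct intarray {
    int *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
};

/* Append value, doubling the allocation when the array is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int
intarray_push(struct intarray *a, int value)
{
    if (a->len == a->cap) {
        size_t newcap = (a->cap == 0) ? 10 : a->cap * 2;
        /* realloc(NULL, n) behaves like malloc(n) */
        int *newdata = realloc(a->data, newcap * sizeof *newdata);
        if (newdata == NULL) {
            return -1;  /* a->data is still valid and still ours to free */
        }
        a->data = newdata;
        a->cap = newcap;
    }
    a->data[a->len++] = value;
    return 0;
}

/* Release the memory and reset the array to empty. */
void
intarray_free(struct intarray *a)
{
    free(a->data);
    a->data = NULL;
    a->len = a->cap = 0;
}
```

Start with struct intarray a = { NULL, 0, 0 }; and call intarray_free(&a) when you're done. Because realloc with a NULL pointer acts like malloc, the first push needs no special case.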
By Jim Hall
The C programming language will turn fifty years old in 2022. Yet despite its long history, C
remains one of the top "most-used" programming languages in many "popular programming
languages" surveys. For example, check out the TIOBE Index, which tracks the popularity of
different programming languages. Many Linux applications are written in C, such as the
GNOME desktop.
After a brief flirtation with a subset called EPL (by Doug McIlroy of Bell Labs), Multics turned
to BCPL, a much simpler and cleaner language designed and implemented by Martin Richards
of Cambridge, who I think was visiting MIT at the time. When Ken Thompson started working
on what became Unix, he created an even simpler language, based on BCPL, that he called B.
He implemented it for the PDP-7 used for the first proto-Unix system in 1969.
BCPL and B were both "typeless" languages; that is, they had only one data type, integer. The
DEC PDP-11, which arrived on the scene in about 1971 and was the computer for the first real
Unix implementation, supported several data types, notably 8-bit bytes as well as 16-bit
integers. For that, a language that also supported several data types was a better fit. That's
the origin of C.
C was originally used only on Unix, though after a while, there were also C compilers for other
machines and operating systems. Mostly it was used for system-programming applications,
which covered quite a spectrum of interesting areas, along with a lot of systems for managing
operations of AT&T's telephone network.
Arguably, the most interesting, memorable, and important C program was the Unix operating
system itself. The first version of Unix in 1971 was in PDP-11 assembly language, but by the
time of the fourth edition, around 1973, it was rewritten in C. That was truly crucial since it
meant that the operating system (and all its supporting software) could be ported to a
different kind of computer basically by recompiling everything. Not quite that simple in
practice, but not far off.
You co-authored The C Programming Language book with Dennis Ritchie. How did
that book come about, and how did you and Dennis collaborate on the book?
I had written a tutorial on Ken Thompson's B language to help people get started with it. I
upgraded that to a tutorial on C when it became available. And after a while, I twisted Dennis's
arm to write a C book with me. Basically, I wrote most of the tutorial material, except for the
system call chapter, and Dennis had already written the reference manual, which was
excellent. Then we worked back and forth to smooth out the tutorial parts; the reference
manual stayed pretty much the same since it was so well done from the beginning. The book
was formatted with the troff formatter, one of many tools on Unix, and I did most of the
formatting work.
When did C become a thing that other programmers outside of Bell Labs used for
their work?
I don't really remember well at this point, but I think C mostly followed along with Unix for the
first half dozen years or so. With the development of compilers for other operating systems, it
began to spread to other systems besides Unix. I don't recall when we realized that C and Unix
were having a real effect, but it must have been in the mid to late 1970s.
The primary reason in the early days was its association with Unix, which spread rapidly. If you
used Unix, you wrote in C. Later on, C spread to computers that might not necessarily run
Unix, though many did because of the portable C compiler that Steve Johnson wrote.
C remains a popular programming language today, some 50 years after its creation.
Why has C remained so popular?
I think C hit a sweet spot with efficiency and expressiveness. In earlier times, efficiency really
mattered since computers were slow and had limited memory compared to what we are used
to today. C was very efficient, in the sense that it could be compiled into efficient machine
code, and it was simple enough that it was easy to see how to compile it. At the same time, it
was very expressive, easy to write, and compact. No other language has hit that kind of spot
quite so well, at least in my humble but correct opinion.
How has the C programming language grown or changed over the years?
C has grown modestly, I guess, but I haven't paid much attention to the evolving C standards.
There are enough changes that code written in the 1980s needs a bit of work before it will
compile, but it's mostly related to being honest about types. Newer features like complex
numbers are perhaps useful, but not to me, so I can't make an informed comment.
Well, it's a good language for anything, but today, with lots of memory and processing power,
most programmers are well served by languages like Python that take care of memory
management and other more high-level constructs. C remains a good choice for lower levels
where squeezing cycles and bytes still matter.
C has influenced other programming languages, including C++, Java, Go, and Rust.
What are your thoughts on these other programming languages?
Almost every language is in some ways a reaction to its predecessors. To over-simplify a fair
amount, C++ adds mechanisms to control access to information, so it's better than C for
really large programs. Java is a reaction to the perceived complexity of C++. Go is a reaction
to the complexity of C++ and the restrictions of Java. Rust is an attempt to deal with memory
management issues in C (and presumably C++) while coming close to C's efficiency.
They all have real positive attributes, but somehow no one is ever quite satisfied, so there will
always be more languages that, in their turn, react to what has gone before. At the same time,
the older languages, for the most part, will remain around because they do their job well, and
Thanks to Brian for sharing this great history of the C programming language!
By Matthew Broberg
In the 1960s, Bell Labs in suburban New Jersey was one of the most innovative places of its
time. Jon Gertner, author of The Idea Factory, describes the culture of the time marked by
optimism and the excitement to solve tough problems. Instead of monetization pressures with
tight timelines, Bell Labs offered seemingly endless funding for wild ideas. It had a research
and development ethos that aligns well with today's open leadership principles. The results
were significant and prove that brilliance can come without the promise of VC funding or
an IPO.
The challenge back then was terminal sharing: finding a way for lots of people to access the
(very limited number of) available computers. Before there was a scalable answer for that, and
long before we had a shell like Bash, there was the Multics project. It was envisioned as an
operating system where hundreds or even thousands of developers could share time on the
same system. This was a dream of John McCarthy, creator of Lisp and the term artificial
intelligence (AI), as I recently explored.
Joy Lisi Rankin, author of A People's History of Computing in the United States, describes
what happened next. There was a lot of public interest in driving forward with Multics' vision of
more universally available timesharing. Academics, scientists, educators, and some in the
broader public were looking forward to this computer-powered future. Many advocated for
computing as a public utility, akin to electricity, and the push toward timesharing was a global
movement.
Up to that point, high-end mainframes topped out at 40-50 terminals per system. The
change of scale was ambitious and eventually failed, as Warren Toomey writes in IEEE
Spectrum:
Bell Labs pulled out of the Multics program in 1969. Multics wasn't going to happen.
Among the last holdouts from the Multics project were four men who felt passionately tied to
the project: Ken Thompson, Dennis Ritchie, Doug McIlroy, and J.F. Ossanna. These four
diehards continued to muse and scribble ideas on paper. Thompson and Ritchie developed a
game called Space Travel for the PDP-7 minicomputer. While they were working on that,
Thompson started implementing all those crazy hand-written ideas about filesystems they'd
developed among the wreckage of Multics.
That's worth emphasizing: Some of the original filesystem specifications were written by hand
and then programmed on what was effectively a toy compared to the systems they were using
to build Multics. Wikipedia's Ken Thompson page dives deeper into what came next:
"While writing Multics, Thompson created the Bon programming language. He also
created a video game called Space Travel. Later, Bell Labs withdrew from the
MULTICS project. In order to go on playing the game, Thompson found an old
PDP-7 machine and rewrote Space Travel on it. Eventually, the tools developed by
Thompson became the Unix operating system: Working on a PDP-7, a team of Bell
Labs researchers led by Thompson and Ritchie, and including Rudd Canaday,
developed a hierarchical file system, the concepts of computer processes and
device files, a command-line interpreter, pipes for easy inter-process
communication, and some small utility programs. In 1970, Brian Kernighan
suggested the name 'Unix,' in a pun on the name 'Multics.' After initial work on Unix,
Thompson decided that Unix needed a system programming language and created
B, a precursor to Ritchie's C."
As Warren Toomey documented in the IEEE Spectrum article mentioned above, Unix showed
promise in a way the Multics project never did. Once the team was won over and a lot more
programming was done, the pathway to Unix was paved.
main( ) {
    extrn a, b, c;
    putchar(a); putchar(b); putchar(c); putchar('!*n');
}

a 'hell';
b 'o, w';
c 'orld';
Even if you're not a programmer, it's clear that carving up strings four characters at a time
would be limiting. It's also worth noting that this text is considered the original "Hello World,"
from Brian Kernighan's 1972 tutorial, A Tutorial Introduction to the Language B (although that
claim is not definitive).
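For comparison, here is a minimal sketch of the same greeting in C. The function name hello() is just for illustration. A C string literal can be any length and is stored as a NUL-terminated array of char, so there is no need to carve the text into four-character words:

```c
#include <stdio.h>

/* Hypothetical helper: prints the classic greeting in one call.
 * The whole message is a single string literal instead of three
 * 4-character B words; printf() returns the number of characters written. */
int hello(void)
{
    return printf("hello, world!\n");
}
```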
Typelessness aside, B's assembly-language counterparts were still yielding programs faster
than was possible using the B compiler's threaded-code technique. So, from 1971 to 1973,
Ritchie modified B. He added a "character type" and built a new compiler so that it didn't have
to use threaded code anymore. After two years of work, B had become C.
"For many years, the de facto standard for C was the version supplied with the Unix
operating system. In the summer of 1983 a committee was established to create an
ANSI (American National Standards Institute) standard that would define the C
language. The standardization process took six years (much longer than anyone
reasonably expected)."
• Parts of all major operating systems are written in C, including macOS, Windows, Linux,
and Android.
• The world's most widely used databases, including DB2, MySQL, MS SQL, and PostgreSQL,
are written in C.
• The original implementations of many programming languages were written in C, including
Python, Go, Perl's core interpreter, and the R statistical language.
Decades after they started as scrappy outsiders, Thompson and Ritchie are praised as titans
of the programming world. They shared the 1983 Turing Award and, in 1998, received the
National Medal of Technology for their work on the C language and Unix.
But Doug McIlroy and J.F. Ossanna deserve their share of praise, too. All four of them are true
Command Line Heroes.
By Erik O'Shaughnessy
I know, Python and JavaScript are what the kids are writing all their crazy "apps" with these
days. But don't be so quick to dismiss C—it's a capable and concise language that has a lot to
offer. If you need speed, writing in C could be your answer. If you are looking for job security
and the opportunity to learn how to hunt down null pointer dereferences, C could also be your
answer! In this article, I'll explain how to structure a C file and write a C main function that
handles command line arguments like a champ.
Let's do this.
A C program starts with a main() function, usually kept in a file named main.c.
/* main.c */
int main(int argc, char *argv[]) {
}
$ gcc main.c
$ ./a.out -o foo -vv
$
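The main() function takes two arguments, traditionally called argc and argv, and returns a
signed integer. Most Unix environments expect programs to return 0 (zero) on success and a
nonzero value on failure; the EXIT_SUCCESS and EXIT_FAILURE macros from <stdlib.h> spell
these portably.
The argument vector, argv, holds argc strings followed by a null pointer at argv[argc]. In
practice, it almost always has at least one string in the first index, argv[0], which is the name
the program was invoked with. That is not necessarily a full path: for ./a.out, it is just "./a.out".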
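Because the C standard guarantees that argv[argc] is a null pointer, argv can be walked without consulting argc at all. A minimal sketch, where count_args() is a hypothetical helper:

```c
#include <stddef.h>

/* Hypothetical helper: counts the entries in an argv-style array by
 * walking until the terminating NULL that the C standard places at
 * argv[argc]. For a real argv, the result equals argc. */
int count_args(char *argv[])
{
    int n = 0;

    while (argv[n] != NULL)
        n++;
    return n;
}
```

For the ./a.out -o foo -vv run shown earlier, this returns 4, the same value main() receives in argc.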
/* main.c */
/* 0 copyright/licensing */
/* 1 includes */
/* 2 defines */
/* 3 external declarations */
/* 4 typedefs */
/* 5 global variable declarations */
/* 6 function prototypes */
int main(int argc, char *argv[]) {
/* 7 command-line parsing */
}
/* 8 function declarations */
"Comments lie."
- A cynical but smart and good looking programmer.
Here's an appeal to the inherent laziness of programmers: once you add comments, you've
doubled your maintenance load. If you change or refactor the code, you need to update or
expand the comments. Over time, the code mutates away from anything resembling what the
comments describe.
If you have to write comments, do not write about what the code is doing. Instead, write about
why the code is doing what it's doing. Write comments that you would want to read five years
from now, when you've forgotten everything about this code and the fate of the world
depends on you. No pressure.
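As an illustration of "why" comments, consider this minimal sketch built on the POSIX read() call. The helper read_fully() is hypothetical. Its comments explain the reasons behind the retry loop rather than narrating each line:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from fd, stopping early only
 * at end of file. Returns the number of bytes read, or -1 on error. */
ssize_t read_fully(int fd, char *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);

        if (n == 0)
            break;      /* the writer closed its end; nothing more will come */
        if (n < 0) {
            /* A signal can interrupt read() before any data arrives.
             * That is not a real failure, so try again. */
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Pipes and sockets may hand back fewer bytes than requested,
         * so keep going until the caller's buffer is full. */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```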
1. Includes
The first things I add to a main.c file are includes to make a multitude of standard C library
functions and variables available to my program. The standard C library does lots of things;
explore header files in /usr/include to find out what it can do for you.
The #include string is a C preprocessor (cpp) directive that causes the inclusion of the
referenced file, in its entirety, in the current file. Header files in C are usually named with a .h
extension and should not contain any executable code; only macros, defines, typedefs, and
external variable and function prototypes. The string <header.h> tells cpp to look for a file
called header.h in the system-defined header path, usually /usr/include.
/* main.c */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <libgen.h>
#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <sys/types.h>

Include File   Stuff It Provides
stdio          Supplies FILE, stdin, stdout, stderr, and the fprintf() family of functions
stdlib         Supplies malloc(), calloc(), realloc(), strtoul(), exit(), EXIT_SUCCESS, and EXIT_FAILURE
stdint         Fixed-width integer types like uint32_t and uint64_t
unistd         Supplies POSIX functions such as read(), write(), and getopt()
libgen         Supplies the basename() function
errno          Defines errno and all the values it can take on
string         Supplies memcpy(), memset(), and the strlen() family of functions
getopt         Supplies external optarg, opterr, optind, and the getopt() function
sys/types      Typedef shortcuts like size_t, ssize_t, and pid_t
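One detail worth knowing about libgen's basename(): POSIX allows it to modify the string you pass in and to return a pointer into that string or into static storage. So it should get a writable buffer, never a string literal. A minimal sketch that works on a private copy, where program_name() is a hypothetical helper:

```c
#include <libgen.h>
#include <string.h>

/* Hypothetical helper: copies path into buf (n bytes) and returns the
 * final path component. basename() may modify its argument, so it is
 * handed our copy rather than the caller's string. */
const char *program_name(const char *path, char *buf, size_t n)
{
    if (n == 0)
        return "";
    strncpy(buf, path, n - 1);
    buf[n - 1] = '\0';      /* strncpy() does not always terminate */
    return basename(buf);
}
```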
2. Defines
/* main.c */
<...>
#define OPTSTR "vi:o:f:h"
#define USAGE_FMT  "%s [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]\n"
#define ERR_FOPEN_INPUT  "fopen(input, r)"
#define ERR_FOPEN_OUTPUT "fopen(output, w)"
#define ERR_DO_THE_NEEDFUL "do_the_needful blew up"
#define DEFAULT_PROGNAME "george"
This doesn't make a lot of sense right now, but the OPTSTR define is where I state which
command line switches the program will recognize. Consult the getopt(3) man page to
learn how OPTSTR will affect getopt()'s behavior.
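In an option string like OPTSTR, each letter is an option that getopt() accepts, and a letter followed by a colon takes an argument. The small sketch below reads OPTSTR the same way. option_takes_arg() is a hypothetical helper, not part of getopt(), and it ignores extensions such as GNU's "::" for optional arguments:

```c
#include <string.h>

#define OPTSTR "vi:o:f:h"

/* Hypothetical helper: returns 1 if option c takes an argument in optstr,
 * 0 if it is a plain switch, and -1 if it is not a valid option letter. */
int option_takes_arg(const char *optstr, int c)
{
    const char *p;

    if (c == ':' || c == '\0')      /* these are never option letters */
        return -1;
    p = strchr(optstr, c);
    if (p == NULL)
        return -1;
    return p[1] == ':';             /* trailing colon => needs an argument */
}
```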
The USAGE_FMT define is a printf()-style format string that is referenced in the usage()
function.
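Because USAGE_FMT is an ordinary printf()-style format, it works with any function in that family. This sketch uses snprintf() so the result can be inspected. The helper format_usage() is hypothetical, and this copy of USAGE_FMT ends with \n so the message finishes its line:

```c
#include <stdio.h>

#define USAGE_FMT "%s [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]\n"
#define DEFAULT_PROGNAME "george"

/* Hypothetical helper: formats the usage line into buf, falling back to
 * DEFAULT_PROGNAME when no program name is available, as usage() does.
 * Returns snprintf()'s result: the length of the complete message. */
int format_usage(char *buf, size_t n, const char *progname)
{
    return snprintf(buf, n, USAGE_FMT, progname ? progname : DEFAULT_PROGNAME);
}
```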
I also like to gather string constants as #defines in this part of the file. Collecting them
makes it easier to fix spelling, reuse messages, and internationalize messages, if required.
Finally, use all capital letters when naming a #define to distinguish it from variable and
function names. You can run the words together if you want or separate words with an
underscore; just make sure they're all upper case.
3. External declarations
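/* main.c */
<...>
extern char *optarg;
extern int opterr, optind;
An extern declaration brings that name into the namespace of the current compilation unit
(aka "file") and allows the program to access that variable. Here we've brought in the
declarations for two integer variables and a character pointer, all used by the getopt()
function. (<unistd.h> and <getopt.h> already declare them, so these lines are redundant but
harmless.) The standard C library also uses errno as an out-of-band communication channel to
report why a function might have failed. Don't write your own extern declaration for errno,
though: <errno.h> provides it, and on many systems (glibc, for example) errno is a macro that
expands to a per-thread location, so a line like extern int errno; can fail to compile.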
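To see errno at work as an out-of-band channel, here is a minimal sketch that reports why fopen() failed. The helper open_status() is hypothetical. Note that errno is only meaningful immediately after a call has reported failure:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 if path can be opened for reading,
 * otherwise the errno value fopen() left behind (e.g. ENOENT). */
int open_status(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return errno;   /* read errno right away, before any other call */
    fclose(fp);
    return 0;
}
```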
4. Typedefs
/* main.c */
<...>
typedef struct {
    int           verbose;
    uint32_t      flags;
    FILE         *input;
    FILE         *output;
} options_t;
After external declarations, I like to declare typedefs for structures, unions, and
enumerations. Naming a typedef is a religion all to itself; I strongly prefer a _t suffix to
indicate that the name is a type. In this example, I've declared options_t as a struct with four
members. C is a whitespace-neutral programming language, so I use whitespace to line up
field names in the same column. I just like the way it looks. For the pointer declarations, I
prepend the asterisk to the name to make it clear that it's a pointer.
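One way to give an options_t sensible defaults is a C99 designated initializer, which names each member so the initializer stays readable even if the fields are later reordered. This is a sketch, not the author's code: the helper default_options() is hypothetical, and the chosen defaults (no verbosity, no flags, standard input and output) are assumptions:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int           verbose;
    uint32_t      flags;
    FILE         *input;
    FILE         *output;
} options_t;

/* Hypothetical helper: returns an options_t filled with defaults.
 * Any member left out of a designated initializer would be zeroed. */
options_t default_options(void)
{
    options_t options = {
        .verbose = 0,
        .flags   = 0x0,
        .input   = stdin,     /* assumed default: read standard input */
        .output  = stdout,    /* assumed default: write standard output */
    };
    return options;
}
```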
5. Global variable declarations
Global variables are a bad idea and you should never use them. But if you have to use a global
variable, declare it here and be sure to give it a default value. Seriously, don't use
global variables.
6. Function prototypes
As you write functions, adding them after the main() function and not before, include the
function prototypes here. Early C compilers used a single-pass strategy, which meant that
every symbol (variable or function name) you used in your program had to be declared before
you used it. That rule is still part of the language: modern compilers are far more
sophisticated, but since C99 the standard no longer allows calling a function with no visible
declaration (pre-C99 compilers silently assumed such a function returned int). You also don't
always get to choose which compiler is used on your code, so write the function prototypes
and drive on.
As a matter of course, I always include a usage() function that main() calls when it doesn't
understand something you passed in from the command line.
/* main.c */
<...>
void usage(char *progname, int opt);
int  do_the_needful(options_t *options);
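Prototypes matter most when functions refer to each other. In this sketch, the hypothetical functions is_even() and is_odd() each call the other, so at least one must be declared before it is defined, however the definitions are ordered:

```c
#include <stdbool.h>

/* Prototypes: declare both functions before either body appears. */
bool is_even(unsigned n);
bool is_odd(unsigned n);

/* With the prototypes in place, each definition can call the other. */
bool is_even(unsigned n)
{
    return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(unsigned n)
{
    return n == 0 ? false : is_even(n - 1);
}
```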
7. Command line parsing
/* main.c */
<...>
        case 'f':
            options.flags = (uint32_t )strtoul(optarg, NULL, 16);
            break;
<...>
OK, that's a lot. The purpose of the main() function is to collect the arguments that the user
provides, perform minimal input validation, and then pass the collected arguments to
functions that will use them. This example declares an options variable initialized with default
values and parses the command line, updating options as necessary.
The guts of this main() function is a while loop that uses getopt() to step through argv
looking for command line options and their arguments (if any). The OPTSTR #define earlier
in the file is the template that drives getopt()'s behavior. The opt variable takes on the
character value of any command line options found by getopt(), and the program's response
to the detection of the command line option happens in the switch statement.
Those of you paying attention may now be wondering why opt is declared as an int when it is
expected to hold a char. It turns out that getopt() returns an int that takes on the value -1
when it runs out of options, which I check against EOF (the End of File marker, which is -1 on
practically every system). Whether a plain char is signed is implementation-defined (it is
unsigned on some ARM platforms, for example), so storing that result in a char could make the
-1 unrecognizable. Besides, I like matching variables to their function return values.
When a known command line option is detected, option-specific behavior happens. Some
options take an argument, which is specified in OPTSTR with a trailing colon. When an option
has an argument, the program can reach it through the externally defined variable optarg,
which points either to the rest of the same string (as in -ifile) or to the next string in argv
(as in -i file). I use optarg to open files for reading and writing or to convert a command
line argument from a string to an integer value.
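The getopt() loop described above can be sketched as a self-contained, testable function. This is an illustration under stated assumptions, not the author's listing: parse_options(), sketch_opts_t, and SKETCH_OPTSTR are hypothetical names, only -v and -f are handled to keep it short, and it returns -1 where the article's main() would call usage(). It compares against -1, the value POSIX specifies for the end of the options:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical subset of the option table: -v repeats, -f takes a hex value. */
#define SKETCH_OPTSTR "vf:"

typedef struct {
    int      verbose;
    uint32_t flags;
} sketch_opts_t;

/* Parses argv into *opts. Returns 0 on success, or -1 on an unknown
 * option or a missing argument. */
int parse_options(int argc, char *argv[], sketch_opts_t *opts)
{
    int opt;

    opts->verbose = 0;
    opts->flags = 0x0;
    opterr = 0;     /* report errors ourselves; keep getopt() quiet */
    optind = 1;     /* start scanning right after argv[0] */

    /* getopt() returns the option letter, '?' for an unknown option or a
     * missing argument, or -1 once the options run out. */
    while ((opt = getopt(argc, argv, SKETCH_OPTSTR)) != -1) {
        switch (opt) {
        case 'v':
            opts->verbose += 1;
            break;
        case 'f':
            /* optarg points at this option's argument string */
            opts->flags = (uint32_t)strtoul(optarg, NULL, 16);
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Note how a grouped switch such as -vv produces two separate 'v' results from getopt(), which is why verbose is a counter rather than a flag.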
The command line signature for this program, were it compiled, looks something like this:
$ ./a.out -h
a.out [-v] [-f hexflag] [-i inputfile] [-o outputfile] [-h]
8. Function declarations
/* main.c */
<...>
void usage(char *progname, int opt) {
    fprintf(stderr, USAGE_FMT, progname?progname:DEFAULT_PROGNAME);
    exit(EXIT_FAILURE);
    /* NOTREACHED */
}

int do_the_needful(options_t *options) {

    if (!options) {
        errno = EINVAL;
        return EXIT_FAILURE;
    }

    if (!options->input || !options->output) {
        errno = ENOENT;
        return EXIT_FAILURE;
    }

    /* XXX do needful stuff */

    return EXIT_SUCCESS;
}
Finally, I write functions that aren't boilerplate. In this example, function do_the_needful()
accepts a pointer to an options_t structure. I validate that the options pointer is not NULL
and then go on to validate the input and output structure members. The function returns
EXIT_FAILURE if either test fails and, by setting the external global variable errno to a
conventional error code, signals a general reason to the caller. The caller can then use the
convenience function perror() to emit human-readable-ish error messages based on the value
of errno.
The big class of errors I am trying to avoid here is dereferencing a NULL pointer. On typical
Unix systems, this causes the operating system to send my process a special signal called
SIGSEGV, which, unless the program handles it, kills the process on the spot. The last thing
users want to see is a crash due to SIGSEGV. It's much better to catch a NULL pointer in order
to emit better error messages and shut down the program gracefully.
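Here is the same defensive pattern in a minimal, testable form. The helper checked_length() is hypothetical: it checks its pointers, sets errno to a conventional code, and returns an error instead of letting a dereference crash the process:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores strlen(s) in *len. Returns 0 on success,
 * or -1 with errno set to EINVAL if either pointer is NULL, so the caller
 * can report the problem (for example, with perror()) and exit cleanly. */
int checked_length(const char *s, size_t *len)
{
    if (s == NULL || len == NULL) {
        errno = EINVAL;
        return -1;
    }
    *len = strlen(s);
    return 0;
}
```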
Some people complain about having multiple return statements in a function body. They
make arguments about "continuity of control flow" and other stuff. Honestly, if something
goes wrong in the middle of a function, it's a good time to return an error condition. Writing a
ton of nested if statements to just have one return is never a "good idea."™
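As a sketch of the early-return style, the hypothetical helper parse_port() bails out as soon as something is wrong, so the success path reads straight down without nested if statements:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to a TCP port number (1-65535).
 * Returns 0 and stores the port in *port, or -1 with errno set. */
int parse_port(const char *text, int *port)
{
    char *end;
    long value;

    if (text == NULL || port == NULL) {
        errno = EINVAL;
        return -1;
    }

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0)
        return -1;                  /* e.g. ERANGE reported by strtol() */
    if (end == text || *end != '\0') {
        errno = EINVAL;             /* empty input or trailing junk */
        return -1;
    }
    if (value < 1 || value > 65535) {
        errno = ERANGE;             /* a number, but not a valid port */
        return -1;
    }

    *port = (int)value;
    return 0;
}
```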
Finally, if you write a function that takes four or more arguments, consider bundling them in a
structure and passing a pointer to the structure. This makes the function signatures simpler,
making them easier to remember and not screw up when they're called later. It can also make
calls slightly cheaper, since a single pointer is passed instead of several values. In practice,
this only becomes a consideration if the function is called millions or billions of times.
Don't worry about it if that doesn't make sense.
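A minimal sketch of the idea, where connection_t and describe_connection() are hypothetical: four related settings travel together in one structure, and the function takes a single const pointer to it:

```c
#include <stdio.h>

/* Hypothetical bundle of four related settings. */
typedef struct {
    const char   *host;
    int           port;
    int           timeout_ms;
    int           retries;
} connection_t;

/* One pointer argument instead of four separate ones; const promises the
 * function only reads the settings. Returns snprintf()'s result. */
int describe_connection(const connection_t *conn, char *buf, size_t n)
{
    return snprintf(buf, n, "%s:%d (timeout %d ms, %d retries)",
                    conn->host, conn->port, conn->timeout_ms, conn->retries);
}
```

If a fifth setting is needed later, the structure grows, but the function signature and every call site stay the same.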
When you are in the zone, sometimes you don't want to stop and write some particularly
gnarly bit of code. You'll come back and do it later, just not now. That's where I'll leave myself a
little breadcrumb. I insert a comment with an XXX prefix and a short remark describing what
needs to be done. Later on, when I have more time, I'll grep through source looking for XXX.
It doesn't matter what you use; just make sure it's not likely to show up in your codebase in
another context, as a function name or variable, for instance.
/* main.c */
<...>
        case 'f':
            options.flags = (uint32_t )strtoul(optarg, NULL, 16);
            break;

        case 'v':
            options.verbose += 1;
            break;

        case 'h':
        default:
            usage(basename(argv[0]), opt);
            /* NOTREACHED */
            break;
    }

    if (do_the_needful(&options) != EXIT_SUCCESS) {
        perror(ERR_DO_THE_NEEDFUL);
        exit(EXIT_FAILURE);
        /* NOTREACHED */
    }

    return EXIT_SUCCESS;
}