
Scalable C (in progress)

Table of Contents

Introduction
Preface
Chapter 1 - Hello, World
Chapter 2 - The Scalable C Language
Chapter 3 - Packaging and Binding


Scalable C - Writing Large-Scale Distributed C


Buy at Amazon.com
Source repo is https://github.com/hintjens/scalable-c; pull requests are welcome.

Cover font: Kontrapunkt by Bo Linnemann, Kontrapunkt A/S. Text fonts: EB Garamond by
Georg Duffner, MonospaceTypewriter by Manfred Klein.

Pieter Hintjens has been programming C since 1985. He is the author of many free software
products written in C: Libero (1991), SFL (1996), Xitami (1998), OpenAMQ (2004). In 2007
he founded the ZeroMQ community. His github profile is https://github.com/hintjens.

Other books by the same author: "ZeroMQ - Messaging for Many Applications" (O'Reilly),
"Culture and Empire: Digital Revolution" (Amazon.com), "The Psychopath Code"
(Amazon.com).


Why This Book Exists


The C programming language does not have a sense of humor. If you write in C, you know
that it does not forgive mistakes. It does not try to interpret what you mean. It does what you
tell it, no more and no less. In return, it gives you full control over the results of your work.

Modern languages focus on comfort, abstraction, and automation. C, which was born around
1970, focuses on minimalism, portability, and performance. Well-written C code can run on a
$1 embedded computer as well as on a massive server.

If you know C well enough to understand these trade-offs, then you know where C stops
working, as a language. C has many problems, yet three stand out:

While C lends itself to building libraries, it has no consistent API model. This makes C
code much harder to read and understand than it should be.

The standard approach for concurrency is POSIX threads that share their state. This is
complex and fragile. We know how to do this better, using message passing between
threads.

To compile and link C code for arbitrary platforms is a complex black art. This creates a
real cost for C projects. Even CMake, perhaps the best cross-platform answer, uses
autotools to bootstrap itself.

This adds up to extra effort and cost for anyone using C. It is uneconomic to write large
applications in C. Even for system-level applications, many people prefer C++, Go, and
Erlang. Yet there are good reasons to use C, which are not going away.

The most powerful argument for using C is that it works well with all other languages. This is
a result of its age, and its wide use as a low-level systems language. If you make a library in
C, you can offer it to developers in every one of the hundred most popular languages.

Over time, C's relative popularity is falling. The high costs of using C in the real world of the
21st century are throttling it.

Yet we have solved these problems. We have good, tested answers. Today these answers
are still well-hidden, and known only to a few people. This book aims to change that. It aims
to bring C into the 21st century and make it a cheap, useful material in which to build.

What is "Scalable C?"


We use C most often to write libraries, which we then call from applications in other
languages. This layer of C libraries sits between the operating system and the application.

This layer provides security, user interfaces, audio and video, maths, graphics, databases,
communications, compression, and so on. I call this the "fabric layer."

For the most part, this fabric layer sees the world as a single processor. It has no concept of
concurrency. It cannot take advantage of many cores on a machine, let alone many
machines in a cloud. Every library has its own style, standards, and API model. Every library
has a custom build process.

A scalable technology can solve large problems as well as small ones. Our current fabric
layer is not scalable. It costs too much to write and to use.

What I will explain in this book is how to build a scalable fabric layer, written in "Scalable C."

Scalable C has specific properties:

It is cheap to create a Scalable C project.

It is cheap to use, with consistent and obvious APIs.

It is cheap to deploy, with powerful tools and packagers.

It is cheap to scale to many cores, with actor-based concurrency.

It is cheap to scale to many servers, with clustering across a cloud or data center.

It is cheap to build community, with a modern collaborative process.

Scalable C is standard portable C plus a mix of other technologies:

The CLASS RFC defines the Scalable C language style.

ZeroMQ provides message passing between threads and processes.

CZMQ provides a core C library for Scalable C.

Zyre provides local-area clustering for Scalable C projects.

zproject provides packaging (builds, bindings, and distributions).

zproto provides client-server meta-programming.

The C4.1 RFC defines a collaborative process for scalability.

How This Book Works


This book takes the same approach that I take in distributed programming workshops. That
is, start with simple worked examples, and then add more and more depth. Each step aims
to answer a problem you'll hit soon, or have already hit.

We will see a lot of example code. All the examples work, and you can build and play with
them. The Scalable C repository holds this book, and the code. If you find things you want to
change, just send a pull request. I'll explain how that works, when we get started.

If you read the whole book and follow the examples, you will learn how to:

Write C code using the Scalable C style, called CLASS.

Build and package your C projects, using zproject.

Use the CZMQ generic list and hash containers.

Pass messages between threads and processes, using ZeroMQ.

Write non-blocking multithreaded C code as CZMQ actors.

Design good APIs and wire-level protocols.

Use git to collaborate with others on a project.

Build an open source community.

Make secure encrypted communications.

Build clustering across a local network, using Zyre.

Build multithreaded clients and servers, using zproto.

Generate C code using model oriented programming.

Use your C code from other languages, including Java.

Build and ship your C code for Android.

Write portable code that runs on all platforms.

This sounds like a lot, and it might be, if I had to explain everything from scratch. I'll keep
things simple by focusing on patterns that work, without too much argumentation. For
example I'll explain patterns for using git, that avoid the most common pitfalls. I expect you
to be able to learn git yourself.

Before You Start


Here is a list of ingredients:


One working PC. It does not need to be new, or fancy.

An operating system you are comfortable with. Linux will give you the best results. OS/X
and Windows are usable if you have no choice.

An Internet connection, at least to get started.

A GitHub account. If you are not already registered on github.com, do that now.

Conversational Bash skills. You can run commands, install packages, and so on.

A basic knowledge of C. You at least understand pointers, and the standard library.

A basic knowledge of compute models. You have written programs as a job for a few
years at least.

Here's my current set-up:

A second hand X220 Thinkpad from LapStore. Costs about EUR 300, with an SSD. It's
not the lightest or fastest laptop. Yet the battery lasts all day and it runs Linux well.

Ubuntu Linux with default configuration.

To start with, you need at least these packages:

git-all -- git is how we share code with other people.

build-essential, libtool, pkg-config - the C compiler and related tools.

autotools-dev, autoconf, automake - the GNU autoconf makefile generators.

cmake - the CMake makefile generators (an alternative to autoconf).

Plus some others:

uuid-dev, libpcre3-dev - utility libraries.

valgrind - a useful tool for checking your code.

Which we install like this (using the Debian-style apt-get package manager):

sudo apt-get install -y \
    git-all build-essential libtool \
    pkg-config autotools-dev autoconf automake cmake \
    asciidoc uuid-dev libpcre3-dev valgrind

The LearnXinYMinutes project has good quick guides to many languages. Here are its
guides to Bash, to C, and to git.


Before you use git, on a new laptop, always tell it your name, and email address. Use the
same email address for your GitHub account:

git config --global user.name "Your Name"
git config --global user.email [email protected]

Why not C++?


Don't laugh. This is a serious question people sometimes ask, even when "C" is clearly in
the title of the book. The answer is roughly: "C++ encourages you to make worse code than
even C does."

Learning a large language (and C++ is a large language) is like memorizing the first
thousand prime numbers. It is to fill your brain with junk without benefit. Yes, it is good to
learn, for the sake of learning. Yet to learn complexity is like joining a cult. You start to think
the knowledge is worth something for its own sake.

The C language is small and yet it takes years to master it. I wrote this book to speed people
along that path. Yet inevitably, your first projects will be weak, no matter how smart you are.
If you're coding every day you'll be decent after five years, and good after ten. And after
twenty years you may become great.

Yet during that process, if you can keep it going, you must be making useful things, from day
one. In a small language this is doable. You can learn enough to contribute to projects, or
start your own, in a few days or weeks. It is like learning to tap a metal triangle. It adds to an
orchestra, if you stay on rhythm.

C++ is a language that speaks to the inner intellectual. The more C++ you know the worse
you become at working with others. First, because your particular dialects of C++ tend to
isolate you. Second, because you sit in an ivory tower that few can approach. This is a
problem with all highly abstract languages.

Any language that depends on inheritance leads you to build large monoliths. Worse, there
are no reliable internal contracts. Change one piece of code and you can break a hundred.

I'll explain later how we design classes in C, so we get neatly isolated APIs. We don't need
inheritance. Each class does some work. We wrap that up, expose it to the world. If we need
to share code between classes, we make further APIs. This gives us layers of classes.

This gives us a neat, compact syntax. Let's take one example to compare C++ and C. We'll
make a linked list and push some values to it, then print them out.

First, in C++:


#include <iostream>
#include <list>
#include <string>
using namespace std;

int main ()
{
    list<string> List;
    List.push_back ("apple");
    List.push_back ("orange");
    List.push_front ("grape");
    List.push_front ("tomato");
    cout << List.size () << ": ";

    list<string>::iterator i;
    for (i = List.begin (); i != List.end (); ++i)
        cout << *i << " ";
    cout << endl;
    return 0;
}

And now in Scalable C:

#include <czmq.h>

int main (void)
{
    zlist_t *list = zlist_new ();
    zlist_append (list, "apple");
    zlist_append (list, "orange");
    zlist_push (list, "grape");
    zlist_push (list, "tomato");
    printf ("%zu: ", zlist_size (list));

    char *fruit = (char *) zlist_first (list);
    while (fruit) {
        printf ("%s ", fruit);
        fruit = (char *) zlist_next (list);
    }
    puts ("");
    zlist_destroy (&list);
    return 0;
}

The C code is a little more verbose. It has to pass the list object to every method. It cannot
do tricks like overloading "++" to move the list pointer. It must destroy its own objects. Yet it's
clear, and explicit.

Sure, there are more compact ways of writing this example, both in C and in C++. That isn't
the point.


I'm not claiming that Scalable C can do everything C++ can. Nor will it be as compact. Yet it
is immediately familiar, explicit, and transparent. I'd much rather write and read 100 lines of
code like this than 10 lines of code that rely on inheritance, operator overloading, and other
syntactic tricks.


Chapter 1. Hello, World


In this chapter we'll build, test, and publish a full C application that does nothing. Above all
we'll learn how to use git, a most important tool for working with other people. Even if you
know git, it's worth reading this chapter to learn how to use it without pain.

Problem: what do we do next?


Deciding what to do next is always a tricky thing. I like to solve this using a simple and
effective pattern I call "problem-solution." It works like this:

Take the next most urgent problem and write it down.

Make a minimal plausible solution.

Test that solution against real-life.

If the solution works, keep it. Otherwise throw it out.

Repeat until exhausted, broke, or dead.

So this will be the structure of the book. Every section states a problem, and then solves it.
This approach makes it easy for you to know why I'm explaining a particular topic. More to
the point, it makes it easier for me to write the book.

Remember this lesson:

Don't add features or explore maybes. Solve real problems one by one with minimal,
testable solutions.

Problem: where do we start?


Luckily the software industry has learned where to start, through decades of tradition.

Solution: start with "Hello, World."

Create a single source called hello.c, anywhere on disk:


#include <stdio.h>

int main (void)
{
    puts ("Hello, World");
    return 0;
}

Now compile, link, and run this program. I'll assume we're using Ubuntu, for all command
examples:

gcc -o hello hello.c
./hello

And you should see that familiar Hello, World printed on your console. If you are using
OS/X or Windows, it won't be this easy. I'll repeat my advice to install Linux.

Problem: we need to organize our work


Just scattering source files around your laptop's hard disk is a bad idea for many reasons. A
productive programmer works on hundreds and thousands of source files, over years. There
is a natural pattern to this work, which is the project. A typical project grows for a few years
and then quietens down, in a neat curve. Newer projects use older projects, and replace
them. Cycle of life stuff.

Solution: use one directory per project.

Often we'll collect these projects together into groups. A project directory will be as rich as it
needs to be. Let's make this happen:

mkdir -p $HOME/projects/hello

Now move hello.c to $HOME/projects/hello and compile and run, as before. If it
doesn't work, get some rest. Things are only going to get harder from here on.

Problem: we need to back up and share our work

Laptops tend to die or get stolen. Anything you save to a laptop hard drive is liable to get
lost. There are many ways to back up your work, and only one that's worth teaching, which
is to use git. In principle git is for "revision control", meaning you can check who changes
what, when, and why. It is a good habit to use it for everything you create that looks like text.
(There are better ways to back up your photos, videos, and dubstep mixes.)

Solution: create one git repository per project.

Log in to your GitHub account, then click the big '+' sign on the top right. Choose 'New
repository' and then enter the name "hello". Don't change any other settings. Click the
"Create repository" button.

Next we'll link our projects/hello directory to this repo. I'll assume your GitHub name is
"urlogin." This means your repo's address is https://github.com/urlogin/hello.

Here's how we link our directory to the repo:

cd $HOME/projects/hello
# Make current directory a git repo
git init .
# Create a 'remote' called 'origin', pointed to GitHub
git remote add origin https://github.com/urlogin/hello
# Tell git we intend to commit our source
git add hello.c
# Do the actual commit with a nice message
git commit -m "Added source file"
# Push the 'master' branch to the remote called 'origin'
git push origin master

Jargon file:

"repo" is how the cool kids say "repository." But you knew that.
"commit" is how you save local changes to your local repo.
"remote" is what git calls another repo. Stick to the same name for both repos.
"push" is what git calls sending changes from one repo to another repo.
"pull" is getting changes from a remote to your local repo.
"branches" are a thing git does to confuse you. We always use one branch called
"master." I'll explain more about that later.

Remember this lesson:

If you haven't committed your work and pushed it to GitHub, your work is already dead.


Problem: git asks for login and password


Ah, welcome to the annoying world of security. You can always grab stuff from a public
repository without authentication. Yet when you want to push commits to a remote, you must
have permission. Entering your name and password over and over gets painful.

If you are serious about security, you use two-factor authentication. This means your puny
passwords don't even work. So you need a more magical solution.

Solution: use the SSH protocol and a SSH key.

1. Follow the GitHub article on SSH keys.
2. Use SSH instead of HTTPS for remotes.

# Delete the remote called "origin"
git remote rm origin
# Create it again, using SSH
git remote add origin git@github.com:urlogin/hello
# Check that it works
git push origin master

Jargon file:

"SSH" - the secure shell protocol, often used when A wants to talk to B without C
snooping.

"Two-factor authentication" or 2FA - keeps crooks out of your account even if they get
your password. Learn it, and use it.

Problem: the C preprocessor


A realistic C program starts with loads of #include statements. In technical terms we call this
"making humans do work that computers can do faster and better." This used to be
fashionable in the 20th century. These days it's about as smart as stopping the CPU so we
can inspect and change its registers.

For example here is a "random C program" I got off the Internet:

#include <time.h>
#include <stdlib.h>
srand(time(NULL));
int r = rand();


It makes sense, right? Wrong. It's like asking taxi drivers to memorize the street map of their
city. We have GPS. Doing extra work just because our ancestors did it is cargo-cult
programming. "Oh, boy, I've got lots of #includes! My code is the greatest!"

I'm going to hammer this point because every C application I've ever seen does this. Except
the ones I'm responsible for, that is. The rationale for forcing us to include random files left
and right is "efficiency." It is one of those things that made sense when computers read their
input from punched cards.

So a large C program has to make dozens of include statements. Half of them may be
redundant, and the only way to know is to try to remove them, or learn the whole code base
by heart.

Oh, it gets worse. Header files aren't standard across different platforms. So real C code
gets full of conditional #ifdef blocks. After all, this is how you make portable code, right?
Wrong. It's how you make crazy.

That 99.99% of people do an insane thing does not make it less insane.

When I started writing large C frameworks in the late 1980's this madness annoyed the heck
out of me. I took the brutal and lazy solution which was something like this:

#include <everything.h>

Which included every header that I felt was useful, and did all the non-portable #ifdef stuff.
So with one leap, I got a consistent environment in every program, and every application. No
pain, much gain. In 25 years I've not found a better approach.

Solution: shove the preprocessor junk into a single header file.

These days we've moved on, so this looks somewhat different:

#include <czmq.h>

Jargon file:

"header" - for historical reasons, C splits its code into two pieces. The "header" defines
function signatures ("prototypes"), constants, and data types. The "source" contains the
actual functional code. A clean project does this in a consistent way (and we'll come to
that). Most projects are kind of random about it.

"include file" - another name for "header."

"punched cards" - the 1960's and 70's USB stick, capable of holding a massive 80 bytes
per card.


Remember this lesson:

If code looks ugly, it's not scalable. Work on it until it looks nice.

Problem: I don't have CZMQ!?


CZMQ lives at http://czmq.zeromq.org. It is the core library for Scalable C. This book started
out as a guide for CZMQ. It does a lot more than get rid of #include statements. One step at
a time though.

First, what version of CZMQ does one use? It is a question that used to matter. Over the
years we've learned to build living code that doesn't need stable releases.

Put this another way: you should always be able to use the latest version of CZMQ, the git
master. Sometimes, the git master will have errors. Then, your prerogative is to raise hell,
send patches, and help fix things. That sounds daunting. It's not, though. I'll even walk you
through it, later in this chapter.

Solution: grab the latest CZMQ git master from github.

Here's how we grab the latest git master:

cd $HOME/projects
git clone https://github.com/zeromq/czmq

Note that we're using HTTPS again here. SSH would also work:

git clone git@github.com:zeromq/czmq

Except, don't. If you had commit rights to CZMQ, the second form would let you push
changes straight to the repo. That's a "no no" for reasons I'll explain later. So remember this
lesson:

When you clone projects from outside your own account, always use the HTTPS form.

Next, we build and install CZMQ. There are two ways, and I'll show both of them. Here is the
more traditional way:

./autogen.sh && ./configure
make -j 4 && make check
make install && sudo ldconfig

And here is the more modern way, using CMake:


cmake .
make -j 4 && make test
make install && sudo ldconfig

The -j 4 option to make runs parallel jobs to make use of a dual or quad core box. CMake
runs somewhat faster. If you're going to learn either of these two, CMake is easier, and
autoconf is better for job security.

Jargon file:

"git master" - the common name for the latest version of a project living on GitHub (or a
similar site).

"cloning" - means making a local copy of a remote repo. This also sets up a remote
called "origin" that points back to the original repo.

"ldconfig" - fixes up dynamic link libraries after you install them. Run this if /usr/local/lib
already has a different version of the library you are installing.

Remember this lesson:

Git master should be almost perfect, almost all the time.

Problem: building CZMQ fails! I need libzmq...


Ah, yes. Sorry to make you work backwards here. Let's grab the projects that CZMQ needs,
so we can build it. There are two: libzmq, the ZeroMQ messaging library, and libsodium, for
encryption.

Solution: install libzmq and libsodium first


cd $HOME/projects

# Install libsodium
git clone https://github.com/jedisct1/libsodium
cd libsodium
./autogen.sh && ./configure
make -j 4 && make check
make install && sudo ldconfig
cd ..

# Install libzmq
git clone https://github.com/zeromq/libzmq
cd libzmq
./autogen.sh && ./configure
make -j 4 && make check
make install && sudo ldconfig
cd ..

# Build CZMQ again...
cd czmq
./autogen.sh && ./configure
make -j 4 && make check
make install && sudo ldconfig
cd ..

Remember this lesson:

If you have several versions of a shared library, weird stuff can happen. Don't be afraid
to delete stuff from /usr/local/lib.

Problem: 'make install' fails with a permission error

Welcome again to the weird world of security. Linux assumes there are several of you, all
sharing that precious laptop. This is one thing Windows got righter: the "personal" part of
"PC".

On Linux, /usr/local/ is the normal place to install headers and libraries you build.
Building and installing software is not some magical system administration task. It is part of
our daily work as developers. Yet to copy files into /usr/local tree we must invoke "sudo."
Using "sudo" for daily work is fragile. If you run this, for instance:

sudo make check install


Then any test files that make check produces have root permissions. So, future make
check steps will fail. Also, sudo asking for passwords makes it harder to script. Aagh!

Solution: make /usr/local writable.

Do I need to say "personal computer" again? Do this on your personal machine, your laptop
or your workstation, where you are the only user. Don't do it on shared servers. People will
tell you it's insane to make these system directories writable by anyone. Imagine that
anyone could use your kitchen! The chaos! The panic! The pancakes and pizzas!

Computers capable of running Linux and a C compiler cost $20. We haven't shared our
development boxes since the late 20th century.

Making your default install space writable is a brutal and effective solution:

cd /usr/local
sudo chmod -R a+w *
cd $HOME/projects/libsodium
make install

Jargon file:

"sudo" - run a command as "super user", so with root permissions.

Remember this lesson:

The computer should not make your life harder without good reason. Also, people often
have a poor assessment of risks and benefits.

Problem: I'm on Windows and these commands don't work

Indeed. Windows boasts a thing called "PowerShell." The name already gives it away. This
thing is Bash reinvented by drunken Martians raised on old episodes of the Big Bang
Theory. This is how typical PowerShell users explain things:

Most people know how easy it is to use Windows PowerShell... Want to see all your
environment variables and their values? This command should do the trick:
Get-ChildItem Env:

I'm not kidding. You can google it. Every other command shell in the universe uses set. But
hey, why make it simple when you can have job security, amirite?


Turns out that libsodium, libzmq, CZMQ, and our other projects all support Visual Studio. If
you don't have Visual Studio, grab the free 2013 Community Edition. Then, look in
builds/msvc.

For instance to build CZMQ, do this:

cd builds\msvc\vs2013
.\build.bat

You will want to clone and build libsodium and libzmq first. A warning: libsodium master
sometimes does not build on Windows. Check out the latest stable release tag if you get
compile errors.

Having said that, remember this lesson:

Linux is the native environment for C development.

Problem: 'make check' failed on CZMQ


So you found a bug in CZMQ? Congratulations! Your next step is to spend a while tracking it
down, then fixing it. Hang on, you may say, what's this about asking me to fix your bugs?

Lending a hand to a project that is profitable for you is good manners and smart. It's even
smarter to take any excuse you can to learn your tools. CZMQ is the original Scalable C
library. We built it to be easy to read, understand, and use.

Solution: send us a pull request with your fix.

Pull requests are the life blood of an open source project. Some projects make a big deal out
of reviewing and criticizing pull requests. This tends to annoy contributors and empower
elitist maintainers. It also creates a "hang for a penny" mentality. If you have to fight for
several weeks to get a patch accepted, you might as well make large breaking patches.

CZMQ uses the pattern of treating pull requests as good by definition. Some patches (small,
focused, non-disruptive) are better than others. We still merge them all as fast as we can.
This means even if you send an insane pull request, we'll merge it. (We would then tag you
as "Sends insane pull requests" and revert your work. This is another story I'll tell later.)

Let me walk you through the steps of making a pull request to CZMQ:

Go to https://github.com/zeromq/czmq.

Click the "Fork" button at the top right and fork to your personal account.

When that's done, go to your laptop and clone it (your forked project, not the laptop):


cd $HOME/projects
# Let's pretend we never cloned this
rm -rf czmq
git clone git@github.com:urlogin/czmq
cd czmq
cmake . && make -j 4

Next, make your changes in the cloned CZMQ.

Commit your changes using a good message. A good commit title starts with "Problem:"
and then explains the solution.

To commit all changes to a project, do this:

git commit -a

And then enter your commit message and save (type ZZ, if you are in vim):

Problem: random number generator is predictable

The random number generator we use in the zrandom
class is using only 3 bits of randomness. This fails at
test time.

Solution: use pi bits of randomness

You now have a new commit on your local repo. You can see it using git log. To send this
to your remote repo, do this:

git push origin master

And then open your browser to https://github.com/urlogin/czmq and click the "New pull
request" button. If you did things right, GitHub will show you your commit message. Click OK
to make the pull request.

One of the crack CZMQ maintainers will spot your pull request, and merge it, often in a few
minutes. If they spot something wrong, they'll tell you.

Do try to keep one commit per change. If you have ten small commits to fix a single problem,
learn the git reset command. Undo the last commits, then make a single new commit.

These git commands can be useful:


# Show most recent commits
git log
# Redo the last commit
git commit --amend
# Undo any git add commands
git reset
# Undo all commits since 6e6042
git reset 6e6042

Problem: CZMQ master has changed, but my clone is out of date

This happens all the time, since the master of any living project will change. You need to get
new commits back to your clone, and do this as often as necessary.

Solution: pull changes from the CZMQ upstream repo.

You first add a remote that points to the upstream repo:

git remote add upstream https://github.com/zeromq/czmq

And then every time you want to "refresh" your local fork, you do this:

git pull --rebase upstream master
git log

The --rebase option stops git from adding spurious "Hey I merged some stuff, lookieme!"
commits. If you don't use this option, your git history gets messy. It should be the default. It's
not. So just always use it. Or, do add this to your $HOME/.bashrc:

alias gdn='git pull --rebase origin master'
alias gup='git push origin master'
alias gdu='git pull --rebase upstream master'

And then you can type gdn, gdu and gup.

Jargon file:

"upstream repo" - the "real" repo that you forked from.

"rebasing" - a synonym for smoking whatever it was the git developers were on when
they wrote the git pull command.


Problem: git isn't working


Ah, git. We love you because you made repositories cheap and scalable. And we hate you
because you act weird a lot of the time. There are several ways that you confuse us. Most
often we just ask you to do something for us, and you tell us your whole life story, and break
down in a confused mess.

The most common problem with a git repository is conflicting commits from different places.
This can show as different problems:

git pull fails. Someone else changed the same files as you, at the same time. This is
rare, and git explains exactly what you need to do to resolve the conflicts.

git push fails. Your remote fork has a history that conflicts with your local repository
history. Maybe you did a git reset or a git commit --amend. Make sure your
remote fork has no commits you care about (try git pull for fun). Then nuke its
history with git push --force origin master.

Nothing works any more. Save your changes somewhere safe. Then, delete your local
repo and/or remote fork, and start over. This remedy is often easier than trying to fix up
stuff.

Solution: Google the error message.

Remember this lesson:

You are not alone. Most sane people who use git hate it. It just hurts less than all the
alternatives.

Problem: how do I clean my project?


There are a few ways to do this, depending on how "dirty" it is:

To save all work in progress so you can fetch remote changes, use git stash. Then
after you've fetched remote changes, use git stash apply.

To remove all untracked files, use git clean -d -f -x. Warning: this will delete any
work you've not added and committed.

To get back to a specific commit 6e6042 and wipe everything since then, do git
reset --hard 6e6042.

You can always delete a repository and clone or fork it again. Hint: don't delete it, just
rename it in case you decide you want to get something out of it later.


Solution: Google the error message.

Remember this lesson:

Don't Panic.

Problem: after a "git pull", I lost commits!


This is one of the "wtf, git, WTF?!" moments. Again, Don't Panic.

It can happen for all kinds of reasons, from sunspots to brainfarts. Sometimes a "git pull --
rebase" will trash your work. I won't explain the reasons because they're beyond me.
Something something six dimensions portal something alternate monsters something.

Solution: git's history can lie. Use git reflog to see all commits git knows about.

Use git reflog to find the commit 6e6042 with your precious work.

Then, git checkout 6e6042 to switch to that commit.

Save the files you need to, then delete your repository, re-clone it, and copy the
changed files back into it.

I'm sure there are more elegant ways. It doesn't matter: this happens once a year and damn
elegance. Recover your work, wipe the slate, start again, and drink to forget.

Problem: git is confusing!


Funny how git manages to stay the center of attention. Its command syntax is inconsistent,
complex, and arbitrary. It has far too many commands, and often makes us feel stupid.

One of git's neat features is how git checkout, git revert, and git reset all seem
to do roughly the same thing in different ways. I'm being sarcastic. It's not neat, it's
pathological.

Git is at its worst when you use branches. That is ironic, as branches are one of git's main
so-called features. It turns out that if you don't use branches, about 90% of git's commands
and complexity disappear. What is more fun is that you don't need branches at all. Better
still: working without branches is actually faster and more scalable.

Solution: stop using git branches and do all work on master.


It is not quite as simple as that, as I explain at the end of this chapter. To make a project
succeed on master you need a combination of techniques. Most of all, you need to learn to
work with many people in the same space. This requires a new perception that can take time
to develop. It is one of the most important lessons of this book. Scalable C is a social
language. I will teach this little by little, from different directions.

Dialectics
In which I tell the stories and arguments that gave us the tools in this chapter.

The Problem of Branches


Git's ability to work with "branches" is one of its central features. Branches let you isolate
different flows of work. Before git, using branches was hard work. Git made branches
cheaper. As a result, most developers use them more. A lot more.

You'd think this was a good thing, right? Well, my experience tells the opposite story.
Branches are to software like cars are to cities. They make our work more complex, slower,
and less effective. They tie into old, corrupt traditions of freedom through hardship. They
support the theory of programming as pain.

Branches epitomize our cargo cult approach to programming. We do it because we believe it
will bring us success. We do not ask why, only how. We make the ritual sacrifice of our time
and effort because we believe: if it hurts it must be healthy.

Branches existed before git, and they were always nasty. The original rationale for branches
is two-fold. First, git's predecessors (like Subversion) were thin-client, fat-server. This meant
you made commits straight to the main repository. No ifs or buts. This made it impossible to
work on stuff in isolation.

Subversion's branches were like fast copies, within a repository. Let's say I wanted to work
on /stuff/main/czmq for a week. I could branch it to /stuff/branches/pieter/experiment/czmq/
and then work there.

So branches provide isolation for large and disruptive sets of changes. In 2005 or so, git
came along. Git reinvented the concept of repository, making it small and lightweight. Every
clone provides automatic and perfect isolation. I just clone a repository somewhere, and
work there.

Git clones do isolation far better than branches do. Each clone is a full copy. Git offers a
clear model for moving changes between clones (push, pull). It is consistent and simple and
hard to get wrong.


Yet we learned the "branch" concept, and we got used to it. And thus we got branches in git
on top of its per-repository isolation.

Branches make a true mess of things.

Look at any on-line guide to git branches, and you will see the familiar "cliff of insanity." The
text starts with simple concepts like commits. About halfway through it begins to explain git's
internal model. By the end you feel like you are reading an alien language.

Poor git. Not a single one of its commands seems intact. Here's a piece of the git-log help
page:

--first-parent -- Follow only the first parent commit upon seeing a merge commit. This
option can give a better overview when viewing the evolution of a particular topic
branch, because merges into a topic branch tend to be only about adjusting to
upstream from time to time, and this option allows you to ignore the individual commits
brought in to your history by such a merge.

Such a mass of synthetic concepts.

Git has over 150 commands according to git help -a. If we stop using branches, we
need only a dozen commands. I've explained these in this chapter. They are: add, clean,
clone, commit, init, log, push, reflog, remote, reset, revert, stash.

Most shared repositories use long-lasting branches. We have rituals on how to use these.
Vincent Driessen's git-flow may be the best known. It has master and develop "base
branches." It has feature branches, release branches, hotfix branches, and support
branches. Many teams have adopted git-flow, which even has git extensions to support it.

This complexity demands training and consensus. It lets every project be different. It keeps
newcomers away. It pulls power towards central repositories, and then towards their
administrators. That is ironic, when git is about decentralization.

The cost of this complexity might be acceptable if it solves a worse problem. Yet it doesn't.

The second reason for branches is the assumption we need to make large and disruptive
sets of changes. It has always been a poor way of working. Already in 1999 my team built
large apps through continuous integration and delivery. Every change was small, and done
straight onto the master.

For isolation, we broke large projects into pieces. We defined formal APIs and protocols
between the pieces. We had no more than one or two people working on a piece. It worked
without flaw.


We have learned a lot in the last 20 years about how to work together. This knowledge is
difficult for many to accept. It often goes against our fundamental assumptions and
teachings. And yet the data is there. Let me state some key emergent truths:

Good software is the product of collaboration between many people, rather than
individual brilliance or corporate muscle. You are on the Internet, built by collaboration
between millions of people. The many well-funded competitors to this public project are
all dead.

Software is a set of answers to problems we do not know, yet. This is a profound truth.
We believed, decades ago, that software was a finite problem. Today we know it is
infinite. This means software is never finished. Individual projects can start, and end.
What we produce together is an ever-growing and decentralized economy of small
pieces.

The key to such an economy is contracts. It is what underpins the Internet's success.
Contracts permit arbitrary change in one part of a system without breaking other parts.

The most successful software economies are learning machines. Not only do they grow
by learning, they do this at a measurable speed. That speed depends on how long it
takes for projects to grow and adapt to new needs. I call this the "change latency."
Change latency is the sum of many costs. Some are inevitable (it takes time to think).
Some are avoidable (waiting for approval to think).

The key latency cost is time-to-failure. We're taught that success is good, failure is bad.
Yet success teaches us nothing. Software is the accumulation of thousands of theories.
How many of those have we actually tested against reality? Failure is the only basis for
science.

What this comes down to is a subtle yet critical lesson about thinking and learning. Every
change is a theory. Most theories are wrong. Anything that lets us disprove bad theories
faster will produce better software. And we'll get that software with less cost, and faster.
Anything that makes it slower or harder to disprove changes will produce worse software.

So now we come to a theory of change. This underpins my whole approach to software
development:

Anyone should be free to make a change, based on their perception of costs and
benefits. (So, open source with an explicit right of participation.)

Changes should be small, focused, and argued according to the problem they claim to
solve. (So, one commit per change.)

Changes should be safe: no change should ever damage working parts of the whole
system. (So, contracts between layers.)


Changes should enter public view ("reality") and use as soon as possible. (So, rapid
and unhindered merging to master.)

Anyone should be free to challenge, undo, or improve a change, at no significant cost.
(So, open source with an explicit right of participation.)

Branches encourage an opposing theory of change:

Only qualified people can make changes, based on careful analysis.

Changes may be broad and deep, especially within large systems without internal
contracts.

Changes should stay away from public view until well-tested (using artificial test code).

We assume changes to be valid, if they pass their (artificial) tests. The cost of arguing
or removing a change can be significant.

Banning branches forces us to work in smarter ways. There are no guarantees of success.
Yet the economics will push us to smaller, safer, cheaper commits. This produces better
software.

If you have worked this way, you will appreciate it. If you are working the old fashioned way,
my arguments will confuse and annoy you. All I can say is, give it a shot in your next project.
If it fails, let me know where and how.

Conclusions
Sorry for all the git. It was inevitable, as this tool has become so central to our work and
identity as developers. In the next chapter I'll start to explore what Scalable C actually looks
like.


Chapter 2. The Scalable C Language


In this chapter we'll look at the C style we use in Scalable C. Like all styles it's a mix of taste
and pragmatism. I'll explain this using the problem-solution approach. This lets you critique
our decisions, and improve on our answers.

Problem: blank page syndrome


C has few abstractions. It's a blank page language: you can write code in any shape and
form. Yet this creates many problems. The worst problem is that every developer does it
their own way. Every project is unique. Often, even inside a project there is little or no
consistency.

The economics work against creating new projects. It is cheaper and easier to extend
existing ones, as you can use the work already done. This is a Bad Thing. Creating new
projects must cost nothing. This frees us to experiment, reshape, copy, and learn.

One reason git beat old Subversion hands down is that it erased the cost of creating a code
repository. In the Old Times, creating a repository and setting it up for remote access was
days of work. In my firm we could only afford one repository (we were poor if not humble). All
projects sat inside that.

To fix blank page syndrome, we look at C projects and we realize, they could all look much
the same. Sure, they all look different today. Yet that's just historical accident. With a little
care and design we can model them all around the same template. Then, to create a new
project, we just grab an empty template.

Solution: use a standard project template.

We already saw the basics for that:

One project = one git source code repository. Is that obvious? It wasn't, a few years
ago.

Each project has a unique name. The name space is GitHub, though it can be a given
language community. I doubt that Java developers care what names Perl projects use.

Problem: how do I explain my project to others?


You could hire a designer, and build a beautiful web site. Yet the essence of "scalable" is So
Cheap It Costs You Nothing To Fail. Hand-crafted web sites aren't scalable.

GitHub to the rescue: stick a README in your project root, and it appears on your project's
home screen.

Solution: write a minimal README.

You'll want to use README.md, which uses Markdown formatting. Your README has to
explain at least:

The goal of the project (or better, the broad problem it aims to solve).

The license (under what terms people can use, distribute, and remix the code).

The contribution policy (how people can contribute their patches).

And then if you like:

A style guide (what the code should look like).

How to use the project's tools and APIs.

Problem: my public project has no license


Many public projects on GitHub don't use a license. Don't follow their example. Without a
license, others cannot use, distribute, or remix your code. It doesn't matter that you've
published it. If your code has no license, only uninformed people use it, or send you patches.
The failure to license code the right way can kill a project.

For reasons I'll explain in the dialectics, I recommend the Mozilla Public License version 2.0
(MPLv2) for public works.

Solution: use the Mozilla Public License version 2.0.

Copy the whole license text into a file called LICENSE. Put this into the root directory of your
project. Then, add the following blurb to the header of every source file:

This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

Remember this lesson:

"Most people do X" is not a recipe for success.


Problem: how do I manage copyrights?


I'll assume you are making public software, and you accepted my recommendation to use
the MPLv2. We now come to the question of ownership. The copyright to any non-trivial
work (thus, ownership of code) lies with its author, a person or business. By default, no-one
can use or distribute the work without the owner's OK.

A license grants others the rights to use, remix, and distribute the work under certain
conditions. It is like putting up a sign saying, "You may walk on my lawn if you don't damage
it."

Asking contributors to give copyrights to a project is clumsy and ponderous. It is simpler to
have them license their contributions under the project license. This creates a collective work
owned by many people, under a single license. If you use the MPLv2 and the GitHub fork
and merge model, then patches are by default also licensed under MPLv2.

Thus, you can merge them without asking the contributor for a license grant, and without
risk.

You do need to watch out for "unsafe" patches. This means, ones that change the project
LICENSE or the blurb in any source, or which add sources with new blurbs.

Solution: everyone keeps ownership of their own copyrights.

A key side-effect of this arrangement is that it is expensive to change the license on an
existing work with many owners. You need explicit permission from every contributor. Or,
you must rewrite or remove their patches. This side-effect is often desirable, as it is a poison
pill against hostile takeover.

Problem: how do I manage contributions?


You need a way to collect patches and merge them onto master. Some projects use email
lists. Some projects have maintainers who pick patches, review them, merge the ones they
like.

You need to avoid commits straight to master, as these are silent. It is more fun to have a
ping-pong between the person who wrote a patch and another human: a nominal maintainer.

My pattern for success is to get "pull requests" onto master, then to merge them as fast as
possible. One can discuss them after merging.

Solution: use pull requests and merge with haste.


I'll explain the "with haste" part in the dialectics of this chapter. There are a few rules:

You never merge your own pull requests. Every project needs at least two minds.

It is better to make a new pull request with changes, than to discuss a commit. The
former creates a team; the latter creates an argument.

Continuous integration testing (CI) is a Good Idea yet it's not essential. Errors are an
opportunity for others to get involved.

The only good reason to refuse a change is, "the author is a bad actor and we banned
them."

Remember this lesson:

People are more important than code.

Problem: how do I keep a consistent code style?

It is painful to read code that has no style. A good project looks like it has a single author.
Consistency is gold. Yet every contributor comes with their own habits.

A solution some projects use is to clean up code using a code beautifier. This does create a
consistent style. Yet that does more harm than good, in my experience. It turns out that
"cannot respect project style" is key data for detecting bad actors. It's a specific case of their
general disrespect for social norms and rules.

Thus it is better to document the project's style, and ask people to respect it. Some won't;
then you fix their patches, and they should learn. If they don't, over time, you start to build
a case for banning them.

Solution: use a style guide document.

You should be totalitarian about style. Every space and dot matters. Compare these two
fragments of C:

int i;
for( i=0 ; i<10; i++ )
{
printf ("%d\n", i);
}

and


int counter;
for (counter = 0; counter < 10; counter++)
printf ("%d\n", counter);

Remember this lesson:

Consistency matters.

I think there are some basic rules, such as using whitespace and punctuation as we do in
English. Code should be compact as screen space is always precious, yet not cryptic. It
makes no sense to use short variable names like 'i' and then put { on a line by itself. I'll come
to the specifics of a Good Style for C as we continue.

Problem: where do I put my sources?


Finally, a non-contentious problem.

Solution: put headers into include, and sources into src.

If we have private headers (that only sources in this project use), place them in src as well.
This way, include contains our public API.

Problem: how do I organize my code?


Even a C application (a command-line tool, perhaps) needs some internal structure. Some
tools exist as massive single C files. It's not a good way to work. It is far better to build up
libraries, which the final application uses.

For example, I've written a messaging broker called Malamute. It's a C application. Here is
the command line malamute.c tool (stripped down to show the essence):

#include <malamute.h>

int main (void)
{
    ...
    zactor_t *server = zactor_new (mlm_server, "Malamute");
    ...
    zactor_destroy (&server);
    return 0;
}


All the actual server code is in a class called mlm_server. The command line tool parses
arguments, mucks about with configs, then starts the server. It runs until interrupted, then
destroys the server (ending it).

This is a clean and powerful way to write services and other code. In fact, it suits all C code
except the thin user interface.

Solution: organize your code into classes.

Remember this lesson:

Everything is a class. You can definitely make singleton methods (which do not work on
a specific instance).

By freaky coincidence, we called the style guide for Scalable C "CLASS." What can I say...
acronyms came back into fashion around 2001.

Problem: what compilers can I rely on?


In general, every C compiler worth using will support the C99 standard. We use two specific
C99 features a lot: in-line declarations and in-line comments.

So we can write this:

// Declare and initialize list in one step
zlist_t *list = zlist_new ();

Instead of the old C89 style:

/* All declarations at start of function */
zlist_t *list;
...
/* Code starts after all declarations */
list = zlist_new ();

On Windows, Microsoft never got around to upgrading their C compiler to C99, so we have
to use the misnamed "Visual" C++. Luckily C++ is almost a pure superset of C99. (Some
unkind folks say that the C99 committee stole the few bits of C++ that weren't utter mind
rotting garbage. That seems unfair. "Stole" is such a harsh word.)

Solution: use C99 on real operating systems, and C++ on Windows.

And further, only use C99 syntax that is a pure subset of C++. Otherwise, no portability. It is
rather useful to be able to use C++ compilers to build your projects.
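As a concrete example of staying inside the C++-compatible subset: C converts void * implicitly, while C++ does not, so cast the result of malloc. This sketch (the dup_string helper is invented for illustration) compiles under both compilers:

```c
#include <stdlib.h>
#include <string.h>

//  Illustrative helper, valid as both C99 and C++
char *
dup_string (const char *source)
{
    size_t size = strlen (source) + 1;
    //  In plain C, 'malloc (size)' without a cast would compile;
    //  C++ refuses the implicit void * conversion, so we cast.
    char *copy = (char *) malloc (size);
    memcpy (copy, source, size);
    return copy;
}
```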


Remember this lesson:

Don't use C++ keywords like interface as variables.

Problem: how do I name my source files?


Let me ask you a question. Imagine I show you this code:

zactor_t *server = zactor_new (mlm_server, "Malamute");

Better still, don't imagine it, since I just showed you the code. Twice, since you weren't
paying attention the first time. Where would you expect to find the method called
zactor_new?

The best solutions to problems are the most obvious ones, if they work. This takes out the
guesswork. The most obvious place to find this method is in a file called src/zactor.c. It
would be bizarre to put every method into its own source file. It would be silly to put more
than one public class into one source file. (While it is obvious to put private classes into the
source file that uses them.)

Solution: use the class name as the source file name.

So for a class called zactor we want src/zactor.c with the code, and
include/zactor.h with the public API. That is, function prototypes, typedefs, and
constants.

Remember this lesson:

Be fanatic about consistency. Your users will love you for surprising them in nice ways
only.

Problem: I need to name my classes


Naming is like all hard problems: break it down, and it becomes easy. As often, look for the
obvious and most usable answers rather than the "best" or "most consistent" answers.

A "best" name for a human is a 12-digit number that encodes their date of birth and acts like
a global roaming phone number. Yet it is neither obvious nor usable. A person needs a
unique name within their close family (a "personal name"). Then, a family name that
identifies them to strangers (a "family name"). Then, decoration to make their name unique
(middle initials, titles). Then, short names for their social networks (GitHub login).


When choosing a name, the more often we use a name, the shorter it should be. This is why
we like short personal names, and tolerate long family names. The other way around is
surprising to us.

A class needs a unique name within their library. Try to find a single word that expresses
what the class does. It then needs a family name that identifies it to strangers. We use this
family name most often of all, so it must be even shorter than the class name.

Solution: use a unique prefix for classes in a project.

You do not need global uniqueness. Somewhere out there, people may be writing C code
with the same class names. That is fine so long as your prospective users aren't pulling in
both libraries.

The prefix I used for CZMQ was "z" since this started life as a ZeroMQ wrapper, and I
wanted the shortest possible prefix. For Zyre I chose "zyre" since that is short, and unique,
and clear. For Malamute I chose "mlm" since "malamute" felt too long.

I'll use "myp" as the prefix, in example code that follows. We usually use an underscore
between the prefix and the rest of the name.

Remember this lesson:

Use simple English words for class names, then prefix them with the project prefix.

Problem: how do we invoke class methods?


C has no support for classes. So we have to invent this. People have tried various
approaches. One way is to create an object that contains pointers to functions. You might
hope to invoke methods like this:

myobject->method (arguments)

Except the method still needs the object to work with, so it looks like this:

myobject->method (myobject, arguments)

In theory you could get rid of the myobject argument. You'd need to create a structure that
holds the object reference together with each method pointer. If we were generating code,
this is how I might do it. Yet we want a design that fits our hand, and which is simple and
obvious. Code generation often adds too much of its own complexity.

Solution: construct a full method name out of project prefix, class, and method.


So we get:

myp_myclass_mymethod (myobject)

From experience, people get this style at once, and it works. It is a little more to type. Yet it
has the advantage that construction, destruction, and methods all have a consistent style.
Take a look at this fragment, without comments or explanation:

mlm_client_t *writer = mlm_client_new ();
mlm_client_set_plain_auth (writer, "writer", "secret");
mlm_client_connect (writer, "tcp://127.0.0.1:9999", 1000, "writer");
mlm_client_set_producer (writer, "weather");
mlm_client_sendx (writer, "temp.moscow", "10", NULL);
mlm_client_sendx (writer, "temp.london", "15", NULL);
mlm_client_sendx (writer, "temp.madrid", "32", NULL);
mlm_client_destroy (&writer);

Remember this lesson:

The eye likes patterns in columns. Use this to your advantage.

Problem: how do we isolate our objects?


The natural way to represent a random constructed "thing" in C is a structure. You can, as
POSIX often does, make these structures public, and document them. The problem with this
is that it creates a complex and fragile contract. What happens if the caller modifies a field?
How do you extend and evolve the structure over time?

Solution: use an opaque structure, and getter-setter methods.

C lets us make "opaque structures" which callers know nothing about except their name. In
the public header file include/myp_myclass.h, we write:

typedef struct _myp_myclass_t myp_myclass_t;

In the class source file src/myp_myclass.c we define the structure and provide methods
to work with it:


struct _myp_myclass_t {
    ...
    char *myprop;
    ...
};

// Get myprop property. Note that it's defined as 'const' so
// the caller cannot modify it.
const char *
myp_myclass_myprop (myp_myclass_t *self)
{
    assert (self);
    return self->myprop;
}

// Set myprop property
void
myp_myclass_set_myprop (myp_myclass_t *self, const char *myprop)
{
    assert (self);
    free (self->myprop);
    self->myprop = strdup (myprop);
}

Problem: how do we manage memory?


C has no garbage collection, and it's not something you can add into a language. Yet
allowing random blocks of memory and strings to float around your code is fragile. It leads to
fuzzy internal contracts, memory leaks, bugs.

After much experimentation, we learned how to hide almost all memory management inside
classes. That is:

Every class has a constructor and a destructor.

The constructor allocates the object instance.

Further methods can allocate properties and object structures (lists, and such).

When you call the destructor, it frees all memory that the class allocated.

The caller never sees this work, it hides inside the class. This means we can change it as
we like, so long as we don't change the methods (the class API).

Solution: hide all allocations inside the class.

Remember this lesson:


The power of abstraction comes from hiding irrelevant details.

Problem: how do we return freshly-allocated data?

Here is a method that returns a fresh buffer holding some content:

byte *
myp_myclass_content (size_t *content_size)
{
    ...
    *content_size = ...
    byte *content = malloc (*content_size);
    ...
    return content;
}

The author wants to return a buffer, yet also needs to return the buffer size. So, they add an
argument which is a pointer to a returned content_size.

When you call this method, it's not immediately obvious what it's doing:

size_t content_size;
byte *content = myp_myclass_content (&content_size);
...
free (content);

If we're designing from the user's perspective (always a better idea), we'd want to get a
buffer object that we could destroy. We don't need to invent a buffer type, since CZMQ gives
us a zchunk class. So, we can write:

zchunk_t *content = myp_myclass_content ();
...
zchunk_destroy (&content);

Which is rather cleaner. It is also fully abstract. Perhaps zchunk consists just of a size and
data. As it turns out, it has other, useful properties. Such as, the ability to resize chunks and
append data to them.

Solution: return objects, not blocks of memory.


The only exception that works is strings, which are a native C object. It is safe to return a
fresh string and tell the caller to free it when done. Inventing a more abstract string type is
fun, yet it breaks the standard C library. I don't recommend doing it.

Remember this lesson:

A method should return a single value, or nothing at all.

Problem: how do we pass the object to methods?

Not all methods work on objects. Some are "singletons" which just means "not a class
method but that other kind of thing we used to call a 'function' and now call 'singletons'."

Apart from singletons, all methods take an object reference. This is a pointer. It is the thing
that constructors (the _new method) return. As objects are abstract and hidden inside their
classes, we work with them only via methods. There are exceptions -- private classes -- that
I'll explain later.

In C there is no real convention for the order of arguments. The standard C library often puts
destination arguments first. This perhaps comes from right-to-left assignment. That in turn is
a hangover from assembler. MOV X, Y. A good designer aims to make the order obvious,
unsurprising. Yet that can lead to inconsistency. What's the obvious order for "plot X,Y on
map M?" Is it mylib_plot (x, y, map)?

The obvious rule when we imitate objects is to pass the object reference as first argument.
So we'd say mymap_plot (map, x, y).

Solution: pass the object reference as first argument to methods.

Remember this lesson:

Don't surprise your future self.

Problem: what do we call the object reference, in a method?

Solution: use 'self' inside methods to refer to the object reference.

Remember this lesson:

Don't use C++ keywords like 'this', as we need to be nice to C++ compilers.

Problem: how does a constructor work?


A constructor must allocate the memory for an object, and then initialize it. This is easy to do
once you've learned a few subtle and non-obvious rules:

Try to keep constructors simple, and only pass arguments if it is a natural part of the
constructor.

Use the zmalloc macro to allocate and nullify memory. It means you don't need to
initialize individual properties. This is like calloc with some extra wrapping. Take a look
at czmq_prelude.h if you want to know more.

Aim to initialize all properties to null/zero/false/empty by default. This
means choosing names with care. For example if you have an active yes/no
property, and the object starts active, then use "disabled" instead of
"active" as property name.

If your object contains large blocks of memory, do not use zmalloc as it takes more time.
Instead, use malloc and then initialize properties one by one.

If memory allocation fails, in general, give up with an assertion. In specific cases you
can hope to catch and deal with the error. Most often you can't. Too little memory is a
configuration error in most cases.

Solution: use the standard constructor style.

So let's look at the standard constructor style:

struct _myp_myclass_t {
    char *myprop;
    zlist_t *children;
};

myp_myclass_t *
myp_myclass_new (void)
{
    myp_myclass_t *self = (myp_myclass_t *) zmalloc (sizeof (myp_myclass_t));
    assert (self);
    self->children = zlist_new ();
    return self;
}

Note how the code casts the result of zmalloc. We need this on Windows to keep
the C++ compiler happy.

Problem: how does a destructor work?

A destructor does the opposite of the constructor. That's a comfortable statement, isn't it?

Yet it's not obvious. The biggest gotcha with destructors in C is how to make them
idempotent. It is something the standard C library got wrong. Let me show you:

byte *buffer = malloc (100);
free (buffer);
...
free (buffer);

Wham! You have corrupted the heap. What happens next is anyone's guess. The standard
advice is to add buffer = NULL; after the free. Yet if a developer is weak enough to lose
track of their pointers, will they remember to nullify them? No, they won't.

We need a style that removes the guess work. It's easy and it works well. My
team invented this (as far as I know) in 2006, as part of another
object-oriented C dialect that was the platform for OpenAMQ:

safe_free (&buffer);

Solution: pass a pointer to the object reference, so the destructor can nullify it.

This gives us the following destructor template:

void
myp_myclass_destroy (myp_myclass_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        myp_myclass_t *self = *self_p;
        zlist_destroy (&self->children);
        free (self);
        *self_p = NULL;
    }
}
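
To see why this matters, here is the same pattern on a stripped-down
hypothetical class. With it, destroying the same object twice is harmless:

```c
#include <assert.h>
#include <stdlib.h>

//  Hypothetical minimal class, just to demonstrate idempotent destroy
typedef struct {
    int value;
} myobj_t;

static myobj_t *
myobj_new (void)
{
    myobj_t *self = (myobj_t *) calloc (1, sizeof (myobj_t));
    assert (self);
    return self;
}

static void
myobj_destroy (myobj_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        free (*self_p);
        *self_p = NULL;         //  Nullify the caller's reference
    }
}
```

Because the first destroy nullifies the caller's pointer, a second destroy
sees NULL and does nothing, instead of corrupting the heap.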

Remember this lesson:

If you see '&' before an argument, that means "destructive".

The normal use for '&' is to return values by reference. That is a bad idea in most cases, as
I'll explain later.

Problem: how do we deal with exceptions?

Speaking of exhaustion, let's discuss what we do when things don't work as planned.
Classic C error handling assumes we're tired/dumb enough to make silly requests, yet smart
enough to handle complex responses. I've used plenty of systems that returned dozens of
different error codes. It becomes a leaky and fuzzy contract.

The theory that rich exception handling makes the world a better place is widespread. It's a
bogus theory, in my experience. Simplicity is always better than complexity.

To get to specific answers, we must untangle the different kinds of failure in software. We
can then deal with them one-by-one.

Solution: use simple, foolproof exception handling.

Let's break down the kinds of exceptions we tend to hit, and solve each one in the simplest
way.

Problem: nothing to report


In a real time system, "nothing" is such a common case that it's not exceptional. The
simplest solution is to return "nothing" to the caller. If there are different kinds of "nothing"
that we must distinguish, turn these into meaningful pieces of the API.

While you may feel compelled to tell the caller why nothing happened ("timeout error!"), this
is like talking to strangers about your private life. It's what you don't say that lets people
respect you.

Solution: return NULL or zero.

Examples:

Return next item on list, or NULL if there are no more.


Return next message received, or NULL if there is none.
Return number of network interfaces, or zero if there is no networking.

When you do this well, your API fits like a soft glove. For instance, imagine these two
methods for iterating through the users in a group:

myp_user_t *myp_group_first (myp_group_t *group);
myp_user_t *myp_group_next (myp_group_t *group);

Here is how I print the names of each user in a group:

myp_user_t *user = myp_group_first (group);
while (user) {
    printf ("%s\n", myp_user_name (user));
    user = myp_group_next (group);
}

Which is tidy, safe and hard to get wrong.

Remember this lesson:

Design your API so that it's a pleasure to use.

Problem: caller passed us garbage


Library authors (as we strive to be, when we write C) get this a lot. Things crash with weird
errors. It's always our fault. We hunt and dig, and finally we discover the cause. The calling
code, our dear users, passed us garbage. We didn't check it, and our own state got
corrupted.

Even the standard C libraries have this problem. What does code do, if you call free ()
twice on the same pointer? The results are not defined. It may do nothing. It may crash
immediately. It may run a while, then start to do strange stuff.

Passing garbage to library functions is a common mistake, especially with beginners. There
are three things you should aim to do, as library author:

Design your APIs to remove the potential for obvious mistakes.

Be cynical about what people give you, and use techniques to detect mistakes.

When you detect a mistake in your calling code, assert immediately and without pity.

Solution: detect garbage, then fail fast.

I've explained our destructor pattern, and how we nullify the caller's reference. This fixes the
common mistake of trying to work with a destroyed object. Code can still do that, and it will
pass NULL to a method.

It is trivial and costs nothing to check for NULL, so you will see this in all well-written
methods:

void *
myp_myclass_mymethod (myp_myclass_t *self)
{
    assert (self);
    ...
}

Since we use strong types, it is hard to pass random data to a method. One must do extra
work like adding a cast. That excludes innocent mistakes.

Why assert, instead of returning an error code? There are a few good reasons:

If a developer is making such mistakes, they won't be capable of handling errors.

If the code is faulty, it is irresponsible to continue running it. Bad Things can happen.

The fastest way to fix the problem is to assert and tell the developer exactly when it
broke.

An assert that creates a core dump and call stack gives a developer the means to fix
common mistakes.

Remember this lesson:

Developers make mistakes. You cannot expect perfection. Asserts are a good teacher.

Problem: the outside world passed us garbage


We assert when calling code makes mistakes so that production code should always work.
Do not assert when the outside world gets it wrong.

Here's an example to illustrate. We're writing an HTTP server. It has a
routine to parse an HTTP request and return us all the values in a neat hash
table. Now, the outside world (arbitrary browsers) can and will often send us
garbage. Our parsing routine must never crash. Rather, it should treat garbage
recognition as its main job.

If little Bobby Tables taught us anything, it is that all data received from the outside world is
toxic garbage until proven otherwise. Any fool can write a parser for correct input. The real
art in parser writing is to deal with garbage.

Solution: treat garbage as the problem to solve.

To deal with garbage input depends on how well you know the culprit:

When you get garbage from total strangers on the Internet, you discard it.

When you get garbage from your dear users, you try to tell them what they did wrong.
Then you discard it.

So in the second case we return an "invalid" response to the caller, and provide the details
via some other means. Here is how I'd design this for a HTTP parser:

// http_client_t holds a connection to a remote web browser
// client is an instance of that class
http_request_t *request = http_client_parse (client);
if (request) {
    ... start to process the request
}
else {
    zsys_debug ("invalid HTTP request from %s: %s",
        http_client_address (client),
        http_client_parse_error (client));
    http_client_destroy (&client);
}

Remember this lesson:

Some garbage is malicious, and some is just ignorant.

Problem: bad input caused my code to crash


The security industry calls such vulnerabilities "lunch." Don't feed the security industry.

Solution: be paranoid about foreign data.

There are a few basic rules to observe:

Always treat compiler warnings as fatal. Modern C compilers do a good job of telling
you if your code looks like it is doing stupid things. Listen to the compiler.

Don't assign void pointers to typed pointers without a cast. Dereferencing the wrong
pointer type will cause trouble. The cast is optional in C99, yet it forces you to double-
check your code. C++ (as on Windows) insists on the cast.

Do compile your code on different platforms, often. Different compilers catch different
mistakes.

Always use return in non-void functions (and never do this in void functions).

Never use a variable as a format string in printf-style calls. It invites disaster. A good
compiler will complain if you try to do this.

When you read input from the network, assume the sender is a malicious psychopath. If
the input is too long, chop it and throw away the excess.

Learn which system calls are unsafe. Like gets () for example. Again, good compilers
will warn you. Use 'man' to learn about library calls.
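
As one concrete sketch of the "chop overlong input" rule (the function and
names here are mine, invented for illustration): read a line of untrusted
input into a bounded buffer with fgets (), which never overruns, then discard
any excess:

```c
#include <stdio.h>
#include <string.h>

#define INPUT_MAX 128

//  Read one line of untrusted input into a bounded buffer. fgets () never
//  writes past the buffer; if the line is too long, we keep the first
//  INPUT_MAX - 1 characters and throw away the rest.
static void
read_line_bounded (FILE *input, char *buffer)
{
    if (fgets (buffer, INPUT_MAX, input)) {
        char *newline = strchr (buffer, '\n');
        if (newline)
            *newline = '\0';    //  Normal case: whole line fits
        else {
            //  Line was too long: discard the excess up to end of line
            int excess;
            while ((excess = fgetc (input)) != EOF && excess != '\n')
                ;
        }
    }
    else
        buffer [0] = '\0';      //  EOF or error: return empty string
}
```

The caller always gets a NUL-terminated string of bounded size, no matter
what a malicious sender pushes at us.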

Problem: our own state is garbage


As well as checking for caller mistakes, we use asserts to check internal consistency. After
all, we also make errors in our code, at a constant rate. These often show up as data with
impossible values.

Solution: use asserts to catch impossible conditions.

Some people may complain that a library filled with assert statements is untrustworthy.
Ignore such people. They are poor contributors, and worse clients. The truth is that a C
library which does not use assertions to self-check is unreliable.

Remember this lesson:

The faster you fail, the faster you can recover.

When you use assertions, do no work in an assertion (a so-called "side-effect"). Naive users
looking for a cheap yet meaningless kick may remove assertions. Any side-effects also
disappear. This is an example of what not to do:

// This is unsafe as the whole assert () may disappear
// if the user is foolish
assert (myp_myclass_dowork (thing) != -1);

Problem: a library misbehaved


A working piece of code can stop working for the stupidest reasons. One classic cause is
when a sub-library changes its behavior. ZeroMQ used to be guilty of this until we banned
such changes. (Changing a version number doesn't help applications that break.)

The user can't do much except complain and report an error message to the developers.
Then the wailing and gnashing of teeth begins. After a while, maybe, there is a new release
that works again.

Solution: if components don't behave as documented, assert.

Remember this lesson:

Make sure you blame the library in question, in any error message.

Problem: system ran out of resources

This is I think the hardest problem to handle. Most developers are not aware of the specific
limits of every operating system. On OS/X there is a default limit of 255 sockets per process.
A busy server will soon run out.

In theory a server can adapt its behavior to the capabilities of the system. Yet in practice that
is close to impossible. Even if your code handles "out of memory" failures, modern systems
use virtual memory. Long before malloc calls start to fail, your program is thrashing in and
out of swap.

Trying to recover from resource exhaustion makes code more complex. That makes it more
fragile, and more likely to have hidden errors. This is not a good path towards stable, long-
running code.

Solution: if you do run out of memory, assert.

There are several winning strategies to deal with resource exhaustion:

Print a helpful error message, then assert. This forces someone to re-tune the system.

Preallocate all resources (sockets, memory, threads) in a pool, then work only from that
pool.

Use deliberate strategies to reduce resource consumption, such as bounded queues.
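
The first strategy can be as simple as this hedged sketch (the function name
is hypothetical, not from any library):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

//  Allocate or die with a helpful message. Running out of memory is a
//  configuration problem: print a hint for whoever tunes the system,
//  then fail fast with an assert.
static void *
safe_malloc (size_t size, const char *hint)
{
    void *ptr = calloc (1, size);
    if (!ptr) {
        fprintf (stderr,
            "E: out of memory allocating %zu bytes (%s) - re-tune the system\n",
            size, hint);
        assert (ptr);
    }
    return ptr;
}
```

In the normal case this behaves exactly like calloc; in the failure case it
leaves a clear trail instead of a mystery crash further downstream.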

Remember this lesson:

When your system runs above 50% capacity, it is already overloaded. Always aim for
under 50% use of disk, memory, CPU, and network.

Problem: we need consistent return values


I've already argued against returning values via parameters. In C, functions return one thing.
Here are the rules that work best, in my experience:

Return nothing.

Return success/failure as int, with values zero and -1.

Return yes/no as bool, with values true and false (works best if the method takes the
form of a question).

Return a fresh string to the caller as char *; caller owns and must free such strings.

Return a constant string to the caller as const char *; the caller may not change or
free these.

Return an ordinal value (positions, quantities, indexes) as size_t.

Return an object property (works best if the method has the name of the property).

Return other integer values using the least surprising type.

Return a composed value (list, hash, array, buffer) as a fresh object instance. Try to
avoid returning composed values that the user may not change, as this is asking for
trouble.
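
To make these rules concrete, here is a sketch of a hypothetical class whose
methods follow them (the class and all names are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

//  A hypothetical class whose methods follow the return-value rules
typedef struct {
    char name [32];
    size_t size;
} myp_group_t;

//  Success/failure returns int, as zero or -1
static int
myp_group_rename (myp_group_t *self, const char *name)
{
    if (strlen (name) >= sizeof (self->name))
        return -1;              //  Failure
    strcpy (self->name, name);
    return 0;                   //  Success
}

//  A question returns bool
static bool
myp_group_is_empty (myp_group_t *self)
{
    return self->size == 0;
}

//  A fresh string: the caller owns it and must free it
static char *
myp_group_label (myp_group_t *self)
{
    char *label = (char *) malloc (strlen (self->name) + 1);
    assert (label);
    strcpy (label, self->name);
    return label;
}

//  A constant string: the caller may not change or free it
static const char *
myp_group_name (myp_group_t *self)
{
    return self->name;
}

//  A quantity returns size_t
static size_t
myp_group_size (myp_group_t *self)
{
    return self->size;
}
```

Each return type tells the caller, at a glance, what they may do with the
result: test it, keep it, free it, or just read it.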

Remember this lesson:

Design your APIs by using them. Be intolerant when an API is irritating.

Problem: how do I export my APIs?


After lots of writing, compiling, testing, cursing, and repeating, you get two things. One, a
"library file" that contains your precious "object code," which is the compiled version of your
source code. These terms were invented by mad scientists at IBM in the 1970s.

Libraries come in two flavors: static libmyp.a and dynamic libmyp.so on Linux. If you
are curious, use the file command to ask Linux what any given file is. Here's the kind of
fun you can have with file:

$ file /usr/local/lib/libmyp.la
/usr/local/lib/libmyp.la: libtool library file,
$ file /usr/local/lib/libmyp.a
/usr/local/lib/libmyp.a: current ar archive
$ file /usr/local/lib/libmyp.so
/usr/local/lib/libmyp.so: symbolic link to `libmyp.so.0.0.1'
$ file /usr/local/lib/libmyp.so.0.0.1
/usr/local/lib/libmyp.so.0.0.1: ELF 64-bit LSB
shared object, x86-64, version 1 (SYSV),
dynamically linked, BuildID[sha1]=007...
not stripped

I'll explain in “Packaging and Binding” how we build and install these. Don't stress, it's
simpler than you might think. (Hint: magic.)

As well as these library files, your users need header files to define prototypes for all the
methods you export.

Solution: export your API as a single public header file.

In practice we use one main header file plus one header file per class. Take a look at
/usr/local/include and you'll see what I mean. If this mass of header files distresses
you, take a pill. There is no cost. In older projects we used to generate single project header
files with all classes included inline. That turns out to be more work than it's worth.

The project header file goes into include/myproject.h. The library files will be
libmyp.something.

Your project may also produce command line tools (aka "binaries" or "mains"). You may
want to install some of these too.

Remember this lesson:

Give your users a single header file that does everything.

This means, for instance, including all dependent header files. It's just polite.
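
A sketch of what such a public project header can look like — every file and
class name here is hypothetical:

```c
//  include/myproj.h - sketch of a public project header. One #include
//  gives users the whole API, dependencies included.
#ifndef MYPROJ_H_INCLUDED
#define MYPROJ_H_INCLUDED

//  Include dependent headers first, so users don't have to know about them
#include <czmq.h>

//  Then one header per public class
#include "myp_myclass.h"
#include "myp_group.h"

#endif
```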

Problem: how do I version my API?


This is one of the harder problems to solve, and people have been gleefully solving it badly
for a long time.

Look at the Smart Peoples' Choice for Versioning, aka Semantic Versioning. It starts by
saying, "increment the major version when you make incompatible API changes." Yay,
breaking user space is legal, yay!

This teaches us an important lesson about the stupidity of smart people. Breaking user
space is not OK. It doesn't matter what numbers you stick on things. Yes, vendors do this all
the time. No, it's still not OK.

There are several difficulties in versioning an API:

Different pieces of the API evolve at different speeds. Some are stable while others are
experimental. So, sticking a single version number on the API is like giving a family of
thirteen children a single first name. It's so simple, yet so wrong.

Software versions are often a marketing tool. People like to see general progress. So,
smart projects make new releases to create buzz. It is a valid problem: no buzz, no
users. Yet it has nothing to do with API versions.

Shareable libraries, under Linux, get named with an "ABI version" which has nothing to
do with the software version. Ah, and sometimes the library version is just one digit. And
sometimes it is three digits. It depends on what distribution you use.

The science of API versioning has a way to go. I've proposed that we version individual
methods and classes using a "software bill of materials." As you'll learn later, we're
developing the tools for this.

For today, the best solution we've found is to not break APIs that people depend on.

Solution: don't break user space.

If you do need to change stable APIs, do it by adding new classes and methods, and
deprecating the old ones.

This means a new version of your library is always backwards compatible with older ones. At
least where it matters. Then, the actual numbers you use become secondary.

Remember this lesson:

Versioning is an unsolved mess.

Problem: I need to define my software version somewhere

Ignoring the ABI version (as far as we can) makes life simpler. The ABI/API problem will
come back to bite us again. One thing at a time though. It's our software version that people
care most about. We need a way to stamp this into the code.

Solution: define the version in your public header file.

Here is our standard way of doing this:

// MYPROJ version macros for compile-time API detection
#define MYPROJ_VERSION_MAJOR 1
#define MYPROJ_VERSION_MINOR 0
#define MYPROJ_VERSION_PATCH 0

#define MYPROJ_MAKE_VERSION(major, minor, patch) \
    ((major) * 10000 + (minor) * 100 + (patch))
#define MYPROJ_VERSION \
    MYPROJ_MAKE_VERSION(MYPROJ_VERSION_MAJOR, \
        MYPROJ_VERSION_MINOR, \
        MYPROJ_VERSION_PATCH)

Once we've defined it like this, we can extract the version number in build scripts, and use it
in the API.
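
For instance, calling code can check at compile time that it has a recent
enough API. Here is a sketch, repeating the macros so it stands alone:

```c
#include <assert.h>

//  The version macros, as defined in the public header
#define MYPROJ_VERSION_MAJOR 1
#define MYPROJ_VERSION_MINOR 0
#define MYPROJ_VERSION_PATCH 0
#define MYPROJ_MAKE_VERSION(major, minor, patch) \
    ((major) * 10000 + (minor) * 100 + (patch))
#define MYPROJ_VERSION \
    MYPROJ_MAKE_VERSION (MYPROJ_VERSION_MAJOR, \
        MYPROJ_VERSION_MINOR, MYPROJ_VERSION_PATCH)

//  Calling code can refuse to build against an API that is too old
#if MYPROJ_VERSION < MYPROJ_MAKE_VERSION (1, 0, 0)
#   error "This application needs MYPROJ 1.0.0 or later"
#endif
```

The single-integer encoding (major * 10000 + minor * 100 + patch) makes
version comparison a plain numeric comparison in the preprocessor.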

Remember this lesson:

Put the version number in a single place only, or you will make mistakes as you change
it.

Problem: my users demand documentation


As they should. Documentation makes or breaks a project. We all know this: shitty docs
means shitty code. Look at the code someone writes, and you get an instant "like" or
"dislike" emotion. Pay attention to this emotion! It will save you from pain, if you listen to it.

People have tried to automate API documentation using tools like doxygen. The results tend
to be mediocre. Look at CZMQ's documentation. It's far simpler and yet at once familiar.

As I keep saying, when we write C, we build APIs. That means we talk to other
programmers. The most accurate language for explaining a C API is more C. Period.

When we reach for documentation we are looking for something specific. The
documentation must give us the fastest path to this answer. No waffle or preamble.

In an ideal world, the answers lie in the source code. Reading source code is not a failure of
documentation. It is a success of style. This chapter is all about structure and readability.
The goal is to produce source code that people can enjoy reading for profit.

Code is language, and the classes and methods we write are a form of literature. I'm not
being poetic. This is key to writing systems that survive over the long term.

Solution: focus on code quality, and extract key pieces as documentation.

The key pieces we need are:

The public API for a class and method. This must show the prototype, plus a few lines
of explanation. It does not need to be pretty in the "ooh sans-serif and pastels!" sense.
In fact, if it looks like C code it's easier to read and understand.

Examples of using the API. These must be simple, reusable, and clean. Also, they must
work. That means, they must be part of the project, built and tested with classes.

External examples are also great, especially if you want to build larger teaching projects. I've
done a lot of this. Yet it comes second to API man pages. People need to learn one step at a
time.

Remember this lesson:

The best way to teach code is to show code.

Problem: how do I test my API?


When someone says "trust me, I've tested it," your natural reaction should be cynical. So
tests that are part of a project are only good up to a point. Any smart user builds their own
tests.

Yet we need to know if a patch broke something. When we work in groups, this translates to
"I trust your patch so long as it didn't break our test cases." In the ZeroMQ core library we
turned this around to encourage people to write test cases. "If you write a test case for
method X, there's less chance someone will break it in the future."

When working with others, test cases are a form of insurance. They also teach users how to
use the API. More users means extra lives. The more thousands of people use a piece of
code, the better its chances of survival.

Solution: every class has a test method.

We can then call the test methods when we do "make check" and in continuous integration
testing. This turns out to be a good place to stick our example code too.

The test method needs no error handling. If any given test fails, it asserts. This kills the crab
and makes sure someone steps up to fix things. Or not, if no-one cares. Both are valid
scenarios.
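
A sketch of such a test method, on a hypothetical minimal class (real CZMQ
test methods follow this same shape):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

//  Hypothetical minimal class with the standard test method
typedef struct {
    int value;
} myp_thing_t;

static myp_thing_t *
myp_thing_new (void)
{
    return (myp_thing_t *) calloc (1, sizeof (myp_thing_t));
}

static void
myp_thing_destroy (myp_thing_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        free (*self_p);
        *self_p = NULL;
    }
}

//  Runs from "make check"; any failure asserts, there is no error handling
static void
myp_thing_test (bool verbose)
{
    (void) verbose;             //  Not used in this sketch

    //  Simple create/destroy test, doubling as example code
    myp_thing_t *thing = myp_thing_new ();
    assert (thing);
    myp_thing_destroy (&thing);
    assert (thing == NULL);
}
```

The test body reads as a worked example of the API, which is exactly the
point: it teaches while it verifies.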

Remember this lesson:

When writing a test method, you are teaching others how to use the API. Make it
readable.

Problem: how do I actually produce the docs?


This rule applies to generated documentation: garbage in, garbage out. We still want to
generate the docs, for several reasons:

It is the safest and fastest way to produce accurate docs.

It lets us produce many targets from the same inputs.

It encourages a literate coding style.

It exposes poor code, so we can fix or remove it.

In technical terms:

We scan the class sources and headers for specific sections of code and text.

We merge these with templates to produce text files in various formats.

We call external tools like asciidoc to convert these into further formats.

We publish the results on-line, or in our git repository, or as man pages.

We use a tool called gitdown to do all this. It also produces a detailed README.md file with
class and method documentation. Install that tool, you will appreciate it, and we'll depend on
it later.

I need to explain how to tag your sources to tell gitdown what is what. Each tag sits on a
line by itself, with or without a comment:

In the class header, mark the public API with @interface, ending with @end.

In your class source, explain the class using @header to mark a summary,
@discuss for details, and @end to finish.

In the test method, mark example code with @selftest and @end.

Take a look at any CZMQ source or header to see what I mean. It looks like this (from
zuuid.h):

// @interface
// Create a new UUID object.
CZMQ_EXPORT zuuid_t *
zuuid_new (void);

// Create UUID object from supplied 16-byte value.
CZMQ_EXPORT zuuid_t *
zuuid_new_from (const byte *source);
...
// Self test of this class.
CZMQ_EXPORT void
zuuid_test (bool verbose);
// @end

And this (from zuuid.c):

@header
The zuuid class generates universally-unique IDs (UUIDs) and provides
methods for working with them. A UUID is a 16-byte blob, which we print
as 32 hex chars.
@discuss
If you build CZMQ with libuuid, on Unix/Linux, it will use that
library. On Windows it will use UuidCreate(). Otherwise it will use a
random number generator to produce convincing imitations of UUIDs.
Android has no uuid library so we always use random numbers on that
platform.
@end

And later,

// @selftest
// Simple create/destroy test
assert (ZUUID_LEN == 16);
assert (ZUUID_STR_LEN == 32);

zuuid_t *uuid = zuuid_new ();
assert (uuid);
assert (zuuid_size (uuid) == ZUUID_LEN);
assert (strlen (zuuid_str (uuid)) == ZUUID_STR_LEN);
zuuid_t *copy = zuuid_dup (uuid);
assert (streq (zuuid_str (uuid), zuuid_str (copy)));
...
zuuid_destroy (&uuid);
// @end

Remember this lesson:

Literate code is good code. This means, write the code as if you are documenting it.

Problem: I need private classes


Any realistic project needs private classes. Not every API is worth exporting, or desirable to
export. There are two main cases we need to cover:

Classes shared by other classes in the project, yet deemed too "internal" to offer to
users.

Classes used in a single source file only.

In both cases, keeping the class private lets us change it as we like.

Problem: my library has private classes

A private class can follow almost the same style as a public class, except:

Its header file should be in src and not in include.

The project header file won't include it.

So we need a second include file in src that includes all private class headers.

Solution: use two project headers, one public and one private.

In CZMQ we call these include/czmq_library.h and src/czmq_classes.h. The
project source files use the private project header. Calling applications use
the public project header.

Remember this lesson:

Your exported API is in include. All other sources go into src.

Problem: my source file has private classes


When we start to manage data structures, we often need classes to hold individual pieces. It
is simplest to write these in the source file. We can get away with less abstraction, and less
work.

We define a private class as a structure:

// This is one peer
typedef struct {
    char *name;
    char *address;
    zsock_t *sock;
} s_peer_t;

And then we write a constructor and destructor:

static s_peer_t *
s_peer_new (char *name, char *address)
{
    s_peer_t *self = (s_peer_t *) zmalloc (sizeof (s_peer_t));
    assert (self);
    self->name = strdup (name);
    assert (self->name);
    self->address = strdup (address);
    assert (self->address);
    return self;
}

static void
s_peer_destroy (s_peer_t **self_p)
{
    assert (self_p);
    s_peer_t *self = *self_p;
    if (self) {
        zstr_free (&self->name);
        zstr_free (&self->address);
        zsock_destroy (&self->sock);
        free (self);
        *self_p = NULL;
    }
}

We can write methods for this private class:

static int
s_peer_connect (s_peer_t *self)
{
    assert (self);
    self->sock = zsock_new_client (self->address);
    return self->sock? 0: -1;
}

And we can access and work with its properties without getter/setter methods:

s_peer_t *peer = s_peer_new ("server", "ipc://@/server");
s_peer_connect (peer);
zmsg_t *msg = zmsg_recv (peer->sock);
...

As the class is private, changes are low-risk. The compiler will catch errors immediately. We
stick to the constructor/destructor pattern because it hides heap access. Getters/setters are
overkill.

A few notes:

Don't use the project or class prefix in private class types, or methods. There is no
need. Use simple short names. This makes your code more readable, and shareable.

Use a prefix "s_" on private class types and methods. This is shorthand for "static"
which in C means "private" when used on functions.

Define the class and its methods at the start of your source. This removes the need to
write prototypes, which is always annoying in C.

Remember this lesson:

You can use the CLASS style even on simple in-line private classes.

Problem: is my code thread-safe?


Thread-safe code can handle calls from many threads at once without crashing. "Re-entrant"
code is a similar thing, though just within one thread. For example, an interrupt handler that
calls code that calls the same interrupt handler again.

To start with, re-entrant C code must not use static variables. Each entry to a function gets
its own stack, so local variables (held on the stack) are safe. If the function uses the heap,
and stores its references in local variables, that is also safe.

Thread-safe C code must at least be re-entrant. It then also needs rules to prohibit the
sharing of data between threads. Or, it needs mutexes around code that works on shared
state.

I've built large concurrent servers (OpenAMQ) that used mutexes to share data between
threads. Trust me when I say you don't want to use this approach. We spent as long hunting
down threading issues as we did writing the original code.

Conventional multi-threading is a nightmare. The code seems to work, then as you run it
under load, with more and more threads, it starts to crash. You cannot serialize everything,
or you might as well run on one thread.

There are nicer, smarter ways of building concurrent C architectures. In Scalable C we use
actors and messages, a design taken from Erlang and Akka. It is simple to understand, and to
use. I'll come to this later in the book.

So we make code thread-safe by banning static and global variables. And then by banning
any attempts at using shared state. That means an ironic and yet satisfying ban on mutexes.

Solution: ban static/global variables, and mutexes.

Chapter 2 - The Scalable C Language 58


Scalable C (in progress)

In Scalable C, we allocate object instances on the heap, then we store those references on
the stack. It is nice and safe. Unless two threads get hold of the same reference. Then we're
back to pain and angst.

There are some system calls that aren't thread safe. One culprit is basename. You just
need to learn these over time, and avoid them.

Don't use static variables inside functions, ever. The static here does not mean
"private," it means "unsafe."

If you need to pass data between threads, use ZeroMQ messages. Do not use shared
mutable state. Do not use locks, mutexes, and so on.

The one exception is in cross-thread layers. We do this in a few cases in CZMQ. Then
we need mutexes. I'm not going to explain how we do this. If you need it, read zsys.c.
Otherwise, please don't.

This code is re-entrant and thread-safe:

byte *myfunction (int argument)
{
    //  Each call to myfunction has its own copy and buffer
    int copy = argument * 3;
    byte *buffer = (byte *) malloc (copy);
    return buffer;
}

This code will likely crash if used from several threads:

//  The entire process shares the same 'buffer'
static byte *buffer = NULL;

byte *myfunction (int argument)
{
    //  Every call to myfunction shares the same copy and buffer
    static int copy;
    copy = argument * 3;
    buffer = (byte *) realloc (buffer, copy);
    return buffer;
}

Remember this lesson:

A Scalable C developer never shares mutable state between threads.

Problem: my code does not build on system X!


Writing portable code is like not dating crazy people. It sounds boring and pragmatic. A little
insanity is fun, no? Well, no. Pain may be educational, if you can learn to step out of the
experience. Yet if you aren't careful, it will damage you. I'm talking about the way vendors
suck you in with promises and lies, only to trap you and rip you off.

One of C's strengths is its portability, yet vendors keep pushing weird non-portable APIs. I've
been building portable libraries and tools for around 30 years. It is something of a black art,
yet all "black art" means is "lacks documentation."

The payoffs of full portability are worth gold:

You will reach a far wider market for your work, as your code will run on any platform
your clients might use.

You can work with a more diverse crowd of people, rather than appeal only to those
who use a given operating system.

Your code will survive as operating systems die, which happens many times in the life of
good code.

You can work faster and with less stress, as portable code tends to be cleaner and
simpler.

The main rules for building portable code, in any language are:

Isolate all system-specific knowledge in a single layer.

Create portable abstractions that hide system details. Write as much of these yourself
as you need to.

Ban the use of non-portable code in applications.

Solution: create a portability layer and enforce its use.

One benefit of libzmq is that it hides non-portable networking calls under a single standard
API.

CZMQ takes this a step further. It does several things for you:

It pulls in system headers so you don't need to (in include/czmq_prelude.h).

It detects the system type so your portability layer can be smart (in
include/czmq_prelude.h).

It hides differences between systems, e.g. defining macros to hide library dialects. See
include/czmq_prelude.h.

It wraps various system functions in a single API (in the zsys class).


It creates higher level abstractions for non-portable work (the zactor, zbeacon,
zclock, zdir, zfile, ziflist and zuuid classes).

It defines a set of types and macros that you can use in all code: byte, uint, streq
and strneq are the most useful ones. See include/czmq_prelude.h for details.

You should understand and follow these rules:

Only write non-portable code in private classes, so your public API is always 100%
portable.

Build and test your code on at least Linux and Windows, often, to catch portability faults.

Read and take the time to understand include/czmq_prelude.h. It will pay off.

Remember this lesson:

Don't use #ifdefs in your C code to do crazy system stuff. If you have to do this crazy
system stuff, do it in a private class and abstract it away.

Problem: what coding style do I use?


Tastes vary and style is often personal. Yet there are patterns that work well, and those that
don't. I've collected good patterns for years. What follows is my best advice for writing clear,
legible C code.

Compare this chunk of code:

if ( i==0 )
{
printf ( "succeeded" );
}
else
{
if ( i==-1 )
{
printf ( "failed" );
} else {
printf ("uncertain");
}
}

With this one:


if (status == 0)
    printf ("succeeded");
else
if (status == -1)
    printf ("failed");
else
    printf ("uncertain");

Which one is easier to understand? I find it ironic how people will use short useless names
like i and yet waste precious space with parentheses no-one cares about.

Solution: aim above all at readability, and a good signal-to-noise ratio.

Here is my list of recommendations. I'll explain my reason in each case. Often the argument
is "closer to natural language," which means less work to write, and read. This reduces
mistakes.

Do not use "magic numbers" (numeric constants) in code. Numbers say nothing and
create space for mistakes (change in one place, yet not in another). Define constants in
the project headers.

Use all uppercase for macro names, unless they act as functions, in which case use
lowercase. This tells the reader immediately when you're using a constant.

Use all lowercase for variable and function names. It is closer to natural language, and
thus easier to type and read than MixedCase.

Use underscores to separate parts of a name. Again, this is closer to natural language.

Indent four spaces per level, and do not use tabs unless the case demands it (as in
Makefiles). Tabs are a hangover from ancient computers.

Use variable names that explain themselves. Do not use names like i or p. The only
story these tell is "the author was lazy."

Fold long lines at around 80-100 characters. This ensures legibility: our eyes are good
at reading in columns and poor at reading long lines.

Do not enclose single-statement blocks in brackets. This is again for legibility. Single-
statement blocks are more common than you would think. CZMQ has 1,750 if
statements of which over 1,000 have single-statement blocks. It is worth prioritizing
these.

if (comma == NULL)
    comma = surname;


In else statements, put the else on a line by itself, and align it with the previous if.
This aligns the if keywords when selecting between choices.

if (command == CMD_HELLO)
    puts ("hello");
else
if (command == CMD_GOODBYE)
    puts ("goodbye");
else
if (command == CMD_ERROR) {
    puts ("error");
    rc = -1;
}

Use while (true), with break statements to write open-ended loops. Avoid
do..while as it's hard to write in a nice way.

while (true) {
    zmsg_t *msg = zmsg_recv (pipe);
    if (!msg)
        break;      //  Interrupted
    //  Process incoming message now
}

Use while loops with first/next tests to iterate through lists. You set up the condition,
enter the loop, and re-test the condition at the end of the block. This creates a consistent
style that is easy to write and read. Consistency means fewer errors.

//  Scan a name for commas
char *comma = strchr (surname, ',');
while (comma) {
    *comma = ' ';
    comma = strchr (surname, ',');
}

//  Iterate through a list of objects
s_peer_t *peer = (s_peer_t *) s_peer_first (peers);
while (peer) {
    //  Do something
    peer = (s_peer_t *) s_peer_next (peers);
}

Use for (index = 0; index < max; index++) to iterate through arrays. This
creates a consistent style that is easy to write and read. Your brain's pattern matching
sees this as a single pattern. Don't be cute and do more work in the for statement (like
increment other variables). All this does is interfere with that pattern matching.


for (index = 0; index < array_size; index++) {
    //  Access element [index]
    other_var++;    //  Do this in the body
}

Use blank lines between functions, and to group code into blocks of 6-8 lines if needed.
This matches the natural language pattern of a paragraph. Avoid single lines of code
surrounded by white space unless they must stand out.

Put a blank line after a single-statement if, but not after a closing brace. The brace
already provides white space, and you do not want to waste vertical space. Vertical
screen space is always precious.

Do not use extra spacing or tabs (no!) to create vertical alignment. It looks cute yet is
annoying to keep up. Train your brain to pattern match from the left, using consistent
method names.

Follow the English rules for punctuation as far as possible. This is partly to reuse our
English pattern matching, and partly for pragmatic reasons.


//  Unary operators stick to their operands
char_nbr++;

//  Binary operators have spaces before and after
comma = comma + 1;

//  ? and : stick to the left
comma = comma? comma + 1: strchr (name, '.');

//  ( ) push inwards like hands
for (char_nbr = 0; name [char_nbr]; char_nbr++)
    length++;

node = (node_t *) zmalloc (sizeof (node_t));
if (!node)
    return -1;

//  [ and ] push inwards like awkward hands
comma = &name [char_nbr];

//  { introduces a multi-statement block
//  } gets its own line for vertical alignment
if (condition) {
    do first thing
    do other thing
}

//  -> is glue that creates a longer name
self->name = strdup (name);

//  * is a unary operator so sticks right
void *reference = *self_p;

In conditional code always do the normal flow first, and exception handling last. Resist
the common pattern of checking for failure, then falling through to normal flow. It hides
the critical path from the reader.

Use return at any point to leave a function, if there is no cleanup. This is neater than
trying to collect various exit routes into a single one at the end.

Use goto to jump to the end of the function, if you have complex clean-up after an error.
You rarely see this in hand-written code, as it usually means a function is too complex. In
generated code, it's more common.

Dialectics
Choosing an Open Source License


There is a lot of debate about open source licenses. It is often uninformed, naive, and
wishful. I'm not blaming people. Copyrights and legal issues aren't fun and we all start with
happy, wrong assumptions.

If you expect people to be "ethical," you will learn disappointment. The license is a tool for
getting certain results. Don't complain if your fork can't cut the meat, or your knife stabs your
tongue. Rather, learn to use a knife and a fork.

If you use a "liberal license" (BSD or MIT/X11), do not expect people to share their forks and
patches. They may. Most will not. The license tells them they do not need to. If you depend
on reciprocity, use a share-alike license.

Solution: learn how licenses work or find someone who knows this.

There are at least five cases to choose from:

You are making private commercial software with the explicit goal of making profits. You
have no intention to build a community. You want every user to pay, in cash or credit. In
that case you use a proprietary license designed by your expensive lawyers. Contact
me if you want expensive help on that.

You are making public software, and want to benefit other public software projects. You
wish to grow a large, strong community. You have no intention of profit-taking. You
prefer to exclude private commercial software projects. In this case you use the GPLv3
license.

You are making public software with the goal of dumping your code into the market. You
have no intention of growing a community. You have no intention of profit-taking. Your
main goal is to hurt competitors. In this case you use the MIT/X11 or BSD license.

You are making public software with the explicit goal of growing a community. You wish
to see your code used as far and wide as possible. You wish to make profits. You want
businesses to use your software and become clients. You want their engineers as
contributors. You want to rope your competitors in as partners. In this case you use the
MPLv2 license.

You are making public software with the goal of huge profits. You expect the
"community" to make your software for you. You wish to see your code used
everywhere. You want to make hundreds of millions in support licenses. You want to
destroy your competitors. In this case you stop taking whatever drugs you're on, and
come back to the Real World.

How to Merge Patches


I'll contrast conventional "pessimistic merging" with "optimistic merging." My strong advice is
to merge as soon as you see a pull request, with optimism. This advice comes from
experience, not wishful thinking.

Conventional merge strategies enforce deliberate, single-threaded, slow thinking. Optimistic
merging allows more casual, concurrent, fast thinking. The results appear to be better.

Standard practice (Pessimistic Merging, or PM) is to wait until continuous integration (CI)
testing clears, then do a code review. One then tests the patch on a branch, and provides
feedback to the author. The author may fix the patch and the test/review cycle starts again.
At this stage the maintainer can (and often does) make value judgments such as "I don't like
how you do this" or "this doesn't fit with our project vision."

In the worst case, patches can wait for weeks or months before a maintainer merges them.
Or they are never accepted. Or, maintainers reject them with various excuses and
argumentation. Or, the author vanishes, leaving the maintainers with a distressing choice.

PM is how most projects work, and I believe most projects get it wrong. Let me start by
listing the problems PM creates:

It tells new contributors, "guilty until proven innocent," a negative message that creates
negative emotions. Contributors who feel unwelcome will always look for alternatives.
Driving away contributors is bad. Making slow, quiet enemies is worse.

It gives maintainers power over new contributors, which many maintainers abuse. This
abuse can be subconscious. Yet it is widespread. Most maintainers strive to remain
important in their project. If they can keep out potential competitors by delaying and
blocking their patches, they will.

It opens the door to discrimination. One can argue, a project belongs to its maintainers,
so they can choose who they want to work with. My response is: projects that are not
inclusive deserve to die, and by competition, will die.

It slows down the learning cycle. Innovation demands rapid experiment-failure-success
cycles. Someone identifies a problem or inefficiency in a product. Someone proposes a
fix. Someone else tests the fix and accepts or rejects it. We have learned something
new. The faster this cycle happens, the faster and more accurately the project can move.

It gives outsiders the chance to troll the project. It is as simple as raising an objection to
a new patch. "I don't like this code." Discussions over details can use up much more
effort than writing code. It is far cheaper to attack a patch than to make one. These
economics favor the trolls and punish the honest contributors.

It puts the burden of work on individual contributors, which is ironic and sad for open
source. We want to work together yet we're told to fix our work alone.


Now let's see how this works when we use Optimistic Merge. To start with, understand that
not all patches nor all contributors are the same. We see at least four main cases in our
open source projects:

Good contributors who know the rules and write excellent, perfect patches.

Good contributors who make mistakes, and who write useful yet broken patches.

Mediocre contributors who make patches that no-one notices or cares about.

Trollish contributors who ignore the rules, and who write toxic patches.

PM assumes all patches are toxic until proven good. Whereas in my experience, most
patches tend to be useful, and worth improving. This is easy to measure from git history. In
CZMQ's history, for instance, there are 36 reverts out of 3,200 commits. Most of these are to
fix mistakes, not bad patches.

Let's see how each scenario works, with PM and OM:

PM: depending on unspecified, arbitrary criteria, the merge may be fast, or slow. At
least sometimes, a good contributor will leave with bad feelings. OM: merges are
always fast. Good contributors feel happy and appreciated. They continue to provide
excellent patches as long as they are using the project.

PM: contributor retreats, fixes patch, comes back somewhat humiliated. OM: second
contributor joins in to help first fix their patch. We get a short, happy patch party. New
contributor now has a coach and friend in the project.

PM: we get a flamewar and everyone wonders why the community is so hostile. OM:
the mediocre contributor is largely ignored. If patch needs fixing, it'll happen rapidly.
Contributor loses interest and eventually the patch is reverted.

PM: we get a flamewar which troll wins by sheer force of argument. Community
explodes in fight-or-flee emotions. Bad patches get pushed through. OM: existing
contributor immediately reverts the patch. There is no discussion. Troll may try again,
and eventually may be banned. Toxic patches remain in git history forever.

In each case, OM has a better outcome than PM.

In the majority case (patches that need further work), Optimistic Merge creates the
conditions for mentoring and coaching. And indeed this is what we see in ZeroMQ projects,
and is one of the reasons they are such fun to work on.

For more details, read ZeroMQ RFC 22, C4.1: the Collective Code Construction Contract.

Conclusions

If you read this chapter you are now familiar with the structure and style of a Scalable C
project. Much of the work we do here has been automated. In the next chapter I'll explain the
tool responsible, zproject. Learn this tool, for it is your sorcerer's apprentice.


Chapter 3. Packaging and Binding


In the last chapter I explained a lot of rules and conventions for writing a Scalable C project.
It looks like a lot to remember. The good news is that if we are consistent, it pays off. For
example if we always put our sources into src and our headers into include, it is easier to
reuse build scripts between projects.

Speaking of build scripts...

Problem: Infinite Sucking Pits of Darkness


I'm speaking of Makefiles. Wikipedia tells us Make was invented by Bell Labs in 1976.
Wikipedia lies! The real truth is that Makefiles are digital demon devisements from the
darkest depths of Dis. Some say Bell Labs was the portal through which they clawed their
way into our innocent world. We still don't know their true purpose. All we know is, they are
eternal and cannot be killed. And we know the dying sound our soul makes as it leaves our
body.

Makefiles spawned an entire legion of descendant demons called the autotools. There
are said to be ancient scrolls that provide the incantations to tame autotools demons. A
piece of one landed on my desk. This is what it said:

# Resolve issue #355, "client wants to replace me"
AC_ARG_WITH([pkgconfigdir],
    AS_HELP_STRING([--with-pkgconfigdir=PATH],
        [Path to the pkgconfig directory [[LIBDIR/pkgconfig]]]),
    [pkgconfigdir="$withval"],
    [pkgconfigdir='${libdir}/pkgconfig'])
AC_SUBST([pkgconfigdir])

Makefiles, and build systems that laughingly call themselves "makefile generators" (as if that
made things simpler), are inevitable. There is no escape to an alternate universe if you are
writing C code. Oh, please, someone tell me how "you need make to reduce build times." I
need a good laugh while Travis CI trundles through a fifteen-minute build, every time I push
a commit to GitHub.

That being said...

Solution: make it someone else's problem.


Happily, this solution actually worked. It comes as close to killing makefiles as possible, after
about 25 years of research. As often, the brilliance and genius comes from the collective
mind.

What my team, at iMatix did, many years ago, was to build a way to generate code from high
level models. We used this a lot and got good at it. Our gsl language makes it possible to
develop DSLs (domain specific languages) quickly, and then build backends that turn these
DSLs into code.

What the ZeroMQ community did, over about two years, was build a DSL for packaging, and
write dozens of backends for it. This tool is called zproject, and it is what I'll explain in this
chapter.

Remember this lesson:

Never give up. If you wait long enough, the ZeroMQ community may solve your
problem for you.

Problem: I don't got zproject


Solution: get it from GitHub.

# Install gsl first
git clone https://github.com/imatix/gsl
cd gsl/src
make -j 4 && sudo make install
cd ../..
# Install zproject
git clone https://github.com/zeromq/zproject
cd zproject
./autogen.sh && ./configure
make && make install

Remember this lesson:

Once you go master, you never go back.

Problem: we need an example


The fastest way to learn any new tool is by example. Let's make a minimal project by hand,
then apply magic. Our project is called "Global Domination." Right now version 0.1 is small
and modest. It is a skeleton project that fits the rules of Chapter 2, and does nothing more.

Solution: get the code from GitHub.


git clone https://github.com/scalable-c/globdom
cd globdom
git reset --hard version-0.1
cat README.md

The minimal project contains one empty class, and supporting files:

LICENSE -- MPLv2 license text
README.md -- this file
include/globaldom.h -- project public header
include/gdom_server.h -- Global Domination server API
src/gdom_server.c -- Global Domination server
src/gdom_classes.h -- project private header
src/gdom_selftest.c -- project selftest tool
build.sh -- build and test Global Domination
.gitignore -- tell git what files to ignore

To build and test GlobDom 0.1, run build.sh.

Remember this lesson:

Learn the basics of Bash, it will save your life many times.

Problem: people expect "./configure && make"


It is possible, and I've done this on real projects, to work without Makefiles. You compile
stuff, chuck it into libraries, and link your executables. Yet it bounces off the wall of
expectations. Also, the endless weirdness of the real world. Any real build process gets
complex. And so people turn to Makefile generators, much like the victim of a street mugging
turning to Somali warlords for help.

Let me be frank, for a change. I do not like the GNU build system, even after mastering it. It
uses a flat yet vast macro language to generate Makefiles by sheer brute force. It may be
powerful, yet so are the technicals those Somali warlords like to drive to work. The only good
thing about autotools is that if (and this is a large if) you can master it, or find someone
who's done this, then it is solid.

Happily we figured out how to use autotools' considerable power from a position of blissful
ignorance. Let me show you how.

Solution: tell zproject what your project looks like.

First, create a file project.xml in the globdom root directory, like this:


<project
    name = "Global Domination"
    script = "zproject.gsl"
    prefix = "gdom" >
    <use project = "czmq" />
    <class name = "gdom_server" />
</project>

This should be self-explanatory. If you've heard bad things about XML, or been traumatized
by it in the past, my sympathies. Give some people a set of hammers, and they think they're
rock star drummers. XML is not a programming language. It is however great for writing
models to generate code from. You'll learn the fun and profit in this.

Second, run the gsl command to build the project model:

globdom> gsl project.xml
GSL/4.1c Copyright (c) 1996-2016 iMatix Corporation
gsl/4 I: Processing project.xml...
gsl/4 M: Building GNU build system (autotools)
gsl/4 M: Building CMake build system (cmake)

And now, the "configure/make" thing works. Run "./autogen.sh" first, as that produces the
configure script:

./autogen.sh
./configure
make -j 4
make check

Remember this lesson:

If everyone expects cake, give them cake.

Problem: sorry, I meant "cmake..."


No problems. As you saw, zproject supports both CMake and the GNU build system. So:

cmake .
make -j 4
make test

Solution: zproject targets both autotools and CMake.


If autotools are the Somali warlords of build systems, then CMake is the Texas politician who
promises wealth and power. CMake still wants your soul, yet it has much more charisma.
"I can take care of Visual Studio for you!" it says, smiling, playing off our inner fears.

My main gripe with CMake is that it is just a better build scripting system. It doesn't change
the basic fact that I don't want to write build scripts because they're always doing the same
bloody work!

Solution: don't script when you can model.

With zproject we don't write scripts. Instead we document our projects as abstract XML
models. Then, we can run arbitrary backends on this model, each doing what it must. It is a
profound and valuable shift.

Just to finish my complaints about CMake, it lacks a "clean" command. And since it leaves
trash lying all over the place, and since that trash really gets in the way sometimes, this is
bad. The solution people use is to build in a temporary directory. It isn't great.

Still, since we get CMake support for free, why complain.

Remember this lesson:

Even if you hate a particular build system, someone out there is addicted to it.

Problem: what do I add to git?


Ah, our directory is now a mess of different files. We have some made by hand, some
produced by autotools, some by CMake, and some by the compiler and linker.

We cannot add everything to git, because many of these files change every time we
compile, and do not belong in the repository. Yet we need the basic build scripts in git,
otherwise no-one will know how to use our code.

Solution: put the output of zproject in git.

Let's rewind. First, save project.xml. Then reset the clock using git clean. Then run
zproject again:

mv project.xml ..
git clean -d -f -x
mv ../project.xml .
gsl project.xml

Let's look at what zproject actually generated for us. Run git status to see all new and
changed files:


globdom> git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   src/gdom_classes.h
        modified:   src/gdom_selftest.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        CMakeLists.txt
        Findczmq.cmake
        Findlibsodium.cmake
        Findlibzmq.cmake
        Makefile.am
        autogen.sh
        configure.ac
        doc/
        include/gdom_library.h
        include/global_domination.h
        project.xml
        src/.valgrind.supp
        src/Makemodule.am
        src/libgdom.pc.in
        version.sh

Two of our hand-written files got smashed by generated versions. That's intentional. We
won't modify these by hand ever again, as they track the project. Then we got a lot of new
files, for autotools and for CMake. And then we got a project header called
include/global_domination.h. Cute! But useless! Let's tolerate that for now, and fix it
later.

Add these files to git and commit:

git add .
git commit -m "Problem: git repo doesn't contain build scripts

Solution: add everything that zproject generates"

Now run ./autogen.sh && ./configure && make check again. You should see lots of
output, ending like this:


/bin/bash ./libtool --mode=execute ./src/gdom_selftest
Running global domination selftests...
 * gdom_server: OK
Tests passed OK
...

Remember this lesson:

When using git clean, save any hand-written files first.

Problem: git status shows lots of junk


Building your project will produce lots of files and directories scattered around. Running git
clean too often is a bad idea, as it will wipe any new files you've written.

Solution: use a more complete .gitignore file.

It gets tedious to write a complete .gitignore file. Happily we have a tool whose intention is
precisely to do the tedious things involved in building Scalable C projects. Here is how we
get a complete .gitignore file:

rm .gitignore
gsl project.xml

You'll see this in the output:

gsl/4 M: Generating initial .gitignore file

Now type git status and the image comes in focus:

git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified: .gitignore

no changes added to commit (use "git add" and/or "git commit -a")

Let's add this and commit:


git add .
git commit -m "Problem: .gitignore needs more beef

Solution: delete and regenerate via zproject"

Remember this lesson:

zproject generates .gitignore for us, if we don't have it.

Problem: I need to make a new class


Global domination is well on the way! Now we'd like to add a client class. We'll offer two
APIs. One is for those who wish to run the GlobDom server in their code. The second is for
those who want to access it, over the network. So let's make a client class. Like our server
class, it'll do nothing, yet. First draw the outline, then fill it in.

Solution: add new classes to project.xml.

Here is how we define the client class in project.xml:

<project
    ... >
    <class name = "gdom_client" />
</project>

Then we run gsl project.xml again and see what git status gives us. There are a few
changes to build scripts, then two new files:

Untracked files:
(use "git add <file>..." to include in what will be committed)

include/gdom_client.h
src/gdom_client.c

Take a quick look at these generated files. The include/gdom_client.h header defines
a bare minimum for a typical class:


/*  =========================================================================
    gdom_client - class description

    Copyright (c) the Authors
    =========================================================================
*/

#ifndef GDOM_CLIENT_H_INCLUDED
#define GDOM_CLIENT_H_INCLUDED

#ifdef __cplusplus
extern "C" {
#endif

//  @interface
//  Create a new gdom_client
GDOM_EXPORT gdom_client_t *
    gdom_client_new (void);

//  Destroy the gdom_client
GDOM_EXPORT void
    gdom_client_destroy (gdom_client_t **self_p);

//  Print properties of object
GDOM_EXPORT void
    gdom_client_print (gdom_client_t *self);

//  Self test of this class
GDOM_EXPORT void
    gdom_client_test (bool verbose);
//  @end

#ifdef __cplusplus
}
#endif

#endif

And src/gdom_client.c implements these three methods with little more than air and
used chewing gum. Let's do the usual sanity check:

make check

Remember this lesson:

zproject generates classes for you if you need them.


Problem: generated sources don't have a license

These two new files need a license blurb at their start. Now, we could add the license blurb
by hand. zproject does not touch the class header or source once those files exist. Plus,
there are other files that zproject gives us, like gdom_selftest.c, which also need a license
blurb.

Solution: specify a blurb in our project model.

We'll put the blurb into a separate XML file so it doesn't clutter our project model. Create a
file license.xml with this content:

<license>
Copyright (c) the Contributors as noted in the AUTHORS file.
This file is part of Global Domination. Resistance is useless.

This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
</license>

And now include that into your project.xml thus:

<project ...>
<include filename = "license.xml" />
...
</project>

Now, let's check that this works:

rm src/gdom_client.c
rm include/gdom_client.h
gsl project.xml

If gsl complains, fix the XML syntax and try again. Now look at the src/gdom_client.c
file. It should start like this:


/* ===================================================================
gdom_client - class description

Copyright (c) the Contributors as noted in the AUTHORS file.
This file is part of Global Domination. Resistance is useless.

This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
===================================================================

Take a peek at src/gdom_selftest.c and you will see the same blurb.

Remember this lesson:

Stop deleting src/gdom_client.c. From now on, you edit this file by hand.

Problem: the client wants to run a server


As luck would have it, Global Domination's sales team already has a potential customer,
who's offering good money for a server. You just need a demo, and the sooner the better.

Solution: write a server program.

What the server program does is create a gdom_server instance, print an optimistic
message, wait for Ctrl-C, and then exit. How hard can it be?

Let's start by adding a 'main' element to project.xml:

<main name = "gdomd">Global Domination Demon</main>

Now regenerate the project with the usual gsl project.xml. You'll notice that zproject
tells you:

gsl/4 M: Generating skeleton for src/gdomd.c

The skeleton src/gdomd.c doesn't do anything useful, or even pretend to. Let's open it up
in an editor and add some body to it:


...
    //  Insert main code here
    if (verbose)
        zsys_info ("Welcome to Global Domination v0.1");
    zsys_info ("starting Global Domination server...");
    gdom_server_t *server = gdom_server_new ();
    assert (server);
    while (!zsys_interrupted) {
        sleep (1);
    }
    zsys_info ("terminating Global Domination server...");
    gdom_server_destroy (&server);
    return 0;
}

The zsys_info method is one of a bunch that do system logging. The others are:

zsys_error - log error condition - highest priority.
zsys_warning - log warning condition - high priority.
zsys_notice - log normal condition - normal priority.
zsys_info - log information message - low priority.

Run man zsys to see some of the other methods in the CZMQ zsys class.

Let's build and test our new server:

make
src/gdomd -v

It should say:

gdomd - Global Domination Demon
I: 16-01-20 18:11:07 Welcome to Global Domination v0.1
I: 16-01-20 18:11:07 starting Global Domination server...

Remember this lesson:

Better a hundred small steps than one giant leap.

Problem: update the version number


After such a lot of work we should release a new version. We defined the version number in
our public header file, include/globdom.h. It should be simple to bump it:


Edit include/globdom.h to say #define GLOBDOM_VERSION_MINOR 2

Print the version number from the header:

zsys_info ("Welcome to Global Domination v%d.%d",
    GLOBDOM_VERSION_MAJOR, GLOBDOM_VERSION_MINOR);

Run make and... we get compile errors like this:

gdomd.c: In function ‘main’:
gdomd.c:46:21: error: ‘GLOBDOM_VERSION_MAJOR’ undeclared

The reason is that our project header files are confused. We have a mix of hand-written files
(the original include/globdom.h) and generated files doing the same thing. Take a look
at include/gdom_library.h and you'll see it does the same (and more) as we did by
hand.

Solution: define the version and header in our project model.

First, let's fix the header mismatch. Add this header attribute to the project item:

<project
name = "Global Domination"
script = "zproject.gsl"
prefix = "gdom"
header = "globaldom.h">
...

Now delete these two files:

rm include/global_domination.h
rm include/globaldom.h

The first is junk and has to go. The second is our hand-written project header. When we
delete it, and then build the project again, zproject gives us a new skeleton header. It does
nothing except pull in gdom_library.h, which has all of our public API. This split lets us
add hand-written code (to globaldom.h) while also generating the public API (in
gdom_library.h).

It sounds tricky yet once things are working, you can more or less forget about it. Rebuild the
project with gsl project.xml as usual. You should see this output:

gsl/4 M: Generating skeleton for include/globaldom.h


Now look at the version macros in include/gdom_library.h:

#define GDOM_VERSION_MAJOR 0
#define GDOM_VERSION_MINOR 0
#define GDOM_VERSION_PATCH 0

To define a version in the project, we add a 'version' item like this:

<version major = "0" minor = "2" />

Run gsl project.xml again and see how include/gdom_library.h changes. Now
we can fix our main program, and make the project:

zsys_info ("Welcome to Global Domination v%d.%d",
    GDOM_VERSION_MAJOR, GDOM_VERSION_MINOR);

make
src/gdomd -v

Remember this lesson:

Simple is hard.

Problem: someone's patch broke my code


Congratulations, you got a contributor! It is a precious thing, someone joining your project.
To criticize their work is clumsy, and foolish. Yet most open source projects treat patches like
lollipops offered by strangers in a park. I've already discussed why optimistic merging is
sane and effective.

Yet, it's nice to give people rapid feedback on their work. It might take us a day to see
someone's patch and spot a mistake. We have computers to do this kind of stuff.

Solution: enable Travis CI on your project.

"CI" is short for "build everything and run make check, and see if it works." I'm sure there are
other CI systems yet Travis wins for being simple, fast, and reliable.

To enable Travis, add this line to your project.xml:

<target name = "travis" />


Run gsl project.xml again and you'll see this output:

gsl/4 I: Processing project.xml...
gsl/4 M: Building GNU build system (autotools)
gsl/4 M: Building CMake build system (cmake)
gsl/4 M: Building Travis CI scripts (travis)
gsl/4 M: Generating skeleton .travis.yml script

In all cases, zproject builds the autotools and cmake targets. We've now added one more
target, travis. The .travis.yml script is where we tell Travis what to do.

Do git status and you'll see two new files:

Untracked files:
.travis.yml
ci_build.sh

The ci_build.sh file is always generated, whereas we will modify .travis.yml by hand, as we
decide to expand our test cover. Here is what the skeleton Travis script looks like:

# Travis CI script
language: c

os:
- linux

sudo: false

env:
- BUILD_TYPE=default
#- BUILD_TYPE=android
#- BUILD_TYPE=check-py
#- BUILD_TYPE=cmake

addons:
apt:
packages:
- valgrind

before_install:
- if [ $TRAVIS_OS_NAME == "osx" ] ; then stuff

# Hand off to generated script for each BUILD_TYPE
script: ./ci_build.sh

You can, if this is your kind of soup, go and learn Travis' scripting language.


Go to travis-ci.org (not .com!).
Click "Log in with GitHub" at the top right.
Authorize Travis for your organizations.
When you return to Travis, click your user icon at the top right.
Click "Sync account" so Travis knows what repositories you have.
Find your Global Domination project.
Click the toggle to enable Travis.

Nothing happens until you push the .travis.yml file to your repository. So:

git add .
git commit -m "Problem: someone's patch broke my code

Solution: enable Travis CI"
git push origin master

Now Travis should start building and testing your project. Every time you push commits, it
will run the .travis.yml script and report any errors.

Remember this lesson:

Travis is there to help you, not add more stress.

Problem: Travis isn't building my project :(


The first time you try to use Travis, things are not obvious. Just enabling builds on your repo
does not start a build. So if you carefully pushed your new .travis.yml file, and then enabled
builds, nothing happened.

You need to push a commit before Travis wakes up. You don't need to change anything. Git
lets you create empty commits.

Solution: push an empty commit.

Run these commands:

git commit --allow-empty -m "Problem: Travis isn't building my project

Solution: push an empty commit"
git push origin master

Remember this lesson:

With enough empty commits you can draw pictures on your GitHub profile.


Problem: my client is using Windows


Consider this an opportunity, not a problem. "We need support for Visual Studio 2010," says
your client. "Hmm, that could take a week or more. Is that OK?" you reply. The client doesn't
blink. "We could make it work with VS2015 too, that way you're compatible with the next
Windows 10 service pack," you continue. The client still doesn't blink. "We put our best
people on it... it'll cost more but you'll get it faster." The client finally blinks. "OK, make it
happen," they say.

Solution: zproject does Visual Studio.

Here's a way to see all the targets that zproject supports. This is a growing list:

gsl -target:? project.xml

Here is the kind of thing that zproject will report:

Valid targets are:

android Native shared library for Android
autotools GNU build system
cmake CMake build system
cygwin Cygwin build system
debian packaging for Debian
docker packaging for Docker
java Java JNI binding
java-msvc MSVC builds for Java JNI binding
mingw32 Mingw32 build system
nuget Packaging for NuGet
python Python binding
qml QML binding
qt Qt binding
redhat Packaging for RedHat
ruby Ruby binding
travis Travis CI scripts
vs2008 Microsoft Visual Studio 2008
vs2010 Microsoft Visual Studio 2010
vs2012 Microsoft Visual Studio 2012
vs2013 Microsoft Visual Studio 2013
vs2015 Microsoft Visual Studio 2015

So to support VS2010 and VS2015, we add these two targets to our project.xml:

<target name = "vs2010" />
<target name = "vs2015" />


And then we regenerate the project packaging with gsl project.xml. Here is what
zproject reports now:

globdom> gsl project.xml
GSL/4.1c Copyright (c) 1996-2016 iMatix Corporation
gsl/4 I: Processing project.xml...
gsl/4 M: Building GNU build system (autotools)
gsl/4 M: Building CMake build system (cmake)
gsl/4 M: Building Travis CI scripts (travis)
gsl/4 M: Building Microsoft Visual Studio 2010 (vs2010)
gsl/4 M: Building Microsoft Visual Studio 2015 (vs2015)

You can generate a single target using the -target command line switch. E.g.:

gsl -target:vs2010 project.xml

When you run git status you will now see a new subdirectory called builds. Add this to
your repository, and commit it:

git add builds
git add -u
git commit -m "Problem: my client is using Windows

Solution: add Visual Studio targets"
git push origin master

Your repository now holds project files for two versions of Visual Studio.

Remember this lesson:

zproject supports a lot of targets.

Problem: my client wants a stable release


Hold on to your horses, cowboy! One thing at a time. What the client wants and what the
client needs are often two different things. Don't throw the word "stable" around lightly. It
means different things to different people:

It means stuff that is firm, robust, and won't crash.
It means stuff that is resistant to change.
It means stuff that isn't screamingly insane.
It means stuff you put your horses in.


Our code is definitely one of these, and definitely not one of these. The other two, meh. As a
compromise, let's "tag" the release. This is somewhat like shooting an orange dart, labeled
"Little Buffy", into a wild buffalo. The tag is a short way of saying "that raging mountain of
anger heading straight for you," or as we say in the trade, commit #bbc2c1.

Solution: tag this version.

Here is how we tag the current commit and send that off to GitHub:

git tag -m "This is version 0.2" version-0.2
git push --tags origin master

Here's a subtle yet important detail. We change the version number, then we do work, and
then we tag it. And then we change the version number again. That means after making our
"stable release", we right away update the version number, then regenerate everything, then
push it to GitHub again.

In project.xml:

<version major = "0" minor = "3" />

And then:

gsl project.xml
git add -u
git commit -m "Problem: code version needs updating

Solution: update to version 0.3"
git push origin master

Remember this lesson:

The version in git master is always the next version you intend to release.

Problem: how does my client get a tagged release?

Solution: use GitHub's release page.

GitHub offers one-click downloads of the tarballs and zip files for any given release of a
project. There really isn't a better way of distributing tarballs. The ZeroMQ project uses a
download server, yet that's an historical artifact, not a preference over GitHub.


If you want a specific tagged release from git, there are two ways. Either way you first clone
the git repository. Then you can do either of these:

git checkout version-0.2
git reset --hard version-0.2

I tend to use the second form, as it leaves me with a clean state. It is exactly as if I rewound
history. The first form leaves the repository in a "detached head" state. Unless you are a
masochist, forget it. The hard reset is clean and effective. You can make commits, and they'll
be in the right place. Just do git pull origin master before sending your commits
back to GitHub.

Remember this lesson:

When it comes to git, ignorance is bliss.

Problem: how do I build my project on Windows?

If you use Linux in your work, then Windows can be a bit of a mystery. I especially hate the
Visual Studio user interface. It greeted me by asking me to register a new user account. It
then tried to show me "cool" stuff about my project. The only cool stuff in a C project is the
code, which seemed impossible to find. Windows 1.0 introduced flat frames without overlap,
and Visual Studio has been trying to get back there ever since.

Happily you can entirely ignore Visual Studio's aspirations to be an "environment", and treat
it just like any other C/C++ compiler.

Solution: Windows has a command line.

I'll summarize what you need to have, know, and do in order to build your project via the
command line:

You need a PC running Windows 8 or 10 with an Internet connection.

You need a Visual Studio compiler. You can use the Community Edition for free.

Ensure the ProgramFiles environment variable is set properly. Use the command line
(or PowerShell! if you want a Kung Fury-style retro experience).

You need to install git and/or unzip for the command line. I'd recommend building from
git.

Using git, clone Global Domination plus all dependent projects:


git clone https://github.com/scalable-c/globdom
git clone https://github.com/zeromq/czmq
git clone https://github.com/zeromq/libzmq

Build the projects in order:

cd libzmq\builds\msvc\vs2010
.\build.bat
cd ..\..\..\..
cd czmq\builds\msvc\vs2010
.\build.bat
cd ..\..\..\..
cd globdom\builds\msvc\vs2010
.\build.bat
cd ..\..\..\..

Laugh with glee as you realize you just accomplished months of work in a few hours.

Remember this lesson:

It is already amazing the horse can dance. Don't expect it to also keep a rhythm.

Problem: Java
Ah, thither yon sweet Java! Thy perfumed voice lulls my dreams with paths untold! Let me
count the ways I love thee... OK, done.

The main problem with Java is that people use the language. And not just a few people
either. It is the COBOL of the 21st Century, built by Big Software and sold to managers. If
C++ suffers from "hey, how abstract can we make this stuff before our brains strangle
themselves?" syndrome, then Java suffers from "just one more path, I promise!" pain.

Inevitably, a quiet and yet determined character will approach you on a dark street corner
and offer you money. "Just give me Global Domination in Java," they'll say. "Just once,"
they'll insist, "How bad can it be?"

The answer is, pretty bad. Java took the traditional way of integrating native C libraries into
"higher-level" languages, which is to say "pain and grief." The official name for this torture
chamber is JNI, which stands for "Just Nasty, Innit?" Like many parts of the thankless job of
programming, it seems OK once you're used to it.

Yet writing JNI code is a mind-numbing exercise that belies the delicacy of it. You need all
the grace of an elephant hopping from rock to rock in a molten lava pool. Get it wrong and
badness happens. Here's a slice of JNI:


JNIEXPORT jstring JNICALL
Java_org_zeromq_zyre_Zyre__1_1name (JNIEnv *env, jclass c, jlong self)
{
    char *name_ = (char *) zyre_name ((zyre_t *) (intptr_t) self);
    jstring return_string_ = (*env)->NewStringUTF (env, name_);
    return return_string_;
}

I'm not going to explain this. You can find enough JNI tutorials on-line to decipher it. What
I'm going to do is explain how to generate everything you need to make it Someone Else's
Problem.

Solution: use the 'java' target.

Add this to project.xml:

<target name = "java" />

And then gsl project.xml as usual. You'll see this output from zproject:

gsl/4 M: Building Java JNI binding (java)

And when you type git status, you will see only project.xml changed, and nothing else.
There is a new directory bindings/jni and it is empty.

Problem: bindings/jni is empty


To generate a binding we need a little more than just the class name. We need a description
of the methods in the class. We call this the "API model" and it sits in a subdirectory called
"api."

Solution: write an API model for gdom_client.

Create a new file api/gdom_client.xml with this content:


<class name = "gdom_client">
    Global Domination client API
    <constructor>
        Create a new Global Domination client
    </constructor>
    <destructor>
        Destroy a Global Domination client
    </destructor>
</class>

And then run gsl project.xml as usual. It looks just like before, yet take a look at
bindings/jni now. It's full of life. Let's quickly add the newly generated files to our
repository:

git add api bindings
git add -u
git status
