
Student notes

Linux device drivers

Sample chapter

05. Scheduling and concurrency

Nijmegen, The Netherlands


Copyright © AT Computing 2005, 2006


Version: 2c

Student notes

■ The student notes in this sample chapter contain fragments from the document
that we deliver with this course.

A very important aspect of driver code (actually of any kernel code) is that it can be
executed by several processes more or less simultaneously. Three possible
coincidences must always be kept in mind:
• We are on a multi-CPU machine, and two CPUs are each running a (different)
process that calls our driver at more or less exactly the same time.
• Hardware interrupts can happen at any time and start the execution of
interrupt routine code. This code might share variables with the mainline of our
code. The situation is the same for single- and multi-CPU machines.
• Process scheduling can happen at any moment, i.e. even in between two
statements of our driver code. This is so-called "pre-emption"; it is not enabled in
all kernels. The situation is the same for single- and multi-CPU machines.
Imagine what happens if our driver code contains the simple piece of code as shown
on the sheet, where the variable involved has a static memory allocation. On a
multi-CPU machine the danger is obvious. On a single-CPU machine you must
assume that process rescheduling (or execution of interrupt routine code) can
happen exactly in between the two statements of our example.
The rescheduling itself is not dangerous as such! The danger comes when the newly
scheduled process runs through these same two statements, because the first process
is still in between them, and will resume at that point when it gets its next
scheduling turn.
Admittedly, the chances that such a coincidence happens here are very small, but
Murphy's law is cruel, and in any case you must realise that testing and debugging
such a timing-dependent situation is practically impossible. You get this right by
design, or it will never be right at all.
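
For reference, the fragment on that sheet is the same two-liner that we revisit at the
end of this chapter:

if (my_variable > 0)
        my_variable--;          /* will this (n)ever get negative? */

If execution is interrupted between the test and the decrement, a second process can
run through the same test-plus-decrement in the meantime, and the variable can
indeed end up negative.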

The safety-implementation that can be found on the sheet illustrates a common
solution. The waiting points in this example mean that the current process gives up
the CPU, and another (any other) process gets scheduled. Although the new process,
because of its arbitrary selection, will usually not call upon our driver at all, the
possibility that it nevertheless does must be taken into account.
Note the two waiting points in this example. The first one is paired with a wakeup at
the exit of the routine; the other one is paired with a wakeup point somewhere else
in the driver, usually in the interrupt routine of this driver and/or in the handling of
the user's ^C (and similar) keystrokes and kill commands.
This safety-implementation is an elementary one. Every call by a user process
results in one action on the device. If you think of printing data on a terminal
screen, it is easy to understand the logic behind this.
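
As a rough sketch (with names of our own, not necessarily those on the sheet), such
an elementary scheme could look as follows, using the wait-queue calls introduced
later in this chapter. Note that the atomicity of the busy test-and-set is glossed over
here; that is exactly the problem the semaphore mechanism described below solves.

static DECLARE_WAIT_QUEUE_HEAD(gate_q);     /* first waiting point  */
static DECLARE_WAIT_QUEUE_HEAD(done_q);     /* second waiting point */
static int busy;                            /* somebody is inside   */
static int done;                            /* our action finished  */

static ssize_t my_write(struct file *filp, const char __user *buf,
                        size_t count, loff_t *ppos)
{
        /* waiting point 1: paired with the wakeup at the exit */
        if (wait_event_interruptible(gate_q, !busy))
                return -ERESTARTSYS;        /* ^C, kill, ... */
        busy = 1;

        done = 0;
        start_device();                     /* hypothetical: one device action */

        /* waiting point 2: the interrupt routine does
         * "done = 1; wake_up_interruptible(&done_q);" */
        wait_event_interruptible(done_q, done);

        busy = 0;
        wake_up_interruptible(&gate_q);     /* wakeup at the exit */
        return count;
}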


Course "Linux device drivers" — 05. Scheduling and concurrency 3

Figure 1.
Course "Linux device drivers" — 05. Scheduling and concurrency 4

Student notes
A more sophisticated scheme is usually adopted when multiple processes work on a
single device, or even when one single process calls upon a single device repeatedly.
Especially if the process has no interest in synchronising after every device call, the
following design speeds things up considerably:

routine_entrypoint()
{
    "Front gate": while any other process is already inside
    the first part of this routine, *wait* here.
    ..........
    Hook the request onto a request queue.
    If the device is not running already, then start it now.
    If there are other processes waiting at the front gate,
    allow one of them to continue (so it can hook its request too).

    *Wait* until our request has been taken off the queue
    and has been finished. Usually the device interrupt routine,
    which is activated when the current request is finished,
    will wake the process that has submitted that request.
    Then the interrupt routine takes the next request (if there
    is any) off the queue and restarts the device immediately.

    "Second gate": while any other process is already inside
    the second part of this routine, *wait* here.
    ..........
    We will leave the routine now. If there are other processes
    waiting at the second gate, allow one of them to continue.
}

Note that this routine has three waiting points. Two of them are paired with the
wakeups shown here, and one of them is woken up from the interrupt routine. If you
use this construction with the two gates, take care that static variables are not
corrupted by the coincidence of one process running in the first part and another
process in the second part simultaneously. Separate constructs exist to aid in
protecting such a common-use variable.

In the above example the "second gate" is not always necessary, depending on how
important it is that a process knows about its request being finished. Some
output-type requests in particular are not that critical at all, and the routine could
then be left immediately after the request has been hooked onto the queue. In that
case, there is no wakeup in the interrupt routine either.
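
As an illustration only (the names and details are ours, not the reference
implementation of the course), the two-gate scheme could be shaped roughly as
follows, using the semaphore and wait-queue primitives described on the following
pages:

static struct semaphore gate1, gate2;       /* both initialised to 1 */
static DECLARE_WAIT_QUEUE_HEAD(req_wait);

static int routine_entrypoint(struct my_request *req)
{
        if (down_interruptible(&gate1))     /* front gate */
                return -ERESTARTSYS;
        hook_request(req);                  /* hypothetical queue helper */
        if (!device_running())              /* hypothetical */
                start_device();
        up(&gate1);                         /* let the next one hook its request */

        /* wait until the interrupt routine marks our request finished */
        if (wait_event_interruptible(req_wait, req->finished))
                return -ERESTARTSYS;

        if (down_interruptible(&gate2))     /* second gate */
                return -ERESTARTSYS;
        /* ... second part ... */
        up(&gate2);                         /* let the next one leave too */
        return 0;
}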


Course "Linux device drivers" — 05. Scheduling and concurrency 5

Figure 2.
Course "Linux device drivers" — 05. Scheduling and concurrency 6

Student notes
For the "wait at the gate" situation of the previous schemes, Linux has a very
effective and simple-to-use mechanism based on a semaphore technique.
You first need a semaphore: a static variable that you must create and initialise
somewhere in your driver (usually in the open routine) as follows:

struct semaphore sem_gate;      /* the data structure  */

sema_init(&sem_gate, 1);        /* the initialisation  */

Initialising a semaphore to a value of 1 resembles setting a railroad signal post to
"safe to continue".
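
For completeness: 2.6 kernels also offer a compile-time equivalent of the above
run-time initialisation, assuming an initial value of 1 is wanted:

DECLARE_MUTEX(sem_gate);        /* declare and initialise to 1 */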

Following this we of course need two operations to be performed on the semaphore:
changing it from "safe to continue" to "stop here now", and the other way around. In
computer science theory the first operation (commonly called a "semaphore-P") is
well known to be a tricky one, because it must be implemented as an "atomic
test-and-set". It must prevent the possibility that two contestants look at it at more
or less the same moment, both conclude that the current setting is "safe", and then
both go full speed ahead into the critical area where they are not supposed to be
together at the same time. You must also realise that this first operation implicitly
has the possibility of blocking a contestant on the spot, which happens when
somebody else has passed the semaphore point first. The second contestant must
then wait until his predecessor leaves the critical area. The operation of leaving the
area and resetting the semaphore to "safe to continue" for the followers behind you
is commonly called a "semaphore-V"; it never blocks.
Linux has the following operations available, implemented to take care of all the
atomicity needed:

down_interruptible(&sem_gate);  /* semaphore-P */
..........
up(&sem_gate);                  /* semaphore-V */

Actually, down_interruptible is one of a family of similar operations. The one
used here has the advantage that it can be woken up not only by its pairing up call,
but also by signals such as ^C keystrokes from the user. You "notice" the latter case
by looking at the return value:

if (down_interruptible(&sem_gate))
        return -ERESTARTSYS;

The ERESTARTSYS tells the higher kernel layers that their call to our driver was
interrupted and cancelled by some external cause. The higher kernel layers may
decide to restart the system call (and thus call our driver again) or to pass the error
condition back to the user level as an EINTR errno value.


Using an uninterruptible semaphore-test (i.e. the plain down routine) should not be
done lightly. It basically means that the only way to be allowed to proceed is
via your own corresponding up routine. If something goes wrong with the device,
and the process that is currently using it does not relinquish it (e.g. because the
device somehow will not interrupt anymore), all other processes waiting at the
gate(s) will form a rigid queue, absolutely unkillable, and a reboot is the only way
out.
Course "Linux device drivers" — 05. Scheduling and concurrency 7

Figure 3.
Course "Linux device drivers" — 05. Scheduling and concurrency 8

Student notes
The previous schemes use a construct where the device is started (or the request is
queued, waiting for the device to get at it), and then the routine gives up the CPU and
"hibernates" until the device is done with our request. Waking up is then done from
the device interrupt routine, when it finds that the device is indeed done with our
request. This sleep/wakeup mechanism in Linux uses a so-called wait queue
data structure. A wait queue is like a bulletin board, on which you can put a note to
say that a certain process is waiting for some future event to happen. The bulletin
board can carry multiple notes if multiple processes are queueing up for a sequence
of (similar) events. If a process puts itself up on the board (queue), that has the side
effect of giving up the CPU and being put into a "hibernating" sleep. At a later point in
time, some piece of kernel code must perform the wakeup-call. It is obvious that
that piece of kernel code must not be part of the hibernating thread. Good examples
are driver interrupt routines (they get activated because the hardware forces them
to), or system calls issued by other processes.
A wakeup-call is not the same as an immediate rescheduling. Wakeup means that
the process(es) listed on a wait-queue are taken off it, and their process bookkeeping
will be marked as: "I hope for a CPU to become available a.s.a.p.".

The declaration and initialisation of a wait-queue:

wait_queue_head_t my_waitq;         /* the data structure  */

init_waitqueue_head(&my_waitq);     /* the initialisation  */

An alternative declaration plus initialisation at compile time:

DECLARE_WAIT_QUEUE_HEAD(my_waitq);

Then the available sleep-operation is as follows:

wait_event_interruptible(my_waitq, condition);

Here condition is an integer expression; a wakeup only has effect if the condition
evaluates to "true" at that very moment.
Some possible wakeup-variations are shown on the sheet. The sync version will not
trigger immediate (re)scheduling, and is used if the current process knows it will
reach another rescheduling point soon. The non-sync version may cause an
immediate scheduling "on the spot", especially on multi-CPU machines.
Both of these wakeup versions also exist in a variant without the "interruptible". But
if you use the interruptible versions of the sleep exclusively (as we recommend), you
had better use the equivalent wakeup as well.
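
In terms of actual calls, these variants are presumably the following (2.6 names):

wake_up_interruptible(&my_waitq);           /* may reschedule on the spot     */
wake_up_interruptible_sync(&my_waitq);      /* postpones the rescheduling     */
wake_up(&my_waitq);                         /* also wakes uninterruptible sleepers */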

When you expect that many sleepers may queue up simultaneously, it usually is not
a good idea to wake them all up simultaneously as well. There are variations of the
wakeup-calls exactly for this purpose; they selectively wake up one, or a few, sleepers
from the queue, and let the other ones sleep on until some future wakeup can take
care of them. See the Rubini book for more details.
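
The selective variants just mentioned look like this (again 2.6 names; they
cooperate with the "exclusive" sleeps described in the Rubini book):

wake_up_interruptible_nr(&my_waitq, 1);     /* wake at most one exclusive sleeper */
wake_up_interruptible_all(&my_waitq);       /* wake all sleepers                  */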
Course "Linux device drivers" — 05. Scheduling and concurrency 9

Figure 4.
Course "Linux device drivers" — 05. Scheduling and concurrency 10

Student notes
A much more difficult situation arises when we have a queue of data items in
between a producer and a consumer. Then we must not protect the data, but the
queue-manipulating operations. This is a very classic pattern in many programming
environments. Not only does it exist in device driver writing, it also appears e.g.
in the programming of user-level multithreaded applications.

The following three sheets show a step-by-step motivation of how to construct your
producer and consumer code. The last of these three sheets shows the final and
correct approach.
Figure 5.

Figure 6.

Student notes
The final approach:

producer:

for (ever) {
    produce data item
    lock queue-semaphore
    put data item in queue
    increment qcount
    wakeup consumer
    unlock queue-semaphore
}

consumer:

for (ever) {
    lock queue-semaphore
    while (qcount == 0) {
        unlock queue-semaphore   **
        go to sleep              **
        lock queue-semaphore
    }
    take data item from queue
    decrement qcount
    unlock queue-semaphore
    consume data item
}

The tricky part is on the consumer side, at the ** markers. Consider what happens
when a rescheduling takes place just in between the unlock and the sleep; in
other words, what happens if the producer's wakeup call sounds while the consumer
is still on its way to sleep. The wakeup has no effect, since the consumer is not
asleep (yet), but the producer is convinced that the consumer will start consuming!

Conclusion: the unlock+sleep must be tied together in an atomic way. The Linux
kernel even went one step further: a macro is available that implements the sleep-
statement while keeping an eye on the condition up to the very last moment:

while (some conditional test) {
    unlock queue-semaphore   **
    go to sleep              **
    lock queue-semaphore
}

The sleep-step in this is implemented in a special macro, with precautions to make
sure that premature wakeup-calls are handled properly. There is even a choice of
two versions, depending on whether we want an interruptible sleep (preferred) or an
uninterruptible one:

wait_event_interruptible(waitq, conditional test);

wait_event(waitq, conditional test);
Older kernel code sometimes uses calls to the routines add_wait_queue,
set_current_state, schedule() and remove_wait_queue, which together split
this work into more detailed steps.
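
A minimal sketch of how those older routines fit together (for an interruptible
sleep on a condition, as above):

DECLARE_WAITQUEUE(wait, current);           /* a note for the bulletin board */

add_wait_queue(&waitq, &wait);              /* put the note up */
for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);
        if (condition)                      /* checked up to the last moment */
                break;
        schedule();                         /* give up the CPU */
}
set_current_state(TASK_RUNNING);
remove_wait_queue(&waitq, &wait);           /* take the note down */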

It is wise to keep the while in this code, to be prepared for the situation that some
other consumer process was woken up as well, and has picked the newly produced
element off the queue before you got your chance.
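
Translated into (simplified, hedged) kernel code, with a spin lock from the next
section standing in for the queue-semaphore and names of our own choosing, the
two sides could look like this:

static spinlock_t q_lock = SPIN_LOCK_UNLOCKED;
static DECLARE_WAIT_QUEUE_HEAD(q_wait);
static int qcount;

/* producer side (may be the interrupt routine) */
spin_lock(&q_lock);
/* put data item in queue */
qcount++;
spin_unlock(&q_lock);
wake_up_interruptible(&q_wait);

/* consumer side (process context) */
for (;;) {
        if (wait_event_interruptible(q_wait, qcount > 0))
                return -ERESTARTSYS;
        spin_lock(&q_lock);
        if (qcount > 0) {                   /* the "while": re-check! */
                /* take data item from queue */
                qcount--;
                spin_unlock(&q_lock);
                break;                      /* go consume the item */
        }
        spin_unlock(&q_lock);               /* another consumer was first */
}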
Course "Linux device drivers" — 05. Scheduling and concurrency 15

Figure 7.
Course "Linux device drivers" — 05. Scheduling and concurrency 16

Student notes
We started this chapter with the following example:

if (my_variable > 0)
        my_variable--;          /* will this (n)ever get negative? */

A semaphore can be put around this to prevent simultaneous execution by multiple
processes, but that is an expensive solution (and one only permitted in "current
process" context, because a semaphore-P may sleep). Semaphores are fine to guard
big chunks of code against simultaneous execution, but for just the above case a
cheaper solution is desired. That's where spin locks come in.

You may think of spin locks as simplified semaphores, without the built-in process
scheduling. If you have to wait, you will be put in a tight CPU-loop instead. But you
must wait for somebody else to clear the way. How does he get his chance to
execute while you are tight-looping on the CPU?

There are multiple possibilities:
• The other party runs on another CPU in a multiprocessor system.
• You have burned your time slice and get scheduled out.
• Some interrupt came in that caused pre-emptive scheduling (see below), so you
lose the CPU and the other party can go on.

It must be noted that the 2.6 Linux kernel does have pre-emptive scheduling
(although this is not always enabled).

A spinlock must be initialised before use. This can be done at runtime (see example
on the sheet) or at once during the declaration:

spinlock_t my_lock = SPIN_LOCK_UNLOCKED;

The operation spin_lock_irqsave obtains the lock and disables interrupts. This
one must be used if (one of) the other contestants is an interrupt routine. The
parameter flags will be filled with information that you must safeguard and give
back later. Although this version is meant for use outside interrupt code, it does no
harm to use it inside; it is just a little inefficient.
The operation spin_unlock_irqrestore is the reverse of spin_lock_irqsave
and must be located in the same routine as the corresponding spin_lock_irqsave.
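
A short usage sketch; the run-time initialisation shown on the sheet is presumably
spin_lock_init:

spinlock_t my_lock;
unsigned long flags;

spin_lock_init(&my_lock);                   /* run-time initialisation */
...
spin_lock_irqsave(&my_lock, flags);         /* lock + disable interrupts */
if (my_variable > 0)
        my_variable--;                      /* now safe against the races above */
spin_unlock_irqrestore(&my_lock, flags);    /* unlock + give flags back */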

■ If you forget to implement a spin lock where one should have been, the
debugging may be very difficult. You must get this right by design. If in doubt,
apply the general rule that every global/static variable (except the purely read-
only ones) needs the protection of a spin lock.
It is very important to know that while some kernel code holds a spin lock,
rescheduling is not allowed on that CPU. Holding a spin lock disables the
pre-emptive scheduling, so you must keep the holding time as short as possible.
Long periods of spin-lock holding have a bad influence on the reaction times of your
system. It is equally important to avoid sleeping/rescheduling while holding a
spin lock.
Course "Linux device drivers" — 05. Scheduling and concurrency 17

Figure 8.
Course "Linux device drivers" — 05. Scheduling and concurrency 18

Student notes
Sometimes your driver wants a little timeout, e.g. to give the hardware a
breathing pause. The most important decision to take then is: "is the desired delay
long enough to justify a rescheduling?" Once again, keep in mind that a process
rescheduling costs thousands, if not tens of thousands, of instructions.

If you want to reschedule, you may proceed as follows:

while (your condition not met)
        schedule();

This routine does just what its name implies: it activates the scheduler, which will
look around for another process to give CPU capacity to. Keep in mind that, if the
machine is not busy at all, your process may be the only candidate and will get the
CPU back immediately. Therefore, the while check in this example is indispensable.

For shorter delays, a ‘‘busy waiting’’ may be more appropriate:

mdelay(NN); /* milliseconds */
udelay(NN); /* microseconds */
ndelay(NN); /* nanoseconds; 2.6 kernel */

Keep in mind that these routines just "burn" CPU cycles, so they must be used with
caution. Also, although their names suggest the contrary, they will never be more
precise than the basic precision of your hardware clock (which may be very
imprecise).
Figure 9.

Student notes
If you want some driver action (function) to be called at a future moment, you can
use a kernel timer function. The basic principle is simple: it lets you piggy-back a
function of your choice onto the activity of the hardware clock interrupt handling.

The first thing you need is of course a separate function in your driver. It runs in
the context of the clock interrupt handling, so you cannot access the current process,
you cannot sleep or reschedule, etc. The function will be called with an
unsigned long as argument. You determine what this will hold, e.g. an index into a
data structure that you have prepared elsewhere in your driver. Be prepared for the
future, when an unsigned long may no longer be large enough to cast a pointer
into. Obviously the routine does not return any value, because of its asynchronous
nature. If it does produce results, it must store them via pointer references.

The second thing you need is a bookkeeping structure, to hold information for the
clock handler about your routine (find the layout on the sheet). Your driver must
allocate such a structure and initialise it by calling the init_timer function. This
initialisation takes care of the "do not touch" fields in the structure.
When you want to prepare the function for action, fill in its address, its argument
value and the desired starting moment (in jiffies), and then call the add_timer
function. This will hook your structure into the chain of structures that the
hardware clock handler traverses with each clock interrupt (100, 250 or 1000 times
per second). There will be just one single call; it is your own responsibility to redo
the add_timer for a subsequent call.
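
A hedged sketch of the whole sequence; the field names are those of the 2.6
struct timer_list, the surrounding names are our own:

static struct timer_list my_timer;

static void my_timer_fn(unsigned long arg)
{
        /* clock-interrupt context: no current process, no sleeping */
        struct my_data *d = (struct my_data *)arg;  /* hypothetical */
        /* ... perform the delayed action; redo add_timer here if
         * a repeated call is wanted ... */
}

init_timer(&my_timer);                      /* the "do not touch" fields */
my_timer.function = my_timer_fn;            /* its address */
my_timer.data     = (unsigned long)&my_data;/* its argument value */
my_timer.expires  = jiffies + HZ / 10;      /* desired moment: ~100 ms from now */
add_timer(&my_timer);                       /* hook it into the chain */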

If you want to retract a requested timer before it expires, use the del_timer_sync
function. It guarantees that the function itself is not running when it returns; hence
the sync in the name. When the actual expiration has already happened, value 0
will be returned to the caller. A return value of 1 indicates that your function was
still waiting, and you took it away before it got its chance. The function del_timer
does not guarantee anything in case your deletion request happens to coincide with
an active run of the timer function itself (possibly on another CPU). Therefore this
function is better not used at all.

In case you repeatedly want to delete a timer followed by a new addition, it is better
to combine these two in a call to the mod_timer function. This function takes
proper care of all race conditions that might possibly exist. Its return value behaves
similarly to that of the del_timer_sync function.
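
In code:

mod_timer(&my_timer, jiffies + HZ / 10);    /* delete + re-add in one race-free step */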
Figure 10.
