Mastering Linux Debugging Techniques
The following are examples of the types of things you may believe to be true:
● At a certain point in the source code, a variable has a certain value.
● At a given point, a structure has been set up correctly.
● At a given if-then-else statement, the if part is the path that was executed.
● When the subroutine is called, the routine receives its parameters correctly.
Finding the bug involves confirming all of these things. If you believe that a certain variable should have a specific value when a subroutine is called, check it. If you believe that an if construct is executed, check it. Usually you will confirm your assumptions, but eventually you will find a case where your belief is wrong. As a result, you will know the location of the bug.

Debugging is something that you cannot avoid. There are many ways to go about debugging, such as printing out messages to the screen, using a debugger, or just thinking about the program execution and making an educated guess about the problem.
Before you can fix a bug, you must locate its source. With a segmentation fault, for example, you need to know on which line of code the fault occurred. Once you find the line of code in question, determine the value of the variables in that function, how the function was called, and specifically why the error occurred. Using a debugger makes finding all of this information simple. If a debugger is not available, there are other tools to use. (Note that a debugger may not be available in a production environment, and the Linux kernel does not have a debugger built in.)
This article looks at a class of problems that can be difficult to find by visually inspecting code and that may occur only under rare circumstances. Often, a memory error occurs only in a combination of circumstances, and sometimes you can discover memory bugs only after you deploy your program.

Useful memory and kernel debugging tools
There are various ways to track down user-space and kernel problems using debug tools on Linux. Build and debug your source code with these tools and techniques:

User-space tools:
● Memory tools: MEMWATCH and YAMD
● strace
● GNU debugger (gdb)
● Magic key sequence

Kernel tools:
● kgdb
● kdb
● Oops analysis with ksymoops

Scenario 1: Memory debugging tools
As the standard programming language on Linux systems, the C language gives you a great deal of control over dynamic memory allocation. This freedom, however, can lead to significant memory management problems, and these problems can cause programs to crash or degrade over time.

Memory leaks (in which malloc() memory is never released with corresponding free() calls) and buffer overruns (writing past memory that has been allocated for an array, for example) are some of the common problems and can be difficult to detect. This section looks at a few debugging tools that greatly simplify detecting and isolating memory problems.
MEMWATCH
MEMWATCH is a memory error detection tool for C: you add its header file to your source and turn on two compile-time flags, and it reports leaks, double frees, and similar problems. Listing 1 is a small test program with a deliberate memory leak:

#include <stdlib.h>
#include "memwatch.h"

int main(void)
{
    char *ptr1;
    char *ptr2;

    ptr1 = malloc(512);
    ptr2 = malloc(512);

    ptr2 = ptr1;
    free(ptr2);
    free(ptr1);

    return 0;
}
The code in Listing 1 allocates two 512-byte blocks of memory, and then the pointer to the second block is overwritten with the pointer to the first block. As a result, the address of the second block is lost, producing a memory leak; and because both pointers now refer to the first block, the same memory is also freed twice.
To build the test program with MEMWATCH, compile and link memwatch.c along with your source and turn on the two MEMWATCH compile-time flags:

gcc -DMEMWATCH -DMW_STDIO test1.c memwatch.c -o test1
When you run the test1 program, it produces a report of leaked memory. Listing 2 shows the example memwatch.log output file.
...
double-free: <4> test1.c(15), 0x80517b4 was freed from test1.c(14)
...
unfreed: <2> test1.c(11), 512 bytes at 0x80519e4
{FE FE FE FE FE FE FE FE FE FE FE FE ..............}
MEMWATCH gives you the actual line that has the problem. If you free an already freed pointer, it tells you. The same goes for
unfreed memory. The section at the end of the log displays statistics, including how much memory was leaked, how much was
used, and the total amount allocated.
YAMD
Written by Nate Eldredge, the YAMD package finds dynamic memory allocation-related problems in C and C++. The latest version of YAMD at the time of writing this article was 0.32. Download yamd-0.32.tar.gz (see Resources). Execute a make command to build the program; then execute a make install command to install the program and set up the tool.

Once you have downloaded and installed YAMD, use it on test1.c: remove the #include "memwatch.h" line and make a small change to the makefile, as shown below.
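Because YAMD needs only debugging symbols, the change amounts to compiling with the -g option and dropping the MEMWATCH flags. Expressed as plain commands rather than a makefile rule (a sketch; your makefile targets and paths will differ):

gcc -g test1.c -o test1      # -g on the compile/link line is all YAMD requires
./run-yamd ./test1           # run-yamd is the wrapper script installed with YAMD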
YAMD shows that we have already freed the memory, and there is a memory leak. Let's try YAMD on another sample program in
Listing 4.
#include <stdlib.h>

int main(void)
{
    char *ptr1;
    char *ptr2;
    char *chptr;
    int i;

    ptr1 = malloc(512);
    ptr2 = malloc(512);
    chptr = (char *)malloc(512);

    for (i = 1; i <= 512; i++) {
        chptr[i] = 'S';       /* chptr[512] is one byte past the 512-byte block */
    }

    ptr2 = ptr1;              /* the address of the second block is lost again */
    free(ptr2);
    free(ptr1);               /* ptr1 and ptr2 now point to the same block */
    free(chptr);

    return 0;
}
Run the compiled program under YAMD's run-yamd script:

./run-yamd /usr/src/test/test2/test2
Listing 5 shows the output from using YAMD on the sample program test2. YAMD tells us that we have an out-of-bounds
condition in the for loop.
MEMWATCH and YAMD are both useful debugging tools, but they require different approaches. With MEMWATCH, you need to add the include file memwatch.h and turn on two compile-time flags. YAMD requires only the -g option on the link statement.
Electric Fence
Most Linux distributions include a package for Electric Fence, but you can download it as well. Electric Fence is a malloc() debugging library written by Bruce Perens. It allocates protected memory just after the memory you allocate, so if there is a fencepost error (running off the end of an array), your program immediately exits with a protection error. By combining Electric Fence with gdb, you can track down exactly which line tried to access the protected memory. Another feature of Electric Fence is its ability to detect memory leaks.
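As a minimal sketch of how Electric Fence is typically used (the file name is made up; -lefence is the conventional name of the Electric Fence library), the program below overruns its buffer by one byte. Linked against Electric Fence and run under gdb, it stops with a protection fault on the offending line:

/* overrun.c -- illustrative fencepost error */
#include <stdlib.h>

int main(void)
{
    char *buf = malloc(16);

    buf[16] = 'X';    /* one byte past the end of the allocation */
    free(buf);
    return 0;
}

gcc -g overrun.c -o overrun -lefence
gdb ./overrun        # type run; gdb stops with SIGSEGV on the buf[16] line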
Scenario 2: Using strace
The strace utility shows all of the system calls that a user-space program issues, along with their arguments and return values, which makes it a good way to see where a utility fails even without a debugger. Listing 6 shows that the ioctl call caused the mkfs program that was used to format a partition to fail. The ioctl BLKGETSIZE64 is failing. (BLKGETSIZE64 is defined in the source code that calls ioctl.) The BLKGETSIZE64 ioctl is being added to all the devices in Linux, and in this case, the logical volume manager does not support it yet. Therefore, the mkfs code will be changed to call the older ioctl call if the BLKGETSIZE64 ioctl call fails; this allows mkfs to work with the logical volume manager.
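The fallback described above can be sketched in C roughly as follows. This is an illustration of the pattern, not the actual mkfs source, and the helper name is made up; BLKGETSIZE64 reports the device size in bytes, while the older BLKGETSIZE reports the number of 512-byte sectors:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>    /* BLKGETSIZE and BLKGETSIZE64 */

/* Returns the device size in bytes, or 0 if neither ioctl is supported. */
static uint64_t device_size(int fd)
{
    uint64_t bytes;
    unsigned long sectors;

    if (ioctl(fd, BLKGETSIZE64, &bytes) == 0)
        return bytes;                        /* newer ioctl succeeded */
    if (ioctl(fd, BLKGETSIZE, &sectors) == 0)
        return (uint64_t)sectors * 512;      /* fall back to the older ioctl */
    return 0;
}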
Scenario 3: Using gdb and kgdb
Start gdb by using the gdb programname command. gdb loads the executable's symbols and then displays an input prompt so that you can start using the debugger. There are three ways to view a process with gdb:
● Use the attach command to start viewing an already running process; attach stops the process.
● Use the run command to execute the program and to start debugging it at the beginning.
● Look at an existing core file to determine the state the process was in when it terminated. To view a core file, start gdb with the following command:

gdb programname corefilename

To debug with a core file, you need the program executable and source files, as well as the core file. You can also point gdb at the core file with the -c option:

gdb -c corefilename programname
gdb shows which line of code caused the program to core dump.

Before you run a program or attach to an already running program, list the source code where you believe the bug is, set breakpoints, and then start debugging the program. You can view extensive gdb online help and a detailed tutorial by using the help command.
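For example, a minimal session on the test1 program from Listing 1 might look like the following (the breakpoint location is illustrative):

gdb ./test1                 # load the executable and its symbols
(gdb) list main             # show the source around main()
(gdb) break test1.c:12      # set a breakpoint (pick the line you suspect)
(gdb) run                   # run the program; it stops at the breakpoint
(gdb) print ptr1            # examine a variable
(gdb) backtrace             # show how execution reached this point
(gdb) continue              # resume execution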
kgdb
The kgdb program (remote host Linux kernel debugger through gdb) provides a mechanism to debug the Linux kernel using gdb.
The kgdb program is an extension of the kernel that allows you to connect to a machine running the kgdb-extended kernel when
you are running gdb on a remote host machine. You can then break into the kernel, set break points, examine data, and so on
(similar to how you would use gdb on an application program). One of the primary features of this patch is that the remote host
running gdb connects to the target machine (running the kernel to be debugged) during the boot process. This allows you to begin
debugging as early as possible. Note that the patch adds functionality to the Linux kernel so gdb can be used to debug the Linux
kernel.
Two machines are required to use kgdb: one of these is a development machine, and the other is a test machine. A serial line (null-
modem cable) connects the machines through their serial ports. The kernel you want to debug runs on the test machine; gdb runs on
the development machine. gdb uses the serial line to communicate with the kernel you are debugging. To set up and use kgdb, follow these steps:
1. Download the appropriate patch for your version of the Linux kernel.
2. Build your component into the kernel, as this is the easiest way to use kgdb. (Note that there are two ways to build most components of the kernel, such as a module or directly into the kernel. For example, Journaled File System (JFS) can be built as a module or directly into the kernel. Using the gdb patch, we can build JFS directly into the kernel.)
3. Create a file called .gdbinit and place it in your kernel source subdirectory (for example, /usr/src/linux). The file .gdbinit has the following lines in it (the target remote line should name the serial device you use on the development machine):
   ❍ set remotebaud 115200
   ❍ symbol-file vmlinux
   ❍ target remote /dev/ttyS0
   ❍ set output-radix 16
4. Add the append="gdb" line to the stanza in lilo, the boot loader used to select which kernel to boot, and then rerun lilo (as shown after these steps):
   ❍ image=/boot/bzImage-2.4.17
   ❍ label=gdb2417
   ❍ read-only
   ❍ root=/dev/sda8
   ❍ append="gdb"
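After you edit /etc/lilo.conf, the boot map must be reinstalled before the new gdb2417 entry can be selected at boot time:

/sbin/lilo        # reinstall the boot loader so the new entry takes effect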
Listing 7 is an example of a script that pulls the kernel and modules that you built on your development machine over to the test machine. You will need to change the machine names and directory paths in the script to match your environment.

Now you are ready to start gdb on the development machine. Change to the directory where your kernel source tree starts; in this example, the kernel source tree is at /usr/src/linux-2.4.17. Type gdb to start the program.
If everything is working, the test machine will stop during the boot process. Enter the gdb command cont to continue the boot process. One common problem is that the null-modem cable is connected to the wrong serial port. If gdb does not start, switch the cable to the second serial port; this should enable gdb to start.
Listing 9 displays a gdb exception after the mount command for the file system is issued. Several commands are available from kgdb, such as displaying data structures and the values of variables, and showing what state all tasks in the system are in, where they are sleeping, where they are spending CPU, and so on. Listing 9 shows the information that the back trace provides for this problem; the where command performs a back trace, which lists the calls that were executed to reach the stopping place in your code.
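As an illustrative sketch of that kind of session (the breakpoint placement and the variable name are assumptions, not the exact commands behind Listing 9):

(gdb) break jfs_mount      # stop when the JFS mount code is entered
(gdb) cont                 # let the test machine continue booting
# ...on the test machine, issue the mount command for the JFS partition...
(gdb) where                # back trace: the calls executed to reach this point
(gdb) print sb             # examine a variable or structure (name is illustrative)
(gdb) cont                 # resume the kernel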
Oops analysis
The Oops (or panic) message contains details of a system failure, such as the contents of CPU registers. With Linux, the traditional
method for debugging a system crash has been to analyze the details of the Oops message sent to the system console at the time of
the crash. Once you capture the details, the message then can be passed to the ksymoops utility, which attempts to convert the code
to instructions and map stack values to kernel symbols. In many cases, this is enough information for you to determine a possible
cause of the failure. Note that the Oops message does not include a core file.
Let's say your system has just created an Oops message. As the author of the code, you want to solve the problem and determine what caused the Oops message; or, if the Oops came from someone else's code, you want to give that developer as much information about the problem as possible so that it can be solved in a timely manner. The Oops message is one part of the equation, but it is not much help until it has been run through the ksymoops program. The figure below shows the process of formatting an Oops message.
There are several items that ksymoops needs: the Oops message output, the System.map file from the running kernel, /proc/ksyms, vmlinux, and /proc/modules. Complete instructions on how to use ksymoops are in the kernel source at /usr/src/linux/Documentation/oops-tracing.txt and on the ksymoops man page. ksymoops disassembles the code section, points to the failing instruction, and displays a trace section that shows how the code was called.
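A typical invocation looks roughly like the following; the exact options depend on your ksymoops version and on where your System.map file lives, so treat this as a sketch and check the man page:

ksymoops -m /boot/System.map-2.4.17 < oops.txt > oops-decoded.txt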
First, get the Oops message into a file so you can run it through the ksymoops utility. Listing 10 shows the Oops created by mounting the JFS file system with the problem introduced by the three lines in Listing 8 that were added to the JFS mount code.
You can then use the objdump utility to disassemble the object file and find the failing instruction:

objdump -d jfs_mount.o
00000000 <jfs_mount>:
   0:   55                      push   %ebp
...
  2c:   e8 cf 03 00 00          call   400 <chkSuper>
  31:   89 c3                   mov    %eax,%ebx
  33:   58                      pop    %eax
  34:   85 db                   test   %ebx,%ebx
  36:   0f 85 55 02 00 00       jne    291 <jfs_mount+0x291>
  3c:   8b 2d 00 00 00 00       mov    0x0,%ebp        << problem line
  42:   55                      push   %ebp
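The mov 0x0,%ebp instruction at offset 3c loads from absolute address zero, which is the signature of a NULL-pointer dereference. As an illustration only (this is not the actual code added in Listing 8), kernel code of the following shape produces exactly this kind of Oops:

/* Illustrative only -- not the lines that were added to jfs_mount() */
#include <linux/kernel.h>     /* printk() */

static void cause_oops(void)
{
    int *ptr = NULL;          /* pointer deliberately left NULL */
    printk("%d\n", *ptr);     /* dereferencing it produces the Oops */
}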
kdb
The Linux kernel debugger (kdb) is a patch for the Linux kernel that provides a means of examining kernel memory and data structures while the system is operational. Note that kdb does not require two machines, but it does not allow you to do source-level debugging as kgdb does. You can add additional commands to format and display essential system data structures, given an identifier or address of the data structure. The current command set allows you to control kernel operations, including the following:
● Single-stepping a processor
● Stopping upon execution of a specific instruction
● Stopping upon access (or modification) of a specific virtual memory location
● Stopping upon access to a register in the input-output address space
● Stack back trace for the current active task as well as for all other tasks (by process ID)
● Instruction disassembly
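As a rough mapping from those capabilities to kdb commands (the command names below are taken from the kdb documentation and may differ between versions, so treat them as assumptions to verify against your kdb patch):

bp <address>    # set a breakpoint at an instruction
ss              # single-step the processor
bt              # stack back trace for the current task
ps              # list the tasks on the system
md <address>    # display memory
rd              # display registers
go              # resume normal kernel execution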
Scenario 4: Getting back trace using magic key sequence
If your keyboard is still functioning when Linux hangs, use the following method to help resolve the source of the hang. With these steps, you can display a back trace of the currently running process and of all processes using the magic key sequence.

1. The kernel that you are running must be built with CONFIG_MAGIC_SYSRQ enabled. You must also be in text mode: Ctrl+Alt+F1 gets you into text mode, and Ctrl+Alt+F7 gets you back to X Windows. (See the note on enabling SysRq after these steps.)
2. While in text mode, press <Alt+ScrollLock> followed by <Ctrl+ScrollLock>. The magic keystrokes give a stack trace of the currently running process and of all processes, respectively.
3. Look in /var/log/messages. If you have everything set up correctly, the system should have converted the symbolic kernel addresses for you. The back trace will be written to the /var/log/messages file.
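On many distributions the magic SysRq facility also has to be switched on at run time; the kernel exposes a toggle under /proc for this. A minimal sketch (the paths are the standard ones, and the log file matches the setup described above):

echo 1 > /proc/sys/kernel/sysrq     # enable the magic SysRq keys at run time
# press Alt+ScrollLock, then Ctrl+ScrollLock, and inspect the captured trace:
tail -200 /var/log/messages         # the back trace appears at the end of the log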
Chasing memory overruns
You do not want to be in a situation like an allocation overrun that happens after thousands of calls.

Our team spent many long hours tracking down an odd memory corruption problem. The application worked on our development workstation, but it would fail after two million calls to malloc() on the new product workstation. The real problem was an overrun back around call one million. The new system had the problem because the reserved malloc() area was laid out differently, so the offending memory was located at a different place and destroyed something different when it did the overrun.

We solved this problem using many different techniques, one using a debugger, another adding tracing to the source code. I started to look at memory debugging tools about this same time in my career, looking to solve these types of problems faster and more efficiently. One of the first things I do when starting a new project is to run both MEMWATCH and YAMD to see if they can point out memory management problems.

Conclusion
There are many different tools available to help debug programs on Linux. The tools in this article can help you solve many coding problems. Tools that show the location of memory leaks, overruns, and the like can solve memory management problems, and I find MEMWATCH and YAMD helpful.

Using a Linux kernel patch to allow gdb to work on the Linux kernel helped in solving problems on the filesystem that I work on in Linux. In addition, the strace utility helped determine where a filesystem utility had a failure during a system call. Next time you are faced with squashing a bug in Linux, try one of these tools.
Resources
● Download YAMD.
● Download ElectricFence.
● Read the article "Linux software debugging with GDB" (developerWorks, February 2001).
This article was published in the August 2002 issue of the developerWorks journal.