Final
Xiang Mei
[email protected]
1 Introduction
Figure 1: My Strace
Strace is a very useful tool on Linux and is widely used for
troubleshooting. XV6, however, does not ship with strace, so I implement
a simple strace on XV6.
It may sound like reinventing the wheel. But from my experience in this
course, Intro to OS, writing a simple version of the “wheel” helps me
understand the complexity of the “wheel” and think like an engineer. Many
questions come up during the implementation, such as “why should the wheel
be round?”. I learned a lot and used some skills I learned from the
previous assignments.
The first section is the introduction of the report and I’ll document my
design in the second section. The real task-related parts start from section
3.
My strace also covers all the extra-credit items, including “Extra credits:
Formatting more readable output”, “Extra credits: Combine options”, and
“Extra credits: Implement -c options”.
All explanations and screenshots are in Section 4, Design. The README
file of this assignment is at /README.MD; “A text that state your partner
or working alone” is at /FYI.MD; and the “Application of strace” content
is in section 5.
For tasks 1-2, I choose “mmap, read, execve, mprotect” as the syscalls to
explain. The figure above shows the procedure clearly. The “execve” call
is made once, and it is the first syscall in the trace. “sleep” is an
executable file on our system and the execve syscall runs it. It is worth
mentioning that the executed binary takes over the caller's memory space
and proc struct instead of getting a new process. The execve syscall has
three parameters: the first is the path of the executable binary, the
second holds the arguments, and the third holds the environment variables.
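The three execve parameters can be sketched as follows. This is a minimal illustration for Linux, not code from my strace: `run_program` is a hypothetical helper, and the fork and waitpid around the call are only there so the caller survives and can observe the result.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` in a child process and return its exit
 * status, or -1 on error. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Parameter 1: path of the binary. Parameter 2: argument vector.
         * Parameter 3: environment vector. On success execve does not
         * return, because the child's memory image has been replaced. */
        execve(path, argv, envp);
        _exit(127); /* reached only if execve failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with the path "/bin/sh" and the argument vector {"sh", "-c", "exit 7", NULL} should return 7 on a system that has /bin/sh.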
The mmap syscall maps a region of memory into the address space of the
process. In our command, it is used to map the shared libraries, the
linker, and other needed files. As the figure above shows, mmap is
usually called after openat. The first parameter of mmap is the requested
address of the mapping. Setting it to NULL lets the kernel pick an
arbitrary address, while a non-NULL value asks for a mapping that starts
at that address, which is useful for extending a known mapping. The second
parameter is the length to map, and the third is the permission of the
mapping: readable, writable, and/or executable.
mprotect changes the permissions of memory. It cannot allocate new
memory; it only changes the permissions of existing mappings. In our
command, it is used to make mmapped memory non-writable. For example, the
memory allocated for a shared library must be writable at first because
the library's code has to be copied into it. But memory that is both
writable and executable is dangerous, so mprotect removes the write
permission after copying.
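The map, copy, then write-protect sequence can be sketched as below. This is a minimal illustration for Linux, with a hypothetical helper name `map_readonly_copy` and an anonymous mapping standing in for the file mappings the real loader uses.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes into a fresh mapping, then drop
 * the write permission. Returns NULL on failure. */
void *map_readonly_copy(const void *src, size_t len)
{
    /* addr = NULL lets the kernel choose where to place the mapping. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, src, len); /* needs PROT_WRITE */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p; /* readable, no longer writable */
}
```

After the mprotect call, a write through the returned pointer would fault, which is the protection the loader relies on.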
The read syscall is straightforward. It reads from the file referred to
by the first parameter (a file descriptor) into the buffer pointed to by
the second parameter, and the third parameter is the maximum number of
bytes to read. In our trace it is used once, to read the header of
glibc.
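A minimal sketch of the three read parameters, assuming Linux; `read_prefix` is a hypothetical helper, not something from the traced program.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `n` bytes from the start of `path`. */
ssize_t read_prefix(const char *path, void *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* fd selects the file, buf receives the data, n caps the length. */
    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}
```

Reading the first four bytes of any ELF file this way, a shared library included, yields the magic bytes 0x7f 'E' 'L' 'F'.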
So far, we have gone through the usage and the output of strace on
Linux. Strace is a useful tool; I have been using it for a long time but
I still found something new by reading its help page. Next, we are going
to implement our own strace.
Figure 3: Usage
4 Design
This section explains the reasons behind my design. Unlike the “uniq”
assignment, this implementation differs from the real tool because we
don't have the ptrace syscall in xv6. Also, strace is a big project and I
don't have much time to read its code. So during the implementation, I
try to solve problems myself and look up materials only when I don't have
an elegant solution.
ON/OFF
Figure 4: Strace All
As the figure above shows, I set a global variable in “sysproc.c”
because I will later implement a syscall as a bridge that passes commands
from user space to the kernel.
Figure 5: Strace Syscall
As the figure above shows, I parse the parameters from the command line
and use the strace syscall to modify the kernel variable. This is enough
to control the strace mode from user space. But how do we monitor the
syscalls?
At first, a simple plan came to mind: insert a piece of code into every
syscall that prints the result and parameters when the syscall runs, with
a global variable “strace all” telling the system whether to print the
strace info. But I quickly found serious issues with this solution. We
have 21 syscalls, and writing a separate strace handler for each of them
is inconvenient because every single change may require modifying 21
places. My experience told me that would be horrible. We need something
more elegant.
The advantage of the previous plan is that it can print the arguments of
syscalls. Printing them requires code inside each syscall handler,
because the handlers are where the parameters are parsed. I asked on
Slack and found that we don't need to print these details, so the strace
code can move to the syscall interrupt handler, that is, the wrapper
function.
In this function, we read EAX, which holds the syscall index, and store
the return value back in EAX. So this function is a good place to insert
our strace code.
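The idea of one hook in the dispatcher can be modelled in plain user-space C. This is a sketch under assumptions, not the xv6 source or my kernel patch: the trapframe is reduced to one field, and the two stub syscalls, the table, and `trace_hook` are all made up for illustration.

```c
#include <stddef.h>

struct trapframe { int eax; };  /* simplified stand-in */

static int sys_ok_stub(void)   { return 42; }  /* pretend success */
static int sys_fail_stub(void) { return -1; }  /* pretend failure */

/* Slot 0 is unused, so index 0 is rejected like an unknown number. */
static int (*syscalls[])(void) = { NULL, sys_ok_stub, sys_fail_stub };
#define NSYSCALLS ((int)(sizeof syscalls / sizeof syscalls[0]))

int last_num, last_ret;  /* what the hook saw most recently */

static void trace_hook(int num, int ret)
{
    last_num = num;
    last_ret = ret;
}

void dispatch(struct trapframe *tf)
{
    int num = tf->eax;              /* syscall index arrives in EAX */
    if (num > 0 && num < NSYSCALLS && syscalls[num])
        tf->eax = syscalls[num]();  /* return value goes back in EAX */
    else
        tf->eax = -1;               /* unknown syscall */
    trace_hook(num, tf->eax);       /* the single place that records */
}
```

Because every syscall passes through `dispatch`, the hook sees both the index and the return value without touching any individual handler.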
Figure 8: Monitor
I insert the monitor after the syscall returns because we need its
return value. However, this would miss the “exit” syscall, because the
process stops inside “exit” and never returns. So I call the same
function just before the process really exits, as shown in figure 8.
The usage of this sub-command is simple. You can just use ”strace on”
to turn on strace and ”strace off” to turn off strace.
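A minimal sketch of how the two sub-commands could map onto the kernel flag. The names are hypothetical: `strace_all` here is an ordinary variable standing in for the kernel global, and `set_strace_all` stands in for the strace syscall.

```c
#include <string.h>

int strace_all;  /* stand-in for the kernel-side global flag */

/* Stand-in for the strace syscall: store the requested mode. */
int set_strace_all(int on)
{
    strace_all = on ? 1 : 0;
    return 0;
}

/* Parse "strace on" / "strace off"; returns 0 on success, -1 on bad usage. */
int strace_switch(int argc, char *argv[])
{
    if (argc != 2)
        return -1;
    if (strcmp(argv[1], "on") == 0)
        return set_strace_all(1);
    if (strcmp(argv[1], "off") == 0)
        return set_strace_all(0);
    return -1;
}
```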
There is nothing more to these two sub-commands, but there is a small
problem with the global variable. To avoid race conditions, it should be
protected by a lock. However, this is not a requirement of this
assignment and I am the only member of my team, so I decided to
implement that only if I have time left.
DUMP
In previous figures, you may have noticed that I implemented a function
to log each syscall. So why do I implement such a complex function,
“add one record”?
To implement the “DUMP” feature, we need space in the kernel to store
the latest N system calls. The data has to live in kernel space as a
global variable because xv6 is a multi-process system. And we need a
circular buffer to store the latest N records.
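A circular buffer that keeps only the latest N records can be sketched as follows. This is an independent illustration, not the code in my figures: the capacity `NREC`, the record fields, and the function names are assumptions, and it tracks a count instead of the “flag” variable used in my version.

```c
#include <stddef.h>

#define NREC 4  /* assumed capacity N */

struct record { int num; int ret; };

struct ring {
    struct record slot[NREC];
    size_t next;   /* index of the next slot to overwrite */
    size_t count;  /* number of valid records, at most NREC */
};

void ring_add(struct ring *r, int num, int ret)
{
    r->slot[r->next].num = num;
    r->slot[r->next].ret = ret;
    r->next = (r->next + 1) % NREC;
    if (r->count < NREC)
        r->count++;
}

/* Copy the stored records, oldest first, into out[]; returns how many. */
size_t ring_dump(const struct ring *r, struct record out[NREC])
{
    size_t start = (r->next + NREC - r->count) % NREC;
    for (size_t i = 0; i < r->count; i++)
        out[i] = r->slot[(start + i) % NREC];
    return r->count;
}
```

With a capacity of 4, adding records numbered 1 to 6 and then dumping yields 3, 4, 5, 6: the two oldest entries have been overwritten.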
Figure 11: Circle Buffer
The circular buffer for the DUMP operation is simpler than a general
one. First, I implemented a standalone C version for testing. As the
figure above shows, we read the input into the buffer and dump the
content when we get an “enter”. The read part is simple; the “flag”
variable is the important one, as it decides how many records to dump
and where to start. The trick in the code is simple and makes sure we
print the last N records, which is verified by my fuzzer. That is
another reason why I wrote it in C. I also attach my fuzz code:
Figure 13: Circle Buffer Fuzzer
That is the key function of my whole design, and I will come back to it
when introducing other features. It is only called from the functions
syscall and sys exit, so to modify, delete, or add a strace feature we
only need to change the code in this function. Another advantage is that
the various options combine naturally.
Figure 14: DUMP after Tracing “grep the readme.md”
I ran several tests on DUMP and it works well. I attach a simple sample
above because the output of a complex test case would be long and hard
to read. You can also test it with any command you like; please check
the README.md file attached to the assignment submission.
RUN
Figure 15: Supported Options
Figure 16: Strace a Process
As the figure above shows, I add an unsigned integer to the proc
struct. To avoid wasting kernel space, I use this 4-byte variable to
store all the strace information, such as the output filter and output
format settings. I explain the layout of this variable in the
option-related paragraphs.
The figure above shows my user-space interface and the corresponding
system call implementation. I use the 26th bit of “pstrace” to mark
whether the kernel should trace the process. Another advantage of
keeping the variable in the “proc” struct is that we can easily follow
child processes by modifying the “fork” function in “proc.c”.
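A minimal sketch of the per-process trace bit and its inheritance across fork. The struct is a cut-down stand-in for the xv6 proc struct, the helper names are hypothetical, and it assumes bits are counted from zero so that “the 26th bit” means `1u << 26`.

```c
#define PSTRACE_ENABLED (1u << 26)  /* assumed: bit 26, counted from 0 */

struct proc {
    int pid;
    unsigned int pstrace;  /* all strace settings packed in one word */
};

void proc_trace_on(struct proc *p)
{
    p->pstrace |= PSTRACE_ENABLED;
}

int proc_is_traced(const struct proc *p)
{
    return (p->pstrace & PSTRACE_ENABLED) != 0;
}

/* Stand-in for the extra line in fork(): the child inherits the word,
 * so tracing and every option follow the child process. */
void fork_copy_trace(const struct proc *parent, struct proc *child)
{
    child->pstrace = parent->pstrace;
}
```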
Now we can use strace to run and trace any process! Here is a simple demo
of the “RUN” sub-command.
Options
My strace supports several output filter and format options,
which help to eliminate uninteresting syscalls.
I parse the parameters in user space and use the strace syscall to pass
the options to the kernel. As discussed in the previous paragraph, the
unit of tracing is the process, so we need a variable in the proc struct
to tell the kernel side of strace what to do. I introduce every bit of
the “pstrace” variable in the following paragraphs. You can check the
layout of the variable in the following figure:
Option -e The 0th bit is the in-use bit for bits 1-22. For example, if
we run strace without the “-e” flag, the 0th bit is 0 and the kernel
side of strace does not check the syscall filter.
Figure 22: -e Option Implementation
The figure above shows the user-space and kernel-space handlers of the
“-e” option. The user-space handler uses the strace syscall to pass the
filter, which is a whitelist of syscalls, to the kernel, and the kernel
handler enables the syscall filter and applies it.
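The whitelist check can be sketched as below. This is an illustration under an assumption the text does not state explicitly: that bit n of “pstrace” (for n from 1 to 22) corresponds to syscall number n. The function names are hypothetical.

```c
#define FILTER_INUSE 1u  /* bit 0: a -e whitelist is active */
#define FILTER_MAX   22  /* bits 1..22 hold the whitelist */

/* Add syscall `num` to the whitelist and mark the filter as in use.
 * Out-of-range numbers leave the word unchanged. */
unsigned int filter_allow(unsigned int pstrace, int num)
{
    if (num < 1 || num > FILTER_MAX)
        return pstrace;
    return pstrace | FILTER_INUSE | (1u << num);
}

/* Returns 1 if syscall `num` should be reported, 0 otherwise. */
int filter_pass(unsigned int pstrace, int num)
{
    if (!(pstrace & FILTER_INUSE))
        return 1;  /* no -e given: everything passes */
    if (num < 1 || num > FILTER_MAX)
        return 0;
    return (int)((pstrace >> num) & 1u);
}
```

Because each “-e” only sets one more bit, several “-e” flags accumulate into one whitelist without extra code.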
The figure above is a screenshot of the “-e” option in use. As you can
see, my implementation naturally handles combinations involving the
“-e” option.
Option -s/f The 23rd and 24th bits of pstrace serve the “-s/f” flags.
The 23rd bit is the in-use bit for the 24th bit. If the 24th bit is 0,
the kernel records only successful syscalls; if it is 1, the kernel
records only failed syscalls.
Figure 24: -s/f Option Implementation(Handler)
The figure above shows the user-space and kernel-space handlers of the
“-s/f” options. The user part simply parses the arguments and calls the
kernel to apply the filter, and the kernel applies it to the pstrace of
the proc. As the figure below shows, we check the filter before printing
the syscall in “add one record”.
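The success/failure check can be sketched as follows. The bit positions come from the description above; the function name is hypothetical, and the sketch assumes a syscall counts as failed when it returns -1, which is the usual xv6 convention.

```c
#define SF_INUSE       (1u << 23)  /* a -s or -f filter is active */
#define SF_FAILED_ONLY (1u << 24)  /* 0: only successes, 1: only failures */

/* Returns 1 if a syscall with return value `ret` should be recorded. */
int status_pass(unsigned int pstrace, int ret)
{
    if (!(pstrace & SF_INUSE))
        return 1;              /* neither -s nor -f was given */
    int failed = (ret == -1);  /* assumed failure convention */
    return (pstrace & SF_FAILED_ONLY) ? failed : !failed;
}
```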
Figure 26: -s/f Option Demo
So far we have covered the filter options (-s/f/e); the following
paragraphs move on to the format options.
Option -F
Figure 27: -F Option Implementation(Handler)
For this option, we need to “pause” the strace output until the process
has finished its own output. So I store the records and dump them after
the process exits, reusing part of the code of the “DUMP” sub-command.
This is simple and elegant but it has a small flaw: if the process makes
a huge number of syscalls, we can only keep the latest N records.
Figure 28: -F Option Implementation
To address this, I could add a variable that monitors the storage and
dumps the records when it fills up in “-F” mode. But I think this would
break XV6's simplicity, so I would rather document the issue in the
README.md. The following figure shows the output of the “-F” option.
Option -c
”Options -c in strace will generate a statistic report of system
call regarding the input command such as duration, total call,
failed call. Create a similar report table using option -c.”
The “-c” option shows metrics for the command. We need to store more
information about the syscalls, so allocating more space is necessary.
For timing, I use the uptime syscall to calculate the time spent in
every syscall. Counting the errors and the calls is easy in the
function “add one record”.
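The per-syscall counters behind a “-c” report can be sketched like this. It is an illustration, not my kernel code: the table size, the struct, and the function name are assumptions, the time unit is taken to be clock ticks as returned by uptime, and a return value of -1 is treated as an error.

```c
#define NSYS 23  /* assumed: syscall numbers 1..22 */

struct sysstat {
    unsigned int calls;   /* how often the syscall ran */
    unsigned int errors;  /* how often it returned -1 */
    unsigned int ticks;   /* accumulated duration in ticks */
};

/* Update the row for syscall `num`; start/end are tick counts sampled
 * before and after the call. Out-of-range numbers are ignored. */
void stat_record(struct sysstat tab[NSYS], int num, int ret,
                 unsigned int start, unsigned int end)
{
    if (num <= 0 || num >= NSYS)
        return;
    tab[num].calls++;
    if (ret == -1)
        tab[num].errors++;
    tab[num].ticks += end - start;
}
```

Printing the report then amounts to walking the table and skipping rows whose call count is zero.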
Figure 31: -c Option Implementation
Just before the exit, the “metric” function is triggered and dumps the
“-c” related information, as in the following figure.
Figure 32: -c Option Demo
And it's great to know that “even” costs so much (33 times more than
reading)!
Option -o
This one is the hardest for my implementation because I use “exec” to
run the command rather than a fork. So I have to handle everything in
the kernel, which is not secure. As the figure above shows, the
user-space interface is easy to finish. Similar to what we learned in
“sh.c”, I just redirect stderr to the given file.
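Redirecting stderr to a file can be sketched in POSIX C as below. This is not my xv6 code: the helper names are hypothetical, and it uses dup2, which xv6 does not provide. In xv6, “sh.c” gets the same effect by closing the descriptor and calling open, which returns the lowest free descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point fd 2 at `path`. Returns a duplicate of the old stderr so it can
 * be restored later, or -1 on error. */
int redirect_stderr(const char *path)
{
    int saved = dup(STDERR_FILENO);
    if (saved < 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }
    if (dup2(fd, STDERR_FILENO) < 0) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);  /* fd 2 now refers to the file */
    return saved;
}

/* Undo redirect_stderr using the descriptor it returned. */
void restore_stderr(int saved)
{
    dup2(saved, STDERR_FILENO);
    close(saved);
}
```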
Figure 35: Format Print in Kernel
Unfortunately, we don't have a “printf” in the kernel that stores data
into stderr. So I need to implement a limited “printf” to store the
output to the stdin out. I use the read file to achieve that, as the code
above shows. I replace all output functions with my “format print”
functions so that “-o” works naturally with the other commands and
options, as shown in the figure below.
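What a limited “printf” involves can be sketched in portable C. This is an independent illustration and not my “format print” function: `mini_format` is a hypothetical name, it writes into a caller-supplied buffer instead of a file, and it supports only %d, %s, and %%.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store c at buf[pos] if it fits (one byte is reserved for the NUL). */
static size_t put(char *buf, size_t cap, size_t pos, char c)
{
    if (pos + 1 < cap)
        buf[pos] = c;
    return pos + 1;
}

/* Supports %d, %s and %% only. Returns the length that was needed;
 * the output is truncated and NUL-terminated if cap is too small. */
size_t mini_format(char *buf, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') {
            pos = put(buf, cap, pos, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;  /* lone '%' at the end of the format */
        if (*fmt == 'd') {
            int v = va_arg(ap, int);
            unsigned int u = v < 0 ? 0u - (unsigned int)v : (unsigned int)v;
            char tmp[12];
            int n = 0;
            do {
                tmp[n++] = (char)('0' + u % 10);
                u /= 10;
            } while (u);
            if (v < 0)
                pos = put(buf, cap, pos, '-');
            while (n > 0)
                pos = put(buf, cap, pos, tmp[--n]);
        } else if (*fmt == 's') {
            const char *s = va_arg(ap, const char *);
            if (!s)
                s = "(null)";
            while (*s)
                pos = put(buf, cap, pos, *s++);
        } else {
            pos = put(buf, cap, pos, *fmt);  /* "%%" or unknown: literal */
        }
    }
    va_end(ap);
    if (cap > 0)
        buf[pos < cap ? pos : cap - 1] = '\0';
    return pos;
}
```

Routing every trace message through one formatter like this is what lets a single output switch, such as “-o”, affect all commands and options at once.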
Figure 36: -o Option Demo
5 Application to strace
”Write a small program that produces an unexpected be-
havior such as race condition, delay output, crash on condition,
memory leak, or your choice of implementation. Run strace on
this program.”
I wrote a simple test case.c to check for race condition problems with
the “-c” option. The demo results clearly show a race condition, which
is unexpected: the expected result is two identical tables, one for each
subprocess. I fix it in this section.
I used to store the “-c” option-related data in a global variable. If
two processes use it, the output can be wrong because the process that
exits first clears the variable.
How to Fix We can let each process own the struct, so different
processes have different variables for their syscall records. So I
changed the global variable to a process-owned variable and disabled
interrupts while dumping the output.
As the figure above shows, the output keeps the correct order and the
number of syscalls is correct!
Figure 40: Same Program on Linux
More Testing I also wrote many test cases for my strace; see /Test Log.MD.
6 Todo
During the implementation and the testing, I noticed several flaws in
my implementation. I decided to leave them in my code because I didn't
have time to fix them and this is an educational assignment rather than
a product.
Strace RUN For strace run, it is better to use a fork, because some
options need to keep data until the process exits. A parent process
should wait for the child, acting as a daemon, and finish the output
task after the child process exits, since writing a file from user space
is more secure.
-F Option This issue is specific to the -F option, which aims at
printing more readable output. To preserve the simplicity of the xv6
kernel, I decided not to implement a mechanism for an unbounded number
of syscall records. So if a command produces more than N records, only
the last N records are printed rather than all of them.
7 Structure of Strace
There are two main parts of this implementation: Kernel Space Part and
User Space Part.
In user space, the strace binary is the interface that transfers data
to the kernel. The main handler is the file “kernel/proc.c”. I implement
a strace syscall and handle the various parameters.
In kernel space, I use the metadata obtained from the strace syscall to
set the output format/filter options. I also use kernel variables to
implement “strace on/off”. You can check all the supported features in
the figure above.
8 Summary
the course and the xv6 could be used to understand the real Linux kernel.