Linuxnetworks Book
Muñoz
Juanjo Alins
Jorge Mata
Oscar Esparza
Carlos H. Gañán
Linux Networks
Contents
I Linux Essentials 9
1 Introduction to Unix/Linux 11
2 Introduction to Unix/Linux 13
2.1 Introduction to OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Resources Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 User Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Implementations and Distros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Processes 17
4 Processes 19
4.1 The man command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Working with the terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3 Listing processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.4 Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Running in foreground/background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.6 Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.7 Job Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.8 Running multiple commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.9 Extra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.9.1 *Priorities: nice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.9.2 *Trap: catching signals in scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.9.3 *The terminal and its associated signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.9.4 *States of a process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.9.5 Command Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.10 Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5 Filesystem 29
6 Filesystem 31
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.2 Basic types of files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3 Hierarchical File Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.4 The path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.5 Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.6 Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.7 File content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.8 File expansions and quoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.9 Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.10 Text Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.11 Commands and applications for text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.12 Unix Filesystem permission system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.12.2 Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.12.3 Change permissions (chmod) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.12.4 Default permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.13 File System Mounting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.13.1 Disk usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.14 Extra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.14.1 *inodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.14.2 *A file system inside a regular disk file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.15 Command summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.16 Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7 File Descriptors 49
8 File Descriptors 51
8.1 File Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
8.2 Redirecting Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
8.3 Redirecting Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
8.4 Unnamed Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.5 Named Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
8.6 Dash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
8.7 Process Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
8.8 Files in Bash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
8.9 Extra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
8.9.1 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
8.9.2 tr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
8.9.3 find . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
8.9.4 *xargs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
8.10 Command summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
8.11 Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
II Linux Virtualization 65
9 Introduction to Virtualization 67
10 Introduction to Virtualization 69
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
10.2 Types of virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
10.3 What is UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
10.4 Practical UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
10.5 Update and Install Software in the UML system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.6 Problems and solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
10.7 Networking with UML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
10.7.1 Virtual switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
10.7.2 Connecting the Host with the UML guests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
10.8 Extra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
10.8.1 *Building your UML kernel and filesystem . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
13 Simulation Tools 93
14 Simulation Tools 95
14.1 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
14.2 A Wrapper for VNUML: simctl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
14.2.1 Profile for simctl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
14.2.2 Simple Example Continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
14.2.3 Getting Started with simctl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.2.4 Start and Stop Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.2.5 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
14.2.6 Access to Virtual Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
14.2.7 Network Topology Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
14.2.8 Managing and Executing Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
14.2.9 Install Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
14.2.10 Drawbacks of Working with Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
16 Introduction 109
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
16.1.1 TCP/IP Networking in a Nutshell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
16.1.2 Client/Server Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
16.1.3 TCP/IP Sockets in Unix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
16.2 Basic Network Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
16.2.1 ifconfig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
16.2.2 netstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
16.2.3 services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
16.2.4 lsof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
16.3 ping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
16.4 netcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
16.5 Sockets with Bash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
16.6 Commands summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
VI Appendices 209
A Ubuntu in a Pen-drive 211
Part I
Linux Essentials
Chapter 1
Introduction to Unix/Linux
Chapter 2
Introduction to Unix/Linux
2.1 Introduction to OS
The main goal of this chapter is understanding our environment: the Linux Operating System. In short, an Operating System (OS) is a set of programs whose purpose is to manage the resources of a computer system and to provide an interface for interaction with human beings. Computer resources to be managed by an OS include the CPU, the RAM, and the I/O devices. These I/O devices include secondary storage devices (HDD, CD, DVD, USB, etc.) and communication devices like wired networking (Ethernet) or wireless networking (WiFi). From the above discussion, we can say that two key issues for an OS are:
• Resources management. For example, in all modern computer systems it is possible to run more than one process simultaneously, so the OS is responsible for allocating the CPU execution cycles and the RAM memory required by each process. An introduction to how the Linux OS is organized to manage and provide access to system resources is given in Section 2.2.
• User interaction. Another essential issue is how the OS allows interaction between the user (human being) and the computer. This interaction includes operations over the file system (copy, move, delete files), execution of programs, network configuration, and so on. The two main interfaces for interaction between the user and the computer, the CLI and the GUI, are further discussed in Section 2.3.
[Figure: Evolution of operating systems. In the 80s-90s, UNIX ran on multi-process, multi-user mainframes accessed through terminals, while DOS ran on mono-process, mono-user personal computers. Today, Linux (Ubuntu, Red Hat...), Android, FreeBSD and Mac OS (iOS) coexist with Microsoft Windows on common hardware.]

2.2 Resources Management
The kernel is the main component of most computer operating systems. It is a bridge between applications and the
actual data processing done at the hardware level. The kernel’s primary function is to manage the computer’s resources
and allow other programs to run and use these resources. Typically, the resources consist of:
• Central Processing Unit (CPU). This is the most central part of a computer system, responsible for running or
executing programs on it. The kernel takes responsibility for deciding at any time which of the many running
programs should be allocated to the processor or processors (each of which can usually run only one program at
a time).
• Memory. Memory is used to store both program instructions and data. Typically, both need to be present in
memory in order for a program to execute. Often multiple programs will want access to memory, frequently
demanding more memory than the computer has available. The kernel is responsible for deciding which memory
each process can use, and determining what to do when not enough is available.
• Input/Output (I/O) Devices. I/O devices present in the computer, such as keyboard, mouse, disk drives, print-
ers, displays, etc. The kernel allocates requests from applications to perform I/O to an appropriate device (or
subsection of a device, in the case of files on a disk or windows on a display) and provides convenient methods
for using the device (typically abstracted to the point where the application does not need to know implementa-
tion details of the device).
The kernel typically makes these facilities available to application processes through system calls (see Figure 2.3).
A system call defines how a program requests a service from the operating system's kernel. This may include hardware-related services (e.g., accessing the hard disk), creating and executing new processes, and communicating with integral kernel services (like scheduling). System calls provide the interface between a process and the operating system. Most operations interacting with the system require permissions not available to a user-level process; e.g., I/O performed with a device present on the system, or any form of communication with other processes, requires the use of system calls.
The Linux kernel is a hybrid monolithic kernel. Drivers and kernel extensions typically execute in a privileged zone known as “Ring 0” (see again Figure 2.2), with unrestricted access to hardware. Unlike traditional monolithic kernels,
[Figure: User's processes run on top of the kernel (ring 0) and request services through system calls; kernel modules run inside the kernel, which in turn controls the hardware.]
drivers and kernel extensions can be loaded and unloaded easily as modules. Modules are pieces of code that can be
loaded and unloaded into the kernel upon demand. They extend the functionality of the kernel without the need to
reboot the system. For example, one type of module is the device driver, which allows the kernel to access hardware
connected to the system. Without modules, we would have to build monolithic kernels and add new functionality
directly into the kernel image. Besides having larger kernels, this has the disadvantage of requiring us to rebuild and
reboot the kernel every time we want new functionality. Modules or drivers interact with devices like hard disks, printers, network cards, etc., and provide system calls. Modules can be statically compiled with the kernel or they can be dynamically loaded. The commands related to modules are lsmod, insmod and modprobe.
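As a quick illustration, the following commands list the loaded modules and load/unload one of them; the module name psmouse is just an example and the exact set of modules will differ on your system (modprobe and insmod must be run as root):

$ lsmod                  # list the currently loaded modules (name, size, usage count)
# modprobe psmouse       # load a module, resolving its dependencies
# modprobe -r psmouse    # unload it again
# insmod ./mymodule.ko   # load a single .ko file directly (no dependency handling)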
Finally, we will discuss the Linux booting sequence. To start a Linux system (see Figure 2.4), we need to follow a startup sequence in which control flows from the BIOS to the boot loader2 and finally to the Linux core (kernel).
As mentioned, Unix-like systems are multiprocess and to manage the processes the kernel starts a scheduler and
executes the initialization program init. init sets the user environment and allows user interaction and the log in, then
2 The boot loader stage is not absolutely necessary. Certain BIOSes can load and pass control to Linux without the use of the loader. Each boot process will be different depending on the processor architecture and the BIOS.
[Figure 2.4: Linux booting sequence: BIOS → Boot Loader (Grub) → Kernel → Init (PID=1).]
the kernel keeps itself idle until it is called. In summary, after you boot Linux:
• The kernel starts a scheduler and executes the initialization program init.
• Once init has started it can create other processes. In particular, init sets the user environment and allows user
interaction and the log in, then the kernel keeps itself idle until it is called.
• The processes can be parents and children and both at the same time.
2.3 User Interaction

We must remark that in current systems we have another interface called the Graphical User Interface (GUI). Of course, we still have the CLI (Command Line Interface). In general, the CLI gives you more control and flexibility than the GUI, but GUIs are easier to use. The GUI requires a graphical server, called the X server in UNIX. Processes
launched from the GUI are typically applications that have graphical I/O. For this graphical I/O, we can use a mouse,
a keyboard, a touch screen, etc. On the other hand, when we exclusively use the CLI, it is not mandatory to have an X
server running in the system. Processes launched from the CLI are commands that have only textual I/O. The I/O of
the textual commands is related or “attached” to a terminal and the terminal is also attached to a shell. Nowadays we
have three types of terminals: Physical terminals, Virtual Consoles and Terminal Emulators. Physical terminals are
not very common today, but all the Linux systems implement virtual consoles and terminal emulators (see Figure 2.6).
[Figure 2.6: (a) Virtual consoles: the keyboard and display are attached to /dev/tty1 ... /dev/tty6, reached with CTRL+ALT+F1 ... CTRL+ALT+F6; getty handles the login text and then login starts a Bash attached to /dev/ttyX. (b) Terminal emulators (pseudo-terminals): xterm (classical from X), gnome-terminal (GNOME), konsole (KDE), etc. exchange data with the X server using the X protocol and communicate with Bash through /dev/pts/X; the X server is reached with CTRL+ALT+F7.]
If we use virtual consoles to interact with our Linux system, we do not need to start an X server in the system. Virtual consoles just manage text and they can emulate the behavior of several physical terminals. Since, in general, the user only has a single screen and keyboard, the different virtual consoles are accessed using different key combinations. By default, when Linux boots it starts six virtual consoles, which can be accessed with CTRL+ALT+F1 ... CTRL+ALT+F6. As shown in Figure 2.6(a), there is a process called getty which manages the first stage of the virtual console. Once you log into the system (you enter your user name and password), the management is passed to a process called login, which starts a shell (Bash in our case). The communication between the virtual console and the shell is not performed using any physical media but using a special file: /dev/ttyX (where X is the number of the virtual console).
On the other hand, if our Linux system has an X graphical server running, users can also access the system through a GUI. In fact, the GUI is the default interface for most desktop Linux distributions. To access the login manager of the X server, you must type CTRL+ALT+F7 in the majority of Linux systems. Once you log into the GUI, you can start a terminal emulator (see Figure 2.6(b)). Terminal emulators are also called pseudo-terminals. To start a pseudo-terminal you can use ALT+F2 and then type gnome-terminal or xterm. You can also use the main menu: MENU -> Accessories -> Terminal. This will open a gnome-terminal, which is the default terminal emulator in our Linux distribution (Ubuntu). As you can see in Figure 2.6(b), the terminal emulator communicates with the shell (Bash) also using a file /dev/pts/X (where X is the number of the pseudo-terminal). In addition, the pseudo-terminal receives and sends data to the X server using the X protocol.
Regarding the shell, this documentation refers only to Bash (Bourne Again Shell). This is because this shell is the
most widely used one in Linux and includes a complete structured programming language and a variety of internal
functions.
Note. When you open a terminal emulator or log into a virtual console, you will see a line of text that ends with the dollar sign “$” and a blinking cursor. Throughout this document, the “$” sign means that we have an open terminal ready to receive commands.
2.4 Implementations and Distros
UNIX is now more a philosophy than a particular OS. UNIX has led to many implementations, that is to say, different UNIX-style operating systems. Some of these implementations are supported/developed by private companies, like Solaris (Sun/Oracle), AIX (IBM), SCO UnixWare (SCO), IRIX (SGI), Mac OS (Apple) or Android (Google). Other implementations of Unix, like Linux or FreeBSD, are not commercial and are supported/developed by the open source community.
In the context of Linux, we also have several distributions, which are specific forms of packaging and distributing the Linux kernel and its applications. These distributions include Debian, Red Hat and derivatives of these, like Fedora, Ubuntu, Linux Mint, SuSE, etc.
Chapter 3
Processes
Chapter 4
Processes
4.1 The man command

$ man ps
If you type this, you will get the manual for the command “ps”. Once in the help environment, you can use the arrow keys or Page Up/Page Down to move up and down. To search for the text xxx, you can type /xxx. Then, to go to the next and previous matches you can press the keys n and N respectively. Finally, you can use q to exit the manual.
4.3 Listing processes

Typing ps without arguments lists the processes launched from the current terminal:

$ ps
PID TTY TIME CMD
21380 pts/3 00:00:00 bash
21426 pts/3 00:00:00 ps
• The first column is the PID, the number that uniquely identifies the process in the system.
• The second column is the associated terminal1. In the previous example, it is the pseudo-terminal 3. A question mark (?) in this column means that the process is not associated with a terminal.
• The third column shows the total amount of time that the process has been running.
In the example above, we see that two processes have been launched from the terminal: the bash, which is the
shell, and command ps itself. Figure 4.1 shows a scheme of the relationships of all the processes involved when we
type a command in a pseudo-terminal.
On the other hand, the command ps accepts parameters. For example, the parameter -u reports the processes launched by a given user. Therefore, if you type:

$ ps -u user1

we obtain a list of all the processes owned by user1. Next, there is a summary of some relevant parameters of the ps command (for further information, you can type man ps to view the ps manual):
Example:
$ ps -Ao pid,ppid,state,tname,%cpu,%mem,time,cmd
The preceding command shows the process PID, the PID of parent process, the state of the process, the associated
terminal, the % of CPU and memory consumed by the process, the accumulated CPU time consumed and the command
that was used to launch the process.
On the other hand, the pstree command displays all the system processes within a tree showing the relationships between processes. The root of the tree is either init or the process with the given PID.
The top command returns a list of processes in a similar way as ps does, except that the displayed information is updated periodically, so we can see the evolution of the processes' state. top also shows additional information such as the memory space occupied by the processes, the memory space occupied by the exchange partition or swap, the total number of tasks or processes currently running, the number of users, the percentage of processor usage, etc.
Finally, the time command gives us the duration of execution of a particular command. Example:
1 The terminal can also be viewed with the tty command.
$ time ps
PID TTY TIME CMD
7333 pts/0 00:00:00 bash
8037 pts/0 00:00:00 ps
real 0m0.025s
user 0m0.016s
sys 0m0.012s
Real refers to actual elapsed time; User and Sys refer to CPU time used only by the process.
• Real is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used
by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).
• User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only
actual CPU time used in executing the process. Other processes and time the process spends blocked do not
count towards this figure.
• Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in
system calls within the kernel, as opposed to library code, which is still running in user-space. Like ’user’, this
is only CPU time used by the process.
Notice that User+Sys will tell you how much actual CPU time your process used. This is across all CPUs, so if
the process has multiple threads it could potentially exceed the wall clock time reported by Real.
4.4 Scripts

Normally, shells are interactive: the shell accepts commands from you (via the keyboard) and executes them. However, instead of entering commands one by one, you can store a sequence of commands in a text file and tell the shell to execute this text file. This is known as a shell script. Another way of defining a shell script is simply as a series of commands written in a plain text file.
Why write shell scripts?
• Shell scripts can take input from the user or from a file and output results to the screen.
• They are useful to create our own commands.
• They save lots of time.
• They can automate day-to-day tasks.
• System administration tasks can also be automated.
Example:
ps
sleep 2
pstree
The previous script executes ps and then, after approximately 2 seconds (sleep makes us wait 2 seconds), a pstree command.
To write down a script you can use a text editor (like gedit). To run a script you must give it execution
permissions ($ chmod u+x myscript.sh) and then execute it ($ ./myscript.sh).
Another example script is the classical “Hello world” script.
#!/bin/bash
# Our first script, Hello world!
echo Hello world
As you can observe, the script begins with a line that starts with “#!”. This is the path to the shell that will execute the script. In general, you can omit this line (as we did in our previous example) and then the script is executed by the default shell. However, it is considered good practice to always include it in scripts. In our case, we will always build scripts for Bash.
As you can see, to write to the terminal we can use the echo command. Finally, you can also observe that text after the “#” sign (except on the first line) is interpreted as a comment.
Next, we assign execution permission and execute the script:
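A minimal sketch of these two steps, assuming the script above was saved as hello.sh in the working directory:

$ chmod u+x hello.sh    # give the owner execution permission
$ ./hello.sh            # run the script
Hello world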
Finally, in a script you can also read text typed by the user in the terminal. To do so, you can use the read
command. For example, try:
#!/bin/bash
echo Please, type a word and hit ENTER
read WORD
echo You typed $WORD
Using the first script, notice that the bash clones itself to run the commands within the script.
4.5 Running in foreground/background

By default, commands typed in the terminal run interactively or in foreground. Example:

$ sleep 5
The command simply makes us wait 5 seconds and then terminates. While running a command in foreground we
cannot run any other commands.
Bash also lets us run commands non-interactively or in background. To do so, we must use the ampersand symbol (&) at the end of the command line. Example:
$ sleep 5 &
Whether a process is running in foreground or background, its output goes to its attached terminal. However, a
process cannot use the input of its attached terminal while in background.
4.6 Signals
A signal is a limited form of inter-process communication. Essentially it is an asynchronous notification sent to a
process. A signal is actually an integer and when a signal is sent to a process, the kernel interrupts the process’s
normal flow of execution and executes the corresponding signal handler. The kill command can be used to send
signals 2 .
2 More technically, the command kill is a wrapper around the system call kill(), which can send signals to processes or groups of processes
in the system, referenced by their process IDs (PIDs) or process group IDs (PGIDs).
The default signal that the kill command sends is the termination signal (SIGTERM), which asks the process to release its resources and exit. The integer and the name of signals may change between different implementations of Unix. Usually, the SIGKILL signal is number 9 and SIGTERM is 15. In Linux, the most widely used signals and their corresponding integers are:
• 2 SIGINT. It is the same signal that occurs when a user presses Control-C in an interactive terminal to request termination.
The kill command syntax is: kill -signal PID. You can use both the number and the name of the signal:
$ kill -9 30497
$ kill -SIGKILL 30497
In general, signals can be intercepted by processes, that is, processes can provide a special treatment for the
received signal. However, SIGKILL and SIGSTOP are special signals that cannot be captured, that is to say, that are
only seen by the kernel. This provides a safe mechanism to control the execution of processes. SIGKILL ends the
process and SIGSTOP pauses it until a SIGCONT is received.
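For instance, a process can be paused and resumed by sending these signals explicitly (the PID 30497 below is just an illustrative value):

$ kill -SIGSTOP 30497   # pause the process
$ kill -SIGCONT 30497   # resume it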
Unix also has security mechanisms to prevent unauthorized users from killing other users' processes. Basically, a process cannot send a signal to another process unless both processes belong to the same user. Obviously, the exception is the root user (superuser), who can send signals to any process in the system.
Finally, another interesting command is killall. This command is used to terminate execution of processes by
name. This is useful when you have multiple instances of a running program.
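For example, assuming several xeyes instances are running, all of them can be terminated at once:

$ killall xeyes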
4.7 Job Control

The jobs launched from a shell are identified with a job identifier or JID. Example:

$ xeyes &
$ xeyes &
$ xclock &
$
The previous commands run 2 processes or instances of the application xeyes and an instance of xclock. To
see their JIDs, type:
$ jobs
[1] Running xeyes &
[2]- Running xeyes &
[3]+ Running xclock &
$
In this case, we can observe that the JIDs are 1, 2 and 3 respectively. Using the JID, we can make a job run in the foreground with the fg command. The following command brings job 2 to the foreground:
$ fg 2
xeyes
On the other hand, the key combination Control-z sends a stop signal (SIGTSTP) to the process that is running in foreground. Following the example above, we had the job xeyes in the foreground, so if you type Control-z the process will be stopped.
$ fg 2
xeyes
^Z
[2]+ Stopped xeyes
$ jobs
[1] Running xeyes &
[2]+ Stopped xeyes
[3]- Running xclock &
$
To resume the process that we just stopped, type the command bg:
$ bg
[2]+ xeyes &
$
In general, typing the JID after the command bg will send the process identified by it to the background (in the previous case the command bg 2 could be used as well).
The JID can also be used with the command kill. To do this, we must write a % sign right before the JID to
differentiate it from a PID. For example, we could terminate the job “1” using the command:
$ kill -s SIGTERM %1
Another very common shortcut is Control-c, which sends a termination signal (SIGINT) to the process that is running in foreground. Example:
$ fg 3
xclock
^C
[1] Terminated xeyes
Notice that whenever a new process is run in background, the bash provides us the JID and the PID:
$ xeyes &
[1] 25647
4.8 Running multiple commands

Every command returns an exit status when it finishes, which Bash stores in the special variable $?. An exit status of 0 means success, while a value greater than 0 indicates an error. Try the following (the option -ñ is invalid, so the first ps fails):

$ ps -ñ
$ echo $?
$ ps
$ echo $?
Bash also allows combining several commands in a single command line:
$ command1 & command2 & ... Using &, one or more commands run in background.
$ command1 && command2 && ... command2 is executed if and only if command1 exits successfully (exit status is 0), and so on.
$ command1 || command2 || ... command2 is executed if command1 fails, that is to say, if command1 has an exit status > 0 or if command1 could not be executed.
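The following sketch illustrates the difference between && and || (the directory name /tmp/demo is just an example):

$ mkdir /tmp/demo && cd /tmp/demo && echo OK   # each step runs only if the previous one succeeded
$ ls /nonexistent || echo "ls failed"          # the fallback runs only because ls failed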
4.9 Extra
4.9.1 *Priorities: nice
Each Unix process has a priority level ranging from -20 (highest priority) to 19 (lowest priority). A low priority means that the process will run more slowly, that is, the process will receive fewer CPU cycles.
The top command can be used to easily change the priority of running processes. To do this, press “r” and enter the PID of the process whose priority you want to change. Then, type the new priority level. We must take into consideration that only the superuser “root” can assign negative priority values.
You can also use nice and renice instead of top. Examples:
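A minimal sketch (the PID 11646 is illustrative):

$ nice -n 18 xeyes &     # launch xeyes with niceness 18
$ renice 10 -p 11646     # change the niceness of a running process to 10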
4.9.3 *The terminal and its associated signals

$ tty
/dev/pts/1
$ echo $PPID
11587
$ xeyes &
[1] 11646
$ ps
PID TTY TIME CMD
11592 pts/1 00:00:00 bash
11646 pts/1 00:00:02 xeyes
11706 pts/1 00:00:00 ps
If you close the terminal from the GUI, or if you type the following from another terminal:
4.9.4 *States of a process

The possible states of a process are:

• Ready (R) - A process is ready to run and is just waiting to receive CPU cycles.
• Running (O) - Only one of the ready processes may be running at a time (for uniprocessor machine).
• Suspended (S) - A process is suspended if it is waiting for something to happen, for instance, if it is waiting for
a signal from hardware. When the reason for suspension goes away, the process becomes Ready.
• Stopped (T) - A stopped process is also outside the allocation of CPU, but not because it is suspended waiting
for some event.
• Zombie (Z) - A zombie or defunct process is a process that has completed execution but still has an entry in the process table. Entries in the process table allow a parent to finish its children correctly. Usually the parent receives a SIGCHLD signal indicating that one of its children has died. Zombies that remain for too long may point out a problem in the parent's source code, since the parent is not correctly finishing its children.
Note. A zombie process is not the same as an “orphan” process. An orphan process is a process that has lost its father during its execution. When processes are “orphaned”, they are adopted by init.
Table 4.1: Summary of commands for process management.
man is the system manual page.
ps displays information about a selection of the active processes.
tty shows the terminal associated with the current process.
pstree shows running processes as a tree.
top provides a dynamic real-time view of running processes.
time provides us with the duration of execution of a particular command.
sleep waits for a certain amount of time without consuming CPU.
echo writes text to the terminal.
read reads a line from the terminal.
jobs lists the jobs launched from a shell.
bg and fg set a process in background or foreground.
kill sends signals to processes by PID or JID.
killall sends signals to processes by name.
nice and renice adjust niceness, which modifies CPU scheduling.
trap catches signals in scripts.
nohup runs a command that is independent of its parent.
4.10 Practices
Exercise 4.1– In this exercise you will practice with process execution and signals.
1. Open a pseudo-terminal and execute the command to see the manual of ps. Once in the manual of the ps command, search for the pattern ppid and count the number of times it appears.
2. Within the same pseudo-terminal, execute ps with the appropriate parameters in order to show the PID, the terminal
and the command of the currently active processes that have been executed from the terminal. Do the same in the
second virtual console.
3. Execute the following two commands and comment on the difference between their outputs:
$ ps -o pid,comm
$ ps -o pid,cmd
4. Use the pstree command to see the process tree of the system. Which process is the father of pstree? And its grandfather? And who are the rest of its relatives?
5. Open a gnome-terminal and then open a new “TAB” typing CTRL+SHIFT+T. Now open another gnome-terminal in a new window. Using pstree, you have to comment on the relationships between the processes related to the terminals that we opened.
6. Type ALT+F2 and then xterm. Notice that this sequence opens another type of terminal. Repeat the same sequence to open a new xterm. Now, view the process tree and comment on the differences with respect to the results of the previous case of gnome-terminals.
7. Open three gnome-terminals. These will be noted as t1, t2 and t3. Then, type the following:
Comment what you see and also which is the type of execution (foreground/background) on each terminal.
8. For each process of the previous applications (xeyes and xclock), try to find out the PID, the execution state,
the tty and the parent PID (PPID). To do so, use the third terminal (t3).
9. Using the third terminal (t3), send a signal to terminate the process xeyes.
10. Type exit in the terminal t2. After that, find out who is the parent process of xclock.
11. Now send a signal to kill the process xclock using its PID.
12. Execute an xclock in foreground in the first terminal t1.
13. Send a signal from the third terminal to stop the process xclock and then send another signal to let this process continue executing. Is the process executing in foreground or in background? Finally, send a signal to terminate the xclock process.
14. Using the job control, repeat the same steps as before, that is, executing xclock in foreground and then stopping,
resuming and killing. List the commands and the key combinations you have used.
15. Execute the following commands in a pseudo-terminal:
$ xclock &
$ xclock &
$ xeyes &
$ xeyes &
Using the job control set the first xclock in foreground. Then place it back in background. Kill by name the two
xclock processes and then the xeyes processes. List the commands and the key combinations you have used.
16. Create a command line using execution of multiple commands that shows the processes launched from the termi-
nal, then waits for 3 seconds and finally shows again the processes launched from the terminal.
17. Create a command line using execution of multiple commands that shows the processes launched from terminal
but this execution has to be the result of an erroneous execution of a previous ps command.
18. Discuss the results of the following multiple command executions:
$ sleep || sleep || ls
$ sleep && sleep --help || ls && ps
$ sleep && sleep --help || ls || ps
Exercise 4.2– (*) This exercise deals with additional aspects about processes.
1. Type a command to execute an xeyes application in background with “niceness” (priority) equal to 18. Then,
type a command to view the command, the PID and the priority of the xeyes process that you have just executed.
2. Create a script that asks for a number and displays the number multiplied by 7. Note. If you use the variable VAR
to read, you can use $[VAR * 7] to display its multiplication.
3. Add signal management to the previous script so that when the USR1 signal is received, the script prints the sentence “waiting operand”. Try to send the USR1 signal to the clone Bash executing the script. Tip: to figure out the PID of the proper Bash, initially launch the script in background.
Chapter 5
Filesystem
Chapter 6
Filesystem
6.1 Introduction

File Systems (FS) define how information is stored on data units like hard drives, tapes, DVDs, pen drives, etc. The base of a FS is the file.
There's a saying in the Linux world that “everything is a file” (a comment attributed to Ken Thompson, the developer of UNIX). That includes directories: directories are just files with lists of files inside them. All these files and directories are organized into a hierarchical file system, starting from the root directory and branching out. On the other hand, files must have a name, which is tied to the following rules:
• File names are case-sensitive, that is to say, characters in uppercase and lowercase are distinct. For example, letter.txt, Letter.txt and letter.Txt do not represent the same file.
The simplest example of a file is one used to store data like images, text, documents, etc. The different FS technologies are implemented in the kernel or in external modules. A FS defines how the kernel is going to manage the files: which meta-data we are going to use, how the file is going to be accessed for read/write, etc.
Examples of Disk File Systems (DFS) are reiserFS, ext2, ext3 and ext4, developed within the Unix environment for hard drives and pen drives, or UDF (Universal Disk Format) for DVDs. In Windows environments we have other DFS like fat16, fat32 and ntfs.
6.2 Basic types of files

The command stat can be used to discover the basic type of a file:
$ stat /etc/services
File: ‘/etc/services’
Size: 19281 Blocks: 40 IO Block: 4096 regular file
Device: 801h/2049d Inode: 3932364 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2012-09-24 22:06:53.249357692 +0200
Modify: 2012-02-13 19:33:04.000000000 +0100
Change: 2012-05-11 12:03:24.782168392 +0200
Birth: -
Unix uses the abstraction of “file” for many purposes and thus, this is a fundamental concept in Unix systems. This
type of abstraction allows using the API of files for devices like for example a printer. In this case, the API of files is
used for writing into the file that represents the printer, which indeed means printing. Another example of special file
is the symbolic link, which we are going to discuss later.
6.4 The path
As previously mentioned, in Unix-like systems the filesystem has a root denoted as /. All the files on the system
are named taking the root as reference. In general, the path defines how to reach a file in the system. For instance,
/usr/share/doc points out that the file doc (which is a directory) is inside the directory share which is inside the
directory usr, which is under the root of the filesystem /.
/
usr/
share/
doc/
We have three basic commands to move around the FS and list its contents: cd (change the working directory), pwd (print the current working directory) and ls (list directory contents).
With commands related to the filesystem you can use absolute and relative names for the files:
• Absolute path. An absolute path always takes the root / of the filesystem as starting point. Thus, we need to
provide the full path from the root to the file. Example: /usr/local/bin.
• Relative path. A relative path provides the name of a file taking the current working directory as starting point.
For relative paths we use . (the dot) and .. (the two dots). Examples:
./Desktop, or for short Desktop (the ./ can be omitted). This names a file called Desktop inside the current directory.
./../../etc or for short ../../etc. This names the file (directory) etc, which is located two directories
up in the FS.
Finally, the special character ~ (ALT GR+4) can be used as the name of your “home directory” (typically /home/username). Recall that your home directory is the area of the FS in which you can store your files. Examples:
$ ls /usr
bin games include lib local sbin share src
$ cd /
$ pwd
/
$ cd /usr/local
$ pwd
/usr/local
$ cd bin
$ pwd
/usr/local/bin
$ cd /usr
$ cd ./local/bin
$ pwd
/usr/local/bin
$ cd ../../share/doc
$ pwd
/usr/share/doc
$ ls ~
Downloads Videos Desktop Music
6.5 Directories
In a FS, files and directories can be created, deleted, moved and copied. To create a directory, we can use mkdir:
$ cd ~
$ mkdir myfolder
This will create a directory called “myfolder” inside the working directory. If we want to delete it, we can use rmdir.
$ rmdir ~/myfolder
The previous command will fail if the folder is not empty (contains some file or directory). There are two ways to
proceed:
(1) Delete the content and then the directory or
(2) Force a recursive removal using rm -rf:
$ rm -rf myfolder
$ rm -f -r myfolder
where -r stands for recursive removal and -f forces the removal without asking for confirmation.
Directories can be moved and renamed with the mv command. For example, the following moves folder2 inside folder1 and then renames folder1 to directory1:
$ mkdir folder1
$ mkdir folder2
$ mv folder2 folder1
$ mv folder1 directory1
Finally, to copy folder contents to another place within the file system, the cp command may be used with the -r (recursive) modifier. Example:
$ cd directory1
$ mkdir folder3
$ cd ..
$ cp -r directory1 directory2
6.6 Files
The easiest way to create a file is using touch:
$ touch test.txt
$ rm test.txt
Logically, if a file which is not in the working directory has to be removed, the complete path must be the argument
of rm:
$ rm /home/user1/test.txt
In order to move or rename files we can use the mv command. Moving the file test.txt to the Desktop on the home
directory might look like this:
$ mv test.txt ~/Desktop/
In case a name is specified in the destination, the resulting file will be renamed:
$ mv test.txt Desktop/test2.txt
$ mv test.txt test2.txt
The copy command works similarly to mv, but the origin will not disappear after the copy operation:
$ cp test.txt test2.txt
The file command determines the type of a file's content. Example:
$ file /etc/services
/etc/services: ASCII English text
6.8 File expansions and quoting
Bash provides us with some special characters that can be used to name groups of files. These special characters have a special behavior called “filename expansion” when used as names of files. We have several expansions:
Character Meaning
* Expands zero or more characters (any character).
? Expands one character.
[ ] Expands one of the characters inside [ ].
!( ) Expands anything except the pattern inside ( ).
Examples:
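For instance, assuming the working directory contains files such as file1.txt, file2.txt and notes.pdf:

$ ls *.txt          # expands to all names ending in .txt
$ ls file?.txt      # expands to file1.txt and file2.txt
$ ls file[12].txt   # expands to file1.txt and file2.txt
$ ls !(*.txt)       # expands to everything except .txt files (requires Bash's extglob option)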
These special characters for filename expansions can be disabled with quoting:
Character Action
' (single quote) All characters between single quotes are interpreted without any special meaning.
" (double quotes) Special characters are ignored, except $, ` (backtick) and \.
\ (backslash) The special meaning of the character that follows is ignored.
Example:
$ rm "hello?" # Removes a single file called hello? but not, for example,
# a file called hello1.
6.9 Links
A link is a special file type which points to another file. Links can be hard or symbolic (soft).
The ln command is used to create links. If the -s option is passed as argument, the link will be symbolic.
Examples:
$ ln -s /etc/passwd ~/hello
$ cat ~/hello
The previous commands create a symbolic link to /etc/passwd and print the contents of the link.
$ cd
$ cp /etc/passwd .
$ ln passwd hello    # creates a hard link: hello and passwd now point to the same data
$ rm hello           # removes one name; the data is still reachable through passwd

$ cd
$ cp /etc/passwd hello
6.10 Text Files

The original ASCII table codified characters using 7 bits. Later, the ASCII table was expanded to 8 bits (a byte). Examples of 8-bit ASCII codification are:
a: 0110 0001
A: 0100 0001
As you may observe, to build the 8-bit codification, the 7-bit codification was maintained by just putting a 0 before the 7-bit word. For the words whose codification starts with 1, several language-specific encodings appeared. These codifications were defined in the ISO/IEC 8859 standard, which defines several 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts. For instance, ISO/IEC 8859-1 covers Latin languages, including Spanish, while ISO/IEC 8859-7 covers the Latin/Greek alphabet.
An example of 8859-1 codification is the following:
Nowadays, we have other types of encodings. The most remarkable is UTF-8 (UCS Transformation Format, 8 bits), which is the default text encoding used in Linux.
UTF-8 defines a variable-length universal character encoding. In UTF-8, characters range from one byte to four bytes. UTF-8 matches the first 7 bits of the ASCII table and is then able to encode up to 2^31 characters unambiguously (universally). Example:
ç (UTF-8): 0xc3a7
Finally, we must take into account the problem of newlines. A new line, line break or end-of-line (EOL) is a
special character or sequence of characters signifying the end of a line of text.
Systems based on ASCII or a compatible character set use either LF (line feed, “\n”, 0x0A, 10 in decimal) or CR (carriage return, “\r”, 0x0D, 13 in decimal) individually, or CR followed by LF (CR+LF, “\r\n”, 0x0D0A).
The actual codes representing a newline vary across operating systems:
• CR+LF: Microsoft Windows, DEC TOPS-10, RT-11 and most other early non-Unix and non-IBM OSes, CP/M,
MP/M, DOS (MS-DOS, PC-DOS, etc.), Atari TOS, OS/2, Symbian OS, Palm OS.
• CR: Commodore 8-bit machines, Acorn BBC, TRS-80, Apple II family, Mac OS up to version 9 and OS-9.
• LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga,
RISC OS, Android and others.
The different codifications for the newline can be a problem when exchanging data between systems with different representations. If, for example, you open a text file from a Windows-like system inside a Unix-like system, you will need to either convert the newline encoding or use a text editor capable of detecting the different formats (like gedit).
6.11 Commands and applications for text

A classical command-line text editor1 in Unix systems is vi. To edit a file, type:

$ vi myfile.txt
The previous command puts vi in command mode to edit myfile.txt. In this mode, we can navigate through
myfile.txt and quit by typing :q. If we want to edit the file, we have to press “i”, which puts vi in insertion mode.
After modifying the document, we can hit ESC to go back to command mode (default one). To save the file we must
type :wq and to quit without saving, we must force the exit by typing :q!.
On the other hand, there are also other commands to view text files. These commands are cat, more and less.
Note: you should not use the previous commands to view executable or binary data files because these may contain
non printable characters. The less command works in the same way as man does. Try:
$ cat /etc/passwd
$ more /etc/passwd
$ less /etc/passwd
Another couple of useful commands are head and tail, which respectively, show us the text lines at the top of
the file or at the bottom of the file.
$ head /etc/passwd
$ tail -3 /etc/passwd
1 There are other command-line text editors like nano, joe, etc.
A very interesting option of tail is -f, which outputs appended data as the file grows. Example:
$ tail -f /var/log/syslog
If we have a binary file, we can use hexdump or od to see its contents in hexadecimal and also in other formats.
Another useful command is strings, which finds and shows the printable character sequences (strings) contained in a binary file. Try:
$ hexdump /bin/ls
$ strings /bin/ls
$ cat /bin/ls
There are control characters in the ASCII tables that can be present in binary files but that should never appear in
a text file. If we accidentally use cat over a binary file, the prompt may turn into a strange state. To exit this state,
you must type reset and hit ENTER.
Other very useful commands are those that allow us to search for a pattern within a file. This is the purpose of the
grep command. The first argument of grep is a pattern and the second is a file. Example:
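For instance, the following command prints every line of /etc/passwd that contains the pattern root:

$ grep root /etc/passwd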
Another interesting command is cut. This command can be used to split the content of a text line using a specified
delimiter. Examples:
$ cat /etc/passwd
$ cut -c 1-4 /etc/passwd
$ cut -d ":" -f 1,4 /etc/passwd
$ cut -d ":" -f 1-4 /etc/passwd
6.12.2 Permissions
The Linux FS provides us with the ability to keep strict control of files and directories. In this respect, we can control which users and which operations are allowed over certain files or directories. To do so, the basic mechanism (although there are more mechanisms available) is the Unix Filesystem permission system. This system controls the ability of the affected users to view or make changes to the contents of the filesystem.
There are three specific permissions on Unix-like systems:
• The read permission, which grants the ability to read a file. When set for a directory, this permission grants the
ability to read the names of files in the directory (but not to find out any further information about them such as
contents, file type, size, ownership, permissions, etc.)
• The write permission, which grants the ability to modify a file. When set for a directory, this permission grants
the ability to modify entries in the directory. This includes creating files, deleting files, and renaming files.
• The execute permission, which grants the ability to execute a file. This permission must be set for executable
binaries (for example, a compiled C++ program) or shell scripts in order to allow the operating system to run
them. When set for a directory, this permission grants the ability to traverse its tree in order to access files or
subdirectories, but not see the content of files inside the directory (unless read is set).
When a permission is not set, the rights it would grant are denied. Unlike other systems, permissions on a Unix-
like system are not inherited. Files created within a directory will not necessarily have the same permissions as that
directory.
On the other hand, from the point of view of a file or directory, the user is in one of the three following categories or classes:
1. User Class. The user is the owner of the file or directory.
2. Group Class. The user belongs to the group of the file or directory.
3. Others Class. The user is neither the owner of the file or directory nor a member of its group.
The most common form of showing permissions is symbolic notation. The following are some examples of
symbolic notation:
-rwxr-xr-x for a regular file whose user class has full permissions and whose group and others classes have only
the read and execute permissions.
dr-x------ for a directory whose user class has read and execute permissions and whose group and others classes
have no permissions.
lrw-rw-r-- for a link special file whose user and group classes have the read and write permissions and whose
others class has only the read permission.
$ ls -l /usr
total 188
drwxr-xr-x 2 root root 69632 2011-08-23 18:39 bin
drwxr-xr-x 2 root root 4096 2011-04-26 00:57 games
drwxr-xr-x 41 root root 4096 2011-06-04 02:32 include
drwxr-xr-x 251 root root 69632 2011-08-20 17:59 lib
drwxr-xr-x 3 root root 4096 2011-04-26 00:56 lib64
drwxr-xr-x 10 root root 4096 2011-04-26 00:50 local
drwxr-xr-x 9 root root 4096 2011-06-04 04:11 NX
drwxr-xr-x 2 root root 12288 2011-08-23 18:39 sbin
drwxr-xr-x 370 root root 12288 2011-08-08 08:28 share
drwxrwsr-x 11 root src 4096 2011-08-20 17:59 src
6.12.3 Change permissions (chmod)

The chmod command modifies permissions. In symbolic notation, it combines the following elements:

User type: u (user), g (group), o (other)
Operation: + (add permission), - (remove permission), = (assign permission)
Permissions: r (read), w (write), x (execute)
Another way of managing permissions is to use octal notation. With three-digit octal notation, each numeral
represents a different component of the permission set: user class, group class, and "others" class respectively. Each
of these digits is the sum of its component bits. Here is a summary of the meanings for individual octal digit values:
0 --- no permission
1 --x execute
2 -w- write
3 -wx write and execute
4 r-- read
5 r-x read and execute
6 rw- read and write
7 rwx read, write and execute
Next, we provide some examples in which we illustrate the different ways of using the chmod command. For
instance, assigning read and execute permissions to the group class of the file temp.txt can be achieved by the following
command:
chmod g+rx temp.txt
Changing the permissions of the file file1.c so that its owner can only read its contents can be achieved by:
chmod u=r file1.c
For the file file1.c, we want read permission for all users, write permission only for the owner and execute permission just for the users within the group. Then, the numeric values are:
r_user + r_group + r_other + w_user + x_group = 400 + 40 + 4 + 200 + 10 = 654
Thus, the command should be:
chmod 654 file1.c
6.12.4 Default permissions

The umask command shows and sets the mask that determines the default permissions of newly created files:

$ umask
0022
$ umask 0044

The two permissions that can be granted by default are read and write, but not execute. The mask tells us which permissions are subtracted (i.e., not granted) from the defaults.
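A short sketch of the effect (the usual base permissions are 666 for new files and 777 for new directories):

$ umask 0022
$ touch newfile
$ mkdir newdir
$ ls -ld newfile newdir
-rw-r--r-- ... newfile    # 666 - 022 = 644
drwxr-xr-x ... newdir     # 777 - 022 = 755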
6.13 File System Mounting
Storage Devices
In UNIX systems, the kernel automatically detects and maps storage devices in the /dev directory. The name that
identifies a storage device follows the following rules:
Thus, if a CD-ROM or DVD is plugged in as the master device of IDE bus/connector 1, Linux will show it as hdc.
3. If there is a SCSI or SATA controller, these devices are listed as sda, sdb, sdc, sdd, sde, sdf and sdg in the /dev directory. Similarly, partitions on these disks can range from 1 to 16 and are also in the /dev directory.
Note. You can run the command fdisk -l to display the list of partitions. Example:
# fdisk -l /dev/sdb
Mounting a Filesystem
When a Linux/UNIX system boots, the kernel requires a “mounted root filesystem”. In the simplest case, when the system has only one storage device, the kernel has to identify the device that contains the filesystem (the root / and all its subdirectories) and then make this device “usable”. In UNIX, this is called “mounting the filesystem”.
For example, if we have a single SATA disk with a single partition, it will be named as /dev/sda1. In this case, we
say that the device ”/dev/sda1“ mounts ”/“. The root of the filesystem ”/“ is called the mount point.
If our disk has a second partition, the Kernel will map it in /dev/sda2. Then, we can ”mount“ some part of our
filesystem in this second partition. For example, let us consider that we want to store the /home directory in this second
partition of sda. The command to mount a storage device under a mount point is mount. In the previous example:
In this example, /dev/sda2 is the device and /home is the mount point. The mount point can be defined as the directory under which the contents of a storage device can be accessed.
Linux can also mount Windows filesystems such as FAT32:
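A sketch matching the description below:
# mount -t vfat /dev/hdd1 /mnt/windows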
This will mount a vfat file system (Windows 95, 98, XP) from the first partition of the hdd device in the mount
point /mnt/windows.
Unmounting a Filesystem
After using a certain device, we can also "unmount" it. For example, if the file system of the pen-drive mapped in /dev/sdc1 is mounted under the directory or mount point /media/pen, then any operation on /media/pen will in fact act on the FS of the pen-drive. When we finish working with the pen-drive, it is important to "unmount" the device before extracting it, because unmounting gives the OS the opportunity to finish all pending I/O operations. This can be achieved with the command umount, using either the mount point or the device name (nowadays you can also unmount a mount point from the menus of the GUI):
# umount /media/pen
or
# umount /dev/sdc1
Furthermore, once /media/pen is unmounted, I/O operations on /media/pen are no longer performed on the pen-drive. Instead, they are performed on the device that currently mounts the directory, usually the system's hard disk that mounts the root.
Note. It is not possible to unmount a "busy" (in use) file system. A file system is busy if some process is using a file or a directory of the mounted FS.
/etc/fstab
Notice that while we can use the command mount once the system is running, we need some way of specifying which
storage devices and which mount points are used during booting. On the other hand, only the root user can mount
filesystems as this is a risky operation, so we also need a way of defining mount points for unprivileged users. This is
necessary for instance, for using storage devices like pen-drives or DVDs.
The fstab file provides a solution to the previous issues. The fstab file lists all available disks and disk partitions,
and indicates how they are to be initialized or otherwise integrated into the overall system’s file system. fstab is still
used for basic system configuration, notably of a system’s main hard drive and startup file system, but for other uses
(like pen-drives) has been superseded in recent years by “automatic mounting”.
The fstab file is most commonly used by the mount command, which reads the fstab file to determine which options
should be used when mounting the specified device. It is the duty of the system administrator to properly create and
maintain this file.
An example of an entry in the /etc/fstab file is the following:
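A sketch of what such an entry might look like (the device and options are illustrative):
/dev/sda1  /  ext3  errors=remount-ro  0  1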
The 1st and 2nd columns are the device and the default mount point. The 3rd column is the filesystem type. The 4th column contains the mount options and, finally, the 5th and 6th columns are options for the dump and fsck applications. The 5th column is used by dump to decide whether a filesystem should be backed up; if it is zero, dump will ignore that filesystem (this column is very often zero). The 6th column is an fsck option: fsck looks at this number to determine the order in which the filesystems should be checked. If it is zero, fsck will not check the filesystem.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 108G 41G 61G 41% /
none 1,5G 728K 1,5G 1% /dev
none 1,5G 6,1M 1,5G 1% /dev/shm
none 1,5G 116K 1,5G 1% /var/run
none 1,5G 0 1,5G 0% /var/lock
$ du -sh /etc/apache2/
464K /etc/apache2/
6.14 Extra
6.14.1 *inodes
The inode (index node) is a fundamental concept in the Linux and UNIX filesystem. Each object in the filesystem is
represented by an inode. Each and every file under Linux (and UNIX) has the following attributes: a type, permissions, an owner (UID) and group (GID), a size, several timestamps (access, modification and change) and a link count, among others. The inode number of a file can be displayed with ls -i:
$ ls -i /etc/passwd
32820 /etc/passwd
You can also use stat command to find out inode number and its attribute:
$ stat /etc/passwd
File: ‘/etc/passwd’
Size: 1988 Blocks: 8 IO Block: 4096 regular file
Device: 341h/833d Inode: 32820 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2005-11-10 01:26:01.000000000 +0530
Modify: 2005-10-27 13:26:56.000000000 +0530
Change: 2005-10-27 13:26:56.000000000 +0530
Many commands accept inode numbers to designate a file. Let us see a practical application of inode numbers: deleting a file whose name is hard to type. First, create a file with a problematic name:
$ cd /tmp
$ touch "\+Xy \+\8"
$ ls
$ rm \+Xy \+\8
To remove the file by its inode number, first find out the inode number:
$ ls -il
The rm command cannot directly remove a file by its inode number, but we can use the find command to delete a file by inode (see the sketch below). Alternatively, in this case, quoting the name also works:
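A sketch, assuming ls -il reported inode number 782263 for the file (the number is hypothetical):
$ find . -inum 782263 -exec rm -i {} \;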
$ rm "\+Xy \+\8"
If you have a file with a name like "2011/8/31", then no UNIX or Linux command can delete it by name; the only way to delete such a file is by its inode number. Linux and UNIX never allow creating a filename like this, but if the filesystem is accessed over NFS from Mac OS or Windows, it is possible to create such a file.
Next we show, in a more step-by-step fashion, how the previous commands work. In fact, what we use is a loopback device. A more detailed explanation with an extended set of commands follows:
• Create a 30MB disk file (zero-filled) called virtualfs in the root (/) directory:
dd bs=1M if=/dev/zero of=virtualfs count=30
• Confirm that the current system is not using any loopback devices.
$ losetup /dev/loop0
Replace /dev/loop0 with /dev/loop1, /dev/loop2, etc, until a free Linux loopback device is found.
• Attach the loopback device (/dev/loop0) with regular disk file (/virtualfs):
$ losetup /dev/loop0 /virtualfs
• Confirm that the previous step is completed successfully:
$ echo $?
• We are going to create a Linux EXT3 file system on the loopback device that is currently associated with a
regular disk file.
$ mkfs.ext3 /dev/loop0
• Mount the loopback device (regular disk file) to /mnt/vfs as a “regular” Linux ext3 file system.
$ mount -t ext3 /dev/loop0 /mnt/vfs
Now, all the Linux file system-related commands can act on this "new" file system.
For example, you can type df -h to confirm its "disk usage".
6.16 Practices
Exercise 6.1– This exercise is related to the Linux filesystem and its basic permission system.
1. Open a terminal and go to your home directory (type cd ~ or simply cd). Then, type a command that, using a relative path, changes your location to the directory /etc.
3. Once at your home directory, type a command to copy the file /etc/passwd in your working directory using only
relative paths.
4. Create six directories named: dirA1, dirA2, dirB1, dirB2, dirC1 and dirC2 inside your home directory.
5. Write two different commands to delete the directories dirA1, dirA2, dirB1 and dirB2 but not dirC1 or dirC2.
8. Type a command for viewing text to display the contents of the file, which obviously must be empty.
9. Type a command to display the file metadata and properties (creation date, modification date, last access date, inode
etc.).
10. What kind of content is shown for temp? And what kind of basic file is it?
Table 6.1: FS-related commands.
stat shows file metadata.
file guesses the type of a file's contents.
mount mounts a file system.
umount unmounts a file system.
fdisk shows partition information of a disk.
df display the amount of available disk space in complete filesystems.
du displays the file space used under a particular directory or by files on a file system.
cd changes working directory.
ls lists a directory.
pwd prints working directory.
mkdir makes a directory.
rmdir removes a directory.
rm removes files.
mv moves a file or folder.
cp copies a file or folder.
touch updates time stamps and creates files.
ln creates hard and soft links.
gedit graphical application to edit text.
vi application to edit text from the terminal.
cat shows text files.
more shows text files with paging.
less shows text files like man.
head prints the top lines of a text file.
tail prints the bottom lines of a text file.
hexdump and od show file data in hexadecimal and other formats.
strings looks for character strings in binary files.
grep prints lines of a file matching a pattern.
cut prints selected parts of lines of a file.
chmod changes file permissions (or control mask).
umask shows/sets default control mask for new files.
11. Change to your working directory. From there, type a command to try to copy the file temp to the /usr directory.
What happened and why?
12. Create a directory called practices inside your home. Inside practices, create two directories called
with_permission and without_permission. Then, remove your own permission to write into the direc-
tory without_permission.
13. Try to copy the temp file to the directories with_permission and without_permission. Explain what
has happened in each case and why.
14. Figure out which is the minimum set of permissions (read, write, execute) that the owner has to have to execute
the following commands:
Exercise 6.2– This exercise presents practices about text files and special files.
1. Create a file called orig.txt with the touch command and use the command ln to create a symbolic link to orig.txt
called link.txt.
2. Open the vi text editor and modify the file orig.txt entering some text.
3. Use the command cat to view link.txt. What can you observe? Why?
4. Repeat previous two steps but this time modifying first the link.txt file and then viewing the orig.txt file. Discuss
the results.
5. Remove all permissions from orig.txt and try to modify the link.txt file. What happened?
6. Give back the write permission to orig.txt. Then, try to remove the write permission to link.txt. Type ls -l and
discuss the results.
7. Delete the file orig.txt and try to display the contents of link.txt with the cat command. Then, edit the file with vi
and enter some text in link.txt. What has happened in each case?
8. Use the command stat to see the number of links that orig.txt and link.txt have.
9. Now create a hard link for the orig.txt file called hard.txt. Then, using the command stat figure out the number
of “Links” of orig.txt and hard.txt.
10. Delete the file orig.txt and try to modify with vi hard.txt. What happened?
11. Use the grep command to find all the information about the HTTP protocol present in the file /etc/services
(remember that Unix commands are case-sensitive).
12. Use the cut command over the file /etc/group to display the groups of the system and its first five members.
13. Create an empty file called text1.txt. Use the text editor vi to introduce "abñ" in the file, save and exit. Type a command to figure out the type of content of the file.
14. Search in the Web the hexadecimal encoding of the letter “ñ” in ISO-8859-15 and UTF8. Use the command
hexdump to view the content in hexadecimal of text1.txt. Which encoding have you found?
15. Find out which character "0x0a" is; it also appears in the file.
16. Open the gedit text editor and type "abñ". Go to the menu and use the option "Save As" to save the file with the name text2.txt and "Line Ending" type Windows. Again, examine the contents of the file with hexdump. Find out which character is encoded as "0x0d".
$ hexdump text2.txt
0000000 6261 b1c3 0a0d
0000006
17. Explain the different types of line breaks for Unix (new Mac), Windows and classical Mac.
18. Open the gedit text editor and type "abñ". Go to the menu and use the option "Save As" to save the file with the name text3.txt and "Character Encoding" ISO-8859-15. Recheck the contents of the text file with hexdump and discuss the results.
Chapter 7
File Descriptors
Chapter 8
File Descriptors
When a command is executed using a shell (like Bash), it inherits the 3 standard file descriptors from this shell.
Let's open a terminal and discover what these three "mysterious" file descriptors are. First, let's find out the PID of
the bash:
$ ps
PID TTY TIME CMD
14283 pts/3 00:00:00 bash
14303 pts/3 00:00:00 ps
The command lsof ("list open files") will help us find out which files these are:
Note. In Unix-like systems, file descriptors can refer to regular files or directories, but also to block or character devices (also called "special files"), sockets, FIFOs (also called named pipes), or unnamed pipes. In what follows we will explain what these other types of files are.
$ lsof -a -p 14283 -d0-10
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
bash 14283 student1 0r CHR 136,3 5 /dev/pts/3
bash 14283 student1 1w CHR 136,3 5 /dev/pts/3
bash 14283 student1 2w CHR 136,3 5 /dev/pts/3
As we can see in the FD column, file descriptors 0, 1 and 2 are connected to the file /dev/pts/3. The fd=0 is opened for reading while fd=1 and fd=2 are opened for writing. This "mysterious" file /dev/pts/3 is just the file associated with the GNOME pseudo-terminal (the trailing number identifies the pseudo-terminal).
The “tty” file and the file descriptors 0,1 and 2 will be used to send/receive data from/to a textual terminal or
pseudo-terminal (see Figure 8.1). We can find the “tty” file with the tty command and find that it is a special file with
the file command:
$ tty
/dev/pts/3
$ file /dev/pts/3
/dev/pts/3: character special
$ ls -l /dev/pts/3 # Another way of seeing that /dev/pts/3 is a special file
crw--w---- 1 student1 tty 136, 3 2011-03-03 15:18 /dev/pts/3
Another interesting command is the command fuser. Among other functionality, fuser shows the PID of the
processes that have currently open a certain file. Example:
$ fuser /dev/pts/3
/dev/pts/3: 14283
$ echo Hello, how are you?
Hello, how are you?
As shown, echo displays the text that follows the command; in other words, it makes an "echo" to standard output (i.e. echoes the text to the default terminal).
Now, using > we can redirect standard output (fd=1) so that the text does not appear in the default terminal but is written to a file instead. Let's see an example:
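A minimal sketch (greeting.txt is a hypothetical file name):
$ echo Hello > greeting.txt
$ cat greeting.txt
Hello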
On the other hand, using >> we can redirect standard output to a file, but in this case, without deleting the file
previous contents but appending the new text at the end of the file. Example:
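A sketch continuing the previous one:
$ echo "Hello again" >> greeting.txt
$ cat greeting.txt
Hello
Hello again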
$ ls -qw
ls: option requires an argument -- ’w’
Try ‘ls --help’ for more information.
$ ls -qw 2> error.txt # Now error is not displayed in the terminal
In general:
n> file. Redirects standard output or standard error (n is the descriptor number) to file. If file exists, it is deleted and overwritten; if it does not exist, it is created.
n>> file. Similar to the first case, but opens file to append data at the end without erasing its contents.
&> file. Redirects both standard output and standard error to file.
$ LOGFILE=script.log
$ echo "This sentence is added to $LOGFILE." 1> $LOGFILE
$ echo "This statement is appended to to $LOGFILE" 1>> $LOGFILE
$ echo "This phrase does not appear in $LOGFILE as it goes to stdout."
$ ls /usr/tmp/notexists >ls.txt 2>ls.err
$ ls /usr/tmp/notexists &>ls.all
Note that the redirections are reset after each command line is executed. Finally, it is worth mentioning that the special file /dev/null is used to discard data. Example:
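A sketch (any command works; here we discard a long listing):
$ ls -R /etc > /dev/null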
The output of the above command is sent to /dev/null, which is equivalent to discarding it.
8.3 Redirecting Input
Basic redirection
Input redirection allows you to specify a file for reading standard input (fd=0). Thus, if we redirect the input of a command, the data that the command uses will not come from the terminal (typed by the user) but from the specified file. The format for input redirection is < file. We are going to use myscript.sh to illustrate input redirection. Recall that this script contains the following:
#!/bin/bash
# myscript.sh
echo Please, type a word and hit ENTER
read WORD
echo You typed $WORD
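A sketch of running the script with redirected input (file.txt is created first with a single word):
$ echo hello > file.txt
$ ./myscript.sh < file.txt
Please, type a word and hit ENTER
You typed hello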
As you can observe, the script now reads from file.txt instead of reading from the standard input (the terminal).
We can modify our script to show the table of open file descriptors:
#!/bin/bash
# myscript.sh v2.0
# The next line is used to show the open fd (from 0 to 10).
lsof -a -p $$ -d0-10
echo Please, type a word and hit ENTER
read WORD
echo You typed $WORD
Here Documents
Another input redirection is based on internal documents or “ here documents”. A here document is essentially a
temporary file. The syntax here is: “<<<”. The following example creates an here document whose content it the
text “ hello world” and this temporary file is used as input for the cat command.
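A minimal sketch:
$ cat <<< "hello world"
hello world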
Finally, to complete the types of input redirection, we discuss the following expression: << expr. This construction is also used to create "here documents", in which you can type text until you enter "expr". The following example illustrates the concept:
$ cat <<END
> hello world
> cruel
> END
hello world
cruel
$ ls | grep x
Bash uses "|" (the pipe symbol) to separate commands and executes these commands connecting the output of a preceding command (ls in our example) with the input of the following command (grep in our example). In our example, ls produces the list of files of the working directory and then grep prints only those lines containing the letter "x". In more detail, this type of pipe is called an "unnamed pipe" because the pipe exists only inside the kernel and cannot be accessed by any process other than the one that created it (and its children), in this case, the bash shell.
All Unix-like systems include a variety of commands to manipulate text outputs. We have already seen some of
these commands: head, tail, grep and cut. We also have other commands like uniq which displays or removes
repeating lines, sort which lists the contents of the file ordered alphabetically or numerically, wc which counts lines,
words and characters and find which searches for files.
The following example shows a compound command with several pipes:
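A sketch matching the description below:
$ cat *.txt | sort | uniq > result.txt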
The above command line outputs the contents of all the files ending with .txt in the working directory, removing duplicate lines and sorting them alphabetically. The result is saved in the file result.txt.
Another useful filter-related command is tee. This command is normally used to split the output of a program
so that it can be seen on the display terminal and also be saved in a file. The command can also be used to capture
intermediate output before the data is altered by another command. The tee command reads standard input, then
writes its content to standard output and simultaneously copies it into the specified file(s).
The following example lists the contents of the working directory, leaves a copy of these contents in a file called
output.txt and then displays these contents in the default terminal in reverse order:
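A sketch (output.txt as named above; sort -r yields the reverse order):
$ ls | tee output.txt | sort -r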
Finally, it is worth mentioning that when a command line with pipes (a pipeline) is executed in the background, all commands executed in the pipeline are considered members of the same task or job. In the following example, tail and grep belong to the same task:
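A sketch consistent with the PID and JID discussed below (the log file name is hypothetical):
$ tail -f /var/log/syslog | grep kernel &
[1] 15789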
The PID shown, 15789, corresponds to the process ID of the last command of the pipeline (grep in this example) and the JID in this case is "1". Signals received by any process of the pipeline affect all the processes of the pipeline.
8.5 Named Pipes
The other sort of pipe is a "named" pipe, which is sometimes called a FIFO. FIFO stands for "First In, First Out" and refers to the property that the order of bytes going in is the same coming out. The "name" of a named pipe is actually a file name within the file system.
On older Linux systems, named pipes are created by the mknod command. On more modern systems, mkfifo
is the standard command for creation of named pipes. Pipes are shown by ls as any other file with a couple of
differences:
$ mkfifo fifo1
$ ls -l fifo1
prw-r--r-- 1 user1 someusers 0 Jan 22 23:11 fifo1
$ file fifo1
fifo1: fifo (named pipe)
The "p" in the leftmost column indicates that fifo1 is a pipe.
The simplest way to show how named pipes work is with an example. Suppose we have created the pipe fifo1 as shown above. In a pseudo-terminal type:
t1$ echo hello how are you? > fifo1
and in another pseudo-terminal type:
t2$ cat < pipe
As you see, the output of the command run on the first pseudo-terminal shows up on the second pseudo-terminal.
Note that the order in which you run the commands does not matter. If you watch closely, you will notice that the
first command you run appears to hang. This happens because the other end of the pipe is not yet connected, and so
the kernel suspends the first process until the second process opens the pipe. In Unix jargon, the process is said to be
"blocked", since it is waiting for something to happen.
One very useful application of named pipes is to allow totally unrelated programs to communicate with each
other. For example, a program that services requests of some sort (print files, access a database) could open the pipe
for reading. Then, another process could make a request by opening the pipe and writing a command. That is, the
"server" can perform a task on behalf of the "client". Blocking can also happen if the client is not writing, or the server is not reading.
8.6 Dash
The dash "-" in some commands is used to indicate that we want to use stdin or stdout instead of a regular file.
An example of such a command is diff. To show how the dash works, we will generate two files. The
first file called doc1.txt has to contain eight text lines and four of these text lines must contain the word “linux”. The
second file called doc2.txt has to contain the four lines containing the word “linux” that we introduced in the file
doc1.txt. The following command compares the lines of both files that contain the word “linux” and checks that these
lines are the same.
$ grep linux doc1.txt | diff doc2.txt -
In the above command, the dash in diff replaces stdin, which is connected by the pipe to the output of grep.
Note. In general, the use of the dash as a redirection operator depends on the context in which it is used. In addition, the dash has other uses (not discussed in this document). To give an example, the command cd - changes to the previous working directory (here the dash is not used as a redirection operator).
8.7 Process Substitution
When you enclose several commands in parenthesis, the commands are actually run in a “subshell”; that is, the shell
clones itself and the clone interprets the commands within the parenthesis (this is the same behavior as with shell
scripts). Since the outer shell is running only a “single command”, the output of a complete set of commands can be
redirected as a unit. For example, the following command writes the list of processes and also the current directory
listing to the file commands.out:
$ (ps ; ls) >commands.out
Process substitution occurs when you put a "<" or ">" in front of the left parenthesis. For instance:
$ cat <(ls -l)
The previous command-line results in the command ls -l executing in a subshell as usual, but redirects the
output to a temporary named pipe, which bash creates, names and later deletes. Therefore, cat has a valid file
name to read from, and we see the output of ls -l, taking one more step than usual to do so. Similarly, giving
“>(commands)” results in bash naming a temporary pipe, which the commands inside the parenthesis read for input.
Process substitution also makes the tee command (used to view and save the output of a command) much more useful in that you can cause a single stream of input to be read by multiple readers without resorting to temporary files.
With process substitution bash does all the work for you. For instance:
ls | tee >(grep foo | wc >foo.count) \
>(grep bar | wc >bar.count) \
| grep baz | wc >baz.count
The previous command-line counts the number of occurrences of foo, bar and baz in the output of ls and writes
this information to three separate files.
As shown, we have to redirect the output for each command line. A way of improving the previous script is to permanently assign the standard output of the script to the corresponding file. For this functionality, we use exec. Table 8.2 summarizes the uses of exec, which are explained below with examples.
Table 8.2: Redirections with exec.
Syntax Meaning
exec fd> file open file for writing and assign fd.
exec fd>>file open file for appending and assign fd.
exec fd< file open file for reading and assign fd.
exec fd<> file open file for reading/writing and assign fd.
exec fd1>&fd2 open fd1 for writing. From this moment fd1 and fd2 are the same.
exec fd1<&fd2 open fd1 for reading. From this moment fd1 and fd2 are the same.
exec fd>&- close fd.
command >&fd write stdout to fd.
command 2>&fd write stderr to fd.
command <&fd read stdin from fd.
In the previous example we associate stdout (fd=1) with the log file using exec and therefore we no longer have to redirect the output of each command line. Instead of redirecting standard output permanently, it is usual to do it semi-permanently. To do so, you need to back up the original file descriptor (which is probably assigned to a special terminal file /dev/pts/X). The following example is illustrative:
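A sketch of a semi-permanent redirection (output.log is a hypothetical file name):
$ exec 5>&1           # back up the terminal's stdout in fd=5
$ exec 1>output.log   # stdout now goes to output.log
$ echo "this line is written to output.log"
$ exec 1>&5           # restore the original stdout
$ exec 5>&-           # close the backup descriptor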
In another example, let us assign the file logfile.log to fd=3 and use this file descriptor number for writing; finally, we close the fd. Example:
$ exec 3>logfile.log
$ lsof -a -p $$ -d0,1,2,3
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
bash 3443 student 0u CHR 136,35 37 /dev/pts/35
bash 3443 student 1u CHR 136,35 37 /dev/pts/35
bash 3443 student 2u CHR 136,35 37 /dev/pts/35
bash 3443 student 3u REG 3,1 0 86956 /home/student/logfile.log
$ echo hello >&3
$ cat logfile.log
hello
$ exec 3>&-
exec 0<&4
exec 4>&-
The previous script saves the descriptor of stdin (fd=0) in fd=4. Then opens the file restaurants.txt using fd=0.
Then reads the file and finally restores fd=0 and closes fd=4.
You can also do redirections using different fds, but only for a single command line (not for all the commands executed by the bash). In this case the syntax is the same but we do not use exec. Example:
#!/bin/bash
exec 3>student.log # Open fd=3 (for bash)
echo "This goes to student.log" 1>&3 # redirects stdout to student.log
# only for this command line
echo "This goes to stdout"
exec 1>&3 # redirects stdout to student.log
# for the rest of the commands
echo "This also goes to student.log"
echo "and this sentence, too"
Note. Child processes inherit the open file descriptors of their parent process. A child can close an fd that it is not going to use.
8.9 Extra
In this section we explain some useful commands and features that are typically used together with file redirections
like pipelines.
^ This character does not match any character but represents the beginning of the input line. For example, ^A is a
regular expression matching the letter A at the beginning of a line.
[ ] A bracket expression. Matches a single character that is contained within the brackets. For example, [abc]
matches a, b, or c. [a-z] specifies a range which matches any lowercase letter from a to z. These forms can be
mixed: [abcx-z] matches a, b, c, x, y, or z, as does [a-cx-z].
RE* A regular expression followed by * matches zero or more occurrences of strings that would match the RE. For example, A* matches A, AA, AAA, and so on. It also matches the null string (zero occurrences of A).
\( \) and \{ \} are also used in basic RE but they are not going to be discussed here.
RE? A regular expression followed by ? matches a string of zero or one occurrences of strings that would match the
RE.
Some Examples
The following patterns are given as illustrations, along with plain language descriptions of what they match:
abc matches any line of text containing the three letters abc in that order.
.at matches any three-character string ending with at, including hat, cat, and bat.
^[hc]at matches hat and cat, but only at the beginning of the string or line.
[hc]at$ matches hat and cat, but only at the end of the string or line.
\[.\] matches any single character surrounded by [], for example: [a] and [b].
Remark: brackets must be escaped with \.
^.$ matches any line containing exactly one character (the newline is not counted).
.* [a-z]+ .* matches any line containing a word, consisting of lowercase alphabetic characters, delimited by at
least one space on each side.
Example in a pipeline:
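A sketch matching the description below (basic RE; the leading spaces allow for ps column alignment):
$ ps | grep '^ *[0-9][0-9][0-9]9 '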
The previous command-line shows the processes of the user that have a PID of four digits and that end with the
digit 9.
8.9.2 tr
The tr command (abbreviated from translate or transliterate) is a command in Unix-like operating systems. When
executed, the program reads from the standard input and writes to the standard output. It takes as parameters two sets
of characters, and replaces occurrences of the characters in the first set with the corresponding elements from the other
set. For example, the following command maps ’a’ to ’j’, ’b’ to ’k’, ’c’ to ’m’, and ’d’ to ’n’.
$ tr ’abcd’ ’jkmn’
The -d flag causes tr to remove characters in its output. For example, to remove all carriage returns from a
Windows file, you can type:
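A sketch (the file names are hypothetical):
$ tr -d '\r' < windows.txt > unix.txt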
Notice that CR can be expressed as \r or with its ASCII octal value 15. The -s flag causes tr to compress
sequences of identical adjacent characters in its output to a single token. For example:
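A sketch:
$ tr -s '\n' < file.txt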
The previous command replaces sequences of one or more newline characters with a single newline. Note. Most
versions of tr operate on single byte characters and are not Unicode compliant.
8.9.3 find
With the find command you can find almost anything in your filesystem. In this section (and also in the next one),
we show some examples but find offers more options. For example, you can easily find all files on your system that
were changed in the last five minutes:
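A sketch (using -cmin, i.e. "changed" minutes):
$ find / -cmin -5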
The following command finds all files changed between 5 and 10 minutes ago:
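A sketch:
$ find / -cmin +5 -cmin -10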
+5 means more than 5 minutes ago, and -10 means less than 10. If you want to find directories, use -type d.
Searching by file extension is easy too. This example searches the current directory for three different types of image
files:
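A sketch (the extensions are illustrative):
$ find . -name '*.jpg' -o -name '*.png' -o -name '*.gif'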
You can also find all files that belong to a specified username:
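A sketch (the username is hypothetical):
$ find / -user student1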
Or to a group:
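A sketch (the group name is hypothetical):
$ find / -group src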
8.9.4 *xargs
xargs is a command on most Unix-like operating systems used to build and execute command lines from standard
input. On many Unix-like kernels (under Linux, before kernel version 2.6.23) arbitrarily long lists of parameters could not be passed to a command. For example, the following command:
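A sketch (the directory /path is hypothetical):
$ rm $(find /path -type f)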
may eventually fail with an error message of "Argument list too long" if there are too many files in /path. The same will happen if you type rm /path/*. The xargs command helps us in this situation by breaking the list of
arguments into sublists small enough to be acceptable. The command-line below with xargs (functionally equivalent
to the previous command) will not fail:
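A sketch:
$ find /path -type f -print | xargs rm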
In the above command, find feeds the input of xargs with a long list of file names. xargs then splits this list
into sublists and calls rm once for every sublist. Another example:
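Another sketch, searching for the word linux in all .txt files found:
$ find . -name '*.txt' | xargs grep linux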
However, with xargs we might have a problem that could cause the previous commands not to work as expected. The problem arises when there are whitespace characters in the arguments (e.g. filenames). In this case, the command will interpret a single filename as several arguments. In order to avoid this limitation one may use:
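A sketch using NULL-separated names:
$ find /path -type f -print0 | xargs -0 rm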
The above command separates filenames using the NULL character (0x00) instead of whitespace (0x20) to separate arguments. In this way, as the NULL character is not permitted in filenames, we avoid the problem. However, we must point out that the find and xargs commands that we use have to support this feature. The next command-line uses -I to tell xargs to replace {} with each input item.
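A sketch (the destination directory is hypothetical):
$ find /path -type f | xargs -I {} cp {} /backup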
You may also specify a string after -I that will be replaced. Example:
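A sketch where the string FILE is replaced by each argument:
$ ls *.txt | xargs -I FILE cp FILE FILE.bak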
8.11 Practices
Exercise 8.1– In this exercise, we will practice with file redirections using several filter commands.
1. Without using any text editor, you have to create a file called mylist.txt in your home directory that contains the
recursive list of contents of the /etc directory. Hint: use ls -R. Then, “append” the sentence “CONTENTS OF
ETC” at the end of the file mylist.txt. Finally, type a command to view the last 10 lines of mylist.txt to check that
you obtained the expected result.
Table 8.3: Commands related to file descriptors.
lsof displays per process open file descriptors.
fuser displays the list of processes that have opened a certain file.
uniq filter to show only unique text lines.
sort sorts the output.
wc counts words, lines or characters.
diff searches for differences between text files.
tee splits output.
mkfifo creates a named pipe.
exec bash keyword for operations with files.
exec fd> file open file for writing and assign fd.
exec fd>>file open file for appending and assign fd.
exec fd< file open file for reading and assign fd.
exec fd<> file open file for reading/writing and assign fd.
exec fd1>&fd2 open fd1 for writing. From this moment fd1 and fd2 are the same.
exec fd1<&fd2 open fd1 for reading. From this moment fd1 and fd2 are the same.
exec fd>&- close fd.
command >&fd write stdout to fd.
command 2>&fd write stderr to fd.
command <&fd read stdin from fd.
tr translates text.
find searches for files.
xargs builds argument lines.
2. Without using any text editor, you have to “prepend” the sentence “CONTENTS OF ETC” at the beginning of
mylist.txt. You can use auxiliary files but when you achieve the desired result, you have to remove them. Finally,
check the result typing a command to view the first 10 lines of mylist.txt.
3. Type a command-line using pipes to count the number of files in the /bin directory.
4. Type a command-line using pipes that shows the list of the first 3 commands in the /bin directory. Then, type
another command-line to show this list in reverse alphabetical order.
Hint: use the commands ls, sort and head.
5. Type the command-lines that achieve the same results but using tail instead of head.
6. Type a command-line using pipes that shows the “number” of users and groups defined in the system (the sum of
both).
Hint: use the files /etc/passwd and /etc/group.
7. Type a command line using pipes that shows one text line containing the PID and the PPID of the init process.
Exercise 8.2– In this exercise, we are going to practice with the special files of pseudo-terminals (/dev/pts/X).
1. Open two pseudo-terminals. In one pseudo-terminal type a command-line to display the content of the file /etc/passwd in the other terminal.
2. You have to build a chat between two pseudo-terminals. That is to say, what you type in one pseudo-terminal
must appear in the other pseudo-terminal and vice-versa.
Hint: use cat and a redirection to the special file of the pseudo-terminal.
Exercise 8.3– (*) Explain in detail what happens when you type the following command lines:
Do you see any output? Hint. Use top in another terminal to see CPU usage.
Exercise 8.4– (*) In this exercise, we will practice with I/O redirection and with Regular Expressions (RE).
1. Create a file called re.txt and type a command-line that continuously "follows" this file to display text lines added to this file in the following way:
Display only the text lines containing the word "kernel".
From these lines, display only the first 10 characters.
Try your command-line by writing some text lines to the file re.txt from another pseudo-terminal.
Hint: use tail, grep and cut.
Note. You must use the grep command with the option --line-buffered. This option prevents grep
from using its internal buffer for text lines. If you do not use this option you will not see anything displayed on
the terminal.
2. Type a command-line that continuously "follows" the re.txt file to display the new text lines in this file in the following way:
Display only the text lines ending with a word containing three vowels.
Try your command-line by sending some text lines to re.txt from another pseudo-terminal.
Exercise 8.5– (*) In this exercise, we deal with file descriptor inheritance.
1. Execute the command less /etc/passwd in two different pseudo-terminals. Then, from a third terminal
list all processes that have opened /etc/passwd and check their PIDs.
Hint: use lsof.
2. Using the fuser command, kill all processes that have the file /etc/passwd open.
3. Open a pseudo-terminal (t1) and create an empty file called file.txt. Open file.txt only for reading with exec
using fd=4. Create the following script called “openfilescript.sh”:
#!/bin/bash
# Scriptname: openfilescript.sh
lsof -a -p $$ -d0-10
echo "Hello!!"
read "TEXT_LINE" <&4
echo "$TEXT_LINE"
Redirect "stdout" permanently (with exec) to file.txt in t1 and explain what happens when you execute the previous script in this terminal. Explain which file descriptors the child bash that executes the commands of the script has inherited.
4. From the second pseudo-terminal (t2) remove and create again file.txt. Then, execute “openfilescript.sh” in t1.
Explain what happened and why.
Part II
Linux Virtualization
Chapter 9
Introduction to Virtualization
Chapter 10
Introduction to Virtualization
10.1 Introduction
Virtualization is a methodology for dividing the resources of a physical computer into multiple operating system (OS) environments. Virtualization techniques create multiple isolated Virtual Machines (VMs) or Virtual Environments (VEs) on a single physical server. Virtualized environments have three basic elements (see Figure 10.1):
• Physical Host. This is the hardware and the OS in which other virtualized machines will run.
Note. You should not mix up this nomenclature with the term "host" used in networking to refer to an end-user computer.
• Guest or virtual machine. This is the virtual system running over a physical host. A guest might be a traditional OS running just as if it were on a real host. To do so, the host emulates all the system calls for hardware. This makes the guests behave as if they were running on a real computer.
• Virtualized network. The virtual network is composed of a virtual switch that connects the guests as in a real network. As an additional feature, the physical host can provide connectivity for its guests, allowing them to exchange traffic with real networks like the Internet.
Figure 10.1: Elements of a virtualized environment: guests (alice, bob, carla, r1, r2) interconnected by virtual switches (SW0–SW3), with tap interfaces in the physical host providing connectivity to a real network (e.g. Internet).
10.2 Types of virtualization
There are several kinds of virtualization techniques which provide similar features but differ in the degree of abstraction
and the methods used for virtualization.
• Virtual machines (VMs). Virtual machines emulate some real or fictional hardware, which in turn requires real resources from the host (the machine running the VMs). This approach, used by most system emulators, allows the emulator to run an arbitrary guest operating system without modifications, because the guest OS is not aware that it is not running on real hardware. The main issue with this approach is that some CPU instructions require additional privileges and may not be executed in user space, thus requiring a virtual machine monitor (VMM) to analyze the executed code and make it safe on-the-fly. The hardware emulation approach is used by VMware products, VirtualBox, QEMU, Parallels and Microsoft Virtual Server.
• Paravirtualization. This technique also requires a VMM, but most of its work is performed in the guest OS code, which in turn is modified to support this VMM and avoid unnecessary use of privileged instructions. The paravirtualization technique also enables running different OSs on a single server, but requires them to be ported, i.e. they should «know» they are running under the hypervisor. The paravirtualization approach is used by projects such as Xen and UML.
• Virtualization on the OS level, a.k.a. containers virtualization. Most applications running on a server can
easily share a machine with others, if they could be isolated and secured. Further, in most situations, different
operating systems are not required on the same server, merely multiple instances of a single operating system.
OS-level virtualization systems have been designed to provide the required isolation and security to run multiple
applications or copies of the same OS (but different distributions of the OS) on the same server. OpenVZ,
Virtuozzo, Linux-VServer, Solaris Zones and FreeBSD Jails are examples of OS-level virtualization.
The three techniques differ in complexity of implementation, breadth of OS support, performance in comparison with a standalone server, and level of access to common resources. For example, VMs have a wider scope of usage, but poorer performance. Para-VMs have better performance, but can support fewer OSs because one has to modify the original OS. Virtualization on the OS level also provides good performance and scalability compared to VMs. Generally, such systems are the best choice for server consolidation of same-OS workloads.
Figure 10.2 shows a picture of the different virtualization types.
As you can observe in Figure 10.2, UML is a type of paravirtualization. In particular, UML is designed to run over another Linux kernel, so UML does not require an intermediate virtualization layer or VMM in the host. Notice that paravirtualization is less complex than VMs but less flexible too: the guest has to be a UML kernel and the host must run a Linux kernel (a conventional kernel is enough).
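A sketch of booting a UML guest (assuming the UML kernel binary is called uml-kernel and the root filesystem file is filesystem.fs, the names used elsewhere in this chapter; the memory size is illustrative):
host$ ./uml-kernel ubd0=filesystem.fs mem=128M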
The previous command executes the UML kernel in the user space of the host. Notice that we have also specified
the size of RAM memory that is going to be used. This is a minimal configuration, but UML Kernels support a large
number of parameters.
Now, let’s see how to boot two virtual guests at the same time. We can try to open two terminals and execute the
previous command twice but obviously, if kernels try to operate over the same filesystem, we are in trouble because
we will have the filesystem in an unpredictable state. A naive solution could be to make a copy of the filesystem in
another file and start a couple of UML kernel processes each using a different filesystem file. Nevertheless, a better
solution is to use the UML technology called COW (Copy-On-Write). COW allows changes to a filesystem to be
stored in a host file separate from the filesystem itself. This has two advantages:
• Undoing changes to a filesystem is simply a matter of deleting the file that contains the changes.
• Several virtual machines can share the same (read-only) backing filesystem, each one storing only its own changes in a separate COW file.
Now, let’s fire up our UML kernels with COW. This is achieved basically using the same command line as before,
with a couple of changes:
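A sketch (the COW file is specified before the backing file, separated by a comma):
host$ ./uml-kernel ubd0=cowfile1,filesystem.fs mem=128M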
The COW file "cowfile1" need not exist. If it doesn't, the command will create and initialize it. Once the COW file has been initialized, it can be used on its own on the command line:
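A sketch:
host$ ./uml-kernel ubd0=cowfile1 mem=128M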
The name of the backing file (”filesystem.fs”) is stored in the COW file header, so it would be redundant to continue
specifying it on the command line.
The normal way to create a COW file is to specify a non-existent COW file on the UML command line, and let
UML create it for you. However, sometimes you want a new COW file, and you don’t want to boot UML in order to
get it. This can be done with the uml_mkcow command which comes with the uml-utilities package which can be
installed in the host system with:
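On a Debian/Ubuntu host, a sketch:
host$ sudo apt-get install uml-utilities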
In our example:
host$ uml_mkcow cowfile1 filesystem.fs
Finally, in another terminal t2 let’s fire up our second UML kernel with another COW file (cowfile2):
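A sketch, analogous to the first guest:
t2$ ./uml-kernel ubd0=cowfile2,filesystem.fs mem=128M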
When you have finished, simply type ‘halt’ to stop. You can even ‘reboot’ and do pretty much anything else without affecting the host system in any way.
To finish:
host# exit
host# umount img/proc
host# fuser -k img
host# umount img
If something goes wrong while the UML guest is booting, the Kernel process might go into a bad state. In this
case, the best way to “clean” the system is to kill all the processes generated while booting. In our case, as the UML
Kernel is called uml-kernel, to kill all these processes, we can type the following:
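A sketch using killall:
host$ killall -9 uml-kernel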
In addition, it might be also necessary to remove the cow files and all the uml related files:
host$ rm cowfile?
host$ rm -r ~/.uml
Finally, unless otherwise stated, the uml_switch and the UML guests have to be launched with your unprivileged user (not with the root user). If you launch the switch or the UML guests with the root user, you might have problems. You should halt the UML guests, kill the uml_switch and clean the system as explained above.
10.7 Networking with UML
10.7.1 Virtual switch
In this section, we explain how to build a virtual TCP/IP network with UML guests. To build the virtual network we will use the uml_switch application (which is in the uml-utilities package). The uml_switch application implements a virtual switch.
UML instances internally use Ethernet interfaces which are connected to the uml_switch. This connection uses a Unix domain socket on the host (see Figure 10.3).
Figure 10.3: Two UML guests (UML1 and UML2, each with an eth0 interface) connected with an uml_switch through the Unix socket /tmp/uml.ctl.
Figure 10.4: Two UML guests and the host (through its tap0 interface) connected with an uml_switch.
Now, you can give an IP address and a mask to the tap0 interface, start the UML guests and try a ping from the host to a UML guest:
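A sketch (the addresses are hypothetical and must match the guests' configuration):
host$ sudo ifconfig tap0 192.168.0.1 netmask 255.255.255.0 up
host$ ping 192.168.0.2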
10.8 Extra
10.8.1 *Building your UML kernel and filesystem
UML Kernel
We will be working in the directory ~/uml. To create it:
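A sketch:
host$ mkdir ~/uml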
Copy the Kernel source code to ~/uml, then untar it and change into the new directory:
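A sketch (assuming the source tarball is named linux-XXX.tar.bz2):
host$ cp linux-XXX.tar.bz2 ~/uml
host$ cd ~/uml
host$ tar xjf linux-XXX.tar.bz2
host$ cd linux-XXX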
In this case, the version of the UML kernel that we are going to compile is XXX. Compiling a UML Kernel uses
the same mechanism as a standard Kernel compile, with the exception that every line you type in the process must
have the option ‘ARCH=um’ appended.
To compile, make sure that you have the build-essential package installed (you can install it with the command sudo apt-get install build-essential). To continue, we will create a default configuration to ensure that everything compiles properly.
~/uml/linux-XXX$ make mrproper ARCH=um
~/uml/linux-XXX$ make defconfig ARCH=um
~/uml/linux-XXX$ make ARCH=um
When this completes, you will have a file ‘linux’ in the root of the ~/uml/linux-XXX/ directory. This is your new UML Kernel, optimised to run in user-space. Notice that this kernel is quite big; that is because we have not stripped the debug symbols from it. They may be useful in some cases, but for now we really don’t need them, so let’s remove this debugging info:
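A sketch:
~/uml/linux-XXX$ strip linux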
The UML Kernel that we have contains the default settings and it is compiled to use modules. Once we have
created the root filesystem for UML, we will go back to the kernel tree and install the modules into this filesystem, so
don’t delete the Linux kernel directory just yet.
Then, we create a 2GB file to hold the new root file-system. Create the empty file-system and format it as ext3:
host$ cd ~/uml
host$ dd bs=1M if=/dev/zero of=ubuntu-ext3.fs count=2048
host$ mkfs.ext3 ubuntu-ext3.fs -F
Now, create a mount point, and mount this new file so we can begin to fill it up:
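A sketch (image is the directory name used by the commands that follow):
host$ mkdir image
host$ sudo mount -o loop ubuntu-ext3.fs image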
Use debootstrap to populate this directory with a basic Ubuntu Linux (natty in this example):
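A sketch (run as root; --include=vim adds the editor mentioned below):
host# debootstrap --include=vim natty image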
The program will now contact the Ubuntu archive servers (or a local archive if you specify it as the last option on
the command line) and download all the required packages to get a minimal Ubuntu natty system installed. Notice I
also asked to install ‘vim’ since it is my preferred command line text editor. Once it completes, if you list the image
directory you will see a familiar Linux root system.
Kernel Modules
Before running this UML, remember that we compiled the Kernel to use loadable modules. We still have to install
these modules into the file-system, hence the reason we kept the Linux source available. With the image still mounted,
do the following:
host# cd ~/uml/linux-XXX/
host# make modules_install INSTALL_MOD_PATH=../image ARCH=um
Removing Virtual Consoles
One major annoyance if we run the UML now is that 6 separate xterm consoles are opened when it boots. This is due to the fact that Ubuntu by default starts 6 TTYs (accessed by ALT-F1 to ALT-F6 from outside X). If we really only need 1 to open, we can fix this as follows:
1. In the /etc/event.d/ directory are the startup scripts for tty’s. We will delete tty2 to tty6:
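A sketch, assuming we operate on the mounted image from the host:
host# rm image/etc/event.d/tty[2-6]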
fstab
As a final touch, we will edit the default ‘/etc/fstab’ file to mount the root file-system when the system boots.
Open ‘/etc/fstab’ in your editor and change the contents to the following:
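A sketch (UML's root block device is typically named ubda):
/dev/ubda / ext3 defaults 0 1
proc /proc proc defaults 0 0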
host# passwd
<type your new UML root password here>
<repeat it>
Finishing
host# exit
host# umount image
Chapter 11
Chapter 12
12.1 Introduction
Defining large topologies is hard and prone to errors. For this reason, it is very helpful to have some systematic way of
defining these topologies. To this respect, a research group of the university UPM (Universidad Politecnica de Madrid)
of Spain has developed a virtualization tool that allows to easily define and run simulations involving virtual networks
using UML. The related project is called VNUML (Virtual Network User Mode Linux). In their web site 1 you can
read the following:
«VNUML (Virtual Network User Mode Linux) is an open-source general purpose virtualization tool de-
signed to quickly define and test complex network simulation scenarios based on the User Mode Linux
(UML) virtualization software. VNUML is a useful tool that can be used to simulate general Linux based
network scenarios. It is aimed to help in testing network applications and services over complex testbeds
made of several nodes (even tenths) and networks inside one Linux machine, without involving the invest-
ment and management complexity needed to create them using real equipment.»
In short, the VNUML framework is a tool made of two components:
• A VNUML language for describing simulations in XML (Extensible Markup Language).
• A VNUML interpreter for processing the VNUML language.
Using the VNUML language, the user can write a simple text file describing the elements of the VNUML scenario such as virtual machines, virtual switches and the interconnection topology. Then, the user can use the VNUML interpreter, called vnumlparser.pl, to read the VNUML file and to run/manage the virtual network scenario. This scheme also provides a way of hiding all the complex UML details from the user. In the following sections, we provide a description of VNUML and how we can use it.
• start-tags, for example <section>
• end-tags, for example </section>
• empty-element tags, for example <line-break />
Another special component in an XML file is the element. An element is a logical document component that either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start-tag and the end-tag, if any, are the element's content. The element content may also contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. A more elaborate example is the following:
<person>
<nif>46117234</nif>
<name>
<first>Peter</first>
<last>Scott</last>
</name>
</person>
Finally, the attribute of an element is a markup construct consisting of a name="value" pair that exists within a
start-tag or empty-element tag. For example, the above person record can be modified using attributes to add the age
and the gender of the person definition:
<person age="17" gender="male">
<nif>46117234</nif>
<name>
<first>Peter</first>
<last>Scott</last>
</name>
</person>
12.2.3 Escaping
XML uses several characters in special ways as part of its markup, in particular the less-than symbol (<), the greater-
than symbol (>), the double quotation mark ("), the apostrophe (’), and the ampersand (&). But what if you need to
use these characters in your content, and you don’t want them to be treated as part of the markup by XML processors?
For this purpose, XML provides escape facilities for including characters which are problematic to include directly.
These escape facilities to reference problematic characters or “entities” are implemented with the ampersand (&) and
semicolon (;). There are five predefined entities in XML:
• & refers to an ampersand (&)
For example, suppose that our XML file should contain the following text line:
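For instance (a hypothetical reconstruction, since the original example line is not preserved):
<message>I said "5 > 3"</message>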
The previous line is not correct in XML. To avoid our XML parser being confused with the greater-than character,
we have to use:
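The same hypothetical line, with the conflicting character escaped:
<message>I said "5 &gt; 3"</message>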
In the same way, the quotation mark (") might be problematic if you need to use it inside an attribute. In this case, you have to escape this symbol. Notice, however, that escaping the quotation mark is not necessary in our previous example, since the quotation mark appears inside the content of the element (and not in the value of an attribute).
In addition, an XML document must be well-formed. Among other rules:
• None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
• The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and
none overlapping.
• The element tags are case-sensitive; the beginning and end tags must match exactly.
• Tag names cannot contain any of the characters !"#$%&’()*+, /;<=>?@[] \^‘{|}~ nor a space character, and
cannot start with - (dash), . (point), or a numeric digit.
• There must be a single "root" element that contains all the other elements.
12.3 General Overview of VNUML
12.3.1 VNUML DTD
The VNUML language defines a set of elements, their corresponding attributes and the way these elements have to be used inside an XML document for it to be "valid". In Code 12.1 we show the beginning of the DTD file of the VNUML language.
In the previous DTD file we can see several quantifiers. A quantifier in a DTD file is a single character that
immediately follows the specified item to which it applies, to restrict the number of successive occurrences of these
items at the specified position in the content of the element. The quantifier may be either:
• + for specifying that there must be one or more occurrences of the item. The effective content of each occurrence
may be different.
• * for specifying that any number (zero or more) of occurrences are allowed. The item is optional and the
effective content of each occurrence may be different.
• ? for specifying that there must not be more than one occurrence. The item is optional.
• If there is no quantifier, the specified item must occur exactly one time at the specified position in the content of
the element.
According to the VNUML DTD file, a VNUML specification document has the structure shown in Code 12.2. The
first two lines of the VNUML file, are mandatory for any XML document. The first line is used to check the XML
version and the text encoding scheme used. The second line tells the processor where to find the corresponding DTD
file.
Following the first two lines, the main body of the virtual network definition is inside the element "vnuml". Inside
the <vnuml> tag, we find, in first place, the global definitions section which is marked with the <global> tag. This
tag groups all global definitions for the simulation that do not fit inside other, more specific tags. Following the global
definitions section, we can find zero or more definitions of virtual networks. These definitions use the tag <net>. Each network created with <net> is a point of interconnection of virtual machines. This point of interconnection is implemented with a virtual switch like uml_switch. The third part of a VNUML specification is devoted to the definition of virtual machines. In this part, we can place zero or more definitions of UML virtual machines using the <vm> tag. The last part of a VNUML specification is devoted to defining actions to be performed in the host using the <host> tag. This part is optional and we are not going to use it.
• Design phase. The first step is to design the simulation scenario. For this purpose, several aspects have to be
considered in advance: the number of virtual machines, the topology (network interfaces in each machine and
how they are connected), what processes each virtual machine will execute, etc. Note that the whole simulation
runs in the same physical machine (named host). The host may or may not be part of the simulation.
• Implementation phase. Once the simulation scenario has been designed, the user must write the source VNUML file describing it. This is an XML file, whose syntax and semantics are described by the Language Reference Documentation of the VNUML project. Also, the XML file must be valid according to the VNUML DTD file.
• Execution phase. Once we have written the VNUML file, we can run and manage the scenario executing the
vnumlparser.pl application.
Figure 12.1: Simple network topology: uml1 (eth1) and uml2 (eth1) connect to Net0, while uml2 (eth2) and uml3 (eth1) connect to Net1.
Now, we have to write the VNUML file describing this topology. As mentioned, this is an XML file, whose syntax and semantics are described by the VNUML language. Code 12.3 shows a VNUML file that describes the design of Figure 12.1.
Code 12.3 is fairly self-explanatory, but we are going to discuss it a little bit. The global section defines that
the version of the VNUML language is 1.8. The simulation name is “simple_example”. The MAC addresses of virtual
machines are auto-generated. The vm_mgmt tag with the attribute type="none" means that the virtual machines are
not accessed via a management network shared with the host. Virtual machines are accessed via a console (attribute
exec_mode="mconsole"). This console is the tty0 of the guest and the xterm terminal is used in the host to connect
to tty0 in the guest. Virtual machines use COW and all of them use the same root filesystem and kernel. In the
virtual networks section, we describe two networks in which two uml_switch are used to connect the machines of
these networks. Finally, in the virtual machines section, we describe three virtual machines with names uml1, uml2
and uml3. The machine uml1 has an Ethernet NIC called eth1 which is “connected” to network Net0, uml3 has an
Ethernet NIC called eth1 connected to network Net1, and uml2 has two Ethernet NICs called eth1 and eth2, which
are connected to networks Net0 and Net1 respectively. Now, if we save the previous VNUML description in a file
called simple_example.vnuml, we can run the scenario using "-t" option of vnumlparser.pl:
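The original listing is omitted here; presumably the invocation follows the -t mode syntax described in Section 12.4:
$ vnumlparser.pl -t simple_example.vnuml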
This command builds the virtual network topology described in simple_example.vnuml and boots all the three
virtual machines defined. Once you have finished playing around with the simulation scenario, you can release it
using the "-d" option of vnumlparser.pl:
12.4 VNUML language
In this section, we provide a general description of the VNUML language. For this purpose, we describe the main
tags that can be found in each of the three sections of a VNUML specification: the global section, the virtual networks
section and the virtual machines section. We will find three kinds of tags in VNUML: structural tags, with few or no
semantics; topology tags, used to describe the topology; and simulation tags, used to describe simulation parameters
and commands. The following sections describe these tags.
In addition, the tags allowed inside this element are the following:
read-only shared device. A machine's writes are stored in the private device, while reads come from either device. Using this scheme, the majority of the data, which is unchanged, is shared between an arbitrary number of virtual machines, each of which has a much smaller file containing the changes that it has made. With a large number of UMLs booting from a large root filesystem, this leads to a huge disk space saving. The COW mechanism saves a lot of storage space, so COW mode is recommended to boot UMLs. Example:
<filesystem type="cow">/usr/share/vnuml/filesystems/root_fs_tutorial</filesystem>
There are more types of consoles; for more information, see the description of the same tag in the virtual machines section.
Other attributes that we can find within the <net> tag help us to accurately define the behavior of the virtual network. In this context, we can use the hub attribute set to "yes" to configure the uml_switch process in hub mode (its default behavior is as a switch):
<net name="Net0" mode="uml_switch" hub="yes" />
Another interesting attribute is the sock attribute, which contains the file name of a UNIX socket on which an uml_switch instance is running. If the file exists and is readable/writable, then instead of starting a new uml_switch process on the host, a symbolic link is created, pointing to the existing socket. In this way, we can create uml_switch instances and set their permissions and their configuration ahead of time. In particular, we can attach a tap interface in the host to the uml_switch (this can be done using the -tap option of uml_switch). This allows the host to monitor the virtual networks or to be part of these virtual networks using its tap interface. For example, in the VNUML specification we can use:
<net name="Net0" mode="uml_switch" hub="yes" sock="/var/run/vnuml/Net0.ctl" />
And then start the uml_switch in the host in the following way:
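A plausible invocation would be the following (a sketch; the tap0 interface is assumed to exist already, and you should check the uml_switch man page on your system for the exact options):
$ uml_switch -hub -tap tap0 -unix /var/run/vnuml/Net0.ctl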
Within <vm>, several tags configure the virtual machine environment (this is a non-exhaustive list of tags):
– If you use the console “xterm”, then a tty in the guest is connected to an xterm application in the host.
For example:
<console id="0">xterm</console>
If the previous element is present in the definition of a guest virtual machine, when the simulation is started, an xterm will appear in the host connected to the tty0 of the guest.
– If you use the console “pts”, then a tty in the guest is connected to a pseudo-terminal (pts) in the host.
For example:
<console id="1">pts</console>
If the previous element is present in the definition of a guest virtual machine, then, when the simulation is started, a pts in the host is connected to the tty1 (notice that id="1") on the guest. In particular, the pts device name is stored by vnumlparser.pl in a file. For example, if your simulation name is “simple_example” and the virtual machine name is “uml1”, the filename for the pts will be $HOME/.vnuml/simulations/simple_example/vms/uml1/run/pts. If you execute cat over this file while the simulation is running, you will obtain a result like this:
$ cat $HOME/.vnuml/simulations/simple_example/vms/uml1/run/pts
/dev/pts/7
This means that /dev/tty1 inside the guest is connected to /dev/pts/7 inside the host. To access pseudo-terminal devices, we can use the screen command as follows:
$ screen /dev/pts/7
• <if >. Optional and multiple.
This tag describes a network interface in the virtual machine. It uses two attributes: id and net.
Attribute id identifies the interface. The name of a virtual machine interface with id=n is ethn.
Attribute net specifies the virtual network (using name value of the corresponding <net>) to which the interface
is connected.
Example:
<if id="1" net="Net1">
<ipv4>10.0.1.2/24</ipv4>
</if>
or using the optional mask attribute either in dotted or slashed notation, for example:
<ipv4 mask="/24">10.1.1.1</ipv4>
or
<ipv4 mask="255.255.255.0">10.1.1.1</ipv4>
If the mask is not specified (for example, <ipv4>10.1.1.1</ipv4>), 255.255.255.0 (equivalently /24) is used by default.
Using mask attribute and the mask prefix in the tag value at the same time is not allowed.
– <ipv6 >. Optional and multiple.
Specifies an IPv6 address for the interface. The mask can be specified as part of the tag value. For example:
<ipv6>3ffe::3/64</ipv6>
You can also use the optional mask attribute in slashed notation. For example:
<ipv6 mask="/64">3ffe::3</ipv6>
Note that, unlike <ipv4>, dotted notation is not allowed in <ipv6>. If the mask is not specified (for example, <ipv6>3ffe::3/64</ipv6>), /64 is used by default.
Using the mask attribute and the mask prefix in the tag value at the same time is not allowed.
• <route >. Optional and multiple.
Specifies a static route that will be configured in the virtual machine routing table at boot time.
The routes added with this tag are gateway type (gw). Two attributes are used: type (allowed values: "ipv4" for
IPv4 routes or "ipv6" for IPv6 routes) and gw, that specifies the gateway address. The value of the tag is the
destination (including mask, using the ’/’ prefix) of the route.
<route type="ipv4" gw="10.0.0.3">default</route>
• <forwarding >. Optional (default specified with <vm_defaults>) and unique. Activates IP packet forwarding for the virtual machine (packets arriving at one interface can be forwarded to another, using the information in the routing table). This tag uses the optional type attribute (default is "ip"); allowed values are: "ipv4", which enables forwarding for IPv4 only; "ipv6", which enables forwarding for IPv6 only; and "ip", which enables forwarding for both IPv4 and IPv6. Forwarding is enabled by setting the appropriate kernel parameters under /proc/sys/net.
<forwarding type="ip" />
<exec>
This is an optional tag and it can appear multiple times in a VNUML file. It specifies one command to be executed by the virtual machine during the execute-commands mode. In this document, we show the mandatory attributes this tag can use. Optional attributes are described in the VNUML reference manual.
Mandatory attributes:
• seq. It is a string that identifies a command sequence. This string is used to identify the commands to be
executed.
• type (allowed values: "verbatim", "file"). Using "verbatim" specifies that the tag value is the verbatim command
to be executed. Using "file", the tag value points (with absolute pathname) to a file (in the host filesystem) with
the commands that will be executed (line by line).
In the following example (see Code 12.4), two labels have been defined as command sequences: "start" and "remove". When the "start" label is executed, the uml1 virtual machine will execute the command "/usr/bin/streamsender" whereas the uml2 virtual machine will execute "/usr/bin/streamreceiver". If the "remove" label is executed, then the uml1 virtual machine will execute the command "rm /etc/motd".
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vnuml SYSTEM "/usr/share/xml/vnuml/vnuml.dtd">
<vnuml>
<global>
....
</global>
<net name="Net0" mode="uml_switch" />
<vm name="uml1">
...
<exec seq="remove" type="verbatim">rm /etc/motd</exec>
<exec seq="start" type="verbatim">/usr/bin/streamsender</exec>
</vm>
<vm name="uml2">
....
<exec seq="start" type="verbatim">/usr/bin/streamreceiver</exec>
</vm>
</vnuml>
<filetree>
This is an optional tag and it can appear multiple times in a VNUML file.
<vnuml>
<global>
....
<vm_defaults exec_mode="mconsole">
....
<basedir>/home/user/config_files/</basedir>
...
</vm_defaults>
</global>
<vm name="uml1">
...
<filetree seq="stconf" root="/etc/streamer">streamer/</exec>
</vm>
</vnuml>
Specifies a filetree (a directory, as well as all its files and subdirectories) in the host filesystem that will be copied to the virtual machine filesystem (overwriting existing files) during the execute-commands mode. This tag allows easy copying of entire configuration directories (such as /etc) that are stored and edited from the host when preparing simulations.
• If the directory (in the host filesystem) starts with "/", then it is an absolute directory.
• If the directory doesn't start with "/", then it is relative to the <basedir> tag.
<basedir> is an optional tag (default specified with <vm_defaults>) and unique. It sets the root path used for <filetree> tags; that is to say, when the filetree path doesn't start with "/", the path specified in <filetree> is used as a relative path to the value in <basedir>.
Important note: if <basedir> is not specified, the value of basedir is set to the directory in which the VNUML file is stored.
<filetree> tag uses two mandatory attributes:
• root. Specifies where (in the virtual machine filesystem) to copy the filetree.
• seq. The name of the commands sequence that triggers the copy operation. Note that filetree copy is made
before processing <exec> commands.
Other optional attributes can be viewed in the VNUML language reference manual. Code 12.5 shows how to use the <filetree> tag. In this example, when the label "stconf" is executed, a filetree copy between the host and the uml1 virtual machine is performed. Specifically, the filetree in the host below "/home/user/config_files/streamer" is copied to the uml1 filesystem at "/etc/streamer".
Note that it is possible to have the same sequence label assigned to an <exec> tag and to a <filetree> tag. In this case, the copy using <filetree> is executed first, followed by the commands within <exec>.
2. Execute commands. Once the scenario has been built, you can run command sequences on it. Basically, in this step, the parser takes the commands defined in the <exec> and <filetree> tags in the VNUML definition file and executes them. Several command sequences may be defined (e.g., one to start routing daemons, another to perform a sanity check in the virtual machines, etc.), specifying which one to execute at every moment.
This step is optional. If you don't need to execute command sequences (because you prefer to interact with the virtual machines directly), you don't need it.
3. Release Scenario. In this final step, all the simulation components previously created (UMLs, virtual networks, etc.) are cleanly released. The UML shutdown process also makes this step very processor-intensive.
vnumlparser.pl has several operation modes, each related to one of the three steps of the execution phase:
1. Build scenario: -t mode. The command syntax is: vnumlparser.pl -t VNUML-file
2. Execute commands: -x mode. In this case, once we know the sequence label (labelname) we want to execute, the command syntax is:
vnumlparser.pl -x labelname@VNUML-file
3. Release Scenario: -d mode. The command syntax is: vnumlparser.pl -d VNUML-file
3 In computing, a parser (syntax analyzer) is one of the components in an interpreter or compiler, which checks for correct syntax and builds a data structure from the input.
Chapter 13
Simulation Tools
Chapter 14
Simulation Tools
14.1 Installation
The following instructions allow you to install the tools to build VNUML virtual networks and use our simctl wrapper. This installation has been tested using the 32-bit version of Ubuntu 12.04. It is known that the installation does not work for the 64-bit version of this distribution.
In case you have a 64-bit OS, we can provide you with an ISO or VDI (for VirtualBox) image with everything already installed on it.
In this case, it is very important that you check that your processor supports hardware virtualization and that you activate this feature in your BIOS. Another possibility, if your physical machine is able to boot from USB (most modern computers support this), is to make a raw copy with dd of our ISO image onto a USB pendrive and use this device as a hard disk.
Once you have a Linux box, type the following command to add our repository to your list of APT (software) repositories:
$ echo "deb http://sertel.upc.es/~vanetes/debs i386/" |
sudo tee /etc/apt/sources.list.d/simtools.list
Finally, type the following commands to update the repository list of software and to install all the packages related
to simtools.
Note: you can repeat these steps if the software is not installed correctly the first time.
$ sudo apt-get update
$ sudo apt-get install metasimtools -y --force-yes
14.2.1 Profile for simctl
The simctl wrapper is compatible with version 1.8 of VNUML, but to be able to fully exploit the functionalities of this script you should consider the issues that are listed below:
• We use consoles of type “pts” and we can use multiple consoles of this type (mpts functionality) as follows:
<console id="0">pts</console>
<console id="1">pts</console>
If there are <console> tags in both <vm_defaults> and <vm>, they are merged. Our wrapper, simctl, internally uses the screen application to automatically manage the connection to these pseudo-terminal devices. Our wrapper is able to list the pseudo-terminals available and allows you to always connect to the virtual machines. With simctl, you will never lose the possibility of having a console with a virtual machine while the simulation is running. You can even close a console and later reopen it without losing any data. On the other hand, the mpts functionality is implemented by our modified version of vnumlparser.pl, which stores the names of the multiple pts devices in the same directory as the original vnumlparser.pl, but using the filenames pts.0, pts.1, etc.
• simctl always executes the label “start”, i.e. the <exec> tags with attribute seq="start", when it initiates a
simulation.
• simctl automatically creates a tap interface in the host for each virtual network definition in which it finds
a sock attribute. For example, if you define a virtual network like this:
<net name="Net0" mode="uml_switch" hub="yes" sock="/var/run/vnuml/Net0.ctl" />
Then, simctl creates a tap interface in the host called tap0. In more detail, simctl calls another bash script called simtun. The script simtun is executed with root permissions, which allows us to create the tap interfaces and execute the uml_switch instances in the host. In this creation, the tap is connected with the uml_switch, which, in turn, is connected with the virtual machines, creating the virtual network.
• simctl uses a configuration file for defining some basic parameters using the syntax of bash. To locate this file, the script first checks the file .simrc in the home directory of the user that is running simctl; if this file is not found, then simctl accesses the system-wide configuration file located at /usr/local/etc/simrc.
An example configuration file is the following:
# simrc: tuning of environment variables
# for simctl
# Definition of scenario files directory
DIRPRACT=/usr/share/vnuml/scenarios
# Change the default terminal type (xterm)
# values: (gnome | kde | local)
# TERM_TYPE=gnome
# KDE Konsole tuning
# For Konsole version >= 2.3.2 (KDE 4.3.2) use these options:
# KONSOLE_OPTS="-p ColorScheme=GreenOnBlack --title "
# For Konsole version <= 1.6.6 (KDE 3.5.10) use these options
# KONSOLE_OPTS="--schema GreenOnBlack -T "
The configuration file can be customized. For example, the DIRPRACT environment variable contains the “path” where VNUML simulation files can be found. On the other hand, if you want to use a GNOME terminal instead of an xterm terminal, you can assign the variable TERM_TYPE=gnome.
Note. When simctl runs console terminals, it tries to use classic color settings with a green foreground on a black background. This feature is modifiable for GNOME and KDE terminals. The KDE Konsole terminal can be configured with the variable KONSOLE_OPTS, keeping in mind that the last parameter must be the handle of the window title of the terminal. For gnome-terminal, you can define and save a profile with the name vnuml with custom features. Editing the gnome-terminal profiles can be done from the Edit menu.
• With simctl, you can always use the “TAB” key to auto-complete the commands and options available at each
moment.
Now, following the example of Section 12.3.4, we are going to show how to manage the scenario with simctl instead of with vnumlparser.pl, and we are going to complete the scenario with more functionality by including the definition of IP network addresses, routes, the execution of commands, multiple consoles, etc. Figure 14.1 shows the design of the topology including IP addresses and networks.
Figure 14.1: Simple network topology with IP addressing: uml1 (10.0.0.2/24, eth1) on Net0; uml3 (10.0.1.5/24, eth1) on Net1; uml2 connecting Net0 and Net1; and the host attached to Net0 through a tap interface.
Code 14.1 shows a VNUML file that meets the topology and network configuration exposed above. The XML file also contains the configuration of additional functionalities. Next, we discuss the most relevant aspects of this VNUML specification file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vnuml SYSTEM "/usr/share/xml/vnuml/vnuml.dtd">
<vnuml>
<!-- Global definitions -->
<global>
<version>1.8</version>
<simulation_name>simple_example</simulation_name>
<automac/>
<vm_mgmt type="none" />
<vm_defaults exec_mode="mconsole">
<filesystem type="cow">/usr/share/vnuml/filesystems/root_fs_tutorial</filesystem>
<kernel>/usr/share/vnuml/kernels/linux</kernel>
<console id="0">pts</console>
</vm_defaults>
</global>
<!--Network definitions -->
<net name="Net0" mode="uml_switch" hub="yes" sock="/var/run/vnuml/Net0.ctl" />
<net name="Net1" mode="uml_switch" />
<!-- Virtual machines definition -->
<vm name="uml1">
<console id="1">pts</console>
<if id="1" net="Net0"> <ipv4>10.0.0.2/24</ipv4> </if>
<route type="ipv4" gw="10.0.0.1">default</route>
<exec seq="start" type="verbatim">echo "1" >/proc/sys/net/ipv4/conf/all/accept_source_route</exec>
<exec seq="reset_ips" type="verbatim">ifconfig eth1 0.0.0.0</exec>
</vm>
<vm name="uml2">
<if id="1" net="Net0"> <ipv4>10.0.0.1/24</ipv4> </if>
<if id="2" net="Net1"> <ipv4>10.0.1.1/24</ipv4> </if>
<forwarding type="ip" />
<exec seq="start" type="verbatim">echo "1" >/proc/sys/net/ipv4/conf/all/accept_source_route</exec>
<exec seq="reset_ips" type="verbatim">ifconfig eth1 0.0.0.0</exec>
<exec seq="reset_ips" type="verbatim">ifconfig eth2 0.0.0.0</exec>
<exec seq="enable_forwarding" type="verbatim"> echo "1" >/proc/sys/net/ipv4/ip_forward </exec>
<exec seq="disable_forwarding" type="verbatim"> echo "0" >/proc/sys/net/ipv4/ip_forward </exec>
</vm>
<vm name="uml3">
<if id="1" net="Net1"> <ipv4 mask="255.255.255.0">10.0.1.5</ipv4> </if>
<route type="ipv4" gw="10.0.1.1">default</route>
<exec seq="start" type="verbatim">echo "1" >/proc/sys/net/ipv4/conf/all/accept_source_route</exec>
<exec seq="reset_ips" type="verbatim">ifconfig eth1 0.0.0.0</exec>
</vm>
</vnuml>
The first relevant aspect to mention is that the previous definition uses multiple pseudo-terminals. In particular, notice that there is a tag <console id="0"> in the global element of the specification. This means that all the virtual machines will have one console of type "pts". In addition, the definition of virtual machine uml1 includes another console tag: <console id="1">. This means that this virtual machine is going to have two consoles of type "pts".
Regarding the definition of virtual networks, notice that Net0 has been defined with the sock attribute, which means that this network is going to be connected to the host with a tap interface called tap0. The other virtual network defined, Net1, does not have the sock attribute, and thus, it will not be connected to any tap interface of the host.
Then, we have the definition of the three virtual machines uml1, uml2 and uml3. Regarding the IP configuration,
as you can observe, we have configured the IP addresses as specified in Figure 14.1 and uml1 and uml3 have uml2
as their default router. The forwarding in uml2 has been activated too.
Finally, we have several <exec> tags in the definition of each virtual machine. Notice that all the virtual machines have the label “start”, which in this example enables the “source routing” functionality of the virtual machines. Again, all the virtual machines have the label “reset_ips”, which simply removes the IP addresses of the Ethernet network interfaces of the virtual machines (and as a result this action also removes all the routes from the routing tables). Finally, the virtual machine uml2 has two labels called “enable_forwarding” and “disable_forwarding” that allow us to enable and disable IP forwarding (which by default is enabled when uml2 is booted).
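Presumably, running simctl without arguments produces the list of located scenarios and the usage summary shown next:
host$ simctl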
OPTIONS
start Starts scenario
stop Stops scenario
status State of the selected simulation
(running | stopped)
vms Shows vm's from simulation
labels [vm] Shows sequence labels for ALL vm's or for vm
exec label [vm] Exec label in the vms where label is defined
or exec the label only in the vm
netinfo Provides info about network connections
get [-t term_type] vm [pts] Gets a terminal for vm
term_type selects the terminal
(xterm | gnome | kde | local)
pts is an integer to select the console
The output of simctl in this example tells us that it has located four scenarios with names: icmp, routing,
subnetting and simple_example. This output also shows us all the possibilities that simctl provides us to manage
the scenario. These possibilities are explored in the following sections.
Please be patient, because it might take some time to complete the starting process (up to several minutes). Finally, the command ends indicating the time taken to start the scenario and we get the console prompt again. At this moment, all the virtual machines and their respective interconnections with virtual switches have been created. After the scenario is started, you can check in the host that the corresponding tap interfaces have been created. In particular, after you run the simple_example scenario, if you type ifconfig -a in the host to view all the network interfaces, you should obtain an output as follows:
host$ ifconfig -a
eth1 Link encap:Ethernet HWaddr 00:23:ae:1c:51:29
inet addr:192.168.234.252 Bcast:192.168.234.255 Mask:255.255.255.0
inet6 addr: fe80::223:aeff:fe1c:5129/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:137472 errors:0 dropped:0 overruns:0 frame:0
TX packets:105919 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:77824115 (77.8 MB) TX bytes:10273344 (10.2 MB)
Interrupt:22 Memory:f6ae0000-f6b00000
On the other hand, when you wish to stop the simulation, you can type the following:
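A sketch, following the simctl syntax shown in this section:
host$ simctl simple_example stop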
14.2.5 Troubleshooting
If there is a problem starting a simulation, you should type the following commands to clear the system and start it again:
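Presumably, the missing listing corresponds to the commands also shown below for the root case:
host$ simctl simulation_name stop
host$ simctl forcestop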
The forcestop option kills all the “linux” processes (UML kernels) and removes the directory .vnuml in the user's home.
On the other hand, you should never run two different simulations at the same time. If by mistake you start two simulations, stop each of them and then run simctl forcestop as described above.
Finally, you should never use the superuser “root” to execute simctl. If by mistake you start a simulation with the root user, you must clear the system and start it again using your own user:
host$ sudo -s
host# simctl simulation_name stop
host# simctl forcestop
host# exit
host$ simctl simulation_name start
To access the console of a virtual machine, you have to execute “simctl simname get virtual_machine”. For example, you can get a command console of the virtual machine uml1 using the following command:
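Following the syntax above, presumably:
host$ simctl simple_example get uml1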
The “get” option can take an argument (-t) to indicate what type of terminal you want to use (it requires that the selected terminal emulator is already installed on the system). The values of the (-t) argument and the corresponding terminals can be any of: xterm (classic in X11), gnome (terminal from GNOME) or kde (Konsole from KDE). For example, to get a gnome-terminal for the uml2 virtual machine, you can type (you can also define this terminal as the default in your preferences file “simrc”):
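Presumably:
host$ simctl simple_example get -t gnome uml2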
Once you have accessed the consoles of the virtual machines uml1 and uml2, you can type:
host$ simctl simple_example get
uml1 10121.0 (Attached)
uml2 10169.0 (Attached)
uml3 Running --------
“Attached” indicates that there is already a terminal associated with the command console of the uml1 virtual machine. If you close the uml1 terminal, then the terminal state becomes “Detached”.
Finally, you can also access the other console of uml1 using the following command:
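Presumably, selecting the console with the pts integer argument of “get”:
host$ simctl simple_example get uml1 1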
host$ simctl simple_example get
uml1 14272.1 (Attached)
uml1 10121.0 (Attached)
uml2 10169.0 (Attached)
uml3 Running --------
As you can observe, uml1 has two consoles attached. Notice that if you try to get a second console on uml2, you will obtain an error.
As you can observe, the output is the topology defined in the XML file. This command can be useful to detect mistakes in the topology configuration. It is also worth mentioning that the “netinfo” option uses information directly obtained from the virtual machines (not from the VNUML file) when the simulation is running.
As shown in the output of the previous command, with the “labels” option we obtain the list of defined labels per virtual machine. You can also view the labels of a specific virtual machine using the name of the virtual machine:
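Presumably, passing the virtual machine name as an argument:
host$ simctl simple_example labels uml1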
The other option of simctl (“exec”) allows you to manage the execution of the actions (commands) of a label. The command syntax “simctl simname exec labelname” executes the commands associated with the label “labelname” on all the virtual machines where that label is defined. For example:
host$ simctl simple_example exec reset_ips
Virtual machines group: uml3 uml2 uml1
OK The command has been started successfully.
OK The command has been started successfully.
OK The command has been started successfully.
Total time elapsed: 0 seconds
Recall that in our example, the “reset_ips” label removes the IP addresses from all the interfaces of the virtual machines. Finally, you can also run a label on a single machine with the following syntax:
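A sketch matching the discussion below:
host$ simctl simple_example exec disable_forwarding uml2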
Notice that in this particular example, the same result can be obtained by executing the previous command without specifying the virtual machine (uml2). This is true because in this case, the label “disable_forwarding” is only defined for uml2. In general, if a label is defined on several virtual machines, with the previous command the actions of the label are executed only on the specified virtual machine and not on the rest of the virtual machines that have defined that label.
host$ sudo -s
host# cd /usr/share/vnuml/filesystems
host# mkdir img
host# mount -o loop filesystem.fs img/
host# cp /etc/resolv.conf img/etc/resolv.conf
host# mount -t proc none img/proc
host# chroot img
Where “filesystem.fs” must be replaced by the name of the file under the /usr/share/vnuml/filesystems directory that contains the filesystem in which you want to install software. Then, to install software, type:
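Inside the chroot, assuming a Debian-based guest filesystem (the package name is a placeholder):
host# apt-get update
host# apt-get install <package>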
To finish:
host# exit
host# umount img/proc
host# fuser -k img
host# umount img
host# exit
host$ simctl forcestop
• Problem 1. The UML kernel is not aware of the terminal size. The consequence of this is that when you resize the terminal in which you are executing screen, the size of the terminal is not refreshed.
Workaround to problem 1. We can use the stty command to indicate the terminal size to the UML kernel, for example "stty cols 80 rows 24". Elaborating a little more on this solution, we can use the key binding facility of screen to do so.
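For instance, a hypothetical binding in $HOME/.screenrc (the key and the geometry are arbitrary; screen's stuff command injects the keystrokes into the session, and ^M is the carriage return):
bind W stuff "stty cols 80 rows 24^M"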
Part III
Network Applications
Chapter 15
Introduction
Chapter 16
Introduction
16.1 Introduction
16.1.1 TCP/IP Networking in a Nutshell
In this section, we provide a brief review of the TCP/IP architecture; several more aspects of this architecture will be
further discussed in the following chapters.
TCP/IP defines the rules that computers must follow to communicate with each other over the Internet. Browsers and Web servers use TCP/IP to communicate with each other, e-mail programs use TCP/IP for sending and receiving e-mails, and so on. TCP and IP were developed to connect a number of different networks designed by different vendors into a network of networks (the "Internet"). It was initially successful because it delivered a few basic services that everyone needs (file transfer, electronic mail, remote logon) across a very large number of client and server systems.
Architectures for handling data communication are all composed of layers and protocols. TCP/IP considers two layers: the transport layer and the network layer. The transport layer provides a process-to-process service to the application layer, while the network layer provides computer-to-computer services to the transport layer. Finally, the network layer is built on top of a data link layer, which in turn is built over a physical layer.
Within each layer, we have several protocols. A protocol is a description of the rules computers must follow to communicate with each other. The architecture is called TCP/IP because TCP and IP are the main protocols for the transport and network layers respectively; but actually, the essential protocols of the TCP/IP architecture in a top-down view are (see Figure 16.1):
• Transport layer:
– TCP (Transmission Control Protocol).
– UDP (User Datagram Protocol).
• Network layer:
– IP (Internet Protocol).
– ICMP (Internet Control Message Protocol).
In Unix-like systems, TCP, UDP, IP and ICMP are implemented in the Kernel of the system. These protocols are
used to achieve a communication between processes in user space. If a process in user space (application) wants to
communicate with another process in the same or another user space, it has to use the appropriate system calls.
TCP provides applications with a full-duplex communication, encapsulating its data over IP datagrams. TCP com-
munication is connection-oriented because there is a handshake of three messages between the kernel TCP instances
related to each process before the actual communication between the final processes is possible. These TCP instances
may reside on the same or in different kernels (computers).
Figure 16.1: The TCP/IP architecture: the application layer runs in user space; TCP and UDP (transport layer) and ICMP and IP (network layer) are implemented in the kernel; data link technologies such as Ethernet or WLAN are handled by the network card.
The TCP communication is managed as a data flow; in other words, TCP is not message-oriented. TCP adds support to detect errors or lost data and to trigger retransmission
until the data is correctly and completely received, so TCP is a reliable protocol.
UDP is the other protocol you can use as transport layer protocol. UDP is similar to TCP, but much simpler and
less reliable. For instance, this protocol does not manage packet retransmissions. UDP is message-oriented and each
UDP packet or UDP datagram is encapsulated directly within a single IP datagram.
IP is the main protocol of the network layer and it is responsible for moving packets of data between computers, also called “hosts”. IP is a connection-less protocol. With IP, messages (or other data) are broken up into small independent "packets" and sent between computers via the Internet. IP is responsible for routing each packet to the correct destination. IP forwards or routes packets from node to node based on a four-byte destination address: the IP address. When two hosts are not directly connected to a data link layer network, we need intermediate devices called “routers”. The IP router is responsible for routing the packet to the correct destination, directly or via another router. The path a packet follows might be different from that of other packets of the same communication.
ICMP messages are generated in response to errors in IP datagrams, for diagnostic or routing purposes. ICMP errors are always reported to the original source IP address of the originating datagram. ICMP is connection-less and each ICMP message is encapsulated directly within a single IP datagram. Like UDP, ICMP is unreliable. Many commonly-used network utilities are based on ICMP messages. The best-known and most useful application is the ping utility, which is implemented using the ICMP “Echo request” and “Echo reply” messages (see also Section ??).
Finally, as shown in Figure 16.1, we may have many applications in user space that can use the TCP/IP architecture
like telnet, ftp, http/html, ssh, X window, etc. The arrows in Figure 16.1 show how application data can be encapsulated
to be transmitted over the TCP/IP network. In particular, it is shown that applications can select TCP or UDP as
transport layer. Then, these protocols are encapsulated over IP, and IP can select several data link layer technologies
like Ethernet, Wireless LAN, etc. to send the data.
16.1.2 Client/Server Model
The model
The client/server model is the most widely used model for communication between processes, generically called “interprocess communication”. In this model, when there is a communication between two processes, one of them acts as the client and the other acts as the server (see Figure 16.2). Clients make requests to a server by sending messages, and servers respond to their clients by acting on each request and returning results. One server can generally support numerous clients.
Figure 16.2: The client/server model: the client process sends a service request to the server process, which returns a response.
In Unix systems, server processes are also called daemons. In general, a daemon is a process that runs in the background, rather than under the direct control of a user. Typically, daemons have names that end with the letter “d”, for example, telnetd, ftpd, httpd or sshd. On the contrary, client processes are typically not daemons. Clients are application processes that usually require interaction with the user. In addition, clients are run in the system only when needed, while daemons are always running in the background so that they can provide their service when requested. As the client process is the one that initiates the interprocess communication, it must know the address of the server. One of the ways of implementing this model is to use the “socket interface”. A socket is an endpoint of a bidirectional interprocess communication. The socket interface provides the user space applications with the system calls necessary to enable client/server communications. These system calls are specific to client and to server applications. By now, we will simplify this issue by saying that servers open sockets for “listening” for client connections and clients open sockets for connecting to servers. In practice, there are several socket implementations for client/server communications using different architectures. The two most popular socket domains are:
• Unix sockets. The Unix sockets domain is devoted to communications inside a computer or host. The addresses
used in this domain are filenames1 .
• TCP/IP sockets. These are the most widely used for client/server communications over networks.
Client/Server TCP/IP
In the TCP/IP domain, the address of a process is composed of: (i) an identifier called IP address that allows reaching
the destination “user space” or host in which the server process is running, (ii) an identifier of the process called
transport port and (iii) the transport protocol used. So, the client process needs to know these three parameters to
establish a TCP/IP socket with a network daemon (server).
To facilitate this aspect, the Internet uses a scheme called well-known services or well-known ports. In this scheme, ports are basically split into two ranges. Port numbers below 1024 are reserved for well-known services and port numbers above 1024 can be used as needed by applications in an ephemeral way. The original intent of service port numbers was that they fall below port number 1024. Of course, the number of services exceeded that number long
1 Further details about this architecture are beyond the scope of this document.
ago. Now, many network services occupy the port number space greater than 10242 . For example, the server of the X window system (explained later) listens on port 6000. In this case, we say that this port is the “default” port for this service. As mentioned, the tuple {IP address, protocol, port} serves as a “process identifier” in the TCP/IP network. Some remarkable issues about this tuple are the following:
• IP address. Hosts do not have a unique IP address. Typically, they have at least one IP address per network interface. Then, we can use in the tuple interchangeably any of the IP addresses of any of the host interfaces to identify a process within a host.
• Ports and protocol. Ports are commonly said to be “bound” when they are attached to a given socket (which has been opened by some process). Ports are unique on a host (not per interface) for a given transport protocol. Thus, at a given time, only one process can bind a port for a given protocol. When a server daemon has bound a port, we typically say that it is “listening” on that port.
Figure 16.3: Client and server sockets: the client process holds socket descriptor sd=4, while the server process holds sd=4 for the listening socket and sd=5 for the accepted connection; each host reaches the network through its network interface (identified by a MAC address).
So, what happens when you open your favorite browser (e.g. firefox) and you type a name in the URL bar like
http://www.upc.edu? As mentioned, the client (Firefox) needs the tuple {IP address, protocol, port} that identifies the
2 In most Unix-like systems, only the root user can start processes using ports under 1024.
server process (e.g. the Apache Web server) to establish the socket. The first parameter, the IP address, is obtained from the name3 that you introduced in the URL bar of the browser. Furthermore, you can also type an IP address directly in this bar; for example, you can type: http://192.168.0.1. The second parameter, the transport protocol, is always TCP for HTTP (which is the protocol for the Web service). The third parameter of the tuple, the port, is 80 because this is the default port for this service, since this is a well-known service. In the URL bar you can type another port using “:” if the Web server is not running on the default port; for example, you can type: http://192.168.0.1:8080. Then, our client (Firefox) will ask its kernel to establish the socket with the server, passing the tuple to the proper system call. Next, the client's kernel will generate a TCP instance and it will negotiate the end-to-end communication with another TCP instance in the server's kernel. After the handshake is performed, the client's kernel will provide our Web browser with a file descriptor (as mentioned, specifically called a “socket descriptor”). In addition, when establishing this socket, the client's kernel will assign an ephemeral transport port for identifying the client process in this TCP/IP end-to-end communication.
In our example of Figure 16.3, our client process (e.g. the Firefox browser) has PID 24567. It has used a system call with the tuple {172.16.0.1,TCP,80} to create a socket with a Web server. The ephemeral port and the socket descriptor assigned by the client's kernel for this socket are 11300 and 4, respectively. On the other side, we have a Web server (e.g. Apache) that is running in the background with PID=45889, waiting for client connections (or “listening”) on port 80. In more detail, the network daemon has bound server TCP port 80 and it has obtained from its kernel a socket descriptor for this transport port (in our example sd=4). When the Web server receives an incoming attempt to create a new TCP connection from the remote client, it is notified and it has to use a system call to accept this connection. When a connection is accepted, the server's kernel creates a new socket associated with the corresponding tuple of this connection, which is {192.168.0.1,TCP,11300}, and provides the server process with a socket descriptor (sd=5 in our example).
Figure 16.4: A Web server managing multiple connections: the daemon listens on port 80 and, in kernel space, one socket is associated with each established connection (192.168.0.1:11300 and 10.1.0.1:7231).
On the other hand, most network daemons, and Web servers are no exception, use multiple processes4 to manage multiple connections. In other words, the parent process creates a different child process to serve each connection. This is shown in Figure 16.4. In this example, we have two connections: one from a client at 192.168.0.1 and TCP port 11300 and another connection from a client at 10.1.0.1 and TCP port 7231. Each connection is served by a different process, in this case with PID=45899 and PID=45896. Notice that there is no problem in two or more processes using the same file descriptor (in our example sd=5), because the file descriptor table is local to each process and because the kernel knows which TCP/IP tuple is associated with the other end of the communication for each socket descriptor. Notice that with this functionality provided by the kernel, a single process can easily manage multiple connections, some as client and some as server simultaneously.
In the following sections, we show how to configure the basic parameters of a network interface and we review some of the most popular services that are deployed using TCP/IP networks. The services that we are going to discuss are built over TCP and they use the client/server model.
3 Typically using a DNS service, but you can also use a local file to do this name to IP address translation.
4 Indeed, multiple execution threads are used.
16.2 Basic Network Configuration
16.2.1 ifconfig
The ifconfig command is short for “interface configuration”. This command is a system administration utility in Unix-like operating systems to configure, control, and query TCP/IP network interface parameters from the command line interface. Common uses of ifconfig include setting an interface's IP address and netmask, and disabling or enabling a given interface. In its simplest form, ifconfig can be used to set the IP address and mask of an interface by typing:
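A sketch; the interface name and the address are illustrative:
# ifconfig eth0 192.168.0.1 netmask 255.255.255.0 up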
16.2.2 netstat
The command netstat (network statistics) is a tool that displays network connections (both incoming and outgoing), routing tables, and a number of network interface statistics. You should take a look at the man page of netstat. In Table 16.1, we show some of the most significant parameters of this command.
16.2.3 services
The file /etc/services is used to map port numbers and protocols (tcp/udp) to service names.
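For example, typical lines of this file look like the following (the exact entries depend on the distribution):
ftp             21/tcp
telnet          23/tcp
http            80/tcp          www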
16.2.4 lsof
The lsof command means “list open files” and it is used to report a list of all open files and the processes that opened
them. Open files in the system include disk files, pipes, network sockets and devices opened by all processes. In this
case, we will use lsof to list network sockets.
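For this purpose, a common invocation is the -i option, which restricts the listing to network sockets (-n avoids resolving names):
$ lsof -i -n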
16.3 ping
The ping command is used to test the reachability of a host on an IP network. It also measures the round-trip time of messages sent from the originating host to the destination host. The command operates by sending an echo-request ICMP message to the target host and waiting for an echo-reply ICMP message as a response. The ping command also measures the time from transmission to reception (round-trip time) and records any packet loss. The results of the test are printed in the form of a statistical summary of the response packets received, including the minimum, maximum, and mean round-trip times, and sometimes the standard deviation of the mean. The ping command may be run using various options that enable special operational modes, such as specifying the ICMP message size, the time stamping options, the time-to-live parameter, etc. A couple of simple examples are the following:
$ ping 192.168.0.2
$ ping -c1 192.168.0.2
The first command line sends an echo-request ICMP message each second; you have to type CTRL+C to stop it. The second command line sends just a single echo-request and the command ends when the corresponding echo-reply is received or after a timeout.
16.4 netcat
As previously mentioned, with Bash we can open network sockets in client mode but we cannot open server sockets
(ports for listening). Fortunately, we have Netcat5 . This command is a very useful tool for networking because it can
be used to open raw TCP and UDP sockets in client and server modes. In its simplest use, Netcat works as a client:
$ nc hostname port
In this example, “hostname” is the name or IP address of the destination host and “port” is the port of the server process. The above command will try to establish a TCP connection to the specified host and port. The command will fail if there is no server listening on the specified port in the specified host.
We can use the -l (listening) option to make Netcat work as a server:
$ nc -l -p port
The previous command opens a socket on the specified port in server mode or for listening. In other words, with
the -l option the Netcat process is waiting for a client that establishes a connection. Once a client is connected, the
behavior of Netcat until it dies is as follows (see also Figure 16.5):
• Send to network. Netcat sends to the network the data received from stdin. This is performed by the Kernel
when Netcat performs a write system call with the corresponding sd.
• Receive from network. Netcat also writes to stdout the data received from the network. To do so, Netcat uses
a read system call with the corresponding sd.
Figure 16.5: Netcat connecting two hosts (192.168.0.2 and 192.168.0.1): each process sends to the network the data read from stdin and writes to stdout the data received from the network.
As a first example, we are going to use netcat to run a server and a client that establish a TCP socket on the
local host. To do this, type:
$ nc -l -p 12345
5 This command is really useful for management tasks related to network and it is available in most Unix-like platforms, Microsoft Windows,
MAC, etc.
The above command creates a socket for TCP listening on port 12345.
To view the state of connections, we can use the netstat command. To verify that there is a TCP socket (option -t) in listening mode (option -l) on port 12345, you can type the following command:
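A sketch of this check (adding -n to show numeric addresses and ports):
$ netstat -tln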
Using the following command, we can see that there is no established connection on port 12345:
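Presumably, by omitting the -l option, so that only established connections are listed:
$ netstat -tn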
Now, we are going to open another pseudo-terminal and run nc in client mode to establish a local TCP connection on port 12345:
$ nc localhost 12345
An equivalent command to the previous one is nc 127.0.0.1 12345. This is because "localhost" is the name for the address 127.0.0.1, which is the default IP address for the loopback interface. Now, from a third terminal, we can observe with netstat that the TCP connection is established:
We can also use lsof to see the file descriptor table of the Netcat processes. The nc client process (which in this
example has PID=4578) has the following file descriptor table:
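The listing can be obtained selecting the process with the -p option of lsof (the output itself is omitted here):
$ lsof -p 4578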
The nc server process (which in this example has PID=4577) has the following file descriptor table:
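Similarly, for the server process:
$ lsof -p 4577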
The main options that we will use with netcat (traditional version) are shown in Table 16.2.
A couple of interesting applications of netcat:
• Transfer files:
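A sketch (the file names and the IP address of the listening host are illustrative): on the receiving host, run nc in server mode redirecting stdout to a file; on the sending host, feed the file to an nc client:
$ nc -l -p 12345 > received_file
$ nc 192.168.0.2 12345 < file_to_send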
Table 16.2: Netcat Options
-h show help.
-l listening or server mode (waiting for incoming client connections).
-p port local port.
-u UDP mode.
-e cmd execute cmd after the client connects.
-v verbose debugging (more with -vv).
-q secs quit after EOF on stdin and delay of secs.
-w secs timeout for connects and final net reads.
• Remote shell (using the -e option, which executes a command when a client connects):
$ nc -l -p 12345 -e /bin/bash
We have to point out that the main limitation of Netcat is that when it runs as a server it cannot manage multiple connections. For this purpose, you need to use a specific server for the service you want to deploy or program your own server using a language that lets you use system calls (like C).
Finally, we have to mention that in the Ubuntu distro, the default Netcat application installed is from the netcat-openbsd package. An alternative netcat is available in the netcat-traditional package, which is the one that we are going to use. To install this version, type:
# apt-get install netcat-traditional
In fact, when you type nc or netcat these are symbolic links. After installing the netcat-traditional package,
you have to change these symbolic links to point to the traditional version:
# rm /etc/alternatives/nc /etc/alternatives/netcat
# ln -s /bin/nc.traditional /etc/alternatives/nc
# ln -s /bin/nc.traditional /etc/alternatives/netcat
exec 3<>/dev/tcp/192.168.0.1/11300
The previous command opens for read and write a client TCP socket connected with a remote server at IP address
192.168.0.1 and port 11300. As you see, we use exec in the same way we use it to open file descriptors of regular
files. Then, to write to the socket, we do the same as with any other file descriptor:
6 In fact, to have this capability, our Bash must be compiled with the --enable-net-redirections flag.
echo "Write this to the socket" >&3
cat <&3
Chapter 17
Protocol Analyzer
Chapter 18
Protocol Analyzer
Once installed, you can run the protocol analyzer from a terminal by typing wireshark. However, if you run Wireshark with your unprivileged user and you try to click on the “Interface list” (see Figure 18.1), you will get the following error message: “there are no interfaces on which a capture can be done”. This is because Unix-like systems are not designed for accessing network interfaces directly but through the socket interface (as we have already seen). Unprivileged users can open sockets (ports above 1024) but they cannot capture raw network traffic. For this purpose, you have to be root, so you will need to type (or log in as root):
$ sudo wireshark
After you execute the previous command successfully, you will get the initial Wireshark screen (Figure 18.1).
As you may observe, now you have available the list of all the interfaces of the system for capturing. Then, you can just click on one of the network interfaces to start capturing packets.
Figure 18.1: Initial Wireshark Screen.
destination link address that is not the one of our network interface. Finally, notice that you can select a “Capture Filter”.
Let’s see some examples. For instance, to capture only traffic to or from IP address 172.18.5.4, you can type the
following capture filter:
host 172.18.5.4
To capture traffic to or from a range of IP addresses, you can type the following capture filter (both are equivalent):
net 192.168.0.0/24
net 192.168.0.0 mask 255.255.255.0
To capture traffic from a range of IP addresses, you can type the following capture filter (both are equivalent):
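Presumably, using the src qualifier of the pcap filter syntax:
src net 192.168.0.0/24
src net 192.168.0.0 mask 255.255.255.0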
To capture traffic to a range of IP addresses, you can type the following capture filter (both are equivalent):
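Presumably, using the dst qualifier:
dst net 192.168.0.0/24
dst net 192.168.0.0 mask 255.255.255.0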
To capture only HTTP (port 80) traffic, you can type the following capture filter:
2 The capture filter syntax is the same as the one used by programs based on the libpcap (Linux) or WinPcap (Windows) library, like the famous command tcpdump.
Figure 18.2: Captured Packets.
port 80
To capture non-HTTP and non-SSH traffic on 192.168.0.1, you can type the following capture filter (both are
equivalent):
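A sketch, assuming HTTP on port 80 and SSH on port 22:
host 192.168.0.1 and not (port 80 or port 22)
host 192.168.0.1 and not port 80 and not port 22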
To capture all traffic except ICMP and HTTP traffic, you can type the following capture filter:
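A sketch, again assuming HTTP on its default port:
not icmp and not port 80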
To capture traffic within a range of ports, for example TCP ports between 2001 and 2500, you can type the
following capture filter:
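A sketch using the portrange primitive:
tcp portrange 2001-2500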
To capture packets with source IP address 10.4.1.12 or source network 10.6.0.0/16 and having destination TCP
port range from 2001 to 2500 and destination IP network 10.0.0.0/8, you can type the following capture filter:
(src host 10.4.1.12 or src net 10.6.0.0/16) and tcp dst portrange 2001-2500 and dst
net 10.0.0.0/8
tcp.port eq 80 or icmp
To display only traffic between workstations in the LAN 192.168.0.0/16, you can type the following display filter:
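A sketch using the display filter syntax (note that display filters differ from capture filters):
ip.src==192.168.0.0/16 and ip.dst==192.168.0.0/16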
To match HTTP requests where the last characters in the URL/URI are the characters “html”, you can type the
following display filter:
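A sketch using the matches operator:
http.request.uri matches "html$"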
Note: The $ character is a regular expression anchor that matches the end of a string, in this case the end of the http.request.uri field.
Figure 18.4: Follow TCP stream.
Chapter 19
Chapter 20
$ telnet
telnet>open 192.168.0.1
$ telnet 192.168.0.1 23
Once the connection is established, TELNET provides a bidirectional interactive text-oriented communication and the commands you type locally are executed remotely in the host at the other end of the TCP/IP network. If at any moment you need to return to the telnet sub-shell, you can type “CTRL+ALTGR+]”.
• User data is interspersed in-band with TELNET control information in an 8-bit oriented data connection over the Transmission Control Protocol (TCP).
• All data octets are transmitted over the TCP transport without altering them, except the value 255.
• Line endings may also suffer some alterations. The standard says: “The sequence CR LF must be treated as a single new line character and used whenever their combined action is intended; the sequence CR NULL must be used where a carriage return alone is actually desired and the CR character must be avoided in other contexts”. So, to achieve a bare carriage return, the CR character (ASCII 13) must be followed by a NULL character (ASCII 0); hence, if a client finds a CR character alone, it will replace this character by CR NULL.
Note. Many times, telnet clients are used to establish interactive raw TCP connections. Although most of the time there will be no problem with this, it must be taken into account that telnet clients apply the previous rules and thus they might alter some small part of the data stream. If you want a pure raw TCP connection, you should use the netcat tool. Finally, we must point out that because of security issues with telnet, its use for the purpose of having a remote terminal has waned in favor of SSH (which is discussed later).
• In active mode, the client sends the server the IP address and port number on which the client will listen, and
the server initiates the TCP connection. In situations where the client is behind a firewall and unable to accept
incoming TCP connections, passive mode may be used.
• In passive mode, the client sends a PASV command to the server and receives an IP address and port number
in return. The client uses these to open the data connection to the server.
• ASCII mode: used for text. Data is converted, if needed, from the sending host's character representation to “8-bit ASCII” before transmission, which uses line endings of type CR+LF.
Note. Some ftp clients change the representation of the received file to the receiving host's line ending. In this case, this mode can cause problems when transferring files that contain data other than plain text.
• Binary mode (also called image mode): the sending machine sends each file byte for byte, and the recipient
stores the byte-stream as it receives it.
• EBCDIC mode: used for plain text between hosts using the EBCDIC character set. This mode is otherwise like ASCII mode.
• Local mode: Allows two computers with identical setups to send data in a proprietary format without the need
to convert it to ASCII.
20.2.4 Data transfer modes
Data transfer can be done in any of three modes:
• Stream mode: Data is sent as a continuous stream, relieving FTP from doing any processing. Rather, all processing is left up to TCP. No end-of-file indicator is needed, unless the data is divided into records.
• Block mode: FTP breaks the data into several blocks (block header, byte count, and data field) and then passes them on to TCP.
• Compressed mode: Data is compressed using a single algorithm (usually run-length encoding).
$ ftp name
$ ftp 192.168.0.1
$ ftp [email protected]
To establish an FTP session you must know the ftp username and password. In case the session is anonymous, typically you can use the word “anonymous” as both username and password. When you enter your login name and password, it returns the prompt “ftp>”. This is a sub-shell in which you can type several subcommands. A summary of these subcommands is shown in Table 20.1.
On the other hand, to use the ftp client in passive mode (active mode is the default) you can type either ftp -p or pftp. Finally, it is worth mentioning that you can also use FTP through a browser (and also with several available graphical applications). Browsers such as Firefox allow typing the following in the URL bar:
ftp://ftp.upc.edu
ftp://[email protected]
ftp://ftpusername:[email protected]
20.3 Super Servers
20.3.1 What is a super server?
The inetd server is sometimes referred to as the Internet "super-server" or "super-daemon" because inetd can
manage connections for several services. When a connection is received by inetd, it determines which program
the connection is destined for, spawns the particular process and delegates the socket to it. For services that are not
expected to run with high loads, this method uses memory more efficiently, when compared to running each daemon
individually in stand-alone mode since the specific servers run only when needed. For protocols that have frequent
traffic, such as HTTP, a dedicated or stand-alone server that intercepts the traffic directly may be preferable.
So, when a TCP packet or UDP packet arrives with a particular destination port number, inetd launches the
appropriate server program to handle the connection. Furthermore, no network code is required in the application-
specific daemons, as inetd hooks the sockets directly to stdin, stdout and stderr of the spawned process. In other
words, the application-specific daemon is invoked with the service socket as its standard input, output and error
descriptors.
20.3.2 Configuration
The inetd superdaemon is configured using the file /etc/inetd.conf. Each line in /etc/inetd.conf contains the follow-
ing fields:
service-name socket-type protocol {wait|nowait} user server-program [server-program-arguments]
For example, the configuration line for the telnet service is something similar to:
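A typical line (the exact user field and server arguments may vary between distributions) is:
telnet stream tcp nowait root /usr/sbin/in.telnetd in.telnetd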
The previous configuration line tells inetd to launch the program /usr/sbin/in.telnetd for the telnet service using TCP.
Notes:
• Lines in /etc/inetd.conf starting with # are comments, i.e. inactive services.
• Be careful not to leave any space at the beginning of inetd configuration lines.
Service names and ports are mapped in the configuration file /etc/services. You can check the default port for the
telnet service typing:
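On most systems the relevant line can be found with grep; the output will be similar to:
$ grep -w telnet /etc/services
telnet 23/tcp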
The previous result tells us that the telnet service uses TCP port number 23.
Finally, if you change the configuration file of inetd, you have to restart it to apply the changes. To do so, you
can use one of the following commands:
# /etc/init.d/openbsd-inetd reload
# killall -HUP inetd
20.3.3 Notes*
Generally inetd handles TCP sockets by spawning a separate application-specific server to handle each connection
concurrently. UDP sockets are generally handled by a single application-specific server instance that handles all
packets on that port. Finally, some simple services, such as ECHO, are handled directly by inetd, without spawning
an external application-specific server.
You can also run a telnet daemon standalone. For this purpose, you can use in.telnetd in debug mode typing:
in.telnetd -debug <port number> &
When a daemon is started stand-alone it must open the port for listening, while when the daemon is started through
inetd, it just uses stdin, stdout and stderr, which are inherited from inetd.
20.3.4 Replacements*
In recent years, several new super-servers have appeared that can be used to replace inetd. One of the most widespread is xinetd (eXtended InterNET Daemon). The xinetd super-daemon offers a more secure extension to
or version of inetd. This super-server features access control mechanisms, extensive logging capabilities, and the
ability to make services available based on time. It can place limits on the number of servers that the system can start,
and has deployable defense mechanisms to protect against port scanners, among other things. The global configuration
of xinetd can be usually found in the file /etc/xinetd.conf, while each application-specific configuration file can be
found in the /etc/xinetd.d/ directory.
20.5 Practices
Exercise 20.1– Start the scenario basic-netapps on your host platform by typing the following command:
host$ simctl basic-netapps start
This scenario has two virtual machines virt1 and virt2. Each virtual machine has two consoles (0 and 1)
enabled. E.g. to get the console 0 of virt1 type:
host$ simctl basic-netapps get virt1 0
1. Open the console 0 in virt1, which we call virt1.0 and also open virt2.0.
2. Figure out which is the port number of the service daytime using the configuration file /etc/services of
virt1.
3. In a Windows-like OS, what port number would you expect for the service daytime?
4. List the services that are active in virt1 under inetd.
5. Annotate the MAC and IP addresses of each interface on virt1 and virt2.
6. Find information and explain what the loopback (lo) interface is and what it can be used for. Why does lo not have a MAC address?
7. Assign the IP addresses 192.168.0.1 and 192.168.0.2 to the ethernet interfaces of virt1 and virt2, respectively.
8. Send 3 ICMP echo-requests from virt2 to virt1 with the ping command.
9. Restore the original IP addresses of each ethernet interface of each machine.
Exercise 20.2– Using the scenario basic-netapps, we are going to analyze the TELNET service.
Use the follow TCP stream option of wireshark and comment on the differences between SSH and TELNET regarding the ports used and security.
Exercise 20.3– Using the scenario basic-netapps, we are going to use the Web.
1. Start the apache2 Web server on virt1 using the console virt1.0. To do so type the following command:
Do you see a configuration line in inetd for apache2? Why? Find out the PID of the process that is listening on port 80 in virt1. Is this process inetd? Describe how you do it.
2. Edit the file /var/www/index.html with vi in virt1 and change some of the text (respecting the HTML tags).
Using a console in virt2 and the lynx command, which is a console (text-based) web browser, display the
web page of virt1.
3. Start a new capture of tap0 with wireshark in the host platform. Give the IP address 10.1.1.3 to the tap0 interface and open Firefox to see the web page served in virt1. Use the follow TCP stream option of wireshark and roughly comment on the protocols that you see.
Exercise 20.4– Using the scenario basic-netapps, we are going to use and analyze the tool netcat.
1. Start a new capture of tap0 with wireshark in the host platform and try the following command:
virt1.0$ nc -l -p 12345
Describe what the previous command does and also check the ports and open files related to the netcat
process.
2. Now try:
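Presumably this is a client connection to the previous listener from the other machine, along the lines of:
virt2.0$ nc <IP-of-virt1> 12345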
Describe the ports and the open files of each netcat process. Send some text from each machine and terminate the connection. Describe also the capture and the behavior of the previous commands.
3. Create and test a command line with netcat that transfers the file /etc/services. Use the port 23456. Run the
command on virt1. From virt2 try to connect to the service with the commands nc and telnet.
4. Create and test a command line with netcat that emulates the daytime service using the port 12345 (hint: use
the date command). Run the command on virt1. From virt2 try to connect to the service with nc.
Exercise 20.5– Using the scenario basic-netapps, we are going to create a service under inetd.
1. Create a small script with netcat that, listening on port 5555, gives us the amount of free disk space of the host (hint: use the df command). Explain the configuration, your tests, and explain in as much detail as possible the
behavior of your network service: ports used, relationship between processes, filedescriptors, general behavior
of the service etc.
2. Implement the same service using inetd (without using netcat). Explain the configuration in the system,
the differences with respect to the netcat implementation and explain all details possible too.
Exercise 20.6– Using the scenario basic-netapps, we are going to analyze the FTP service.
1. You must allow the ftp access for the root user in virt1. To do so, you have to modify the configuration file
/etc/ftpusers by removing or commenting (using the # symbol at the beginning of the line) the line for the root
user.
2. Start a new capture of tap0 with wireshark in the host platform. Then, establish an FTP session from virt2
to virt1 using the root user and the console virt2.0. Use the follow TCP stream option of wireshark and
comment the protocol.
3. Using the console virt1.0 allow the ftp access to the root user. This is done by commenting the line that contains
root in the file /etc/ftpusers.
4. Start a new capture of tap0 with wireshark in the host platform. Establish an FTP session from virt2
to virt1 using the root user and the console virt2.0. Using the console virt2.1 check the ports and the file
descriptors used in virt2. Using the console virt1.0 check the ports and the file descriptors used in virt1.
5. Get all the files in /usr/bin that start with “z” and exit. Which is the default data representation for these
transmissions?
6. Use the follow TCP stream option of wireshark to comment the FTP dialogue that you see with port 21 of
the server.
7. Figure out also how the data (files) are transferred and which ports are used.
8. Look at the files you have downloaded in virt2. Check the permissions. Are these permissions the same as in
the server? When you finish, remove these files in the client.
9. (*) Now we are going to see the differences between the binary and ascii data representations. Download the files /tmp/text-lf.txt and /tmp/text-crlf.txt in binary mode (default). In the client, rename them respectively as text-lf-binary.txt and text-crlf-binary.txt.
10. (*) Now, download the same files in ascii mode. In the client, rename them respectively as text-lf-ascii.txt and text-crlf-ascii.txt.
11. (*) Use the commands hexdump and file in the client and the server to analyze these files. Explain your
conclusions and explain if our ftp client changes the line endings of the original files.
12. (*) Analyze with wireshark the number of bytes sent in each transmission. When transmitting the file text-lf.txt, why are 11 bytes sent in binary mode and 13 bytes in ascii mode?
13. (*) Finally, try the passive mode and transfer any file you like. Explain how you do it and analyze and discuss the differences with respect to active mode.
Part IV
Linux Advanced
Chapter 21
Shell Scripts
Chapter 22
Shell Scripts
22.1 Introduction
A shell is not just an interface for executing commands or applications; it is much more than this. Shells provide us with a powerful programming language for creating new commands called "shellscripts" or simply "scripts". A script serves as "glue" for making several commands or applications work together. Most Unix commands follow the philosophy "Keep It Simple, Stupid!", which means that commands should be as simple as possible and address specific tasks, and then use the power of scripting to perform more complex tasks. Shellscripts were introduced briefly in Section 4.4. This section, by means of examples, makes a more detailed introduction to the possibilities that scripts offer.
22.2 Quoting
One of the main uses of quoting is when defining variables. Let's define a variable called MYVAR whose value is the word "Hello". Example:
$ MYVAR=Hello
Notice that there are no spaces between the equal sign "=" and the value of the variable "Hello". Now, if we want to define a variable with a value that contains spaces or tabs, we need to use quotes. Example:
$ MYVAR='Hello World'
Single quotes mean that the complete string 'Hello World' (including spaces) must be assigned to the variable.
Then, a “Hello World” script could be:
#!/bin/bash
MYVAR='Hello world'
echo $MYVAR
The dollar sign “$” before a variable name means that we want to use its value. Now, we want to use the value of
MYVAR to define another variable, say MYVAR2. For this purpose, let’s try the following script:
#!/bin/bash
MYVAR='Hello world'
MYVAR2='$MYVAR, How are you?'
echo $MYVAR2
1 In general, you can use many combinations of letters, numbers and other signs to define variables, but by convention we will use capital letters.
Table 22.1: Types of quotes
Quotes Meaning
' simple The text between single quotes is treated as a literal (without modifications). In bash, we say that it is not expanded.
" double The text between double quotes is treated as a literal except for the characters \, ` and $.
` reverse The text between reverse quotes is interpreted as a command, which is executed and whose output is used as value. In bash,
this is known as command expansion.
If you execute the previous script, you will obtain as output: $MYVAR, How are you?, which is not the
expected result. To be able to use the value of a variable and also quoting (to be able to assign a value with spaces)
you have to use double quotes. In general, there are three types of quotes as shown in Table 22.1. Now, if you try the
following script you will obtain the desired result:
#!/bin/bash
MYVAR='Hello world'
echo "$MYVAR, how are you?"
Finally, the third type of quoting is reverse quotes. These quotes cause the quoted text to be interpreted as a shell command. Then, the command is expanded to its output. Example:
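A minimal illustration (the output will be the current date on your system):
$ echo "Today is `date`"
Today is Mon Oct 24 10:00:00 CEST 2011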
On the other hand, we must point out that when you expand a variable, you may have problems if it is immediately
followed by text. In this case, you have to use braces {}. Let’s see an example:
$ MYVAR='Hello world'
$ echo "foo$MYVARbar"
foo
We were expecting the output “fooHello worldbar”, but we did not obtain this result. This is because the variable
expansion of bash was confused. Bash could not tell if we wanted to expand $M, $MY, $MYVAR, $MYVARbar, etc.
We have to use braces to resolve the ambiguity:
$ echo foo${MYVAR}bar
fooHello worldbar
22.3 Positional Parameters
As shown in the example above, positional parameters are considered to be separated by spaces. You have to use
single or double quotes if you want to specify a single positional parameter assigned to a text string that contains
multiple words (i.e. a text containing spaces or tabs). For example, create the following script:
#!/bin/bash
echo Script name is "$0"
echo First positional parameter $1
echo Second positional parameter $2
echo Third positional parameter $3
echo The number of positional parameters is $#
echo The PID of the script is $$
As you can observe, there are parameters that have a special meaning: $# expands to the number of positional
parameters, $$ expands to the PID of the bash that is executing the script, and $@ expands to all the positional
parameters.
22.4 Expansions
Before executing your commands, bash checks whether there are any syntax elements in the command line that should
be interpreted rather than taken literally. After splitting the command line into tokens (words), bash scans for these
special elements and interprets them, resulting in a changed command line: the elements are said to be expanded or substituted into new text and maybe new tokens (words). We have several types of expansions. In processing order they
are:
• Brace Expansion: create multiple text combinations.
Syntax: {X,Y,Z} {X..Y} {X..Y..Z}
• Tilde Expansion: expand useful pathnames: home dir, working dir and previous working dir.
Syntax: ~ ~+ ~-
• Parameter Expansion: how bash expands variables to their values.
Syntax: $PARAM ${PARAM...}
• Command Substitution: using the output of a command as an argument.
Syntax: $(COMMAND) `COMMAND`
• Arithmetic Expansion: how to use arithmetics.
Syntax: $((EXPRESSION)) $[EXPRESSION]
• Process Substitution: a way to write and read to and from a command.
Syntax: <(COMMAND) >(COMMAND)
• Filename Expansion: a shorthand for specifying filenames matching patterns.
Syntax: *.txt page_1?.html
22.4.1 Brace Expansion
Brace expansions are used to generate all possible combinations with the optional surrounding preambles and postscripts.
The general syntax is: [preamble]{X,Y[,...]}[postscript] Examples:
$ echo a{b,c,d}e
abe ace ade
$ echo "a"{b,c,d}"e"
abe ace ade
$ echo "a{b,c,d}e" # No expansion because of the double quotes
a{b,c,d}e
$ mkdir $HOME/{bin,lib,doc} # Create $HOME/bin, $HOME/lib and $HOME/doc
There are also a couple of alternative syntaxes for brace expansions using two dots:
$ echo {5..12}
5 6 7 8 9 10 11 12
$ echo {c..k}
c d e f g h i j k
$ echo {1..10..2}
1 3 5 7 9
22.4.2 Tilde Expansion
The tilde expands to useful pathnames: ~ is the home directory, ~+ the working directory and ~- the previous working directory. Example:
$ cd /
/$ cd /usr
/usr$ cd ~-
/$ cd ~
~$ cd /etc
/etc$ echo ~+
/etc
22.4.3 Parameter Expansion
${VAR:-string} If the parameter VAR is not assigned, the expansion results in 'string'. Otherwise, the expansion returns the value of VAR. For example:
$ VAR1='Variable VAR1 defined'
$ echo ${VAR1:-Variable not defined}
Variable VAR1 defined
$ echo ${VAR2:-Variable not defined}
Variable not defined
A typical use is to provide a default value for a positional parameter:
USER=${1:-joe}
${VAR:=string} If the parameter VAR is not assigned, VAR is assigned to ’string’ and the expansion returns the
assigned value. Note. Positional and special parameters cannot be assigned this way.
22.4.4 Command Substitution
Command substitution replaces a command with its output. Examples:
$ MYVAR=`dirname /usr/local/share/doc/foo/foo.txt`
$ echo $MYVAR
/usr/local/share/doc/foo
$ echo $(basename /usr/local/share/doc/foo/foo.txt)
foo.txt
We also can use command substitution together with parameter expansion. Example:
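For example, a default value built with a command substitution (assuming the working directory is /home/student):
$ echo ${MYDIR:-$(pwd)/default.txt}
/home/student/default.txt
22.4.5 Arithmetic Expansion
With the syntax $((EXPRESSION)) (or the older $[EXPRESSION]), bash evaluates integer arithmetic directly, without creating any new process. Example:
$ X=$((3 * 2 + 7))
$ echo $X
13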
We can also use the external command expr to do arithmetic operations. However, if you use expr, your script will create a new process, making the processing of arithmetics less efficient. Example:
$ X=`expr 3 \* 2 + 7`
$ echo $X
13
Note. Bash cannot handle floating point calculations, and it lacks operators for certain important mathematical
functions. For this purpose you can use bc.
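For example, a floating point division with two decimal places can be computed as:
$ echo "scale=2; 7/3" | bc
2.33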
22.4.6 Process Substitution
Process substitution is explained in Section 8.7.
22.4.7 Filename Expansion
Filename expansion (globbing) matches filenames against patterns. Examples:
$ ls *.txt
file.txt doc.txt tutorial.txt
$ ls [ft]*
file.txt tutorial.txt
$ ls [a-h]*.txt
file.txt doc.txt
$ ls *.??
script.sh file.id
22.5.1 If
The clause if is the most basic form of conditional. The syntax is:
if expression; then statement1; else statement2; fi
Where statement1 is only executed if expression evaluates to true and statement2 is only executed if expression
evaluates to false. Examples:
if [ -e /etc/file.txt ]
then
echo "/etc/file.txt exists"
else
echo "/etc/file.txt does not exist"
fi
In this case, the expression uses the option -e file, which evaluates to true only if "file" exists. Be careful: you must leave spaces between "[" and "]" and the expression inside. With the symbol "!" you can do inverse logic.
Example:
if [ ! -e /etc/file.txt ]
then
echo "/etc/file.txt does not exist"
else
echo "/etc/file.txt exists"
fi
We can also create expressions that always evaluate to true or false. Examples:
if true
then
echo "this will always be printed"
else
echo "this will never be printed"
fi
if false
then
echo "this will never be printed"
else
echo "this will always be printed"
fi
String comparison: strings are compared with = (equal) and != (not equal); in addition, -z STRING is true if the string is empty and -n STRING if it is not.
Arithmetic operators: integers are compared with -eq (equal), -ne (not equal), -gt (greater), -ge (greater or equal), -lt (less) and -le (less or equal).
The syntax ((...)) for arithmetic expansion can also be used in conditional expressions. The syntax ((...)) supports the following relational operators: ==, !=, >, <, >= and <=. Example:
if ((VAR == Y * 3 + X * 2))
then
echo "The variable VAR is equal to Y * 3 + X * 2"
fi
We can also use conditional expressions with the OR or with the AND of two conditions:
[ -e filename -o -d filename ]
[ -e filename -a -d filename ]
Finally, it is worth mentioning that in general it is a good practice to quote variables inside your conditional expressions. If you don't quote your variables, you might have problems with spaces and tabs. Example:
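A sketch of the kind of unquoted test meant here:
if [ $VAR = 'foo bar oni' ]
then
    echo "match"
else
    echo "not match"
fi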
In the previous example the expression might not behave as you expect. If the value of VAR is "foo", we will see the output "not match", but if the value of VAR is "foo bar oni", bash will report an error saying "too many arguments". The problem is that the spaces present in the value of VAR confused bash. In this case, the correct comparison is: if [ "$VAR" = 'foo bar oni' ]. Recall that you have to use double quotes to use the value of a variable (i.e. to allow parameter expansion).
22.5.2 Exit status
The special variable $? is a shell built-in variable that contains the exit status of the last executed command. If
we are executing a script, $? returns the exit status of the last executed command in the script or the number after the
keyword exit. Next, we show an example:
command
if [ "$?" -ne 0]
then
echo "the previous command failed"
exit 1
fi
The clause exit allows us to specify the exit status of the script. In the previous example we reported failure
because one is greater than zero. We can also replace the previous code by:
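One compact equivalent (a sketch) is to chain on the return code directly:
command || { echo "the previous command failed" ; exit 1 ; }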
Finally, we can also use the return code of conditions. Conditions have a return code of 0 if the condition is true or 1 if the condition is false. Using this, we can get rid of the keyword if in some conditionals. Example:
$ [ -e filename ] && echo "filename exists" || echo "filename does not exist"
The first part of the previous command-line evaluates the condition and returns 0 if the condition is true or 1 if the
condition is false. Based on this return code, the echo is executed.
22.5.3 for
A for loop is a bash programming language statement which allows code to be repeatedly executed. A for loop is
classified as an iteration statement, i.e. it is the repetition of a process within a bash script. The syntax is:
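for VARIABLE in item1 item2 ... itemK
do
    command1
    ...
    commandN
done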
The previous for loop executes a set of N commands K times. You can also use the values (item1, item2, etc.) that your control VARIABLE takes in the execution of the block of commands. Example:
#!/bin/bash
for X in one two three four
do
echo number $X
done
Script's output:
number one
number two
number three
number four
In fact, the loop accepts any list after the keyword "in", including listings of the file system. Example:
#!/bin/bash
for FILE in /etc/r*
do
if [ -d "$FILE" ]
then
echo "$FILE (dir)"
else
echo "$FILE"
fi
done
Script's output:
/etc/rc.d (dir)
/etc/resolv.conf
/etc/rpc
Furthermore, we can use filename expansion to create the list of files/folders. Example:
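A minimal sketch (the pattern *.txt is illustrative):
#!/bin/bash
for FILE in *.txt
do
    echo "Found: $FILE"
done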
In the previous example the expansion is relative to the script location. We can make loops with the positional
parameters of the script as follows:
#!/bin/bash
for THING in "$@"
do
echo You have typed: ${THING}.
done
With the command seq or with the syntax (()) we can generate C-styled loops:
#!/bin/bash
for i in `seq 1 10`;
do
echo $i
done
#!/bin/bash
for (( i=1; i < 10; i++));
do
echo $i
done
Remark. Bash scripts are not compiled but interpreted by bash. This impacts their performance, which is poorer than that of compiled programs. If you need intensive usage of loops, you should consider using a compiled program (for example, written in C).
22.5.4 while
A while loop is another bash programming statement which allows code to be repeatedly executed. Loops with while execute a code block while an expression is true. For example:
#!/bin/bash
X=0
while [ $X -le 20 ]
do
echo $X
X=$((X+1))
done
The loop is executed while the variable X is less than or equal to (-le) 20. We can also create infinite loops, for example:
#!/bin/bash
while true
do
sleep 5
echo "Hello I waked up"
done
22.5.5 case
A case construction is a bash programming language statement which is used to test a variable against a set of patterns. Often a case statement lets you express a series of if-then-else statements that check a single variable for various conditions or ranges in a more concise way. The generic syntax of case is the following:
case VARIABLE in
pattern1)
1st block of code ;;
pattern2)
2nd block of code ;;
...
esac
A pattern can actually be formed of several subpatterns separated by pipe character "|". If the VARIABLE matches
one of the patterns (or subpatterns), its corresponding code block is executed. The patterns are checked in order until
a match is found; if none is found, nothing happens.
For example:
#!/bin/bash
for FILE in $*; do
case $FILE in
*.jpg | *.jpeg | *.JPG | *.JPEG)
echo The file $FILE seems to be a JPG file.
;;
*.avi | *.AVI)
echo "The filename $FILE has an AVI extension"
;;
-h)
echo "Use as: $0 [list of filenames]"
echo "Type $0 -h for help" ;;
*)
echo "Using the extension, I don’t now which type of file is $FILE."
echo "Use as: $0 [list of filenames]"
echo "Type $0 -h for help" ;;
esac
done
The final pattern is *, which is a catchall for whatever didn’t match the other cases.
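22.6 printf
The printf builtin provides formatted output, in the style of the C printf function. A first, minimal use (note that the prompt reappears right after the text):
$ printf "hello printf"
hello printf$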
As you can see, there is a different behavior in comparison to the echo command: no new line has been printed, as it would be when using the default settings of echo. To print a new line we need to supply printf with a format string containing the escape sequence \n (new line):
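$ printf "hello printf\n"
hello printf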
or
$ printf "%s\n" "hello printf" "in" "bash script"
hello printf
in
bash script
As you could observe in the previous examples, we have used %s as a format specifier. The specifier %s means to print all arguments in literal form. The specifiers are replaced by their corresponding arguments. Example:
$ printf "%s\t%s\n" "1" "2 3" "4" "5"
1 2 3
4 5
The %b specifier is essentially the same as %s but it allows us to interpret escape sequences with an argument.
Example:
$ printf "%s\n" "1" "2" "\n3"
1
2
\n3
$ printf "%b\n" "1" "2" "\n3"
1
2
3
$
To print integers, we can use the %d specifier:
$ printf "%d\n" 255 0xff 0377 3.5
255
255
255
bash: printf: 3.5: invalid number
3
As you can see, the %d specifier refuses to print anything other than integers. To print floating point numbers use %f:
$ printf "%f\n" 255 0xff 0377 3.5
255.000000
255.000000
377.000000
3.500000
The default behavior of the %f printf specifier is to print floating point numbers with 6 decimal places. To limit the decimal places to 1, we can specify a precision in the following manner:
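$ printf "%.1f\n" 3.14159
3.1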
You can also print ASCII characters using their hex or octal notation:
$ printf "\x41\n"
A
$ printf "\101\n"
A
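22.7 Functions
22.7.1 Using functions
Functions group a block of commands under a name that can be called like a command. A minimal sketch of the script discussed below (assuming unzip -l to list the contents of a zip file; the original listing follows this shape):
#!/bin/bash
# file: myscript.sh
zip_contents() {
    echo "Contents of $1: "
    unzip -l $1
}
zip_contents $1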
In the previous script, we defined the function zip_contents(). This function accepts one argument: the
filename of a zip file and shows its content. Observing the code, you can see that in the script we use the first
positional parameter received in the command-line as the first argument for the function. To test the script, try:
$ zip -r etc.zip /etc
$ ./myscript.sh etc.zip
The other special variables have also a meaning related to the function. For example, after the execution of
a function, the variable $? returns the exit status of the last command executed in the function or the value after
the keyword return or exit. The difference between return and exit is that the former does not finish the
execution of the script, while the latter exits the script.
22.7.2 Variables
Unlike many other programming languages, by default bash does not segregate its variables by type. Essentially,
bash variables are character strings, but, depending on context, bash permits arithmetic operations and comparisons
on variables. The determining factor is whether the value of a variable contains only digits or not. This design is
very flexible for scripting, but it can also cause difficult-to-debug errors. Anyway, you can explicitly define a variable as numerical using the bash built-in let or, equivalently, "declare -i VAR". The built-in declare can be used also to declare a variable as read-only (declare -r VAR). On the other hand, we will use the convention of
defining the name of the variables in capital letters to distinguish them easily from commands and functions. Finally,
it is important to know that shell variables have one of the following scopes: local variables, environment variables,
position parameters and special variables.
Local Variables
Simply stated, local variables are those variables used within a script, but there are subtleties about local variables that you must be aware of. In most compiled languages like C, when you create a variable inside a function, the variable is placed in a separate namespace with respect to the general program.
Let us consider that you write a program in C in which you define a function called my_function, you define a variable inside this function called 'X', and finally, you define another variable outside the function also called 'X'.
In this case, these two variables have different local scopes so if you modify the variable ’X’ inside the function, the
variable ’X’ defined outside the function is not altered. While this is the default behavior of most compiled languages,
it is not valid for bash scripts.
In a bash script, when you create a variable inside a function, it is added to the script’s namespace. This means that
if we set a variable inside a function with the same name as a variable defined outside the function, we will override the
value of the variable. Furthermore, a variable defined inside a function will still exist after the execution of the function.
Let’s illustrate this issue with an example:
#!/bin/bash
VAR=hello
my_function(){
VAR="one two three"
for X in $VAR
do
echo -n $X
done
}
my_function
echo $VAR
When you run this script the output is: 'onetwothreeone two three'. If you really want to declare VAR as a local variable, whose scope is only its related function, you have to use the keyword local, which defines a local namespace for the function. The following example illustrates this issue:
#!/bin/bash
VAR=hello
my_function(){
local VAR="one two three"
for X in $VAR
do
echo -n $X
done
}
my_function
echo $VAR
In this case, the output is: 'onetwothreehello', so the global variable VAR is not affected by the execution
of the function.
Environment variables
Any process in the system has a set of variables that defines its execution context. These variables are called environment
or global variables. Shells are not an exception. Each shell instance has its own environment variables. If you open
a terminal, you can add environment variables to the associated shell in the same manner you define variables in your
scripts. Example:
$ ps
PID TTY TIME CMD
5785 pts/2 00:00:00 bash
5937 pts/2 00:00:00 ps
$ MY_VAR=hello
$ echo $MY_VAR
hello
In the previous example, we added MY_VAR as a new environment variable for the bash instance identified by
PID 5785. However, if you create a script that uses MY_VAR and execute it from the previous terminal, you will see
that the variable is not declared. This is because when you execute a shellscript from a terminal or a pseudo-terminal,
by default, the associated bash creates a child bash, which is the one that really executes the commands of the script. For
this execution, the child bash only inherits the “exported context”. The problem is that variables (or functions) are
not exported by default. To export a variable (or a function) you have to use the keyword export or alternatively
declare -x:
• Export a variable. Syntax: export variable or declare -x variable.
• Export a function. Syntax: export -f function_name.
• View context. Syntax: declare.
• View exported context. Syntax: export or declare -x.
For example, create a script called env-script.sh as follows:
#!/bin/bash
# file: env-script.sh
echo $MY_VAR
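If we now run the script with and without exporting MY_VAR, we can observe the difference (a sketch; the first execution prints an empty line because MY_VAR is not part of the exported context):
$ MY_VAR=hello
$ ./env-script.sh

$ export MY_VAR
$ ./env-script.sh
hello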
Table 22.4: Typical environment variables
Variable Meaning
SHELL=/bin/bash The shell that you are using.
HOME=/home/student Your home directory.
LOGNAME=student Your login name.
OSTYPE=linux-gnu Your operating system.
PATH=/usr/bin:/sbin:/bin Directories where bash will try to locate executables.
PWD=/home/student/documents Your current directory.
USER=student Your current username.
PPID=45678 Your parent PID.
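To see the difference between executing a script in a child bash and sourcing it, we can use a script like the following (a sketch named pids.sh, consistent with the outputs shown below; pstree displays the process tree):
#!/bin/bash
# file: pids.sh
echo "PID of our parent process $PPID"
echo "PID of our process $$"
echo "Showing the process tree from PID=$PPID:"
pstree $PPID
First, we check the PID of our interactive bash: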
$ echo $$
2119
$ ./pids.sh
PID of our parent process 2119
PID of our process 2225
Showing the process tree from PID=2119:
bash---bash---pstree
$ source ./script.sh
PID of our parent process 2114
PID of our process 2119
Showing the process tree from PID=2114:
gnome-terminal---bash---pstree
As you observe, when pids.sh is executed with source, the bash does not create a child bash but executes the commands itself. An alternative syntax for source is a single dot, so the two following command lines are equivalent:
$ source ./pids.sh
$ . ./pids.sh
Now, we must explain the main utility of executing scripts with source. "Sourcing" is a way of including variables and functions in a script from the file of another script. This is a way of creating an environment but without having to "export" every variable or function. Using the example of the function zip_contents() of
Section 22.7.1, we create a file called my_zip_lib.sh. This file will be our “library” and it will contain the function
zip_contents() and the definition of a couple of variables:
#!/bin/bash
# file my_zip_lib.sh
OPTION="-l"
VAR="another variable..."
zip_contents() {
echo "Contents of $1: "
unzip $OPTION $1
}
Now, we can use ’source’ to make the function and the variables defined in this file available to our script.
#!/bin/bash
# script.sh
source my_zip_lib.sh
echo The variable OPTION sourced from my_zip_lib.sh is: $OPTION
echo Introduce the name of the zip file:
read FILE
zip_contents $FILE
Note. If you don’t use source to execute “my_zip_lib.sh” inside your script, you will not get the variables/func-
tions available, since the child bash that is executing your script will create another child bash which is the one that
will receive these variables/functions but that will be destroyed after “my_zip_lib.sh” is executed.
Position parameters
Position parameters are special variables that contain the arguments with which a script or a function is invoked.
These parameters have already been introduced in Section 22.3. A useful command to manipulate position parameters
is shift. This command shifts the list of positional parameters to the left for a more comfortable processing of positional parameters.
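A small sketch that consumes all arguments one by one:
#!/bin/bash
while [ $# -gt 0 ]
do
    echo "Parameter: $1"
    shift
done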
Special variables
Special variables indicate the status of a process. They are treated and modified directly by the shell so they are read-
only variables. In the above examples, we have already used most of them. For example, the variable $$ contains
the PID of the running process. The variable $$ is commonly used to assign names to files related to the process for
temporary storage. As the PID is unique in the system, this is a way of easily identifying files related to a particular
running process. Table 22.5 briefly summarizes these variables.
22.8 Extra
22.8.1 *Debug Scripts
Debugging facilities are a standard feature of compilers and interpreters, and bash is no different in this regard. You can instruct bash to print debugging output as it interprets your scripts. When running in this mode, bash prints commands
and their arguments before they are executed. The following simple script greets the user and prints the current date:
Table 22.5: Special variables
Variable Meaning
$$ Process PID.
$* String with all the parameters received.
$@ Same as above but treats each parameter as a different word.
$# Number of parameters.
$? Exit status (or return code) of last command
(0=normal, >0=error).
$! PID of the last process executed in background.
$_ Value of last argument of the command previously executed.
#!/bin/bash
echo "Hello $USER,"
echo "Today is $(date +'%Y-%m-%d')"
$ bash -x example_script.sh
+ echo 'Hello user1,'
Hello user1,
++ date +%Y-%m-%d
+ echo 'Today is 2011-10-24'
Today is 2011-10-24
In this mode, Bash prints each command (with its expanded arguments) before executing it. Debugging output is
prefixed with a number of + signs to indicate nesting. This output helps you see exactly what the script is doing, and
understand why it is not behaving as expected. In large scripts, it may be helpful to prefix this debugging output with
the script name, line number and function name. You can do this by setting the following environment variable:
$ export PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '
Let’s trace our example script again to see the new debugging output:
$ bash -x example_script.sh
+example_script.sh:2:: echo 'Hello user1,'
Hello user1,
++example_script.sh:3:: date +%Y-%m-%d
+example_script.sh:3:: echo 'Today is 2011-10-24'
Today is 2011-10-24
Sometimes, you are only interested in tracing one part of your script. This can be done by calling set -x where
you want to enable tracing, and calling set +x to disable it. Example:
#!/bin/bash
echo "Hello $USER,"
set -x
echo "Today is $(date %Y-%m-%d)"
set +x
You can then run the script directly; you no longer need to invoke it with bash -x. On the other hand, tracing script execution is sometimes too verbose, especially if you are only interested in a limited number of events, like calling a certain function or entering a certain loop. In this case, it is better to log the events you are interested in. Logging can be achieved with something as simple as a function that prints a string to stderr:
_log() {
if [ "$_DEBUG" == "true" ]; then
echo 1>&2 "$@"
fi
}
Now you can embed logging messages into your script by calling this function:
_log "Copying files..."
cp src/* dst/
Log messages are printed only if the _DEBUG variable is set to true. This allows you to toggle the printing of log
messages depending on your needs. You do not need to modify your script in order to change this variable; you can
set it on the command line:
$ _DEBUG=true ./example_script.sh
Finally, if you are writing a complex script and you need a full-fledged debugger to debug it, then you can use
bashdb, the Bash debugger.
22.8.2 *Arrays
An array is a set of values identified under a unique variable name. Each value in an array can be accessed using the
array name and an index. The built-in declare -a is used to declare arrays in bash. Bash supports single dimension
arrays with a single numerical index but no size restrictions for the elements of the array. The values of the array can
be assigned individually or in combination (several elements). When the special characters [] or [*] are used as index
of the array, they denote all the values contained in the array. Next, we show the use of arrays with some examples:
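$ declare -a ARR
$ ARR=(one two three)
$ echo ${ARR[0]}
one
$ ARR[3]=four
$ echo ${ARR[*]}
one two three four
$ echo ${#ARR[*]}
4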
22.8.3 *Builtins
A “builtin” command is a command that is built into the shell so that the shell does not fork a new process. The result
is that builtins run faster and can alter the environment of the current shell. The shell will always attempt to execute
the builtin command before trying to execute a utility with the same name. For more information on the builtins:
$ info bash
Table 22.6: Bash builtins.
Builtin Function
: returns exit status of 0
. execute shell script from current process
bg places suspended job in background
break exit from loop
cd change to another directory
continue start with next iteration of loop
declare display variables or declare variable
echo display arguments
eval scan and evaluate command line
exec execute and open files
exit exit current shell
export place variable in the environment
fg bring job to foreground
getopts parse arguments to shell script
jobs display list of background jobs
kill send signal to terminate process
pwd present working directory
read read line from standard input
readonly declare variable to be read only
set set command-line variables
shift promote each command-line argument
test compare arguments
times display total times for current shell and children
trap trap a signal
umask return value of file creation mask
unset remove variable or function
wait [pid] wait for background process to complete
22.8.4 *Advanced Parameter Expansion
Indirection
${!PARAMETER}
The referenced parameter is not PARAMETER itself, but the parameter named by the value of it. Example:
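For instance (the variable names are illustrative):
$ VAR1=hello
$ REF=VAR1
$ echo ${!REF}
hello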
A related form, ${!PREFIX*}, expands to the names of all defined variables beginning with PREFIX:
$ echo ${!BASH*}
BASH BASH_ARGC BASH_ARGV BASH_COMMAND BASH_LINENO BASH_SOURCE ...
This will show all defined variable names (not values) beginning with BASH.
Case modification
${PARAMETER^} ${PARAMETER^^} ${PARAMETER,} ${PARAMETER,,}
The ^ operator modifies the first character to uppercase, the , operator to lowercase. When using the double form, all characters are converted.
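For example (requires bash 4 or later):
$ VAR="hello world"
$ echo ${VAR^}
Hello world
$ echo ${VAR^^}
HELLO WORLD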
Substring removal
${PARAMETER#PATTERN} ${PARAMETER##PATTERN} ${PARAMETER%PATTERN} ${PARAMETER%%PATTERN}
Expand only a part of a parameter’s value, given a pattern to describe what to remove from the string. The operator
"#" will try to remove the shortest text matching the pattern from the beginning, while "##" tries to do it with the
longest text matching from the beginning. The operator "%" will try to remove the shortest text matching the pattern
from the end, while "%%" tries to do it with the longest text matching from the end. Examples:
$ PATHNAME=/usr/bin/apt-get
$ echo ${PATHNAME##*/}
apt-get
$ echo ${PATHNAME#*/}
usr/bin/apt-get
$ VAR="today is my day"
$ echo ${VAR/day/XX}
toXX is my day
$ echo ${VAR//day/XX}
toXX is my XX
String length
${#PARAMETER}
The length of the parameter’s value is expanded.
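Example:
$ VAR="hello"
$ echo ${#VAR}
5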
Substring expansion
${PARAMETER:OFFSET} ${PARAMETER:OFFSET:LENGTH}
This one can expand only a part of a parameter’s value, given a position to start and maybe a length.
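Example:
$ VAR="Hello World"
$ echo ${VAR:6}
World
$ echo ${VAR:0:5}
Hello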
Use an alternate value
${PARAMETER:+string}
${PARAMETER+string}
The :+ form expands to nothing if the parameter is unset or empty; the + form expands to nothing only if it is unset. If the parameter is set, the expansion does not return the parameter's value, but some text you can specify.
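Example (when VAR is unset, the same expansion prints an empty line):
$ VAR=defined
$ echo ${VAR:+VAR is set}
VAR is set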
22.9 Summary
Table 22.8: Common keywords and symbols used in shell scripts.
$VAR Value of variable VAR.
’ Quoting literals (without modifications).
" Quoting except \ ` $.
` Command expansion.
$1 ... Positional parameters.
{X,Y,Z} {X..Y} {X..Y..Z} Brace Expansion.
~ ~+ ~- Tilde Expansion.
$PARAM ${PARAM...} Parameter Expansion.
$(COMMAND) `COMMAND` Command Substitution.
$((EXPRESSION)) $[EXPRESSION] Arithmetic Expansion.
<(COMMAND) >(COMMAND) Process Substitution.
*.txt page_1?.html Filename Expansions.
* For filename expansion, any string of characters, including a null (empty) string.
? For filename expansion, any unique character.
[List] For filename expansion, a unique character from the list.
[^List] For filename expansion, a unique character not from the list.
if [ expression ]; then cmd; else cmd; fi Basic conditional.
for VARIABLE in list; do cmd; done Iteration statement.
while [ expression ]; do cmd; done Another iteration statement.
case string in pattern1) cmd ;; esac Multiple conditional.
function function_name or function_name() define a function.
let or declare -i Define a variable as numerical.
local Define a variable local to a function.
export or declare -x Define an exported variable.
export -f Define an exported function.
source Execute the script without a child bash.
shift Move left the list of positional parameters.
$$ Process PID.
$* String with all the parameters received.
$@ Same as above but treats each parameter as a different word.
$# Number of parameters.
$? Exit status (or return code) of last command (0=normal, >0=error).
$! PID of the last process executed in background.
$_ Value of last argument of the command previously executed.
Table 22.8 summarizes the keywords and commands used within this section.
22.10 Practices
Exercise 22.1– Describe in detail line by line the following script file:
#!/bin/bash
# AUTHOR: teacher
# DATE: 4/10/2011
# NAME: shellinfo.sh
# SYNOPSIS: shellinfo.sh [arg1 arg2 ... argN]
# DESCRIPTION: Provides information about the script.
# HISTORY: First version
if [ $# -gt 0 ]; then
I=1
for PARAM in $@
do
echo "Parameter \$$I is $PARAM"
((I++))
done
fi
Exercise 22.3– Develop a script that calculates the hypotenuse of a right triangle from its two catheti (i.e. the square root of the sum of their squares). Use a function with local variables and arithmetic expansions.
Exercise 22.4– Describe in detail line by line the following script files:
#!/bin/bash
# AUTHOR: teacher
# DATE: 4/10/2011
# NAME: fill_terminal_procedure.sh
# SYNOPSIS: fill_terminal arg
# DESCRIPTION: Procedure to fill the terminal with a printable character
# FUNCTION NAME: fill_terminal:
# OUTPUT: none
# RETURN CODES: 0-success 1-bad-number-of-args 2-not-a-printable-character.
# HISTORY: First version
fill_terminal() {
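    # A possible body (a sketch consistent with the documented return codes:
    # validate the single hex argument and fill the screen with the character)
    [ $# -ne 1 ] && return 1
    (( $1 < 0x21 || $1 > 0x7F )) && return 2
    local CHAR=$(printf "\\$(printf '%03o' $1)")
    local N=$(( $(tput cols) * $(tput lines) ))
    local I
    for (( I=0; I<N; I++ )); do printf "%s" "$CHAR"; done
    return 0
}

#!/bin/bash
# Caller script: it sources the procedure and handles its return codes.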
source fill_terminal_procedure.sh
fill_terminal $@
case $? in
0)
exit 0 ;;
1)
echo "I need one argument (an hex value)" >&2 ; exit 1 ;;
2)
echo "Not printable character. Try one between 0x21 and 0x7F" >&2 ; exit 1 ;;
*)
echo "Internal error" >&2 ; exit 1
esac
Exercise 22.5– The following script illustrates how to use functions recursively. Describe it in detail line by line.
#!/bin/bash
# AUTHOR: teacher
# DATE: 4/10/2011
# NAME: recfind.sh
# SYNOPSIS: recfind.sh file_to_be_found
# DESCRIPTION: Search recursively a file from the working directory
# HISTORY: First version
# Function: search_in_dir
# Arguments: search directory
function search_in_dir() {
local fileitem
[ $DEBUG -eq 1 ] && echo "Entrant a $1"
cd $1
for fileitem in *
do
if [ -d $fileitem ]; then
search_in_dir $fileitem
elif [ "$fileitem" = "$FILE_IN_SEARCH" ]; then
echo `pwd`/$fileitem
fi
done
[ $DEBUG -eq 1 ] && echo "Sortint de $1"
cd ..
}
DEBUG=0
if [ $# -ne 1 ]; then
echo "Usage: $0 file_to_search"
exit 1
fi
FILE_IN_SEARCH=$1
search_in_dir `pwd`
Exercise 22.6– Using a function recursively, develop a script to calculate the factorial of a number.
Chapter 23
System Administration
Chapter 24
System Administration
• Superuser, administrator or root user. This user has a special account which is used for system administration. The root user is granted all rights over all the files and processes.
• Regular users. A user account provides access to the system for users and groups of users, which usually have limited access to critical resources such as files and directories.
• Special users. The accounts of special users are not used by human beings; they are used by internal system services. Examples of such users are www-data, lp, etc.
Note. Some services run using the root user but the use of special users is preferred for security reasons.
The minimal data to create a user account is the name of the user or User ID, a password and a personal directory
or home.
The main configuration files involved in user and group management are the following:
/etc/passwd
/etc/shadow
/etc/group
/etc/gshadow
/etc/skel
Two sample lines of the /etc/passwd file are the following:
user1:x:1000:1000:Mike Smith,,,:/home/user1:/bin/bash
root:x:0:0:root:/root:/bin/bash
Note. If "::" appears, this means that the corresponding field is empty. The fields have the following meaning:
1. user1: this field contains the name of the user, which must be unique within the system.
2. x: this field contains the encoded password. The "x" means that the password is not in this file but in the file /etc/shadow.
3. 1000: this field is the number assigned to the user, which must be unique within the system.
4. 1000: this field is the number of the default group. The members of the different groups of the system are
defined in the file /etc/group.
5. Mike Smith: this field is optional and it is normally used to store the full name of the user.
6. /home/user1: this field contains the home directory of the user.
7. /bin/bash: this field contains the default shell of the user.
On the other hand, the /etc/shadow file essentially contains the encrypted password linked with the user name and some extra information about the account. A sample text line of this file is the following:
user1:a1gNcs82ICst8CjVJS7ZFCVnu0N2pBcn/:12208:0:99999:7:::
Where we can observe the following fields:
1. Username: the login name of the user.
2. Password: it is your encrypted password. The password should be a minimum of 6-8 characters long including special characters/digits.
3. Last password change (lastchanged): Days since Jan 1, 1970 that password was last changed
4. Minimum: The minimum number of days required between password changes i.e. the number of days left
before the user is allowed to change his/her password
5. Maximum: The maximum number of days the password is valid (after that user is forced to change his/her
password)
6. Warn : The number of days before password is to expire that user is warned that his/her password must be
changed
7. Inactive : The number of days after password expires that account is disabled
8. Expire : days since Jan 1, 1970 that account is disabled i.e. an absolute date specifying when the login may no
longer be used
Another important configuration file related with user management is /etc/group, which contains information
of system groups. A sample text line of this file is the following:
users:x:1000:user1,user2
Where the fields follow this format:
group-name:password-group:ID-group:users-list
The users-list is optional since this information is already stored in the /etc/passwd file. Groups can also have
an associated password (although this is not very common). Typically, the encrypted password of the group is stored
in another file called /etc/gshadow.
Another interesting configuration directory is /etc/skel. This directory contains the "skeleton" that is copied
to each user’s home when the user is created.
Some of the commands used to manage users and groups are the following:
• usermod: modify a user.
• passwd: change your password. If you are root you can also change the password of other users.
• su: switch user (change of user). Go to section 24.4 for further information.
• who: shows who is logged in the system and their associated terminals.
• id: prints the real and effective user and group IDs.
Finally, the command chown can be used to change the owner and the group of a file. Example:
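# chown user1 notes.txt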
The above command sets a new owner (user1) for the file notes.txt. Only the superuser can change the owner of a file, while users can change the group of a file. Obviously, a user can only change the group of a file to one to which she belongs. The "-R" option is commonly used to make the change recursive. Example:
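# chown -R :student directory1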
The command above sets the group "student" on "directory1" and all its contents.
24.4 su and sudo
The sudo command allows a permitted user, as configured in the /etc/sudoers file, to execute a command with the privileges of the superuser (or of another user). To use sudo on a per-command basis, enter:
$ sudo command
Replace command with the command for which you want to use sudo. If your user is in the sudoers file for any command, you can get a "root shell" using:
user$ sudo -s
root$
On the other hand, the su command stands for "switch user" and allows you to become another user. To use the su command on a per-command basis, enter:
$ su user -c command
Replace user with the name of the account which you’d like to run the command as, and command with the
command you need to run as another user. To switch users before running many commands, enter:
$ su user
Replace user with the name of the account which you'd like to run the commands as. The user argument is optional; if you don't provide a user, the su command defaults to the root account, which in Unix is the system administrator
account. In either case, you’ll be prompted for the password associated with the account for which you’re trying to
run the command. If you supply a user, you will be logged in as that account until you exit it. To do so, press Ctrl-d or
type exit at the command prompt.
Using su creates security hazards, is potentially dangerous, and requires more administrative maintenance. It’s
not good practice to have numerous people knowing and using the root password because when logged in as root, you
can do anything to the system. Notice that, on the other hand, the sudo command makes it easier to practice the
principle of least privilege.
24.5 Package Management
• Individual package management. Individual packages are managed by rpm and dpkg on Red Hat and Debian respectively. These two are functionally equivalent and are compared in Table 24.1.
Note. In the case of debian, APT uses a file that lists the ’sources’ from which packages can be obtained. This
file is /etc/apt/sources.list.
• Package Management Systems. At a higher level, package dependencies can be automatically managed by
yum and apt on Red Hat and Debian respectively. These two are functionally equivalent though, and will
be compared in Table 24.2. With these tools one can essentially say "install this package" and all dependent packages will be installed/upgraded as appropriate. One of course has to configure where these tools can find these packages, and this is typically done by configuring online package repositories.
Table 24.1: dpkg and rpm.
Debian Red Hat Description
dpkg -Gi package(s).deb rpm -Uvh packages(s).rpm install/upgrade package file(s)
dpkg -r package rpm -e package remove package
dpkg -l ’*spell*’ rpm -qa ’*spell*’ show all packages whose names
contain the word spell
dpkg -l package rpm -q package show version of package installed
dpkg -s package rpm -q -i package show all package metadata
dpkg -I package.deb rpm -q -i -p package.rpm show all package file’s metadata
dpkg -S /path/file rpm -q -f /path/file what package does file belong
dpkg -L package rpm -q -l package list where files were installed
dpkg -c package.deb rpm -q -l -p package.rpm list where files would be installed
dpkg -x package.deb rpm2cpio package.rpm | cpio -id extract package files to current di-
rectory
dpkg -s package | grep ^Depends: rpm -q --requires package list files/packages that package
needs
dpkg --purge --dry-run package rpm -q --whatrequires package list packages that need package (see
also whatrequires)
Note. Debian requires one to run apt-get update before these commands so the local cache is up to date with the online repositories. Yum is the opposite, in that one needs to add the -C option to tell it to operate on the local cache only.
Another very useful tool, which is not installed by default, is apt-file. This tool allows us to search for files inside packages, even if they are not installed. For example:
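$ apt-file search bin/pstree
psmisc: /usr/bin/pstree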
Final remark. OSs which do not use a package management system (like Windows) traditionally release programs with an executable installer that contains any required dependencies (libraries etc.). There are many problems with this method, the most obvious being:
1. There is no standard install API, so one cannot do many things like seeing where a file came from, etc.
2. The size of install files is large due to the included dependencies and install logic.
In the particular case of Microsoft, they have addressed these problems with MSI files, which are essentially the same as Linux packages. However, MSI files are still autonomous, i.e. they don't handle dependencies and so still suffer from the second problem above. The Windows filesystem layout generally puts all files associated with a program under one directory (usually in "Program Files"). There are many problems with this also, the main one being that it leads to a mixing of logic and data, which is rather inflexible when, for instance, we want to use different disk partitions for data and logic.
24.6 Logs
In short, /var/log is the location where you should find all Linux log files. However, some applications such as apache2 (one of the most famous web servers) have a directory within /var/log/ for their own log files. You can rotate log files using the command logrotate.
The rsyslog service can be configured so that you have a single host that accepts all of the log messages from a number of hosts, writing out the information to a single file. The service is highly configurable (in our case in /etc/rsyslog.conf and /etc/rsyslog.d). In these files you must specify rules. Every rule consists of two fields, a selector field and an action field. These two fields are separated by one or more spaces or tabs. The selector field specifies a pattern of facilities and priorities belonging to the specified action.
24.6.5 Selectors
The selector field consists of two parts: a facility and a priority.
• The facility is one of the following keywords:
auth, authpriv, cron, daemon, kern, lpr, mail, mark, news, syslog, user, uucp and local0 through local7.
• The priority defines the severity of the message and it is one of the following keywords, in ascending order:
debug, info, notice, warning, err, crit, alert, emerg.
Additionally, the following keywords and symbols have a special meaning: “none”, “*”, “=” and “!”. We show
their use by examples below.
24.6.6 Actions
The action field of a rule describes the abstract term "logfile". A "logfile" need not be a real file; it can also be a named pipe, a virtual console or a remote machine. To forward messages to another host, prepend the hostname with the at sign "@".
24.6.7 Examples
*.info;mail.none;news.none;authpriv.none;cron.none /var/log/syslog
The *.info means "log messages with priority info or higher from all facilities". However, after that, it says mail.none;news.none and so forth. What that means when all put together is "log everything from these facilities EXCEPT for the ones followed by '.none'".
*.crit;kern.none /var/adm/critical
This will store all messages with the priority crit in the file /var/adm/critical, except for any kernel message.
# Kernel messages are first, stored in the kernel
# file, critical messages and higher ones also go
# to another host and to the console
#
kern.* /var/adm/kernel
kern.crit @mylogserver
kern.=crit /dev/tty5
kern.info;kern.!err /var/adm/kernel-info
• The first rule directs any message that has the kernel facility to the file /var/adm/kernel.
• The second statement directs all kernel messages of the priority crit and higher to the remote host mylogserver.
This is useful, because if the host crashes and the disks get irreparable errors you might not be able to read the
stored messages.
• The third rule directs kernel messages of the priority crit to the virtual console number five.
• The fourth saves all kernel messages that come with priorities from info up to warning in the file /var/adm/kernel-
info. Everything from err and higher is excluded.
24.6.8 Other Logging Systems
We must remark that many programs handle their own logging. The Apache web server is one of them: in its
httpd.conf file you must specify where things are logged. You can follow the general system log in real time with:
# tail -f /var/log/syslog
On the other hand, the logger command makes entries in the system log; it provides a shell interface to the logging
system. For example, to log the contents of the file /tmp/msg1, enter:
logger -f /tmp/msg1
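You can also log a one-line message directly, optionally with a tag and an explicit facility and priority (the tag and the text here are illustrative):
$ logger -t mytest -p user.notice "probing logging system"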
24.7 Extra
24.7.1 *Quotas
If you manage a system that’s accessed by multiple users, you might have a user who hogs the disk space. Using disk
quotas you can limit the amount of space available to each user. It’s fairly easy to set up quotas, and once you are done
you will be able to control the number of inodes and blocks owned by any user or group.
Control over the disk blocks means that you can specify exactly how many bytes of disk space are available to a
user or group. Since inodes store information about files, by limiting the number of inodes, you can limit the number
of files users can create.
When working with quotas, you can set a soft limit or a hard limit, or both, on the number of blocks or inodes
users can use. The hard limit defines the absolute maximum disk space available to a user. When this value is reached,
the user can’t use any more space. The soft limit is more lenient, in that the user is only warned when he exceeds the
limit. He can still create new files and use more space, as long as you set a grace period, which is a length of time for
which the user is allowed to exceed the soft limit.
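A minimal sketch of the workflow, assuming the quota tools are installed and /home is listed in /etc/fstab with the usrquota option:

# mount -o remount,usrquota /home    # re-mount the filesystem with quota support
# quotacheck -cum /home              # create and initialize the user quota files
# quotaon /home                      # enable quotas on the filesystem
# edquota -u user1                   # edit soft/hard block and inode limits for user1
# edquota -t                         # set the grace period
# repquota /home                     # report current usage and limits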
24.7.3 *Example of a sudoers file
# This file MUST be edited with the ’visudo’ command as root.
# See the man page for details on how to write a sudoers file.
Defaults env_reset
# Host alias specification
# User alias specification
User_Alias NET_USERS = %netgroup
# Cmnd alias specification
Cmnd_Alias NET_CMD = /usr/local/sbin/simtun, /usr/sbin/tcpdump, /bin/ping,
/usr/sbin/arp, /sbin/ifconfig, /sbin/route, /sbin/iptables, /bin/ip
# User privilege specification
root ALL=(ALL) ALL
# Uncomment to allow members of group sudo to not need a password
# (Note that later entries override this, so you might need to move
# it further down)
# %sudo ALL=NOPASSWD: ALL
NET_USERS ALL=(ALL) NOPASSWD: NET_CMD
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL
The process of changing ACLs is fairly simple, but understanding the implications is sometimes much more complex.
There are a few commands that will help you change the ACLs on individual files and directories.
$ getfacl file
The getfacl command will list all of the current ACLs on the file or directory. For example if user1 creates a
file and gives ACL rights to another user (user2) this is what the output would look like:
$ getfacl myfile
# file: myfile
# owner: user1
# group: user1
user::rw-
user:user2:rwx
group::rw-
mask::rwx
other::r--
The getfacl output shows the typical ownership as well as additional users who have been given rights through
ACLs, like user2 in the example. It also shows each user's rights: in the example, user2 has rwx on the file myfile.
The setfacl command is used to create or modify ACL entries. For example, if you wanted to change the ACL for
user3 on a file, you would use a command like the following (the rwx rights are just an example):
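$ setfacl -m u:user3:rwx myfile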
The -m option means “modify the ACL”; the “u” selects the user entry for the specifically named user, user3, followed
by the rights and the file or directory. Change the “u” to a “g” and you will be changing group ACLs.
If you want to configure a directory so that all files created inside it inherit the ACLs of the directory, use the “d”
(default) option before the user or group, as in the sketch below.
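For instance (somegroup and the file and directory names are illustrative):
$ setfacl -m g:somegroup:rx myfile    # modify a group ACL
$ setfacl -m d:u:user3:rwx mydir      # default ACL, inherited by new files created in mydir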
24.10 Practices
Exercise 24.1– This exercise deals with system management. To do the following practices you will need to be
“root”. To avoid affecting your host system, we will use a UML virtual machine.
1. Start a UML kernel with a filesystem mapped in the ubda device (of the UML guest) and with 128 megabytes
of RAM. Then, log in as root in the UML guest and use the command uname -a to view the kernel versions of
the host and the UML guest. Are they the same? Explain why they can be different.
2. Log in as root in the UML guest and type a command to see your current uid (user ID), gid (group ID) and the
groups of which root is a member. Type another command to see who is currently logged into the system.
3. As root, you have to create three users in the system called user1, user2 and user3. In the creation command, you
do not have to provide a password for any of the users, but you should create them with “home” directories and with
Bash as the default shell.
4. Type the proper command to switch from root to user1. Then, type a command to view your new uid, gid and the
groups of which user1 is a member. Also find these uid and gid in the appropriate configuration files.
5. Type a command to see if user1 currently appears as logged into the system. Explain which steps you need to
follow to log in as user1. Once you manage to log in as user1, type a command to view who is currently connected,
type another command to view the last login accesses to the system and, finally, type another command
to view your current uid, gid and groups.
6. Log in as user1 and then switch the user to root. As root, create a group called “someusers”. Find the gid assigned
to this group. Modify the configuration of user1, user2 and user3 to make them “additionally” belong to the group
“someusers”. Type “exit” to switch back to user1 and then type a command to view your current uid, gid and
groups. Do you observe something strange?
7. Log out and log in again as root. Switch to user1 and then type a command to view your current uid, gid and
groups. Explain what you see. Switch back to root and create another group called “otherusers”. Then, modify
user2 and user3 so that they additionally belong to this group.
8. Now, we are going to simulate that our UML guest has a second disk or storage device. First, halt the UML guest.
In the host, create an ext3 filesystem of 30M called second-disk.fs. We are going to map second-disk.fs to the
“ubdb” device of the UML guest. To do so, add the option “ubdb=second-disk.fs” to the command line that you
use to start the UML guest.
9. Log in as root in the UML guest. As you can see, the new filesystem (ubdb) is not mounted, so mount it under
/mnt/extra. Type a command to view the mounted filesystems and another command to see the disk usage of these
filesystems. Inside /mnt/extra, create a directory called “etc” and another directory called “home”.
10. Modify the /etc/fstab configuration file of the UML guest so that the mount point /mnt/extra for the device ubdb
is automatically mounted when booting the UML guest. Reboot the UML guest and check that /dev/ubdb has been
automatically mounted.
11. Halt the UML guest. In the host, mount the filesystem second-disk.fs with the loop option (use any mount point
that you want). Then, copy the file /etc/passwd of your host into the “etc” directory of the filesystem of second-
disk.fs you have mounted. Now, unmount second-disk.fs in the host and boot the UML guest again. Log in as root
in the UML guest and check that the file passwd exists in /mnt/extra/etc.
12. Now, we are going to simulate attaching a USB pen-drive to our UML guest. To do so, you have to create
a new filesystem of 10 megabytes called pen-drive.fs inside the UML guest. Then, mount pen-drive.fs with the
loop option in /mnt/usb. Now, we are going to use our “pen-drive”: create a subdirectory in /mnt/usb called “etc”
and then unmount /mnt/usb. Next, mount pen-drive.fs again with the loop option, but this time in /mnt/extra. Why
is the directory /mnt/extra/etc now empty? Copy the file /etc/passwd of the UML guest into /mnt/extra/etc and then
unmount. What are the contents of the /mnt/extra/etc directory now? Why?
13. Type the following commands and explain what they do:
guest# ls /mnt/extra
guest# exit
guest# umount /mnt/extra/root2
14. Login as root in the UML guest and create a configuration in the system for the directory /mnt/extra in which the
three users: user1, user2 and user3 can create and delete files in this directory. Check the configuration.
15. Now, create a configuration for the directory /mnt/extra so that user1 and user2 can create and delete files in
/mnt/extra but user3 can only list the contents of this directory. Check the configuration.
16. Create three subdirectories called user1, user2 and user3 in /mnt/extra. These directories have to be assigned
to their corresponding users (user1, user2 and user3). Then, create a configuration in the UML guest for these
subdirectories so that each user can only create and delete files in her assigned subdirectory but can only view the
contents of the subdirectory of another user. Finally, users in the system different from user1, user2 and user3 must
not have access to these subdirectories.
17. As root, create and execute a Bash script called “links.sh” that creates, in the home directory of each user, a
symbolic link to the directory /mnt/extra.
18. Add a symbolic link to /mnt/extra in the directory /etc/skel. Then, create a new user called user4, without a
password but with a home directory and Bash as shell, and explain what you see in the home directory of this user.
19. Type a command to write “probing logging system” in the general system log (the /var/log/syslog file in Debian-
like distributions). Then, type a command to view the last 10 lines of /var/log/syslog.
Part V
Chapter 25
X window system
Chapter 26
X window system
The DISPLAY environment variable, with the form hostname:displaynumber.screennumber, tells X clients which X server to use:
• hostname is the name or IP address of the host on which the X server is running.
1 The X.Org Foundation leads the X project, with the current reference implementation, X.Org Server, available as free and open source software
[Figure: an X client application (xeyes) and an X server, each running in user space on a different host, communicating across a TCP/IP network.]
• displaynumber identifies the X server within the host since there can be different X servers running at a
time in the same host.
• screennumber specifies the screen to be used on that server. Multiple screens can be controlled by a single
X server.
When DISPLAY does not contain a hostname, e.g. it is set to :0.0, Unix sockets will be used. When it does contain
a hostname, e.g. it is set to localhost:0.0, the X client application will try to connect to the server (even for localhost,
as in the example) via TCP/IP sockets. As another example, if you type:
$ export DISPLAY="192.168.0.1:0.0"
$ xeyes &
The first command line sets and exports the variable DISPLAY and the second one executes an xeyes.
In this case, we are specifying that the X server on which the xeyes is going to be displayed is running on a host that
can be reached at IP address 192.168.0.1, and that our target is X server number 0, screen 0. More precisely, when we
say server 0, we mean that we expect the X server to be listening on port 6000, which is the default port for server 0. If
you want to connect to server 1, you should type:
$ export DISPLAY="192.168.0.1:1.0"
This means that X clients launched after setting this variable will expect an X server listening on port
6001 of host 192.168.0.1. Another way to do remote display is via a command-line option that all X applications have:
“-display [displayname]”. To produce the same result as before, but without using the DISPLAY variable, we
can type:
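$ xeyes -display 192.168.0.1:0.0 &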
Table 26.1: xhost parameters.
xhost              Shows the X server access control list.
xhost +hostname    Adds hostname to the X server access control list.
xhost -hostname    Removes hostname from the X server access control list.
xhost +            Turns off access control (all remote hosts will have access to the X server).
xhost -            Turns access control back on.
$ xhost +192.168.0.1
Note that, for security reasons, options that affect access control may only be run from the controlling
host, i.e. the machine with the display connection.
Chapter 27
Secure Shell
Chapter 28
Secure Shell
[Figure: an SSH client and server exchanging encrypted traffic over a TCP/IP network.]
SSH is structured as three protocol components. The first component provides server authentication, confidentiality,
integrity and, optionally, compression. This transport layer will typically run on top of any reliable data stream, in
particular a TCP/IP connection.
The second one, running over SSH transport layer protocol, authenticates the client-side user to the server. Finally,
the third one, running over user authentication protocol, multiplexes the encrypted tunnel into several logical channels
(see figure 28.2).
[Figure 28.2: SSH protocol stack. SSH runs on top of TCP (Transmission Control Protocol, which provides reliable, connection-oriented end-to-end delivery) and IP (Internet Protocol, which provides datagram delivery across multiple networks).]
The SSH communication works on top of this packet-level protocol and proceeds in the following phases:
1. The client opens a connection to the server.
2. The server sends its public host key and another public key (“server key”) that changes every hour. The client
compares the received host key against its known host keys.
3. The client generates a 256 bit random number using a cryptographically strong random number generator, and
chooses an encryption algorithm from those supported by the server. The client encrypts the random number
(session key) with a public key algorithm, using both the host key and the server key, and sends the encrypted
key to the server.
4. The server decrypts the public key encryptions and recovers the session key. Both parties start using the session
key (until this point, all traffic has been unencrypted on the packet level). The server sends an encrypted confir-
mation to the client. Receipt of the confirmation tells the client that the server was able to decrypt the key, and
thus holds the proper private keys. At this point, the server machine has been authenticated, and transport level
encryption and integrity protection are in use.
5. The user is authenticated to the server. The client sends requests to the server. The first request always declares
the user name to log in as. The server responds to each request with either “success” (no further authentication
is needed) or “failure” (further authentication is required).
6. After the user authentication phase, the client sends requests that prepare for the actual session. Such requests
include allocation of a tty, X11 forwarding, TCP/IP forwarding, etc. After all other requests, the client sends
a request to start the shell or to execute a command. This message causes both sides to enter the interactive
session.
7. During the interactive session, both sides are allowed to send packets asynchronously.
The packets may contain data, open requests for X11 connections, forwarded TCP/IP ports, the agent, etc.
Finally, at some point the client usually sends an EOF message. When the user's shell or command exits, the server
sends its exit status to the client, and the client acknowledges the message and closes the connection.
The client verifies the server host key using one of two trust models:
• The client has a local database that associates each host name (as typed by the user) with the corresponding
public host key. This method requires no centrally administered infrastructure and no third-party coordination.
The downside is that the database of name-to-key associations may become burdensome to maintain.
• The host name-to-key association is certified by a trusted Certification Authority (CA). The client knows only
the CA root key and can verify the validity of all host keys certified by accepted CAs. This alternative eases the
maintenance problem, because ideally only a single CA key needs to be securely stored on the client. On the
other hand, each host key must be appropriately certified by a central authority before authorization is possible.
The SSH transport layer is a secure low-level transport protocol that normally works over TCP/IP, listening for
connections on port 22. It supplies strong encryption, cryptographic host authentication and integrity protection.
Authentication at this layer is host-based; user authentication is left to the higher-level protocol. The SSH transport
layer permits negotiation of the key exchange method, public key algorithm, symmetric encryption algorithm, message
authentication algorithm and hash algorithm. Furthermore, it minimizes the number of round-trips needed
for the full key exchange, server authentication, service request, and acceptance notification of the service request. The
number of round-trips is normally two, but it becomes three in the worst case. Each packet is processed in three
subphases, in this sequence: compression, MAC computation and encryption. During the exchange, each packet has the format
of table 28.1 (see Figure 28.3). The payload contains the exchanged data, which may be compressed. The random
padding is necessary to make the length of the string (packet_length||padding_length||payload||padding)
a multiple of the cipher block size, or of 8, whichever is larger.
Figure 28.4 illustrates the sequence of events in the SSH Transport Layer Protocol. First, the client establishes
a TCP connection to the server; this step uses the TCP protocol and is not part of the Transport Layer Protocol itself.
Once the connection is established, the client and server exchange data, referred to as packets, in the data field of a TCP
segment.
The compression algorithm may be different for each direction of communication. To guarantee data integrity, each
packet includes a MAC. No MAC is included at the start of the connection, when its negotiated length is still zero;
only after the key exchange does the MAC take effect. The MAC is calculated from a shared secret, the packet
sequence number and part of the packet (the length fields, payload and padding):
MAC = mac(key, sequence_number || unencrypted_packet)
[Figure 28.3: SSH packet construction. The payload is compressed; the sequence number (seq#), packet length (pktl), padding length (pdl), compressed payload and padding are fed to the MAC computation, while the packet (except the sequence number) is encrypted to produce the ciphertext.]
The packet sequence number range begins at zero and wraps around at 2^32 - 1; the sequence number itself is not sent inside the exchanged
packets. The MAC algorithm (see table 28.2) may be different for each direction of communication.
Finally, the string
(packet_length||padding_length||payload||padding)
is encrypted. It is worth noting that the packet length field is also encrypted. This trick is adopted to hide the
size of the payload, since part of the exchanged packet is random padding and the rest is the exchanged data.
The minimum size of a packet is 16 bytes, or the cipher block size, plus the MAC length. The maximum size is
32768 bytes for the uncompressed payload and 35000 bytes for the total packet. The limit is specified on
the uncompressed payload because this field, and only this field, may be compressed using an appropriate algorithm. In
the encryption phase, the packet length, padding length, payload and padding fields of each packet are encrypted
with the negotiated algorithm and key. The encryption algorithm (see table 28.3) may be different for each direction of
communication; in particular, all encrypted packets sent in one direction are considered a single data stream. Keys
shorter than 128 bits are not recommended.
[Figure 28.4: SSH Transport Layer Protocol exchange. Client and server first exchange identification strings of the form SSH-protoversion-softwareversion; they then negotiate algorithms with SSH_MSG_KEXINIT messages, perform the key exchange, mark its end with SSH_MSG_NEWKEYS messages, and finally the client sends SSH_MSG_SERVICE_REQUEST.]
In the following we describe the key exchange method used at the beginning of the connection. During the first connection,
the key exchange method determines how the keys for encryption and authentication are generated and how the server
authentication is done. The only required key exchange method is Diffie-Hellman. The protocol has been designed
to operate with almost any public key format, encoding and algorithm (signature and/or encryption); see
table 28.4.
At the beginning of the key exchange, each side sends a list of supported algorithms (see Table 28.3). Each side has a
preferred algorithm in each category and tries to guess which one the other side prefers. The guess is wrong if the kex algorithm
and/or the host key algorithm differ between server and client, or if any of the listed algorithms cannot be agreed
upon. Otherwise, the guess is right and the packet sent is handled as the first key exchange packet. Before beginning
the communication, the client waits for the response to its service request message.
Encryption and authentication keys are derived, after the key exchange, from an exchange hash H and a shared secret
K. The hash H from the first key exchange is additionally used as the session identifier, in particular in the authentication
method, since it constitutes the proof of possession of the right private key. The exchange hash H lives for the session
duration and is never changed. Each key exchange method specifies a hash function that is used in the key
exchange to compute the encryption keys:
Table 28.3: SSH classification of cipher algorithms.
Name Explanation
3des-cbc REQUIRED three-key 3DES in CBC mode
blowfish-cbc RECOMMENDED Blowfish in CBC mode
twofish256-cbc OPTIONAL Twofish in CBC mode, with 256-bit key
twofish-cbc OPTIONAL alias for "twofish256-cbc"
twofish192-cbc OPTIONAL Twofish with 192-bit key
twofish128-cbc RECOMMENDED Twofish with 128-bit key
aes256-cbc OPTIONAL AES (Rijndael) in CBC mode, with 256-bit key
aes192-cbc OPTIONAL AES with 192-bit key
aes128-cbc RECOMMENDED AES with 128-bit key
serpent256-cbc OPTIONAL Serpent in CBC mode, with 256-bit key
serpent192-cbc OPTIONAL Serpent with 192-bit key
serpent128-cbc OPTIONAL Serpent with 128-bit key
arcfour OPTIONAL the ARCFOUR stream cipher
idea-cbc OPTIONAL IDEA in CBC mode
cast128-cbc OPTIONAL CAST-128 in CBC mode
none OPTIONAL no encryption; NOT RECOMMENDED
• Initial IV client to server: HASH(K || H || “A” || session_id)
• Initial IV server to client: HASH(K || H || “B” || session_id)
• Encryption key client to server: HASH(K || H || “C” || session_id)
• Encryption key server to client: HASH(K || H || “D” || session_id)
• Integrity key client to server: HASH(K || H || “E” || session_id)
• Integrity key server to client: HASH(K || H || “F” || session_id)
Here K is encoded as an mpint, “A” as a byte and session_id as raw data. Key data is taken from the beginning
of the hash output, at least 128 bits (16 bytes). Key exchange ends with each side sending an SSH_MSG_NEWKEYS
message. This message is sent with the old keys and algorithms. All messages sent after it must use the
new keys and algorithms.
The shared secret K is provided by the Diffie-Hellman key exchange. Each side is not able to determine alone
a shared secret. To supply host authentication, the key exchange is combined with a signature with the host key. In
the following description: p is a large safe prime, g is a generator for a subgroup of GF(p), and q is the order of the
subgroup; V_S is server’s version string; V_C is client’s version string; K_S is server’s public host key; I_C is client’s
KEXINIT message and I_S server’s KEXINIT message which have been exchanged before this part begins.
• The client generates a random number x (1 < x < q) and computes e = g^x mod p. The client sends “e” to the server.
• The server generates a random number y (0 < y < q) and computes f = g^y mod p. The server receives “e”. It computes
K = e^y mod p, H = hash(V_C || V_S || I_C || I_S || K_S || e || f || K) (see table 28.8), and a signature s on H with its private
host key. The server sends “K_S || f || s” to the client. The signing operation may involve a second hashing operation.
• The client verifies that K_S really is the host key of the server (e.g. using certificates or a local database). The client is also
allowed to accept the key without verification (insecure). The client then computes K = f^x mod p, H = hash(V_C ||
V_S || I_C || I_S || K_S || e || f || K), and verifies the signature s on H.
Either side must not send or accept “e” or “f” values that are not in the range [1, p-1]. If this condition is violated,
the key exchange fails.
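As a toy illustration of this arithmetic (with deliberately tiny numbers; a real exchange uses primes of 1024 bits or more, and the group below is purely illustrative), the following Bash snippet shows that both sides end up with the same shared secret K:

p=23; g=5                  # public parameters: a small prime and a generator
x=6;  e=$(( g**x % p ))    # client picks x, sends e = g^x mod p
y=15; f=$(( g**y % p ))    # server picks y, sends f = g^y mod p
Kc=$(( f**x % p ))         # client computes K = f^x mod p
Ks=$(( e**y % p ))         # server computes K = e^y mod p
echo "client K=$Kc, server K=$Ks"    # both print K=2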
It is possible to re-exchange keys, and it is recommended to do so after more than a gigabyte of data has been
transmitted or after more than an hour of connection time. However, the key re-exchange operation requires computational
resources, so it should not be performed too often. The operation is identical to the initial key exchange: it is possible
to renegotiate some or all of the algorithms, all keys and initialization vectors are recomputed after the exchange, and
the compression and encryption contexts are reset. The only thing that remains unchanged is the session identifier.
After the key exchange, the client requests a service, passing through the SSH user authentication protocol. The service
is identified by a name. When the service ends, the user may want to terminate the connection; after that message the client
and server must not send or receive any data.
Table 28.8: hash components.
Type Name Explanation
string V_C the client’s version string (CR and NL excluded)
string V_S the server’s version string (CR and NL excluded)
string I_C the payload of the client’s SSH_MSG_KEXINIT
string I_S the payload of the server’s SSH_MSG_KEXINIT
string K_S the host key
mpint e exchange value sent by the client
mpint f exchange value sent by the server
mpint K the shared secret
Authentication Requests
Table 28.9 shows the authentication request message format. The user and service name appear in every new
authentication attempt. They may change; therefore the server checks them in every message. If there are
changes, the server must update the authentication state and, if it is unable to do so, it disconnects the user for that service.
The user name and password are sent over the encrypted channel to the SSH server, which authenticates the user using
the supplied password. HP SIM 5.x also supports this method.
The diagram below shows how the key files are used by the SSH server and client.
[Figure: key files used by the SSH client and server. The client side keeps the user keys and a known_hosts file with the public keys of known servers; the server side keeps its host keys, the ssh_known_hosts file and the users' public keys used for user authentication.]
When the server accepts the authentication, and only when the authentication is complete, it responds with the message
of table 28.11.
The client is not obligated to wait for responses to previous requests, so it may send several authentication
requests in a row; the server, however, must process each request completely before processing the next one. A client
should not send a second request while an earlier one that may result in a further
Table 28.11: acceptance of authentication request message.
byte SSH_MSG_USERAUTH_SUCCESS
exchange of messages is outstanding, because that earlier method will be aborted by the second request. No SSH_MSG_USERAUTH_FAILURE message
will be sent for the aborted method. SSH_MSG_USERAUTH_SUCCESS is sent only once; any further authentication
requests received after it are silently ignored, and passed directly to the service being run on top of this
protocol. It is possible for a client to use the “none” authentication method: if no authentication is needed for the user,
the server returns SSH_MSG_USERAUTH_SUCCESS; otherwise, the server returns SSH_MSG_USERAUTH_FAILURE.
Authentication Method
The lower-level protocol has already created a secure channel between the client and the server, but different
users may run this operation from the same client to the same server, so a user authentication method is required.
The main user authentication methods are two: public key authentication and password authentication. In the first method,
possession of a private key serves as authentication. The user sends a signature created with their private key. The server
checks that the public key and the signature for the user are valid. If they are valid, the authentication request is accepted;
otherwise it is rejected. Table 28.12 shows the message format for querying whether authentication using the
key would be acceptable.
Table 28.12: message format for querying whether authentication using the key would be acceptable.
byte SSH_MSG_USERAUTH_REQUEST
string user name
string Service
string “publickey”
boolean FALSE
string public key algorithm name
string public key blob
Public key algorithms are defined in the transport layer specification (table 28.4); in particular, the list is not
constrained by what was negotiated during the key exchange. The public key blob may contain certificates. The server
responds to this message with either SSH_MSG_USERAUTH_FAILURE or SSH_MSG_USERAUTH_PK_OK
(table 28.13). Public key authentication is one of the most secure methods of authenticating with Secure Shell. It
uses a pair of computer-generated keys, one public and one private. Each key is usually between
1024 and 2048 bits in length and looks like the sample below. Even though anyone can see the public key, it is useless
without the corresponding private key:
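ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQ... (truncated, illustrative key) ...== user@workstation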
Table 28.13: acceptance of authentication message.
byte SSH_MSG_USERAUTH_PK_OK
string public key algorithm name from the request
string public key blob from the request
To perform actual authentication, the client may send the signature, generated using its private key, directly without
first verifying whether the key is acceptable (table 28.14).
The signature is created with the corresponding private key over the data of table 28.15.
When the server receives this message, it checks whether the supplied key is acceptable for authentication and, if
so, whether the signature is correct. If both checks succeed, this method is successful. (Note that the server
may require additional authentications.) The server responds with SSH_MSG_USERAUTH_SUCCESS (if no more
authentications are needed) or SSH_MSG_USERAUTH_FAILURE (if the request failed, or more authentications are
needed). In the second method, possession of a password serves as authentication (see table 28.16). The server
interprets the password and validates it against the password database; it may request the user to change the password.
Note that although the password is transmitted in cleartext inside the packet, the whole packet is encrypted by the SSH
transport layer. Although passwords are convenient, requiring no additional configuration or setup for your users, they
are inherently vulnerable in that they can be guessed, and anyone who can guess your password can get into your
system. Due to these vulnerabilities, it is recommended that you combine or replace password authentication with
another method, such as public key.
If the lower level does not provide confidentiality, password authentication is disabled. Likewise, if the lower level
does not provide confidentiality or integrity, password change is disabled. In general, the server responds to the
password authentication packet with success or failure. In a failure response, if the user's password has expired, the server
can reply with SSH_MSG_USERAUTH_PASSWD_CHANGEREQ (table 28.17). In any case, the server must prevent
users with expired passwords from logging in; the client may then continue with another authentication method, or request
a new user password (table 28.18).
Finally, the server must reply to the password change request with one of these messages:
Table 28.16: Password authentication packet.
byte SSH_MSG_USERAUTH_REQUEST
string user name
string service
string "password"
boolean FALSE
string plaintext password
• SSH_MSG_USERAUTH_SUCCESS: the password has been changed, and authentication has been successfully
completed.
• SSH_MSG_USERAUTH_FAILURE with partial success: the password has been changed, but more authentica-
tions are needed.
• SSH_MSG_USERAUTH_FAILURE without partial success: the password has not been changed. Either pass-
word changing was not supported, or the old password was bad.
• SSH_MSG_USERAUTH_PASSWD_CHANGEREQ: the password was not changed because the new password was not ac-
ceptable.
Table 28.19: request format.
Byte SSH_MSG_GLOBAL_REQUEST
String request name
Boolean want reply
... request-specific data follows
a message is received to indicate that window space is available. To open a new channel, each side allocates a local
number and sends the message of table 28.22 to the other side, including the local channel number and the initial window
size.
The fields of this message are defined as follows. Sender channel is the sender's local identifier for the
channel. Initial window size specifies how many bytes of channel data can be sent to the sender of this message
without adjusting the window. Maximum packet size specifies the maximum size of an individual data packet that can
be sent to the sender. On the other side, the remote machine decides whether it can open the channel, and it may respond
with its open confirmation message (see table 28.23).
Here, recipient channel is the channel number given in the original open request. If the recipient of the
SSH_MSG_CHANNEL_OPEN message cannot open the channel, for example because it does not support the specified
channel type, it simply responds with SSH_MSG_CHANNEL_OPEN_FAILURE (see table 28.24). After the channel is
opened, data is transmitted on it. During communication on the channel, the data transfer is controlled by the window size,
which specifies how many bytes the other party can send before it must wait for the window to be adjusted.
The maximum amount of allowed data is the current window size. The amount of data sent decrements the window
size. Each side may ignore all extra data sent after the allowed window is exhausted. When a side will send no more data
Table 28.24: message to fail the opening of the channel
Byte SSH_MSG_CHANNEL_OPEN_FAILURE
Uint32 recipient channel
Uint32 reason code
String additional textual information
String language tag
to a channel, it sends SSH_MSG_CHANNEL_EOF (see table 28.25); no explicit response is sent to this message. It is
important to note that the channel remains open after SSH_MSG_CHANNEL_EOF, and more data may be transferred in
each direction.
Finally, when each side wants to terminate the channel, one sends a SSH_MSG_CHANNEL_CLOSE message (see
table 28.26) and waits to receive back the same message. The channel is closed for each side when it has both sent
and received SSH_MSG_CHANNEL_CLOSE.
If want reply is FALSE, no response will be sent to the request. Otherwise, the recipient responds with
SSH_MSG_CHANNEL_SUCCESS, SSH_MSG_CHANNEL_FAILURE, or request-specific continuation messages.
If the request is not recognized or is not supported for the channel, SSH_MSG_CHANNEL_FAILURE is returned.
Request types are local to each channel type. To complete this section, a session is defined as the remote execution of
a program. The program may be X11 forwarding, TCP/IP port forwarding, a shell, a command, an application,
a system command, or some built-in subsystem. Multiple sessions can be active simultaneously.
[Figure: channel lifecycle in the SSH Connection Protocol. The client opens a channel with SSH_MSG_CHANNEL_OPEN, the server answers with SSH_MSG_CHANNEL_OPEN_CONFIRMATION, both sides then exchange SSH_MSG_CHANNEL_DATA messages, and the channel is finally closed with SSH_MSG_CHANNEL_CLOSE.]
ssh (Client) Configuration The default configuration file for the ssh client on most distributions is /etc/ssh/ssh_config.
Key settings that one will want to check and possibly change:
• ForwardAgent – Specifies whether the connection to the authentication agent (if any) will be forwarded to
the remote machine.
• ForwardX11 – Specifies whether X11 connections will be automatically redirected over the secure channel.
“yes” means that when one starts a GUI/X11 program from an ssh session, the program will run on the remote
machine, but will display back on the machine the ssh client was started on. All traffic to the GUI program is
encrypted, basically being carried in an ssh tunnel.
• Protocol – Determines which protocol clients can connect with. Should be only “2”.
Table 28.28: OpenSSH Suite
ssh            The command-line client establishes encrypted connections to remote machines and will run commands on these machines if needed.
scp            To copy files (non-interactively) locally to or from remote computers, you can use scp (“secure copy”).
sftp           The ftp client (“secure ftp”) supports interactive copying. Like other command-line ftp clients, the tool offers many other options in addition to copying, such as changing and listing directories, modifying permissions, deleting directories and files, and more.
sshd           The SSH server is implemented as a daemon and listens on port 22 by default. SSH clients establish connections to the sshd.
ssh-copy-id    Nice little program for installing your personal identity key to a remote machine's authorized_keys file.
ssh-keygen     Generates and manages RSA and DSA authentication keys.
ssh-keysign    The tool ssh-keysign is used for host-based authentication.
ssh-keyscan    This application displays public hostkeys and appends them to the ~/.ssh/known_hosts file.
ssh-agent      The SSH agent manages private SSH keys and simplifies password handling. It remembers your passphrases over multiple SSH logins for automatic authentication. ssh-agent binds to a single login session, so logging out, opening another terminal, or rebooting means starting over.
ssh-add        The ssh-add tool introduces new keys to the ssh-agent.
sshd (Server) Configuration The default configuration file for sshd on most distributions is /etc/ssh/sshd_config.
Key settings that one will want to check and possibly change:
• MaxAuthTries – Determines how many authentication failures are allowed before the connection closes. For
more security, you can set this to 2 or even 1. Note, though, that while this can slow down password attacks, it
cannot prevent them because it simply closes the current connection, and an attacker will then simply call ssh
again to re-connect.
• PermitRootLogin – Determines whether someone can attempt to login directly as root (administrator). For
highest security, you should set this to: “no” . While this means that root will have to login as a normal user and
then use su to become root, it can help prevent brute-force password attacks. This does make it impossible to
run scp and sftp as root, which can be annoying—though it is certainly more secure. An alternative is to set
it to “without-password” which will allow direct root login using public key authentication.
• Protocol – Determines which ssh protocol (1 and/or 2) clients are allowed to connect with. For security, you
should make sure that this is just “2”, not “1,2”.
• X11Forwarding – Specifies whether X11 forwarding is permitted. Usually yes in modern distributions.
sshd is usually configured to observe the access specifications in the standard server access control files /etc/hosts.deny
and /etc/hosts.allow. These files provide an important mechanism to prevent automated brute-force password
attacks against an ssh server. The best security is provided by having /etc/hosts.deny contain the line ALL:ALL
and then having an sshd line in /etc/hosts.allow, e.g. sshd: 192.168.1.1, to permit logins from
only certain ranges of IP addresses or hostnames.
The initial key discovery On first establishing contact, the other end of the connection reveals its public hostkey
fingerprint. When warned that the authenticity of the machine has not been verified, you need to answer yes, and then you
will be prompted to enter the password. This initial key discovery process exists to ensure security. It is possible for an
attacker to steal information from a remote user login by impersonating the server: if the attacker can provide a
server with the same host name and user authentication, the user connecting from the remote machine will be logged
into a fraudulent machine and data may be stolen. Each server has a randomly generated RSA server key. To ensure
security, in cases where the server key changes, the SSH client will issue a serious warning reporting that the host
identification has failed, and it will stop the login process.
The remote system’s hostkeys are stored in the ~/.ssh/known_hosts file. The next time you log in to the
machine, SSH will check to see whether the key is unchanged and, if not, will refuse to cooperate: WARNING:
REMOTE HOST IDENTIFICATION HAS CHANGED!. This warning could indicate a deliberate man-in-
the-middle attack, but a changed hostkey often has a more harmless explanation: the administrator changed the key
or reinstalled the system. If you are sure that the explanation is harmless, you can launch a text editor, open the
~/.ssh/known_hosts file, and delete the entry in question.
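Instead of editing the file by hand, you can also remove the offending entry with ssh-keygen (substitute the actual host name or IP address):
$ ssh-keygen -R hostname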
Using Public-Key Authentication An alternative to standard password authentication to connect to a remote ssh
server is to use public key authentication. With this approach, a user creates a public-private key pair on their local
machine, and then distributes the public key to all the machines he wants to connect to. When an SSH client attempts to
connect to one of these machines, the key pair will be used to authenticate the user, instead of the user being prompted
for his password.
A key pair is generated with the ssh-keygen command. One of the decisions to make when creating a key
pair is whether it will have a passphrase or not. If you create a key with a passphrase, you will be prompted for
this passphrase every time you attempt to connect to the server you copied the key to. This will not appear much
different from password authentication, though it allows you to use the same passphrase to login to multiple remote
machines rather than having to remember different passwords for each of them. If you choose to create a key without
a passphrase, you will be logged in to a remote server without having to type a password or passphrase. This provides
extreme ease of use. However, it also means that should anyone get ahold of your private key, they would be able to
log in to any machine where you set the public key up. OpenSSH allows you to generate a new identity key pair (RSA or
DSA keys) as an ordinary unprivileged user and store it in the ~/.ssh directory on your local workstation.
$ ssh-keygen -t rsa
Distributing the public key is accomplished by appending it to the ~/.ssh/authorized_keys file in the
user’s directory on each remote machine. This can be accomplished (via password authentication) using this ssh
command:
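$ cat ~/.ssh/id_rsa.pub | ssh username@hostname 'umask 077; cat >> ~/.ssh/authorized_keys'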
(your key may be in the file id_dsa.pub instead). Note that the umask 077 is in the command because SSH
requires that this file be readable only by the owner.
You can also copy your new public key (id_rsa.pub) using the ssh-copy-id command:
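$ ssh-copy-id -i ~/.ssh/id_rsa.pub username@hostname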
ssh-copy-id copies identity keys in the correct format, makes sure that file permissions and ownership are correct,
and ensures you do not copy a private key by mistake.
Key-based authentication is good for brute-force protection, but entering your key passphrase at each and every
new connection does not make your life any easier than using regular passwords.
A more secure but still relatively easy to use alternative, is to create your key pair with a passphrase, but then set
up the ssh-agent program, which can automatically supply your passphrase when requested. Once ssh-agent
is running, you use the ssh-add program to add your key(s). It is at this point that you are prompted for your
passphrase(s). From this point on in the session, whenever an SSH client would need a passphrase, the agent supplies it.
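A minimal manual session might look like this (the key path and host are illustrative):
$ eval "$(ssh-agent)"      # start the agent and export SSH_AUTH_SOCK for this shell
$ ssh-add ~/.ssh/id_rsa    # prompts once for the key passphrase
$ ssh [email protected]       # later logins in this session use the cached key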
Normally the ssh-agent will be set up to start automatically on login. This program is a session-based daemon
that caches the current user’s decrypted private key. When the ssh client starts running, it tries to identify a running
instance of ssh-agent that may already hold a decrypted private key and uses it, instead of asking again for the
key’s passphrase. When starting the first instance of ssh-agent on your system, you have to manually add the keys
that ssh-agent should keep in its cache. Starting ssh-agent usually happens in one of the session startup files such as
~/.bash_profile. This means a new instance of ssh-agent starts and asks for your key passphrase every time
a new terminal session is started. This inconvenience prompted the creation of keychain, an ssh-agent wrapper
that looks for a prior running instance before starting ssh-agent.
keychain improves on ssh-agent by checking for a running ssh-agent process when you log in and either
hooking into one if it exists or starting a new one up if necessary (Some X GUI environments also have something
similar available.) To set up keychain, first install the keychain and ssh-askpass packages for your system.
For Debian/Ubuntu, use this:
$ sudo apt-get install keychain ssh-askpass
Now edit your ~/.bash_profile file to include these lines:
#!/bin/bash
/usr/bin/keychain ~/.ssh/id_rsa ~/.ssh/id_dsa > /dev/null
source $HOME/.keychain/$HOSTNAME-sh
In the keychain line, you should include the paths to any keys that you want keychain to manage; check your ~/.ssh
directory. The output is routed to /dev/null to avoid printing the agent information to the screen every time you start
a new terminal. The last line sources ~/.keychain/$HOSTNAME-sh (which is created when keychain first runs, to store
the ssh-agent information) in order to access SSH_AUTH_SOCK, the variable that records the Unix socket used to talk to
ssh-agent. The first time the keychain process starts up (so, the first time you log in after setting this up), you will
be asked to enter the passphrases for any keys that you have listed on the keychain line. These are then passed to ssh-agent.
Once you have a bash prompt, try logging into a suitable machine (that is, one where you have this key set up) with
one of these keys. You won’t need to use your passphrase.
Now, if you log out and then log back in again, you shouldn't be challenged for your passphrases, because the
ssh-agent session will still be running. Again, you should be able to log in to that suitable machine without
typing your passphrase. The ssh-agent session should keep running until the next time you reboot.
This long-term running may be seen as a security issue. If someone managed to log in as you, they’d automatically
be able to use your keys as well. If you use the --clear option for keychain, then every time you log in, your passphrases
will be cleared, and you will be asked for them again. Substitute this for the earlier keychain line:
/usr/bin/keychain --clear ~/.ssh/id_rsa ~/.ssh/id_dsa > /dev/null
The passphrases are cleared when you log in, not when you log out. This feature means that you can set your
cronjobs, and so on, to get the passphrases via ssh-agent, and it won’t matter if you’ve logged out overnight.
SSH even understands how to forward passphrase requests back through a chain of connections to the ssh-agent
running on the ultimate originating machine (i.e., if you ssh from localhost to remote1 and from there ssh to remote2,
the passphrase request can be sent back to localhost). This is known as agent forwarding, and it must either be enabled
in the client configuration files on the machines, or the -A option to ssh must be used when each connection is started.
Executing remote commands The main purpose of SSH is to execute commands remotely. As we have already
seen, immediately after a successful SSH log-in, we’re provided with the remote user’s shell prompt from where we
can execute all sorts of commands that the remote user is allowed to use. This ‘pseudo’ terminal session will exist
as long as you’re logged in. It is also possible to execute commands on a one-at-a-time basis without assigning a
pseudo-terminal, as follows:
$ ssh [email protected] ‘uname -a’
[email protected]’s password:
Linux 192.168.0.1 2.6.28-9-generic #31-Ubuntu SMP Mon Jul 01 15:43:58 UTC
2013 i686 GNU/Linux
Thus, the ssh client can be used to execute a single command line on a remote machine. The basic format for a
command to do this is:
ssh username@hostname ’COMMAND_LINE’
COMMAND_LINE is any legal Bash command line, which may contain multiple simple commands separated by
“;” or pipelines, etc. Note that unless COMMAND_LINE is a simple command, it is good to quote it as shown.
For example, to count the number of files in username’s Documents directory on hostname:
ssh username@hostname ’ls -1 Documents | wc -l’
or
ssh username@hostname ’ls -1 Documents’ | wc -l
The difference between these examples is that the first command does the counting remotely, while the second does
it locally. One of the most versatile aspects of the remote command capability is that you can use it in conjunction
with piping and redirection.
Running X Window applications remotely Another interesting feature is that SSH can do X11 session forwarding.
To enable this feature you have to use the -X parameter1 of the client. This parameter provides a form of tunneling,
meaning “forward the X connection through the SSH connection”. Example:
$ ssh -X [email protected]
If this doesn’t work, you may have to setup the SSH daemon (sshd) on the remote host to allow X11Forwarding.
Check that the following lines are set in the /etc/ssh/sshd_config file on that host:
X11Forwarding yes
X11DisplayOffset 10
X11UseLocalhost yes
If you modify the sshd configuration file, you will need to restart the SSH service. You can do this by typing the
following command:
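$ sudo /etc/init.d/ssh restart
(or, equivalently, on Debian-like systems: sudo service ssh restart)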
Port forwarding On the other hand, like X11 session forwarding, SSH can also forward other TCP ports, both,
forward and backwards across the SSH session that you establish. As an illustrative example, let us consider the
scenario of Figure 28.7. In this example, we want to use the TCP port 8080 in our localhost to send traffic to an
apache WEB server that is running on 192.168.0.3:80. To do so, we are going to use a SSH tunnel through the host
192.168.0.2. The command to accomplish this is:
$ ssh -L <tunnel-listening-port>:<connect-to-host>:<connect-to-port> user@tunnel-host
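With the values of the example in Figure 28.7 (listening on local port 8080, tunneling through 192.168.0.2, destination 192.168.0.3:80; the user name is illustrative):
$ ssh -L 8080:192.168.0.3:80 [email protected]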
[Figure 28.7: SSH tunnel for port forwarding. The ssh client (192.168.0.1) listens on local TCP port 8080; traffic sent to that port is carried through the encrypted SSH tunnel (client port 11300 to sshd listening on port 22 of 192.168.0.2), and sshd then connects as a TCP client to the connect-to-host 192.168.0.3 on port 80.]
You can also create a reverse port forwarding (see Figure 28.8). To set up a reverse tunnel, you can type:
$ ssh -R <tunnel-listening-port>:<connect-to-host>:<connect-to-port> user@tunnel-host
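With the values used in the example below (the user name is illustrative):
$ ssh -R 2222:192.168.0.3:22 [email protected]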
• When any process creates a connection to the port <tunnel-listening-port> (2222 in our example) in
the SSH server, sshd sends the received data through the SSH connection back to the ssh client (192.168.0.1:11300).
• Then, the ssh client establishes a new TCP connection, using a new client port (32244), to
<connect-to-host>:<connect-to-port> (192.168.0.3:22).
1 For some newer programs and newer versions of X windows, you may need to use the -Y option instead for trusted X11 forwarding.
• Again, from the point of view of <connect-to-host> (192.168.0.3), it is as if the connection came from
the SSH client (192.168.0.1) and this second TCP connection is not encrypted.
We must point out that the <tunnel-listening-port> is only available to local processes in a SSH tunnel.
Another way to express this is to say that these ports are bound to the local loopback interface only.
Finally, we have some other useful options for setting up SSH tunnels. The “-n” option tells ssh to associate standard
input with /dev/null, “-N” tells ssh to just set up the tunnel and not to prepare a command stream, and “-T” tells ssh
not to allocate a pseudo-tty on the remote system. These options are useful for tunnels because, unlike in a normal SSH
login session, usually no actual commands are sent through a tunnel. Thus, we can use the following command to set
up our previous reverse tunnel using these parameters:
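$ ssh -nNT -R 2222:192.168.0.3:22 [email protected]

Copying files with scp The scp (“secure copy”) command transfers files over an SSH connection with a cp-like syntax. As an illustration (myfile, username and hostname are placeholders), a local file can be copied to the home directory of a remote user with:
$ scp myfile username@hostname: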
Note how the lack of a destination filename just preserves the original name of the file. This is also the case if the
remote destination includes the path to a directory on the remote host. To copy the file back from the server, you just
reverse the from and to:
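$ scp username@hostname:myfile .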
If you want to specify a new name for the file on the remote host, simply give the name after the colon on the to
side:
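$ scp myfile username@hostname:mynewname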
If you want to copy a file to a directory relative to the home directory for the remote user specified, you can type:
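$ scp myfile username@hostname:Documents/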
You can also use absolute paths, which are preceded by a “/”. To copy a whole directory recursively to a remote
location, use the “-r” option. The following command copies a directory named “Documents” to the home directory
of the user on the remote host.
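$ scp -r Documents username@hostname: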
Sometimes you will want to preserve the timestamps of the files and directories and, if possible, the users, groups
and permissions. To do this, use the “-p” option.
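$ scp -p myfile username@hostname: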
SFTP stands for Secure File Transfer Protocol. It is a secure implementation of the traditional FTP protocol with
SSH as the backend. Let us take a look at how to use the sftp command:
sftp user@hostname
For example:
[email protected]:~$ sftp 192.168.0.2
Connecting to 192.168.0.2...
[email protected]’s password:
sftp> cd django
sftp> ls -l
drwxr-xr-x 2 user user 4096 Apr 30 17:33 website
sftp> cd website
sftp> ls
__init__.py __init__.pyc manage.py settings.py settings.pyc
urls.py urls.pyc view.py view.pyc
sftp> get manage.py
Fetching /home/user/django/website/manage.py to manage.py
/home/user/django/website/manage.py 100% 542 0.5KB/s 00:01
sftp>
If the port for the target SSH daemon is different from the default port, we can provide the port number explicitly as
an option, i.e., -oPort=port_number. Some of the commands available under sftp are:
• cd—to change the current directory on the remote machine
• lcd —to change the current directory on localhost
• ls—to list the remote directory contents
• lls—to list the local directory contents
• put—to send/upload files to the remote machine from the current working directory of the localhost
• get—to receive/download files from the remote machine to the current working directory of the localhost
• sftp also supports wildcards for choosing files based on patterns.
28.6 Practices
Exercise 28.1–
Start the scenario basic-netapps on your host platform by typing the following command:
host$ simctl basic-netapps start
This scenario has two virtual machines virt1 and virt2. Each virtual machine has two consoles (0 and 1)
enabled. E.g. to get the console 0 of virt1 type:
host$ simctl basic-netapps get virt1 0
What is the message you get? Note the key fingerprint of the machine before accepting the connection. What is
the difference between the fingerprint and the public key? Why did ssh give you the fingerprint? Could this be
a “man in the middle” attack?
4. On virt2, look into the ~/.ssh directory. Which files are there and what is their content?
What effect do they have on connecting to virt1?
5. Close the ssh session and start it again. What is the message you get now? Why?
6. Close the ssh session. Modify the corresponding files in virt2 to recreate a “man in the middle” attack. Which
file did you modify?
Exercise 28.2–
1. Capture in tap0 and open an SSH session from virt2 to virt1 with the command:
3. Explain the content of the client Key Exchange packet and the content of the server Key Exchange packet.
4. Does your trace contain Diffie–Hellman key exchange packets? What is the purpose of these packets?
5. Are the payload and the message authentication code (MAC) encrypted together or separately?
6. Using the time command and openssl, write a simple script to find out which is the fastest encryption
algorithm. Configure the ssh service to use this algorithm. Which files did you modify?
Exercise 28.3– So far to log in a remote machine using SSH we have used a password as authentication method. Now
let’s set up user accounts so you can log in using SSH keys.
1. Generate both a DSA and an RSA key pair. Make sure you use a passphrase. Which files have been created?
What is their purpose? Are the permissions configured correctly? Where is the passphrase stored?
4. Once you have successfully configured public key authentication, you will need to enter the passphrase
each time you open an SSH session to virt1. Configure the ssh-agent to avoid entering the passphrase
each time. Which commands did you have to use, and on which machine?
5. Close the ssh session and the terminal where the ssh-agent was running. Open a new session: do you need to
enter the passphrase again? How could you solve that?
Exercise 28.4–
In this exercise set you will practice transferring files between systems using the secure copy (scp) and secure
ftp (sftp) commands. Although both scp and sftp can be used to transfer files, each has its own strengths and
weaknesses.
1. Create a text file in virt2 and copy it to virt1 using both sftp and scp. Specify the commands used. Can
you transfer a file using sftp in a single command line? If so, specify the command.
2. Create another text file in virt2 in your home directory. Using a single sftp command create a ~/tmp/
directory in virt1, copy there the file and transfer it back to virt2 and move it to the desktop.
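As a hint, possible one-line transfers from virt2 (the file and user names are illustrative) include:
virt2$ scp note.txt user@virt1:~/
virt2$ echo "put note.txt" | sftp user@virt1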
Exercise 28.5–
In this practice, we are going to examine two ways of using an SSH session to establish X connections between a
graphical application and an X server. For this purpose, we are going to use two Linux machines: your physical host
and a virtual guest. Both machines will be connected with an uml_switch. Then, we will use the xeyes graphical
application in the guest to establish an X connection to the X server of the host to view the “eyes”. You have to try the
two following ways of doing this:
host$ ssh -X [email protected]
guest$ xeyes
or:
host$ xhost +192.168.0.2
host$ ssh [email protected]
guest$ export DISPLAY=192.168.0.1:0.0
guest$ xeyes
1. Explain in detail the differences between these two ways of establishing the X connection between the xeyes
running in the guest and the X server running in the host. Use netstat and wireshark to provide a detailed
explanation of how the different TCP connections are established.
2. Explain why, when you use the first way (with the -X option of SSH), you do not need the xhost
command.
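As a hint for the analysis, the X server serving display :0 listens on TCP port 6000, so for the second (xhost-based) way you can inspect the established connection from the host with something like:
host$ netstat -tn | grep :6000
whereas with ssh -X, on the guest, sshd listens on a proxy display (typically 127.0.0.1:6010 for display :10):
guest$ netstat -tln | grep :6010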
Part VI
Appendices
Appendix A
Ubuntu in a Pen-drive
Install
In this appendix we show how to install Ubuntu on a USB pen-drive. Once the system has been installed, we
configure the system properly in order to extend the life of the USB device as much as possible. Important note. This
installation procedure is only valid for x86 architectures, which means that it is not valid for Mac computers.
Let’s start with the process. First, we must insert the Ubuntu CD and configure the BIOS of our computer to boot
from the CD unit. Next, we have to choose the language for the installation, then select “Install Ubuntu” and choose the
language for the OS (Figure A.1).
Then, we select the time zone and the keyboard layout (Figure A.2).
Now, we must select the disk and the partition in which we want to install the OS.
This step is tricky, because if we make a mistake when selecting the disk/partition we can
cause data loss in the system.
Figure A.3: Selecting the installation disk and partition
We must properly identify the disk device (see Section 6.13).
Figure A.5: Identifying the disk device
In this sample installation, /dev/sda is our SATA hard disk, so the disk selected is /dev/sdb, which is the device of
our USB pen-drive. Note. The size of the disks can help us to correctly select the device of the pen-drive.
The next step requires filling in the fields for our account (see Figure A.6). Note. This account will be by default
in the “sudoers” file with administration capabilities.
The next step allows us to import documents and settings from programs (like Firefox) or other OSs (like Windows)
already installed on the disk units. In this case, we will press “Next”, as we do not want to import anything.
We are at the last step before starting the installation. This window reports all the parameters we have set. However,
it also has a button called “Advanced”. We must press this button.
Figure A.8: Final installation step
In this window we have the possibility of selecting where we want to install the boot loader (Grub2 in the current
Ubuntu version). In this case, we simply need to specify /dev/sdb, which is the device in which we are doing the
installation.
GRUB (GRand Unified Bootloader) is a multi-boot manager used in many “Linux distros” to start several operating
systems within the same computer. It is very important to perform this final step correctly, because with this window
we modify the boot sector of the corresponding hard disk (the USB pen-drive), called the MBR (Master Boot Record),
so that it properly points to the boot loader code. If we miss this step, we will install the boot loader over the
default disk, probably /dev/sda, and the computer will be able to boot neither from its main disk (/dev/sda)
nor from the USB pen-drive (/dev/sdb).
Once the installation is complete, we might have to modify our BIOS settings to select the USB device as primary
boot device.
• The simplest tweak is to mount volumes using the noatime option. By default, Linux writes the last-accessed-time
attribute to files, which can reduce the life of your disk by causing a lot of writes. The noatime mount option
turns this off. Ubuntu uses the relatime option by default; for your disk partitions, replace relatime with noatime
in /etc/fstab (see the example after this list).
• Another tweak is using a ramdisk instead of a physical disk to store temporary files. This will speed things up
and will protect your disk at the cost of a few megabytes of RAM. We will make this modification in our system,
so edit your /etc/fstab file and add the following lines (a typical set, assuming the standard temporary directories):
tmpfs /tmp tmpfs defaults,noatime,mode=1777 0 0
tmpfs /var/tmp tmpfs defaults,noatime,mode=1777 0 0
These lines tell the kernel to mount these temporary directories as tmpfs, a temporary file system. We will avoid
unnecessary disk activity and, in addition, when we halt the system we will automatically get rid of all temporary
files.
• The final tweak is related to our web browser. If you use Firefox, this browser puts its cache in your home
partition. By moving this cache to RAM you can speed up Firefox and reduce disk writes. Complete the previous
tweak to mount /tmp in RAM, and you can put the cache there as well. Open about:config in Firefox, right-click
in an open area, create a new string value called browser.cache.disk.parent_directory,
and set the value to /tmp (see Figure A.10). When we reboot the system, all the previous changes will be active.
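As an illustration of the first tweak, a root-partition entry in /etc/fstab using noatime could look like this (the device name is illustrative):
/dev/sdb1 / ext3 noatime,errors=remount-ro 0 1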
Figure A.10: Firefox cache in /tmp
Appendix B
Answers to Practices
Exercise 4.1
2. Type: ps -o pid,tname,cmd
3. The option comm shows the command name without parameters, while cmd shows the command line including its parameters.
4. The parent is a bash, the grandparent is a gnome-terminal, and finally these are children of init.
5. The new windows or tabs are not new gnome-terminals; they belong to the same process, but we get new bash processes.
6. Each xterm terminal is a new process connected with its corresponding bash.
7. The xeyes and the xclock are executed. The former is in foreground, while the latter is in background.
17. $ ps -ñ || ps
The first sleep gives an error, the second sleep is not executed; ls is executed and ps is not executed.
Exercise 4.2
3.
#!/bin/bash
trap "echo waiting for operand" USR1
echo Type the number to be multiplied by 7:
read VAR
echo The multiplication of $VAR by 7 is $((VAR*7)).
To figure out the correct PID to which to send the kill signal, you can launch the script in the background. Then, you
can return the script execution to the foreground, as shown below.
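For instance (the script name and PID are illustrative):
$ ./multiply7.sh &
[1] 1234
$ kill -USR1 1234
$ fg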
Exercise 6.1
1. $ cd ../../etc
2. $ cd /home/XXXX
3. $ cp ../../etc/passwd .
5. $ rmdir dir[AB][12]
or
$ rmdir dir[AB]?
6. $ rmdir dirC?
7. $ touch temp
8. $ cat temp
You can also use less or more
9. $ stat temp
Exercise 6.2
1. $ touch orig.txt
$ ln -s orig.txt link.txt
2. $ vi orig.txt
7. $ cat link.txt
cat: link.txt: No such file or directory
When editing through the link, a new file orig.txt is created.
9. $ ln orig.txt hard.txt
$ stat hard.txt
Each file has two links. In fact, they are the same file.
14. ñ in UTF-8 is 0xc3b1; a = 0x61 and b = 0x62 (both in ISO and UTF-8)
15. 0x0a is LF
16. 0x0d is CR
$ hexdump text2.txt
0000000 6261 b1c3 0a0d
0000006
18. ñ = 0xf1 in ISO-8859-1
Exercise 8.1
1. $ ls -R > mylist.txt
$ echo "CONTENTS OF ETC" >> mylist.txt
$ tail -10 mylist.txt
3. $ ls /bin | wc -l
4. $ ls /bin | head -n 3
$ ls /bin | head -n 3 | sort -r
Exercise 8.2
2. Assume that the first terminal (t1) is /dev/pts/1 and the second terminal (t2) is /dev/pts/2. Then:
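One possible command, typed in t1, would be:
t1$ cat - > /dev/pts/2
With this, everything typed in t1 appears in t2.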
Note. This also works without the dash in cat, as this command reads from stdin by default.
Exercise 8.3
With these commands we create a loop, so CPU usage is high.
Exercise 8.4
Exercise 8.5
2. $ fuser -k /etc/passwd
4. You see nothing in the output and nothing in file.txt. This is because the bash of the first pseudo-terminal keeps
file descriptors open on files that no longer exist.
Exercise 20.1
Exercise 20.2
Exercise 20.3
Exercise 20.4
Exercise 20.5
Exercise 20.6
In the binary transfer there is a conversion and a reconversion from LF to CRLF. The Linux client leaves the file as the original.
Exercise 22.1
This script provides information about the script. It is self-explanatory.
Exercise 22.2
This script removes temporary files from our working directory. It asks the user for confirmation and uses a
case statement to perform the correct action. It also manages three different exit statuses.
Exercise 22.3
#!/bin/bash
# AUTHOR: teacher
# DATE: 4/10/2011
# NAME: hypotenuse2.sh
# SYNOPSIS: hypotenuse2.sh c1 c2
# DESCRIPTION: Calculates the squared hypotenuse of a rectangular triangle.
# HISTORY: First version
hyp2(){
local c1 c2
c1=$1; c2=$2
((c1*=c1))
((c2*=c2))
echo "The squared hypotenuse of cathetus $1 and $2 is: $((c1+c2))"
}
if [ $# -ne 2 ]; then
echo "Usage: $0 c1 c2"
exit 1
fi
hyp2 $1 $2
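For example, running the script:
$ ./hypotenuse2.sh 3 4
The squared hypotenuse of cathetus 3 and 4 is: 25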
Exercise 22.4
We find out if the character hexchar is printable. If the character is not printable, we return an error code. Printable
characters are those with ASCII codes 0x21–0x7E (33–126).
Exercise 22.5
This script searches recursively for a file starting from the working directory.
#!/bin/bash
# AUTHOR: teacher
# DATE: 4/10/2011
# NAME: recfind.sh
# SYNOPSIS: recfind.sh file_to_be_found
# DESCRIPTION: Search recursively a file from the working directory
# HISTORY: First version
# Function: search_in_dir
# Arguments: search directory
function search_in_dir() {
local fileitem
# Enter in the working directory
[ $DEBUG -eq 1 ] && echo "Entering $1"
cd "$1"
# For each file in the current directory we have to figure out
# if the file is a directory or if it is a regular file.
# If the file is a directory,
# we call the function with the found directory as argument.
# If the file is a regular file,
# we look if the file is the file we were looking for.
for fileitem in *
do
if [ -d "$fileitem" ]; then
search_in_dir "$fileitem"
elif [ "$fileitem" = "$FILE_IN_SEARCH" ]; then
echo "$(pwd)/$fileitem"
fi
done
[ $DEBUG -eq 1 ] && echo "Leaving $1"
# We exit the inspected directory
cd ..
}
# main
# We define a global variable that manages the printing
# of "debug" messages.
# If DEBUG=1, then we print debug messages
DEBUG=0
# The file to be found is taken from the first argument
FILE_IN_SEARCH="$1"
# The search will take place from the current directory to the children directories.
search_in_dir "$(pwd)"
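For instance, searching for a hypothetical file myfile.txt from the home directory would look like:
$ ./recfind.sh myfile.txt
/home/user1/docs/myfile.txt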
Exercise 22.6
#!/bin/bash
# AUTHOR: teacher
# DATE: 4/10/2011
# NAME: factorial.sh
# SYNOPSIS: factorial.sh arg
# DESCRIPTION: Calculates the factorial of arg
# HISTORY: First version
factorial() {
local I AUX H
I=$1
H=$1
if [ $I -eq 1 ]; then
echo 1
else
((I--))
AUX=$(factorial $I)
echo $((H*AUX))
fi
}
if [ $# -eq 1 ]; then
echo The factorial of $1 is: $(factorial $1)
exit 0
else
echo "Usage: $0 integer_num"
exit 1
fi
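For example:
$ ./factorial.sh 5
The factorial of 5 is: 120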
Exercise 24.1
uid is in the /etc/passwd file
gid is in the /etc/group file
user1@guest:~$ who
root tty0 Sep 3 02:30
First, we need to set a password for user1 with the passwd command, then exit and log in again as user1.
user1@vm:~$ who
user1@vm:~$ last
user1@vm:~$ id
6. user1@guest:~$ su
guest# groupadd someusers
guest# cat /etc/group
guest# usermod -G someusers -a user1
guest# usermod -G someusers -a user2
guest# usermod -G someusers -a user3
guest# exit
user1@guest:~$ id
uid=1000(user1) gid=1000(user1) groups=1000(user1)
As shown by the id command, user1 still does not belong to the group someusers. As you can see, these changes
are not effective for currently logged-in users.
7. When you switch to user1, you now see that the user belongs to the group “someusers”.
10. /etc/fstab:
proc /proc proc defaults 0 0
devpts /dev/pts devpts mode=0622 0 0
/dev/ubda / ext3 defaults 0 1
/dev/ubdb /mnt/extra ext3 defaults 0 1
guest# mount
/dev/ubda on / type ext3 (rw)
/dev/ubdb on /mnt/extra type ext3 (rw)
guest# ls /mnt/extra/etc/
passwd
13. We create root2 and mount /dev/ubda. As shown, a single device can be mounted several times at different mount
points. Then, we change the root of the filesystem to /mnt/extra/root2. We still see with the mount command that
/dev/ubda is mounted twice, but we see nothing inside /mnt/extra. Finally, we get out of the chroot with exit and
unmount. Note. chroot is used to change the root of the system. This allows, for example, using another
filesystem containing a different Linux distro (like a distro on a CD).
17. links.sh:
#!/bin/bash
ln -s /mnt/extra /home/user1
ln -s /mnt/extra /home/user2
ln -s /mnt/extra /home/user3
# chmod u+x links.sh
# ./links.sh
Exercise 28.5