Advanced UNIX Tutorial: Pipes and Redirection
There are numerous UNIX tools available at CAEN that you may find useful. The goal of this
tutorial is to help familiarize you with some of these tools to help you with your computing needs.
Note that for very few of the commands mentioned here do we cover all of the switches, flags and
variations. If you are interested in more detail about any of these commands, it is
advisable to consult the UNIX man pages (man command).
Some of the examples in this tutorial also make use of the GNU utilities. If you have not already
selected the GNU software, you may do so now with the command swselect. Within swselect,
choose GNU Compilers/Utils, then ALL. To activate these changes, type source ~/.software.
Directory Maintenance
Wildcards
You can select a subset of files within a directory for a command to act on by using
regular expressions and wildcards. Common wildcards include the asterisk (*) and question mark (?).
The asterisk can represent zero or more characters. The question mark represents exactly one
character. Examples of these are as follows:
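The two wildcards can be tried safely in a scratch directory. The following is a minimal sketch written for a Bourne-style shell (sh or bash), not csh; the file names are hypothetical and exist only for the demonstration.

```shell
# Work in a temporary directory so no real files are touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch notes.txt notes1.txt notes2.txt notes10.txt memo.doc

star=$(echo notes*.txt)   # * matches zero or more characters
one=$(echo notes?.txt)    # ? matches exactly one character
docs=$(echo *.doc)        # every name ending in .doc

echo "notes*.txt matched: $star"
echo "notes?.txt matched: $one"
echo "*.doc matched:      $docs"

# Remove the scratch directory again.
cd / && rm -rf "$demo"
```

The asterisk pattern picks up all four notes files, including notes.txt where it matches zero characters, while the question mark pattern picks up only the names with exactly one character between notes and .txt.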
Other infrequently used wildcards include the square brackets ( [ ] ) and the curly braces ( { } ). The
square brackets are used to specify a range of characters such as the lowercase letters a through z
with [a-z], or the numbers 0 through 9 with [0-9]. The curly braces select files that could contain one
of a group of possible words such as either the word doc or txt with {doc,txt}. Examples follow:
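A short sketch of both forms, again using hypothetical file names in a scratch directory. It assumes a Bourne-style shell; note that the curly braces are a csh and bash feature and are not expanded by every sh implementation.

```shell
# Work in a temporary directory so no real files are touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.log b.log c.log d.log chap1.txt chap2.txt chap9.doc chapter.txt

letters=$(echo [a-c].log)    # one letter in the range a through c
digits=$(echo chap[0-9].*)   # "chap", exactly one digit, then any extension

echo "[a-c].log matched:   $letters"
echo "chap[0-9].* matched: $digits"

# Braces list alternatives. In csh or bash this expands to
# report.doc report.txt whether or not those files exist.
echo report.{doc,txt}

# Remove the scratch directory again.
cd / && rm -rf "$demo"
```

Square brackets match against existing file names, whereas braces simply generate each listed alternative, which is why the last line works even though no report files were created.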
grep
Grep is a utility that searches through one or more files for a string of text. Grep supports a form
of regular expressions, but in its simplest form the syntax is as follows: grep "string to search for"
filename(s). The output of grep is the lines that contain the string. You may use wildcards for the file names.
If a string has spaces or other characters that may be interpreted by the shell, you will need to put
quotes around it. Also, if no file is specified, grep will search through standard input (stdin). Now try
out a few commands using grep:
grep animal *                Search through all files in the present directory for the word animal.
grep "Chapter 2" *.doc       Search through all files that end in .doc for the string Chapter 2.
cat summer.txt | grep rose   Display the lines in the file summer.txt that contain the word rose.
ps -elf | grep uniqname      List only those processes that the user with the login ID uniqname
                             has running.
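To see what grep prints in each case without depending on files you already have, the commands can be run against small sample files. This is a sketch for a Bourne-style shell; the file contents below are made up for illustration.

```shell
# Work in a temporary directory with two hypothetical sample files.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Chapter 1\nThe rose garden\nChapter 2\n' > summer.txt
printf 'Chapter 2 draft\nanimal facts\n' > notes.doc

# Quotes keep the space inside the search string.
quoted=$(grep "Chapter 2" *.doc)

# With no file argument, grep reads standard input from the pipe.
piped=$(cat summer.txt | grep rose)

# When several files are searched, each line is prefixed with its file name.
many=$(grep Chapter *)

echo "$quoted"
echo "$piped"
echo "$many"

# Remove the scratch directory again.
cd / && rm -rf "$demo"
```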
find
The find command locates files recursively within a directory structure. The find
command has numerous options, but -name, -exec, and -print are the most common. With the find
options, you can search for files by file name, file type, modification time and so forth.
The syntax of the find command is as follows: find directoryname options. To display the results
of the find command, include the -print option. On some older versions of find, omitting this option
means the search still runs but the files found are never displayed on the screen. The -name option,
followed by a wildcard pattern or a specific file name, instructs find to locate files whose names match.
If you use wildcards with the -name option, you must escape them with a backslash. If you do not,
they will be interpreted by the shell and not passed on to find. The -exec switch is useful for running a
command on every file found (for example, -exec command {} command_options \; ). The curly
braces indicate where find is to insert the name of each file found, the \; tells find where the
command ends, and the text between -exec and \; is the command that will be executed on each
file found. Examples of the find command follow:
find . -name cisco -print             Search through the current directory and all directories
                                      under it for files named cisco and print the results
                                      to the screen.
find . -name \?\?e\* -print           Search for any file that matches the wildcard ??e* (all files
                                      that have e as their third character) and print the results.
find . -name \*.txt -exec cat {} \;   Search for all files that end in .txt and cat those files
                                      that match.
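The escaping rule and the -exec switch can be checked against a small directory tree. The sketch below assumes a Bourne-style shell and uses a hypothetical tree built in a scratch directory. It also shows single quotes, which protect a wildcard from the shell just as a backslash does.

```shell
# Build a small hypothetical directory tree to search.
demo=$(mktemp -d)
mkdir -p "$demo/docs/old"
echo "top"    > "$demo/readme.txt"
echo "nested" > "$demo/docs/old/plan.txt"
touch "$demo/docs/cisco"

# A backslash or single quotes both pass the wildcard through to find.
found_bs=$(find "$demo" -name \*.txt -print | sort)
found_q=$(find "$demo" -name '*.txt' -print | sort)

# -exec runs cat once per match; {} is replaced by each file name found.
contents=$(find "$demo" -name '*.txt' -exec cat {} \; | sort)

echo "$found_q"
echo "$contents"

# Remove the scratch tree again.
rm -rf "$demo"
```

Both forms of the -name pattern find the same two files, one of them two directories below the starting point.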
tar
Originally short for tape archive, tar is now mainly used to consolidate a group of files and directories into
one file, or to extract files from a previously tarred file. To consolidate the files in the
current directory into a single file called cons.tar, use the command tar -cvf cons.tar *.
Note that tar does not compress these files; it merely stores them in this one file. The switches at
the front of the command are short for create/verbose/file. When you consolidate a series of files
into a single tarred file, the originals are not deleted.
To extract the files from within the tarred file games.tar, issue the command tar -xvf games.tar. All
of the files and directories stored within games.tar would be extracted into the current directory.
Tar will create directories while it is extracting so that the original hierarchy of files and directories
is preserved upon extraction.
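A round trip makes these points concrete: the originals survive, and the directory hierarchy is recreated on extraction. This is a sketch for a Bourne-style shell using hypothetical files in scratch directories. The t switch, which lists an archive without extracting it, is not covered above but is a standard tar option.

```shell
# Hypothetical source tree and a separate destination directory.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/games/saves"
echo "level 3" > "$src/games/saves/slot1"
echo "rules"   > "$src/games/readme"

# c = create, f = archive file name (v is omitted here to keep the output quiet).
(cd "$src" && tar -cf "$dest/games.tar" games)

# t lists the contents of the archive without extracting anything.
listing=$(tar -tf "$dest/games.tar")
echo "$listing"

# x = extract into the current directory, recreating subdirectories as needed.
(cd "$dest" && tar -xf games.tar)
restored=$(cat "$dest/games/saves/slot1")
echo "restored file contains: $restored"
```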
gzip/gunzip
These two utilities compress and expand large files to save disk space. The
syntax is straightforward: gzip options filename or gunzip filename. Gzip compresses the
given file. One frequently used option is the speed/size compression switch, a
number from -1 through -9. The -9 switch compresses the file into as small a file as the algorithm
supports, taking as much time as necessary. The -1 switch compresses the file as quickly as
possible at the cost of somewhat looser compression. Gzip adds a .gz extension to the original
file name and, upon completion, erases the original file.
A good candidate for compression is a tarred file. Frequently, you will see a file ending in .tar.gz or
.tgz. This indicates a tarred file that has also been gzipped. Gunzip uncompresses a gzipped file.
You do not need to specify any options here because gunzip determines how to decompress from the
compressed file itself.
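The effect on the file names and sizes is easy to observe with a throwaway file. The sketch below assumes a Bourne-style shell; log.txt is a hypothetical file filled with repetitive text so that it compresses well.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Write 2000 identical lines as sample data.
i=0
while [ "$i" -lt 2000 ]; do
    echo "the same line again"
    i=$((i + 1))
done > log.txt
before=$(wc -c < log.txt)

gzip -9 log.txt                 # replaces log.txt with log.txt.gz
after=$(wc -c < log.txt.gz)
[ -f log.txt ] || echo "original removed: $before bytes became $after bytes"

gunzip log.txt.gz               # restores log.txt and removes log.txt.gz
final=$(wc -c < log.txt)
echo "after gunzip: $final bytes"
```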
Another useful shortcut is the bang operator (!) or exclamation point. You can use this special
character to retype a line or portion of a line that you have previously typed either just before this
command line or further back in the history of your work in UNIX. You can use the history
command to list the last 20 or so commands that you have recently typed with a historical number
next to each command.
Typing !! repeats the last command you entered. Typing !num repeats the num-th
command in your history. With !num you can also specify a negative number, which
retypes the command num commands before the present one. !char looks for the most recent command
starting with that character and retypes it. You may include more than one character to narrow
down the search. Lastly, you may repeat portions of a command. !:num repeats the
num-th parameter of the last command line. To repeat all of the parameters (every word except the
first, which is the command name), use !:*.
Job Control
UNIX is a multi-tasking operating system that allows multiple jobs to be running concurrently.
There are features of UNIX that let you control these jobs from commands entered at the shell
prompt or from the console.
First, a job is any application that you begin as a command from a shell prompt. Normally when you
execute an application, you continue to work with that program in this shell until it completes. Some
programs run so quickly that you just wait momentarily for them to finish executing. Some
commands, however, may take a long time to complete. These include compiling, database searches,
file retrieval via ftp, and so on. While these jobs are running, you would probably like to do
something else with the machine, for instance read mail or edit a file.
There are a couple of ways to place jobs in the background. The first, which is the easiest, is to append
an ampersand (&) to the end of the command. This puts the job in the background upon execution
and the shell prompt returns immediately. Another way is to type Ctrl-z while the job is running in
the foreground. This suspends execution of the job. Then you can type in the command bg. That
will put the most recently stopped job in the background. If you are interested in killing the job
altogether while it is in the foreground, type Ctrl-c.
The jobs command shows a list of all of your jobs that are currently running in the background
along with their job numbers. If you wish to observe and interact with a job, you can put it into the
foreground by typing fg %job_number. To stop, or suspend, a job from the job list, type stop
%job_number. To kill one of your jobs, type kill %job_number. The percent sign tells the shell that
the number is a job number rather than a process ID.
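The background and kill steps can be tried with a harmless command such as sleep. This is a sketch written as a Bourne-style shell script rather than for the interactive csh prompt: a script has no Ctrl-z, bg or fg, so it addresses the job by process ID through the $! variable, where an interactive session would use kill %1.

```shell
sleep 30 &                 # & starts the job in the background
pid=$!                     # process ID of that background job
jobs                       # list background jobs (format varies by shell)

kill "$pid"                # at an interactive prompt: kill %1
wait "$pid" 2>/dev/null    # collect the job so it is no longer listed
status=$?
echo "sleep ended with status $status"
```

The shell prompt (here, the script) continues immediately after the first line instead of waiting 30 seconds, which is the point of running the job in the background.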
Other useful job control commands that are worth looking into are ps, babysit, nohup, and nice. For
more information on job control, consult the CAEN Technote: Running Remote Jobs.
Important dotfiles that exist in your account include .login and .cshrc. The .login file is sourced
whenever you first log into a UNIX workstation. This file contains shell commands that may print
news messages, set up initial terminal specifications, set variables, or execute programs. The .cshrc
file is sourced every time a shell is started. This is helpful for setting local environment variables.
The standard CAEN dotfiles all reside in the /usr/local/skel directory and have the prefix std.
To make sure you have all of CAEN’s dotfiles in your account you can issue the redot (or
/usr/caen/bin/redot) command.
You can customize your X Windows environment by modifying the following three files in your
Public directory: .xsession, .Xresources and .mailcap. The .xsession file sets up the windows
when you first log in. The .Xresources file is used to set resource values (such as colors and scroll
bars). The .mailcap file sets up your email environment. If these files do not exist, they can be
copied from the /usr/local/skel directory to your Public directory.
You also will need to make sure there is a link from your home directory to each of these dotfiles.
Type ls -la to see if the links exist. To create a link if one does not exist, issue the ln command from
your home directory as follows:
ln -s Public/.Xresources .Xresources
Alternatively, you can issue the redot command to copy the standard files to your Public directory
and create the necessary links from your home directory.
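The check-then-link step can be repeated for all three files in one loop. This is a minimal sketch for a Bourne-style shell; it runs in a scratch directory that stands in for a home directory, so the layout is hypothetical and nothing in a real account is changed.

```shell
# Scratch directory standing in for a home directory.
home=$(mktemp -d)
mkdir "$home/Public"
touch "$home/Public/.xsession" "$home/Public/.Xresources" "$home/Public/.mailcap"

cd "$home" || exit 1
for f in .xsession .Xresources .mailcap; do
    # -h is true when the name already exists as a symbolic link.
    if [ ! -h "$f" ]; then
        ln -s "Public/$f" "$f"
    fi
done

# Links appear in the long listing with an arrow pointing to their target.
ls -la
```

Running the loop a second time creates nothing new, because every link is already in place.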
AFS is composed of file servers and clients. A file server stores files that can be accessed by other
computers. These other computers are called clients of the server. AFS defines a protocol that
automatically determines on which file server the client can find a desired file. AFS utilizes the Kerberos
authentication scheme to validate users’ rights to access files. Kerberos is an extension to the normal
UNIX authentication mechanism and significantly improves network security. Files are only
exchanged between a client and a server if both machines are able to recognize a valid electronic
authentication token (also known as a ticket).
Two important AFS commands to remember are fs (file service) and pts (protection service).
For more information on AFS as well as the commands that service this system, consult CAEN
Technote: Setting AFS File Permissions and Creating and Managing AFS Groups.
klog
The klog command allows you to obtain or extend an AFS token (privileges) for files in a particular
AFS cell. To obtain a token and ticket with write privileges to CAEN files, type klog -t options, or
klog -t -cell engin.umich.edu. To obtain privileges for IFS files, type klog -t -cell umich.edu.
You may use klog to obtain privileges in other AFS cells (at MIT, CMU, etc.) if you have computer
access accounts there also.
The klog -tmp -setpag command is different from the klog command in two ways. The obvious
difference is that it doesn’t allow you to enter a cell name, but instead follows the same automatic
algorithm that the remote login and Xdm login programs use, with the end result being that you may
be authenticated in the engin.umich.edu or the umich.edu cells, depending on the password you
enter.
The other difference between klog and klog -tmp -setpag is considerably more complicated, and is
described in the next section.
A PAG (process authentication group) is a group of UNIX processes (running programs) defined by a starting process (such as a
login shell or a sub-shell) and all of the processes that were started by that starting process. For
example, when you log in remotely, you are given a single UNIX shell (which displays the UNIX %
prompt for you and accepts commands). This shell, and all of the programs that you execute or start
within that shell, are part of the same PAG, unless you explicitly create a new PAG.
When you log in to a UNIX machine locally using the Xdm program, your .xsession file is executed
by a process. That process, and all of the programs that are started by that process (including all the
X applications that are started in your .xsession file) are part of the same PAG, unless you explicitly
create a new PAG.
There is one special PAG, which we will call the system PAG, which is the group of all processes that
aren’t part of another PAG. This will become important in a moment.
If you authenticate within a PAG, the privileges you receive are only usable by other processes in
that PAG. So, for example, if you log in and get a PAG and are authenticated in that PAG, your
privileges will only work while you remain in that PAG. If you log in again and receive a different
PAG, you will need to authenticate separately.
PAGs at CAEN
The CAEN Xdm, remote login program, and klog -tmp -setpag command all give you a new PAG
when they execute. Thus, each display login, each remote login using the remote login program, and
each klog -tmp -setpag shell has its own set of AFS privileges. If you log in using the rlogin or
rsh commands, however, you are not given a new PAG, so your privileges are associated with the
system PAG. This causes some potentially confusing effects:
• If you log in using the Xdm program on a machine’s display and open two xterm windows,
then use klog -tmp -setpag in one of the xterm windows, the unlog and klog commands
given in one window will not affect the privileges you have in the other window.
• If you log in using the Xdm program, and then log in to the same machine remotely, you will
have different PAGs in the two login sessions, so your privileges for each login session will
be independent.
• If you start a program in a shell, then use klog -tmp -setpag in the same shell, the privileges
you obtain after using this will not affect the privileges of the program you started previously
because the klog -tmp -setpag command has created a new PAG.
• For comparison, if you start a program in a shell, then use klog in the same shell, the
privileges you obtain after using klog will affect the privileges of the program you started
previously because the klog command does not create a new PAG.
It is highly recommended that readers of this section experiment with these examples, and verify the
results predicted. This will familiarize you with how PAGs work, and may serve to explain otherwise
bizarre behavior that you have seen before. All CAEN UNIX workstations have the Xdm program
and the klog -tmp -setpag, klog, and unlog commands.
% groups
users
When you have a PAG, this group list will be preceded by two large numbers.
% groups
33536 32593 users
At least one of these two numbers will be different for different PAGs. Thus, if you see different
numbers given by separate groups commands, it means that the commands were executed in
separate PAGs. If the numbers are the same, then the commands were executed in the same PAG.
If you don’t see the numbers at all (as in the first example above), then you are using the system
PAG.
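The check described above can be expressed as a small function. This is only a sketch for a Bourne-style shell: has_pag is a hypothetical helper, and it assumes that a group list beginning with two purely numeric entries indicates a PAG, as in the example output shown earlier. On a machine without AFS, numeric entries could have other causes.

```shell
# has_pag succeeds when the first two entries of a group list are numeric.
has_pag() {
    set -- $1                    # split the list into separate words
    [ $# -ge 2 ] || return 1     # fewer than two entries: no PAG numbers
    case "$1$2" in
        *[!0-9]*) return 1 ;;    # a non-digit means an ordinary group name
        *)        return 0 ;;
    esac
}

if has_pag "$(groups)"; then
    echo "this shell has its own PAG"
else
    echo "this shell is using the system PAG"
fi
```

To compare two sessions, run groups in each and compare the leading numbers directly; the function only answers whether such numbers are present at all.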
Further Information
There is a wealth of information about UNIX on the web. If you are interested in learning more
about UNIX, consult the online UNIX man pages. Local bookstores also have many introductory
books on UNIX; suggested titles include The UNIX Programming Environment by Kernighan and Pike
and the O’Reilly book UNIX in a Nutshell.