Ch7_IO Redirection and Text Processing Tools
IOS203_Ch7
Objectives
Upon completion of this course, the student will be able to:
• Redirect I/O channels to files;
• Connect commands using pipes;
• Use tools for extracting, analyzing and manipulating text data.
Keywords
I/O Redirection, stdin, stdout, stderr, pipe, /dev/null, tr, tee, lpr, set, grep, egrep, sed,
REGEX, wc, head, tail, sort, uniq, cut, paste, diff, pr, uptime.
Linux can redirect a command's input, output and error data. This allows the input of a program to come from any source and its output to go to any destination. Furthermore, the output of one command can be fed directly into the input of another command through a pipe, and a filter can modify the data stream along the way.
1. Standard I/O Redirection
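Linux provides three I/O channels to commands:
• Standard input (stdin, file descriptor 0): read from the keyboard by default;
• Standard output (stdout, file descriptor 1): normal output, sent to the terminal window by default;
• Standard error (stderr, file descriptor 2): error messages, also sent to the terminal window by default.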
Note: Commands produce two kinds of output, normal output and error message output, and the
shell can redirect each of these separately.
Consider the following command and its output, which assumes that there is a file called file2 in
your home directory, but no file called file5:
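A minimal, self-contained sketch of this situation, assuming a scratch directory created with mktemp in which file2 exists and file5 does not. The file names out.txt and err.txt are hypothetical; the point is that the normal listing (stdout, descriptor 1) and the error message (stderr, descriptor 2) can be captured separately.

```shell
# Work in a throwaway directory so nothing in $HOME is touched
cd "$(mktemp -d)" || exit 1
touch file2                       # file2 exists; file5 deliberately does not
# Send stdout (1) and stderr (2) to two different (hypothetical) files
ls -l file2 file5 1> out.txt 2> err.txt
cat out.txt                       # the long listing of file2
cat err.txt                       # ls's complaint that file5 does not exist
```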
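Shell redirection allows the standard I/O channels to be redirected to or from a file. The following table shows the common shell redirection operators:
Operator Description
> file redirect stdout to file (an existing file is overwritten)
>> file append stdout to the end of file
2> file redirect stderr to file (an existing file is overwritten)
2>> file append stderr to the end of file
2>&1 send stderr to wherever stdout currently goes
< file redirect stdin from file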
The find command illustrates why stdout and stderr are kept separate, especially when it is run as an unprivileged user. For example, the following find command searches for all files named passwd in the /etc directory and its subdirectories. It usually prints many “Permission denied” error messages:
[student@StudentHost ~]$ find /etc -name passwd
To suppress them, redirect stderr to /dev/null, a special device that discards all data written to it:
[student@StudentHost ~]$ find /etc -name passwd 2> /dev/null
The error messages therefore no longer reach the terminal window. Writing to /dev/null always succeeds, however much data is written, but nothing is kept: reading from it immediately returns end-of-file. This makes /dev/null useful for discarding unwanted output from commands.
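A small sketch of how /dev/null behaves on both sides of a redirection; /nonexistent-dir is a hypothetical path assumed not to exist on the system.

```shell
echo "this text disappears" > /dev/null   # the write succeeds, but nothing is stored
wc -c < /dev/null                         # reading hits end-of-file at once: 0 bytes
ls /nonexistent-dir                       # the error message appears on the terminal
ls /nonexistent-dir 2> /dev/null          # same command, error message discarded
```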
[student@StudentHost ~]$ find /etc -name passwd >fout 2> /dev/null
This command line redirects the matching paths to the fout file, discards the errors, and sends nothing to the terminal window.
Note: If the fout file exists, it will be overwritten. If it does not exist, it will be created.
You can append the result of any other command to the same fout file, for instance:
[student@StudentHost ~]$ ls -l file1 file2 file5 >>fout 2> /dev/null
This command line appends the normal output to the end of the fout file, sends the errors to the /dev/null device, and sends nothing to the terminal window. If the file does not exist, it will be created.
You can also redirect both stdout and stderr to the same fout file:
[student@StudentHost ~]$ find /etc -name passwd >fout 2>&1
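The order of the redirections matters, because 2>&1 makes stderr point to wherever stdout points at that moment. A sketch in a scratch directory, assuming file2 exists and file5 does not (fout2 is a hypothetical name):

```shell
cd "$(mktemp -d)" || exit 1
touch file2
# stdout goes to fout first, then stderr is duplicated onto it: both end up in fout
ls -l file2 file5 > fout 2>&1
# Reversed order: stderr is duplicated onto the terminal (stdout's target at that moment),
# and only then is stdout redirected, so fout2 receives the normal output only
ls -l file2 file5 2>&1 > fout2
```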
The input of a command can also be redirected from a file. Just as > is used for output redirection, < is used to redirect the input of a command. Commands that normally take their input from the standard input can have it redirected from a file. For example, the following command counts the number of lines in the fout file generated above:
[student@StudentHost ~]$ wc -l <fout
or, naming the file descriptor of stdin (0) explicitly:
[student@StudentHost ~]$ wc -l 0<fout
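A sketch comparing a file-name argument with input redirection; the three-line contents of fout are hypothetical sample data.

```shell
cd "$(mktemp -d)" || exit 1
printf 'line one\nline two\nline three\n' > fout
wc -l fout        # file name given as an argument: prints the count and the name
wc -l < fout      # stdin redirected from fout: wc never sees the name, prints only the count
wc -l 0< fout     # identical, with descriptor 0 (stdin) written explicitly
```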
tr command
The tr command is another example of input redirection: it does not accept file names as arguments, so its input must be redirected from somewhere (a file or a pipe):
[student@StudentHost ~]$ tr 'A-Z' 'a-z' <.bash_profile
This command translates the uppercase characters in .bash_profile to lowercase characters, and the command line:
[student@StudentHost ~]$ tr 'A-Z' 'a-z' <.bash_profile >fout
does the same thing, but redirects the result to the fout file.
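Because tr only reads stdin, its input always arrives through < or a pipe. A few common translations, sketched on a hypothetical sample file:

```shell
cd "$(mktemp -d)" || exit 1
printf 'Hello   World 2020\n' > sample.txt
tr 'A-Z' 'a-z' < sample.txt     # uppercase to lowercase: hello   world 2020
tr -d '0-9' < sample.txt        # -d deletes the listed characters (here the digits)
tr -s ' ' < sample.txt          # -s squeezes each run of spaces into a single space
printf 'Hello   World 2020\n' | tr 'a-z' 'A-Z'   # input from a pipe: HELLO   WORLD 2020
```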
To create a pipe, use the “|” character: the command on its left writes to stdout, and the command on its right reads that data as its stdin. Several pipes can be chained on one command line, and a pager can reduce the amount of information displayed on the terminal window at once. For example, you can pipe the output of the cat command to less, which shows only one screenful of content at a time:
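A non-interactive sketch of pipes (head and wc stand in for less, which needs a terminal); the fruit names are sample data.

```shell
ls /etc | wc -l                                        # count the entries in /etc
printf 'pear\napple\nfig\nplum\n' | sort | head -n 2   # two pipes: sort, then keep the first 2 lines
```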
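Suppose you want to run two commands back to back and send their combined output through a pipe, like this:
[student@StudentHost ~]$ cal 2019; cal 2020 | less
You would find that only the 2020 calendar is sent to less, while the 2019 calendar goes straight to the terminal window, because the pipe applies only to the last command. This can be solved by grouping the two commands in parentheses:
[student@StudentHost ~]$ (cal 2019; cal 2020) | less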
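The same effect can be reproduced with echo standing in for cal (which may not be installed on every system); wc -l counts how many lines actually travel through the pipe.

```shell
echo "first"; echo "second" | wc -l          # "first" goes to the terminal; wc counts 1 line
( echo "first"; echo "second" ) | wc -l      # both lines go through the pipe: 2
{ echo "first"; echo "second"; } | wc -l     # a brace group gives the same result: 2
```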
tee command
The tee command lets you send stdout to several targets at once: it reads from stdin, writes the data to one or more files, and also passes it on to its own stdout. In a pipeline such as command1 | tee file | command2, the output of command1 is stored in file and also piped to command2. For instance, in the next command line, the output of the set command is written to the file set.out while also being piped to less:
[student@StudentHost ~]$ set | tee set.out | less
The set command sets or unsets shell options and positional parameters. When used without any argument, it prints on the terminal window a list of all variables, including environment and shell variables, and shell functions.
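A non-interactive sketch of tee, with wc -l standing in for less; copy.out is a hypothetical file name.

```shell
cd "$(mktemp -d)" || exit 1
# tee writes its stdin to copy.out and also passes it on to wc
printf 'alpha\nbeta\ngamma\n' | tee copy.out | wc -l     # 3
cat copy.out                                            # the same three lines, saved
# Same idea as "set | tee set.out | less", without the interactive pager
set | tee set.out | wc -l
```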
While the date command shows the current date and time, the uptime command reports how long the system has been running, the number of users with running sessions, and the system load averages for the past 1, 5 and 15 minutes. The -a option of tee appends the output to file2 instead of overwriting it.
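A sketch of overwriting versus appending with tee; echo stands in for uptime here, since uptime may be missing in minimal installations such as containers.

```shell
cd "$(mktemp -d)" || exit 1
date | tee file2                     # shows the date and (over)writes file2
echo "second line" | tee -a file2    # -a appends to file2 instead of overwriting it
# uptime | tee -a file2 would append the uptime line in the same way
wc -l < file2                        # 2: the first line was kept
```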
Linux provides filters (e.g., grep, sed): commands that take standard input, do something useful with it, and then write the result to standard output. Filters use regular expressions (REGEX) for complex searches.
grep command
grep stands for “global regular expression print”. It displays the lines of a file that match a pattern, and it can also process standard input. The pattern may contain regular expression metacharacters. For example, the following command lists the lines containing ‘bash’ in the fout file:
[student@StudentHost ~]$ grep bash fout
The most common grep options are:
Option Description
-v return lines that do not contain the pattern
-n precede returned lines with line numbers
-c only return a count of lines with the matching pattern
-l only return the names of files that have at least one line containing the pattern
-i perform a case-insensitive search
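A sketch of these options on a hypothetical three-line file in /etc/passwd format (the accounts are invented sample data; note that the third line contains "Bash" with a capital B).

```shell
cd "$(mktemp -d)" || exit 1
printf 'root:x:0:0::/root:/bin/bash\ndaemon:x:1:1::/:/usr/sbin/nologin\nalice:x:1000:1000::/home/alice:/bin/Bash\n' > fout
grep 'bash' fout              # lines containing "bash" (only root's line)
grep -v 'bash' fout           # lines NOT containing "bash"
grep -n 'bash' fout           # matching lines prefixed with their line numbers
grep -c 'bash' fout           # only the number of matching lines: 1
grep -i 'bash' fout           # case-insensitive: also matches "Bash"
grep -l 'bash' fout /dev/null # only the names of files with at least one match: fout
```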
regular expressions
“Regular expressions” (REGEX) are text-matching patterns written in a standard, well-characterized pattern-matching language. They are a widely used standard in programs that do pattern matching, although there are minor variations among implementations. Regular expressions are used to search, parse and manipulate text. For example, the command:
[student@StudentHost ~]$ grep '^root' /etc/passwd
shows the lines beginning with ‘root’ in the passwd file. The common symbols used with text patterns to form REGEX are listed in the following table:
Symbol        Matches
. (period)    any single character except a newline
[chars]       any one character from the given set
[^chars]      any one character not in the given set
^             the beginning of a line
$             the end of a line
\w            any “word” character (same as [A-Za-z0-9_]); a GNU extension
\d            any digit (same as [0-9]); Perl-style syntax, available with grep -P but not in grep, grep -E or sed
|             either the element to its left or the one to its right
(expr)        limits scope, groups elements, and allows matches to be captured
?             zero or one occurrence of the preceding element
*             zero or more occurrences of the preceding element
+             one or more occurrences of the preceding element
{n}           exactly n occurrences of the preceding element
{min,}        at least min occurrences (note the comma)
{min,max}     from min to max occurrences
Note: |, ( ), ?, + and { } belong to the extended regular expression syntax used by grep -E (egrep); in the basic syntax of plain grep and sed they must be preceded by a backslash, e.g. \{n\}.
REGEX examples
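A few examples of the symbols above, run against an invented sample in /etc/passwd format (passwd.sample is a hypothetical file name). The last two patterns use extended syntax, so they need grep -E.

```shell
cd "$(mktemp -d)" || exit 1
printf 'root:x:0:0:root:/root:/bin/bash\nuser1:x:1001:1001::/home/user1:/bin/sh\nnews:x:9:13:news:/var/spool/news:/usr/sbin/nologin\n' > passwd.sample
grep '^root' passwd.sample             # ^ : lines that begin with "root"
grep 'sh$' passwd.sample               # $ : lines that end with "sh" (bash and sh)
grep '^[a-z]*[0-9]:' passwd.sample     # [...] and * : a lowercase name ending in a digit
grep -E ':[0-9]{4}:' passwd.sample     # {n} : a field of exactly four digits
grep -E '^(root|news):' passwd.sample  # ( ) and | : either of two account names
```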
egrep (extended grep) is a similar command that uses a more powerful set of regular expressions, the extended regular expressions; it behaves exactly like grep -E. For example, the extended pattern ‘ab+c’ matches ‘a’ followed by one or more ‘b’s followed by one ‘c’.
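A quick sketch contrasting +, ? and * on sample strings, using grep -E in place of egrep.

```shell
printf 'ac\nabc\nabbbc\nabd\n' | grep -E 'ab+c'   # one or more b's: abc, abbbc
printf 'ac\nabc\nabbbc\nabd\n' | grep -E 'ab?c'   # zero or one b: ac, abc
printf 'ac\nabc\nabbbc\nabd\n' | grep -E 'ab*c'   # zero or more b's: ac, abc, abbbc
```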
sed command
sed stands for stream editor. It performs basic text transformations on an input stream (a file or input from a pipeline), and it is especially useful for changing text that matches a regular expression. For example, the following command replaces the ‘BASH’ string with the ‘SH’ string:
This command replaces only the first occurrence of ‘BASH’ on each line containing it. To replace all occurrences of ‘BASH’, use the following command line:
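A sketch of both forms on a hypothetical one-line file; the result goes to stdout, and the input file keeps its original text.

```shell
cd "$(mktemp -d)" || exit 1
printf 'BASH and BASH\n' > fout
sed 's/BASH/SH/' fout               # first occurrence on each line: SH and BASH
sed 's/BASH/SH/g' fout              # g flag, every occurrence: SH and SH
sed 's/BASH/SH/g' fout > fout.new   # save the edited copy; fout itself is unchanged
```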
By default, sed makes no change to the original input file; the result is written to stdout. It can also show selected lines of a given file:
[student@StudentHost ~]$ sed -n '1,2p' fout # prints the first two lines of the fout file
[student@StudentHost ~]$ sed -n '/Linux/p' fout # prints any line of fout containing Linux
[student@StudentHost ~]$ sed '1,2d' fout # prints fout without lines 1 and 2 (the file itself is unchanged)
Notes:
The head command outputs the first lines (default: 10) of files, while the tail command outputs
the last lines (default: 10) of files. Here are some examples:
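A sketch using seq to generate a hypothetical 20-line file, so the selected lines are easy to check.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 20 > numbers.txt
head numbers.txt                    # default: the first 10 lines (1 to 10)
head -n 3 numbers.txt               # the first 3 lines
tail -n 2 numbers.txt               # the last 2 lines (19 and 20)
tail -n +18 numbers.txt             # from line 18 to the end
head -n 15 numbers.txt | tail -n 5  # lines 11 to 15
```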
wc command
The wc command prints the line, word and byte counts of a given file. It can compute these statistics for files or for the output of other commands passed to it through a pipe.
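A sketch on a hypothetical two-line file; redirecting with < makes wc print the bare number without the file name.

```shell
cd "$(mktemp -d)" || exit 1
printf 'one two\nthree\n' > words.txt
wc words.txt          # lines, words and bytes, followed by the file name
wc -l < words.txt     # 2 lines
wc -w < words.txt     # 3 words
wc -c < words.txt     # 14 bytes (newlines included)
ls /etc | wc -l       # counting the output of another command
```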
sort command
The sort command sorts the lines of text files. Like the cat command, it can concatenate multiple files, but it prints the sorted result of the concatenation. The sort command has the following syntax:
$ sort [options] [files]
Option Description
-r Reverse the sort order (sort in descending order)
-n Numeric sort
-f Ignore the case of characters in strings
-u Unique (remove duplicate lines from the output)
-t 'x' Use 'x' as the field separator
-k pos1 Sort on a key that starts at field pos1 (-k pos1,pos2 ends the key at field pos2)
By default, the sort command sorts the lines in alphabetical order. With the -n option, it sorts them in numeric order instead:
Suppose you need to sort the /etc/passwd file according to its first field, using ‘:’ as the separator, and you are interested in the first three lines only:
Suppose you need to sort the files in your home directory according to their size (from smallest to largest), and you want to know the three biggest files:
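A sketch covering these cases in a scratch directory; the numbers and the files tiny, small, medium and big are hypothetical sample data. In ls -l output, field 5 is the file size in bytes.

```shell
cd "$(mktemp -d)" || exit 1
printf '10\n9\n100\n' | sort        # alphabetical: 10, 100, 9
printf '10\n9\n100\n' | sort -n     # numeric: 9, 10, 100
printf '10\n9\n100\n' | sort -rn    # numeric, descending: 100, 10, 9
# /etc/passwd sorted on its first ':'-separated field (the user name), first 3 lines
sort -t ':' -k 1,1 /etc/passwd | head -n 3
# Files of known sizes: 1, 3, 10 and 3893 bytes
printf 'a' > tiny; printf 'abc' > small; printf 'abcdefghij' > medium; seq 1 1000 > big
ls -l | sort -n -k 5 | tail -n 3    # smallest to largest; the last 3 lines are the biggest files
```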
uniq command
If some lines are duplicated, the uniq command detects adjacent duplicate lines, removes the repeats and keeps only one copy. The following table shows some options used with this command:
Option Description
-u Print only unique lines
-d Print only duplicated lines
-c Prefix line with the number of its occurrences
Note: Because the uniq command only compares adjacent lines, it removes all duplicates only from data that is already sorted; it is therefore almost always used in conjunction with the sort command.
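A sketch showing why sort usually comes first; items.txt is hypothetical data in which no two equal lines are adjacent.

```shell
cd "$(mktemp -d)" || exit 1
printf 'b\na\nb\nc\na\nb\n' > items.txt
uniq items.txt | wc -l     # 6: no duplicates are adjacent, so nothing is removed
sort items.txt | uniq      # a, b, c
sort items.txt | uniq -c   # each line with its count: 2 a, 3 b, 1 c
sort items.txt | uniq -d   # only the duplicated lines: a, b
sort items.txt | uniq -u   # only the lines that occur once: c
```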
cut command
The cut command displays selected columns (fields or character positions) of each line of a file or of stdin.
Option Description
-d Specify the column delimiter (default is TAB)
-f Specify the column(s) to print, e.g. -f 1 or -f 1,7
-c Cut by character positions, e.g. -c 1-5
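A sketch of the three options on sample lines (the passwd-style line is invented data).

```shell
printf 'root:x:0:0:root:/root:/bin/bash\n' | cut -d ':' -f 1     # root
printf 'root:x:0:0:root:/root:/bin/bash\n' | cut -d ':' -f 1,7   # root:/bin/bash
printf 'abcdefgh\n' | cut -c 2-4                                # bcd
printf 'one\ttwo\tthree\n' | cut -f 2                           # two (TAB is the default delimiter)
```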
paste command
The paste command merges corresponding or subsequent lines of files. The general syntax for the paste command is as follows:
$ paste [options] [files]
You can use the -d option to specify a delimiter other than the default TAB separator.
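A sketch with two hypothetical files, names.txt and uids.txt: by default paste joins corresponding lines, and -s joins the subsequent lines of one file.

```shell
cd "$(mktemp -d)" || exit 1
printf 'alice\nbob\n' > names.txt
printf '1000\n1001\n' > uids.txt
paste names.txt uids.txt          # corresponding lines joined by a TAB: alice<TAB>1000
paste -d ':' names.txt uids.txt   # custom delimiter: alice:1000, bob:1001
paste -s -d ',' names.txt         # -s: all lines of one file on a single line: alice,bob
```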
diff command
The diff command compares two files line by line. It can also compare the contents of directories. It is commonly used to create a patch containing the differences between files. For example, if you want to compare c.txt with cc.txt, which is a newer version of the same file, you could write:
[student@StudentHost ~]$ diff c.txt cc.txt
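A sketch with invented contents for c.txt and cc.txt, differing only in line 2; c.patch is a hypothetical name for the unified-format output.

```shell
cd "$(mktemp -d)" || exit 1
printf 'line one\nline two\nline three\n' > c.txt
printf 'line one\nline 2\nline three\n' > cc.txt
# "2c2": line 2 changed; "<" lines come from c.txt, ">" lines from cc.txt
diff c.txt cc.txt
# Unified format (-u), the usual input for the patch command
diff -u c.txt cc.txt > c.patch
diff c.txt c.txt && echo "identical"   # exit status 0 means no differences
```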
Remarks
pr command
The pr command converts text files into a paginated, columnated version. If no file is specified, pr reads standard input. By default, pr formats files into single-column pages of 66 lines, each page starting with a header and ending with a trailer. To print the formatted result, pipe the output of pr to lpr.
pr syntax is:
$ pr [options] [arguments]
A typical command line fetches a listing of all files in the current directory using the ls command and pipes the output to pr, which formats the data in a printer-friendly layout with a custom header (option -h) and numbered lines (option -n). The formatted pr output is written to the file Result.txt, which can then be printed.
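A sketch of the kind of command line described above; the header text and the files alpha, beta and gamma are hypothetical, and the last pr call illustrates two other common options.

```shell
cd "$(mktemp -d)" || exit 1
touch alpha beta gamma
# Number the lines (-n) and use a custom page header (-h); the header text is illustrative
ls | pr -n -h "Files in the current directory" > Result.txt
head -n 8 Result.txt
# -t drops headers and trailers; -2 -a lays the lines out in two columns, across
seq 1 10 | pr -t -2 -a
# lpr Result.txt would send the file to the default printer, if one is configured
```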
Questions
1. How many files are there in the /usr/bin directory?
2. How many times does the string conf appear in the file names in the /etc directory?
3. How many directories are there directly in the /etc directory (not counting nested subdirectories)?
4. From the /etc/passwd file, display the line of any account that starts with the letter ‘C’.
5. How many lines are there in the /etc/passwd file?
6. Display a list of usernames (and no other data) from the /etc/passwd file.
7. From the /etc/passwd file, display the line for any account that is using the bash shell.
8. From the /etc/passwd file, display the line for any account that is not using the bash shell.
9. From the /etc/passwd file, display the lines that contain the word root. Display only the filenames and do not print errors.
10. Create a sorted list of all bash users and store it in users.txt.
11. Create a sorted list of all logged-in users and store it in onUsers.txt.
12. Create a sorted list of all filenames stored in the /etc directory that end with the string conf.
13. Create a sorted list of all files stored in the /etc directory whose filenames contain the string conf, ignoring case.
14. Write a command line that displays only the IP address and the subnet mask from the output of /sbin/ifconfig.
15. What command line should you type to remove all non-letters from a stream?
16. What command line should you type to read a text file and output each word on a separate line?
17. What command line should you type to keep only lowercase letters from a stream?
18. What command line should you type to keep only lowercase letters and digits from a stream?
19. Create a sorted list of all users whose UID is greater than 510 and append it to the file users.txt.
20. Open two shells on the same computer. Create an empty story.txt file, then type tail -f story.txt. Use the second shell to append a line of text to that file. Verify that the first shell displays this line.