Unix Unit 2 Part 2
Simple Filters
This chapter features the simple filters of the system: commands which accept data from standard
input, manipulate it and write the results to standard output. Filters are the central tools of the
UNIX tool kit, and each filter featured in this chapter performs a simple function. This chapter
shows their use both in standalone mode and in combination with other tools using redirection
and piping.
Many UNIX files have lines containing fields: strings of characters representing a meaningful
entity. Some commands expect these fields to be separated by a suitable delimiter that's not used by
the data. Typically this delimiter is a : (as in /etc/passwd and $PATH), but we have used the | (pipe)
as delimiter for some of the sample files in this and other chapters. Many filters work well with
delimited fields, and some simply won't work without them.
WHAT YOU WILL LEARN
• Use pr to format text to provide margins and headers, doublespacing and multiple column
output.
• Pick up lines from the beginning with head, and from the end with tail.
• Extract characters or fields with cut.
• Join two files laterally, and multiple lines to a single line with paste.
• Sort, merge and remove repeated lines with sort.
• Find out the unique and nonunique lines with uniq.
• Change, delete or squeeze individual characters with tr.
This is a text file designed in fixed format and containing a personnel database. There are 15 lines
in the file, where each line has six fields separated from one another by the delimiter |. The details
of an employee are stored in one line. A person is identified by the emp-id, name, designation,
department, date of birth and salary, as indicated by the fields (in the same order). You'll be using
this file, or ones derived from it, in various ways to see the extent of manipulation that is
possible with the UNIX tool kit.
12.2 pr: PAGINATING FILES
The pr command prepares a file for printing by adding suitable headers, footers and formatted
text. A simple invocation of the command is to use it with a filename as argument:
$ pr dept.lst
12.2.1 pr Options
pr's -k option (where k is an integer) prints in k columns. If a program outputs a series of 20
numbers, one in each line, then this option can make good use of the screen's empty spaces. And
because pr is a filter, it can obtain its input from the standard output of another program. Let's use
the -t option also to suppress the headers and footers:
$ a.out | pr -t -5
0 4 8 12 16
1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
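The a.out program used above is not shown in the text. As a minimal sketch, assuming a small shell function (print_numbers, a hypothetical name) stands in for it and prints 0 to 19 one per line, the same five-column layout can be tried like this:

```shell
# print_numbers is a hypothetical stand-in for the book's a.out:
# it writes the numbers 0..19, one per line, to standard output.
print_numbers() {
    i=0
    while [ "$i" -lt 20 ]; do
        echo "$i"
        i=$((i + 1))
    done
}

# -t suppresses the header and footer; -5 arranges the input in 5 columns.
columns=$(print_numbers | pr -t -5)
printf '%s\n' "$columns"
```

The numbers run down each column rather than across the rows, which is why the first column holds 0 to 3.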
If you are not using the -t option, then you can have a header of your choice with the -h option.
This option is followed by the header string. There are some more options that programmers will
find useful:
• -d Doublespaces input, reduces clutter.
• -n Numbers lines, helps in debugging code.
• -on Offsets lines by n spaces, increases left margin of page.
Combine these various options to produce just the format you need:
$ pr -t -n -d -o 10 dept.lst
1 01:accounts:6213
2 02:admin:5423
3 03:marketing:6521
4 04:personnel:2365
5 05:production:9876
6 06:sales:1006
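To experiment with these options without the book's dept.lst, the following sketch builds a temporary three-line stand-in (the temporary file and its variable name are assumptions made for illustration) and applies the same combination of options:

```shell
# Hypothetical stand-in for dept.lst, holding the first three lines shown above.
dept_file=$(mktemp)
printf '%s\n' '01:accounts:6213' '02:admin:5423' '03:marketing:6521' > "$dept_file"

# -t: no header or footer   -n: number the lines
# -d: doublespace           -o 10: left margin of 10 spaces
formatted=$(pr -t -n -d -o 10 "$dept_file")
printf '%s\n' "$formatted"

rm -f "$dept_file"
```

Because of -d, a blank line follows each numbered line in the output.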
There's one option that uses a number prefixed by a + to print from a specific page number. Another
option (-l) sets the page length:
pr +10 chap01        Starts printing from page 10
pr -l 54 chap01      Page length set to 54 lines
Because pr formats its input by adding margins and a header, it's often used as a "preprocessor"
before printing with the lp command:
pr -h "Department list" dept.lst | lp        Use lpr in Linux
Since pr output often lands up in the hard copy, pr and lp form a common pipeline sequence.
Note: For numbering lines, you can also use the nl command (not covered in this edition).
Tip: Use tail -f when you are running a program that continuously writes to a file, and you want to see
how the file is growing. You have to terminate this command with the interrupt key.
Extracting User List from who Output cut can be used to extract the first word of a line by
specifying the space as the delimiter. The example used in Section 3.10 now run in tandem with
cut displays the list of users only:
$ who | cut -d " " -f1        Space is the delimiter
root
kumar
sharma
project
sachin
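Since who reports whoever happens to be logged in, its output differs from one machine to another. The sketch below therefore feeds cut a few made-up lines in the style of who output (the terminal names and times are invented for illustration) to show the same extraction:

```shell
# Sample lines imitating who output; the terminals and times are made up.
who_sample='root     console  Aug  1 07:51
kumar    pts/10   Aug  1 07:56
sharma   pts/6    Aug  1 02:10'

# With a space as the delimiter, field 1 is everything before the first space.
users=$(printf '%s\n' "$who_sample" | cut -d ' ' -f1)
printf '%s\n' "$users"
```

Only the first field is safe to extract this way: every extra space between columns counts as a separate delimiter for cut.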
cut is a powerful text manipulator often used in combination with other commands or filters.
You'll be using the command a number of times in this text.
Note: You must indicate to cut whether you are extracting fields or columns. One of the options -f and
-c must be specified. These options are really not optional; one of them is compulsory.
This sorts the file by designation and name. -k 3,3 indicates that sorting starts on the third field
and ends on the same field.
Sorting on Columns You can also specify a character position within a field to be the beginning
of sort. If you are to sort the file according to the year of birth, then you need to sort on the seventh
and eighth column positions within the fifth field:
$ sort -t"|" -k 5.7,5.8 shortlist
5678|sumit chakrobarty|d.g.m.   |marketing |19/04/43|6000
2365|barun sengupta   |director |personnel |11/05/47|7800
9876|jai sharma       |director |production|12/03/50|7000
2233|a.k. shukla      |g.m.     |sales     |12/12/52|6000
5423|n.k. gupta       |chairman |admin     |30/08/56|5400
The -k option also uses the form -k m.n where n is the character position in the mth field. So,
5.7,5.8 means that sorting starts on column 7 of the fifth field and ends on column 8.
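The m.n form can be tried on a small sample. In this sketch, the five records shown above are written, in a different order and without the padding, to a temporary file that stands in for shortlist (the temporary file is an assumption made for illustration):

```shell
# Temporary stand-in for shortlist, with the records deliberately out of order.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2233|a.k. shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
5678|sumit chakrobarty|d.g.m.|marketing|19/04/43|6000
2365|barun sengupta|director|personnel|11/05/47|7800
5423|n.k. gupta|chairman|admin|30/08/56|5400
EOF

# The key starts at character 7 of field 5 and ends at character 8: the year.
by_year=$(sort -t'|' -k 5.7,5.8 "$sample")
printf '%s\n' "$by_year"

# Collect the emp-ids on one line to make the resulting order easy to read.
ids=$(printf '%s\n' "$by_year" | cut -d'|' -f1 | paste -s -d ' ' -)
echo "$ids"

rm -f "$sample"
```

The emp-ids come out in the order of the birth years 43, 47, 50, 52 and 56.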
Numeric Sort (-n) When sort acts on numerals, strange things can happen. When you sort a file
containing only numbers, you get a curious result:
$ sort numfile
10
2
27
4
This is probably not what you expected, but the ASCII collating sequence places 1 above 2,
and 2 above 4. That's why 10 preceded 2 and 27 preceded 4. This can be overridden by the -n
(numeric) option:
$ sort -n numfile
2
4
10
27
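The two orderings can be compared side by side. This sketch keeps the four numbers in a shell variable instead of numfile, and sets LC_ALL=C for the first sort so that the ASCII collating sequence is used regardless of the locale (both choices are assumptions made for illustration):

```shell
numbers='4
27
2
10'

# Default sort compares character by character, so "10" comes before "2".
ascii_order=$(printf '%s\n' "$numbers" | LC_ALL=C sort | paste -s -d ' ' -)

# -n compares the lines by their numeric value.
numeric_order=$(printf '%s\n' "$numbers" | sort -n | paste -s -d ' ' -)

echo "default: $ascii_order"
echo "numeric: $numeric_order"
```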
Removing Repeated Lines (-u) The -u (unique) option lets you remove repeated lines from a
file. If you "cut" out the designation field from emp.lst, you can pipe it to sort to find out the
unique designations that occur in the file:
$ cut -d"|" -f3 emp.lst | sort -u | tee desigx.lst
chairman
d.g.m.
director
executive
g.m.
manager
We used three commands to solve a text manipulation problem. Here, cut selects the third field
from emp.lst for sort to work on.
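The same pipeline can be tried on a handful of records. The sketch below uses four unpadded records held in a variable in place of emp.lst and leaves out the tee stage; LC_ALL=C is set only to make the ordering predictable (all of these are assumptions made for illustration):

```shell
emp_sample='2233|a.k. shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
2365|barun sengupta|director|personnel|11/05/47|7800
5423|n.k. gupta|chairman|admin|30/08/56|5400'

# cut extracts the third field; sort -u orders the values and drops the repeats.
designations=$(printf '%s\n' "$emp_sample" | cut -d'|' -f3 | LC_ALL=C sort -u)
printf '%s\n' "$designations"
```

The word director occurs twice in the input but only once in the output.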
Other sort Options Even though sort's output can be redirected to a file, we can use its -o
option to specify the output filename. Curiously enough, the input and output filenames can even
be the same:
sort -o sortedlist -k 3 shortlist        Output stored in sortedlist
sort -o shortlist shortlist              Output stored in same file
To check whether the file has actually been sorted in the default order, use the -c (check) option:
$ sort -c shortlist
$                                        File is sorted
You can also add the -k option to the above to check whether a specific field is sorted:
$ sort -t"|" -c -k 2 shortlist
sort: shortlist:2: disorder: 2365|barun sengupta   |director |personnel |11/05/47|7800
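In a script, the result of -c is usually read from the exit status rather than from the message. The following sketch, which uses a temporary file of made-up words, checks a file before and after sorting it in place with -o:

```shell
# A small unsorted file; the contents are made up for illustration.
list=$(mktemp)
printf '%s\n' 'pear' 'apple' 'mango' > "$list"

# sort -c returns a nonzero exit status when the file is not sorted.
if sort -c "$list" 2>/dev/null; then before=sorted; else before=unsorted; fi

# -o may name the input file itself: sort reads all input before writing.
sort -o "$list" "$list"

if sort -c "$list" 2>/dev/null; then after=sorted; else after=unsorted; fi
echo "before: $before, after: $after"

rm -f "$list"
```

Redirecting with > to the same file would not work here, because the shell truncates the file before sort gets to read it.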
When sort is used with multiple filenames as arguments, it concatenates them and sorts them
collectively. When large files are sorted in this way, performance often suffers. The -m (merge)
option can merge two or more files that are sorted individually:
sort -m foo1 foo2 foo3
This command will run faster than the one used without the -m option only if the three files are
sorted.
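As a small illustration of merging, this sketch creates two temporary files that are each already in numeric order (the files and their contents are assumptions, standing in for foo1 and foo2) and merges them with -m:

```shell
# Two files, each already sorted numerically.
first=$(mktemp)
second=$(mktemp)
printf '%s\n' 1 4 9 > "$first"
printf '%s\n' 2 3 10 > "$second"

# -m only interleaves the presorted inputs; -n keeps the comparison numeric.
merged=$(sort -n -m "$first" "$second" | paste -s -d ' ' -)
echo "$merged"

rm -f "$first" "$second"
```

If the inputs are not sorted beforehand, -m does not sort them, so the result may be out of order.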
Tip: Commit to memory the default delimiter used by cut, paste and sort. cut and paste use the tab,
but sort uses a contiguous string of spaces as a single delimiter.
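These defaults can be seen in a short sketch. The data and the temporary files below are made up for illustration:

```shell
# cut: with no -d option, fields are separated by tabs.
second_field=$(printf 'alpha\tbeta\tgamma\n' | cut -f2)

# paste: corresponding lines of the two files are joined with a tab.
left=$(mktemp)
right=$(mktemp)
printf '%s\n' a b > "$left"
printf '%s\n' 1 2 > "$right"
pasted=$(paste "$left" "$right")
rm -f "$left" "$right"

# sort: with no -t option, a run of spaces separates the fields,
# so the second field is found however many spaces precede it.
order=$(printf '%s\n' 'x   30' 'y 4' 'z      15' |
        sort -n -k 2 | cut -d ' ' -f1 | paste -s -d ' ' -)

echo "$second_field"
printf '%s\n' "$pasted"
echo "$order"
```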
Complementing Values of Expression (-c) Finally, the -c (complement) option complements the
set of characters in the expression. Thus, to delete all characters except the | and /, you can combine
the -d and -c options:
$ tr -cd '|/' < emp.lst
||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|$
Unusual output indeed! tr has deleted all characters except the | and / from the file. The appearance
of the prompt at the immediate end of output shows that the newline character has also not been
spared. We'll use the -c and -d options to place each word in a separate line in our example in
Section 12.10.
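The effect is easier to follow on a single record. This sketch applies the same options to one unpadded line in the emp.lst format, and counts the bytes that survive to confirm that the newline is deleted too:

```shell
record='2233|a.k. shukla|g.m.|sales|12/12/52|6000'

# -c complements the set '|/'; -d then deletes everything in that complement.
kept=$(printf '%s\n' "$record" | tr -cd '|/')

# Count the surviving bytes: five pipes and two slashes, but no newline.
bytes=$(printf '%s\n' "$record" | tr -cd '|/' | wc -c | tr -d ' ')

echo "$kept"
echo "$bytes"
```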
Using ASCII Octal Values and Escape Sequences Like echo, tr also uses octal values and escape
sequences to represent characters. This facility is specially suited for using nonprintable characters
in the expression. So to have each field on a separate line, replace the | with the LF character (octal
value 012):
$ tr '|' '\012' < emp.lst | head -n 6        Can also use \n instead of \012
2233
a.k. shukla
g.m.
sales
12/12/52
6000
If you reverse the two expressions, you'll make the newline character visible. Study these tr options
closely, and you'll discover many areas where you can apply them. We'll be using some of the tr
options in the example that's considered next.
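Both directions of this translation can be sketched with a single record. The record below is one unpadded line in the emp.lst format, kept in a variable for illustration:

```shell
record='2233|a.k. shukla|g.m.|sales|12/12/52|6000'

# Replace each | with LF (octal 012): one field per line.
fields=$(printf '%s\n' "$record" | tr '|' '\012')
printf '%s\n' "$fields"

# Reverse the expressions: every newline becomes a visible |.
joined=$(printf '%s\n' "$fields" | tr '\012' '|')
printf '%s\n' "$joined"
```

The | at the end of the second result is the newline that terminated the last line.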