Simple Filters

The document discusses several Unix commands for filtering and manipulating text-based files including head, tail, cut, sort, uniq, tr, and pr; it provides examples of how to use each command's various options to view, extract, reorder, remove duplicates from, translate, and paginate the contents of files.

Uploaded by

Pranav Paste

Simple filters

head: displaying the beginning of a file

• The head command displays the top of a file.
• Used without any option, it displays the first ten lines of the file:
$ head emp.lst
• Use the -n option to specify the line count; the following displays the first three lines of the file:
$ head -n 3 emp.lst
2233|a.k.shukla|g.m.|sales|12/12/52| 6000
9876|jaiSharma|director|production|03/12/50|7000
5678|sumit chakrobarty|d.g.m|marketing |04/19/43|6000
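
The default and the -n form of head can be tried on any file. The sketch below builds a hypothetical 12-line file (an assumption for illustration, not the author's emp.lst) so that the ten-line default is visible:

```shell
# Build a hypothetical 12-line sample file (not the author's emp.lst).
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 12; i++) print "line " i }' > "$sample"

head "$sample"        # no option: the first ten lines (line 1 to line 10)
head -n 3 "$sample"   # -n 3: the first three lines only
```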


tail: displaying the end of a file
• The tail command displays the end of a file.
• Like head, it works on ten lines by default, in this case the last ten.
• To display the last three lines:
$ tail -n 3 emp.lst
3564|sudhir Agarwal
2345|j.b.saxena
0110|v.k.agarwal
tail: displaying the end of a file (continued)
• tail can also select lines relative to the beginning of the file. The +count form does that, where count is the line number from which the selection should begin.
• Since the file contains 15 lines, selecting the last five lines means starting at line 11:
• $ tail -n +11 emp.lst (11th line onwards, possible with the + symbol; older systems also accept the obsolete form tail +11)
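
A small sketch of the equivalence described above, assuming a hypothetical 15-line file in place of emp.lst: counting five lines from the end and starting at line 11 select the same lines.

```shell
# Hypothetical 15-line file standing in for emp.lst.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 15; i++) print "record " i }' > "$sample"

tail -n 5 "$sample"     # last five lines, counted from the end
tail -n +11 "$sample"   # line 11 onwards, counted from the beginning
```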
tail options
Extracting bytes rather than lines (-c):
• tail supports the -c option followed by a signed integer: a + count is taken relative to the beginning of the file, a - count relative to the end.
• $ tail -c -512 foo (copies the last 512 bytes of foo)
• $ tail -c +512 foo (copies everything after skipping the first 511 bytes)
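
The byte counts are easier to check on a tiny file. This sketch assumes a hypothetical ten-byte file rather than the slide's foo:

```shell
sample=$(mktemp)
printf 'abcdefghij' > "$sample"   # exactly ten bytes, no trailing newline

tail -c -3 "$sample"   # the last three bytes: hij
echo
tail -c +4 "$sample"   # starts at byte 4, skipping the first three: defghij
echo
```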
cut: cutting a file vertically
$ head -n 5 shortlist
2233|a.k.shukla|g.m|sales|12/12/52|6000
9876|jaiSharma|director|production|03/12/50|7000
5678|sumitchakrobarty|d.g.m|marketing|04/19/43|6000
2365|barunsengupta|director|personnel|05/11/47|7800
5423|n.k.gupta|chairman|admin|08/30/56|5400
Cutting columns (-c)

• To extract specific columns, follow the -c option with a list of column numbers or ranges, delimited by commas.
• Here's how we extract the name and designation from shortlist:
$ cut -c 6-22,24-32 shortlist
a.k.shukla g.m
jai sharma director
sumit chakrobrty d.g.m
barun sengupta director
n.k.gupta chairman

• cut also accepts open-ended ranges: -3 selects from the beginning of the line up to column 3, and 55- selects from column 55 to the end of the line.
• $ cut -c -3,6-22,28-34,55- shortlist
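
Column cutting only makes sense on fixed-width records. The sketch below uses a hypothetical two-record file with an assumed layout (columns 1-4 id, 6-15 name, 17-24 role), not the author's shortlist:

```shell
sample=$(mktemp)
# Hypothetical fixed-width records: columns 1-4 id, 6-15 name, 17-24 role.
printf '%-4s %-10s %-8s\n' 1001 alice manager 1002 bob clerk > "$sample"

cut -c 6-15 "$sample"      # the name column only (padding included)
cut -c -4,17- "$sample"    # columns 1-4, then column 17 to the end of the line
```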


Cutting fields (-f)
• The -c option is useful only for files with fixed-length lines.
• Many files do not have fixed-length lines; to extract useful data from them you need to cut fields rather than columns.
• cut uses the tab as its default field delimiter, but can also work with another delimiter:
• $ cut -d "|" -f 2,3 shortlist
-d for the field delimiter; -f for the field list
a.k.shukla|g.m
jai sharma|director
sumit chakraborty|d.g.m
barun sengupta|director
n.k.gupta|chairman
Extracting the user list from who output
• $ who | cut -d " " -f1 (the space is the delimiter)
root
kumar
sharma
project
sachin
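
The field-cutting form can be tried on two records borrowed from the listing above and written to a temporary file (the temporary file is an assumption for illustration):

```shell
sample=$(mktemp)
printf '%s\n' '2233|a.k.shukla|g.m|sales' '9876|jai sharma|director|production' > "$sample"

cut -d '|' -f 2,3 "$sample"    # fields 2 and 3, still joined by |
cut -d '|' -f 1,4- "$sample"   # field 1, then field 4 to the end of the line
```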
sort: ordering a file

• Sorting is the ordering of data in ascending or descending sequence.
• The sort command orders the lines of a file.
• Like cut, it identifies fields, and it can sort on specified fields.
• By default, sort compares entire lines:
• $ sort shortlist
2233|a.k.shukla|g.m|sales |12/12/52|6000
2365|barunsengupta|director|personnel|05/11/47|7800
5423|n.k.gupta|chairman|admin|08/30/56|5400
5678|sumitchakrobarty|d.g.m|marketing|04/19/43|6000
9876|jaiSharma|director|production|03/12/50|7000

• By default, sort reorders the lines in ASCII collating sequence: whitespace first, then numerals, uppercase letters and finally lowercase letters. (On current systems the order depends on the locale; setting LC_ALL=C gives the ASCII order.)
sort options
• Sorting on a primary key (-k):
• $ sort -t "|" -k 2 shortlist (-t sets the field delimiter; -k 2 starts the sort key at the second field)

2233|a.k.shukla|g.m|sales |12/12/52|6000
2365|barunsengupta|director|personnel|05/11/47|7800
9876|jaiSharma|director|production|03/12/50|7000
5423|n.k.gupta |chairman|admin |08/30/56|5400
5678|sumit chakrobarty|d.g.m |marketing |04/19/43|6000
• The sort order can be reversed with the -r (reverse) option:
• $ sort -t "|" -r -k 2 shortlist

5678|sumitchakrobarty|d.g.m|marketing|04/19/43|6000
5423|n.k.gupta |chairman|admin |08/30/56|5400
9876|jai Sharma |director |production |03/12/50|7000
2365|barun sengupta |director |personnel|05/11/47|7800
2233|a.k.shukla |g.m |sales |12/12/52|6000
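
A minimal sketch of keyed and reversed sorting, assuming three abbreviated records in a temporary file. LC_ALL=C is set here only to make the ordering independent of the local locale:

```shell
sample=$(mktemp)
printf '%s\n' '9876|jai sharma|director' '2233|a.k.shukla|g.m' '5423|n.k.gupta|chairman' > "$sample"

LC_ALL=C sort -t '|' -k 2 "$sample"      # ascending by name (field 2 onwards)
LC_ALL=C sort -t '|' -r -k 2 "$sample"   # the same key, descending
```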
Sorting on columns

• You can also specify a character position within a field as the beginning of the sort key.
• To sort the file according to year of birth, sort on the seventh and eighth character positions within the fifth field:
• $ sort -t "|" -k 5.7,5.8 shortlist
5678|sumitchakrobarty|d.g.m|marketing|04/19/43|6000
2365|barun sengupta|director|personnel |05/11/47|7800
9876|jai Sharma |director|production |03/12/50|7000
2233|a.k.shukla |g.m |sales |12/12/52|6000
5423|n.k.gupta |chairman|admin|08/30/56|5400
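
The key 5.7,5.8 can be checked on three records taken from the listings above (written to a temporary file for illustration). In the date 12/12/52, characters 7 and 8 are the two-digit year:

```shell
sample=$(mktemp)
printf '%s\n' \
  '2233|a.k.shukla|g.m|sales|12/12/52|6000' \
  '9876|jai sharma|director|production|03/12/50|7000' \
  '5678|sumit chakrobarty|d.g.m|marketing|04/19/43|6000' > "$sample"

# Key = characters 7 through 8 of field 5, i.e. the two-digit year.
LC_ALL=C sort -t '|' -k 5.7,5.8 "$sample"
```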
Numeric sort (-n)

• When sort acts on a file containing only numbers, it still compares them character by character, so 10 comes before 2:
• $ sort numfile
10
2
27
4
• To sort by numeric value, use the -n option:
• $ sort -n numfile
2
4
10
27
• Exercise: predict the output if you also want to reverse the order.
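
The two orderings can be reproduced with the same four numbers, written to a temporary file standing in for numfile:

```shell
numfile=$(mktemp)
printf '%s\n' 10 2 27 4 > "$numfile"

LC_ALL=C sort "$numfile"   # character comparison: 10, 2, 27, 4
sort -n "$numfile"         # numeric comparison:   2, 4, 10, 27
```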
Removing repeated lines (-u)
• The -u (unique) option lets you remove repeated lines from the output.
• If you cut out the designation field from shortlist, you can pipe it to sort -u to find the unique designations that occur in the file:
• $ cut -d "|" -f3 shortlist | sort -u
chairman
d.g.m
director
executive
g.m
manager
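
A small sketch of the same pipeline, assuming four hypothetical records in which one designation repeats:

```shell
sample=$(mktemp)
printf '%s\n' '1|a|director' '2|b|g.m' '3|c|director' '4|d|chairman' > "$sample"

# Cut the third field, then sort it and keep one copy of each value.
cut -d '|' -f 3 "$sample" | LC_ALL=C sort -u
```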
Other sort options
• Even though sort's output can be redirected to a file, you can use its -o option to specify the output filename:
• $ sort -o sortedlist -k 3 shortlist (output stored in sortedlist)
• $ sort -o shortlist shortlist (output stored in the same file; redirecting with > shortlist instead would empty the file before sort reads it)
• When sort is used with multiple filenames as arguments, it concatenates them and sorts them collectively.
• The -m (merge) option merges two or more files that are already sorted individually:
• $ sort -m foo1 foo2 foo3
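
The -o and -m options can be sketched together. The files below are hypothetical stand-ins for foo1 and foo2, created in a temporary directory:

```shell
dir=$(mktemp -d)
printf '%s\n' 3 1 2 > "$dir/foo1"
printf '%s\n' 6 4 5 > "$dir/foo2"

# Sort each file in place; -o makes it safe to name the input as the output.
sort -n -o "$dir/foo1" "$dir/foo1"
sort -n -o "$dir/foo2" "$dir/foo2"

# Merge the two already-sorted files into one sorted stream.
sort -n -m "$dir/foo1" "$dir/foo2"
```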


uniq: locating repeated and nonrepeated lines

• When you concatenate or merge files, you face the problem of duplicate entries.
• You can remove those duplicate entries with sort -u.
• UNIX offers one more command for this, called uniq.
• $ cat dept.lst
01|accounts|6213
01|accounts|6213
02|admin|5243
03|marketing|6521
03|marketing|6521
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
• uniq simply fetches one copy of each line and writes it to standard output:
• $ uniq dept.lst
01|accounts|6213
02|admin|5243
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
• uniq compares only adjacent lines, so it requires a sorted file as input.
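
Why the input must be sorted can be seen on a hypothetical four-line file in which the repeated lines are not adjacent:

```shell
sample=$(mktemp)
printf '%s\n' sales admin sales admin > "$sample"

uniq "$sample"                   # no adjacent repeats, so all four lines remain
LC_ALL=C sort "$sample" | uniq   # sorted first: admin, sales
```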
uniq options
Selecting the nonrepeated lines (-u):
• To determine the designation that occurs only once in emp.lst, cut out the third field, sort it, and then pipe it to uniq.
• The -u (unique) option selects only lines that are not repeated:
• $ cut -d "|" -f3 emp.lst | sort | uniq -u
chairman
Selecting the duplicate lines (-d):
• The -d (duplicate) option selects one copy of each repeated line:
• $ cut -d "|" -f3 emp.lst | sort | uniq -d
d.g.m
director
g.m.
Counting frequency of occurrence (-c):
• The -c (count) option displays the frequency of occurrence of each line; the lines appear in sorted order:
• $ cut -d "|" -f3 emp.lst | sort | uniq -c
1 chairman
2 d.g.m
3 director
2 g.m
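
The three options can be compared on one input. The list of designations below is a hypothetical sample, not the contents of emp.lst:

```shell
sample=$(mktemp)
printf '%s\n' director g.m chairman director d.g.m g.m director > "$sample"

LC_ALL=C sort "$sample" | uniq -u   # lines occurring once: chairman, d.g.m
LC_ALL=C sort "$sample" | uniq -d   # one copy of each repeated line: director, g.m
LC_ALL=C sort "$sample" | uniq -c   # every distinct line, prefixed by its count
```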
tr: translating characters
• The tr (translate) filter manipulates individual characters in a line.
• Syntax:
tr options expression1 expression2 (input is read from standard input)
• By default, it translates each character in expression1 to its mapped counterpart in expression2.
• The first character of the first expression is replaced with the first character of the second expression, the second with the second, and so on.
• $ tr '|/' '~-' < emp.lst | head -n 3
5678~sumitchakrobarty~d.g.m~marketing~04-19-43~6000
5423~n.k.gupta ~chairman~admin ~08-30-56~5400
9876~jai Sharma ~director ~production ~03-12-50~7000

tr: translating characters (continued)
• tr also accepts ranges in its expressions.
• Since tr does not accept a filename as an argument, the input has to be redirected from a file or piped in.
• Example: the following sequence changes the case of the first three lines from lower to upper:
$ head -n 3 emp.lst | tr '[a-z]' '[A-Z]'
2233|A.K.SHUKLA|G.M|SALES|12/12/52|6000
9876|JAISHARMA|DIRECTOR|PRODUCTION|12/03/50|7000
5678|SUMIT CHK |D.G.M|MARKETING|19/04/43|6000
• The tr command is often used to change the case of a file's contents.
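
Both kinds of translation can be checked on a single record taken from the listings above. This sketch writes the ranges without square brackets; the bracketed form used above also works, because each bracket simply maps to itself:

```shell
line='2233|a.k.shukla|g.m|sales|12/12/52|6000'

printf '%s\n' "$line" | tr '|/' '~-'     # | becomes ~ and / becomes -
printf '%s\n' "$line" | tr 'a-z' 'A-Z'   # lowercase range mapped to uppercase
```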
tr options
• Deleting characters (-d):
• To store the date without separators (mmddyy for the sample data, for example 121252), use the -d (delete) option to delete the / characters; the command below deletes every | as well:
$ tr -d '|/' < emp.lst | head -n 3
2233a.k.shukla g.m sales 121252 6000
9876jaiSharmadirectorproduction031250 7000
5678sumitchakrobartyd.g.mmarketing041943 6000
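
The difference between deleting only the slashes and deleting both characters shows up on a single record (taken from the listings above):

```shell
line='2233|a.k.shukla|g.m|sales|12/12/52|6000'

printf '%s\n' "$line" | tr -d '/'    # only the slashes go: the date becomes 121252
printf '%s\n' "$line" | tr -d '|/'   # the field delimiters are removed as well
```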
pr: paginating files
• The pr command prepares a file for printing by adding suitable headers, footers and formatted text.
$ pr dept.lst
May 06 10:38 1997 dept.lst Page 1
01|accounts|6213
02|admin|5243
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
• pr adds five lines of margin at the top and five at the bottom.
• The header shows the date and time of last modification of the file, along with the filename and page number.
pr options
• The -k option (where k is an integer) prints in k columns; the -t option suppresses the headers and footers.

$ pr -t -5 a.out
0 4 8 12 16
1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
• There are various other options, such as:
• -d: double-spaces the input
• -n: numbers the lines
• -o n: offsets the lines by n spaces, increasing the left margin of the page
• Combine these options to produce just the format you need:
• $ pr -t -n -d -o 10 dept.lst

1 01|accounts|6213

2 02|admin|5243

3 03|marketing|6521

4 04|personnel|2365

5 05|production|9876

6 06|sales|1006
• A number prefixed by a + makes pr start printing from that page number.
• The -l option sets the page length.
• $ pr +10 chap01 (starts printing from page 10)
• $ pr -l 54 chap01 (page length set to 54 lines)
• To supply your own header text, use the -h option:
$ pr -h "Department list" dept.lst
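
A hedged sketch of the column and numbering options. The input here is generated with awk and a two-line temporary file, both assumptions for illustration; exact column spacing may differ between pr implementations:

```shell
# Twenty numbers, one per line, printed in five columns without header or footer.
awk 'BEGIN { for (i = 0; i < 20; i++) print i }' | pr -t -5

# A hypothetical two-line department file, printed with line numbers.
dept=$(mktemp)
printf '%s\n' '01|accounts|6213' '02|admin|5243' > "$dept"
pr -t -n "$dept"
```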
