Simple Filters
By: Prof. Brijesha Rao
Assistant Professor,
IT Department,
DDU Nadiad
Filters:
head - Displaying the beginning of a file
tail - Displaying the end of a file
cut - Splitting a file vertically
paste - Pasting files
sort - Ordering a file
uniq - Locating repeated & nonrepeated lines
grep - scans its input for a pattern and displays lines
containing that pattern.
sed - stream editor; it can perform many operations on
a file, such as searching, find and replace, insertion,
and deletion.
awk - a simple command-line filtering tool
head - Displaying the beginning of a file:
It displays the top of the file.
When used without an option, it displays the first
10 lines of the specified file.
Syntax : $ head [options] filename
Options:
-n : display the first n lines
-c : display the first n bytes of the file
Ex:- $ head data_list
$ head -n 3 data_list
$ head -3 data_list
$ head -c 50 data_list
$ vi `ls -t | head -1` (opens the most recently modified file)
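The examples are easier to follow on a concrete file; the contents of data_list shown below are assumed only for illustration (id|name|designation|date of birth|salary):
$ cat data_list
1006|anil patel|manager|26/01/85|52000
1002|meena shah|clerk|12/09/90|18000
1004|rahul mehta|manager|03/07/88|51000
1001|priya desai|clerk|22/11/92|17500
1003|sanjay jadeja|peon|15/04/79|12000
$ head -3 data_list
1006|anil patel|manager|26/01/85|52000
1002|meena shah|clerk|12/09/90|18000
1004|rahul mehta|manager|03/07/88|51000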
tail - Displaying the end of a file:
It is just the reverse of head.
Syntax : $ tail [options] filename
Options:
-n : display the last n lines
-c : extract bytes instead of lines
Ex:- $ tail data_list
$ tail -n 3 data_list
$ tail -3 data_list
$ tail -c -50 data_list
$ tail -c +50 data_list
$ tail -c 50 data_list
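With the same assumed data_list, tail picks lines from the end of the file:
$ tail -2 data_list
1001|priya desai|clerk|22/11/92|17500
1003|sanjay jadeja|peon|15/04/79|12000
tail -c -50 data_list would print only the last 50 bytes (the last line and part of the one before it), while tail -c +50 prints everything from the 50th byte onwards.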
cut - Splitting a file vertically:
It lets us extract particular characters or fields
from a file, slicing the file vertically rather than
horizontally.
Syntax : $ cut [options] filename
Options:
-c : extract particular columns by characters
-b : extract particular columns by bytes
-f : cut fields
-d : use DELIM instead of TAB as the field
delimiter
Ex:- $ cut -c 3-5,15-18 data_list
$ cut -d \| -f 2,3 data_list
$ who | cut -d " " -f 1,2
$ cat data_list | cut -d "|" -f 1,3
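With the assumed data_list, cutting fields 2 and 3 gives the name and designation columns; cut keeps the delimiter between the selected fields:
$ cut -d "|" -f 2,3 data_list
anil patel|manager
meena shah|clerk
rahul mehta|manager
priya desai|clerk
sanjay jadeja|peon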
paste - Pasting files:
Whatever we have cut, we can paste it back, but
vertically rather than horizontally.
Syntax : $ paste [options] file1 file2
Options:
-d : specify the delimiter
-s : join all lines of a file into one line
Ex:- $ cut -d "|" -f 1,2 data_list | tee ab
$ cut -d "|" -f 3,4 data_list | tee ab1
$ paste ab ab1
$ paste -d "$" ab ab1
$ cut -d "|" -f 1,2 data_list | paste -d "#" ab -
$ cut -d "|" -f 1,2 data_list | paste -d "#" - ab
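A short run on the assumed data_list shows the two cut pieces being glued back together; the filenames names and pay are arbitrary:
$ cut -d "|" -f 2 data_list > names
$ cut -d "|" -f 5 data_list > pay
$ paste -d "|" names pay
anil patel|52000
meena shah|18000
rahul mehta|51000
priya desai|17500
sanjay jadeja|12000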
sort - Ordering a file:
It sorts on the specified fields.
There are many options for this command.
Syntax : $ sort [options] filename
Options      Description
-t char      Uses char as the delimiter to identify fields.
-k n         Sorts on the nth field.
-k m,n       Starts sort on the mth field & ends sort on the nth field.
-k m.n       Starts sort on the nth column of the mth field.
-u           Removes repeated lines.
-n           Sorts numerically.
-r           Reverses sort order.
-f           Case-insensitive sort.
-c           Checks if the file is sorted.
-o f_name    Places output in file f_name.
Options:
-k : sort on the specified field.
Ex:- $ sort -t "|" -k 2 data_list
-r : reverse the sort order.
Ex:- $ sort -t "|" -r -k 2 data_list
-k m,n : sort starts at the mth field & ends at the nth field.
Ex:- $ sort -t "|" -k 3,3 -k 2,2 data_list
-k m.n : sort on the nth column of the mth field.
Ex:- $ sort -t "|" -k 4.7,4.8 data_list
-n : sort numerically.
Ex:- $ sort -n data_list
-u : remove repeated lines.
Ex:- $ cut -d "|" -f 3 data_list | sort -u
-o f_name : store the output in f_name.
Ex:- $ sort -o abc -t "|" -k 3 data_list
-c : check whether the file is sorted or not.
Ex:- $ sort -c data_list
Ex:- $ sort -t "|" -c -k 2 data_list
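For instance, a two-level sort on the assumed data_list orders the lines by designation first and by name within each designation:
$ sort -t "|" -k 3,3 -k 2,2 data_list
1002|meena shah|clerk|12/09/90|18000
1001|priya desai|clerk|22/11/92|17500
1006|anil patel|manager|26/01/85|52000
1004|rahul mehta|manager|03/07/88|51000
1003|sanjay jadeja|peon|15/04/79|12000
Similarly, sort -t "|" -k 4.7,4.8 data_list orders the lines by year of birth, i.e. characters 7 and 8 of the fourth field.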
uniq - Locating repeated & nonrepeated lines:
When you merge files, you'll face the problem of
duplicate entries.
The 'uniq' command displays only the unique
lines of a sorted file.
Syntax: $ uniq [options] filename.
Options:
-u : select the nonrepeated lines
-d : select the duplicate lines
-c : count the frequency of occurrence
Ex:- $ cut -d "|" -f 3 data_list | sort | uniq -u
Ex:- $ cut -d "|" -f 3 data_list | sort | uniq -d
Ex:- $ cut -d "|" -f 3 data_list | sort | uniq -c
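On the assumed data_list, the designation column summarises like this:
$ cut -d "|" -f 3 data_list | sort | uniq -c
      2 clerk
      2 manager
      1 peon
Here uniq -d would list only clerk and manager (the repeated entries), while uniq -u would list only peon.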
Advanced Filters: grep & sed
grep:
grep scans its input for a pattern and displays
lines containing that pattern.
When used with different options, it can also
display line numbers or filenames containing
the required pattern.
Syntax:
grep options pattern filename(s)
Ex: grep “abc” std_db
Being a filter, grep can also read the standard
input and search it for the desired pattern.
Its output can also be redirected to a file.
Note: We could write the pattern without the
quotes, but it is safe to use either double or single
quotes while writing the pattern.
Ex: grep bbb patel std_db
Ex: grep "bbb patel" std_db
When grep doesn't find the pattern, it silently
returns the prompt.
When grep is used with multiple filenames, it
displays the respective filename at the start of
each matching line.
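A quick run makes the quoting issue and the filename prefix visible; the contents of std_db below are assumed only for illustration:
$ cat std_db
101|mohan agarwal|CE
102|bbb patel|IT
103|sita Agrawal|EC
$ grep bbb patel std_db
grep: patel: No such file or directory
std_db:102|bbb patel|IT
$ grep "bbb patel" std_db
102|bbb patel|IT
In the unquoted form, grep treats patel as a filename; in the quoted form, the whole string is the pattern.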
grep Options:
Ignoring case (-i) :
When you are not sure of the case of the
required pattern, you could use the -i option.
Ex: grep -i "agarwal" std_db
Deleting Lines (-v)
To invert the role of grep, i.e. to select all the
lines except those containing the pattern, you
can use the -v option.
Ex: grep -v "agarwal" std_db
Displaying Line Numbers (-n)
When you want to display the line numbers
containing the pattern, you can use the -n
option.
Ex: grep -n "agarwal" std_db
If you want to extract only the line numbers
containing the pattern, you can use cut along
with this.
Counting Lines containing Pattern (-c):
If you want to know the total number of lines
containing the pattern, you can use the -c
option.
Note: This count is different from the number
of occurrences of that pattern.
Example:
grep -c "professor" *.txt
cat *.txt | grep -c "professor"
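The difference can be seen with grep -o (not covered above), which prints each match on its own line; the file notes.txt is assumed:
$ cat notes.txt
the professor met another professor
a student arrived
$ grep -c "professor" notes.txt
1
$ grep -o "professor" notes.txt | wc -l
2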
Displaying Filenames (-l):
The -l (list) option displays only the names of
the files containing the pattern.
Example:
grep -l "professor" *.txt
Matching Multiple Patterns (-e):
If you want to match multiple patterns, like
agarwal, aggarwal, Agrawal, etc., then you need
to use the -e option.
Example:
grep -e "agarwal" -e "Agrawal" f1
Taking Patterns from a file (-f) :
Instead of mentioning each pattern explicitly on
the command line, you can store the patterns in
a file and supply that filename with the -f option.
Example:
File: patternfile (one pattern per line)
agarwal
Agrawal
grep -f patternfile f1
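Continuing with the std_db contents assumed earlier, the two patterns in patternfile pick up both spellings:
$ grep -f patternfile std_db
101|mohan agarwal|CE
103|sita Agrawal|EC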
Basic Regular Expression (BRE)
Like the Shell's Wild-Card Characters, grep uses
an expression of a different type to match a
group of similar patterns.
However, unlike Wild-Cards, this expression is a
feature of the command that uses it and has
nothing to do with the shell.
If an expression uses any of the characters listed
below, it is termed a Regular Expression.
Regular Expressions belong to two categories:
(i) Basic Regular Expressions
(ii) Extended Regular Expressions.
grep supports Basic Regular Expressions (BRE)
by default and Extended Regular Expressions
(ERE) with the -E option.
sed supports only the BRE set.
• BRE Character Set:
* : Zero or more occurrences of the
previous character
a* : Nothing or a or aa or aaa, etc.
. : A single Character
.* : Nothing or any no. of Characters
[ijk] : A single Character either i, j or k
[x-z] : Any single character between x & z
[^x-z] : Any single character not between x & z
^abc : Pattern abc at beginning of the line
abc$ : Pattern abc at end of the line
^abc$ : abc as the only word in the line
^$ : Line contains nothing
Examples:
If you want to match Agarwal and agrawal both,
you could use the below expression:
[aA]g[ar][ar]wal
grep "[aA]g[ar][ar]wal" f1
Note that the expression [ar][ar] matches four
patterns (aa, ar, ra & rr), but only two of them
are of importance to us.
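With an assumed file f1 holding a few name variants, the character classes pick out exactly the two spellings we care about:
$ cat f1
mohan agarwal
sita agrawal
ram aggarwal
$ grep "[aA]g[ar][ar]wal" f1
mohan agarwal
sita agrawal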
Examples :
If you want to match aggarwal in addition to
Agarwal and agrawal, you could use the asterisk
in your expression:
grep "[aA]gg*[ar][ar]wal" f1
As * means zero or more occurrences of the
previous character, it works fine here. Note that
this is different from the shell's wild-card *, which
matches any number of any characters.
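Adding g* to the expression now takes in the double-g spelling as well (using the f1 contents assumed above):
$ grep "[aA]gg*[ar][ar]wal" f1
mohan agarwal
sita agrawal
ram aggarwal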
Example:
While the shell uses ' ? ' to match a single
character, BRE set has ' . ' (dot) to match a single
character.
Ex: grep "emp*.c" f1
emp1.c
emp2.c
And so on....
Ex: grep "a.* agarwal" f1
Examples:
If you want all the lines beginning with Hello,
you could use –
grep "Hello" f1
But would that be correct? No,
because Hello could occur anywhere in the line.
So you need to use:
grep "^Hello" f1
Similarly use $ for the end of line matching.
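A small assumed file makes the difference clear:
$ cat greet
Hello world
say Hello there
Goodbye
$ grep "Hello" greet
Hello world
say Hello there
$ grep "^Hello" greet
Hello world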
Examples:
If you want to reverse your search and search
for all the lines not containing H in the
beginning, then the expression would be:
grep "^[^H]" f1
Hence, the caret (^) has three roles to play.
1) [^abc] : not a, b or c
2) ^abc : abc at the beginning of the line
3) a^b : here the caret matches literally
Examples:
$ ls -l | grep "^d"
$ grep "5...$" f1
Examples:
The '-' loses its special meaning when it is placed
at the beginning or end of the character class, or
when used outside the class.
The '.' and '*' lose their special meaning when
placed inside the character class.
If '*' is the first character of an expression, it is
matched literally.
Extended Regular Expression (ERE)
a+ : Matches one or more occurrences of a
a? : Matches zero or one occurrence of a
exp1|exp2 : Matches either expression exp1 or exp2
(x1|x2)x3 : Matches either x1x3 or x2x3
Examples :
The characters + and ? restrict the scope of the
match as compared to the *.
For matching Agarwal and Aggarwal, we can use
the expression Agg*arwal.
But this would also match Aggggggarwal.
To restrict this, we could use the expression
Agg?arwal.
Usage: grep -E "Agg?arwal" f1
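A quick comparison of * and ? (the input lines are supplied with printf just for illustration):
$ printf "Agarwal\nAggarwal\nAgggarwal\n" | grep "Agg*arwal"
Agarwal
Aggarwal
Agggarwal
$ printf "Agarwal\nAggarwal\nAgggarwal\n" | grep -E "Agg?arwal"
Agarwal
Aggarwal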
Examples :
For matching two strings, foolish or girlish, we
could use either of two expressions with the pipe:
1) foolish|girlish
2) (foo|gir)lish
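For example, with an assumed file f2:
$ cat f2
She looks foolish
That was girlish
childish pranks
$ grep -E "(foo|gir)lish" f2
She looks foolish
That was girlish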
• sed - stream editor; it can perform many operations
on a file, such as searching, find and replace,
insertion, and deletion.
• It works well with character-based processing.
• Example 1: sed -n '/hello/p' file1
• This command displays all the lines that contain hello.
• Example 2: sed 's/hello/HELLO/' file1
• This command substitutes the first occurrence of hello
with HELLO on every line; add the g flag
(s/hello/HELLO/g) to replace every occurrence on a line.
• Example 3: sed '/hello/,+2d' file1
• This command deletes each line matching hello together
with the next two lines (the +2 address is a GNU sed
extension).
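• A short demonstration of all three commands on an assumed file1:
$ cat file1
hello world
nothing here
say hello again
last line
$ sed -n '/hello/p' file1
hello world
say hello again
$ sed 's/hello/HELLO/' file1
HELLO world
nothing here
say HELLO again
last line
$ sed '/hello/,+2d' file1
last line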
• awk - a simple command-line filtering tool.
• awk is mostly used for pattern scanning and
processing. It searches one or more files to see
if they contain lines that match the specified
patterns and then performs the associated
actions.
• Syntax:
• awk 'script' filename
• Where 'script' is a set of commands that are
understood by awk and are executed on the file
filename.
• $ awk '/manager/ {print}' employee.txt
• $ awk '{print $1,$4}' employee.txt
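• A short run on an assumed employee.txt (whitespace-separated fields) shows both commands:
$ cat employee.txt
ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
$ awk '/manager/ {print}' employee.txt
ajay manager account 45000
varun manager sales 50000
$ awk '{print $1,$4}' employee.txt
ajay 45000
sunil 25000
varun 50000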