SW LAB 10 Filter

The document discusses simple Linux filters such as head, tail, cut, paste, sort, uniq, grep, and sed, which can be used to view, extract, modify, and search text in files. It provides the syntax and common options for each filter, with examples of how to use them to display parts of files, sort lines, find patterns, and more. Regular expressions (basic and extended) are also covered, which allow matching multiple patterns with a single expression.

Simple Filters

By: Prof. Brijesha Rao


Assistant Professor,
IT Department,
DDU Nadiad
Filters:

 head - Displaying the beginning of a file
 tail - Displaying the end of a file
 cut - Splitting a file vertically
 paste - Pasting files
 sort - Ordering a file
 uniq - Locating repeated & nonrepeated lines
 grep - Scans its input for a pattern and displays lines
containing that pattern.
 sed - Stream editor; it can perform many functions
on a file, like searching, find and replace, insertion or
deletion.
 awk - A simple command-line filtering tool
head - Displaying the beginning of a file:
It displays the top of the file.
When used without an option, it displays the first
10 lines of the specified file.
Syntax : $ head [options] filename

Options:
 -n – display first n lines
 -c – display first n bytes of the file
Ex:- $ head data_list
 $ head -n 3 data_list
 $ head -3 data_list
 $ head -c 50 data_list
 $ vi `ls -t | head -n 1`
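A quick try-out sketch of the commands above (the file data_list and its contents are made up here for illustration; the actual lab file may differ):
$ printf '1|anil|mumbai\n2|bina|surat\n3|chetan|nadiad\n' > data_list
$ head -n 2 data_list        # first 2 lines
1|anil|mumbai
2|bina|surat
$ head -c 14 data_list       # first 14 bytes (the whole first line here)
1|anil|mumbai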
tail - Displaying the end of a file:
It is just the reverse of head.
 Syntax : $ tail [options] filename

Options:
 -n – display last n lines
 -c – extracts bytes instead of lines
Ex:- $ tail data_list
 $ tail -n 3 data_list
 $ tail -3 data_list
 $ tail -c -50 data_list
 $ tail -c +50 data_list
 $ tail -c 50 data_list
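A minimal sketch of the byte forms (GNU tail assumed; data_list is the same made-up sample file as above):
$ tail -n 1 data_list        # last line
3|chetan|nadiad
$ tail -c -6 data_list       # last 6 bytes (the final newline counts)
adiad
$ tail -c +28 data_list      # from the 28th byte up to the end
3|chetan|nadiad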
cut - Splitting a file vertically:
With cut we can extract particular characters or
fields from a file, slicing it vertically rather than
horizontally.
Syntax : $ cut [options] filename

Options:
 -c – to extract particular columns by characters
 -b – to extract particular columns by bytes
 -f – cutting fields
 -d – use DELIM instead of TAB as the field
delimiter
Ex:- $ cut -c 3-5,15-18 data_list
 $ cut -d \| -f 2,3 data_list
 $ who | cut -d " " -f 1,2
 $ cat data_list | cut -d "|" -f 1,3
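A short sketch, assuming the same pipe-delimited sample data_list as above:
$ cut -d "|" -f 2 data_list        # 2nd pipe-separated field
anil
bina
chetan
$ cut -c 1-3 data_list             # characters 1 to 3 of every line
1|a
2|b
3|c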
paste - Pasting files:
Whatever we have cut, we can paste it back – but
vertically rather than horizontally.
Syntax : $ paste [options] file1 file2

Options:
 -d – for specifying delimiters
 -s – joins lines of a single file
Ex:- $ cut -d "|" -f 1,2 data_list | tee ab
 $ cut -d "|" -f 3,4 data_list | tee ab1
 $ paste ab ab1
 $ paste -d "$" ab ab1
 $ cut -d "|" -f 1,2 data_list | paste -d "#" ab -
 $ cut -d "|" -f 1,2 data_list | paste -d "#" - ab
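A small sketch of cut feeding paste (ab and ab1 are intermediate files built from the made-up data_list above):
$ cut -d "|" -f 2 data_list > ab           # names
$ cut -d "|" -f 3 data_list > ab1          # cities
$ paste -d "#" ab ab1
anil#mumbai
bina#surat
chetan#nadiad
$ paste -s ab                              # -s joins one file's lines (tab-separated)
anil    bina    chetan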
sort – Ordering a file:
It can sort on the specified fields.

There are many options for this command.

 Syntax : $ sort [options] filename

Options        Description
-t char        Uses delimiter character char to identify fields.
-k n           Sorts on the nth field.
-k m,n         Starts sort on the mth field & ends sort on the nth field.
-k m.n         Starts sort on the nth column of the mth field.
-u             Removes repeated lines
-n             Sorts numerically
-r             Reverses sort order.
-f             Case-insensitive sort
-c             Checks if file is sorted
-o f_name      Places output in file f_name
Options:
 -k – sort on specified field.
Ex:- $ sort -t "|" -k 2 data_list
 -r – reversed sort order
Ex:- $ sort -t "|" -r -k 2 data_list
 -k m,n – sort starts at mth field & ends at nth
field.
Ex:- $ sort -t "|" -k 3,3 -k 2,2 data_list
 -k m.n – sort on nth column of mth field.
Ex:- $ sort -t "|" -k 4.7,4.8 data_list
 -n – sort on numbers
Ex:- $ sort -n data_list
 -u – removing repeated lines
Ex:- $ cut -d "|" -f 3 data_list | sort -u
 -o f_name – stores the output in f_name.
Ex:- $ sort -o abc -t "|" -k 3 data_list
 -c – to check whether the file is sorted or not.
Ex:- $ sort -c data_list
Ex:- $ sort -t "|" -c -k 2 data_list
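A small sketch on the made-up data_list from above:
$ sort -t "|" -k 3 data_list               # sort on the 3rd field (city)
1|anil|mumbai
3|chetan|nadiad
2|bina|surat
$ sort -t "|" -n -r -k 1,1 data_list       # numeric, reversed, on field 1 only
3|chetan|nadiad
2|bina|surat
1|anil|mumbai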
uniq – Locate repeated & nonrepeated lines:
When you merge files, you'll face the problem of
duplicate entries.
But we have a command 'uniq' which displays
the unique lines of a 'sorted' file.
Syntax: $ uniq [options] filename

Options:
 -u – selects the nonrepeated lines
 -d – selects the duplicate lines
 -c – counts frequency of occurrence
Ex:- $ cut -d "|" -f 3 data_list | sort | uniq -u
Ex:- $ cut -d "|" -f 3 data_list | sort | uniq -d
Ex:- $ cut -d "|" -f 3 data_list | sort | uniq -c
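A tiny sketch with made-up duplicate input (the exact padding of the -c counts may vary):
$ printf 'surat\nnadiad\nsurat\nmumbai\n' | sort | uniq -c
      1 mumbai
      1 nadiad
      2 surat
$ printf 'surat\nnadiad\nsurat\nmumbai\n' | sort | uniq -d
surat
$ printf 'surat\nnadiad\nsurat\nmumbai\n' | sort | uniq -u
mumbai
nadiad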
Advanced Filters: grep & sed
 grep:
 grep scans its input for a pattern and displays
lines containing that pattern.
 When used with different options, it can also
display line numbers or filenames containing
the required pattern.

 Syntax:
 grep options pattern filename(s)
 Ex: grep "abc" std_db
 As grep is a filter, it can also search for the
desired pattern in its standard input.
 Its output can be redirected to a file as usual.
 Note: We could write the pattern without the
quotes, but it is safer to use either double or single
quotes while writing the pattern.
 Ex: grep bbb patel std_db   (here patel is treated as a filename)
 Ex: grep "bbb patel" std_db
 When grep doesn't match the pattern, it
silently returns the prompt.
 When grep is used with multiple filenames, it
displays the respective filename along with
each matching line.
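A hedged sketch of the multi-file behaviour (std_db1, std_db2 and their contents are invented here):
$ grep "agarwal" std_db1 std_db2
std_db1:ria agarwal|nadiad
std_db2:dev agarwal|surat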
 grep Options:
 Ignoring case (-i) :
 When you are not sure of the case of the
required pattern, you can use the -i option.
 Ex: grep -i "agarwal" std_db
 Deleting Lines (-v)
 To invert the role of grep, i.e. to select all the
lines except those containing the pattern, you
can use the -v option.
 Ex: grep -v "agarwal" std_db
 Displaying Line Numbers (-n)
 When you want to display the line numbers
containing the pattern, you can use the -n
option.
 Ex: grep -n "agarwal" std_db
 If you want to extract only the line numbers
containing the pattern, you can use cut along
with this (see the sketch below).
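For example, a pipeline like the following (std_db and the line numbers shown are hypothetical):
$ grep -n "agarwal" std_db | cut -d ":" -f 1    # keep only the line numbers
4
9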
 Counting Lines Containing the Pattern (-c):
 If you want to know the total number of lines
containing the pattern, you can use the -c
option.
 Note: This count is different from the number
of occurrences of that pattern.
 Example:
 grep -c "professor" *.txt
 cat *.txt | grep -c "professor"
 Displaying Filenames (-l):
 The -l (list) option displays only the names of
the files containing the pattern.

 Example:
 grep -l "professor" *.txt
 Matching Multiple Patterns (-e):
 If you want to match multiple patterns, like
agarwal, aggarwal, Agrawal, etc., then you can
use the -e option.

 Example:
 grep -e "agarwal" -e "Agrawal" f1
 Taking Patterns from a File (-f) :
 Instead of explicitly mentioning each pattern
on the command line, you can store the patterns
in a file and use that filename with -f.

 Example:
 File – patternfile (one pattern per line)
 agarwal
 Agrawal
 grep -f patternfile f1
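A quick sketch (patternfile and f1 follow the slide; the printf line just builds the pattern file):
$ printf 'agarwal\nAgrawal\n' > patternfile     # one pattern per line
$ grep -f patternfile f1                        # same effect as the two -e options above
$ grep -i -f patternfile f1                     # -f combines with other options such as -i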
 Basic Regular Expression (BRE)
 Like the Shell's Wild-Card Characters, grep uses
an expression of a different type to match a
group of similar patterns.
 However, unlike Wild-Cards, this expression is a
feature of the command that uses it and has
nothing to do with the shell.
 If an expression uses any of the characters listed
below, it is termed a Regular Expression.
 Regular Expressions belong to two categories:
 (i) Basic Regular Expressions
 (ii) Extended Regular Expressions.
 grep supports Basic Regular Expressions (BRE)
by default and Extended Regular Expressions
(ERE) with the -E option.
 sed supports only the BRE set.
• BRE Character Set:
 * : Zero or more occurrences of the
previous character
 a* : Nothing or a or aa or aaa, etc.
 . : A single character
 .* : Nothing or any number of characters
 [ijk] : A single character, either i, j or k
 [x-z] : Any single character between x & z
 [^x-z] : Any single character not between x & z
 ^abc : Pattern abc at the beginning of the line
 abc$ : Pattern abc at the end of the line
 ^abc$ : abc as the only word in the line
 ^$ : Line contains nothing
 Examples:
 If you want to match both Agarwal and agrawal,
you could use the below expression:
 [aA]g[ar][ar]wal
 grep "[aA]g[ar][ar]wal" f1

 Note here that the expression [ar][ar]
matches four patterns – ar, aa, ra & rr – but only
two of them are of importance to us.
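A self-contained check of this point (the input names are made up; note that aggarwal is not matched yet, which motivates the g* on the next slide):
$ printf 'agarwal\nagrawal\naggarwal\n' | grep "[aA]g[ar][ar]wal"
agarwal
agrawal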
 Examples :
 If you want to match aggarwal in addition to
Agarwal and agrawal, you could use the asterisk
in your expression:
 grep "[aA]gg*[ar][ar]wal" f1

 As * means zero or more occurrences of the
previous character, it works fine here. But it
would not work if interpreted as a shell
Wild-Card.
 Example:
 While the shell uses ' ? ' to match a single
character, the BRE set has ' . ' (dot) to match a
single character.
 Ex: grep "emp..c" f1
 emp1.c
 emp2.c
 And so on....
 Ex: grep "a.*agarwal" f1
 Examples:
 If you want all the lines beginning with Hello,
you could use –
 grep "Hello" f1
 But would it be correct? - No
 Because Hello could occur anywhere in the line.
So you need to use –
 grep "^Hello" f1
 Similarly, use $ for end-of-line matching.
 Examples:
 If you want to reverse your search and select
all the lines not containing H at the
beginning, then the expression would be:
 grep "^[^H]" f1
 Hence, the caret (^) has three roles to play:
 1) [^abc] – Not a, b or c
 2) ^abc – abc at the beginning of the line
 3) a^b – Here ^ matches literally
 Examples:
 $ ls -l | grep "^d"

 $ grep "5...$" f1
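A rough reading of the two examples above (f1 is assumed to hold records ending in a number):
$ ls -l | grep "^d"          # lines that begin with d, i.e. directory entries
$ grep "5...$" f1            # lines whose last four characters start with 5 (e.g. 5000)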
 Examples:
 The ' - ' loses its special meaning when not used
properly or when used outside the character class.
 The ' . ' and ' * ' lose their special meaning
when placed inside the character class.
 If ' * ' is the first character of an expression, it is
matched literally.
 Extended Regular Expression (ERE)
 a+ : Matches one or more occurrences of a
 a? : Matches zero or one occurrence of a
 exp1|exp2 : Matches either expression exp1 or exp2
 (x1|x2)x3 : Matches either x1x3 or x2x3
 Examples :
 The characters + and ? restrict the scope of the
match as compared to *. For matching
Agarwal and Aggarwal, we could use the
expression –
 Agg*arwal
 But this would also match Aggggggarwal.
 To restrict this, we could use the expression –
 Agg?arwal
 Usage: grep -E "Agg?arwal" f1
 Examples :
 For matching two strings – foolish or girlish – we
could use an expression with the pipe:
 1) foolish|girlish
 2) (foo|gir)lish
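A quick try-out of the alternation form (the input words are made up):
$ grep -E "(foo|gir)lish" f1                    # matches foolish or girlish in f1
$ printf 'foolish\ngirlish\nstylish\n' | grep -E "(foo|gir)lish"
foolish
girlish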
• sed - stream editor; it can perform many
functions on a file, like searching, find and replace,
insertion or deletion.

• It works well with character-based processing.

• Example 1: sed -n '/hello/p' file1
• This command will display all the lines which
contain hello.
• Example 2: sed 's/hello/HELLO/' file1
• This command will substitute the first hello on
each line with HELLO; add the g flag
(s/hello/HELLO/g) to replace every occurrence.
• Example 3: sed '/hello/,+2d' file1
• This command will delete each line containing
'hello' together with the next two lines (GNU sed).
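A minimal sketch (file1 and its contents are made up; GNU sed is assumed for the ,+2 address form):
$ printf 'hello world\nbye\nhello again\nsee you\nlast line\n' > file1
$ sed -n '/hello/p' file1          # print only the matching lines
hello world
hello again
$ sed 's/hello/HELLO/' file1       # first hello on each line becomes HELLO
HELLO world
bye
HELLO again
see you
last line
$ sed '/hello/,+2d' file1          # delete a hello line and the next two lines
see you
last line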
• awk - a simple command-line filtering tool.
• awk is mostly used for pattern scanning and
processing. It searches one or more files to see
if they contain lines that match the
specified patterns and then performs the
associated actions.
• Syntax:
• awk 'script' filename
• Where 'script' is a set of commands that are
understood by awk and are executed on the file
filename.
• $ awk '/manager/ {print}' employee.txt
• $ awk '{print $1,$4}' employee.txt
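A hedged sketch of the two commands (the layout of employee.txt is an assumption here: whitespace-separated name, designation, department, salary):
$ printf 'ajay manager account 45000\nsunil clerk sales 25000\n' > employee.txt
$ awk '/manager/ {print}' employee.txt      # lines matching the pattern
ajay manager account 45000
$ awk '{print $1,$4}' employee.txt          # 1st and 4th fields of every line
ajay 45000
sunil 25000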
