Unit - IV

Filters
Contents
• The Grep Family
• Other Filters
• The stream editor sed
• The awk pattern scanning and processing language
• Good Files and Good Filters
The Grep Family

• grep searches the named files or the standard input and prints each
line that contains an instance of the pattern
• Patterns are a slightly restricted form of the string specifiers called regular
expressions
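• A minimal usage sketch (the file and user names here are made up):

      grep pattern filenames...        # search the named files for pattern
      who | grep mary                  # or filter standard input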

• The option -n prints line numbers, -v inverts the sense of the test, and -y
makes lower-case letters in the pattern match letters of either case in the file
(modern versions of grep spell this option -i)
• The metacharacters ^ and $ anchor the pattern to the beginning (^) and
end ($) of the line.
• For example, grep '^From' $MAIL prints lines that begin with From, which are
more likely to be message header lines.
• grep supports character classes, so [a-z] matches any lower-case letter and
[^0-9] matches any non-digit.
• A period ‘.’ matches any character.

• The closure operator '*' applies to the previous character or metacharacter
in the expression, and together they match any number of successive
occurrences of that character or metacharacter.
– For example, x* matches a sequence of x's, as long as possible
– [a-zA-Z]* matches an alphabetic string
– .* matches anything up to a newline
– .*x matches anything up to and including the last x on the line
• Closure applies to only one character
• No grep expression matches a newline
• For example, grep '^[^:]*::' /etc/passwd searches for users without
passwords, i.e. lines whose second (password) field is empty
• fgrep searches for many literal strings simultaneously
• egrep interprets true regular expressions, augmented with an 'or'
operator (|) and parentheses to group expressions
• Both egrep and fgrep accept a -f option to specify a
file from which to read the patterns
• There are two other closure operators in egrep: + and ?.
– The pattern x+ matches one or more x's
– The pattern x? matches zero or one x
• egrep is excellent at word games that involve searching the
dictionary for words with special properties
– For example, to find all words of six or more letters that have their letters
in alphabetical order, a pattern file can be used, as sketched below
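• A sketch following the original example (the dictionary path varies; many
modern systems use /usr/share/dict/words instead of /usr/dict/words):

      $ cat alpha
      ^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$
      $ egrep -f alpha /usr/dict/words | grep '......'

The anchored pattern accepts a word only if its letters appear in
alphabetical order; grep '......' then keeps words of six or more letters.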

• Why are there three grep programs?

– fgrep interprets no metacharacters, but can look efficiently for thousands of
words in parallel, and thus is used primarily for bibliographic searches.
– egrep interprets more general expressions and runs significantly faster.
Other Filters

• sort sorts its input into order, line by line

• Given a list of words, one per line, the command

      sort wordlist | uniq

prints the unique words
• uniq -d prints only those lines that are duplicated
• uniq -u prints only those that are unique, i.e. not
duplicated
• uniq -c counts the number of occurrences of each line
• The comm command compares two sorted input files
• comm -12 f1 f2 prints only those lines that are in both files
• comm -23 f1 f2 prints lines that are in the first file but not in the second file
• This is useful for comparing directories and for comparing a word list with a
dictionary
• The tr command transliterates the characters in its input. By far the most
common use of tr is case conversion, as in tr a-z A-Z, which maps
lower case to upper case

• The dd command will also do case conversion, and can convert from
ASCII to EBCDIC and vice versa

• dd is intended primarily for processing tape data from other
systems
• What can be accomplished by combining
filters? As a classic example, a short pipeline can print the 10 most
frequent words in its input, as sketched below
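• A sketch of such a pipeline (the script name wordfreq is made up):

      $ cat wordfreq
      cat $* |
      tr -sc A-Za-z '\012' |       # turn each run of non-letters into one newline
      sort |                       # bring identical words together
      uniq -c |                    # count occurrences of each word
      sort -n |                    # sort by count, smallest first
      tail                         # last 10 lines = the 10 most frequent words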
The stream editor sed
• The basic idea of sed is simple:

      sed 'list of ed commands' filenames...

reads lines one at a time from the input files, applies the commands
from the list, in order, to each line, and writes its edited form on
standard output.
• For example, you can change UNIX to UNIX(TM)
everywhere it occurs in a set of files with

      sed 's/UNIX/UNIX(TM)/g' filenames... > output
• sed does not alter the contents of the input files

• du -a prints the size and the filename of every file; piping it through

      sed 's/.*→//'

(where → stands for a tab) deletes all characters (.*) up
to and including the rightmost tab, leaving only the filename.
• In a similar way, you could select usernames
and login times from the output of who:

      who | sed 's/ .* / /'

• The s command replaces a blank and everything
that follows it, up to another blank, by a single
blank.
• The same sed idea can be used to make a
program getname that will return your user name,
as sketched below
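• A sketch of getname, following the description above:

      $ cat getname
      who am i | sed 's/ .*//'

Here 's/ .*//' deletes everything from the first blank onward, leaving
only the first field, the user name.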

• It is also possible to put sed commands in a file
and execute them from there, with

      sed -f cmdfile filenames...

• sed '/pattern/q' prints its input up to and including the
first line matching pattern, and
• sed '/pattern/d' deletes every line that contains the
pattern.
• sed provides the ability to write on multiple
output files. For example,

      sed -n '/pat/w file1
      /pat/!w file2' filenames

writes lines matching pat on file1 and lines not
matching pat on file2
The awk pattern scanning and processing language

• The idea in awk is much the same as in sed, but the details are based more
on the C programming language than on a text editor
• Usage is just like sed:

      awk 'program' filenames...

but the program is different: it is a sequence of patterns and actions,
as sketched below
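• The general form of an awk program (either the pattern or the action
may be omitted):

      pattern { action }
      pattern { action }
      ...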

• awk reads the input in the filenames one line at a time.
• Each line is compared with each pattern in order; for each pattern that
matches the line, the corresponding action is performed.
• Like sed, awk does not alter its input files
• awk '/regular expression/ { print }' filenames prints every line that matches
the regular expression
• If the action is omitted, the default action is to print matching lines
• If the pattern is omitted, then the action part is
done for every input line.
• So awk '{ print }' does what cat does
• It is possible to present the program to awk from
a file, with awk -f cmdfile filenames...
• awk splits each line automatically into fields, that
is, strings of non-blank characters separated by
blanks or tabs.
• By this definition, the output of who has five fields
• awk calls the fields $1, $2, ..., $NF, where NF is a variable
whose value is set to the number of fields (here NF = 5)
• To print the names of people logged in and the time of
login, one per line:

      who | awk '{ print $1, $5 }'

• To print name and time of login sorted by time:

      who | awk '{ print $5, $1 }' | sort

• To print just the usernames, which come from the first field:

      who | awk '{ print $1 }'
• The built-in variable NR is the number of the
current input "record" or line
• So to add line numbers to an input stream, use

      awk '{ print NR, $0 }'

• To print line numbers in a field 4 digits wide:

      awk '{ printf "%4d %s\n", NR, $0 }'
• Suppose you want to look in /etc/passwd for
people who have no passwords, i.e. lines whose second field is empty
• You can write this pattern in a variety of ways,
as sketched below
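• A sketch of the variants (run with the field separator set to a colon,
e.g. awk -F: '$2 == ""' /etc/passwd):

      $2 == ""          # 2nd field is the empty string
      $2 ~ /^$/         # 2nd field matches the empty line
      $2 !~ /./         # 2nd field doesn't match any character
      length($2) == 0   # length of 2nd field is zero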

• One common use of patterns in awk is for
simple data validation tasks
• For example, the pattern sketched below checks
that every input record has an even number of
fields, printing those that do not
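• A sketch of such a validation pattern (data is a made-up file name;
the action fires only on offending lines):

      awk 'NF % 2 != 0 { print "line", NR, "has an odd number of fields" }' data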
• Another built-in function, substr, can be used to print a warning
and part of a too-long line, as sketched below
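• A sketch, flagging lines longer than 72 characters:

      awk 'length($0) > 72 { print "Line", NR, "too long:", substr($0, 1, 60) }'

Here substr($0, 1, 60) returns the first 60 characters of the line.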
• substr can also select the hour and minute from the output of date,
as sketched below
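• A sketch (assuming the usual date layout, where the time is the 4th
field, e.g. "Wed Sep 28 21:52:20 IST 2011"):

      date | awk '{ print substr($4, 1, 5) }'

This prints, for example, 21:52.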
• awk provides two special patterns, BEGIN and END.

• BEGIN actions are performed before the first input line has been read

• END actions are performed after the last line of input has been
processed

• So awk 'END { print NR }' prints the number of lines of input
• To illustrate the use of variables in awk (see the sketches below):
– Add up all the numbers in the first column

– Print both the sum and the average

– Count the input lines, words and
characters, like wc
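• Sketches of the three programs:

      # add up all the numbers in the first column
      awk '{ s += $1 } END { print s }'

      # print both the sum and the average
      awk '{ s += $1 } END { print s, s/NR }'

      # count lines, words and characters, like wc
      # (the + 1 counts the newline at the end of each line)
      awk '{ nw += NF; nc += length($0) + 1 } END { print NR, nw, nc }'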
• The if statement is just like that in C:

      if (condition) statement1 else statement2

If the condition is true, statement1 is executed; if
it is false and if there is an else part, statement2
is executed; the else part is optional
• The for statement is a loop like the one in C,
and every for is identical to a corresponding while statement,
as sketched below
• For example, for (i = 2; i <= NF; i++) runs the loop with i set in turn
to 2, 3, ..., up to the number of fields NF.
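• A sketch of the equivalence:

      for (expression1; condition; expression2)
          statement

is identical to

      expression1
      while (condition) {
          statement
          expression2
      }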
• The break statement causes an immediate exit from the
enclosing while or for
• The continue statement causes the next iteration to begin
• The next statement causes the next input line to be read
and the pattern matching to resume at the beginning of
the awk program.
• The exit statement causes an immediate transfer to the END
pattern
• awk provides arrays
– For example, the awk program sketched below collects each line of
input in a separate array element, indexed by line
number, then prints them out in reverse order.
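• A sketch of the reverse-printing program:

      awk '{ line[NR] = $0 }                             # remember each input line
            END { for (i = NR; i > 0; i--) print line[i] }' filenames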
• The function call n = split(s, arr, sep) splits the string s into fields that
are stored in elements 1 through n of the array
arr. If the separator character sep is provided, it is
used; otherwise the current value of FS is used
• awk provides associative arrays
– The sketch below is a complete program for
adding up and printing the sums for name-value pairs.
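• A sketch, for input lines of the form "name value":

      { sum[$1] += $2 }
      END { for (name in sum) print name, sum[name] }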
• Syntactically, for (name in sum) is a variant of the for statement:
it loops over all the subscripts of the array, in no particular order
• In awk, there is no explicit string concatenation operator;
strings are concatenated when they are written next to each other

• As in C, the assignment statement can be used as an
expression, so the construction

      if ((n = length($0)) > 72) ...

assigns the length of the input line
to n before testing the value. Notice the parentheses
• A program field n will print the nth field from
each line of input
– For example, who | field 1 prints only the login names.
– One implementation uses single quotes
– Another approach uses double quotes (both are sketched below)
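• Sketches of the two implementations (the shell substitutes its own $1,
the script's argument, into the awk program):

      $ cat field
      awk '{ print $'$1' }'          # close the quote so the shell expands $1

or, using double quotes and escaping the dollar that awk must see:

      $ cat field
      awk "{ print \$$1 }"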
• A second example is addup n, which adds up
the numbers in the nth field
• A third example computes separate sums of each of n
columns, plus a grand total (both are sketched below)
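• A sketch of addup:

      $ cat addup
      awk '{ s += $'$1' }
           END { print s }'

• A minimal sketch of the n-column version (a variant of the original,
where the argument $1 is the number of columns to sum):

      $ cat addup.n
      awk '
      BEGIN { n = '$1' }
            { for (i = 1; i <= n; i++) sum[i] += $i }
      END   { for (i = 1; i <= n; i++) {
                  printf "%g ", sum[i]
                  total += sum[i]
              }
              printf "; total = %g\n", total }
      '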
Good Files and Good Filters
• Many uses of awk are simple one- or two-line programs to do
some filtering as part of a larger pipeline
• Programs like wc or grep can count interesting items or search for
them by name. When more information is present for each object,
the file is still line-by-line, but columnated into fields separated by
blanks or tabs, as in the output of ls -l
• Given data divided into fields, programs like awk can easily select,
process or rearrange the information
• The arguments of filters specify input, never output, so the output
of a command can always be fed to a pipeline. Optional arguments
precede any file names. Finally, error messages are written on the
standard error, so they will not vanish down a pipeline
