L3 - Grep ND Egrep

The document discusses different tools for searching text, including grep, fgrep, and egrep. Grep searches for patterns using regular expressions, fgrep searches for fixed strings only, and egrep allows extended regular expressions. Examples are provided of using grep and regular expressions to search files.

Uploaded by

gauri Varshney
Searching for something in a file

GREP
• The grep family is a collection of three related programs
for finding patterns in files. Their names are grep, fgrep,
and egrep.
• The name grep comes from the ed editor command g/re/p:
"globally search for a regular expression and print"
• grep is a full-blown regular-expression matcher
• fgrep = "fixed string grep"; it only searches for literal strings
• egrep = "extended grep"

LIN 6932 1
Searching for something in a file
fgrep
fgrep: the easiest (but not fastest) one to use

Syntax:
% fgrep [options] 'search string' filenames

Interpretation:
In the name fgrep the f stands for "Fixed string", and not "Fast" (contrary
to what the man page may tell you). The fgrep program finds all the lines
in a file that contain a certain fixed string. So, for example, I could find
all occurrences of CA in the files in the current working directory simply
by typing this command:

% fgrep CA *
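
To make the multi-file behaviour concrete, here is a minimal sketch. The scratch directory, file names, and contents are invented for illustration. It uses grep -F, the POSIX spelling of fgrep (recent GNU grep releases print a warning when the program is called as fgrep).

```shell
# Hypothetical sample data: two small files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'San Jose, CA' 'Portland, OR' > "$tmp/north.txt"
printf '%s\n' 'Reno, NV' 'Fresno, CA' > "$tmp/south.txt"

# Search every file in that directory for the fixed string CA.
# With more than one file, each matching line is prefixed by its file name.
result=$(cd "$tmp" && grep -F CA *)
printf '%s\n' "$result"
# north.txt:San Jose, CA
# south.txt:Fresno, CA

rm -rf "$tmp"
```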
fgrep
• Like many UNIX filters, it can take as many file names as
you like to supply. And of course it permits various
adverbs that specify options; two useful ones are
• -i ignore the difference between upper case and
lower case when deciding what is a match
• -v reverse the effect of the search by outputting only
the lines that don't match
% fgrep -i CA *
% fgrep -v CA *
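
The effect of the two options can be checked on a tiny file. This is only a sketch: towns.txt and its three lines are made up, and grep -F stands in for fgrep.

```shell
# Hypothetical sample file with three lines.
tmp=$(mktemp -d)
printf '%s\n' 'Santa Cruz, CA 95060' 'Reno, NV 89501' 'Ventura, ca 93001' > "$tmp/towns.txt"

exact=$(grep -F -c CA "$tmp/towns.txt")       # count lines containing CA exactly
nocase=$(grep -F -i -c CA "$tmp/towns.txt")   # -i: CA, ca, Ca and cA all count
inverted=$(grep -F -v CA "$tmp/towns.txt")    # -v: lines that do NOT contain CA

echo "exact=$exact nocase=$nocase"
# exact=1 nocase=2
printf '%s\n' "$inverted"
# Reno, NV 89501
# Ventura, ca 93001

rm -rf "$tmp"
```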

fgrep
The key limitation of fgrep is that you cannot use it to get
approximate matches, or matches of more complicated
patterns that cannot be described by just giving a fixed string.
Sometimes you are not quite sure what string you are looking
for; for example, you might know only that the word you are
seeking begins with z and ends with -ic, and has the sequence
gm in it somewhere. What you need, then, is not a program
that will find the matching lines for you if you give it the
exact string you need to find, but rather a program that can
understand a language in which you can say things like
"begins with z and ends with -ic or -ics and has gm in it
somewhere."

grep
grep is called up by giving a command that has this form:

% grep [options] pattern_description files_to_search_in

For example:

% grep -i 'pull[aeiou][mn]' shakespeare bad_phone_numbers display

• This means, "without distinguishing between upper and lower case, search the files
shakespeare, bad_phone_numbers, and display for lines that contain pull followed by a vowel
letter followed by an m or an n".
• The expression pull[aeiou][mn] is a pattern description covering the name Pullum and
most common variants of it. Thus it is looking for Pullum, Pullam, Pullen, PULLUN,
and the same sequences inside longer strings such as e-mail addresses.
• The pattern descriptions used with grep are in a language called the language of
regular expressions. This is one of the most important and fruitful developments
in modern computer science, and in order to use grep you need to understand
regular expressions thoroughly.
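
A quick way to see which spellings the pattern accepts is to run it over a short list. The file names.txt and its contents are invented for this sketch.

```shell
# Hypothetical list of candidate spellings, one per line.
tmp=$(mktemp -d)
printf '%s\n' 'Pullum' 'PULLUN' 'Pullen' 'pulled' 'Pullman' > "$tmp/names.txt"

# pull, then one vowel, then m or n; -i ignores case.
matches=$(grep -i 'pull[aeiou][mn]' "$tmp/names.txt")
printf '%s\n' "$matches"
# Pullum
# PULLUN
# Pullen

rm -rf "$tmp"
```

pulled fails because the letter after the vowel is d, and Pullman fails because pull is followed directly by m rather than by a vowel.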
grep
There are various dialects of the regular expression language that are used by various
UNIX programs.

Here we will be talking about grep and its extended cousin egrep. (Read the excellent
summary with examples in Unix in a Nutshell, particularly chapter 6, and do man grep on
a NetBSD machine to check the details of the GNU grep that runs on those machines.)

GNU is pronounced guh-noo, approximately like canoe. The GNU project was launched in 1984
to develop a complete Unix-like operating system that is free software; GNU tools combined
with the Linux kernel make up the systems usually referred to as Linux.

Note that the grep that runs on other machines may be a different program, with lots of
differences in its behavior from the GNU version.

grep
• Example: "Match the line that begins with z, ends in -ic or -ics, and has gm in it
somewhere" is expressed in the language of regular expressions in this form:

^z.*gm.*ics*$

To be more precise, what this regular expression means is:


"beginning of line followed by z followed by optional other material followed by gm followed by
optional other material followed by -ic followed by zero or more occurrences of s followed by end of
line"

• It can therefore be used in a grep command to search a dictionary, in which each word is
on a separate line, for a word meeting the description:

% grep '^z.*gm.*ics*$' dictionary


Search result: zeugmatic
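
The same search can be tried without a full dictionary. In this sketch the five-word list words.txt is a made-up stand-in for the dictionary file.

```shell
# Hypothetical miniature dictionary, one word per line.
tmp=$(mktemp -d)
printf '%s\n' 'zeugmatic' 'zygotic' 'zigzag' 'enigmatics' 'zeugma' > "$tmp/words.txt"

# ^z    line starts with z
# .*gm  anything, then gm
# .*ic  anything, then ic
# s*$   zero or more s, then end of line
result=$(grep '^z.*gm.*ics*$' "$tmp/words.txt")
echo "$result"
# zeugmatic

rm -rf "$tmp"
```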
grep
The most trivial case of a regular expression is that of a fixed string of the sort that fgrep recognizes.
Fixed strings are regular expressions that are matched only by strings identical to themselves.

The regular expression Z is matched by any occurrence of Z. There happens to be only one line in
The Great God Pan (/class/lin6932/c6932aab/machen.txt) that matches it, namely the middle line of
these three:

remained. These three, however, were 'good lives,' but yet
not proof against the Zulu assegais and typhoid fever, and so
one morning Aubernoun woke up and found himself Lord

Because the middle line matches the expression Z, you can fetch (a copy of) that line out of the file
like this:

% grep Z machen.txt
not proof against the Zulu assegais and typhoid fever, and so

grep
% fgrep Z machen.txt
fgrep would do the same thing.

But what fgrep cannot do is to call for all lines with Au possibly followed by some
other lower-case letters and then an n. That is accomplished by the regular
expression

Au[a-z]*n

This RE is matched by any sequence of a capital A followed by a lower-case u
followed by zero or more letters in the range lower-case a to lower-case z followed
by lower-case n. This means it will be matched by any string containing a word
like any of these: Aubernoun, Augustine, Austin, etc.
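
A small check of what does and does not match. The four sample lines are invented for the sketch.

```shell
# Hypothetical sample lines.
tmp=$(mktemp -d)
printf '%s\n' 'Lord Aubernoun woke up' 'Augustine wrote it' \
              'AUSTIN in capitals' 'an August day' > "$tmp/sample.txt"

# Capital A, lower-case u, any run of lower-case letters, then n.
matches=$(grep 'Au[a-z]*n' "$tmp/sample.txt")
printf '%s\n' "$matches"
# Lord Aubernoun woke up
# Augustine wrote it

rm -rf "$tmp"
```

AUSTIN is skipped because the pattern asks for a lower-case u, and August is skipped because no n follows the letters after Au.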
grep
% fmt -1 machen.txt | tr -d '[:punct:] ' | grep 'Au[a-z]*n' | sort -u

The fmt -1 command breaks the text up so that each word is on its own line
the tr -d '[:punct:] ' command erases all punctuation and spaces
the grep command keeps only the words that match Au[a-z]*n
the sort -u command sorts the result alphabetically and drops duplicate lines
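
The same word-list idea can be sketched with tr alone doing the splitting. This is a swapped-in variant, not the pipeline from the slide: tr -cs '[:alpha:]' '\n' turns every run of non-letters into a single newline. The sample text is invented.

```shell
# Hypothetical two-line sample text.
tmp=$(mktemp -d)
printf '%s\n' 'one morning Aubernoun woke up, and Austin (with Aubernoun) left.' \
              'In August, Augustine stayed.' > "$tmp/sample.txt"

# 1. tr -cs: replace each run of non-letters with one newline (one word per line)
# 2. grep:   keep the words matching Au[a-z]*n
# 3. sort -u: sort and remove duplicates
words=$(tr -cs '[:alpha:]' '\n' < "$tmp/sample.txt" | grep 'Au[a-z]*n' | sort -u)
printf '%s\n' "$words"
# Aubernoun
# Augustine
# Austin

rm -rf "$tmp"
```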

grep
% grep 'Au[a-z]*n' machen.txt

[a-z]* can stand for any run of lower-case letters, however long. For example,
Au + stralabracadabralaliolasia + n still matches Au[a-z]*n.

Example:
grep
The zipcodes in the near vicinity of the UC campus are 95060 (Santa Cruz west of
the river), 95062 (Live Oak), 95064 (UCSC), 95065 (East Santa Cruz), 95066
(Scotts Valley).

Suppose you wanted to extract from a file called addresses, containing one full
name and address on each line, just the addresses of people living in these areas.
Assume some people type a space after CA and others don't, and some write
several spaces.
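The following regular expression describes the set of zipcodes you want:
CA *9506[024-6]
Here " *" allows any number of spaces (including none) after CA, and [024-6]
matches one of the digits 0, 2, 4, 5, or 6.
This grep command will find just the lines in the file addresses that contain
zipcodes for people who live near the campus: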
% grep 'CA *9506[024-6]' addresses
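
To see the spacing and the digit set at work, here is a sketch on a made-up addresses file (names and street addresses are placeholders).

```shell
# Hypothetical addresses file: zero, one, or several spaces after CA.
tmp=$(mktemp -d)
printf '%s\n' 'A. Sample, 1 High St, Santa Cruz, CA 95060' \
              'B. Sample, 2 Main St, Live Oak, CA95062' \
              'C. Sample, 3 Elm St, Scotts Valley, CA   95066' \
              'D. Sample, 4 Bay Ave, Capitola, CA 95010' \
              'E. Sample, 5 Pine St, Santa Cruz, CA 95063' > "$tmp/addresses"

# Count the lines whose zipcode is 95060, 95062, 95064, 95065 or 95066.
count=$(grep -c 'CA *9506[024-6]' "$tmp/addresses")
echo "$count"
# 3

rm -rf "$tmp"
```

The last two lines are rejected: 95010 does not start with 9506, and the final 3 of 95063 is not in the set [024-6].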
Example:
grep
Suppose you want only the 9-digit zipcodes. That's easy too:

% grep 'CA *9506[024-6]-[0-9]\{4\}' addresses
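
The interval \{4\} means "exactly four of the preceding item". A short sketch on invented data shows which lines survive.

```shell
# Hypothetical addresses: one ZIP+4, one plain 5-digit zip, one truncated suffix.
tmp=$(mktemp -d)
printf '%s\n' 'F. Sample, UCSC, CA 95064-1077' \
              'G. Sample, UCSC, CA 95064' \
              'H. Sample, Santa Cruz, CA 95060-12' > "$tmp/addresses"

# The zipcode must be followed by a hyphen and four digits.
nine=$(grep 'CA *9506[024-6]-[0-9]\{4\}' "$tmp/addresses")
echo "$nine"
# F. Sample, UCSC, CA 95064-1077

rm -rf "$tmp"
```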

grep
Example:
Suppose you were looking to see whether there were any words
beginning with a in a file called shakespeare.
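You might be tempted to type

% grep a* shakespeare

but this does not do the job. Unquoted, a* may be expanded by the shell into the names of
files beginning with a; and even when quoted, 'a*' means "zero or more occurrences of a",
which every line matches.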
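
A sketch of why a bare a* is a poor pattern for this task, and of one possible alternative. The three sample lines are invented, and the -w option used here is widely supported (GNU and BSD grep) but is not part of POSIX grep.

```shell
# Hypothetical three-line sample.
tmp=$(mktemp -d)
printf '%s\n' 'To be or not to be' 'that is the question' \
              'and by opposing end them' > "$tmp/lines.txt"

# Quoted so the shell cannot expand it as a file-name wildcard.
# 'a*' means zero or more a's, so every line matches, even lines with no a.
every=$(grep -c 'a*' "$tmp/lines.txt")
echo "$every"
# 3

# One alternative: a whole word made of a followed by lower-case letters.
awords=$(grep -w 'a[a-z]*' "$tmp/lines.txt")
echo "$awords"
# and by opposing end them

rm -rf "$tmp"
```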

egrep
Some simple tasks would be a bit of a chore just using grep. Suppose we wanted to add
Ben Lomond (CA 95005), Davenport (CA 95017), and Felton (CA 95018). What we need
here is the disjunction: for the 5-digit zipcodes, the strings we want will match either
CA *9506[024-6] or CA *95005 or CA *9501[78].

Now, we can certainly do that: we can simply call grep three separate times, and
amalgamate all the results. We cannot amalgamate all the searches into something like
CA *950[016][024-8], because that defines a set that is too big; it lets in 95004, for
example, and that's Aromas, way the other side of Watsonville.

The way to do it is to use the extended regular expressions provided by the egrep program.
In egrep, you can use parentheses to group parts of the expression and the pipe symbol to
mean or. So (AB)|C means "either AB or C", while A(B|C) means "A followed by either B
or C", and so on. Thus we could use:

% egrep 'CA *950((05)|(6[024-6])|(1[78]))' addresses

There are a few other things that egrep allows but grep does not. For example, in
egrep regular expressions you can say a+ to mean "a sequence of one or more as", or
[a-z]+ to mean "a sequence of one or more lower-case letters". In grep regular
expressions you would have to say aa* and [a-z][a-z]* respectively to get these effects.
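
The difference between the over-broad bracket pattern and the grouped alternation can be sketched on a made-up addresses file. grep -E is used as the POSIX spelling of egrep.

```shell
# Hypothetical addresses file covering wanted and unwanted zipcodes.
tmp=$(mktemp -d)
printf '%s\n' 'Ben Lomond, CA 95005' 'Davenport, CA 95017' 'Felton, CA95018' \
              'UCSC, CA 95064' 'Aromas, CA 95004' 'Freedom, CA 95019' > "$tmp/addresses"

# Too broad: [016][024-8] also lets 95004 through.
toobig=$(grep -c 'CA *950[016][024-8]' "$tmp/addresses")
# Grouped alternation: exactly 05, 6[024-6] or 1[78] after 950.
near=$(grep -E -c 'CA *950((05)|(6[024-6])|(1[78]))' "$tmp/addresses")
echo "toobig=$toobig near=$near"
# toobig=5 near=4

# One-or-more: ERE [a-z]+ and BRE [a-z][a-z]* select the same lines.
plus=$(grep -E '[a-z]+,' "$tmp/addresses")
star=$(grep '[a-z][a-z]*,' "$tmp/addresses")

rm -rf "$tmp"
```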
File Management with Shell
Commands
Changing to another directory
% cd .. [RETURN]            go up one level in the directory tree
% cd [DIRECTORY] [RETURN]   change to a subdirectory
% cd /tmp [RETURN]          to change to some other directory on the system,
                            you must type the full path name

• Create a directory
% mkdir [DIRECTORY.NAME] [RETURN]

• Remove a directory
% rmdir [DIRECTORY.NAME] [RETURN]
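
The directory commands can be strung together as a short round trip. This is a sketch in a scratch location; the name demo_dir is a placeholder. Note that rmdir only removes a directory that is empty.

```shell
# Work inside a throwaway directory so nothing real is touched.
base=$(mktemp -d)
cd "$base" || exit 1

mkdir demo_dir               # create a subdirectory
cd demo_dir || exit 1        # change into it
inside=$(basename "$PWD")    # name of the directory we are now in
cd ..                        # go back up one level
rmdir demo_dir               # remove the (empty) subdirectory

if [ -d demo_dir ]; then still_there=yes; else still_there=no; fi
echo "$inside $still_there"
# demo_dir no

cd / && rm -rf "$base"
```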

> cd ..
> cd c6932aab
> ls
display shakespeare

> cp shakespeare ~c6932aad


> cd
> ls
shakespeare

% grep [options] pattern filenames
% fgrep [options] string filenames

fgrep ("fixed-string grep") only searches for strings


grep is a full-blown regular-expression matcher

Some of the valid options are:


-i case-insensitive search
-n show the line number along with the matched line
-v invert match, e.g. find all lines that do NOT match
-w match entire words, rather than substrings

Searching for something in a file
with GREP
% grep -inw "thou" shakespeare

find all instances of the word "thou" in the file "shakespeare", case-insensitively,
as a whole word only, and display the line numbers
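
A sketch of the three options combined, on an invented three-line file. The -w option is widely available (GNU and BSD grep) but is not in POSIX grep.

```shell
# Hypothetical sample: "thou" as a word, and "thou" buried inside longer words.
tmp=$(mktemp -d)
printf '%s\n' 'Thou art more lovely' 'Although thousands came' \
              'Wherefore art thou Romeo' > "$tmp/verse.txt"

# -i ignore case, -n show line numbers, -w match whole words only.
hits=$(grep -inw 'thou' "$tmp/verse.txt")
printf '%s\n' "$hits"
# 1:Thou art more lovely
# 3:Wherefore art thou Romeo

rm -rf "$tmp"
```

Line 2 is skipped because in Although and thousands the letters thou are only part of a longer word.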

Grep

grep '^smug' files         {'smug' at the start of a line}
grep 'smug$' files         {'smug' at the end of a line}
grep '^smug$' files        {lines containing only 'smug'}
grep '\^s' files           {lines containing a literal '^s'}
grep '[Ss]mug' files       {search for 'Smug' or 'smug'}
grep 'B[oO][bB]' files     {search for BOB, Bob, BOb or BoB}
grep '^$' files            {search for empty lines}
grep '[0-9][0-9]' files    {search for pairs of numeric digits}
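
Several of these patterns can be checked in one go by counting matching lines with -c. The six-line sample file is invented for this sketch.

```shell
# Hypothetical sample; the fifth line is empty.
tmp=$(mktemp -d)
printf '%s\n' 'smug' 'smug cat' 'a cat so smug' '^start' '' 'Bob has 42' > "$tmp/sample.txt"

at_start=$(grep -c '^smug' "$tmp/sample.txt")    # smug, smug cat
at_end=$(grep -c 'smug$' "$tmp/sample.txt")      # smug, a cat so smug
alone=$(grep -c '^smug$' "$tmp/sample.txt")      # smug
caret=$(grep -c '\^s' "$tmp/sample.txt")         # ^start (the caret is literal here)
empty=$(grep -c '^$' "$tmp/sample.txt")          # the empty line
digits=$(grep -c '[0-9][0-9]' "$tmp/sample.txt") # Bob has 42

echo "$at_start $at_end $alone $caret $empty $digits"
# 2 2 1 1 1 1

rm -rf "$tmp"
```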

Grep

grep '[^a-zA-Z0-9]' files              {anything not a letter or number}
grep '[0-9]\{3\}-[0-9]\{4\}' files     {999-9999, like phone numbers}
grep '^.$' files                       {lines with exactly one character}
grep '"smug"' files                    {'smug' within double quotes}
grep '"*smug"*' files                  {'smug', with or without quotes}
grep '^\.' files                       {any line that starts with "."}
grep '^\.[a-z][a-z]' files             {line start with "." and 2 lc letters}
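
A sketch that exercises a few of these patterns on invented data. LC_ALL=C is set as an assumption so that [a-z] means exactly the 26 lower-case ASCII letters; in other locales the range can behave differently.

```shell
# Assumption: force the C locale so bracket ranges are plain ASCII.
LC_ALL=C
export LC_ALL

# Hypothetical sample file.
tmp=$(mktemp -d)
printf '%s\n' 'call 555-1234' 'call 55-1234' 'x' '.profile' '.PP' '"smug"' > "$tmp/sample.txt"

phone=$(grep -c '[0-9]\{3\}-[0-9]\{4\}' "$tmp/sample.txt")  # three digits, hyphen, four digits
single=$(grep -c '^.$' "$tmp/sample.txt")                   # exactly one character
dot=$(grep -c '^\.' "$tmp/sample.txt")                      # starts with a literal dot
dot_lc=$(grep -c '^\.[a-z][a-z]' "$tmp/sample.txt")         # dot plus two lower-case letters
quoted=$(grep -c '"smug"' "$tmp/sample.txt")                # smug inside double quotes

echo "$phone $single $dot $dot_lc $quoted"
# 1 1 2 1 1

rm -rf "$tmp"
```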

Egrep
The version of grep that supports the full set of
operators mentioned above is generally called egrep
(for extended grep)

% egrep '(mine|my)' shakespeare
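
For a plain either/or search like this one, two spellings give the same lines: the extended alternation, written here with grep -E (the POSIX form of egrep), and two separate -e patterns. The three-line verse.txt is invented for the sketch.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf '%s\n' 'my love' 'the gold is mine' 'thy sword' > "$tmp/verse.txt"

extended=$(grep -E '(mine|my)' "$tmp/verse.txt")   # alternation in one pattern
repeated=$(grep -e mine -e my "$tmp/verse.txt")    # a line matches if either pattern does
printf '%s\n' "$extended"
# my love
# the gold is mine

rm -rf "$tmp"
```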
