
130-Linux Shell Scripting

The document provides an overview of regular expressions in Linux, explaining their purpose as patterns used to filter text in various utilities. It discusses different types of regular expression engines, such as POSIX Basic and Extended Regular Expressions, and details how to define patterns using special characters, character classes, and quantifiers. Additionally, it includes examples of using regular expressions in shell scripts to count executable files in directories defined by the PATH environment variable.

Shell Scripting

Session 12

Vahab Shalchian (ITIL v3 , LPIC-1 , LPIC-2 , LPIC-3)


Regular Expressions
What Are Regular Expressions?

A regular expression is a pattern template you define that a Linux
utility uses to filter text.
A Linux utility (such as the sed editor or the gawk program)
matches the regular expression pattern against data as that data
flows into the utility. If the data matches the pattern, it's accepted for
processing. If the data doesn't match the pattern, it's rejected.
Types of regular expressions

The biggest problem with using regular expressions is that there isn't
just one set of them. Several different applications use different types
of regular expressions in the Linux environment.

These include such diverse applications as programming languages
(Java, Perl, and Python), Linux utilities (such as the sed editor, the
gawk program, and the grep utility), and mainstream applications
(such as the MySQL and PostgreSQL database servers).
A regular expression is implemented using a regular expression
engine. A regular expression engine is the underlying software that
interprets regular expression patterns and uses those patterns to
match text.
In the Linux world, there are two popular regular expression engines:

• The POSIX Basic Regular Expression (BRE) engine
• The POSIX Extended Regular Expression (ERE) engine

Most Linux utilities at a minimum conform to the POSIX BRE engine
specifications, recognizing all of the pattern symbols it defines.
Unfortunately, some utilities (such as the sed editor) only
conform to a subset of the BRE engine specifications. This is due to
speed constraints, as the sed editor attempts to process text in the
data stream as quickly as possible.
Defining BRE Patterns

The most basic BRE pattern is matching text characters in a data
stream.

Plain text
$ echo "This is a test" | sed -n '/test/p'
This is a test

$ echo "This is a test" | sed -n '/trial/p'
$

Regular expression patterns are case sensitive. This means they'll only
match text whose characters have the same case as the pattern.
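The case-sensitivity rule can be checked with a short sketch. The variable names lower and upper are chosen here for illustration only; the input line is the same one used in the examples above.

```shell
#!/bin/bash
# The pattern 'this' (lowercase t) does not match the text "This".
lower=$(echo "This is a test" | sed -n '/this/p')

# The pattern 'This' has the same case as the text, so the line prints.
upper=$(echo "This is a test" | sed -n '/This/p')

echo "pattern /this/ printed: '$lower'"
echo "pattern /This/ printed: '$upper'"
```

Only the second pattern prints the line, because the capital T in the data must be matched by a capital T in the pattern.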
Special characters

Regular expression patterns assign a special meaning to a few
characters. If you try to use these characters in your text pattern, you
won't get the results you were expecting.
The special characters recognized by regular expressions are:

.*[]^${}\+?|()
To match one of these characters as literal text, you must escape it.
For example, if you want to search for a dollar sign in your text, just
precede it with a backslash character:

$ cat data2
The cost is $4.00

$ sed -n '/\$/p' data2
The cost is $4.00
$
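The same escaping applies to the other special characters. As a minimal sketch, the following compares an unescaped dot with an escaped one; the two sample lines are made up for illustration and are not part of the data2 file.

```shell
#!/bin/bash
# Hypothetical sample lines; only the first contains a literal period.
sample=$(printf '%s\n' 'cost 4.00' 'cost 4x00')

# Unescaped dot: matches any single character, so both lines print.
unescaped=$(echo "$sample" | sed -n '/4.00/p')

# Escaped dot: matches only a literal period, so one line prints.
escaped=$(echo "$sample" | sed -n '/4\.00/p')

echo "$unescaped"
echo "---"
echo "$escaped"
```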
Starting at the beginning

The caret character (^) defines a pattern that starts at the beginning
of a line of text in the data stream. If the pattern is located any place
other than the start of the line of text, the regular expression pattern
fails.
To use the caret character, you must place it before the pattern
specified in the regular expression:
$ echo "The book store" | sed -n '/^book/p'
$
$ echo "Books are great" | sed -n '/^Book/p'
Books are great
$
Looking for the ending

The opposite of looking for a pattern at the start of a line is looking
for it at the end of a line. The dollar sign ($) special character defines
the end anchor. Add this special character after a text
pattern to indicate that the line of data must end with the text
pattern:
$ echo "This is a good book" | sed -n '/book$/p'
This is a good book
$ echo "This book is good" | sed -n '/book$/p'
$
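The two anchors can also be combined. One common use, sketched here with made-up input, is the pattern ^$, which matches a line with nothing between its start and its end, that is, an empty line. The sed d command used below deletes matching lines; it is not covered in this session and is assumed here.

```shell
#!/bin/bash
# Hypothetical three-line input with an empty line in the middle.
text=$(printf '%s\n' 'first line' '' 'last line')

# /^$/ matches a line whose start is immediately followed by its end.
# The d command deletes those lines; all other lines pass through.
cleaned=$(echo "$text" | sed '/^$/d')

echo "$cleaned"
```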
The dot character

The dot special character is used to match any single character except
a newline character. The dot character must match a character
though; if there's no character in the place of the dot, then the
pattern will fail.
$ cat data6
This is a test of a line.
The cat is sleeping.
That is a very nice hat.
This test is at line four.
at ten o'clock we'll go home.
$ sed -n '/.at/p' data6
The cat is sleeping.
That is a very nice hat.
This test is at line four.
$
Character classes
The dot special character is great for matching a character position
against any character, but what if you want to limit what characters to
match? This is called a character class in regular expressions.
To define a character class, you use square brackets. The brackets
should contain any character that you want to include in the class.

Here's an example of creating a character class:
$ sed -n '/[ch]at/p' data6
The cat is sleeping.
That is a very nice hat.
$
$ echo "Yes" | sed -n '/[Yy]es/p'
Yes
$ echo "yes" | sed -n '/[Yy]es/p'
yes
$
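A pattern is not limited to one character class. As a sketch with made-up input words, placing a class at every position accepts any mix of upper and lower case:

```shell
#!/bin/bash
# One character class per position: each letter may be upper or lower case.
matches=$(printf '%s\n' 'yes' 'YES' 'yEs' 'no' | sed -n '/[Yy][Ee][Ss]/p')

echo "$matches"
```

The word "no" is rejected because none of its characters fit the three classes in sequence.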

Negating character classes

Instead of looking for a character contained in the class, you can look
for any character that's not in the class. To do that, just place a caret
character at the beginning of the character class:
$ sed -n '/[^ch]at/p' data6
This test is at line four.
$
Using ranges
You can use a range of characters within a character class by using the
dash symbol. Just specify the first character in the range, a dash, then
the last character in the range.

$ sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8
60633
46201
45902
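The contents of data8 are not shown, so the following sketch uses a made-up list of candidate values to illustrate why the five-digit pattern needs both anchors. The variable names strict and loose are illustrative.

```shell
#!/bin/bash
# Hypothetical input: a mix of valid and invalid five-digit codes.
candidates=$(printf '%s\n' '60633' '4620' '223001' '4353a' '22203')

# Anchored pattern: the whole line must be exactly five digits.
strict=$(echo "$candidates" | sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p')

# Unanchored pattern: five digits anywhere in the line are enough,
# so the six-digit value 223001 also passes.
loose=$(echo "$candidates" | sed -n '/[0-9][0-9][0-9][0-9][0-9]/p')

echo "$strict"
echo "---"
echo "$loose"
```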
The asterisk
Placing an asterisk after a character signifies that the character must
appear zero or more times in the text to match the pattern:
$ cat file1 | sed -n '/ie*k/p'
ik
iek
ieek
ieeek
ieeeek
$ cat file2 | sed -n '/b[ae]*t/p'
bt
bat
bet
baat
baaeeet
baeeaeeat
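The asterisk can follow the dot as well as a literal character or a class. As a minimal sketch with a made-up input line, the combination .* matches any run of characters (including none), which is handy when two words may be separated by arbitrary text:

```shell
#!/bin/bash
line="this is a regular pattern expression"

# "regular", then any characters, then "expression": the line prints.
hit=$(echo "$line" | sed -n '/regular.*expression/p')

# The words appear in the opposite order in the text, so nothing prints.
miss=$(echo "$line" | sed -n '/expression.*regular/p')

echo "hit:  '$hit'"
echo "miss: '$miss'"
```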
Extended Regular Expressions
The POSIX ERE patterns include a few additional symbols that are
used by some Linux applications and utilities. The gawk program
recognizes the ERE patterns, but the sed editor doesn't by default
(GNU sed accepts them when run with the -E option).

The question mark
The question mark indicates that the preceding character can appear
zero or one time, but that's all. It doesn't match repeating
occurrences of the character:
$ echo "bt" | gawk '/be?t/{print $0}'
bt
$ echo "bet" | gawk '/be?t/{print $0}'
bet
$ echo "beet" | gawk '/be?t/{print $0}'
$
$ echo "beeet" | gawk '/be?t/{print $0}'
$
The plus sign
The plus sign indicates that the preceding character can appear one
or more times, but must be present at least once. The pattern doesn't
match if the character is not present:
$ echo "beeet" | gawk '/be+t/{print $0}'
beeet
$ echo "beet" | gawk '/be+t/{print $0}'
beet
$ echo "bet" | gawk '/be+t/{print $0}'
bet
$ echo "bt" | gawk '/be+t/{print $0}'
$
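The question mark and plus sign behave the same way in other ERE-aware tools. The sketch below uses grep with its -E option, which selects extended regular expressions; using grep here instead of gawk is a substitution made for illustration, and the word list is made up.

```shell
#!/bin/bash
# Hypothetical word list with zero, one, and two e characters.
words=$(printf '%s\n' 'bt' 'bet' 'beet')

# e may appear zero or one time: bt and bet pass.
optional=$(echo "$words" | grep -E 'be?t')

# e must appear one or more times: bet and beet pass.
repeated=$(echo "$words" | grep -E 'be+t')

echo "$optional"
echo "---"
echo "$repeated"
```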
Using braces
Curly braces are available in ERE to allow you to specify a limit on a
repeatable regular expression.
This is often referred to as an interval. You can express the interval in
two formats:
• m: The regular expression appears exactly m times.
• m,n: The regular expression appears at least m times, but no more
than n times.

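Note: Older versions of the gawk program (before 4.0) don't recognize
regular expression intervals by default. With those versions you must
specify the --re-interval command line option for the gawk program to
recognize regular expression intervals. Newer versions recognize
intervals by default and still accept the option.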
$ echo "bt" | gawk --re-interval '/be{1}t/{print $0}'
$
$ echo "bet" | gawk --re-interval '/be{1}t/{print $0}'
bet
$ echo "beet" | gawk --re-interval '/be{1}t/{print $0}'
$
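The m,n form of the interval can be sketched the same way. This example uses grep -E rather than gawk, so no extra option is needed for intervals; that choice and the word list are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical word list with zero to three e characters.
words=$(printf '%s\n' 'bt' 'bet' 'beet' 'beeet')

# e must appear at least once and at most twice between b and t.
ranged=$(echo "$words" | grep -E 'be{1,2}t')

echo "$ranged"
```

Only bet and beet pass: bt has too few e characters and beeet has too many.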
The pipe symbol
The pipe symbol allows you to specify two or more patterns that
the regular expression engine uses in a logical OR formula when
examining the data stream. If any of the patterns match the data
stream text, the text passes. If none of the patterns match, the data
stream text fails.

The format for using the pipe symbol is:

expr1|expr2|...
Here's an example of this:

$ echo "The cat is asleep" | gawk '/cat|dog/{print $0}'
The cat is asleep
$ echo "The dog is asleep" | gawk '/cat|dog/{print $0}'
The dog is asleep
$ echo "The sheep is asleep" | gawk '/cat|dog/{print $0}'
$
Grouping expressions
Regular expression patterns can also be grouped by using parentheses.
When you group a regular expression pattern, the group is treated like a
standard character. You can apply a special character to the group just
as you would to a regular character. For example:
$ echo "Sat" | gawk '/Sat(urday)?/{print $0}'
Sat
$ echo "Saturday" | gawk '/Sat(urday)?/{print $0}'
Saturday
$
$ echo "cat" | gawk '/(c|b)a(b|t)/{print $0}'
cat
$ echo "cab" | gawk '/(c|b)a(b|t)/{print $0}'
cab
$ echo "bat" | gawk '/(c|b)a(b|t)/{print $0}'
bat
$ echo "bab" | gawk '/(c|b)a(b|t)/{print $0}'
bab
$ echo "tab" | gawk '/(c|b)a(b|t)/{print $0}'
$
Regular Expressions in Action
Counting directory files

Write a shell script that counts the files present in each of the
directories defined in your PATH environment variable. These
directories normally hold the executable programs the shell can run.
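#!/bin/bash
# Count the files in each directory listed in your PATH.
# Note: this counts every entry that ls reports, not only executables.
mypath=$(echo "$PATH" | sed 's/:/ /g')   # replace colons with spaces
count=0
for directory in $mypath
do
    check=$(ls "$directory")
    for item in $check
    do
        count=$(( count + 1 ))
    done
    echo "$directory - $count"
    count=0
done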
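The script above counts every directory entry. A possible variation that counts only executable files is sketched below. The function name count_executables is hypothetical, and the sketch keeps the same sed-based splitting, so it assumes that no directory in PATH contains spaces.

```shell
#!/bin/bash
# count_executables is a hypothetical helper, not part of the original script.
# It prints the number of regular, executable files in one directory.
count_executables() {
    local directory=$1 count=0 item
    for item in "$directory"/*
    do
        # -f: regular file; -x: executable by the current user
        if [ -f "$item" ] && [ -x "$item" ]
        then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

# Same technique as the original: sed replaces each colon with a space.
mypath=$(echo "$PATH" | sed 's/:/ /g')
for directory in $mypath
do
    # Skip PATH entries that do not exist on this system.
    [ -d "$directory" ] || continue
    echo "$directory - $(count_executables "$directory")"
done
```

Testing each entry with -f and -x leaves out subdirectories and data files, which the ls-based count includes.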