
U4 - Shell Pattern Matching

Pattern matching in shell scripts matches filenames and strings using glob patterns, which play a role similar to regular expressions but use a different syntax. Extended globbing in bash adds operators such as ?(...) for zero or one occurrence of a pattern, *(...) for zero or more, +(...) for one or more, and @(...) for exactly one. Glob patterns work not just on filenames but also in bash [[ ]] expressions, case statements, and parameter expansions, offering flexible string matching for scripting tasks.

Pattern Matching

Shell globbing

Pattern matching in the shell against filenames has metacharacters defined differently from the
rest of Unix pattern-matching programs. * matches any string of zero or more characters, and ?
matches exactly one character; both match whitespace, but neither matches a / or a leading dot
in a filename. So *.c matches any filename ending with the two characters .c
(this will list all C source files in the directory, assuming the directory's owner is sane).
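
A minimal sketch of the two wildcards in action, run in a scratch directory. The file names are made up for illustration, and the arrays are just a convenient way to capture what each glob expands to:

```bash
# Scratch directory with hypothetical file names, including one with a space.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch main.c util.c "my notes.c" a.c README .hidden.c

# *.c: every non-hidden name ending in .c; the space in "my notes.c" is no obstacle.
c_files=(*.c)
printf '%s\n' "${c_files[@]}"

# ?.c: exactly one character before .c.
one_char=(?.c)
printf '%s\n' "${one_char[@]}"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Note that .hidden.c is not picked up by *.c, because a leading dot has to be matched explicitly.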

grep, sed

Table of metacharacters:

1. ^ (caret) matches the beginning of the line. Anchors the match.
2. $ (dollar sign) matches the end of the line. Anchors the match.
3. . (dot) matches any single character. Beware, command-line globbing uses ? instead.
4. * (star) matches zero or more of the preceding character. Beware, the command-line glob *
corresponds to .* here.
5. [] (square brackets) matches any one of the set of characters inside the brackets.
6. [^ ] (caret as the first character inside the brackets) matches any character except those
inside the brackets.
7. [a-z] (dash inside the brackets) matches a range. If - itself is to be matched, it must be the
first or last character, to avoid misinterpretation as the range operator.
8. () (parentheses, which must be escaped with backslashes) save the match for later use with \n,
where n is a number.
9. {m}, {m,} and {m,n} (braces, which must be escaped with backslashes) match exactly m,
m or more, or between m and n repetitions of the preceding character.
10. & (ampersand) expands to the matched string; used in the replacement part of a sed substitution.
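
A few of the metacharacters from the table, tried against a small made-up word list. This is only a sketch; the sample words and variable names are assumptions chosen so that each pattern picks out something different:

```bash
# Hypothetical sample input, one entry per line.
words=$(printf '%s\n' cat cart coat c-t boot 'a cat')

# ^ and $ anchor the match; . stands for any one character.
anchored=$(printf '%s\n' "$words" | grep '^c.t$')
echo "$anchored"        # cat and c-t

# [^a-z] matches one character that is not a lowercase letter.
negated=$(printf '%s\n' "$words" | grep '^c[^a-z]t$')
echo "$negated"         # c-t

# \{2\} asks for two consecutive copies of the preceding character.
doubled=$(printf '%s\n' "$words" | grep 'o\{2\}')
echo "$doubled"         # boot

# \( \) saves a match and \1 requires the same text again.
backref=$(printf '%s\n' "$words" | grep '\(o\)\1')
echo "$backref"         # boot
```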

Flags for grep of note:

● -i, case insensitive


● -v, invert, select non-matching lines
● -c, give count of matching lines.
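
The three flags can be combined. A quick sketch on some invented log lines (the text and variable names are assumptions for illustration):

```bash
# Hypothetical log lines.
log=$(printf '%s\n' 'Error: disk full' 'ok' 'error: retry' 'OK')

# -i ignores case, -c prints the number of matching lines instead of the lines.
err_count=$(printf '%s\n' "$log" | grep -ic 'error')
echo "$err_count"       # 2

# -v keeps only the lines that do NOT match.
clean=$(printf '%s\n' "$log" | grep -iv 'error')
echo "$clean"           # ok and OK
```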

Flags for sed of note:

● -n, suppress automatic printing; a line is printed only if forced to (for example with the p command)


● -f, commands from a file

Sed commands:

● The form is [address][,address][!]command [arguments]. You tend to have to enclose this in
single quotes or the shell will demolish it. Or double quotes if you want shell variables
expanded inside the mess.
● No address: all lines are processed; one address: lines matching the address are processed;
two addresses: the first address starts processing, the second address ends processing.
● Addresses can be line numbers, the dollar sign (the last line) or a regular expression enclosed in //.
● Example: s/a/b/g, substitute b for a, globally. Drop the g and you only substitute the first
occurrence of a on a line. Add p after the g to print out the line, especially if you are using
sed -n.
● Example: /but/d, delete any line that says "but", no buts allowed!
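
The commands above, sketched against four made-up lines of input. The sample text and variable names are assumptions; the sed invocations themselves use only the forms described in the list:

```bash
# Hypothetical input text.
text=$(printf '%s\n' 'a banana' 'but no' 'alpha' 'last a')

# Without g, only the first a on each line is replaced.
first=$(printf '%s\n' "$text" | sed 's/a/b/')
echo "$first"

# With g, every a on each line is replaced.
all=$(printf '%s\n' "$text" | sed 's/a/b/g')
echo "$all"

# One regex address plus d: drop every line containing "but".
nobut=$(printf '%s\n' "$text" | sed '/but/d')
echo "$nobut"

# Two line-number addresses with -n and p: print only lines 2 through 3.
middle=$(printf '%s\n' "$text" | sed -n '2,3p')
echo "$middle"

# Double quotes let the shell expand $word; & is the matched string.
word=alpha
wrapped=$(printf '%s\n' "$text" | sed -n "s/$word/[&]/p")
echo "$wrapped"
```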

Examples

Match three letter reversal patterns:

grep '\(.\)\(.\)\(.\)\3\2\1' web2


Substitution using sed:

sed 's/^.*:\*:\([^:]*\).*$/\1/' /etc/passwd

Pattern Matching In Bash

Wildcards have been around forever. Some even claim they appear in the hieroglyphics of the
ancient Egyptians. Wildcards allow you to specify succinctly a pattern that matches a set of
filenames (for example, *.pdf to get a list of all the PDF files). Wildcards are also often referred
to as glob patterns (or when using them, as "globbing"). But glob patterns have uses beyond just
generating a list of useful filenames. The bash man page refers to glob patterns simply as
"Pattern Matching".

First, let's do a quick review of bash's glob patterns. In addition to the simple wildcard characters
that are fairly well known, bash also has extended globbing, which adds additional features.
These extended features are enabled via the extglob option.

Pattern        Description
*              Match zero or more characters
?              Match any single character
[...]          Match any of the characters in a set
?(patterns)    Match zero or one occurrences of the patterns (extglob)
*(patterns)    Match zero or more occurrences of the patterns (extglob)
+(patterns)    Match one or more occurrences of the patterns (extglob)
@(patterns)    Match one occurrence of the patterns (extglob)
!(patterns)    Match anything that doesn't match one of the patterns (extglob)

 
For example:

$ ls
a.jpg b.gif c.png d.pdf ee.pdf

$ ls *.jpg
a.jpg

$ ls ?.pdf
d.pdf

$ ls [ab]*
a.jpg b.gif

$ shopt -s extglob # turn on extended globbing

$ ls ?(*.jpg|*.gif)
a.jpg b.gif

$ ls !(*.jpg|*.gif) # not a jpg or a gif
c.png d.pdf ee.pdf

When I first used extended globbing, many of the patterns didn't seem to do what I initially thought
they ought to do. For example, it appeared to me that, given a.jpg, the pattern ?(*.jpg|a.jpg)
should not match, because a.jpg matched both patterns, and the ? is "zero or one", right?
Wrong. My confusion was due to a misreading of the description: it's not the filename that
can match only once, it's the pattern that can occur only once. Think of it in terms of regular
expressions:

Glob           Regular Expression Equivalent    Description
?(patterns)    (regex)?                         Match an optional regex
*(patterns)    (regex)*                         Match zero or more occurrences of a regex
+(patterns)    (regex)+                         Match one or more occurrences of a regex
@(patterns)    (regex)                          Match the regex (one occurrence)

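So, for example (assuming dotglob is also set, since otherwise a name with a leading dot such as
.pdf is skipped by filename expansion):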

$ ls *.pdf
ee.pdf e.pdf .pdf

$ ls ?(e).pdf # zero or one "e" allowed
e.pdf .pdf

$ ls *(e).pdf # zero or more "e"s allowed
ee.pdf e.pdf .pdf

$ ls +(e).pdf # one or more "e"s allowed
ee.pdf e.pdf

$ ls @(e).pdf # only one e allowed
e.pdf
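
To check the correspondence with regular expressions directly, here is a small sketch that tests the same names against the ?(e).pdf glob and against a regex I take to be its equivalent, ^(e)?\.pdf$. It uses [[ ]] string matching rather than ls, so no files are needed; the summary variable is only there to collect the results:

```bash
shopt -s extglob    # enable extended patterns, as in the examples above

summary=""
for name in .pdf e.pdf ee.pdf; do
    # Glob: zero or one "e", then the literal .pdf
    if [[ $name == ?(e).pdf ]]; then glob=yes; else glob=no; fi
    # Regex: the same idea, anchored at both ends
    if [[ $name =~ ^(e)?\.pdf$ ]]; then regex=yes; else regex=no; fi
    summary+="$name:$glob:$regex "
    echo "$name glob=$glob regex=$regex"
done
```

Both forms accept .pdf and e.pdf and reject ee.pdf.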

And while I'm comparing glob patterns to regular expressions, there's an important point to be
made that may not be immediately obvious: glob patterns are just another syntax for doing
pattern matching in general in bash. And you can use them in a number of different places:

● After the == in a bash [[ expr ]] expression.


● In the patterns to a case command.
● In parameter expansions (%, %%, #, ##, /, //).

The following example uses pattern matching in the expression of an if statement to test
whether a variable has a value of "something" or "anything":

$ shopt -s extglob

$ a=something
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
yes

$ a=anything
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
yes

$ a=nothing
$ if [[ $a == +(some|any)thing ]]; then echo yes; else echo no; fi
no
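
One caveat with + here: it allows the alternatives to repeat, so it accepts more than just "something" and "anything". A short sketch with a made-up value shows the difference from @, which allows exactly one occurrence:

```bash
shopt -s extglob

a=someanything      # hypothetical value that repeats the alternatives
if [[ $a == +(some|any)thing ]]; then plus=yes; else plus=no; fi
if [[ $a == @(some|any)thing ]]; then at=yes; else at=no; fi
echo "plus=$plus at=$at"    # plus=yes at=no
```

If the intent is strictly one of the two words, @(some|any)thing is the tighter pattern.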

The following example uses pattern matching in a case statement to determine whether a file is
an image file:

shopt -s extglob    # must be enabled before the case patterns are parsed
for f in "$@"
do
    case $f in
        !(*.gif|*.jpg|*.png))    # ! == does not match
            echo "Not an image: $f"
            ;;
        *)
            echo "Image: $f"
            ;;
    esac
done

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
Image: a.jpg
Image: b.gif
Image: c.png
Not an image: d.pdf
Not an image: e.pdf

In the example above, the pattern !(*.gif|*.jpg|*.png) will match a filename if it's not a gif,
jpg or png.

The following example uses pattern matching in a %% parameter expansion to remove the
extension from all image files:

shopt -s extglob
for f in "$@"
do
    # %% strips the longest suffix matching the pattern
    echo "${f%%*(.gif|.jpg|.png)}"
done

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a
b
c
d.pdf
e.pdf

A feature that I just recently became aware of is that you can do the above action in one fell
swoop: if you use "*" or "@" as the variable name, the transformation is done on all the
command-line arguments at once. [Note to self: always read the last half of the paragraph from
now on]:

shopt -s extglob
echo ${*%%*(.gif|.jpg|.png)}

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a b c d.pdf e.pdf

And that works on arrays too:

shopt -s extglob
array=("$@")    # quoted so arguments containing spaces stay intact
echo ${array[*]%%*(.gif|.jpg|.png)}

$ bash script.sh a.jpg b.gif c.png d.pdf e.pdf
a b c d.pdf e.pdf
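
The examples above only exercise %%. For completeness, here is a sketch of the other parameter expansions that take a glob pattern, applied to a made-up path (the path is an assumption, nothing needs to exist on disk):

```bash
path=/tmp/photos/holiday.tar.gz    # hypothetical path, used only as a string

echo "${path#*/}"     # shortest prefix matching */ removed: tmp/photos/holiday.tar.gz
echo "${path##*/}"    # longest prefix matching */ removed:  holiday.tar.gz
echo "${path%.*}"     # shortest suffix matching .* removed: /tmp/photos/holiday.tar
echo "${path%%.*}"    # longest suffix matching .* removed:  /tmp/photos/holiday
echo "${path/o/0}"    # first match of o replaced:           /tmp/ph0tos/holiday.tar.gz
echo "${path//o/0}"   # every match of o replaced:           /tmp/ph0t0s/h0liday.tar.gz
```

The single-character forms (#, %, /) act on the shortest or first match, the doubled forms (##, %%, //) on the longest or on all matches.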

The biggest takeaway here is to stop thinking of wildcards as a mechanism just to get a list of
filenames and start thinking of them as glob patterns that can be used to do general pattern
matching in your bash scripts. Think of glob patterns as regular expressions in a different
language.
