4 Filter and Regex

The document discusses various Unix filters and regular expressions. It provides examples of using filters like cut, awk, grep, sed and perl to select and manipulate text from command output and files. Regular expressions are used to match patterns, and extended expressions are discussed. The use of perl as a more powerful version of sed is also covered.

Client/Server Technology, Part 1: The Unix Operating System
Filters & Regular Expressions

Selecting Fields with cut

The cut command expects exactly one delimiter between two fields.
Runs of several whitespace characters confuse it.

Example: Try to print only file size and name

    $ ls -l gnasl
    -rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
    $ ls -l | cut -d ' ' -f 5,9
    staff 12
    $ _

ls pads its columns with runs of spaces, and cut counts every single space as a delimiter, so fields 5 and 9 are not the size and the name.

The awk Filter

Strictly speaking, awk is not just a filter but a programming language.
Even without knowing the language, it's still useful for some tasks.

Example: Select fields from ls -l output with awk

    $ ls -l gnasl | awk '{ print $5, $9 }'
    2894 gnasl
    $ ls -l gnasl | awk '{ print $5, "\t", $9 }'
    2894      gnasl
    $ _
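A common way around the padding problem, sketched here as an assumption rather than something the slides show, is to squeeze runs of spaces with tr -s before handing the line to cut. The function name size_and_name and the sample line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print fields 5 and 9 (size and name in the
# ls -l layout used above) from lines read on stdin.
size_and_name() {
    # tr -s ' ' squeezes each run of spaces into a single space,
    # so cut sees exactly one delimiter between two fields.
    tr -s ' ' | cut -d ' ' -f 5,9
}

# Made-up line with padded columns, standing in for real ls -l output.
printf '%s\n' '-rw-r--r--   1 hugo  staff      2894 Feb 12 14:14 gnasl' | size_and_name
```

awk needs no such step because it splits on runs of whitespace by default, which is why the awk version is usually the shorter one for ls -l style input.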

Herbert Martin Dietze <[email protected]>

Regular Expressions

Regular expressions can be used for describing text patterns.
Example: ^g matches text lines starting with a lowercase g.
Dialects differ, depending on the tools used.
Basic Operators

These are understood by most tools supporting regular expressions:

    \           activates or deactivates an operator,
                example: \\ produces a backslash
    [AaBbCc]    matches one character from the set {A, a, B, b, C, c}
    [a-z]       matches a range, here between a and z
    [^a-z]      matches one character that is not within the range specified here
    .           matches one character (any)
    *           matches zero to infinity occurrences of the preceding expression,
                example: " *" (a space followed by *) matches any number of space characters
    ^           matches the beginning of the current line
    $           matches the current line's end
    \< and \>   match the beginning and the end of a word,
                example: \<Hugo\> matches Hugo as a whole word
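The operators in the table can be tried out with grep on a few lines of text. The sketch below uses made-up sample lines and a hypothetical helper named sample. It sets the C locale as a precaution, since the meaning of a range such as [a-z] can depend on the locale.

```shell
#!/bin/sh
# In the C locale, [a-z] means exactly the ASCII lowercase letters.
LC_ALL=C; export LC_ALL

# Made-up sample text, one entry per line.
sample() {
    printf '%s\n' 'gnasl' 'Gnasl' 'Hugo was here' 'HugoBoss' 'a  b' '# note'
}

sample | grep '^g'        # lines starting with a lowercase g
sample | grep '^[Gg]'     # set: lines starting with G or g
sample | grep '^[^a-z]'   # negated range: first character is not a-z
sample | grep 'a  *b'     # a, then one or more spaces, then b
sample | grep 'l$'        # lines ending in l
sample | grep '^.....$'   # lines of exactly five characters
```

Note the idiom in the fourth call: because * also accepts zero occurrences, "one or more spaces" is written as a space followed by " *".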

Example: ls Output

Display only symbolic links:

    $ ls -l | grep "^l"
    lrwxrwxrwx 1 hugo staff 17 Jul 26 2001 foo -> bar
    lrwxrwxrwx 1 hugo staff 17 Sep 13 2001 x -> ../y
    $ _

Example: Log File

Select only the entries from the 28th and 29th of March 2001 in the Apache log file. Here's the format from which we want to get the information:

    $ tail -1 access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    $ _

This is the regular expression used for getting the entries:

    $ grep "2[89]/Mar/2001.*/.*\.html" access_log
    myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
    [...]
    myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
    [...]
    $ _
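The log pattern can be tightened a little. The sketch below anchors the date to the opening bracket of the timestamp, so that a string such as 28/Mar/2001 appearing elsewhere in a line is not picked up by accident. The log lines and the helper name sample_log are made up and only mimic the format shown above.

```shell
#!/bin/sh
# Made-up log lines in the format shown above.
sample_log() {
    cat <<'EOF'
myhost [27/Mar/2001:09:01:44 +0200] "GET /a.html"
myhost [28/Mar/2001:16:19:07 +0200] "GET /a.html"
myhost [28/Mar/2001:16:20:11 +0200] "GET /logo.png"
myhost [29/Mar/2001:17:00:12 +0200] "GET /b.html"
EOF
}

# \[ is a literal bracket; \. is a literal dot.
pattern='\[2[89]/Mar/2001:.*\.html'

sample_log | grep "$pattern"      # print the matching entries
sample_log | grep -c "$pattern"   # only count them
```

grep -c prints the number of matching lines instead of the lines themselves, which is often all that is needed from a log file.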

The sed Filter

sed stands for Stream Editor.
It can be used to manipulate text in a data stream.
Like grep, sed can use regular expressions.
We concentrate on the substitute command here.
More than one expression can be specified using -e.

Example: Evaluate a configuration file

    $ cat config.conf
    # Configuration file
    set A b
    set B c
    $ grep -v "^ *#" config.conf | sed 's/^set *//' \
    > | sed 's/  */=/'
    A=b
    B=c
    $ grep -v "^ *#" config.conf \
    > | sed -e 's/^set *//' -e 's/  */=/'
    A=b
    B=c
    $ eval `grep -v "^ *#" config.conf \
    > | sed -e 's/^set *//' -e 's/  */=/'`
    $ echo $A
    b
    $ _
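The configuration example can be packaged so that it runs without an extra file. This is a minimal sketch: the helper names sample_config and to_assignments are made up, and $(...) is used as the modern spelling of backquotes.

```shell
#!/bin/sh
# Made-up configuration in the format shown above.
sample_config() {
    cat <<'EOF'
# Configuration file
set A b
set B c
EOF
}

# Turn "set NAME value" lines into NAME=value assignments.
to_assignments() {
    # Drop comment lines, strip the leading "set", and replace the
    # first run of spaces with an equals sign.
    grep -v '^ *#' | sed -e 's/^set *//' -e 's/  */=/'
}

sample_config | to_assignments

# eval runs the generated assignments in the current shell.
eval "$(sample_config | to_assignments)"
echo "A is $A, B is $B"
```

Because eval executes whatever text it is given, this approach is only reasonable for configuration files whose content is trusted.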

More sed

The substitute command can take options:
Ignore case: i, and global replace: g (replace every match, not only the first). The i option is a GNU sed extension.
They get appended to the expression: s/foo/bar/gi

What if the source or destination pattern contains slashes?
Escape the slashes with backslashes (can be difficult if the pattern is a variable's content), or use a different separator. Any character other than backslash or newline is allowed!

Example: Remove double slashes in path specs

    $ echo /usr//local/bin:/home/herbert///data \
    > | sed 's|//*|/|g'
    /usr/local/bin:/home/herbert/data
    $ _
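A different separator is most useful when the pattern comes from a variable. The following sketch assumes two made-up path values and a hypothetical helper replace_prefix. It works as long as the variables contain neither the separator | nor characters that are special in regular expressions.

```shell
#!/bin/sh
# Hypothetical values: both contain slashes, which would clash
# with the usual s/.../.../ form.
old=/usr/local
new=/opt

# replace_prefix TEXT: replace every occurrence of $old with $new.
replace_prefix() {
    # Double quotes let the shell expand the variables before sed
    # sees the expression; | serves as the separator.
    printf '%s\n' "$1" | sed "s|$old|$new|g"
}

replace_prefix '/usr/local/bin:/usr/local/sbin'
```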

We can also reference matches from the search pattern:
\( and \) address a subpattern in the search field.
\1 selects the first, \2 the second etc. in the replace field.

Example:

    $ echo "Hugo <[email protected]>" \
    > | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ _
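Subpattern references can also reorder parts of a line. As a small sketch with made-up names and a hypothetical helper swap_names, the following turns "Last, First" into "First Last".

```shell
#!/bin/sh
# swap_names: turn "Last, First" into "First Last".
swap_names() {
    # \1 is everything before the comma, \2 everything after the
    # comma and any spaces that follow it.
    sed 's/^\([^,]*\), *\(.*\)$/\2 \1/'
}

printf '%s\n' 'Doe, Jane' 'Roe, Richard' | swap_names
```

Lines without a comma do not match the pattern and pass through unchanged.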

Extended Regular Expressions

Some tools understand more than just the basic operators.
Such tools are e.g. perl and egrep.
Other tools may support them: use \ to activate!

    ?              matches none or one occurrence of the preceding pattern
    +              matches one to infinity occurrences of the preceding pattern
    {n}            matches exactly n occurrences of the preceding pattern
    {n,m}          matches n to m occurrences of the preceding pattern
    {n,}           matches at least n occurrences of the preceding pattern
    text1|text2    matches text containing either text1 or text2
    (text)         bundles text to a unit for repetition operators (*, + etc.),
                   and it can now be selected by \1, \2 etc.

Example:

    $ ls -l | egrep "hugo|harry"
    -rw-r--r-- 1 harry staff 1315 Feb 14 11:05 annab
    -rw-r--r-- 1 hugo staff 2894 Feb 12 14:14 gnasl
    $ _
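The extended operators can be tried with grep -E, which accepts the same syntax as egrep and is the spelling that current grep versions recommend. The sample lines and the helper name sample are made up. The last call shows the backslash form that activates the same operators in a basic regular expression.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Made-up sample text, one entry per line.
sample() {
    printf '%s\n' 'color' 'colour' 'colouur' '2001-03-28' '01-3-28' 'abab' 'ab'
}

sample | grep -E '^colou?r$'                      # ? makes the u optional
sample | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'   # {n}: exact repetition counts
sample | grep -E '^(ab){2,}$'                     # (..) groups, {n,}: at least n
sample | grep -E '^(color|ab)$'                   # | : either alternative

# Same match as the third call, written as a basic regular expression.
sample | grep '^\(ab\)\{2,\}$'
```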

A better sed using perl

The perl interpreter can be used like sed.
Advantage: no escaping of extended syntax necessary!
Also: perl can work on more than one line!
Syntax: perl -pe 's/source/destination/'

Example:

    $ echo "Hugo <[email protected]>" \
    > | sed "s/[^<]*<\([^@]*\)@\([^>]*\)>.*/\1 at \2/"
    hugo at hotmail.com
    $ echo "Hugo <[email protected]>" | perl \
    > -pe "s/[^<]*<([^@]*)@([^>]*)>.*/\1 at \2/"
    hugo at hotmail.com

Longer Example: Generate HTML from Inline Comments

The problem:

It is always nice to keep module descriptions in one place.
So why not generate HTML from the program sources?

Convention:
Extract only comments starting with a double hash.
Ignore other comments and program code.
Add tags for special elements (function, type, variable, ...).

Example source:

    $ cat example.sh
    #!/bin/sh
    ###############################################
    #
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    #
    ###############################################
    hugo ()
    {
        echo "hello world"
    }
    #
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    #
    hugo
    $ _

Step 1: Discard unwanted lines

    $ egrep "^ *##" example.sh | egrep -v "^ *###"
    ## @function hugo
    ## print a friendly message to stdout.
    ##
    ## This function print a "hello world" to
    ## stdout. Quite nice.
    ## @function main program
    ##
    ## The main program calls hugo and exits.
    $ _

Step 2: Add HTML-Tags and remove hashes

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//'
    @function hugo
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    @function main program
    <p>
    The main program calls hugo and exits.
    $ _

Step 3: Translate pseudo-tags

The @ is escaped here because perl would otherwise interpolate @function inside the pattern as an array.

    $ egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    <h2>Function hugo</h2>
    print a friendly message to stdout.
    <p>
    This function print a "hello world" to
    stdout. Quite nice.
    <h2>Function main program</h2>
    <p>
    The main program calls hugo and exits.
    $ _

Last Step: Make it an HTML file

    $ ( echo "<html><head><title>Program Documentation</title></head>
    > <body><h1>Program Documentation</h1>"
    > egrep "^ *##" example.sh | egrep -v "^ *###" \
    > | perl -pe 's/^ *## *$/<p>/; s/^ *## *//' \
    > -e '; s|\@function *(.*)|<h2>Function \1</h2>|'
    > echo "</body></html>")
    <html><head><title>Program Documentation</title></head>
    <body><h1>Program Documentation</h1>
    [...]
    </body></html>
    $ _
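The four steps can be collected into one small script. The sketch below is a variation, not the slides' own command: it swaps perl for sed -E, which understands the same extended syntax on current systems, and wraps the page title in a title element. The helper names example_source and doc_to_html are made up, and the embedded source is a stand-in for example.sh.

```shell
#!/bin/sh
# Stand-in for example.sh so the sketch runs without extra files.
example_source() {
    cat <<'EOF'
#!/bin/sh
###############################################
#
## @function hugo
## print a friendly message to stdout.
##
## This function print a "hello world" to
## stdout. Quite nice.
#
###############################################
hugo ()
{
    echo "hello world"
}
#
## @function main program
##
## The main program calls hugo and exits.
#
hugo
EOF
}

# doc_to_html: read shell source on stdin, write an HTML page to stdout.
doc_to_html() {
    echo "<html><head><title>Program Documentation</title></head>"
    echo "<body><h1>Program Documentation</h1>"
    # Step 1: keep "##" comments, drop "###..." separator lines.
    # Step 2: empty "##" lines become <p>, other lines lose the hashes.
    # Step 3: translate the @function pseudo-tag into a heading.
    grep -E '^ *##' | grep -Ev '^ *###' \
        | sed -E -e 's/^ *## *$/<p>/' \
                 -e 's/^ *## *//' \
                 -e 's|@function *(.*)|<h2>Function \1</h2>|'
    echo "</body></html>"
}

example_source | doc_to_html
```

Reading from stdin keeps the function reusable: doc_to_html < example.sh > example.html would document a real file in the same way.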