
The document provides examples of regular expressions to match various patterns in text including strings containing certain characters, uppercase letters followed by other characters, integers of a certain length, Perl variable names, HTML tags, words with duplicate letters, strings with HTML opening and closing tags, and more. It also provides Perl programs to perform tasks like replacing digits with words, modifying HTML tag names, stripping HTML headers, and creating a student record file from name-value pairs.

Uploaded by

suresh
Copyright
© Attribution Non-Commercial (BY-NC)

Exercise Answers

Construct regular expressions (match operators) for the following:

• Any string that contains an "a" or a "b" followed by any 2 characters followed by an "a" or a
"b". The strings "axxb", "alfa" and "blka" match, and "ab" does not.

[ab] is "either an a or a b".


. is "any character (except newline)".
The entire expression is /[ab]..[ab]/
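A quick way to check the answer is to run it against the example strings from the exercise. This is a minimal sketch; the helper name `matches_ab_pattern` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string has an "a" or "b",
# then any two characters, then an "a" or "b".
sub matches_ab_pattern {
    my ($s) = @_;
    return $s =~ /[ab]..[ab]/ ? 1 : 0;
}

for my $s ("axxb", "alfa", "blka", "ab") {
    printf "%-5s %s\n", $s, matches_ab_pattern($s) ? "matches" : "no match";
}
```

"ab" fails because it is only two characters long, so there is nothing left for the two dots to match.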

• An uppercase "A" followed by any character except "x", "y" or "z".

[^xyz] is "any single character except an x, y or z".


The entire expression is /A[^xyz]/
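Two details are easy to miss here: the class is case-sensitive, so "AX" matches, and [^xyz] still has to match one character, so a string that ends in "A" does not match. A small sketch (the helper name `a_not_xyz` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: true if an uppercase "A" is followed by a
# character other than lowercase x, y or z.
sub a_not_xyz {
    my ($s) = @_;
    return $s =~ /A[^xyz]/ ? 1 : 0;
}

for my $s ("Ab", "Ax", "AX", "ZA", "xAyAq") {
    printf "%-6s %s\n", $s, a_not_xyz($s) ? "matches" : "no match";
}
```

"xAyAq" matches because the regex engine keeps trying later positions: the first "A" is followed by "y", but the second is followed by "q".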

• Any 5-digit integer.

[0123456789] is "any digit".


[0-9] is another way of saying "any digit".
\d is yet another way of saying "any digit".

The entire expression could be any of the following:


/[0123456789][0123456789][0123456789][0123456789][0123456789]/
/[0-9][0-9][0-9][0-9][0-9]/
/\d\d\d\d\d/
/\d{5}/ (we didn't cover this!)
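The four forms can be checked against each other on a few strings. Note that none of them says the *whole* integer has five digits: "123456" matches too, because it contains five digits in a row. Requiring exactly five needs anchors or word boundaries, which are not covered here. A sketch, with a hypothetical helper `count_hits`, assuming plain ASCII input:

```perl
use strict;
use warnings;

# The four spellings of "five digits in a row" from the answer above.
my @patterns = (
    qr/[0123456789][0123456789][0123456789][0123456789][0123456789]/,
    qr/[0-9][0-9][0-9][0-9][0-9]/,
    qr/\d\d\d\d\d/,
    qr/\d{5}/,
);

# Hypothetical helper: how many of the four patterns match $s.
sub count_hits {
    my ($s) = @_;
    return scalar grep { $s =~ $_ } @patterns;
}

for my $s ("12345", "1234", "123456", "abc98765xyz") {
    print "$s: ", count_hits($s), " of 4 patterns match\n";
}
```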

Develop regular expressions for the following:

• Any Perl scalar variable name (including the "$"). Perl variable names can contain any
alphanumeric character and the "_" character.

/\$\w+/
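With the /g modifier in list context, the same pattern returns every variable name on a line. A sketch; the helper `scalar_names` and the sample line are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return every "$name" found in a line of text.
sub scalar_names {
    my ($line) = @_;
    # In list context, m//g with no capture groups returns every match.
    return $line =~ /\$\w+/g;
}

my $code = 'my $total = $count * $unit_price;';
print join(", ", scalar_names($code)), "\n";   # prints "$total, $count, $unit_price"
```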

• Any string that contains nothing but whitespace.

Can't do it without something we haven't covered yet! If you try something like
/\s+/, it will match any string that contains any whitespace at all, such as "two words".
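The pitfall is easy to see by testing both kinds of string. The sketch below also shows one common fix that uses the ^ and $ anchors, which this exercise set has not covered yet: they pin the match to the start and end of the string, so every character has to be whitespace. The helper names are hypothetical, and the empty string is treated as *not* whitespace-only (use \s* instead of \s+ if it should count).

```perl
use strict;
use warnings;

# Any whitespace anywhere in the string (the pattern from the answer).
sub has_whitespace  { my ($s) = @_; return $s =~ /\s+/   ? 1 : 0 }

# Nothing but whitespace: ^ and $ (not yet covered) anchor the match
# to the start and end of the string.
sub only_whitespace { my ($s) = @_; return $s =~ /^\s+$/ ? 1 : 0 }

for my $s ("  \t ", "two words", "") {
    printf "%-13s any whitespace: %d  only whitespace: %d\n",
        "'$s'", has_whitespace($s), only_whitespace($s);
}
```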

• An HTML anchor tag (for example: <A HREF=blahblah>).

/<[aA]\s+[hH][rR][eE][fF]=.*>/
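As a test ("is there an anchor tag on this line?") the pattern works, but because .* is greedy, the text it matches runs to the last ">" on the line, not the end of the tag. Swapping .* for [^>]* (the same trick used for stripping tags) stops at the first ">". A sketch with a made-up sample line:

```perl
use strict;
use warnings;

my $html = 'See <A HREF=page.html>this page</A> for details.';

# The answer's pattern: ".*" is greedy, so it runs to the LAST ">" on the line.
my ($greedy) = $html =~ /(<[aA]\s+[hH][rR][eE][fF]=.*>)/;

# Variant: "[^>]*" cannot cross a ">", so the match stops at the end of the tag.
my ($tag_only) = $html =~ /(<[aA]\s+[hH][rR][eE][fF]=[^>]*>)/;

print "$greedy\n";     # <A HREF=page.html>this page</A>
print "$tag_only\n";   # <A HREF=page.html>
```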

Develop regular expressions for the following:


• Any word (a word is defined as a sequence of alphanumerics, with no whitespace) that
contains a double letter; for example, "book" has a double "o" and "feed" has a double "e".

/([a-zA-Z])\1/
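The parentheses capture one letter and \1 requires the very same letter again immediately after it. Because the backreference repeats the exact text, case matters: "Aa" is not a double letter. A sketch with a hypothetical helper `doubled_letter`:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first doubled letter, or undef if none.
sub doubled_letter {
    my ($word) = @_;
    return $word =~ /([a-zA-Z])\1/ ? $1 : undef;
}

for my $w ("book", "feed", "cat", "Aardvark") {
    my $d = doubled_letter($w);
    print "$w: ", (defined $d ? "double '$d'" : "no double letter"), "\n";
}
```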

• Any string that contains an HTML tag and its corresponding end tag. The following
should match: <H2>Hi Dave</H2> and so should <TITLE>The Test Answers</TITLE>,
but this should not match: <TITLE>Not a match</H2>.

/<(\w+)>.*<\/\1>/

The answer above makes some assumptions about what is inside the angle brackets (no
whitespace, so no attributes) that are not always true in HTML tags!
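Running the exercise's examples through the pattern, plus one tag with an attribute, shows both the backreference at work and the no-whitespace assumption. A sketch; `has_tag_pair` and the last sample string are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string has <TAG>...</TAG> with matching names.
sub has_tag_pair {
    my ($s) = @_;
    return $s =~ /<(\w+)>.*<\/\1>/ ? 1 : 0;
}

for my $s ("<H2>Hi Dave</H2>",
           "<TITLE>The Test Answers</TITLE>",
           "<TITLE>Not a match</H2>",
           '<A HREF="x.html">link</A>') {
    printf "%-35s %s\n", $s, has_tag_pair($s) ? "matches" : "no match";
}
```

The last string fails even though it is a proper tag pair: the space after "A" stops (\w+) before the ">".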

• Write a Perl program that replaces all digits with the name of the digit, so every "0" is
replaced with "zero", "1" is replaced with "one", ... "9" is replaced with "nine".

while (<>) {          # read input one line at a time
    s/0/zero/g;       # replace all "0"s with "zero"
    s/1/one/g;        # replace all "1"s with "one"
    s/2/two/g;        # replace all "2"s with "two"
    s/3/three/g;      # replace all "3"s with "three"
    s/4/four/g;       # replace all "4"s with "four"
    s/5/five/g;       # replace all "5"s with "five"
    s/6/six/g;        # replace all "6"s with "six"
    s/7/seven/g;      # replace all "7"s with "seven"
    s/8/eight/g;      # replace all "8"s with "eight"
    s/9/nine/g;       # replace all "9"s with "nine"
    print;
}
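The ten substitutions can also be folded into one by capturing the digit and looking its name up in a hash. This is an alternative sketch, not the answer the exercise expects; the names `%digit_name` and `digits_to_words` are hypothetical. It uses [0-9] rather than \d so that only the ten ASCII digits in the table can match.

```perl
use strict;
use warnings;

# Map each digit to its name (hypothetical table).
my %digit_name = (
    0 => "zero", 1 => "one", 2 => "two",   3 => "three", 4 => "four",
    5 => "five", 6 => "six", 7 => "seven", 8 => "eight", 9 => "nine",
);

# Replace every digit in one pass: the matched digit ($1) is the hash key.
sub digits_to_words {
    my ($line) = @_;
    $line =~ s/([0-9])/$digit_name{$1}/g;
    return $line;
}

print digits_to_words("Room 101, floor 3\n");   # Room onezeroone, floor three
```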

• Write a Perl program that reads in an HTML file (from STDIN) and replaces all
<H1>,</H1> tag pairs with <H3>,</H3> tags.

while (<>) {                  # read input one line at a time
    s/<H1>/<H3>/g;            # replace all "<H1>" with "<H3>"
    s/<\/H1>/<\/H3>/g;        # replace all "</H1>" with "</H3>"
    print;
}

Here is a better way! A single expression that can replace start or end tags:

while (<>) {                  # read input one line at a time
    s/<(\/?)H1>/<${1}H3>/g;   # replace all "<H1>" with "<H3>"
                              # and all "</H1>" with "</H3>"
    print;
}
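The trick is (\/?): it captures "/" for an end tag and nothing for a start tag, and the replacement puts back whatever was captured. In a replacement, $1 is the usual Perl spelling of \1, and writing it as ${1} keeps it separate from the "H3" that follows. A sketch wrapping the substitution in a hypothetical helper `h1_to_h3`; note that lowercase tags are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the single-substitution answer.
sub h1_to_h3 {
    my ($html) = @_;
    # (\/?) captures "/" for an end tag, or nothing for a start tag.
    $html =~ s/<(\/?)H1>/<${1}H3>/g;
    return $html;
}

print h1_to_h3("<H1>Title</H1><P>Body</P>\n");   # <H3>Title</H3><P>Body</P>
```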

• Write a Perl program that removes all HTML tags (anything that looks like an HTML tag;
you don't need to check each tag name).

while (<>) {          # read input one line at a time
    s/<[^>]*>//g;     # remove anything that starts with "<"
                      # and ends with ">"
    print;
}
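Using [^>]* instead of .* is what keeps each tag separate: the match cannot run past the first ">", so the text between tags survives. Since [^>] also matches a newline, the same substitution removes a tag that is split across lines, but only when it sees the whole document in one string; the line-at-a-time loop above never sees such a tag in one piece. A sketch with a hypothetical helper `strip_tags`:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the tag-stripping substitution.
sub strip_tags {
    my ($html) = @_;
    # [^>]* stops at the first ">", so each tag is removed separately
    # and the text between tags is kept.
    $html =~ s/<[^>]*>//g;
    return $html;
}

print strip_tags("<P>Hello <B>world</B>!</P>\n");   # Hello world!
print strip_tags("a <B\nclass=x>b\n");              # a b
```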

• You might need to think about this one! Write a Perl program that strips the HEAD
from an HTML file (everything between the <HEAD> tag and the </HEAD> tag). Keep in
mind that in HTML newlines mean nothing: any part of a document can be split across
lines in any possible way.

HINT: It is much easier to read the entire sequence of lines into a single Perl scalar
variable. Since the single string that contains the entire document has newlines in it,
we need to use the "s" modifier on the substitute command if we want "." to match a
newline.

$_ = join("", <>);        # read everything until EOF into one giant string,
                          # keeping the newlines (in HTML they are just whitespace)

# remove everything from the first <HEAD> through the last </HEAD>;
# we need the "s" modifier so the ".*" can match a newline!
s/<HEAD>.*<\/HEAD>//s;

# print out whatever remains
print;
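The effect of the "s" modifier is easiest to see on a small document whose HEAD spans several lines. This sketch wraps an equivalent form of the HEAD-stripping substitution in a hypothetical helper `strip_head`, and also runs the same substitution without /s for comparison; the sample document is made up.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the HEAD-stripping substitution.
sub strip_head {
    my ($doc) = @_;
    # /s lets "." match newline, so the HEAD may span several lines.
    $doc =~ s/<HEAD>.*<\/HEAD>//s;
    return $doc;
}

my $page = "<HTML><HEAD>\n<TITLE>Test</TITLE>\n</HEAD>\n<BODY>Hi</BODY></HTML>\n";
print strip_head($page);

# Without /s, "." stops at newlines, so nothing is removed here.
(my $without_s = $page) =~ s/<HEAD>.*<\/HEAD>//;
print $without_s eq $page ? "unchanged without /s\n" : "changed\n";
```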

• Write a Perl program that creates a student record in the form used as input to the above
program. Each line should contain a student name, followed by a tab (no tabs in the name are
allowed), followed by a test1 grade, followed by a tab, etc. A sample output line is:
Joe Smith\t88\t92\t77\n

Your program will accept input in the form of lines that contain name-value pairs, with an equals
sign (=) between the name and the value. Here is a sample input file:
name = Joe Student
test1 = 86
test2 = 77
homework = 33
name = Jane Smith
test1 = 98
test2 = 35
homework = 85

For this input, the output should be this (\t is a tab):

Joe Student\t86\t77\t33
Jane Smith\t98\t35\t85

Here is one way to do this:

#!/usr/bin/perl
use strict;
use warnings;

# read in all the lines
my @lines = <>;

# get rid of all newlines
chomp(@lines);

# remove the "name =" part of each line, keeping only the value
foreach my $line (@lines) {
    $line =~ s/^[^=]*=\s*//;
}

# now loop over all lines, handling 4 at a time
for (my $i = 0; $i <= $#lines; $i += 4) {
    print join("\t", @lines[$i .. $i + 3]), "\n";
}
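Handling lines strictly four at a time assumes every student has all four fields in the same order. A variation keyed by field name is sketched below as an alternative, not as the expected answer. It assumes each record starts with a "name" line and takes the column order from the sample input; `make_records` and `@fields` are hypothetical names, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Output column order, assumed from the sample input.
my @fields = qw(name test1 test2 homework);

# Hypothetical helper: one tab-separated record per student. A "name" line
# starts a new record; every other "key = value" line fills in that field.
sub make_records {
    my @records;
    for my $line (@_) {
        # skip anything that is not a "key = value" line
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
        my ($key, $value) = ($1, $2);
        push @records, {} if $key eq "name";
        $records[-1]{$key} = $value if @records;
    }
    # a field that never appeared becomes an empty column, not undef
    return map { my $r = $_; join("\t", map { $r->{$_} // "" } @fields) } @records;
}

my @input = ("name = Joe Student\n", "test1 = 86\n", "test2 = 77\n", "homework = 33\n",
             "name = Jane Smith\n",  "test1 = 98\n", "test2 = 35\n", "homework = 85\n");
print "$_\n" for make_records(@input);
```

Because values are placed by key, a student whose lines arrive in a different order, or with a field missing, still lines up in the right columns.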
