CSC 205 Lecture Notes
STRUCTURED PROGRAMMING
(CSC 205)
~ 2020/2021 ~
TABLE OF CONTENTS
More special characters
Quoting special characters
Alternatives and parentheses
LECTURE NINE – PERL AND OPERATING SYSTEM
FILES AND I/O
Filehandles
Open files
Closing files
Manipulating Files & Directories
LECTURE TEN – MODULES AND DATABASE
MODULES
DATABASE AND PERL
The DBI
Database Drivers
LECTURE ELEVEN – PERL AND WEB PROGRAMMING
CGI
FUNCTION-ORIENTED SCRIPT
LECTURE TWELVE – SOAP AND SOCKET PROGRAMMING
SOAP
PERL SOCKET PROGRAMMING
LECTURE ONE – INTRODUCTION TO PERL
PERL OVERVIEW
Perl is commonly expanded as "Practical Extraction and Report Language". It is an interpreted
programming language: the perl interpreter reads the whole script, compiles it into an internal
form and then executes its commands, so there is no separate compile step for the programmer.
The code itself is largely platform independent. It is a scripting language, i.e. a script is a
series of commands that are carried out at runtime, and scripts are often used to give
instructions to other software such as web servers, or run as standalone programs. It is best
known for text processing: dealing with files, strings and regular expressions. Perl is useful for
the following:
1. Tool for general system administration
2. Processing textual or numerical data
3. Database interconnectivity
4. Common Gateway Interface (CGI/Web) programming
5. Driving other programs (FTP, Mail, WWW, OLE)
PERL BASICS
The following are Perl basics:
1. A single instruction, usually written on its own line, is called a statement.
2. A block contains one or more statements enclosed in curly braces:
{
statement_1;
statement_2;
…
statement_n;
}
3. Every simple statement in Perl ends with a semicolon:
print "Hello.";
The only exception is the last statement in a block, where the semicolon may be omitted:
{ statement }
4. Comments begin with the # symbol. The # can appear anywhere on a line, and everything
after it on that line is ignored during execution:
# This next line sorts alphabetically
statement; # this statement handles computation
Writing a Perl program (script)
Write Perl scripts in a plain-text editor such as Notepad. On Unix-like systems, the first line of
a Perl program should be the following hash-bang (shebang) line:
#!/usr/bin/perl
The purposes of the hash-bang line are to:
1. tell the operating system (or web server) which Perl interpreter, and hence which version
of Perl, to use;
2. point to the location of the Perl executable in the server's directory tree.
Save Perl scripts/programs with the extension .pl.
Executing a Perl program (script)
Perl scripts can be executed in two ways:
1. From the command line (-e means "program follows as the next argument"):
> perl -e "print 'Hello, World!'"
2. From a text file
Create a plain-text document with the following program code in a text editor:
#!/usr/bin/perl
print 'Hello, World!';
Save the document as "myprogram.pl" and run it from the prompt as follows:
> perl myprogram.pl
Hello, World!
Perl Variables and Data Types
A variable can be considered as a container that holds one or more values. Once defined,
the variable remains, but its value or values can change over and over again. The values can
be numbers or strings. There are three types of variables, namely, scalar, array and hash.
A scalar is a variable that holds a single value. Scalar names begin with $:
$name = "Aisha";
$age = "20";
An array is a variable that holds multiple values in order. Array names begin with @:
@names = ("Howard", "Leslie", "Bob");
A hash is a variable that holds pairs of data (keys and values). Hash names begin with %:
%traits = ("name" => "Howard", "age" => "30", "eyes" => "brown");
Variable names begin with the special type-indicating character ($, @ or %, respectively),
followed by any combination of letters, numbers and underscores. The first character after the
$, etc. must be a letter or underscore, not a number. Note: variable names are case-sensitive.
Some valid variable names are: $time_of_arrival, $Time_of_Arrival, $timeofdeparture and
$TOD.
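The three kinds of variable can be seen side by side in a short script. This is a minimal sketch: it adds use strict and use warnings, which require each variable to be declared with my before it is used (not shown in the notes up to this point), and the data values are only illustrative. Note that a single element of an array or hash is itself a scalar, so it is written with $.

```perl
use strict;
use warnings;

my $name   = "Aisha";                                   # scalar: one value
my @names  = ("Howard", "Leslie", "Bob");               # array: ordered list
my %traits = ("name" => "Howard", "eyes" => "brown");   # hash: key/value pairs

# One element of an array or hash is a scalar, so it is written with $
my $first = $names[0];         # "Howard"
my $eyes  = $traits{"eyes"};   # "brown"

# Names are case-sensitive: $Name is a different variable from $name
my $Name = "Someone else";

print "$name, $first, $eyes, $Name\n";
```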
Naming Convention
There are certain rules about naming variables. Perl has the following rules for naming variables:
1. All variable names begin with a $, @ or %.
2. After the first character, alphanumeric characters (a to z, A to Z and 0 to 9) and the
underscore are allowed. The underscore is often used to separate the words in a name, as in
$var_1. The first character after the $, @ or % cannot be a number.
Examples:
Legal variable names      Non-legal variable names
$var                      mohohoh
$var_1                    missing
@array                    $47
%my_hash                  %
Some variables have a predefined and special meaning to Perl. A few of the most
commonly used ones are listed below.
$_ The default input and pattern-searching space
$0 Program name
$$ Current process ID
$! Current value of errno
@ARGV Array containing command-line arguments for the script
@INC The array containing the list of places to look for Perl scripts to
be evaluated by the do, require, or use constructs
%ENV The hash containing the current environment
%SIG The hash used to set signal handlers for various signals
LECTURE TWO – PERL VARIABLES I
SCALARS
Scalars are simple variables that are either numbers or strings of characters. Scalar variable
names begin with a dollar sign followed by a letter, then possibly more letters, digits, or
underscores. Variable names are case-sensitive. The two types of scalar data are:
1. Numbers
2. Strings
Numbers
Numeric scalar data is of two types:
1. integers, i.e. whole numbers, such as 3, 0, 490
2. floating-point numbers, i.e. real numbers, such as 3.14, 0.333, 6.74
Strings
Strings are simply sequences of characters. There are two kinds of string literals:
1. single-quoted string literals
2. double-quoted string literals
Single-quoted string literals
Single quotation marks enclose text that is to be taken literally: variables inside them are not
interpolated. For example:
#!/usr/bin/perl
$num = 7;
$txt = 'it is $num';
print $txt;
Output
it is $num
Double-quoted string literals
Double quotation marks interpolate variables: the value of a variable is substituted into the string.
$num = 7;
$txt = "it is $num";
print $txt;
Output:
it is 7
Here, because of the double quotes, the value of $num is inserted into the value of $txt.
Double quotes interpolate scalar and array variables, but not whole hashes. On the other hand,
you can use double quotes to interpolate slices of both arrays and hashes. Below is a list of
common backslash escapes that are interpreted inside double-quoted strings:
\n Newline
\r Carriage return
\t Tab
\b Backspace
\e Escape
\\ Backslash
\" Double quote
\' Single quote
\l lowercase next letter
\u uppercase next letter
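The interpolation rules and escapes above can be checked with a short script. This is a minimal sketch; the variable names and data are illustrative only.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue");
my %age    = ("Femi" => 18, "Amina" => 17);

my $list   = "Colors: @colors";              # array: elements joined with spaces
my $slice  = "First two: @colors[0,1]";      # array slice is interpolated
my $hslice = "Ages: @age{'Femi','Amina'}";   # hash slice is interpolated
my $plain  = "Hash: %age";                   # a whole hash is NOT interpolated
my $caps   = "\uamina";                      # \u uppercases the next letter
my $tabbed = "Name:\tBecky";                 # \t becomes a real tab character

print "$list\n$slice\n$hslice\n$plain\n$caps\n$tabbed\n";
```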
Basic Operators
Arithmetic
Example Name Result
$a + $b Addition Sum of $a and $b
$a * $b Multiplication Product of $a and $b
$a % $b Modulus Remainder of $a divided by $b
$a ** $b Exponentiation $a to the power of $b
String
Example Name Result
$a . "string" Concatenation String built from pieces
"$a string" Interpolation String incorporating the value of $a
$a x $b Repeat String in which $a is repeated $b times
Assignment Operators
The basic assignment operator is "=": $a = $b.
Perl conforms to the C idiom that:
variable operator = expression
is equivalent to
variable = variable operator expression
So that $a += $b is equivalent to $a = $a + $b
$a -= $b $a = $a - $b
$a *= $b $a = $a * $b
$a /= $b $a = $a / $b
$a %= $b $a = $a % $b
This also works for the string concatenation operator: $a .= "\n"
The autoincrement and autodecrement operators are special cases of the assignment
operators, which add or subtract 1 from the value of a variable:
++$a, $a++ Autoincrement Add 1 to $a
--$a, $a-- Autodecrement Subtract 1 from $a
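The arithmetic, string and assignment operators can be combined in one short script. This is a minimal sketch; it uses $x and $y instead of $a and $b, because Perl uses $a and $b as special variables inside sort comparisons.

```perl
use strict;
use warnings;

my $x = 7;
my $y = 3;

my $sum    = $x + $y;       # 10
my $rem    = $x % $y;       # 1 (remainder of 7 / 3)
my $power  = $x ** 2;       # 49
my $repeat = "ab" x 3;      # "ababab"
my $label  = "x=" . $x;     # "x=7"

my $count = 10;
$count += 5;                # same as $count = $count + 5, now 15
$count++;                   # autoincrement, now 16
$label .= "!";              # same as $label = $label . "!", now "x=7!"

print "$sum $rem $power $repeat $count $label\n";
```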
Logical Operators
Conditions for truth:
Any string is true except for "" and "0"
Any number is true except for 0
Any reference is true
Any undefined value is false
Example Name Result
$a && $b And True if both $a and $b are true
$a || $b Or $a if $a is true; $b otherwise
!$a Not True if $a is not true
$a and $b And True if both $a and $b are true
$a or $b Or $a if $a is true; $b otherwise
not $a Not True if $a is not true
Relational Operators
Relation Numeric String Result
Equal == eq True if $a equal to $b
Not equal != ne True if $a not equal to $b
Less than < lt True if $a less than $b
Greater than > gt True if $a greater than $b
Less than or equal <= le True if $a not greater than $b
Comparison <=> cmp 0 if $a and $b equal
1 if $a greater
-1 if $b greater
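Choosing between the numeric and the string operators matters, because the same values can compare differently. A minimal sketch with illustrative values:

```perl
use strict;
use warnings;

my $num_equal = (10 == 10.0)     ? 1 : 0;  # numeric: the same number
my $str_equal = ("10" eq "10.0") ? 1 : 0;  # string: different characters
my $num_less  = (2 < 10)         ? 1 : 0;  # numeric: 2 is smaller
my $str_less  = ("2" lt "10")    ? 1 : 0;  # string: "2" comes after "1"
my $spaceship = 5 <=> 9;                   # -1: left side is smaller
my $compare   = "pear" cmp "apple";        # 1: "pear" sorts after "apple"

print "$num_equal $str_equal $num_less $str_less $spaceship $compare\n";
```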
Examples
$limit = 100;
$name = "Eliani";
if ($limit == 100)
{
print "Limit!";
}
$grade = ($grade * 30)/100;
$name = $first_name . " " . $last_name;
@grades = ("98", "84", "73", "89");
print "$grades[0] and $grades[2]";
@VOTs = ("400", "378", "352");
print @VOTs;
400378352
print "@VOTs";
400 378 352
print '@VOTs'; # single quotes: no interpolation
@VOTs
print "Name:\tBecky\nEyes:\thazel\n";
Name: Becky
Eyes: hazel
$name = "Alejna";
The local modifier temporarily gives a global (package) variable a new value. The new value
lasts only until the end of the enclosing block; after that, the original value is restored
automatically. Suppose a package variable $var has the value 5. Inside a block, local can give
it the value 3 without losing the original value 5:
#!/usr/bin/perl
our $var = 5;
{
local $var = 3;
print "local, \$var = $var \n";
}
print "global, \$var = $var \n";
The output of the above program will be:
local, $var = 3
global, $var = 5
This way we can change the value of the variable without affecting the original value. Note
that local cannot be applied to a variable declared with my; doing so is a compile-time error,
which is why the example declares $var with our.
A variable declared with our is a package (global) variable: it can be used throughout the
package, and other code can access it by its fully qualified name (for example $main::var for a
variable in the main package).
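The difference between my, local and our can be seen in a short script. This is a minimal sketch; show_level is a hypothetical helper. local affects code called from inside the block (dynamic scope), whereas my only affects code written inside the block (lexical scope).

```perl
use strict;
use warnings;

our $level = "global";        # package variable; also reachable as $main::level

sub show_level {              # hypothetical helper: reports the current $level
    return $level;
}

my $inside;
{
    local $level = "temporary";   # new value for the duration of this block
    $inside = show_level();       # subs called from here see "temporary"
}
my $after = show_level();         # original value restored: "global"

my $x = 1;
{
    my $x = 2;                    # a new lexical $x that hides the outer one here
}
# outside the block, $x is 1 again

print "$inside $after $x\n";
```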
LECTURE THREE – PERL VARIABLES II
ARRAYS
An array is a special type of variable that stores data in the form of a list; each element is
accessed by its index number, which is unique to that element. You can store numbers,
strings, floating-point values, etc. in an array.
Defining Arrays
In Perl, you define an array using the '@' character followed by the name that you want to
give it. Let us define an array @array:
my @array = ('a', 'b', 'c', 'd');
This is an array with 4 elements in it. The array index starts from 0 and ends at one less than
the number of elements; in this case, the maximum index is 3. Sequential arrays are those in
which data is stored in sequence. Suppose you want to store the numbers 1 to 10 or the letters
a to z in an array. Instead of typing them all, a range can be specified with the range operator
(..):
@numbers = (1..10); #(1..10) = (1,2,3,4,5,6,7,8,9,10)
In the case of string values, it can be convenient to use the "quote-word" (qw) syntax:
@a = qw(fred barney betty wilma); # same as ("fred","barney","betty","wilma")
Accessing Array Elements
List elements are subscripted by sequential integers, beginning with 0
$foo[5] is the sixth element of @foo
The special variable $#foo provides the index value of the last element of @foo.
A subset of elements from a list is called a slice.
@foo[0,1] is the same as ($foo[0],$foo[1])
You can also access slices of list literals:
@foo = (qw( fred barney betty wilma ))[2,3]; # ("betty", "wilma")
Array Size
We can determine the size of an existing array as follows:
@array = ("a","b","c","d");
$size = scalar (@array);
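Indexing, slices and the size of an array can be compared in one script. A minimal sketch with illustrative data:

```perl
use strict;
use warnings;

my @foo = qw(fred barney betty wilma);

my $size       = scalar(@foo);          # 4 elements
my $last_index = $#foo;                 # 3, always $size - 1
my $last       = $foo[$#foo];           # "wilma"
my @pair       = @foo[0, 1];            # slice: ("fred", "barney")
my @tail       = (qw(fred barney betty wilma))[2, 3];   # slice of a list literal
my @range      = ('a' .. 'e');          # ranges also work on letters

print "$size $last_index $last @pair @tail @range\n";
```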
Dynamic array
Dynamic arrays are arrays declared without any initial values; values are stored in them at
run time, for example by breaking a string held in a scalar variable into pieces with the split
function. Two common cases are splitting on spaces and splitting on commas. An example of
splitting on spaces is given below:
$sentence = "Sue and I split up.";
@words = split(/ /, $sentence);
print "$words[4]\n"; # prints "up."
The split on commas example:
$list = "Eenie, meenie, miney, moe";
@words = split(/,\s*/, $list); # split on a comma and any spaces after it
print "$words[3]\n"; # prints "moe"
Counting Array Element (Array size)
The scalar function is one way of counting the elements in an array.
@people = ("Moe", "Larry", "Curly");
print scalar(@people). "\n";
3
The second method assigns the array to a scalar variable, which stores its number of elements:
$count = @people;
print "$count\n";
3
The third method uses $#people, which gives the index of the last element (one less than the
number of elements):
print "$#people";
2
Sorting array elements
You can sort the elements of an array or the keys of a hash with the function sort. Note: by
default, sort compares elements as strings, so numbers are also sorted character by character!
@array = ("Betty", "Cathy", "Abby");
@array = sort(@array);
print "@array\n";
Abby Betty Cathy
@array = ("3", "40", "24", "100");
@array = sort(@array);
print "@array\n";
100 24 3 40
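To sort numbers by value, sort can be given a comparison block that uses the numeric <=> operator (or cmp for strings). A minimal sketch; inside the block, $a and $b are the two elements being compared:

```perl
use strict;
use warnings;

my @nums = ("3", "40", "24", "100");

my @as_text    = sort @nums;                 # default string order: 100 24 3 40
my @ascending  = sort { $a <=> $b } @nums;   # numeric order: 3 24 40 100
my @descending = sort { $b <=> $a } @nums;   # largest first: 100 40 24 3
my @names      = sort { lc($a) cmp lc($b) } ("bob", "Abby", "cathy");  # ignore case

print "@as_text | @ascending | @descending | @names\n";
```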
Sorting Hash Keys
A very common type of loop makes use of the functions sort and keys. The latter yields all
the keys (not the values) of a hash.
%signs = ("Frank" => "Capricorn", "Amanda" => "Scorpio");
foreach $person (sort keys %signs) {
print "$person: $signs{$person}\n";
}
Amanda: Scorpio
Frank: Capricorn
The reverse function reverses the order of the elements of a list
@b = reverse(@a);
Adding elements to arrays
There are two ways of adding new elements to existing arrays. If we know the index we want
the element to have, we can do this:
@numbers = ("210", "450", "333");
$numbers[3] = "990";
If we simply want to add an element to the end of an array, we can use push. Starting again
from the original three-element array:
push(@numbers, "990");
Either way, @numbers now contains:
210 450 333 990
Push, Pop, shift, unshift for Perl arrays
Many list-processing functions operate on the paradigm in which the list is a stack. The
highest subscript end of the list is the “top,” and the lowest is the bottom.
push Appends a value to the end of the list
push(@mylist,$newvalue)
pop Removes the last element from the list (and returns it)
pop(@mylist)
shift Removes the first element from the list (and returns it)
shift(@mylist)
unshift Prepends a value to the beginning of the list
unshift(@mylist,$newvalue)
splice Replaces $replace elements, starting at index $offset, with the elements of @newlist
(use $replace = 0 to insert without removing anything)
splice(@mylist,$offset,$replace,@newlist)
Examples:
@numbers = ("210", "450", "333");
$last = pop(@numbers);
print "$last\n";
333
Note that this is different from saying
$last = $numbers[2];
because this doesn't remove the element from the array. After pop, the array will have only 2
elements!
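The stack and queue functions above can be traced step by step on one array. A minimal sketch with illustrative values; the comments show the array contents after each call:

```perl
use strict;
use warnings;

my @list = ("210", "450", "333");

push(@list, "990");                  # ("210", "450", "333", "990")
my $last = pop(@list);               # returns "990"; @list has 3 elements again
unshift(@list, "100");               # ("100", "210", "450", "333")
my $first = shift(@list);            # returns "100"
splice(@list, 1, 1, "451", "452");   # replace 1 element at index 1:
                                     # ("210", "451", "452", "333")
splice(@list, 0, 0, "000");          # replace 0 elements: insert at the front

print "$last $first @list\n";
```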
HASHES
A hash (or associative array) is an unordered set of key/value pairs whose elements are
indexed by their keys. Hash variable names have the form %foo.
Like an array, a hash can hold any number of scalars. The difference is that instead of
numeric indices, a hash uses keys, each of which is associated with a value. A hash is declared
with % followed by the name of the hash.
Hash Variables and Literals
A literal representation of a hash is a list with an even number of elements (key/value
pairs) as described by the following examples:
%hash = ('Femi'=>18, 'Amina'=>17, 'Chinedu'=>19);
Hash Functions
The keys function returns a list of all the current keys for the hash in question.
@hashkeys = keys(%hash);
As with all other built-in functions, the parentheses are optional:
@hashkeys = keys %hash;
In a scalar context, the keys function gives the number of elements in the hash. Similarly,
the values function returns a list of all current values of the argument hash:
@hashvals = values(%hash);
You can remove elements from a hash using the delete function:
delete $hash{'key'};
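The hash functions can be combined as follows. This is a minimal sketch: it also uses exists, a built-in that tests whether a key is present, and the 'Tunde' entry is hypothetical data added for illustration.

```perl
use strict;
use warnings;

my %hash = ('Femi' => 18, 'Amina' => 17, 'Chinedu' => 19);

my $count = keys %hash;                       # scalar context: 3 pairs
my $known = exists $hash{'Amina'} ? 1 : 0;    # exists tests for a key
$hash{'Tunde'} = 20;                          # add a new pair (hypothetical data)
delete $hash{'Femi'};                         # remove a pair
my @sorted = sort keys %hash;                 # ("Amina", "Chinedu", "Tunde")

my $total = 0;
$total += $_ foreach values %hash;            # 17 + 19 + 20

print "$count $known @sorted $total\n";
```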
LECTURE FOUR – PERL I/O AND CONTROL STRUCTURES
I/O BASICS
Perl needs data to interact with. Input refers to getting information into your program while
the output is the information obtained from a program.
Taking Inputs
The input from the keyboard (standard input) can be achieved as follows:
print "Please enter your name: ";
my $name = <STDIN>;
print "\nhello $name!\n";
my $name declares a scalar variable. It can hold a number (integer or real), or an arbitrary
length string. <STDIN> means read one line from stdin. The line read is then assigned to
$name. The second print statement prints the result of the string "\nhello $name!\n" after
variable interpolation: the current value of $name is interpolated into the string in place of the
character sequence $name. For instance, if $name = "Kola" then the string would be "\nhello
Kola!\n". The < > operator reads a line and returns it including the newline (\n) at the end. To
get rid of this newline, use chomp:
chomp $name;
immediately after reading $name. This deletes a trailing newline.
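The effect of chomp can be seen without typing anything. This is a minimal sketch: it assumes an in-memory filehandle (opening a reference to a string) as a stand-in for <STDIN>, so the example runs unattended; with real keyboard input you would read from <STDIN> as above.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for <STDIN> so the example runs unattended
my $typed = "Kola\n";
open(my $in, '<', \$typed) or die "Cannot open in-memory file: $!";

my $name = <$in>;           # reads "Kola\n", newline included
my $raw  = "hello $name!";  # "hello Kola\n!" - the ! lands on the next line
chomp $name;                # remove the trailing newline
my $clean = "hello $name!"; # "hello Kola!"
close($in);

print "$clean\n";
```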
Writing Output
The output is displayed using print function. The print function can take any number of
arguments and prints them to the standard output (usually the screen).
print "Hello World!\n";
$a=4;
print "$a";
CONTROL STRUCTURES
Conditional Structures (If/elsif/else)
The basic construction to execute blocks of statements is the if statement. The if statement
permits execution of the associated statement block if the test expression evaluates as true. It
is important to note that unlike many compiled languages, it is necessary to enclose the
statement block in curly braces, even if only one statement is to be executed.
The general form of an if/then/else type of control statement is as follows:
if (expression_one) {
true_one_statement;
} elsif (expression_two) {
true_two_statement;
} else {
all_false_statement;
}
The “ternary” operator is another nifty one to keep in your bag of tricks:
$var = (expression) ? true_value : false_value;
It is equivalent to:
if (expression) {
$var = true_value;
} else {
$var = false_value;
}
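The if/elsif/else chain and the ternary operator can be tried together. A minimal sketch; the score and the grade boundaries (70 and 50) are hypothetical values chosen only to exercise each branch.

```perl
use strict;
use warnings;

my $score = 65;   # hypothetical score

# if/elsif/else: exactly one of the three blocks runs
my $grade;
if ($score >= 70) {
    $grade = "A";
} elsif ($score >= 50) {
    $grade = "C";
} else {
    $grade = "F";
}

# The same kind of two-way choice written with the ternary operator
my $status = ($grade eq "F") ? "fail" : "pass";
print "$grade $status\n";
```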
Perl Unless
unless is the opposite of if: the unless block is executed only if the condition is false.
my $a = 5;
unless ($a == 5)
{
print "Inside unless block - The value is $a\n";
}
else
{
print "Inside else block - The value is $a\n";
}
Output:
Inside else block - The value is 5
The else block is executed because the condition of the unless ($a == 5) is true.
$a = "This is Perl";
unless ($a eq "SASSDSS") {
print "Inside unless block\n";
}
else
{
print "Inside else block\n";
}
Output
Inside unless block
LECTURE FIVE - CONTROL STRUCTURES II
PERL LOOPS
Loop statements are used to repeat the execution of some code. Perl supports four types of
loop, similar to those in other programming languages: for, foreach, while and until. Each
provides a different means of repeatedly executing a block of statements.
For loop
The code block is executed as long as the condition is true. The for loop has three
semicolon-separated expressions within its parentheses. These expressions handle the
initialization, the condition and the increment respectively, as shown below:
for (initialization; condition; incrementing) {
statements;
}
This structure is typically used to iterate over a range of values. The loop runs until the
condition is false.
for (my $i = 0; $i < 10; $i++) {
print $i;
}
Foreach
The foreach statement is much like the for statement except it loops over the elements of
a list:
my @array = (1..5);
foreach my $value (@array) {
print "The value is $value\n";
}
Output
The value is 1
The value is 2
The value is 3
The value is 4
The value is 5
If the scalar loop variable is omitted, $_ is used as follows:
my @array = (1..5);
foreach (@array) {
print "The value is $_\n"; # same as the code above
}
Output
The value is 1
The value is 2
The value is 3
The value is 4
The value is 5
We can obtain hash keys and values using foreach as follows:
my %hash = ('Tom' => 23, 'Jerry' => 24, 'Mickey' => 25);
foreach my $key (keys %hash) {
print "$key \n";
}
Output (hash order is not fixed, so the order may differ from run to run)
Mickey
Tom
Jerry
In the above example, the keys function is used to access the keys of the hash. Similarly, we
can use the values function to access the values of the hash.
my %hash = ('Tom' => 23, 'Jerry' => 24, 'Mickey' => 25);
foreach my $value (values %hash) {
print "the value is $value \n";
}
Output
the value is 24
the value is 23
the value is 25
While
The Perl while loop is a control structure in which the code block is executed as long as the
condition is true; the loop exits as soon as the condition becomes false. For example:
#!/usr/bin/perl
$a=1;
while ($a<=3) {
print "$a\n";
$a=$a+1;
}
Output
1
2
3
Do-while
A do-while loop executes its block at least once, even if the condition in the while part is false.
For example:
#!/usr/bin/perl
$a=1;
do {
print "$a\n";
$a=$a+1;
}
while ($a<=3);
print "Now value is greater than 3";
Output
1
2
3
Now value is greater than 3
Until
The until loop is the opposite of while: it tests the expression before each pass through the
block, and the statements are executed repeatedly until the expression evaluates as true.
#!/usr/bin/perl
$a=1;
until ($a>3) {
print "$a\n";
$a=$a+1;
}
Output
1
2
3
Do until
A do-until loop is used when a block must be executed at least once and then repeated while
a condition is false: the block runs once, and then repeatedly until the test expression is
true.
#!/usr/bin/perl
$a=1;
do {
print "$a\n";
$a=$a+1;
} until ($a>3);
Output
1
2
3
Exiting the loop
The normal flow of a loop can be changed with three statements: next, redo and last.
next means "skip over everything else in the block and start the next iteration" (in a for
loop, the increment expression runs and the condition is evaluated again).
redo means "restart the current iteration from the top of the block, without evaluating the
condition again and without moving to the next element."
last means "exit the loop immediately and never come back."
Example:
foreach $student (@students) {
if ($student eq "END_REGISTERED") {
last;
}
elsif ($student eq "Silber"){
next;
}
else {
$grade = Check_Grade ($student);
}
print "$student: $grade\n";
}
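The example above calls a Check_Grade subroutine that is not defined in these notes. A runnable sketch of the same next/last logic, assuming a hypothetical %grade hash in place of Check_Grade and made-up student names:

```perl
use strict;
use warnings;

# Hypothetical data standing in for @students and Check_Grade() above
my @students = ("Ade", "Silber", "Bola", "END_REGISTERED", "Late");
my %grade    = ("Ade" => "B", "Silber" => "A", "Bola" => "C", "Late" => "A");

my @report;
foreach my $student (@students) {
    if ($student eq "END_REGISTERED") {
        last;                      # leave the loop completely
    }
    elsif ($student eq "Silber") {
        next;                      # skip to the next student
    }
    push(@report, "$student: $grade{$student}");
}
print "$_\n" foreach @report;
```

"Late" is never reported because last ends the loop first, and "Silber" is skipped by next.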
LECTURE SIX - SUBROUTINES AND FUNCTIONS
SUBROUTINES AND FUNCTIONS
Perl subroutines encapsulate blocks of code in the usual way. A subroutine can return a scalar
or a list. Subroutines are defined in Perl as:
sub subname {
statement_1;
statement_2;
}
Named subroutine definitions are global to the package in which they are defined.
Invoking subroutines
Subroutines can be called with the ampersand (&) prefix or, more commonly, by appending
parentheses to the subroutine name:
name();
&name;
You may use the explicit return statement to return a value and leave the subroutine at
any point.
sub myfunc {
statement_1;
if (condition) { return $val; }
statement_2;
return $val;
}
Passing arguments
Arguments to a subroutine are passed as a single, flat list of scalars, and return values are passed
the same way. Any arguments passed to a subroutine come in as @_. Because arrays and hashes are
flattened into this one list, passing them as separate arguments requires passing references to them:
@returnlist = ref_conversion(\@inlist, \%inhash);
The subroutine will have to dereference the arguments in order to access the data values they
represent.
sub myfunc {
my($inlistref,$inhashref) = @_;
my(@inlist) = @$inlistref;
my(%inhash) = %$inhashref;
statements;
return @result;
}
Prototypes let you declare, to a limited extent, how many arguments a subroutine expects and
of what kind (scalar, array, etc.); they change how calls are parsed rather than checking data types.
Example:
sub Three {
return (1 + 2);
}
sub Sum1 {
my ($x, $y) = @_; # the first lines of many
#functions look like this to retrieve and name their
#params
return($x + $y);
}
# Variant where you pull the values out of @_ directly
# This avoids copying the parameters
sub Sum2 {
return($_[0] + $_[1]);
}
# How Sum() would really be written in Perl -- it takes
#an array of numbers of arbitrary length, and adds all of
#them...
sub Sum3 {
my ($sum, $elem); # declare local vars
$sum = 0;
foreach $elem (@_) {
$sum += $elem;
}
return($sum);
}
## Variant of above using shift instead of foreach
sub sum4 {
my ($sum, $elem);
$sum = 0;
while(defined($elem = shift(@_))) {
$sum += $elem;
}
return($sum);
}
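Calling these subroutines, and passing an array and a hash by reference, can be sketched as follows. Sum3 is repeated in a compact form so the block runs on its own; total_and_count is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Same idea as Sum3 above, repeated so this block runs on its own
sub Sum3 {
    my $sum = 0;
    foreach my $elem (@_) {
        $sum += $elem;
    }
    return $sum;
}

# Hypothetical helper: takes an array reference and a hash reference
sub total_and_count {
    my ($listref, $hashref) = @_;
    my $total = Sum3(@$listref);          # dereference the array
    my $count = scalar(keys %$hashref);   # dereference the hash
    return ($total, $count);              # a list is returned
}

my $s = Sum3(1, 2, 3, 4);                 # 10
my @nums = (10, 20, 30);
my %ages = ("Femi" => 18, "Amina" => 17);
my ($total, $count) = total_and_count(\@nums, \%ages);   # (60, 2)

print "$s $total $count\n";
```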
LECTURE SEVEN – PATTERN MATCHING
REGULAR EXPRESSIONS
Perl's most famous strength is in string manipulation with regular expressions. A regular
expression is a way of describing a class of similar strings in a very compact pattern notation.
The simple syntax to search for a pattern in a string is:
($string =~ /pattern/) ## true if the pattern is found in the string
("binky" =~ /ink/) ==> TRUE
("binky" =~ /onk/) ==> FALSE
In the simplest case, the exact characters in the regular expression pattern must occur in the
string somewhere. All of the characters in the pattern must be matched, but the pattern does
not need to be right at the start or end of the string, and the pattern does not need to use all the
characters in the string.
A whole regex is (usually) placed between a pair of / characters. Variables in the pattern are
interpolated before pattern matching occurs.
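Basic matching, including a variable interpolated into the pattern, can be checked as follows. A minimal sketch; it also uses !~ (true when the pattern does not match) and a match with no =~, which is applied to $_, both standard Perl features not covered above.

```perl
use strict;
use warnings;

my $word = "ink";
my $found   = ("binky" =~ /$word/) ? 1 : 0;  # $word is interpolated: /ink/
my $missing = ("binky" =~ /onk/)   ? 1 : 0;  # no "onk" anywhere
my $negated = ("binky" !~ /onk/)   ? 1 : 0;  # !~ is true when there is NO match

$_ = "binky";
my $default = /nky/ ? 1 : 0;                 # with no =~, the match is against $_

print "$found $missing $negated $default\n";
```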
Character Classes
Square brackets can be used to represent a set of characters. For example [aeiouAEIOU] is a
one character pattern that matches a vowel. Most characters are not special inside a square
bracket and so can be used without a leading backslash (\). \w, \s, and \d work inside a character
class, and the dash (-) can be used to express a range of characters, so [a-z] matches lowercase
"a" through "z". So the \w code is equivalent to [a-zA-Z0-9_]. If the first character in a character
class is a caret (^) the set is inverted, and matches all the characters not in the given set. So
[^0-9] matches all characters that are not digits. The parts of an email address on either side of
the "@" are made up of letters, numbers plus dots, underscores and dashes. As a character class
that's just [\w._-].
'user@example.com' =~ m/^[\w._-]+@[\w._-]+$/
==> TRUE
print "Match Found\n";
}
else
{
print "Match Not Found\n";
}
Output
Match Found
The substitution operator, s///, is a variation of the match operator that is used to search and
replace. The basic form of the operator is:
s/PATTERN/REPLACEMENT/;
The PATTERN is the regular expression for the text that we are looking for. The
REPLACEMENT is a specification for the text or regular expression that we want to use to
replace the found text with.
For example
#!/usr/bin/perl
# Initialising a string
$string = "Adams is a computer science student.";
# Calling the substitute regular expression
$string =~ s/Adams/Sunday/;
$string =~ s/computer science/information science/;
# Printing the substituted string
print "$string\n";
Output
Sunday is a information science student.
The translation, tr/// or y///, is used to replace all the occurrences of a character with a given
single character.
The translation operators are:
tr/SEARCHLIST/REPLACEMENTLIST/
y/SEARCHLIST/REPLACEMENTLIST/
The translation replaces all occurrences of the characters in SEARCHLIST with the
corresponding characters in REPLACEMENTLIST.
Example:
#!/usr/bin/perl
# Initialising a string
$string = 'Universities';
# Calling the tr/// operator
$string =~ tr/s/z/;
# Printing the replaced string
print "$string\n";
Output
Univerzitiez
Standard Perl ranges can also be used, allowing you to specify ranges of characters either by
letter or numerical value.
$string =~ tr/a-z/A-Z/;
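Translation and substitution can be compared on copies of a string, so that the original is left unchanged. A minimal sketch: it also uses tr with an empty replacement list to count characters, and the /g flag of s/// to replace every match instead of only the first; both are standard Perl features not covered above.

```perl
use strict;
use warnings;

my $name = "Universities";

(my $upper = $name) =~ tr/a-z/A-Z/;   # copy, then translate the copy
my $i_count = ($name =~ tr/i//);      # empty replacement: counts, no change

my $text = "a cat and a hat";
(my $once = $text) =~ s/at/og/;       # first match only: "a cog and a hat"
(my $all  = $text) =~ s/at/og/g;      # /g replaces every match: "a cog and a hog"

print "$upper $i_count\n$once\n$all\n";
```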
Split operator
Split looks for occurrences of a regular expression and breaks the input string at
those points.
@fields = split(pattern,$input);
Without any arguments, split breaks $_ on runs of whitespace, ignoring any leading whitespace:
@words = split; is equivalent to
@words = split(' ', $_);
Join operator
Join, the complement of split, takes a list of values and glues them together with the provided
delimiting string.
$output = join($delimiter,@inlist);
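split and join work as a pair. A minimal sketch with illustrative strings; note the special ' ' pattern, which skips leading whitespace, and that an empty field between two delimiters is kept.

```perl
use strict;
use warnings;

my $line  = "  Eenie  meenie miney   moe ";
my @words = split(' ', $line);           # ("Eenie", "meenie", "miney", "moe")

my @fields = split(/,/, "a,b,,c");       # empty field kept: ("a", "b", "", "c")

my $joined = join(", ", @words);         # "Eenie, meenie, miney, moe"
my $dashed = join("-", split(/:/, "10:20:30"));   # "10-20-30"

print "$joined\n$dashed\n";
```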
LECTURE EIGHT – PATTERN MATCHING II
REGULAR EXPRESSION CHARACTERS
A regex is made up of single character patterns, grouping patterns, alternation patterns,
anchoring patterns and bracketing patterns.
Single character
‘.’ matches any single character.
A single printable character matches itself
[set] matches any single character in the set. For example, [aeiou] matches any single lower-
case vowel.
Also, the set may contain items of the form a-f, which is a shorthand for abcdef.
For example, [a-z#%] matches any single lower-case letter, a hash-mark, or a percent sign.
If a set starts with a `^' character (eg. [^a-z#%]), the set is negated - the pattern matches any
character NOT in the set.
Several useful character classes are predefined.
Digit \d [0-9]
Non-digit \D [^0-9]
Word character \w [a-zA-Z0-9_]
Non-word character \W [^a-zA-Z0-9_]
Whitespace \s any whitespace character (space, tab, newline, ...)
Non-whitespace \S any character that is not whitespace
Grouping features
Sequence of single-character patterns: matches a corresponding sequence of characters.
eg. /[a-z]bc/ matches any lower case letter, followed immediately by a `b', followed
immediately by a `c', anywhere in the string.
Optional: `?' makes the previous pattern optional - i.e. match zero or one times. eg. /he?llo/
matches `hello' or `hllo'.
Zero-or-more: `*' makes the previous pattern apply any number of times (from 0 upwards). eg.
/he*llo/ matches `hllo', `hello', `heello' etc. It consumes the maximum number of `e's possible
(it's greedy).
One-or-more: `+' means match 1 or more times. eg. /he+llo/ matches `hello', `heello', `heeello'
etc but not `hllo'.
A regex can contain several of these operators: eg: /h[uea]*l+o/ matches `hlo', `hullo', `hulllllo',
`heeelo', `heuaueaaeuelllllllo' etc.
One problem with * and +, is that they are "greedy" -- they try to use up as many characters as
they can. Suppose you are trying to pick out all of the characters between two curly braces { }.
The simplest thing would be to use the pattern.
m/{(.*)}/ -- pick up all the characters between {}'s
The problem is that if you match against the string "{group 1} xx {group 2}", the * will
aggressively run right over the first } and match the second }. So $1 will be "group 1} xx
{group 2" instead of "group 1". Fortunately Perl has a nice solution to the too-aggressive-*/+
problem. If a ? immediately follows the * or +, then it tries to find the shortest repetition which
works instead of the longest. You need the ? variant most often when matching with .* or \S*
which can easily use up more than you had in mind. Use ".*?" to skip over stuff you don't care
about, but have something you do care about immediately to its right. For example:
m/{(.*?)}/ ## pick up all the characters between {}'s, but stop
## at the first}
The old way to skip everything up until a certain character, say }, uses the [^}] construct like
this.
m/{([^}]*)}/ ## the inner [^}] matches any char except }
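The three patterns above can be compared on the same string. A minimal sketch; the braces are escaped as \{ and \} here to make it clear that they are literal characters, and the last line uses /g in list context, which returns every captured group.

```perl
use strict;
use warnings;

my $text = "{group 1} xx {group 2}";

my ($greedy) = $text =~ /\{(.*)\}/;      # "group 1} xx {group 2"
my ($lazy)   = $text =~ /\{(.*?)\}/;     # "group 1"
my ($class)  = $text =~ /\{([^}]*)\}/;   # "group 1"
my @all      = $text =~ /\{([^}]*)\}/g;  # every group: ("group 1", "group 2")

print "$greedy\n$lazy\n$class\n@all\n";
```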
Anchoring Pattern
Placing `^' at the start of a regex matches the start of the string. Similarly, `$' at the end of a
regex matches the end of the string.
`\b' constrains the regex to match only at a word boundary.
Without any anchoring, the regex can match anywhere.
Alternation and bracketing patterns
A regex of the form /h[eua]*llo|wo+tcha/ matches either /h[eua]*llo/ or /wo+tcha/.
Note that /a|b|c|g/ should be written as /[abcg]/ instead for efficiency
Brackets (parentheses) may be placed around any complete sub-pattern, as a way of enforcing a
desired precedence. For example, in /so+ng|bla+ckbird/ the `bird' belongs only to the second
alternative, bla+ckbird. If you meant "/so+ng|bla+ck/ followed by /bird/", then write that as
/(so+ng|bla+ck)bird/.
If you want a repetition of anything longer than a single character pattern, you need brackets,
as in /(hello)*/. Without brackets, /hello*/ means /hell/ followed by /o*/ of course!
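A small sketch of both precedence points, using made-up sample words: the first loop shows how the brackets change what | applies to, and the last two matches show why a repeated multi-character unit needs brackets.

```perl
use strict;
use warnings;

# Without brackets, | splits the whole pattern: "so+ng" OR "bla+ckbird".
my $loose  = qr/so+ng|bla+ckbird/;
# With brackets, "bird" is required after either alternative.
my $strict = qr/(so+ng|bla+ck)bird/;

my %result;
for my $word (qw(song songbird blackbird black)) {
    $result{$word} = join ' ',
        ($word =~ $loose  ? 'loose'  : '-'),
        ($word =~ $strict ? 'strict' : '-');
    print "$word: $result{$word}\n";
}

# (hello)* repeats the whole word; hello* only repeats the final o.
my ($with)    = "hellohello" =~ /^((hello)*)/;   # first capture: hellohello
my ($without) = "hellohello" =~ /^(hello*)/;     # first capture: hello
print "with brackets: $with, without: $without\n";
```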
Brackets have another useful side effect: they tell Perl's regex engine to remember or capture
the text fragment that matched the inner pattern for later reporting or reuse. For example:
my $str = "I'm a melodious little soooongbird, hear me sing";
print "found <$1>\n" if $str =~ /(so+ng|bla+ck)bird/;
After the match succeeds, the capture buffer variable $1 contains soooong - the part of $str
matching the bracketed regex.
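The capture can be used in two ways, shown in the sketch below: in scalar context the match only reports success and sets $1, $2, ..., while in list context it returns the captured strings directly. The second sample string is made up for illustration.

```perl
use strict;
use warnings;

my $str = "I'm a melodious little soooongbird, hear me sing";

# Scalar (boolean) context: test the match, then read $1.
my $found;
if ($str =~ /(so+ng|bla+ck)bird/) {
    $found = $1;
    print "found <$found>\n";            # found <soooong>
}

# List context: each bracketed group becomes one returned value.
my ($first, $rest) = "blaaackbird song" =~ /(bla+ck)bird\s+(\w+)/;
print "first=<$first> rest=<$rest>\n";  # first=<blaaack> rest=<song>
```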
RE special characters
. # Any single character except a newline
^ # The beginning of the line or string
$ # The end of the line or string
* # Zero or more of the last character
+ # One or more of the last character
? # Zero or one of the last character
RE examples
^.*$ # matches the entire string
hi.*bye # matches from "hi" to "bye" inclusive
x +y # matches x, one or more blanks, and y
^Dear # matches "Dear" only at beginning
bags? # matches "bag" or "bags"
hiss+ # matches "hiss", "hisss", "hissss", etc.
Square brackets
[qjk] # Either q or j or k
[^qjk] # Neither q nor j nor k
[a-z] # Anything from a to z inclusive
[^a-z] # No lower case letters
[a-zA-Z] # Any letter
[a-z]+ # Any non-zero sequence of
# lower case letters
More examples
[aeiou]+ # matches one or more vowels
[^aeiou]+ # matches one or more nonvowels
[0-9]+ # matches an unsigned integer
[0-9A-F] # matches a single hex digit
[a-zA-Z] # matches any letter
[a-zA-Z0-9_]+ # matches identifiers
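Character classes become more useful combined with the /g modifier, which in list context returns every match. In the sketch below (sample strings made up), the identifier class is tightened so the first character cannot be a digit; note also that [0-9]+ happily picks the 2 out of count_2, since a class describes characters, not word boundaries.

```perl
use strict;
use warnings;

my $expr = "total = count_2 + 42 * rate";

# With /g in list context, a match returns every non-overlapping hit.
my @numbers = $expr =~ /([0-9]+)/g;                  # runs of digits
my @idents  = $expr =~ /([a-zA-Z_][a-zA-Z0-9_]*)/g;  # must start with a letter or _
my @hex     = "FF 1A zz 09 ab" =~ /([0-9A-F]+)/g;    # upper-case hex digits only

print "numbers: @numbers\n";   # numbers: 2 42
print "idents:  @idents\n";    # idents:  total count_2 rate
print "hex:     @hex\n";       # hex:     FF 1A 09
```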
More special characters
\n # A newline
\t # A tab
\w # Any word character (letter, digit or _); same as [a-zA-Z0-9_]
\W # Any non-word char; same as [^a-zA-Z0-9_]
\d # Any digit. The same as [0-9]
\D # Any non-digit. The same as [^0-9]
\s # Any whitespace character
\S # Any non-whitespace character
\b # A word boundary, outside [] only
\B # No word boundary
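The sketch below uses several of these escapes to pull apart a hypothetical log entry and to contrast \b with \B. The entry format (date, tab, action, spaces, user name) is invented for the example.

```perl
use strict;
use warnings;

# Sample log entry: a date, a tab, an action, several spaces, a user name.
my $entry = "2021-03-15\tlogin   alice_01";

my ($year, $month, $day, $action, $user) =
    $entry =~ /^(\d\d\d\d)-(\d\d)-(\d\d)\s+(\w+)\s+(\w+)$/;

# \s+ treats the tab and the run of spaces the same way.
my @fields = split /\s+/, $entry;

# \b requires a word boundary; \B requires its absence.
my @whole  = "cat scatter cat" =~ /\bcat\b/g;   # the two stand-alone words
my @inside = "cat scatter cat" =~ /\Bcat\B/g;   # only the "cat" inside "scatter"

print "date=$year/$month/$day action=$action user=$user\n";
print "fields: ", scalar(@fields), "\n";
print "whole-word hits: ", scalar(@whole), ", inside-word hits: ", scalar(@inside), "\n";
```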
Quoting special characters
\| # Vertical bar
\[ # An open square bracket
\) # A closing parenthesis
\* # An asterisk
\^ # A caret symbol
\/ # A slash
\\ # A backslash
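When the text to match comes from a variable, escaping each special character by hand is error-prone. Perl's built-in quotemeta() and the \Q...\E escape do it automatically, as this sketch with a made-up price string shows.

```perl
use strict;
use warnings;

my $price = '$9.99 (sale)';
my $text  = 'Now only $9.99 (sale) while stocks last';

# quotemeta() puts a backslash before every non-word character.
my $escaped = quotemeta($price);
print "$escaped\n";                       # \$9\.99\ \(sale\)

# Used raw, $ . ( ) keep their special meanings, so this fails to match.
my $plain = $text =~ /$price/ ? 1 : 0;

# \Q...\E applies quotemeta to the interpolated text inside the pattern.
my $found = $text =~ /\Q$price\E/ ? 1 : 0;

print "raw: $plain, quoted: $found\n";    # raw: 0, quoted: 1
```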
Alternatives and parentheses
jelly|cream # Either jelly or cream
(eg|le)gs # Either eggs or legs
(da)+ # Either da or dada or
# dadada or...
#### The last . in the pattern is not matched
"piiig" =~ m/p.i.../ ==> FALSE
#### \d = digit [0-9]
"p123g" =~ m/p\d\d\dg/ ==> TRUE
"p123g" =~ m/p\d\d\d\d/ ==> FALSE
#### \w = letter, digit or underscore
"p123g" =~ m/\w\w\w\w\w/ ==> TRUE
#### i+ = one or more i's
"piiig" =~ m/pi+g/ ==> TRUE
#### matches iii
"piiig" =~ m/i+/ ==> TRUE
"piiig" =~ m/p+i+g+/ ==> TRUE
"piiig" =~ m/p+g+/ ==> FALSE
#### i* = zero or more i's
"piiig" =~ m/pi*g/ ==> TRUE
"piiig" =~ m/p*i*g*/ ==> TRUE
#### X* can match zero X's
"piiig" =~ m/pi*X*g/ ==> TRUE
#### ^ = start, $ = end
"piiig" =~ m/^pi+g$/ ==> TRUE
#### i is not at the start
"piiig" =~ m/^i+g$/ ==> FALSE
#### i is not at the end
"piiig" =~ m/^pi+$/ ==> FALSE
"piiig" =~ m/^p.+g$/ ==> TRUE
"piiig" =~ m/^p.+$/ ==> TRUE
"piiig" =~ m/^.+$/ ==> TRUE
#### g is not at the start
"piiig" =~ m/^g.+$/ ==> FALSE
#### Needs at least one char after the g
"piiig" =~ m/g.+/ ==> FALSE
#### Needs at least zero chars after the g
"piiig" =~ m/g.*/ ==> TRUE
#### | = left or right expression
"cat" =~ m/^(cat|hat)$/ ==> TRUE
"hat" =~ m/^(cat|hat)$/ ==> TRUE
"cathatcatcat" =~ m/^(cat|hat)+$/ ==> TRUE
"cathatcatcat" =~ m/^(c|a|t|h)+$/ ==> TRUE
"cathatcatcat" =~ m/^(c|a|t)+$/ ==> FALSE
#### Matches and stops at first 'cat'; does not get to 'catcat' on the right
"cathatcatcat" =~ m/(c|a|t)+/ ==> TRUE
#### ? = optional
"12121x2121x2" =~ m/^(1x?2)+$/ ==> TRUE
"aaaxbbbabaxbb" =~ m/^(a+x?b+)+$/ ==> TRUE
"aaaxxbbb" =~ m/^(a+x?b+)+$/ ==> FALSE
#### Three words separated by spaces
"Easy does it" =~ m/^\w+\s+\w+\s+\w+$/ ==> TRUE
#### Just matches "gates@microsoft"; \w does not match the "."
"[email protected]" =~ m/\w+@\w+/ ==> TRUE
#### Add the .'s to get the whole thing
"[email protected]" =~ m/^(\w|\.)+@(\w|\.)+$/ ==> TRUE
#### words separated by commas and possibly spaces
"Klaatu, barada,nikto" =~ m/^\w+(,\s*\w+)*$/ ==> TRUE
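The table can be checked mechanically. The sketch below re-runs a selection of its rows; strings and patterns are kept in single quotes so that characters such as $ and \ reach the regex engine unchanged.

```perl
use strict;
use warnings;

# A selection of rows from the table: [string, pattern, expected (1/0)].
my @cases = (
    [ 'piiig',                'p.i...',              0 ],
    [ 'p123g',                'p\d\d\dg',            1 ],
    [ 'p123g',                'p\d\d\d\d',           0 ],
    [ 'piiig',                '^pi+g$',              1 ],
    [ 'piiig',                '^i+g$',               0 ],
    [ 'piiig',                'g.+',                 0 ],
    [ 'piiig',                'g.*',                 1 ],
    [ 'cathatcatcat',         '^(c|a|t)+$',          0 ],
    [ '12121x2121x2',         '^(1x?2)+$',           1 ],
    [ 'aaaxxbbb',             '^(a+x?b+)+$',         0 ],
    [ 'Easy does it',         '^\w+\s+\w+\s+\w+$',   1 ],
    [ 'Klaatu, barada,nikto', '^\w+(,\s*\w+)*$',     1 ],
);

my $failures = 0;
for my $case (@cases) {
    my ($string, $pat, $expected) = @$case;
    my $got = $string =~ /$pat/ ? 1 : 0;
    printf "%-22s =~ m/%s/ ==> %s\n", qq("$string"), $pat, $got ? 'TRUE' : 'FALSE';
    $failures++ if $got != $expected;
}
print "mismatches: $failures\n";
```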
LECTURE NINE – PERL AND OPERATING SYSTEM
FILES AND I/O
Filehandles
Variables which represent files are called "file handles", and they are handled differently from
other variables. Traditional (bareword) file handles do not begin with any special character;
they are just plain words. By convention, they are written in all upper case, like FILE_OUT or
SOCK. Bareword file handles live in a global (package) namespace, so you cannot allocate them
locally like other variables. Since Perl 5.6 you can instead store a file handle in a lexical
scalar, as in open(my $fh, "<", "input.dat"), which is scoped like any other my variable. A
bareword handle can be passed from one routine to another as a typeglob (e.g. *FILE_OUT or
\*FILE_OUT), while a lexical handle is passed like any other scalar. Every Perl program has
three filehandles that are automatically opened for it: STDIN, STDOUT, and STDERR:
STDIN Standard input (keyboard or file)
STDOUT Standard output (print and write send output here)
STDERR Standard error (channel for diagnostic output)
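A small sketch of passing filehandles to a routine, assuming Perl 5.8 or later (needed for the in-memory file at the end). log_to() is a hypothetical helper written for this example; \*STDOUT and \*STDERR pass references to the standard handles, and $mem is a lexical handle.

```perl
use strict;
use warnings;

# Hypothetical helper: write one line to whichever handle is passed in.
sub log_to {
    my ($fh, $message) = @_;
    print {$fh} "$message\n";
}

log_to(\*STDOUT, "normal output goes to STDOUT");
log_to(\*STDERR, "diagnostics go to STDERR");

# A lexical filehandle opened on an in-memory string instead of a file.
my $buffer = '';
open(my $mem, '>', \$buffer) or die "open: $!";
log_to($mem, "captured line");
close($mem);

print "buffer holds: $buffer";
```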
Open files
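Filehandles are created using the open() function:
open(FILE, "filename");
You can open files for reading, writing, or appending:
open(FILE, "> newout.dat") Writing, creating a new file (an existing file is emptied)
open(FILE, ">> oldout.dat") Appending to a file (created if it does not exist)
open(FILE, "< input.dat") Reading from an existing file
The mode can also be given as a separate argument, as in open(FILE, "<", "input.dat"), which is
safer when the file name comes from a variable.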
As an aside, under Windows, there are a number of ways to refer to the full path to a file:
"c:\\temp\\file" Escape the backslash in double quotes
'c:\temp\file' Use proper path in single quotes
"c:/temp/file" UNIX-style forward slashes
It is important to realize that calls to the open() function are not always successful. Perl will
not (necessarily) complain about using a filehandle created from a failed open(). This is why
we test the condition of the open statement:
open(F, "< badfile.dat") or die "open: $!";
Here $! holds the operating system's error message (for example "No such file or directory").
Closing files
When you are finished using a filehandle, close it using close():
close(FILE);
Examples
Opening and closing files using file handle to a filename
## open "filename" for reading as file handle F1
open(F1, "filename");
## open "filename" for writing as file handle F2
open(F2, ">filename");
## open "appendtome" for appending as file handle F3
open(F3, ">>appendtome");
## close the handles when finished
close(F1); close(F2); close(F3);
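The same three modes, written with lexical filehandles and the three-argument form of open(), look like this. File::Temp (a core module) is used here only to get a scratch directory; the file name notes.txt is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);   # only used to get a throwaway directory
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'notes.txt');

# Write: creates the file, or empties it if it already exists.
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "line 1\n";
close($out) or die "close: $!";

# Append: adds to the end of the existing file.
open(my $app, '>>', $file) or die "Cannot append to $file: $!";
print $app "line 2\n";
close($app);

# Read: one line at a time, removing the trailing newline.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;
    push @lines, $line;
}
close($in);

print "read: @lines\n";   # read: line 1 line 2
```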
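Manipulating Files & Directories
Create a new directory:
mkdir(dirname, mode)
The "mode" specifies the permissions of the new directory. 0777 is the usual choice; the
permission bits set in the process umask are removed from it, so the result is normally more
restrictive.
Remove an (empty) directory:
rmdir(dirname)
Change the current working directory to dirname:
chdir(dirname)
Change the permissions of files/directories:
chmod(mode, list_of_files)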
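A sketch of these functions working together inside a scratch directory. The directory name reports is made up, and the mode check assumes a Unix-like system (Windows reports permissions differently).

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $base  = tempdir(CLEANUP => 1);
chdir($base) or die "chdir $base: $!";

mkdir('reports', 0755) or die "mkdir: $!";   # umask bits are removed from 0755
my $created = -d 'reports' ? 1 : 0;

chmod(0700, 'reports') or die "chmod: $!";   # owner-only access
my $mode = (stat 'reports')[2] & 07777;      # permission bits only

rmdir('reports') or die "rmdir: $!";         # succeeds because it is empty
my $removed = -e 'reports' ? 0 : 1;

chdir($start) or die "chdir back: $!";       # leave so tempdir can be deleted
printf "created=%d mode=%04o removed=%d\n", $created, $mode, $removed;
```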
LECTURE TEN – MODULES AND DATABASE
MODULES
Namespaces store identifiers for a package, including variables, subroutines, filehandles, and
formats, so that they are distinct from those of another package. The default namespace for the
body of any Perl program is main. You can refer to the variables from another package by
“qualifying” them with the package name. To do this, place the name of the package followed
by two colons before the identifier’s name:
$Package::varname
If the package name is empty, as in $::varname, the main package is assumed.
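A minimal sketch of qualified names. The package Counter and its contents are hypothetical; the point is that the same short name count refers to different variables in different packages.

```perl
use strict;
use warnings;

package Counter;                 # switch to the (hypothetical) Counter namespace
sub describe { return "I am Counter::describe" }

package main;                    # back to the default namespace
$Counter::count = 10;            # fully qualified names are allowed under strict
$main::count    = 1;

my $desc = Counter::describe();  # a qualified subroutine call

print "Counter::count = $Counter::count\n";   # 10
print "::count        = ", $::count, "\n";    # 1 (empty package name means main)
print "$desc\n";
```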
Modules allow you to split a large program into separate source files and namespaces,
controlling the interface. They extend the functionality of core Perl with additional compiled
code and scripts. To make use of a module (if it's installed on your system), load it with a use
statement:
use CGI;
This will pull in the module’s subroutines and variables at compile time. Perl looks for modules
by searching the directories listed in @INC. Modules can be obtained from the Comprehensive
Perl Archive Network (CPAN) at
http://www.cpan.org/modules/
or from the ActiveState site:
http://www.ActiveState.com/packages/zips/
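To see where Perl looks for modules on a given machine, print @INC; the companion hash %INC records where each loaded module was actually found. List::Util is used here only because it ships with Perl.

```perl
use strict;
use warnings;
use List::Util qw(sum);   # a core module, found via one of the @INC directories

# Directories searched, in order, by 'use' and 'require'.
print "$_\n" for @INC;

# %INC maps each loaded module's relative path to the file that was loaded.
print "List/Util.pm loaded from $INC{'List/Util.pm'}\n";
print "sum(1..4) = ", sum(1 .. 4), "\n";
```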
To install modules under UNIX, unarchive the file containing the package, change into its
directory and type:
perl Makefile.PL
make
make test
make install
On Windows, the ActivePerl distribution makes use of the “Perl Package Manager” to
install/remove/update packages. To install a package, run ppm on the .ppd file associated
with the module:
ppm install module.ppd
DATABASE AND PERL
Interacting with a database in Perl involves two pieces: the DBI and the database driver, or
DBD. Each of these pieces is a Perl module. The DBI provides the software interface that is
independent of the database, and the DBD provides the software that is database-dependent.
39
The DBI
The DBI (database independent interface) contains data-access libraries that are
independent of the type of database. The DBI provides a generic interface on which you call a
driver to access a database. This general interface allows you to use some common methods,
regardless of the backend database. The DBI is a module in itself, and thus is loaded into your
program with a use statement:
use DBI;
The DBI loads one or more database drivers (generally referred to as DBD, for database
dependent). The DBD, which will be discussed shortly, has the specific software and code
required to access a given type of database. It provides the interface between the DBI and the
type of database for the connection. When coupled with the appropriate DBD, the DBI is the
key to making database connections work.
Database Drivers
A database driver provides the database-interaction methods that are specific to the individual
database implementation. It is commonly referred to as the DBD, for database dependent, since
its code depends on which database is being used. For example, a MySQL database has
different syntax than an Oracle database. The DBI operates independently of the database,
leaving the implementation-specific bits to the DBD. You might be curious as to which drivers
are installed on your server. The DBI module provides a function for listing all of the currently
installed drivers. The example below uses the available_drivers() function of the DBI module
to retrieve the drivers available on the server.
#!/usr/bin/perl
use strict;
use DBI;
my @drivers;
@drivers = DBI->available_drivers();
foreach my $dbd (@drivers) {
print "$dbd driver is available\n";
}
exit;
You run this program from the command line. The output will look something like this:
ExampleP driver is available
Proxy driver is available
mysql driver is available
The program incorporates the DBI into the namespace with this line:
use DBI;
The available drivers are placed into an array called @drivers with this line:
@drivers = DBI->available_drivers;
Finally, the array is expanded within the foreach loop and printed to STDOUT, producing
the output. The output shows that the MySQL DBD is installed on this server. If you
wanted to connect to a different type of database, you would need to obtain the DBD module
from your favorite CPAN mirror or install it from your distribution’s repository. For example,
Debian 3.0 includes a number of DBDs, a listing of which is available by searching the
repository with the command apt-cache search dbd. Some of the more popular DBDs include
the following:
1. MySQL: MySQL is one quarter of the popular LAMP (Linux-Apache-MySQL-Perl/PHP/Python)
development platform.
2. PostgreSQL: Another popular open-source database is PostgreSQL. The DBD for
PostgreSQL is similar to that of MySQL.
3. ODBC: The ODBC DBD is commonly used to connect to databases that run on
Windows systems, such as Microsoft SQL Server and Microsoft Access, but the ODBC
driver could be used to connect to virtually any database that offers ODBC connectivity.
4. Sybase: Another popular DBD is used with the Sybase database server.
LECTURE ELEVEN – PERL AND WEB PROGRAMMING
CGI
CGI (Common Gateway Interface) is a standard that defines how a web server passes a client's
request (for example, from a user with a web browser) to an external program and returns that
program's output to the client. CGI programs are considered to be external to, or separate from,
the web server, but they act as a gateway between the client and the server. Because the
interface is common to all of them, CGI programs can be written in a number of languages; of
these, Perl has historically been one of the most common.
FUNCTION-ORIENTED SCRIPT
The function-oriented method of CGI development allows you to rapidly develop small CGI
scripts. It requires the developer to explicitly import the desired functions into the program.
This is often done by importing method groups rather than the individual methods themselves.
The most common method group in practice is :standard, which contains the most frequently
used methods, including those that make it easy to create and use web forms, as well as the
HTML that surrounds those forms. The :standard group is used in the Hello World example below:
#!/usr/bin/perl -T
use strict;
use CGI ':standard';
print header;
print start_html('Hello World');
print h1('Hello World');
print end_html();
exit;
Create this code in a text editor and save it to a location defined to run CGI scripts within
your web server. For example, you can save the script as hello.cgi in the directory
/usr/lib/cgi-bin. The script needs the correct permissions in order to run; this can usually be
accomplished with the chmod 755 <scriptname.cgi> command:
chmod 755 /usr/lib/cgi-bin/hello.cgi
To run the script from a web browser, point the browser to the URL (uniform resource locator)
of the CGI script. For example, suppose the script is on a server at the IP address 192.168.1.10.
Combining the server address with the path under which the web server exposes the CGI
directory gives the script's URL; with the common /cgi-bin/ alias for /usr/lib/cgi-bin, that is
http://192.168.1.10/cgi-bin/hello.cgi. If all goes well, you should see a page similar to that
in the figure below.
The first line of the example is the standard invocation of the Perl interpreter. The -T switch
turns on taint mode, which treats data coming from outside the program (such as form input) as
untrusted, a sensible precaution for CGI scripts:
#!/usr/bin/perl -T
The next line of code enables strict checking for the script:
use strict;
This line will show up in every script in this lecture. Following use strict; is the line that
actually loads the CGI.pm module and imports its :standard group of functions:
use CGI ':standard';
Four functions of the CGI module are used in this script, as shown here:
print header;
print start_html('Hello World');
print h1('Hello World');
print end_html();
The first function, header(), sends the Content-Type header to the browser. In this instance, the
header() function is roughly equivalent to using this bit of code in the script (CGI.pm also adds
a default character set to the header):
print "Content-Type: text/html\n\n";
The header() function can also be used for other HTTP headers, such as cookies. The next CGI
function used is start_html(). This function begins the HTML portion of the page with elements
like <html>, <head>, <title>, and <body>. In this instance, the script calls start_html()
with the string parameter 'Hello World', which becomes the page title. Another CGI function
called in this script is h1(). This function places an <h1> element around its parameter; here
the parameter is 'Hello World', so the phrase "Hello World" is displayed as a level-1 heading.
Finally, the end_html() function is called to provide the </body> and </html> closing tags.
The code in the example ends each line with a semicolon (;) and begins the next line with
another print statement. This was done to make the code easier to read. However, because print
accepts a list of values, it's quite common to separate the function calls with commas and use a
single print statement when programming a CGI application, so the code would look like this:
#!/usr/bin/perl -T
use strict;
use CGI ':standard';
print header,
start_html('Hello World'),
h1('Hello World'),
end_html();
exit;
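Since CGI.pm has not shipped with core Perl since version 5.22, it can help to see what the four calls amount to. The sketch below builds a simplified equivalent page by hand; hello_page() is a hypothetical helper, and the HTML is not byte-for-byte what CGI.pm emits. For a real CGI script you would keep -T on the #! line; it is left off here so the sketch can also be run directly with perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a header block plus a minimal HTML page.
sub hello_page {
    my ($title) = @_;
    # Header line, then the blank line that ends the HTTP header block.
    my $page = "Content-Type: text/html\n\n";
    # Roughly what start_html(), h1() and end_html() would generate.
    $page .= "<!DOCTYPE html>\n<html>\n<head><title>$title</title></head>\n<body>\n";
    $page .= "<h1>$title</h1>\n";
    $page .= "</body>\n</html>\n";
    return $page;
}

my $page = hello_page('Hello World');
print $page;
```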
LECTURE TWELVE – SOAP AND SOCKET PROGRAMMING
SOAP
SOAP (Simple Object Access Protocol) is a ubiquitous protocol for exchanging information on the
Internet. SOAP is a means by which remote procedures or methods can be called as if they were
local. When you call a SOAP method, you are requesting an application to perform
some computation and return a result to your program. This is the same concept as a local
method call; it’s just that the SOAP call happens to be remote. These method calls sent using
SOAP can be transported over a number of mechanisms, such as over HTTP. SOAP provides
a well-formed means to obtain information from a data source. In a SOAP request, you provide
parameters as required by the receiving application. These parameters are then used by the
server, which executes the query to the application’s data source on behalf of the client and
returns values to the client in a SOAP response. This information can then be parsed and used
within the local Perl application. The Perl SOAP::Lite module can be used to create a SOAP
client and a SOAP listener.
SOAP follows a client/server model, with one side sending the message and the other side
parsing the XML content of the message. The application may take action based on the results of
the message, either on the receiver or sender, or both. SOAP is an XML format with three elements:
envelope, header, and body. The SOAP header and SOAP body are both contained within the
SOAP envelope. The SOAP header is optional, while the body is required. The SOAP body
contains the heart of the SOAP message. Several popular web sites have SOAP interfaces
available; the table below lists a select few of these services, along with a short description of
each.
Table: Some web sites with SOAP interfaces
Site/Service | Description | Information URL
Amazon.com | Numerous web services to expose data on products at Amazon.com | http://www.amazon.com/gp/aws/landing.html
National Weather Service | Web services to expose forecasts and conditions based on latitude and longitude; available for U.S. locations | http://weather.gov/xml/
Google | Interfaces to query Google's directory and more | http://www.google.com/apis/
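To make the envelope/header/body nesting concrete, the sketch below builds a SOAP 1.1 request by hand. Everything inside the body is hypothetical: the method name GetForecast, the namespace urn:example:weather and the parameters are placeholders, not any real service's API. In practice a module such as SOAP::Lite generates this XML for you.

```perl
use strict;
use warnings;

# Hypothetical method parameters, for illustration only.
my %params = (latitude => '38.99', longitude => '-77.01');

# One XML element per parameter, in a fixed (sorted) order.
my $args = join '', map { "      <$_>$params{$_}</$_>\n" } sort keys %params;

# Envelope wraps an (optional) Header and the Body carrying the call.
my $envelope = <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header/>
  <soap:Body>
    <m:GetForecast xmlns:m="urn:example:weather">
$args    </m:GetForecast>
  </soap:Body>
</soap:Envelope>
XML

print $envelope;
```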
1. Create Socket
2. Connect to the server using its address and port number