Unit-I Part-II - Introduction To PERL
UNIT-I PART-II
Introduction to Scripting and PERL
Introduction to PERL
Names and Values
Variables
Scalar Expressions
Control Structures
Collections of Data: Working with Lists/Arrays and Hashes
Strings, Patterns and Regular Expressions
Subroutines
Introducing PERL
Mission critical
Used for mission critical projects in the public and private sectors.
Object-oriented, procedural and functional
Supports object-oriented, procedural and functional programming.
Easily extendible
There are over 25,000 open source modules available from the
Comprehensive Perl Archive Network (CPAN).
Text manipulation
Perl includes powerful tools for processing text that make it ideal
for working with HTML, XML, and all other mark-up and natural
languages.
PERL Features
Unicode support
Supports Unicode version 6 (from Perl 5.14).
Database integration
Perl's database interface (DBI) supports third-party databases
including Oracle, Sybase, Postgres, MySQL and many others.
C/C++ library interface
Perl interfaces with external C/C++ libraries through XS or SWIG (Simplified
Wrapper and Interface Generator).
Embeddable
The Perl interpreter can be embedded into other systems such as web servers
and database servers.
Open Source
Perl is Open Source software, licensed under its Artistic License, or the GNU
General Public License (GPL).
PERL – Installation & Running
use strict;
It tells the interpreter to stop with an error if the program uses unsafe constructs
such as undeclared variables, symbolic references or barewords.
Eg: Under 'use strict;' every variable MUST be declared before use, typically
with the keyword 'my'.
Eg: my $marks=56;
use warnings;
to tell the interpreter to display warning messages about questionable code
use 5.10.0;
to tell the interpreter to require Perl 5.10.0 or later and make use of its features
Eg: say() function
Use q for single-quoted strings and qq for double-quoted strings.
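A minimal sketch putting these pragmas together. The variable names are illustrative only; the commented-out line shows the kind of statement that 'use strict' rejects.

```perl
use strict;      # stop with an error on undeclared variables and other unsafe constructs
use warnings;    # print warnings about questionable code
use 5.10.0;      # require Perl 5.10.0 or later and enable say()

my $marks = 56;  # declared with my, as 'use strict' requires
# $total = 10;   # would not compile here: $total has not been declared

my $report = "Marks: $marks";
say $report;     # say is print with an automatic newline: Marks: 56
```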
Names and Values in Perl
Scalar data :
Strings and numbers
In common with many scripting languages, Perl recognizes just two kinds
of scalar data: strings and numbers.
There is no distinction between integers and real numbers as different types:
a number is a number. Internally, numbers are stored as integers if possible,
and otherwise as double-length floating point numbers in the native format.
Strings are stored as sequences of bytes of unlimited length.
Perl is a dynamically typed language: the system keeps track of whether a
variable contains a numeric value or a string value, and the user doesn't have
to worry about the difference between strings and numbers, since
conversions between the two kinds of data are done automatically as
required by the context in which they are used.
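A small sketch of context-driven conversion (variable names are illustrative): the same string is treated as a number by + and as text by the . operator, and / never performs integer division.

```perl
use strict;
use warnings;

my $count  = "42";            # a string that looks like a number
my $sum    = $count + 8;      # numeric context: "42" becomes 42, so $sum is 50
my $joined = $count . $count; # string context: concatenation gives "4242"
my $ratio  = 7 / 2;           # numbers are not split into int/real types: 3.5

print "$sum $joined $ratio\n";   # prints: 50 4242 3.5
```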
Names and Values in Perl
Numeric constants
Numeric constants (number literals) can be written in a variety of ways,
including scientific notation, octal and hexadecimal.
Although Perl tries to emulate natural human communication, the common
practice of using commas or spaces to break up a large constant into meaningful
digit groups cannot be used, since the comma has a syntactic significance in
Perl.
Instead, underscores can be included in a number literal to improve legibility.
Examples:
123, 122.45, 122.45e-5, 122.45E-5
4929712198024, 4929_712_198_024 (the same number, grouped with underscores)
0377 (octal)
0x3fff (hex)
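A quick check of the literal forms listed above; each pair of variables should hold the same value.

```perl
use strict;
use warnings;

my $plain      = 4929712198024;
my $grouped    = 4929_712_198_024;   # underscores are ignored by the parser
my $octal      = 0377;               # leading 0 means octal: 255
my $hex        = 0x3fff;             # leading 0x means hexadecimal: 16383
my $scientific = 122.45e-5;          # e and E are interchangeable

print "$octal $hex\n";               # prints: 255 16383
```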
Names and Values in Perl
String Constants
String constants (string literals) can be enclosed in single or double
quotes. The string is terminated by the next occurrence of the quote
(single or double) which started it, so a single-quoted string can include
double quotes and vice versa. Variables and escape sequences such as \n are
interpolated in double-quoted strings but not in single-quoted ones.
The q (quote) and qq (double quote) operators allow you to use any character as a quoting
character. Thus
q/any string/ or q(any string) are the same as 'any string'
and
qq/any string/ or qq(any string) are the same as "any string".
The character following the q or qq operator is the opening quote
character, and the next occurrence of that character is treated as the
closing quote character. If the opening character is a bracket such as ( or {,
the matching closing bracket ends the string.
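A short sketch contrasting q and qq; the variable names are illustrative.

```perl
use strict;
use warnings;

my $food = "Burger";
my $s1 = q(Beef $food);      # like 'Beef $food': no interpolation
my $s2 = qq{Beef $food};     # like "Beef $food": $food is interpolated
my $s3 = q/It's "quoted"/;   # both kinds of quote appear without escaping

print "$s1\n$s2\n$s3\n";
```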
Variables and Assignment
Assignment:
Perl uses '=' as the assignment operator. It is important to note that
an assignment statement returns a value, the value assigned. This permits statements
like
$b = 4 + ($a = 3);
which assigns the value 3 to $a and the value 7 to $b.
A useful device often used in assignments is to interpolate the value
of a scalar variable into a double-quoted string. For example, after the assignments
$a = "Burger";
$b = "Beef $a";
$c = "Turkey $a";
the value of $b is "Beef Burger" and the value of $c is "Turkey
Burger", in both cases with a space in the middle.
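The same two ideas as a runnable sketch. It uses $x and $y instead of $a and $b, because $a and $b are special variables reserved for sort comparisons and are best not declared with my.

```perl
use strict;
use warnings;

my ($x, $y);
$y = 4 + ($x = 3);             # the inner assignment yields 3, so $y becomes 7

my $food   = "Burger";
my $beef   = "Beef $food";     # interpolation gives "Beef Burger"
my $turkey = "Turkey $food";   # and "Turkey Burger"

print "$x $y $beef $turkey\n";
```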
Variables and Assignment
String operators:
Perl provides very basic operators on strings: most string
processing is done using built-in functions and regular
expressions.
Perl uses the period (.) as the concatenation operator.
The other string operator is x, which is used to replicate strings,
e.g. $a = "Hello" x 3; sets $a to "HelloHelloHello".
The capability of combining an operator with assignment is
extended to string operations,
e.g. $foo .= " "; appends a space to $foo.
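A sketch of the string operators and their assignment forms (variable names are illustrative).

```perl
use strict;
use warnings;

my $greeting = "Hello" . ", " . "World";   # . concatenates: "Hello, World"
my $echo     = "Hello" x 3;                # x replicates: "HelloHelloHello"
my $rule     = "-" x 10;                   # handy for separator lines

my $foo = "abc";
$foo .= " ";     # same as $foo = $foo . " "
$foo x= 2;       # same as $foo = $foo x 2, giving "abc abc "
```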
Scalar Expressions
Comparison operators:
The value of a comparison is returned as numeric 1 if true and an
empty string ("") if false, in accordance with the convention described
earlier.
Two families of comparison operators are provided, one for numbers (==
< > etc.) and one for strings (eq lt gt etc.): the operator used determines
the context, and Perl converts the operands as required to match the operator.
This duality is necessary because a comparison between
strings made up entirely of numerical digits should apply the usual
rules for sorting strings, using ASCII as the collating sequence, and this
may not give the same result as a numerical comparison.
E.g. ('5' < '10') returns true but ('5' lt '10') returns false, since '10'
comes before '5' in the canonical sort order for ASCII strings.
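A sketch of the two operator families. It also uses <=> and cmp, the three-way numeric and string comparison operators that sort relies on, to show how the same values order differently.

```perl
use strict;
use warnings;

my $numeric = ('5' < '10')  ? 1 : 0;   # numeric comparison: true
my $string  = ('5' lt '10') ? 1 : 0;   # string comparison: "1" sorts before "5", so false

my @by_number = sort { $a <=> $b } (10, 9, 100);   # (9, 10, 100)
my @by_string = sort { $a cmp $b } (10, 9, 100);   # (10, 100, 9)

my $false = (1 > 2);                   # a false comparison yields the empty string
print "numeric=$numeric string=$string false='$false'\n";
```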
Scalar Expressions
Bitwise operators:
The unary tilde (~) applied to a numerical argument performs bitwise
negation on its operand, the one's complement.
If applied to a string operand it complements all the bits in the string,
an effective way of inverting a lot of bits.
The bitwise operators & (and), | (or) and ^ (exclusive or) have a rather
complicated definition.
If either operand is a number or a variable that has previously been
used as a number, both operands are converted to integers if need be,
and the bitwise operation takes place between the integers. If both operands
are strings, and if variables have never been used as numbers, Perl
performs the bitwise operation between corresponding bits in the two
strings. For | and ^ the shorter string is padded with zero bits as required;
for & the result is truncated to the length of the shorter string.
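A sketch of both behaviours under Perl's default rules (no 'bitwise' feature enabled). The string case relies on ASCII: 'A' is 0x41, a space is 0x20, and 0x41 | 0x20 is 0x61, which is 'a'.

```perl
use strict;
use warnings;

# Numeric operands: the operation works on the integer values.
my $and = 12 & 10;       # 1100 & 1010 = 1000 -> 8
my $or  = 12 | 10;       # 1100 | 1010 = 1110 -> 14
my $xor = 12 ^ 10;       # 1100 ^ 1010 = 0110 -> 6
my $low = ~5 & 0xFF;     # one's complement of 5, keeping the low 8 bits -> 250

# String operands (never used as numbers): bit by bit on the characters.
my $lower = "ABC" | "   ";   # setting bit 0x20 lowercases ASCII letters: "abc"
```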
Scalar Expressions
Conditional expressions
A conditional expression is one whose value is chosen from two
alternatives at run-time depending on the outcome of a test. The
syntax is borrowed from C:
test ? true_exp : false_exp
The first expression is evaluated as a Boolean value: if it returns
true, the whole expression is replaced by true_exp; otherwise it is
replaced by false_exp,
e.g. $a = ($a > $b) ? 0 : $a;
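Two common uses of the conditional expression, sketched with illustrative values.

```perl
use strict;
use warnings;

my ($x, $y) = (17, 42);
my $max    = ($x > $y) ? $x : $y;              # the larger value: 42
my $parity = ($x % 2 == 0) ? "even" : "odd";   # 17 is odd
```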
Control Structures
Blocks
The concept of a block is very important in Perl. A block is just a
sequence of one or more statements enclosed in braces (curly brackets),
e.g.
{$positive = 1;
$negative = -1}
The last statement in the block is terminated by the closing brace, so its
semicolon may be omitted.
The control structures in Perl use conditions to control the evaluation of
one or more blocks. In fact, the body of a subroutine is also a block.
Blocks can appear almost anywhere that a statement can appear:
such a block is called a bare block, and is often used in the context
of a clever (or dirty) trick.
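One everyday use of a bare block is to limit the scope of a my variable. A minimal sketch (the name $count is illustrative):

```perl
use strict;
use warnings;

my $count = 1;
{
    my $count = 99;   # a new variable, visible only inside this bare block
    $count++;         # changes the inner $count (now 100)
}
print "$count\n";     # the outer $count is untouched: prints 1
```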
Control Structures
Conditions
A condition is just a Perl expression which is evaluated in a Boolean
context: if it evaluates to zero, the string "0", the empty string or undef, the
condition is treated as false; otherwise it is treated as true, in accord with the rules
already given.
Conditions usually make use of the relational operators, and several
simple conditions can be combined into a complex condition using
the logical operators described in previous slides, e.g.
$total > 50
$total > 50 and $total <100
A condition can be negated using the ! operator, e.g.
!($total >50 and $total < 100)
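A sketch showing how a few typical values behave in Boolean context, followed by the compound and negated conditions from the text. Note that the string "0.0" is true: only "0" and "" are false strings.

```perl
use strict;
use warnings;

my @values = (0, 1, "", "0", "0.0", "abc", undef);
my @truth  = map { $_ ? "true" : "false" } @values;
print join(", ", @truth), "\n";   # false, true, false, false, true, true, false

my $total    = 75;
my $in_range = ($total > 50 and $total < 100) ? 1 : 0;    # 1
my $outside  = !($total > 50 and $total < 100) ? 1 : 0;   # 0
```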
Control Structures
Conditional execution
If-then-else statements :
If-statement e.g.
if ($total > 0){
print "$total\n"}
If-else-statement e.g.
if ($total >0) {
print "$total\n" }
else {
print "bad total!\n"}
The condition must be enclosed in parentheses and the block in braces: both are
mandatory, even when the block contains a single statement.
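Further tests can be chained with elsif (spelled without an 'e' in the middle). A sketch using a hypothetical helper, describe_total, which is not part of the original examples:

```perl
use strict;
use warnings;

# Hypothetical helper: classify a total with if / elsif / else.
sub describe_total {
    my ($total) = @_;
    if ($total > 100) {
        return "large";
    } elsif ($total > 0) {
        return "positive";
    } else {
        return "bad total!";
    }
}

print describe_total(42), "\n";   # prints: positive
```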
Control Structures
Statement Qualifiers
Finally, Perl adds a bit more. As we have seen in the examples
earlier, a single statement (but not a block) can be followed by a
conditional modifier, as in the English 'I'll come if it's fine'. For example
print "OK\n" if $volts >= 1.5;
print "Weak\n" if $volts >= 1.2 and
$volts < 1.5;
print "Replace\n" if $volts < 1.2;
This is readable and self-documenting: compare the following conditional
expression, which has the same effect:
print (($volts >= 1.5) ? "OK\n" : (($volts >= 1.2) ? "Weak\n" :
"Replace\n"));
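The battery example as a reusable sketch. battery_status is a hypothetical name; the thresholds come from the example above. The last lines also use unless, the negated form of the if modifier.

```perl
use strict;
use warnings;

# Hypothetical helper built from statement modifiers.
sub battery_status {
    my ($volts) = @_;
    return "OK"   if $volts >= 1.5;
    return "Weak" if $volts >= 1.2 and $volts < 1.5;
    return "Replace";
}

my $warning = "";
$warning = "check battery" unless battery_status(1.3) eq "OK";
```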
Control Structures
Repetition:
Perl provides a variety of repetition mechanisms to suit all
tastes, including both ‘testing’ loops and 'counting' loops.
‘Testing’ loops
while ($a != $b){
if ($a > $b) {
$a = $a - $b
}else {
$b=$b-$a
}
}
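The testing loop above is the subtraction form of Euclid's algorithm: when it stops, $a and $b both hold the greatest common divisor. A sketch wrapping it in subroutines (gcd and gcd_until are hypothetical names), including an equivalent until loop, which repeats while its test is false:

```perl
use strict;
use warnings;

# The while loop from the text; assumes two positive integers.
sub gcd {
    my ($x, $y) = @_;
    while ($x != $y) {
        if ($x > $y) {
            $x = $x - $y;
        } else {
            $y = $y - $x;
        }
    }
    return $x;
}

# The same test written with until: loop until the values are equal.
sub gcd_until {
    my ($x, $y) = @_;
    until ($x == $y) {
        ($x, $y) = ($x > $y) ? ($x - $y, $y) : ($x, $y - $x);
    }
    return $x;
}
```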
Control Structures
Counting loops
Counting loops use the same syntax as C:
for ($i = 1; $i <= 10; $i++){
$i_square = $i*$i; $i_cube= $i**3;
print "$i\t$i_square\t$i_cube\n";
}
There is also a foreach construct, which takes an explicit list of
values for the controlled variable.
foreach $i (1 .. 10) {
$i_square = $i*$i; $i_cube =$i**3;
print "$i\t$i_square\t$i_cube\n";
}
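In a foreach loop the controlled variable is an alias for each list element, so changing it changes the array. When no variable is named, $_ is used. A sketch with illustrative data:

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);
foreach my $p (@prices) {
    $p *= 2;               # $p is an alias: this modifies the array element itself
}
# @prices is now (20, 40, 60)

my @squares;
push @squares, $_ * $_ for 1 .. 5;   # $_ is the default loop variable
```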
Control Structures
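while (<STDIN>) {
last if /quit/;
……….
}
OUTER: while ( ... ){
INNER: while ( ... ){
if ( ... ) {last OUTER;}
if ( ... ) {next INNER;}
}
}
The last command exits a loop immediately and next skips to the next iteration;
a label such as OUTER or INNER lets either command refer to an enclosing loop
rather than the innermost one.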
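A concrete, hypothetical use of labelled loops: find the first pair (i, j) with i < j whose product is 24, leaving both loops as soon as it is found.

```perl
use strict;
use warnings;

my @found;
OUTER: for my $i (1 .. 10) {
    INNER: for my $j (1 .. 10) {
        next INNER if $j <= $i;     # skip to the next j
        if ($i * $j == 24) {
            @found = ($i, $j);
            last OUTER;             # leave both loops at once
        }
    }
}
print "@found\n";   # prints: 3 8
```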
Control Structures
The last and redo commands can be used in a bare block, as well
as in a looping context. For example the following fragment will
read lines from standard input, throwing away blank lines until
the first non-blank line is reached:
{$line =<STDIN>;
redo until $line =~ /\S/}
The expression $line =~ /\S/ evaluates to true if the line contains
any characters that are not whitespace characters.
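The fragment above loops forever if input ends before a non-blank line appears. A sketch of the same redo technique with an end-of-file check added. first_nonblank is a hypothetical helper, and an in-memory filehandle stands in for STDIN so the example is self-contained.

```perl
use strict;
use warnings;

# Return the first non-blank line from $fh, or nothing at end of file.
sub first_nonblank {
    my ($fh) = @_;
    my $line;
    {
        $line = <$fh>;
        return unless defined $line;   # end of file: give up
        redo unless $line =~ /\S/;     # blank line: restart the bare block
    }
    return $line;
}

my $text = "\n   \nfirst real line\nsecond\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $first = first_nonblank($fh);
print $first;   # prints: first real line
```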
Built in Functions
Functions like print which take a list of arguments are called list
operators: functions that take a single argument are called named
unary operators. Both have rather unusual precedence rules that
nevertheless lead to natural and 'obvious' behaviour, as follows
If the token following the function name (operator) is an opening parenthesis,
the operator and its arguments have the highest precedence: if it looks like a
function call, it behaves like a function call. For example:
$n =rand($m*2) + 1;
print("Total is $total\n");
Built in Functions
A named unary operator has lower precedence than arithmetic operations (but
higher precedence than logical operators), thus
$n = rand $m*2 + 1;
has the same effect as
$n = rand($m*2 + 1);
In the absence of the opening bracket, a list operator has very high precedence to
the left and very low precedence to the right. Thus in
print "Hello", "World!", "\n";
The commas bind tighter than the print, giving the desired effect, but if a list
operator appears as a component of a list, e.g.
("foo", "bar", substr $line, 10, 5)
The commas on the left of substr are evaluated after it, but the commas on the
right are evaluated before, giving the interpretation as
("foo", "bar", substr($line, 10, 5))
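A sketch of both precedence rules, with illustrative data. length is a named unary operator, so without parentheses it takes the whole concatenation as its argument; substr is a list operator, so it absorbs the rest of the list.

```perl
use strict;
use warnings;

my $word = "abc";
my $len_joined = length $word . "de";    # length("abcde") -> 5
my $len_then   = length($word) . "de";   # "3" . "de" -> "3de"

my $line = "0123456789ABCDEFGHIJ";
my @list = ("foo", "bar", substr $line, 10, 5);   # ("foo", "bar", "ABCDE")
```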
Collections of Data
List Magics
Lists are often used in connection with arrays and hashes.
A list containing only variables can appear as the target of an
assignment and/or as the value to be assigned.
This makes it possible to write simultaneous assignments, e.g.
($a, $b, $c) =(1, 2, 3) ;
and to perform swapping or permutation without using a temporary
variable, e.g.
($a, $b) = ($b, $a);
($b, $c, $a) = ($a, $b, $c);
Both of these are natural forms of expression that can be a great aid to
readability in Perl scripts.
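A sketch of simultaneous assignment, swapping and rotation (names are illustrative). The right-hand list is evaluated completely before anything is assigned. An array on the left soaks up all remaining values.

```perl
use strict;
use warnings;

my ($x, $y, $z) = (1, 2, 3);    # simultaneous assignment
($x, $y) = ($y, $x);            # swap: $x = 2, $y = 1
($y, $z, $x) = ($x, $y, $z);    # rotate: $y = 2, $z = 1, $x = 3

my ($first, @rest) = (10, 20, 30);   # $first = 10, @rest = (20, 30)
```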
Collections of Data
Arrays
An array is an ordered collection of data whose components are
identified by an ordinal index: it is usually the value of an array
variable. The name of such a variable always starts with an @,
e.g. @days_of_week, denoting a separate namespace and
establishing a list context.
The association between arrays and lists is a close one: an
array stores a collection, and a list is a collection, so it is
natural to assign a list to an array, e.g.
@rainfall = (1.2, 0.4, 0.3, 0.1, 0, 0, 0);
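Common operations on the @rainfall array, as a sketch. A single element is a scalar, so it is written with $ and an index in square brackets.

```perl
use strict;
use warnings;

my @rainfall = (1.2, 0.4, 0.3, 0.1, 0, 0, 0);
my $first   = $rainfall[0];      # 1.2
my $last    = $rainfall[-1];     # negative indexes count from the end: 0
my $count   = @rainfall;         # array in scalar context gives its length: 7
my $top     = $#rainfall;        # index of the last element: 6
my @weekend = @rainfall[5, 6];   # a slice returns a list of elements
push @rainfall, 2.5;             # append; the array now has 8 elements
```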
Collections of Data
Hashes
In the world of scripting languages it is common to find
associative arrays (sometimes called content-addressable
arrays).
An associative array is one in which each element has two
components, a key and a value, the element being 'indexed' by its key
(just like a table).
Such arrays are usually stored in a hash table to facilitate efficient
retrieval, and for this reason Perl uses the term hash for an
associative array.
Hashes are a very natural way of storing data, and are widely
used in Perl, probably more than conventional arrays.
Collections of Data
$somehash{aaa} = 123;
$somehash{234} = "bbb";
$somehash{"$a"} = 0;
%anotherhash = %somehash;
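A runnable version of these hash assignments, with a declared variable ($a_key, illustrative) in place of $a. Copying a hash copies its key/value pairs, so changing the copy leaves the original alone.

```perl
use strict;
use warnings;

my %somehash;
$somehash{aaa} = 123;       # a simple word as key is treated as a string
$somehash{234} = "bbb";     # numeric keys are stored as strings too
my $a_key = "xyz";
$somehash{"$a_key"} = 0;    # the key can come from a variable

my %anotherhash = %somehash;   # copies all key/value pairs
$anotherhash{aaa} = 999;       # the original %somehash is unchanged
```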
Working with Arrays and Lists
The general form of map is
map expression, list;
or
map BLOCK list;
The function evaluates the expression or block for each
element of list, temporarily setting $_ equal to the list
item and returns a list containing the values of each such
evaluation. (Remember that the value returned by a block
is the value of the last expression evaluated in the block.)
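Both forms of map, sketched with illustrative data.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3, 4);
my @squares = map { $_ * $_ } @numbers;   # BLOCK form: (1, 4, 9, 16)
my @labels  = map "item$_", @numbers;     # expression form: ("item1" .. "item4")
```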
Working with Arrays and Lists
grep
In UNIX, the command grep pattern file
prints (i.e. sends to STDOUT) all lines of file that contain an
instance of pattern. In its simplest form, the Perl grep function takes a
pattern and a list and returns a new list containing all the elements of
the original list that match the pattern.
For example, given
@things = qw(car bus cardigan jumper carrot);
then
grep /car/, @things;
returns the list
(car, cardigan, carrot)
Working with Arrays and Lists
In fact the Perl grep is much more powerful than this. Its
general form is
grep expression, list or grep BLOCK list.
Like map, the function evaluates the expression or block
for each element in the list, temporarily setting $_ to that
value: it returns a list of those elements for which the
evaluation returns true.
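Both forms of grep, plus its scalar-context behaviour (it returns the number of matching elements), as a sketch.

```perl
use strict;
use warnings;

my @things    = qw(car bus cardigan jumper carrot);
my @car_words = grep /car/, @things;                # (car, cardigan, carrot)
my @short     = grep { length($_) <= 3 } @things;   # BLOCK form: (car, bus)
my $how_many  = grep /car/, @things;                # scalar context: 3
```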
Working with Hashes
Creating Hashes:
A hash is a set of key/value pairs. Hash variables are preceded by a percent
(%) sign. To refer to a single element of a hash, you will use the hash variable
name preceded by a "$" sign and followed by the "key" associated with the
value in curly brackets.
A hash is filled by assigning a list of key-value pairs to it:
%foo=(key1,value1,key2,value2,….);
=> can be used instead of , (it also quotes a simple word on its left):
%foo=(key1=>value1,key2=>value2,…..);
Example: %foo=('banana'=>'yellow','apple'=>'green'…..);
%data3 = ('John Paul', 45, 'Lisa', 30, 'Kumar', 40);
%data4 = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);
%data5 = (-JohnPaul => 45, -Lisa => 30, -Kumar => 40);
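A quick check of the three initializations above: the comma and => forms build the same hash, and in %data5 the keys keep their leading minus sign.

```perl
use strict;
use warnings;

my %data3 = ('John Paul', 45, 'Lisa', 30, 'Kumar', 40);
my %data4 = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);
my %data5 = (-JohnPaul => 45, -Lisa => 30, -Kumar => 40);

my $same = ($data3{'Lisa'} == $data4{'Lisa'});   # same data either way
my $age  = $data5{'-Lisa'};                      # the key is the string "-Lisa"
```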
Working with Hashes
Manipulating Hashes:
A hash can be unwound into a list containing the key-value pairs by
assigning it to an array e.g. @list=%foo;
Perl provides a number of built-in functions to facilitate
manipulation of hashes.
@keylist=keys %data;
It returns a list of keys of the elements in the hash, and
@valuelist= values %data;
Returns a list of the values of elements in the hash.
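A sketch of unwinding and inspecting a hash. Hash order is unpredictable, so the keys and values are sorted before display; each walks through the pairs one at a time.

```perl
use strict;
use warnings;

my %data = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);
my @list      = %data;                              # 6 elements: key, value, ...
my @keylist   = sort keys %data;                    # ("John Paul", "Kumar", "Lisa")
my @valuelist = sort { $a <=> $b } values %data;    # (30, 40, 45)

while (my ($name, $age) = each %data) {
    print "$name is $age\n";
}
```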
Working with Hashes
delete $data{$key} removes the element whose key matches $key from the hash
%data, and
exists $data{$key} returns true if the hash %data contains an element whose key
matches $key.
exists($h{'key'}) && do { statements }; can be used instead of an if statement.
%data10 = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);
if( exists($data10{'Lisa'} ) ) {
print "Lisa is $data10{'Lisa'} years old\n";
} else {
print "I don't know age of Lisa\n";
}
@keys = keys %data10;
$size = @keys;
print "1 - Hash size: is $size\n";
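A sketch of delete and exists on the same data. delete returns the value it removed, and keys in scalar context gives the number of entries directly.

```perl
use strict;
use warnings;

my %data10 = ('John Paul' => 45, 'Lisa' => 30, 'Kumar' => 40);
my $removed   = delete $data10{'Kumar'};            # 40
my $has_kumar = exists $data10{'Kumar'} ? 1 : 0;    # 0 after the delete

my $message = "unknown";
exists($data10{'Lisa'}) && do { $message = "Lisa is $data10{'Lisa'}" };

my $size = keys %data10;   # 2
```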
Working with Hashes
Inverted Hash:
A hash would be a natural structure for a phonebook application,
with the name as key and the associated phone number as value.
Hashes map keys to values efficiently, and exists can be used to
check that there is an entry with a particular key. Suppose,
however, that we want to do the reverse: to find out if a hash
contains an entry with a particular value, and to map it onto the
associated key. Reversing the hash does this, provided every value is unique:
if two keys share a value, only one of them survives in the inverted hash.
Working with Hashes
%phones=('asd',9876,'wer',4556778888,'yul',34556676868,'rth',556767688);
%by_number=reverse %phones;
$target=exists($by_number{9876})?$by_number{9876}:"NOT FOUND";
print("$target");
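The phonebook lookup wrapped in a hypothetical helper, name_for, so it can be tried with several numbers.

```perl
use strict;
use warnings;

my %phones = ('asd' => 9876, 'wer' => 4556778888,
              'yul' => 34556676868, 'rth' => 556767688);
my %by_number = reverse %phones;   # numbers become keys, names become values

# Hypothetical helper: look up a name by number.
sub name_for {
    my ($number) = @_;
    return exists $by_number{$number} ? $by_number{$number} : "NOT FOUND";
}

print name_for(9876), "\n";   # prints: asd
```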
Strings, Patterns and Regular expressions
Repetition counts:
In addition to the quantifiers *, ? and +, explicit repetition counts can
be added to a component of a RE,
e.g. /(wet[ ]){2}wet/ matches 'wet wet wet'.
{n} must occur exactly n times
{n,} must occur at least n times
{n,m} must occur at least n times but no more than m times.
IP address pattern can be written as
/([0-9]{1,3}\.){3}[0-9]{1,3}/
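A sketch exercising each repetition count. The IP pattern is anchored here with ^ and $ so that the whole string must match; note that it only counts digits, so an impossible address such as 999.1.1.1 still passes.

```perl
use strict;
use warnings;

my $wet       = "wet wet wet" =~ /(wet[ ]){2}wet/ ? 1 : 0;   # 1
my $exactly   = "2024"  =~ /^[0-9]{4}$/ ? 1 : 0;             # exactly 4 digits: 1
my $at_least  = "aaaa"  =~ /^a{2,}$/    ? 1 : 0;             # 2 or more: 1
my $too_many  = "aaaaa" =~ /^a{2,4}$/   ? 1 : 0;             # 2 to 4 only: 0

my @ips = grep { /^([0-9]{1,3}\.){3}[0-9]{1,3}$/ } ("192.168.0.1", "10.0.0", "999.1.1.1");
```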
Strings, Patterns and Regular expressions
Anchors:
The symbols ^ and $ 'anchor' the match at the start or end of the target string. Other
anchors can be specified as \b (word boundary) and \B (not a word boundary).
If the target strings are john and johnathon, used as space-separated words,
/\bjohn/ will match both target strings,
/\bjohn\b/ will only match john, and
/\bjohn\B/ will match johnathon.
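The three patterns above, applied to both words with grep, plus ^ and $ on a longer string.

```perl
use strict;
use warnings;

my @words = ("john", "johnathon");
my @any_john   = grep { /\bjohn/ }   @words;   # both words
my @whole_john = grep { /\bjohn\b/ } @words;   # only "john"
my @john_pref  = grep { /\bjohn\B/ } @words;   # only "johnathon"

my $starts = "john smith" =~ /^john/ ? 1 : 0;  # anchored at the start: 1
my $ends   = "john smith" =~ /john$/ ? 1 : 0;  # anchored at the end: 0
```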
Back References:
Round brackets serve another purpose besides grouping: they define a series of partial
matches that are 'remembered' for use in subsequent processing or in the regular
expression itself. Thus, in a regular expression, \1, \2 etc. denote the substring that
actually matched the first, second etc. sub-pattern, the numbering being determined by
the sequence of opening brackets. After the match, the same substrings are available
as $1, $2 etc.
Note: (?: ... ) is used to define a grouping with round brackets without remembering
the sub-string match.
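A sketch of back references (finding a doubled word with \1) and of a non-capturing group, which is skipped when the captured substrings are returned.

```perl
use strict;
use warnings;

my $doubled = "";
if ("this is the the end" =~ /\b(\w+) \1\b/) {
    $doubled = $1;   # the text matched by the first group: "the"
}

# (?: ... ) groups the month without capturing it: only year and day are returned.
my @parts = ("2024-01-15" =~ /(\d{4})-(?:\d{2})-(\d{2})/);   # ("2024", "15")
```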
Strings, Patterns and Regular expressions
Pattern Matching:
A simple pattern matching operation appears in the line of
code print if /Shazzam!/;
We now recognize /Shazzam!/ as a pattern containing a regular
expression. Perl compares the pattern with the value of the
anonymous variable, $_, and returns true if the pattern given
matches a sub-string in that value, giving the desired effect.
/Shazzam!/ is shorthand for the full form of the match operation, m/Shazzam!/.
In a scalar context, as here, this expression returns a Boolean
value recording the success or failure of the match.
If the m operator is present we can use any character as the pattern
delimiter, e.g. print if m#Shazzam!#;
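A sketch of matching against $_ in a loop, with an alternative delimiter, and against another string via =~ (data is illustrative).

```perl
use strict;
use warnings;

my @lines = ("Shazzam! said the wizard", "nothing here", "Shazzam!");
my @hits;
for (@lines) {                       # each element is aliased to $_ in turn
    push @hits, $_ if /Shazzam!/;    # short form, matches against $_
}
my $count = grep { m{Shazzam!} } @lines;        # m with {} delimiters
my $other = "no magic" =~ m#Shazzam!# ? 1 : 0;  # =~ matches a different string
```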
Strings, Patterns and Regular expressions
The operation of the pattern match operator can be modified by adding trailing qualifiers,
thus:
m/ /i - Ignore case when pattern matching.
m/ /g - Find all occurrences. In a list context it returns a list of all the sub-strings
matched by all the bracketed sections of the regular expression. In a scalar context it
iterates through the target string, returning true whenever it finds a match and
false when it runs out of matches.
m/ /m - Treat a target string containing newline characters as multiple lines. In this
case, the anchors ^ and $ match the start and end of each line; \A and \Z anchor to the start
and end of the string, respectively.
m/ /s - Treat a target string containing newline characters as a single string, i.e. the dot
matches any character including newline.
m/ /x - Ignore whitespace characters in the regular expression unless they occur in a
character class or are escaped with a backslash.
m/ /o - Compile the expression once only.
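A sketch of the i, g, m, s and x modifiers on small illustrative strings.

```perl
use strict;
use warnings;

my $ignore = "SHAZZAM!" =~ /shazzam/i ? 1 : 0;          # /i: 1
my @nums   = ("a1b22c333" =~ /(\d+)/g);                 # /g, list context: (1, 22, 333)

my $found  = 0;
my $target = "x1y2z3";
$found++ while $target =~ /\d/g;                        # /g, scalar context: 3 matches

my $text   = "one\ntwo\nthree";
my @starts = ($text =~ /^(\w+)/mg);                     # /m: ("one", "two", "three")
my $plain  = "a\nb" =~ /a.b/  ? 1 : 0;                  # . does not match \n: 0
my $single = "a\nb" =~ /a.b/s ? 1 : 0;                  # /s: . matches \n: 1
my ($year, $month) = ("2024-07" =~ / (\d{4}) - (\d{2}) /x);   # /x: spaces ignored
```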
Strings, Patterns and Regular expressions
Substitution:
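s/pattern/subst/ - The substitution operator checks for a
match between the pattern and the value held in $_, and if a
match is found the matching sub-string in $_ is replaced by
the string subst. For example, the following loop moves a four-digit
number (such as a year) from the start of each input line to its end:
while (<STDIN>) {
s/(^\d{4}[ ])(.*$)/$2 $1/;
print;
}
To operate on a variable other than $_, bind it with =~:
$a =~ s/pattern/subst/;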
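The year-moving substitution applied to an array instead of STDIN, plus a substitution on a copy of a variable (data is illustrative). Lines without a leading four-digit number are left unchanged.

```perl
use strict;
use warnings;

my @lines = ("2024 annual report\n", "no year here\n");
for (@lines) {
    s/(^\d{4}[ ])(.*$)/$2 $1/;   # $_ is aliased to each element, so the array changes
}
# $lines[0] is now "annual report 2024 \n"

my $title = "Beef Burger";
(my $copy = $title) =~ s/Beef/Turkey/;   # copy first, then substitute on the copy
```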
Strings, Patterns and Regular expressions
Substitution modifiers:
The i, g, m, o, s and x modifiers work with the substitution operator
in the same way as they do for the match operator.
In addition, the substitution operator has an additional, very
powerful modifier, e.
This modifier signifies that the substitution string is an expression
that is to be evaluated at run-time if the pattern match is successful,
to generate a new substitution string dynamically.
For example, if the target string contains one or more sequences of
decimal digits, the following substitution operation will treat each
digit string as an integer and add 1 to it ($& holds the text that matched):
s/\d+/$&+1/eg;
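The /e modifier in action on illustrative strings; captured groups such as $1 can be used in the expression as well as $&.

```perl
use strict;
use warnings;

my $s = "a9b99c0";
$s =~ s/\d+/$&+1/eg;            # each digit run becomes its value plus 1: "a10b100c1"

my $prices = "cost 5 and 12";
$prices =~ s/(\d+)/$1 * 2/eg;   # "cost 10 and 24"
```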
Strings, Patterns and Regular expressions
Character translation: tr
The syntax of the tr operator is
tr/original/replacement/
$var =~ tr/original/replacement/
As should be obvious by now, the first form works on $_. In its simplest form,
original and replacement are sequences of characters of the same length.
The tr operator scans its target string from left to right, replacing occurrences
of characters in original by the corresponding character in replacement:
characters not included in original are left unchanged. Thus, for example
$line =~ tr/A-Z/a-z/;
forces the string in $line into lower case, leaving non-alphabetic characters
unchanged. If replacement is shorter than original, its last character is
replicated as many times as is necessary to fill it out.
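A sketch of tr on illustrative strings. tr also returns the number of characters it matched, so an empty replacement list counts characters without changing the string.

```perl
use strict;
use warnings;

my $line = "Hello World 42";
(my $lower = $line) =~ tr/A-Z/a-z/;       # "hello world 42"
my $vowels = ($line =~ tr/aeiouAEIOU//);  # counts only, $line unchanged: 3

my $short = "abcde";
$short =~ tr/a-e/xy/;                     # last replacement char is repeated: "xyyyy"
```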
Subroutines
Defining subroutines:
sub foobar {
…..
return value;
}
Subroutines
Calling subroutines:
Perl subroutine calls are better captured by the 'textual substitution' model. If foobar is
defined as a subroutine, it can be called without arguments by
&foobar;
or (remember that there's more than one way to do it)
&foobar( );
The two forms differ slightly: &foobar; passes the caller's current @_ on to foobar,
whereas &foobar( ) passes an empty list.
The ampersand identifies foobar explicitly as the name of a subroutine, so this
form of call can be used even if the subroutine definition occurs later in the
script. If the subroutine has been defined earlier, the ampersand can be omitted: it
is common to provide forward declarations of subroutines that are defined later
in a script, so that the ampersand hardly ever needs to be used.
A forward declaration takes the form
sub foobar;
i.e. it is a declaration without a subroutine body.
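A sketch of a forward declaration and of the difference between &name; and &name(). All subroutine names here (greet, count_args, wrapper_amp, wrapper_empty) are hypothetical.

```perl
use strict;
use warnings;

sub greet;                      # forward declaration: no body yet

my $msg = greet "Lisa";         # parentheses may be omitted because greet is declared

sub greet {
    my ($name) = @_;
    return "Hello, $name!";
}

sub count_args    { return scalar @_ }
sub wrapper_amp   { return &count_args; }     # passes wrapper_amp's own @_
sub wrapper_empty { return &count_args(); }   # passes an empty list
```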
Subroutines
sub Hello {
print "Hello, World!\n";
}
# Function call
Hello();
Subroutines
Subroutine arguments:
If a subroutine expects arguments, the call takes the form
&foobar(arg1, arg2);
or, in the likely case that the subroutine is declared, we can omit the
ampersand:
foobar(arg1, arg2);
The Perl subroutine model is based on the premise that subroutines are
typically variadic, i.e. have a variable number of arguments, unlike
conventional languages, in which the number and type of the arguments are defined
as part of the subroutine declaration.
A subroutine expects to find its arguments as a flat list of scalars in the
anonymous array @_: they can be accessed in the body as $_[0], $_[1]
etc.
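A sketch of common ways to read @_ (all subroutine names are hypothetical). shift with no argument inside a subroutine removes the first element of @_. The elements of @_ are aliases for the caller's arguments, so assigning to $_[0] changes the caller's variable.

```perl
use strict;
use warnings;

# Variadic: the first argument is a separator, the rest are items to join.
sub join_with {
    my $sep   = shift;   # removes and returns the first element of @_
    my @items = @_;      # whatever is left
    return join $sep, @items;
}

sub first_two { return ($_[0], $_[1]); }   # direct element access

sub double_in_place { $_[0] *= 2; }        # modifies the caller's variable
my $n = 21;
double_in_place($n);                        # $n is now 42
```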
Subroutines
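# Function definition
sub Average {
   # get total number of arguments passed.
   my $n = scalar(@_);
   my $sum = 0;
   # add up all the arguments
   foreach my $item (@_) {
      $sum += $item;
   }
   my $average = $sum / $n;
   print "Average for the given numbers : $average\n";
}
# Function call
Average(10, 20, 30);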
Subroutines
Variables private to a subroutine are declared with my:
sub s1 {
my $foo; my $bar;
my ($c , $m, $y, $k) ;
}
The my declaration can also be used to initialize the local
variables, e.g.
sub s2 {
my ($red, $green, $blue)=(255, 127, 0);
}
Subroutines
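# Function definition
sub PrintList {
   my @list = @_;
   print "Given list is @list\n";
}
$a = 10;
@b = (1, 2, 3, 4);
# Function call with list parameter: $a and @b arrive as one flat list
PrintList($a, @b);
# Function definition
sub PrintHash {
   my (%hash) = @_;
   foreach my $key (keys %hash) {
      my $value = $hash{$key};
      print "$key : $value\n";
   }
}
%hash = ('Lisa' => 30, 'Kumar' => 40);
# Function call with hash parameter: the hash is flattened into key/value pairs
PrintHash(%hash);
# Function definition
sub Average {
   # get total number of arguments passed.
   my $n = scalar(@_);
   my $sum = 0;
   foreach my $item (@_) {
      $sum += $item;
   }
   my $average = $sum / $n;
   return $average;
}
# Function call
$num = Average(10, 20, 30);
print "Average for the given numbers : $num\n";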