Chapter 14. Perl - The Master Manipulator
Introduction
The following sections describe what Perl is, its variables and operators, and its string
handling functions. The chapter also discusses file handling in perl, along with the lists, arrays
and associative arrays (hashes) that have made perl a popular scripting language. One or
two lines of perl often accomplish what takes many lines in other high-level languages.
Finally, we discuss writing subroutines in perl.
Objectives
perl preliminaries
The chop function
Variables and Operators
String handling functions
Specifying filenames in a command line
$_(Default Variable)
$. (Current Line Number) and .. (The Range Operator)
Lists and Arrays
ARGV[]: Command Line Arguments
foreach: Looping Through a List
split: Splitting into a List or Array
join: Joining a List
dec2bin.pl: Converting a Decimal Number to Binary
grep: Searching an Array for a Pattern
Associative Arrays
Regular Expressions and Substitution
File Handling
Subroutines
Conclusion
1. Perl preliminaries
Perl: Perl stands for Practical Extraction and Report Language. The language was
developed by Larry Wall. Perl is a popular programming language because of its
powerful pattern matching capabilities and its rich library of functions for arrays, lists and
file handling. Perl is also a popular choice for developing CGI (Common Gateway Interface)
scripts on the WWW (World Wide Web).
Perl is a simple yet useful programming language that provides the convenience of shell
scripts and the power and flexibility of high-level programming languages. Perl programs
are interpreted and executed directly, just as shell scripts are; however, they also contain
control structures and operators similar to those found in the C programming language.
This gives you the ability to write useful programs in a very short time.
http://thevtu.webs.com http://thevtu.wordpress.com
A perl program runs in a special interpretive mode: the entire script is compiled
internally in memory before being executed, so syntax errors, if any, are reported before
execution begins. Unlike awk, printing isn't perl's default action. Like C, all perl statements end
with a semicolon. Perl statements can either be executed on the command line with the -e
option or placed in .pl files. In Perl, whenever a # character is recognized, the rest of the
line is treated as a comment.
There are two ways of running a perl script. One is to assign execute (x) permission to
the script file (chmod +x filename) and run it by specifying the script filename. The other is to
invoke the perl interpreter at the command line, followed by the script name. In the second case,
the interpreter line, viz. #!/usr/bin/perl, is not required.
3. If the first character of a string is not numeric, the entire string becomes
numerically equivalent to zero.
4. When Perl sees a string in the middle of an arithmetic expression, it converts the string
to a number. To do this, it starts at the left of the string and continues until it sees a
character that is not a digit. Example: "12O34" (with the letter O in the middle) is
converted to the integer 12, not 12034.
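The conversion rules above can be checked with a short script. This is a minimal sketch: the variable names are illustrative, and warnings about non-numeric strings are switched off only to keep the output uncluttered.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demonstration

my $mixed = "12O34" + 0;   # the letter O stops the conversion, leaving 12
my $alpha = "abc" + 0;     # no leading digits, so the value is 0
my $sum   = "43" + "7";    # both strings are fully numeric

print "$mixed $alpha $sum\n";   # prints: 12 0 50
```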
Comparison Operators
Perl supports operators similar to C for performing numeric comparison. It also provides
operators for performing string comparison, unlike C where we have to use either
strcmp() or strcmpi() for string comparison. They are listed next.
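As a brief illustration, Perl's numeric comparison operators are ==, !=, <, >, <= and >=, while the corresponding string comparison operators are eq, ne, lt, gt, le and ge. The sketch below, with illustrative variable names, shows how the same operands can compare differently as numbers and as strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $numeric_equal = (10 == 10.0)     ? "true" : "false";  # equal as numbers
my $string_equal  = ("10" eq "10.0") ? "true" : "false";  # different as strings
my $numeric_less  = (9 < 10)         ? "true" : "false";  # 9 is less than 10
my $string_less   = ("9" lt "10")    ? "true" : "false";  # "9" sorts after "1"

print "$numeric_equal $string_equal $numeric_less $string_less\n";
# prints: true false true false
```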
The x operator (the letter x) makes n copies of a string, where n is the value of the right
operand:
Example:
$a = "R" x 5; # $a is now "RRRRR"
The .= operator combines the operations of string concatenation and assignment:
Example:
$a = "VTU";
$a .= " Belgaum"; # $a is now "VTU Belgaum"
substr(str,m,n) extracts a substring from the string str; m is the starting offset of the
extraction (counted from 0) and n is the number of characters to be extracted.
uc(str) converts all the letters of str into uppercase.
ucfirst(str) converts only the first character of str into uppercase.
reverse(str) reverses the characters contained in the string str (when used in scalar context).
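The four functions can be tried on a sample string. This is a small sketch; the string and the variable names are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "perl scripting";

my $part     = substr($name, 5, 6);     # 6 characters from offset 5
my $upper    = uc($name);               # every letter in uppercase
my $capital  = ucfirst($name);          # only the first character in uppercase
my $backward = scalar reverse($name);   # scalar context reverses the characters

print "$part\n$upper\n$capital\n$backward\n";
# prints:
# script
# PERL SCRIPTING
# Perl scripting
# gnitpircs lrep
```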
The following script prints every line containing Gupta, Agarwal or Aggarwal (matched
using an ERE) from the file that is specified as a command line parameter along with the
script name.
#!/usr/bin/perl
printf("%30s", "LIST OF EMPLOYEES\n");
while(<>) {
print if /\bGupta|Ag+[ar][ar]wal/ ;
}
Many functions that accept a scalar argument can have that argument omitted. In
this case, Perl uses $_, which is the default scalar variable. chop, <> (when used alone in
a while condition) and pattern matching operate on $_ by default, which is why we did not
specify it explicitly in the print statement in the previous script. $_ is an important
variable that makes perl scripts compact.
In this case, a line is read from standard input and assigned to the default variable $_,
whose last character (in this case a \n) is then removed by the chop() function.
Note that you can also assign a value to $_ yourself, so that you can use the functions of perl
without specifying either $_ or any other variable name as argument.
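The behaviour of $_ with <> and chop can be sketched as follows. To keep the example runnable without a data file, the input comes from a hypothetical in-memory string rather than from standard input; the names $data, $in and @lines are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input held in a string so the sketch runs without a data file.
my $data = "first line\nsecond line\n";
open(my $in, "<", \$data) || die("Cannot open in-memory file");

my @lines;
while (<$in>) {        # each line read is assigned to $_
    chop;              # no argument, so chop removes the last character of $_
    push @lines, $_;   # $_ now holds the line without its trailing \n
}
close($in);

print join("|", @lines), "\n";   # prints: first line|second line
```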
Arrays
Perl allows you to store lists in special variables designed for that purpose. These
variables are called array variables. Note that the elements of a perl array need not be of
the same type. Arrays in perl can also grow or shrink dynamically at run time.
@array = (1, 2, 3); # Here, the list (1, 2, 3) is assigned to the array variable @array.
Because perl uses @ and $ to distinguish array variables from scalar variables, the same
name can be used for an array variable and for a scalar variable:
$var = 1;
@var = (11, 27.1, "a string");
Here, the name var is used in both the scalar variable $var and the array variable @var.
These are two completely separate variables. You retrieve the value of the scalar variable by
specifying $var, and the array element at index 1 by specifying $var[1].
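The following sketch shows the scalar and the array side by side, and also that an array can grow at run time. The names $second and $count are introduced here only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $var = 1;                         # scalar variable
my @var = (11, 27.1, "a string");    # separate array variable with the same name

my $second = $var[1];        # element at index 1 of @var
push(@var, "extra");         # the array grows by one element
my $count = scalar(@var);    # number of elements now in @var

print "$var $second $count\n";   # prints: 1 27.1 4
```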
Note that $ARGV[0], the first element of the @ARGV array variable, does not contain
the name of the program; it holds the first command line argument. This is a difference
between Perl and C, where argv[0] is the program name.
The splice function can do everything that shift, pop, unshift and push can do. It takes
up to four arguments to add or remove elements at any location in the array. The second
argument is the offset at which the insertion or removal should begin. The third
argument is the number of elements to be removed; if it is 0, no elements are removed and
the new elements are only inserted. The list of new elements is specified by the fourth
argument (if present).
splice(@list, 5, 0, 6..8); # Adds at 6th location, list becomes 1 2 3 4 5 6 7 8 9
splice(@list, 0, 2); # Removes from beginning, list becomes 3 4 5 6 7 8 9
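The two calls above, and the claim that splice can stand in for push, pop, unshift and shift, can be sketched in one script. The starting contents of @list (1 to 5 followed by 9) are an assumption chosen to match the results shown in the comments above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = (1 .. 5, 9);                # assumed starting contents

splice(@list, 5, 0, 6 .. 8);           # insert 6 7 8 at offset 5
my $after_insert = "@list";            # 1 2 3 4 5 6 7 8 9

my @removed = splice(@list, 0, 2);     # remove two elements from the front
my $after_remove = "@list";            # 3 4 5 6 7 8 9

splice(@list, scalar(@list), 0, 10);   # same effect as push(@list, 10)
my $last = splice(@list, -1, 1);       # same effect as pop(@list)
splice(@list, 0, 0, 2);                # same effect as unshift(@list, 2)
my $first = splice(@list, 0, 1);       # same effect as shift(@list)

print "$after_insert\n$after_remove\n$last $first\n";
# prints:
# 1 2 3 4 5 6 7 8 9
# 3 4 5 6 7 8 9
# 10 2
```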
Example: To iterate through the command line arguments (that are specified as numbers)
and find their square roots,
foreach $number (@ARGV) {
print("The square root of $number is " .
sqrt($number) . "\n");
}
You can even use the following code segment for performing the same task. Here note
the use of $_ as a default variable.
foreach (@ARGV) {
print("The square root of $_ is " . sqrt() . "\n");
}
Another Example
#!/usr/bin/perl
@list = ("This", "is", "a", "list", "of", "words");
print("Here are the words in the list: \n");
foreach $temp (@list) {
print("$temp ");
}
print("\n");
Here, the loop defined by the foreach statement executes once for each element in the list
@list. The resulting output is
Here are the words in the list:
This is a list of words
The current element of the list is stored in the loop variable, which in this case is the
scalar variable $temp. This variable is special because its value is local to the statements
inside the foreach loop; any value it held before the loop is restored when the loop ends.
split breaks up a line or expression into fields. These fields are assigned either to
variables or an array.
Syntax:
($var1, $var2, $var3, ...) = split(/sep/, str);
@arr = split(/sep/, str);
It splits the string str on the pattern sep, which can be a regular expression or a literal
string. str is optional; if it is absent, $_ is used by default. The fields resulting from the
split are assigned to a set of variables or to an array.
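Both forms of the syntax, and the default use of $_, can be sketched as follows. The colon-delimited record and all variable names are hypothetical and serve only to illustrate the calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = "2233:charles harris:g.m.:sales";   # hypothetical employee record

my ($id, $name, $desig) = split(/:/, $record);   # fields beyond the third are discarded
my @fields = split(/:/, $record);                # all four fields go into the array

$_ = "one two  three";
my @words = split(/\s+/);    # no string given, so split works on $_

print "$id|$name|$desig\n";              # prints: 2233|charles harris|g.m.
print scalar(@fields), " fields\n";      # prints: 4 fields
print join(",", @words), "\n";           # prints: one,two,three
```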
The output of the above script (assuming script name is dec2bin.pl) is,
$ dec2bin.pl 10
Binary form of 10 is 1010
$ dec2bin.pl 8 12 15 10
Binary form of 8 is 1000
Binary form of 12 is 1100
Binary form of 15 is 1111
Binary form of 10 is 1010
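The listing of dec2bin.pl itself is not reproduced here. As a rough sketch only, and not the original script, a program producing output of this form could collect the remainders of repeated division by 2 and join them; the subroutine name dec2bin and its structure are assumptions.

```perl
#!/usr/bin/perl
# Hypothetical reconstruction, not the original dec2bin.pl listing.
use strict;
use warnings;

# Convert a non-negative integer to its binary representation
# by collecting the remainders of repeated division by 2.
sub dec2bin {
    my ($number) = @_;
    return "0" if $number == 0;
    my @bits;
    while ($number > 0) {
        unshift @bits, $number % 2;    # each new remainder is a more significant bit
        $number = int($number / 2);
    }
    return join("", @bits);            # join the bits into one string
}

foreach my $number (@ARGV) {
    print "Binary form of $number is ", dec2bin($number), "\n";
}
```

Run as perl dec2bin.pl 8 12, this sketch would print one "Binary form of ..." line per argument.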
Normally, keys returns the key strings in no predictable order. To order the list
alphabetically, use the sort function with keys.
1. foreach $key (sort(keys %region)) { # sorts on keys in the associative array, region
2. @key_list = reverse sort keys %region; # reverse sorts on keys in assoc. array, region
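Both statements can be tried on a small hash. The contents of %region below are hypothetical, as are the names @report and @key_list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical contents for the associative array %region.
my %region = ("N" => "North", "S" => "South", "E" => "East", "W" => "West");

my @report;
foreach my $key (sort(keys %region)) {      # keys visited in alphabetical order
    push @report, "$key=$region{$key}";
}

my @key_list = reverse sort keys %region;   # keys in reverse alphabetical order

print "@report\n";     # prints: E=East N=North S=South W=West
print "@key_list\n";   # prints: W S N E
```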
perl supports the different forms of regular expressions we have studied so far. It makes use
of the s and tr operators to perform substitution and translation respectively.
Here, the s prefix indicates that the pattern between the first / and the second is to be
replaced by the string between the second / and the third.
Here, any character matched by the first pattern is replaced by the corresponding
character in the second pattern.
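A short sketch of both operators follows. The sample string and the variable names are illustrative; each result is stored in a copy so that the original string stays unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "perl is a scripting language";

# s/old/new/ replaces the first match of the pattern with the new string.
(my $replaced = $line) =~ s/scripting/programming/;

# tr/a-z/A-Z/ maps each lowercase letter to the corresponding uppercase letter.
(my $upper = $line) =~ tr/a-z/A-Z/;

print "$replaced\n";   # prints: perl is a programming language
print "$upper\n";      # prints: PERL IS A SCRIPTING LANGUAGE
```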
perl accepts the IRE (interval regular expression) and TRE (tagged regular expression)
used by grep and sed, except that the curly braces and parentheses are not escaped.
For example, to locate lines longer than 512 characters using IRE:
perl -ne 'print if /.{513,}/' filename # Note that we didn't escape the curly braces
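A TRE can be sketched in the same way, using unescaped parentheses to tag parts of the match and $1 and $2 to refer back to them. The sample names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "harris charles";   # hypothetical "surname firstname" entry

# TRE: each unescaped (...) group is remembered as $1, $2, ...
(my $swapped = $name) =~ s/(\w+) (\w+)/$2 $1/;

# IRE: unescaped {2} asks for exactly two consecutive occurrences of g.
my $double_g = ("aggarwal" =~ /g{2}/) ? "yes" : "no";

print "$swapped\n";    # prints: charles harris
print "$double_g\n";   # prints: yes
```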
The following script demonstrates file handling in perl. This script copies the first three
lines of one file into another.
#!/usr/bin/perl
open(INFILE, "desig.dat") || die("Cannot open file");
open(OUTFILE, ">desig_out.dat") || die("Cannot create file");
while(<INFILE>) {
print OUTFILE if(1..3);
}
close(INFILE);
close(OUTFILE);
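The same idea can be written with lexical filehandles and the three-argument form of open, which is a common alternative style and not the form used in the script above. To keep this sketch self-contained it first creates a hypothetical five-line input file with the core File::Temp module; the file names and variable names are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a hypothetical five-line input file so the sketch is self-contained.
my ($tmp, $infile) = tempfile();
print $tmp "line $_\n" for 1 .. 5;
close($tmp);
my $outfile = "$infile.out";

open(my $in,  "<", $infile)  || die("Cannot open $infile: $!");
open(my $out, ">", $outfile) || die("Cannot create $outfile: $!");
while (<$in>) {
    print $out $_ if 1 .. 3;   # true for input line numbers 1 to 3
}
close($in);
close($out);

# Read the copy back to confirm what was written, then clean up.
open(my $check, "<", $outfile) || die("Cannot reopen $outfile: $!");
my @copied = <$check>;
close($check);
unlink($infile, $outfile);

print scalar(@copied), " lines copied\n";   # prints: 3 lines copied
```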
19. Subroutines
The use of subroutines results in a modular program, with the familiar advantages of the
modular approach: code reuse, ease of debugging and better readability.
Frequently used segments of code can be stored in separate sections, known as
subroutines. The general form of defining a subroutine in perl is:
sub procedure_name {
# Body of the subroutine
}
Example: The following is a routine to read a line of input from a file and break it into
words.
sub get_words {
$inputline = <>;
@words = split(/\s+/, $inputline);
}
Note: The subroutine name must start with a letter, and can then consist of any number of
letters, digits, and underscores. The name must not be a keyword.
Precede the name of the subroutine with & to tell perl to call the subroutine.
The following example uses the previous subroutine get_words to count the number of
occurrences of the word “the”.
#!/usr/bin/perl
$thecount = 0;
&get_words; # Call the subroutine
Return Values
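In perl subroutines, the value of the last expression evaluated by the subroutine becomes
the subroutine's return value. In get_words, that expression is the assignment to @words.
The calling routine can also refer to the array variable @words directly, because variables
in perl are global unless they are declared otherwise.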
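The return-value rule can be sketched with a variant of get_words that takes the line as an argument instead of reading it with <>. This variant, the helper count_the and the sample sentence are assumptions made for illustration, and the count here ignores case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of get_words: the line is passed in as an argument.
sub get_words {
    my ($inputline) = @_;
    my @words = split(/\s+/, $inputline);
    @words;    # last expression evaluated, so this list is the return value
}

# Count occurrences of the word "the", ignoring case.
sub count_the {
    my ($text) = @_;
    my $thecount = 0;
    foreach my $word (get_words($text)) {
        $thecount++ if lc($word) eq "the";
    }
    return $thecount;    # an explicit return works as well
}

my $count = count_the("The cat saw the dog near the gate");
print "$count\n";   # prints: 3
```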
Conclusion
Perl is a programming language that allows you to write programs that manipulate files,
strings, integers, and arrays quickly and easily. perl combines the capabilities of grep, tr,
sed, awk and the shell. perl also has functions for inter-process communication. perl helps in
developing minimal code for performing complex tasks. The UNIX spirit lives in perl.
perl is popularly used as a CGI scripting language on the web.