An experiment-driven guide to Perl
This guide surveys the syntax and semantics of Perl by launching almost
200 probes (in the form of tiny scripts) at the interpreter.
These probes forensically recover the meaning of and find dark corners in:
variables;
procedures, procedure calls and context;
references;
input and output;
statements;
expressions and operators;
scope;
quote operators (like strings and regex);
eval and “exceptions”;
packages;
objects; and
special variables.
Disclaimer: This article is not a guide on how to write good, idiomatic Perl.
Just because you can do what I’ve done below does not mean you should.
Statements are commands that conduct computation, side effects and I/O:
For instance, we can import the Math::Trig package with use to gain access
to the sin function and the constant pi:
use Math::Trig ;
$x = sin(pi/2) ;
sub add {
return $_[0] + $_[1] ;
}
#!/usr/bin/perl
print "Hello, world!\n"
#!/usr/bin/env perl
print "Hello, world!\n"
You should emerge from this article with a strong understanding of the
syntax and the semantics of Perl.
What you will not get from this article is mastery of Perl’s idioms and
libraries.
If you want to learn idioms and libraries, I strongly recommend the three-
book series Learning Perl, Intermediate Perl and Mastering Perl.
If you want to understand a blob of Perl code, this article can help you
unravel its meaning.
If you want to write clean, maintainable Perl code, you must master its
idioms and its libraries as well.
use strict;
use warnings;
Comments
A code comment in Perl begins with a hash # and extends to the end of the
line.
# This is a comment.
print "This is not a comment." ; # But this is.
=begin comment
It is a comment.
=end comment
=cut
Variables
It is a close enough approximation of the truth to say that Perl has several
types of variables.
Scalar variables hold basic values like numbers and strings (and references
to other, possibly complex, values).
Scalar variables
$foo = 3 ;
$string = "hello" ;
print $foo ; # prints 3
print $string ; # prints hello
Constants
Constants have no prefix, and they should only be defined once, with the
form use constant NAME => value ;
In fact, because constants are resolved at compile-time, they take effect even
if the block in which they are defined fails to execute:
if (0) {
use constant E => 2.718 ;
}
print E ; # prints 2.718
(If we look under the hood, constants aren’t even really constants: they’re
functions that take no arguments. For a constant named PI, both PI() and
&PI work.)
Array variables
Arrays use the prefix @, and arrays contain sequences of scalar values:
@bar = (1,2,3) ;
print @bar ; # prints 123
print "@bar" ; # prints 1 2 3
print $bar ; # prints nothing, since $bar is undefined
The familiar [] subscript notation accesses and modifies array elements, but
with the prefix $:
@arr = ("foo","bar","baz");
print $arr[1] ; # prints bar
print $arr[2] ; # prints baz
$arr[1] = "bit" ;
print @arr ; # prints foobitbaz
Contrary to what one might expect, array variables “contain” the entire
array, not a pointer or reference to the array. As a result, copying one array
variable into another copies the entire array:
@a = (1, 2, 3);
@b = @a ;
$b[1] = -2 ;
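To confirm that the assignment copies rather than aliases, a minimal probe (written with my and use strict, which the snippet above does not assume) can print both arrays after modifying the copy:

```perl
use strict ;
use warnings ;

my @a = (1, 2, 3) ;
my @b = @a ;       # copies all three elements
$b[1] = -2 ;       # modifies only the copy

print "@a\n" ;     # prints 1 2 3
print "@b\n" ;     # prints 1 -2 3
```

The original array is untouched, which is what we would expect if @b holds its own storage.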
Hash variables
The prefix % denotes a hash variable:
$hash{"bar"} = 20 ;
print $hash{"bar"} ; # prints 20
With the => operator, if the left-hand operand is a bare identifier, it gets
treated as a string:
$b{"foo"} = 20 ;
Since keys to hashes must be strings, barewords supplied as hash keys will
be turned into strings even in the index position:
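A short probe of both quoting rules, as a sketch; the hash name %h and its keys are made up for illustration:

```perl
use strict ;
use warnings ;

my %h = (foo => 42, bar => 1701) ;  # => quotes the bare identifiers foo and bar

print $h{foo}, "\n" ;               # prints 42; the bareword index means "foo"
print $h{"bar"}, "\n" ;             # prints 1701

my @keys = sort keys %h ;
print "@keys\n" ;                   # prints bar foo
```

Note that this runs cleanly even under use strict, since barewords in these two positions are quoted by the parser.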
Slices
Arrays can be sliced by giving them a list of indices:
@foo = (0,10,20,30,40,50) ;
@bar = @foo[2,3] ;
print @bar ; # prints 20, 30
@foo[2,5] = (-20,-50) ;
print @foo ; # prints 0, 10, -20, 30, 40, -50
@alphabet = @foo{alpha,beta} ;
print @alphabet;
Procedure variables
Technically, procedure variables have the prefix &, although the prefix is not
always necessary in modern Perl:
sub foo {
print "hello" ;
} ;
Sigils as operators
[Warning: This section is going to poke into the guts of Perl. You
can write modern Perl quite well without understanding this section. You
should probably skip it for now.]
Scalars, arrays, hashes and procedures with the same identifier act like
distinct variables:
$same = 42 ;
@same = (1, 2, 3) ;
%same = (foo => 1, bar => 2) ;
sub same { print "foo" } ;
However, under the hood, they all share a common symbol table entry.
The bareword represents a symbol table entry, and the sigil specifies how to
access that entry.
In fact, the sigil is not even lexically part of the variable name; it may be
separated by whitespace:
print $
x ; # prints 10
@ foo = (10,20) ;
print @
foo ; # prints 10, then 20
One could argue that there is only one variable type in Perl – the bareword –
and the sigil is an operator that acts on the location represented by a bare
word.
If one were programming in C, one might specify a symbol table entry as:
struct entry {
scalar_t scalar ;
array_t array ;
hash_t hash ;
proc_t proc ;
} ;
Under this interpretation, the sigils dereference individual fields; that is, $
word is kind of like word->scalar and @ word is kind of like word->array.
When Perl looks up a variable like $foo, it must first look up the string foo in
the current environment (something like a hash table) to get the address of
the symbol table entry for foo.
If env is the hash table that maps bare words to their addresses, then looking
up $foo is really a hash table look-up followed by a field dereference:
hash_get(env,"foo")->scalar
Under the interpretation that a bareword is (ultimately) a string that will get
looked up in a hash table to get an address, one wonders if a sigil applied to
a Perl string will look up the address for that string and access as
appropriate.
$x = 10 ;
print $x ; # prints 10
$i = "x" ;
$$i = 20 ;
print $x ; # prints 20
And, sigils can be used with a circumfix syntax to avoid the extra
indirection:
${"x"} = 10 ;
print $x ; # prints 10
At this point, it should be clear that Perl variable names may contain spaces:
${"foo bar"} = 10 ;
print ${"foo bar"} ; # prints 10
Typeglobs
The rarely used sixth variable “type,” the typeglob (sigil *), represents the
entire symbol table entry for a variable.
Typeglobs can create aliases in the symbol table and expose this
implementation detail:
$same = 42 ;
@same = (1, 2, 3) ;
%same = (foo => 1, bar => 2) ;
sub same { print "foo" } ;
*different = *same ;
$different[1] = -2 ;
Modern Perl has proper references, so these sorts of tricks are mostly
unnecessary.
Contexts
Perl is unusual in its use of “context” to determine how to evaluate
an expression. There are three contexts:
1. scalar
2. list
3. void
(It’s tempting to call the “list context” the “array context,” and Perl even
promotes this confusion by calling the context discriminator wantarray
instead of wantlist. However, there are important distinctions between
arrays and lists.)
Using an array in a context that expects a scalar will yield the size of the
array:
@bar = ("foo","bar","baz") ;
$barsize = @bar ; # @bar is evaluated in scalar context, yielding its size
print $barsize ; # prints 3
print scalar @bar ; # prints 3
print $bar ; # prints nothing
Using a scalar in a context that expects a list will create a single-element list
with just that value:
@bar = (10,20,30) ;
That list is then immediately assigned to the array @bar, which converts it to
an array with three elements: 10, 20 and 30.
In Perl, the comma operator , has very different interpretations under scalar
and list contexts.
In a list context, the comma operator appends its two arguments together
(each being evaluated in the list context):
This leads to counterintuitive behavior, as inner lists are flattened into the
outer list:
@bar = (10,20) ;
print scalar @bar ; # prints 2 -- the length of @bar
$bar = ("foo","baz") ;
print $bar ; # prints baz (not 2)
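The list-context flattening can be probed directly. This is a small sketch with made-up array names:

```perl
use strict ;
use warnings ;

my @inner = (20, 30) ;
my @outer = (10, @inner, (40, (50))) ;  # arrays and nested lists flatten

print scalar(@outer), "\n" ;            # prints 5
print "@outer\n" ;                      # prints 10 20 30 40 50
```

No nesting survives: to keep structure, one needs references, which are covered later.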
When multiple values are assigned, the right-hand side is in list context:
@coords = (10,20,30) ;
($x,$y,$z) = @coords ;
print $x, $y, $z ; # prints 10, then 20, then 30
@long = (1,2,3,4,5,6) ;
($x,@rest) = @long ;
print $x ; # prints 1
print @rest ; # prints 23456
($x,@rest,@oops) = @long ;
print $x ; # prints 1
print @rest ; # prints 23456
print @oops ; # prints nothing
List context appears in more places than one might expect, which means
that many commas are not part of the syntax of the construct, but are really
just operators.
@indices = (2,4) ;
@values = (0,10,20,30,40,50,60) ;
@slice = @values[1,@indices] ; # grabs indices 1,2,4
References
A reference is a scalar value that contains the memory address of an object.
$s = "I'm a scalar." ;
@a = ("A", "Hash") ;
%h = (foo => 42, bar => 1702) ;
$sref = \$s ;
$aref = \@a ;
$href = \%h ;
The hexadecimal value that prints next to the type of the reference is the
memory address of the referenced value.
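As a sketch, printing the references from above shows the type next to an address, and the ref primitive reports the type alone. The address differs from run to run, so it is shown as 0x... here:

```perl
use strict ;
use warnings ;

my $s = "I'm a scalar." ;
my @a = ("A", "Hash") ;
my %h = (foo => 42, bar => 1702) ;

my $sref = \$s ;
my $aref = \@a ;
my $href = \%h ;

print "$sref\n" ;            # prints something like SCALAR(0x...)
print ref($aref), "\n" ;     # prints ARRAY
print ref($href), "\n" ;     # prints HASH

print $$sref, "\n" ;         # prints I'm a scalar.
print $aref->[1], "\n" ;     # prints Hash
print $href->{foo}, "\n" ;   # prints 42
```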
Perl can also create anonymous references, references for which the
referenced value does not correspond to a named variable.
$b = [1,2,3] ;
print $b ; # prints ARRAY(0xAddr)
print $$b[1] ; # prints 2
print ${$b}[1] ; # prints 2
Since references are scalars, it is possible to have arrays that contain arrays:
$matrix = [ [ 1, 0, 0 ],
[ 0, 1, 0 ],
[ 0, 0, 1 ] ] ;
@array = (10,20,30) ;
$aref = \@array ;
$aref->[2] = 40 ; # modifies $array[2] through the reference
print $aref->[2] ; # prints 40
The arguments supplied to both [] and {} are actually in list context, which
means that the usual rules for expansion into a list apply:
Typeglob references
$tgref = \*foo ;
*baz = *$tgref ;
$baz = 100 ;
@baz = (2,3) ;
Procedures
Defining procedures in Perl is terse. (Perl calls procedures subroutines.) In
the simplest case, a procedure definition is the sub keyword, an identifier
and a block of code – sub procedure-name { code }:
sub my_procedure {
print "I'm a procedure!" ;
}
sub foo {
print "foo: @_" ;
}
sub bar {
print $_[1] ;
}
bar 1, 2, 3 ; # prints 2
bar (1,2,3) ; # prints 2
bar ; # prints nothing
sub print9 {
print $_[0] ;
print $_[1] ;
print $_[2] ;
print $_[3] ;
print $_[4] ;
print $_[5] ;
print $_[6] ;
print $_[7] ;
print $_[8] ;
}
The comma operator (,) can mean cons, append and flatten all at once.
sub print3 {
print $_[0], $_[1], $_[2] ;
}
@args = (1,2,3) ;
print3 @args ;
print3 (@args) ;
@arglets = (1,2) ;
print3 @arglets,3 ;
print3 (@arglets,3) ;
(Actually, print3() and &print3() could differ; &print3() would ignore the
prototype, if there is one, as discussed below.)
The return keyword exits the current procedure and returns the value it
received:
sub one {
return 1 ;
}
sub two {
2
}
Arguments to procedures
Once again, arguments to procedures are passed implicitly via the @_ array.
sub print_args {
print @_ ;
}
sub call_print_args {
&print_args ;
}
(Procedures called with & also ignore the prototype, as explained below.)
sub sum {
my ($a, $b) = @_ ;
return $a + $b ;
}
Perl novices often don’t realize that arguments in Perl are implicitly passed
by alias: modifications to the inputs to a procedure will be seen by the caller
of that procedure.
That is, the arguments array @_ contains aliases to the input values:
$x = 3 ;
@a = (4,5,6) ;
sub mod_args {
$_[0] = 42 ;
$_[2] = 17 ;
}
mod_args $x, @a ;
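A self-contained version of this probe, with prints added to observe the effect on the caller's variables (a sketch using my variables under use strict):

```perl
use strict ;
use warnings ;

my $x = 3 ;
my @a = (4, 5, 6) ;

sub mod_args {
  $_[0] = 42 ;   # aliases $x
  $_[2] = 17 ;   # aliases $a[1], because @a is flattened in after $x
}

mod_args($x, @a) ;

print "$x\n" ;   # prints 42
print "@a\n" ;   # prints 4 17 6
```

Copying @_ into lexicals first, as in my ($a, $b) = @_, breaks the aliasing and protects the caller.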
sub mod_args {
$_[0] = 42 ;
$_[2] = 17 ;
}
@a = (7,8,9) ;
%h = ( "foo" => 42 ) ;
sub sum {
return $_[0] + $_[1] ;
}
$myprod = sub {
return $_[0] * $_[1] ;
} ;
The -> operator provides a more convenient syntax for invoking anonymous
procedures:
$anon = sub {
print $_[0] ;
} ;
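A sketch of the invocation itself, comparing -> with the equivalent & forms; the procedure here returns a string rather than printing, purely to make the result easy to inspect:

```perl
use strict ;
use warnings ;

my $anon = sub { return "got: $_[0]" ; } ;

print $anon->("arrow"), "\n" ;        # prints got: arrow
print &$anon("ampersand"), "\n" ;     # prints got: ampersand
print &{$anon}("braces"), "\n" ;      # prints got: braces
```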
foo (bar 1, 2 , 3)
foo((bar(1, 2, 3)))
or to:
foo((bar(1)), 2, 3)
Before Perl can evaluate (or sometimes even parse) an expression, it must
know the contexts of that expression.
foo(bar())
2. How does the procedure know the context of its return value?
localtime()
The prototype precedes the body block; the declaration form for a procedure
with a prototype is sub procedure-name ( prototype ) { body }
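As an illustration, here is a sketch of a procedure with the prototype ($$), which imposes scalar context on both arguments. The name pair_sum is hypothetical:

```perl
use strict ;
use warnings ;

sub pair_sum ($$) ;   # predeclaration: later calls are parsed with the prototype

my @nums = (10, 20, 30) ;

# @nums is evaluated in scalar context (its length, 3) instead of flattening.
print pair_sum(@nums, 5), "\n" ;   # prints 8

sub pair_sum ($$) {
  return $_[0] + $_[1] ;
}
```

Without the prototype, the same call would flatten @nums and add 10 and 20 instead.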
It is necessary to predeclare so that the Perl parser can correctly parse calls
to this procedure.
It seems that the procedure call still flattened out the arrays (and hashes)
when making the call.
Trying to use that argument as a hash, or even the whole input as a hash,
will not work:
To use the provided list as a hash, one must re-interpret the arguments in @_
as a hash:
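A minimal sketch of that re-interpretation; the procedure name describe is made up for this probe:

```perl
use strict ;
use warnings ;

sub describe {
  my %opts = @_ ;   # pair up the flattened arguments as key, value, key, value...
  return join ",", map { "$_=$opts{$_}" } sort keys %opts ;
}

my %h = (foo => 42, bar => 13) ;

print describe(%h), "\n" ;               # prints bar=13,foo=42
print describe(x => 1, y => 2), "\n" ;   # prints x=1,y=2
```

Since the hash arrives flattened, the procedure cannot tell whether the caller passed a hash variable or a literal list of pairs.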
The specifier & expects to receive a reference to code, but if the first
argument is a literal block of code, it creates an anonymous procedure for it
on the fly:
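A sketch of the & specifier in the first position; twice is a hypothetical procedure that runs its block two times:

```perl
use strict ;
use warnings ;

sub twice (&) {
  my ($code) = @_ ;   # the bare block arrives as a code reference
  $code->() ;
  $code->() ;
}

my $n = 0 ;
twice { $n++ } ;      # no sub keyword needed
print "$n\n" ;        # prints 2
```

This is the same mechanism that lets user-defined procedures imitate the syntax of built-ins like grep and map.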
sub print_me {
print "me" ;
}
Unfortunately, code blocks (without sub) are only accepted as the very first
parameter:
@a1 = (1,2,3) ;
@b1 = (4,5,6) ;
$scalar = 3 ;
@array = (10,20,30) ;
%hash = ("foo" => 42, "bar" => 13) ;
Return context
When inside a procedure, the oddly-titled primitive wantarray determines if
the context to which the procedure is returning expects a scalar, a list or
nothing at all:
sub print_context () {
if (wantarray()) {
print "list context";
}
elsif (defined wantarray()) {
print "scalar context";
}
else {
print "void context";
}
}
This is how procedures like localtime can decide whether to return a list
or a scalar:
@a = localtime ;
$x = localtime ;
sub foo {
return (4,5,6) ;
}
$x = foo() ;
@a = foo() ;
Ignoring prototypes
sub f ($$) {
print @_ ;
}
$f = sub ($$) {
print @_ ;
};
$f->((1,2),(3,4)) ; # prints 1, 2, 3, 4
From this test, it seems that prototype information is not stored with the
procedure itself, but rather, it is information associated with a specific
procedure name, and available only during parsing.
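A read from a filehandle implicitly assigns the input to the default
variable, $_, only when the read forms the entire condition of a while loop.
Anywhere else, the line is discarded unless it is assigned explicitly:
<STDIN> ; # reads and discards the first line
$_ = <STDIN> ; # reads the second line into $_
<STDIN> ; # reads and discards the third line
print $_ ; # prints the second line of user input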
In fact, print uses the default $_ if no arguments are given, so the following
program works as well:
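A sketch of such a program. To keep it self-contained, it reads from an in-memory filehandle over a made-up string instead of STDIN; the loop body is the part that matters:

```perl
use strict ;
use warnings ;

my $text = "first\nsecond\n" ;
open(my $fh, "<", \$text) or die "cannot open in-memory file: $!" ;

while (<$fh>) {   # each line is assigned to $_
  print ;         # print with no arguments prints $_
}                 # prints first, then second

close $fh ;
```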
The open operator can establish new filehandles; close closes them.
while (<F>) {
print ;
} # prints contents of io.pl
while (<$fh>) {
print ;
} # prints contents of tmp.txt
close $fh ;
sub pass_handle {
print "file handle: " . $_[0] . "\n" ;
}
open F, "<tmp.txt" ;
pass_handle *F ; # prints file handle: *main::F
pass_handle F ; # error
close F ;
sub pass_handle2 {
print "file handle: " . $_[0] . "\n" ;
*FH = $_[0] ;
while (<FH>) {
print ;
}
}
open F, "<tmp.txt" ;
pass_handle2 F; # prints file handle: F, contents of tmp.txt
close F ;
The print command is special, in that it can take a filehandle before it takes
any parameters:
while (<$tmp>) {
print STDOUT $_ ;
} # prints Testing to STDOUT
By default, print and write send to STDOUT, but you can change the default
with select:
When opening a file, the second argument to open determines the mode in
which it is opened:
open FH, ">file" opens a file for writing, and will replace contents.
This is not meant as a tutorial on the library, but open also has a three-
argument form:
open FH, ">", "file" opens a file for writing, and will replace
contents.
open FH, ">>", "file" opens a file for appending.
Statements
So far, we have used statements and appealed to intuition as to their
meaning and structure.
if statements
$count = 19 ;
$foo = 20 ;
print "big" if $foo > 10 ; # prints big
print "small" if $foo <= 10 ; # prints nothing
$age = 22 ;
while statements
The condition is checked and the body is evaluated repeatedly until the
condition is false:
$count = 10 ;
while ($count > 0) {
print $count ;
$count = $count - 1 ;
} # prints 10 through 1
The expression next will advance to the next iteration of the innermost loop:
$count = 10 ;
while (--$count > 0) {
next if $count % 2 == 0 ;
print $count ;
} # prints 9 7 5 3 1
$count = 0 ;
while (1) {
print $count ;
$count++ ;
last if $count == 10 ;
} # prints 0 1 2 3 4 5 6 7 8 9
The expression redo will restart the current innermost loop but without re-
evaluating the condition:
$count = 1 ;
while ($count > 0) {
if ($count <= 0) {
print "impossible?" ;
last ;
}
$count-- ;
print $count ;
redo ;
} # prints 0, then impossible?
In Perl, while blocks can have a continue block attached. A continue block
always executes after the main body of the loop, but before the conditional:
$count = 10 ;
while ($count > 0) {
next if $count % 2 == 0 ;
print $count ;
} continue {
$count-- ;
} # prints 9 7 5 3 1
A next expression will jump into the continue block; a redo will not.
It is possible to label a loop in Perl, so that next, last and redo can choose to
which loop they refer:
$i = 0;
OUTER: while ($i < 6) {
$j = 0 ;
INNER: while ($j < 6) {
next OUTER if $i % 2 == 0 ;
next INNER if $j % 3 == 0 ;
print "$i:$j" ;
} continue {
$j++ ;
}
} continue {
$i++ ;
} ;
prints:
1:1
1:2
1:4
1:5
3:1
3:2
3:4
3:5
5:1
5:2
5:4
5:5
$i = 0;
print $i while ($i++ < 4) ; # prints 1 through 4
If the intent is to run the block once before testing the condition, then the
do-while form applies:
$i = 0 ;
do { print $i } while ($i++ < 4) ; # prints 0 through 4
Blocks
Finally, next and last will actually work in any block, not just a while body:
{
print "This prints." ;
next ;
print "But this doesn't." ;
} # prints This prints.
{
print "This prints." ;
last ;
print "But this doesn't." ;
} # prints This prints.
They seem to have identical behavior, except that regular blocks can have
continue blocks as well:
{
next ;
print "I won't print." ;
} continue {
print "But, I will." ;
} # prints But, I will.
{
last ;
print "I won't print." ;
} continue {
print "Neither will this." ;
} # prints nothing
OUTER: {
INNER: {
next OUTER ;
} continue {
print "This won't print." ;
}
} # prints nothing
for statements
Perl has traditional C-style for statements of the form for ( initializer ; test ;
increment ) { body }:
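A minimal sketch of the C-style form, summing the loop counter as it goes:

```perl
use strict ;
use warnings ;

my $sum = 0 ;
for (my $i = 1 ; $i <= 4 ; $i++) {
  $sum += $i ;
  print $i ;          # prints 1 through 4
}
print "\n$sum\n" ;    # prints 10
```

The variable declared with my in the initializer is scoped to the loop.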
foreach statements
The foreach form in Perl allows iteration over individual elements in arrays:
@array = ("foo","bar","baz") ;
@array = ("foo","bar","baz") ;
Leaving off the variable for iteration will bind each element to the default
variable, $_:
@array = (4,5,6) ;
foreach (@array) {
print $_ ;
} # prints 4 through 6
$n = 4 ;
$a = 1 ;
TOP:
$a = $a * $n ;
$n = $n - 1 ;
goto TOP if $n >= 1 ;
print $a ; # prints 24
Labels are resolved at run-time in Perl, which means that computed strings
can be used to jump off to a label:
$fi = "FI" ;
$rst = "RST" ;
$first = "$fi$rst";
goto $first ;
FIRST: print "foo" ; goto DONE ;
SECOND: print "bar" ;
DONE: {} ;
# Program prints foo
The goto form in Perl is also used to perform tail call jumps.
The expression goto &proc will (effectively) return from the current
procedure and immediately invoke proc in its place:
sub proc1 {
goto &proc2 ; # tail call to proc2
}
sub proc2 {
return 42 ;
}
Expressions
The simplest expressions in Perl are literals: string constants like 'foo' and
numeric constants like 3 or 3.14.
Most other expressions types are constructed from binary, unary or ternary
operators.
print 10 + 20 ; # prints 30
print 10 - 20 ; # prints -10
print 10 / 20 ; # prints 0.5
print 10 * 20 ; # prints 200
$foo = 13 ;
print $foo++ ; # prints 13
print $foo ; # prints 14
print ++$foo ; # prints 15
print $foo ; # prints 15
The word forms of each operator act identically, but have the lowest possible
precedence.
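The precedence difference matters most next to assignment. In this sketch, fallback is a hypothetical procedure that counts its calls so we can see when it runs:

```perl
use strict ;
use warnings ;

my $calls = 0 ;
sub fallback { $calls++ ; return 7 ; }

my $sym  = 0 || fallback() ;   # parsed as: my $sym = (0 || fallback())
my $word = 0 or fallback() ;   # parsed as: (my $word = 0) or fallback()

print "$sym\n" ;     # prints 7
print "$word\n" ;    # prints 0
print "$calls\n" ;   # prints 2
```

In the word form, fallback still runs, but its result is thrown away because the assignment has already happened.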
Perl allows C-like bitwise and bitshift operators – &, |, ~, << and >> – as well,
but caution should be taken when using them, since their interpretation
changes depending on whether use integer or use bigint are in effect.
To get a sense of how these operators work, we can use printf to print
binary:
$a = 23 ;
$b = 71 ;
printf "%b\n", $a ; # prints 10111
printf "%b\n", $b ; # prints 1000111
$name = "Alice" ;
print ($name eq "Alice" ? 1 : 2) ; # prints 1
$rrr = "r" x 10 ;
print $rrr ; # prints rrrrrrrrrr
@rrr = ("r") x 10 ;
print "@rrr" ; # prints r r r r r r r r r r
@rrr = "r" x 10 ;
print "@rrr" ; # prints rrrrrrrrrr
@nums = (1,2) x 5 ;
print "@nums" ; # prints 1 2 1 2 1 2 1 2 1 2
The parentheses on the left-hand side are required to force the list context.
@arr1 = (1,2,3) ;
@arr2 = (2,3) ;
@keys = ("foo") ;
%hash = (foo => 42, bar => 1701) ;
In a list context, the range operator ... produces a list starting with the
left-hand side and going up to the right-hand side:
@range = 3...6 ;
print "@range" ; # prints 3 4 5 6
In scalar context, the ... operator has a very different interpretation; the
scalar ... operator is meant to emulate the range behavior of awk and sed.
In a scalar context, lhs ... rhs will be false until lhs evaluates to true. Then,
it will be true until after rhs evaluates to true. Then, it will evaluate to false
and wait for lhs to be true again:
$i = 0 ;
while ($i < 10) {
print $i if ($i == 3) ... ($i == 7) ;
$i++ ;
} # prints 3 through 7
$toggle = true ;
$toggle = 1 ;
$i = 0 ;
while ($i < 10) {
print $i if ($toggle ? (($i == 3) ? !($toggle = 0) : 0)
: (($i == 7) ? ($toggle = 1) : 1)) ;
$i++ ;
} # prints 3 through 7
It appears that they are lexically scoped to the nearest procedure, which
leads to surprises:
sub proc {
my @a = (1,2) ;
my @b = (1,2,3,4) ;
if ($flip) {
print "\$b: 3 <= $b <= 11" ;
}
}
}
}
proc() ; # prints:
while (<>) {
print if 1 ... 10 ;
}
while (<>) {
print if ($. == 1) ... ($. == 10) ;
}
$i = 0 ;
while ($i < 10) {
print $i if ($i == 3) .. ($i == 3) ;
$i++ ;
} # prints only 3
$i = 0 ;
while ($i < 10) {
print $i if ($i == 3) ... ($i == 3) ;
$i++ ;
} # prints 3 through 9
Scope
Perl supports several scoping disciplines.
By default, variables are globally scoped.
But, the keywords my and local can scope variables lexically and
dynamically, respectively.
Global scope
If a variable has no explicit scope, then it is globally scoped, and it is visible
to all blocks:
$g = 3.14 ;
{
$g = $g * 2 ;
}
sub mod_g {
$g = $g / 2 ;
}
mod_g ;
Lexical scoping
my $lexical_scalar ;
my ($lexical_scalar1,$lexical_scalar2) ;
my @lexical_array ;
my %lexical_hash ;
{
my $x = 3 ;
{
my $y = 10 ;
print $x ; # prints 3
}
print $y ; # prints nothing
}
Lexically scoped variables are also visible to procedures defined within the
block and anonymous procedures defined within the block.
$x = "global x" ;
{
my $x = "inner x" ;
$f = sub {
return $x ;
}
}
The operator my can actually appear anywhere within the block and it will
cause lexical scoping for the variable within that block, once it’s been
evaluated:
$x = 10 ;
{
$x = 3 ; # $x is global
my $x = 20 ;
print $x ; # prints 20
}
print $x; # prints 3
$x = 10 ;
{
goto SKIP;
BACK:
$x = 3 ; # by the time this hits, $x is lexical
last ;
SKIP:
my $x = 20 ;
print $x ; # prints 20
}
print $x; # prints 10
Unfortunately, it is not hard to extend the prior example into a proof that
the scope of a variable in Perl is (statically) undecidable in general.
$x = 10 ;
{
(my $x) = 20 ;
print $x ; # prints 20
}
print $x; # prints 10
@stack = (1,2,3) ;
$x = 1000 ;
{
half 10, (my $x) ;
print $x ; # prints 5
}
print $x ; # prints 1000
$x = 10 ;
foreach my $x (1,2,3) {
print $x ;
} # prints 1 through 3
print $x ; # prints 10
Dynamic scope
Dynamic scope could fairly be termed stack scope: when a local variable is
evaluated, the topmost stack frame with a binding of that variable provides
its value.
{
local $x = 3 ;
{
local $y = 10 ;
print $x ; # prints 3
}
print $y ; # prints nothing
}
sub get_x {
return $x ;
}
{
my $x = 10 ;
print get_x() ; # prints nothing
}
{
local $x = 10 ;
print get_x() ; # prints 10
}
If use feature "state" is in effect, then Perl also has lexically scoped
variables that are initialized only once, known as state variables.
sub inc_count() {
state $count = 0 ;
return ++$count ;
}
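Calling such a procedure repeatedly shows that the initializer runs only on the first call. A self-contained sketch:

```perl
use strict ;
use warnings ;
use feature "state" ;

sub inc_count {
  state $count = 0 ;   # initialized on the first call only
  return ++$count ;
}

print inc_count(), "\n" ;   # prints 1
print inc_count(), "\n" ;   # prints 2
print inc_count(), "\n" ;   # prints 3
```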
The following program illustrates the difference between the three scoping
disciplines:
$foo = 20 ;
sub print_foo() {
print $foo ;
}
lexical_foo() ; # prints 20
print_foo() ; # prints 20
dynamic_foo() ; # prints 40
print_foo() ; # prints 20
global_foo() ; # prints 60
print_foo() ; # prints 60
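One way to write the three procedures so that the program produces the outputs listed above is sketched here. The bodies of lexical_foo, dynamic_foo and global_foo, and the values 30, 40 and 60, are assumptions chosen to match those outputs:

```perl
use strict ;
use warnings ;

our $foo = 20 ;

sub print_foo { print "$foo\n" ; }   # always reads the global $foo

# Assumed body: a lexical $foo is invisible to print_foo.
sub lexical_foo { my $foo = 30 ; print_foo() ; }

# Assumed body: local rebinds the global until this procedure returns.
sub dynamic_foo { local $foo = 40 ; print_foo() ; }

# Assumed body: a plain assignment changes the global permanently.
sub global_foo { $foo = 60 ; print_foo() ; }

lexical_foo() ;   # prints 20
print_foo() ;     # prints 20
dynamic_foo() ;   # prints 40
print_foo() ;     # prints 20
global_foo() ;    # prints 60
print_foo() ;     # prints 60
```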
Most characters act as their own matching delimiter, but the balanced
delimiters <, (, { and [ match with >, ), } and ] respectively.
print q(This (and this) run.) ; # prints This (and this) run.
print q(This (and this fails.) ; # error
The advantage of quote operators is that ' and " do not have to be escaped
within them (unless of course, they were chosen as the delimiter character):
$pi = 3.14 ;
@a = ("of","a","circle") ;
print "$pi is the circumference @a over its diameter." ;
# prints 3.14 is the circumference of a circle over its diameter.
@array = (1,2,3) ;
{
local $" = '::' ;
print "@array" ; # prints 1::2::3
}
print "@array" ; # prints 1 2 3
$string = "dog" ;
print qq(This is a $string.) ; # prints This is a dog.
print qq{This is a $string.} ; # prints This is a dog.
print qq[This is a $string.] ; # prints This is a dog.
print qq<This is a $string.> ; # prints This is a dog.
print qq|This is a $string.| ; # prints This is a dog.
print qq/This is a $string./ ; # prints This is a dog.
print qq#This is a $string.# ; # prints This is a dog.
print qq"This is a $string." ; # prints This is a dog.
print qq zThis is a $string.z ; # prints This is a dog.
$prefix = "bi" ;
@registries = (42,1701) ;
In a list context, it splits the output along newlines, unless the variable $/ is
set to a different separator:
@files = `ls` ;
foreach $file (@files) {
chomp $file ; # remove newline from end of $file
print "file: $file" ;
} # prints each file, but with file: first
{
local $/ = ':';
@last_user = `tail -1 /etc/passwd` ;
print "@last_user" ; # prints passwd entry for the last user,
# with space after each :
# looks like:
# robot: *: 239: 239: robot: /var/empty: /usr/bin/false
}
@files = qx|ls| ;
foreach $file (@files) {
chomp $file ; # remove newline from end of $file
print "file: $file" ;
} # prints each file, but with file: first
$passwd = '/etc/passwd';
$password_file = `cat $passwd` ;
print $password_file ; # Prints contents of /etc/passwd
But, using the qx quote operator with ' as a delimiter will not interpolate:
Quote words
For quickly creating an array of whitespace-separated words, the quote
operator qw is convenient:
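A short sketch comparing qw with the equivalent quoted list; the words themselves are arbitrary:

```perl
use strict ;
use warnings ;

my @words = qw(foo bar baz) ;
my @same  = ("foo", "bar", "baz") ;

print scalar(@words), "\n" ;             # prints 3
print "$words[1]\n" ;                    # prints bar
print "same\n" if "@words" eq "@same" ;  # prints same
```

Like the other quote operators, qw accepts other delimiters, such as qw[foo bar baz].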
Regular expressions
If you’re not familiar with regular expressions, you will want to read my
guide to regular expressions.
$_ = "foobar" ;
print "yes" if /foo/ ; # prints yes
print "yes" if /bar/ ; # prints yes
print "no" if /baz/ ; # does not print
$fb = "facebook" ;
print "yes" if $fb =~ /face/ ; # prints yes
print "yes" if $fb =~ /book/ ; # prints yes
print "no" if $fb =~ /apple/ ; # does not print
If the right-hand side is not a regular expression quote, then the run-time value
of the expression is dynamically interpreted as a regular expression:
$fb = "facebook" ;
$face = "face" ;
$book = "bo+k" ;
$apple = "ap*le" ;
print "yes" if $fb =~ $face ; # prints yes
print "yes" if $fb =~ $book ; # prints yes
print "no" if $fb =~ $apple ; # does not print
$fb = "facebook" ;
print "yes" if $fb =~ m(face) ; # prints yes
print "yes" if $fb =~ m|book| ; # prints yes
print "no" if $fb =~ m"apple" ; # does not print
$fb = "facebook" ;
$face = qr{face} ;
$book = qr|bo+k| ;
print "yes" if $fb =~ $face ; # prints yes
print "yes" if $fb =~ $book ; # prints yes
print "no" if $fb =~ qr/ap*le/ ; # does not print
$rx = "foo|bar" ;
The nth leftmost parenthesis denotes the nth submatch, and the variable $n
holds the nth submatch:
$in = "foobarrrrrrrrrrrrbaz" ;
$in =~ /bar*/ ;
print $& ; # prints barrrrrrrrrrrr
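To see the numbering of submatches by leftmost parenthesis, here is a sketch with nested groups; the date string is made up for this probe:

```perl
use strict ;
use warnings ;

my $in = "2024-06-15" ;

if ($in =~ /((\d+)-(\d+))-(\d+)/) {
  print "$1\n" ;   # prints 2024-06 (the outer group opens first)
  print "$2\n" ;   # prints 2024
  print "$3\n" ;   # prints 06
  print "$4\n" ;   # prints 15
}
```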
Regular expression quotes may be directly followed by flags that modify both
the parsing and the behavior of the regular expressions.
The multiline modifier m modifies the behavior of the anchors ^ and $ so that
each can match where a linebreak happens:
$in = "foo\nbar\nbaz" ;
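A sketch of the difference, matching against the same three-line string:

```perl
use strict ;
use warnings ;

my $in = "foo\nbar\nbaz" ;

print "no\n"  if $in =~ /^bar$/ ;    # prints nothing
print "yes\n" if $in =~ /^bar$/m ;   # prints yes

my @starts = ($in =~ /^(\w)/mg) ;    # first character of every line
print "@starts\n" ;                  # prints f b b
```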
The “single line” modifier s changes the behavior of . so that it can match
newline:
$in = "foo\nbar" ;
print "no" if $in =~ /foo.bar/ ; # prints nothing
print "yes" if $in =~ /foo.bar/s ; # prints yes
The p modifier copies the string prior to the match, the matched string and
the string after the match into ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}
respectively:
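A minimal sketch of the p modifier on a made-up input string:

```perl
use strict ;
use warnings ;

my $in = "foobarbaz" ;

if ($in =~ /bar/p) {
  print ${^PREMATCH}, "\n" ;    # prints foo
  print ${^MATCH}, "\n" ;       # prints bar
  print ${^POSTMATCH}, "\n" ;   # prints baz
}
```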
$in = "fooBAR" ;
print "no" if $in =~ /foobar/ ; # prints nothing
print "yes" if $in =~ /foobar/i ; # prints yes
$in = "foobar" ;
print "no" if $in =~ /foo bar/ ; # prints nothing
print "yes" if $in =~ /foo bar/x ; # prints yes
$ipchunk = qr{(
[0-9] # 0 - 9
| [1-9][0-9] # 10 - 99
| 1[0-9][0-9] # 100 - 199
| 2[0-4][0-9] # 200 - 249
| 25[0-5] # 250 - 255
)}x ;
$in = "123,456,789";
@allmatches = ($in =~ /\d+/g) ;
print $allmatches[0] ; # prints 123
print $allmatches[1] ; # prints 456
print $allmatches[2] ; # prints 789
$in = "123,456,789";
while ($in =~ /(\d+)/g) {
print $1 ;
} # prints 123, then 456, then 789
The special pattern \G matches the last match point on a per-string basis.
The procedure pos yields the current match point for a string:
$in = "123,456,789";
print pos $in ; # prints nothing
$in =~ /(\d+)/g ;
print pos $in ; # prints 3
$in =~ /(\d+)/g ;
print pos $in ; # prints 7
$in =~ /(\d+)/g ;
print pos $in ; # prints 11
The procedure pos can also set the last match point for a string:
$in = "123,456,789";
$in =~ /(\d+)/g ;
print $1; # prints 123
$in =~ /(\d+)/g ;
print $1 ; # prints 456
pos($in) = 3 ;
$in =~ /(\d+)/g ;
print $1 ; # prints 456
Caution must be taken, because while the last-match position is held
per-string, copying the string will reset it:
$in = "123,456,789";
$in =~ /(\d+)/g ;
print pos $in ; # prints 3
$inref = \$in ;
print pos ${$inref} ; # prints 3
$in2 = $in ;
print pos $in2 ; # prints nothing!
The modifier c, in conjunction with g, does not reset the match position for
the string on a failed match.
$in =~ /^/g ;
while (1) {
print "IF" if $in =~ /\Gif/gc ;
print "" if $in =~ /\G\s+/gc ;
print "PRINT" if $in =~ /\Gprint/gc ;
print "ID" if $in =~ /\G\w+/gc ;
print "LP" if $in =~ /\G\(/gc ;
print "RP" if $in =~ /\G\)/gc ;
print "LB" if $in =~ /\G\{/gc;
print "RB" if $in =~ /\G\}/gc ;
print "SEMI" if $in =~ /\G;/gc ;
last if $in =~ /\G$/gc ;
}
# prints
# IF
#
# LP
# ID
# RP
#
# LB
#
# PRINT
#
# SEMI
#
# RB
Substitution operators
Perl borrows and significantly extends sed’s substitution quote operator, s.
$in = "foo" ;
$in =~ s{foo}{bar} ;
print $in ; # prints bar
$_ = "foo" ;
s/foo/bar/ ;
print $_ ; # prints bar
$in = "foo" ;
$in =~ s/foo/bar/ ;
print $in ; # prints bar
The same flags that apply to the match quote operator m also work with
substitution:
$in = "This is a foo foo." ;
$in =~ s/foo/bar/ ;
print $in ; # prints This is a bar foo.
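With flags, the behavior changes as expected. This sketch applies g and i to copies of a made-up input, and also checks the return value of s, which is the number of substitutions made:

```perl
use strict ;
use warnings ;

my $in = "This is a foo Foo." ;

(my $all = $in) =~ s/foo/bar/g ;    # copy, then substitute in the copy
(my $any = $in) =~ s/foo/bar/gi ;

print "$all\n" ;    # prints This is a bar Foo.
print "$any\n" ;    # prints This is a bar bar.

my $count = ($in =~ s/foo/bar/gi) ;
print "$count\n" ;  # prints 2
```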
Since the s operator destroys its target string by default, it also accepts an r
modifier which causes it to (non-destructively) return the result instead:
$in = "foo" ;
$out = ($in =~ s/foo/bar/r) ;
print $in ; # prints foo
print $out ; # prints bar
For scoping purposes, the code run in eval runs in its own block:
my $foo ;
eval '$foo = 3;' ;
print $foo ; # prints 3
my $x = 42 ;
eval 'print $x;' # prints 42
If the code run by eval fails, then the failure does not terminate the script;
rather, it returns from the eval expression, and places the error in the
special variable $@.
The idiom for exception-handling is to eval a risky block of code, and then
to check if it called die by examining the value of $@ afterward.
sub fail_on_one {
if ($_[0] == 1) {
die("fail") ;
}
print "success" ;
}
eval {
fail_on_one 2 ; # prints success
} ;
if ($@) {
print "failure: " . $@ ; # does not print
}
eval {
fail_on_one 1 ; # does not print
} ;
if ($@) {
print "failure: " . $@ ; # prints fail at <file> line <number>.
}
In some sense, eval acts like try, die acts like throw and if ($@) acts like
catch.
# catch just returns its block as a procedure value:
sub catch (&) {
return $_[0] ;
}
# try evals the block, and then calls the handler for errors:
sub try (&$) {
my ($tryblock,$handler) = @_ ;
eval { &{$tryblock} } ;
if ($@) {
&{$handler}($@) ;
}
}
sub fail_on_one {
if ($_[0] == 1) {
die("fail") ;
}
print "success" ;
}
try {
fail_on_one 2 ; # prints success
}
catch {
print "caught: @_" ; # does not execute
} ;
try {
fail_on_one 1 ; # fails
}
catch {
print "caught: @_" ; # prints caught: ...
} ;
Packages
In Perl, the package keyword creates a package.
Typically, a Perl package named name would go into a file named name.pm:
# Foo.pm
package Foo;
1
A program can import a package name in file name.pm with require name ;
require Foo ;
# Foo.pm
package Foo;
sub import {
my ($package,%params) = @_ ;
print $package ;
print $params{'life'} ;
}
# main.pl
use Foo (life => 42, ship => 1701) ; # prints Foo, then 42
Perl also allows packages inlined within a file by placing all of the package
within a block:
package Bar {
}
print "Bar is imported." ; # prints Bar is imported.
package Bar {
my $hidden = 10 ;
our $foo = 20 ;
} ;
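A short probe of the difference between the two declarations above: a
variable declared with our can be reached from outside through its fully
qualified name, while a variable declared with my cannot.

```perl
package Bar {
  my $hidden = 10 ;   # lexical: exists only inside this block
  our $foo = 20 ;     # package variable: stored as $Bar::foo
}

print $Bar::foo ;                                    # prints 20
print defined($Bar::hidden) ? "leaked" : "hidden" ;  # prints hidden
```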
package Bar {
our $foo = 20 ;
sub proc {
print "visible: $foo" ;
}
} ;
Modules can also export procedure names into the namespace of the code that
uses them. To do so, a module should use the base Exporter module and then
list the names of the procedures to export in our @EXPORT:
# Baz.pm
package Baz;
use base 'Exporter' ;
our @EXPORT = ('my_proc') ;
sub my_proc {
print "My procedure!" ;
}
1 ;
Be careful when using the Exporter package: it provides its own import
method to handle exporting, so a module that defines its own import procedure
will no longer export anything automatically.
use Baz ;
my_proc() ; # prints My procedure!
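Exporter also distinguishes between names exported by default (@EXPORT) and
names exported only on request (@EXPORT_OK). The following single-file sketch
assumes a hypothetical inline package named Shapes and calls import by hand at
run time, which is what use would do for a package stored in its own file.

```perl
package Shapes {
  use Exporter 'import' ;          # borrow Exporter's import method
  our @EXPORT    = ('area') ;      # exported by default
  our @EXPORT_OK = ('perimeter') ; # exported only on request
  sub area      { return $_[0] * $_[1] ; }
  sub perimeter { return 2 * ($_[0] + $_[1]) ; }
}

# For a package in its own file, `use Shapes ;` would make this call:
Shapes->import ;
print area(2, 3) ;                          # prints 6
print defined(&perimeter) ? "yes" : "no" ;  # prints no

# Asking for a name explicitly, as in `use Shapes ('perimeter') ;`:
Shapes->import('perimeter') ;
print perimeter(2, 3) ;                     # prints 10
```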
Objects
[Warning: Nothing in this article is idiomatic Perl, but this section is
especially unidiomatic in its rank abuse of bless and packages while
exposing the underlying semantics of objects.]
$o = {} ; # an anonymous hash
The -> operator will look for a procedure with the method’s name in the
namespace associated with the object.
If bless wasn’t given a namespace when the object was created, then ->
looks in the default (global) namespace, known as main:
sub some_method {
print "called a method" ;
}
$a = bless {} ;
$a->some_method ; # prints called a method
$b = {} ;
$b->some_method ; # error: $b is not blessed
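Whether a reference has been blessed, and into which package, can be checked
directly. This sketch uses the built-in ref and the blessed function from the
core module Scalar::Util; the variable names are illustrative.

```perl
use Scalar::Util ('blessed') ;

my $plain  = {} ;
my $object = bless {} ;   # blessed into the current package, main

print ref($plain) ;       # prints HASH
print ref($object) ;      # prints main
print defined(blessed($plain)) ? "blessed" : "unblessed" ;  # prints unblessed
print blessed($object) ;  # prints main
```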
By creating a package and passing that to bless when the object is created,
method look-up happens in the package:
package Dog {
sub growl {
print "grrrrr" ;
}
}
$dog = bless {}, 'Dog' ;
$dog->growl ; # prints grrrrr
So, Perl’s object-oriented system has been grafted on top of its module
system. Packages do double duty as class definitions.
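One consequence of this double duty is that a method can be called on the
package name itself, in which case the first argument is the name as a plain
string rather than an object. A minimal sketch, with a hypothetical describe
method:

```perl
package Dog {
  sub describe {
    my ($invocant) = @_ ;
    # ref() is empty for a plain string and the package name for an object:
    return ref($invocant) ? "instance of " . ref($invocant) : "class $invocant" ;
  }
}

my $rex = bless {}, 'Dog' ;
print Dog->describe ;   # prints class Dog
print $rex->describe ;  # prints instance of Dog
```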
sub print_args {
print "@_" ;
}
$o = bless {} ;
$o->print_args(1,2) ; # prints something like main=HASH(0x...) 1 2
sub set_x {
$_[0]->{"x"} = $_[1] ;
}
sub get_x {
return $_[0]{"x"} ;
}
$o = bless {} ;
$o->set_x(42) ;
print $o->get_x ; # prints 42
Class inheritance in Perl is specified by a package's our @ISA variable.
If a method isn’t defined in the blessed package, then Perl searches the
packages listed in @ISA for it:
package Animal {
sub eat {
print "nom nom" ;
}
}
package Cat {
our @ISA = ('Animal') ;
}
$cat = bless {}, 'Cat' ;
$cat->eat ; # prints nom nom
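A subclass can also override an inherited method and still reach the parent's
version through the SUPER:: pseudo-package. The sketch below is one way to do
that; the rest method and the universal isa and can checks are added only to
illustrate the look-up.

```perl
package Animal {
  sub eat  { return "nom nom" ; }
  sub rest { return "zzz" ; }
}
package Cat {
  our @ISA = ('Animal') ;
  # Override eat, but delegate to the parent's version through SUPER:
  sub eat {
    my ($self) = @_ ;
    return $self->SUPER::eat() . " purr" ;
  }
}

my $cat = bless {}, 'Cat' ;
print $cat->eat ;                           # prints nom nom purr
print $cat->rest ;                          # prints zzz (found through @ISA)
print $cat->isa('Animal') ? "yes" : "no" ;  # prints yes
print $cat->can('bark')   ? "yes" : "no" ;  # prints no
```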
sub my_method {
print "called my_method" ;
}
$o = bless {} ;
$name = 'my_method' ;
$o->$name ; # prints called my_method
package Ship {
sub new {
my ($class,@args) = @_ ;
my $self = bless {}, $class ; # create the object in the requested class
$self->{'x'} = $args[0] ;
$self->{'y'} = $args[1] ;
return $self ;
}
sub print_position {
my ($self) = @_ ;
print "($self->{'x'},$self->{'y'})" ;
}
}
Special variables
Perl makes use of many special variables.
$_ : Default input/output
$_ is, by convention, the default input and output for many procedures and
operators (when none other is specified), including print, chomp, the regex
quote operators and many input operators.
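A brief sketch of $_ at work: the loop below names no loop variable, and
neither chomp nor the regex match names an operand, so all three fall back on
$_. The array contents are arbitrary sample data.

```perl
my @lines = ("apple\n", "banana\n", "avocado\n") ;
my @hits ;

for (@lines) {              # no loop variable, so each element is aliased to $_
  chomp ;                   # same as chomp($_)
  push @hits, $_ if /^a/ ;  # same as $_ =~ /^a/
}

print "@hits" ;             # prints apple avocado
# Because $_ was an alias, chomp changed the array elements themselves:
print $lines[0] eq "apple" ? "chomped in place" : "unchanged" ;  # prints chomped in place
```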
@_ : Arguments to a procedure
sub proc {
my ($first,$second,$third) = @_ ;
print $second ;
}
proc 1, 2, 3 ; # prints 2
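The elements of @_ are aliases for the caller's arguments, not copies, which
is why most procedures copy them into my variables first. A small sketch with
two hypothetical procedures shows the difference:

```perl
sub increment_in_place {
  $_[0]++ ;                 # $_[0] is an alias for the caller's variable
}
sub increment_copy {
  my ($n) = @_ ;            # copying out of @_ breaks the alias
  $n++ ;
  return $n ;
}

my $count = 1 ;
increment_in_place($count) ;
print $count ;              # prints 2
my $next = increment_copy($count) ;
print "$count $next" ;      # prints 2 3
```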
$" : List separator
$" = "-" ;
@a = (1,2,3) ;
print @a ; # prints 123
print "@a" ; # prints 1-2-3
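Since $" is global, a change to it affects every later interpolation. One
hedged way to contain the change is to use local inside a block, as sketched
here:

```perl
my @a = (1,2,3) ;
my $dashed ;
{
  local $" = "-" ;          # restored automatically when the block ends
  $dashed = "@a" ;
}
print $dashed ;             # prints 1-2-3
print "@a" ;                # prints 1 2 3 (the default separator is a space)
print join($", @a) eq "@a" ? "same" : "different" ;  # prints same
```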
$$ : Current process id
$0 : Program name
As in many shell languages, $0 contains the name of the program that was executed.
$; : Subscript separator
There is a convention in Perl that allows hashes to accept multiple keys in
order to simulate multidimensional arrays using hashes.
When multiple keys are given, they are joined together with the value of $;
(by default the control character "\034"), and the resulting single string
acts as the key.
$; = ";" ;
%hash = () ;
$hash{0,0} = 1 ;
$hash{1,1} = 1 ;
$hash{2,2} = 1 ;
print $hash{'0;0'} ; # prints 1
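With the default separator left in place, the joined key can be taken apart
again. This sketch assumes nothing has changed $; and uses a hypothetical
hash named %grid:

```perl
my %grid ;
$grid{1,2} = "x" ;          # really $grid{ join($;, 1, 2) }

my ($key) = keys %grid ;
print $key eq "1\x{1c}2" ? "joined" : "other" ;  # prints joined

# Splitting the key on the separator recovers the coordinates:
my $sep = $; ;
my ($row,$col) = split /\Q$sep\E/, $key ;
print "$row,$col" ;         # prints 1,2
```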
%ENV : Environment variables
The hash %ENV holds the environment variables of the current process.
Modifying its entries also changes the environment of newly created child
processes.
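The effect on child processes can be probed by starting a second interpreter
and asking it what it sees. In this sketch GUIDE_DEMO is a made-up variable
name, and $^X (the path of the running perl) is used so that the child is the
same interpreter.

```perl
$ENV{GUIDE_DEMO} = "set by parent" ;   # GUIDE_DEMO is a made-up variable name

# Run a child perl and have it report what it finds in its own %ENV:
open(my $child, '-|', $^X, '-e', 'print $ENV{GUIDE_DEMO}')
  or die "cannot start child: $!" ;
my $seen = <$child> ;
close $child ;

print $seen ;                          # prints set by parent
```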
$\ : Output record separator
By default, $\ is empty, but if it is set, then its value is printed at the
end of every print command.
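A minimal sketch of $\ in action. To keep the result inspectable, it prints
to an in-memory filehandle (a filehandle opened on a reference to the scalar
$buffer) instead of to the terminal.

```perl
my $buffer = '' ;
open(my $out, '>', \$buffer) or die "cannot open in-memory file: $!" ;
{
  local $\ = "\n" ;         # appended to every print while in effect
  print $out "first" ;
  print $out "second" ;
}
print $out "third" ;        # $\ is empty again, so nothing is appended
close $out ;

print $buffer eq "first\nsecond\nthird" ? "as expected" : "surprise" ;  # prints as expected
```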
$/ : Input record separator
Normally, reading from a filehandle with <> reads up to and including the
next newline. If $/ is set to something else, then it reads up to the next
instance of that string instead.
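As an illustration, the sketch below reads semicolon-terminated records from
an in-memory filehandle. The sample data and variable names are invented.

```perl
my $data = "alpha;beta;gamma" ;
open(my $in, '<', \$data) or die "cannot open in-memory file: $!" ;

my @records ;
{
  local $/ = ";" ;          # records now end at a semicolon
  while (my $record = <$in>) {
    chomp $record ;         # chomp removes a trailing $/, here the semicolon
    push @records, $record ;
  }
}
close $in ;

print join("|", @records) ; # prints alpha|beta|gamma
```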
$, : Output field separator
$, = "::" ;
print "foo", "bar", "baz" ; # prints foo::bar::baz
$. : Current line number
The variable $. holds the current line number of the most recently read
filehandle.
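A short probe of $. using an in-memory filehandle, so that no external file
is needed; the sample text is arbitrary.

```perl
my $text = "one\ntwo\nthree\n" ;
open(my $in, '<', \$text) or die "cannot open in-memory file: $!" ;

my @numbered ;
while (my $line = <$in>) {
  chomp $line ;
  # $. counts the records read from $in so far:
  push @numbered, sprintf("%d:%s", $., $line) ;
}
close $in ;

print "@numbered" ;         # prints 1:one 2:two 3:three
```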
What’s next?
The goal of this article was to provide an experimental understanding of
Perl’s syntax and its semantics.
Perl’s standard library contains many routines useful for common tasks,
particularly with respect to text and basic data structure manipulation.
Newcomers to Perl should take the time to browse the standard library.
Perl also has a large collection of third-party modules. The CPAN repository
contains most of them, and the cpan tool can automatically download and
install many of them.
When use strict and use warnings are in effect, many of the abuses I used
to poke at the internal workings of the Perl interpreter won’t work anymore
(or you’ll be warned), and it is generally considered good practice to
program with them in effect.
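As a small demonstration of what strict changes, the probe below compiles the
same one-line program twice with eval, once with strict off and once with it
on. The variable name $typo_var is invented for the example.

```perl
# Without strict, assigning to an undeclared name silently creates a global:
my $lax = eval 'no strict ; $typo_var = 1 ; "accepted"' ;
print $lax ;                                     # prints accepted

# With strict, the same code fails to compile and $@ explains why:
my $strict = eval 'use strict ; $typo_var = 1 ; "accepted"' ;
print defined($strict) ? $strict : "rejected" ;  # prints rejected
print $@ ;  # Global symbol "$typo_var" requires explicit package name ...
```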
If you're serious about writing good Perl (and yes, you can), then brian
d foy's (recently updated!) Mastering Perl is required reading.