Network Programming with Perl
Copyright ... 1
Preface ... 2
Acknowledgments ... 10
Part 1: Basics ... 11
  Chapter 1. Input/Output Basics ... 12
    Perl and Networking ... 12
    Networking Made Easy ... 13
    Filehandles ... 15
    Using Object-Oriented Syntax with the IO::Handle and IO::File Modules ... 30
    Summary ... 35
  Chapter 2. Processes, Pipes, and Signals ... 36
    Processes ... 36
    Pipes ... 39
    Signals ... 48
    Summary ... 55
  Chapter 3. Introduction to Berkeley Sockets ... 56
    Clients, Servers, and Protocols ... 56
    Berkeley Sockets ... 59
    Socket Addressing ... 63
    A Simple Network Client ... 67
    Network Names and Services ... 69
    Network Analysis Tools ... 73
    Summary ... 76
  Chapter 4. The TCP Protocol ... 77
    A TCP Echo Client ... 77
    Socket Functions Related to Outgoing Connections ... 79
    A TCP Echo Server ... 80
    Adjusting Socket Options ... 85
    Other Socket-Related Functions ... 88
    Exceptional Conditions during TCP Communications ... 89
    Summary ... 91
  Chapter 5. The IO::Socket API ... 92
    Using IO::Socket ... 92
    IO::Socket Methods ... 94
    More Practical Examples ... 99
    Concurrent Clients ... 105
    Summary ... 111
Part 2: Developing Clients for Common Services ... 112
  Chapter 6. FTP and Telnet ... 113
    Net::FTP ... 113
    Net::Telnet ... 123
    Summary ... 136
  Chapter 7. SMTP: Sending Mail ... 137
    Introduction to the Mail Modules ... 137
    Net::SMTP ... 137
    MailTools ... 143
    MIME-Tools ... 151
    Summary ... 170
  Chapter 8. POP, IMAP, and NNTP ... 171
    The Post Office Protocol ... 171
    The IMAP Protocol ... 184
    Internet News Clients ... 187
    A News-to-Mail Gateway ... 197
    Summary ... 202
  Chapter 9. Web Clients ... 203
    Installing LWP ... 203
    LWP Basics ... 204
    LWP Examples ... 215
Part 3: Developing TCP Client/Server Systems ... 247
  Chapter 10. Forking Servers and the inetd Daemon ... 248
    Standard Techniques for Concurrency ... 248
    Running Example: A Psychotherapist Server ... 250
    The Psychotherapist as a Forking Server ... 251
    A Client Script for the Psychotherapist Server ... 256
    Daemonization on UNIX Systems ... 259
    Starting Network Servers Automatically ... 264
    Using the inetd Super Daemon ... 267
    Summary ... 272
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
Prepared for [email protected], Stjepan Maric
Copyright © 2001 by Addison-Wesley. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from
the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.
  Chapter 11. Multithreaded Applications ... 273
    About Threads ... 273
    A Multithreaded Psychiatrist Server ... 279
    A Multithreaded Client ... 281
    Summary ... 282
  Chapter 12. Multiplexed Applications ... 284
    A Multiplexed Client ... 284
    The IO::Select Module ... 286
    A Multiplexed Psychiatrist Server ... 289
    Summary ... 295
  Chapter 13. Nonblocking I/O ... 296
    Creating Nonblocking I/O Handles ... 296
    Using Nonblocking Handles ... 298
    Using Nonblocking Handles with Line-Oriented I/O ... 300
    A Generic Nonblocking I/O Module ... 305
    Nonblocking Connects and Accepts ... 322
    Summary ... 332
  Chapter 14. Bulletproofing Servers ... 333
    Using the System Log ... 333
    Setting User Privileges ... 346
    Taint Mode ... 350
    Using chroot() ... 353
    Handling HUP and Other Signals ... 355
    Summary ... 361
  Chapter 15. Preforking and Prethreading ... 363
    Preforking ... 363
    Prethreading ... 387
    Performance Measures ... 395
    Summary ... 395
  Chapter 16. IO::Poll ... 396
    Using IO::Poll ... 396
    IO::Poll Events ... 397
    IO::Poll Methods ... 399
    A Nonblocking TCP Client Using IO::Poll ... 399
    Summary ... 402
Part 4: Advanced Topics ... 403
  Chapter 17. TCP Urgent Data ... 404
    "Out-of-Band" Data and the Urgent Pointer ... 404
    Using TCP Urgent Data ... 406
    The sockatmark() Function ... 411
    A Travesty Server ... 413
    Summary ... 423
  Chapter 18. The UDP Protocol ... 424
    A Time of Day Client ... 424
    Creating and Using UDP Sockets ... 426
    UDP Errors ... 428
    Using UDP Sockets with IO::Socket ... 429
    Sending to Multiple Hosts ... 431
    UDP Servers ... 433
    Increasing the Robustness of UDP Applications ... 435
    Summary ... 442
  Chapter 19. UDP Servers ... 443
    An Internet Chat System ... 443
    The Chat Client ... 446
    The Chat Server ... 453
    Detecting Dead Clients ... 463
    Summary ... 468
  Chapter 20. Broadcasting ... 470
    Unicasting versus Broadcasting ... 470
    Broadcasting Explained ... 471
    Sending and Receiving Broadcasts ... 472
    Broadcasting Without the Broadcast Address ... 475
    Enhancing the Chat Client to Support Resource Discovery ... 484
    Summary ... 486
  Chapter 21. Multicasting ... 488
    Multicast Basics ... 488
    Using Multicast ... 493
    Sample Multicast Applications ... 500
    Summary ... 513
  Chapter 22. UNIX-Domain Sockets ... 514
    Using UNIX-Domain Sockets ... 514
    A "Wrap" Server ... 518
    Using UNIX-Domain Sockets for Datagrams ... 521
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
Prepared for [email protected], Stjepan Maric
Copyright © 2001 by Addison-Wesley. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from
the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.
Summary.................................................................................................................................................................................................................................. 525
Appendix A. Additional Source Code.................................................................................... 526
Net::NetmaskLite (Chapter 3).................................................................................................................................................................................................... 526
PromptUtil.pm (Chapters 8 and 9)............................................................................................................................................................................................. 528
IO::LineBufferedSet (Chapter 13)............................................................................................................................................................................................... 530
IO::LineBufferedSessionData (Chapter 13)................................................................................................................................................................................ 532
DaemonDebug (Chapter 14)........................................................................................................................................................................................................ 536
Text::Travesty (Chapter 17)......................................................................................................................................................................................................... 538
mchat_client.pl (Chapter 21)...................................................................................................................................................................................................... 540
Appendix B. Perl Error Codes and Special Variables........................................................... 544
System Error Constants.............................................................................................................................................................................................................. 544
Magic Variables Affecting I/O..................................................................................................................................................................................................... 547
Other Perl Globals....................................................................................................................................................................................................................... 548
Appendix C. Internet Reference Tables............................................................................... 549
Assigned Port Numbers............................................................................................................................................................................................................... 549
Registered Port Numbers............................................................................................................................................................................................................ 574
Internet Multicast Addresses...................................................................................................................................................................................................... 588
Appendix D. Bibliography................................................................................................... 590
Perl Programming....................................................................................................................................................................................................................... 590
TCP/IP and Berkeley Sockets...................................................................................................................................................................................................... 591
Network Server Design................................................................................................................................................................................................................ 592
Multicasting................................................................................................................................................................................................................................. 592
Application-Level Protocols........................................................................................................................................................................................................ 593
Copyright
Copyright Information
Copyright © 2001 by Addison-Wesley
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or
otherwise, without the prior consent of the publisher. Printed in the United States of America. Published
simultaneously in Canada.
The publisher offers discounts on this book when ordered in quantity for special sales. For more
information, please contact:
Visit us on the Web at http://www.awl.com/cseng/
Library of Congress Cataloging-in-Publication Data
Stein, Lincoln D.
Network programming with Perl / Lincoln Stein.
p. cm.
1. Perl (Computer program language). 2. Internet programming. I. Title.
QA76.73.P22 S73 2000
005.2'762--dc21
00-067574
Text printed on recycled paper.
1 2 3 4 5 6 7 8 9 10 - MA - 04 03 02 01 00
First printing, December 2000
Preface
The network is everywhere. At the office, machines are wired together into local area networks, and
the local networks are interconnected via the Internet. At home, personal computers are connected to
the Internet either intermittently or, increasingly, via "always-on" cable and DSL modems. New
wireless technologies, such as Bluetooth, promise to vastly expand the network realm, embracing
everything from cell phones to kitchen appliances.
Such an environment creates tremendous opportunities for innovation. Whole new classes of
applications are now predicated on the availability of high-bandwidth, always-on connectivity.
Interactive games allow players from around the globe to compete on virtual playing fields and the instant
messaging protocols let them broadcast news of their triumphs to their friends. New peer-to-peer
systems, such as Napster and Gnutella, allow people to directly exchange MP3 audio files and other
types of digital content. The SETI@Home project takes advantage of idle time on the millions of
personal computers around the world to search for signs of extraterrestrial life in a vast collection
of cosmic noise.
The ubiquity of the network allows for more earthbound applications as well. With the right
knowledge, you can write a robot that will fetch and summarize prices from competitors' Web sites; a
script to page you when a certain stock drops below a specified level; a program to generate daily
management reports and send them off via e-mail; a server that centralizes some number-crunching
task on a single high-powered machine, or alternatively distributes that task among the multiple
nodes of a computer cluster.
Whether you are searching for the best price on a futon or for life in a distant galaxy, you'll need to
understand how network applications work in order to take full advantage of these opportunities.
You'll need a working understanding of the TCP/IP protocol, the common denominator for all
Internet-based communications and the most common protocol in use in local area networks as well.
You'll need to know how to connect to a remote program, to exchange data with that program, and
what to do when something goes wrong. To work with existing applications, such as Web servers,
you'll have to understand how the application-level protocols are built on top of TCP/IP, and how to
deal with common data exchange formats such as XML and MIME.
This book uses the Perl programming language to illustrate how to design and implement practical
network applications. Perl is an ideal language for network programming for a number of reasons.
First, like the rest of the language, Perl's networking facilities were designed to make the easy things
easy. It takes just two lines of code to open a network connection to a server somewhere on the
Internet and send it a message. A fully capable Web server can be written in a few dozen lines of
code.
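To make the "two lines" claim concrete, here is a minimal sketch using the standard IO::Socket::INET module. So that it runs without an Internet connection, it first creates a throwaway listener on the loopback address to stand in for the remote server; that listener, the address 127.0.0.1, and the message text are assumptions made for illustration, not part of any example in this book.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# A throwaway listener on the loopback interface stands in for the remote
# server. LocalPort => 0 asks the operating system for any free port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
) or die "Cannot listen: $!";
my $port = $server->sockport;

# The two lines the text refers to: open the connection, send a message.
my $client = IO::Socket::INET->new("127.0.0.1:$port") or die "Cannot connect: $!";
print $client "Hello, server\n";

# Server side: accept the pending connection and read one line from it.
my $session  = $server->accept or die "Cannot accept: $!";
my $received = <$session>;
print "Server received: $received";

close $_ for $client, $session, $server;
```

In a real client only the two marked lines are needed, with the host name and port of the actual server in place of the loopback address.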
Second, Perl's open architecture has encouraged many talented programmers to contribute to an
ever-expanding library of useful third-party modules. Many of these modules provide powerful
interfaces to common network applications. For example, after loading the LWP::Simple module, a
single function call allows you to fetch the contents of a remote Web page and store it in a variable.
Other third-party modules provide intuitive interfaces to e-mail, FTP, net news, and a variety of
network databases.
Perl also provides impressive portability. Most of the applications developed in this book will run
without modification on UNIX machines, Windows boxes, Macintoshes, VMS systems, and OS/2.
However, the most compelling reason to choose Perl for network application development is that it
allows you to fully exploit the power of TCP/IP. Perl provides you with full access to the same low-
level networking calls that are available to C programs and other natively compiled languages. You
can create multicast applications, implement multiplexed servers, and design peer-to-peer systems.
Using Perl, you can rapidly prototype new networking applications and develop interfaces to existing
ones. Should you ever need to write a networking application in C or Java, you'll be delighted to
discover how much of the Perl API carries over into these languages.
Roadmap
This book is organized into four main parts: Basics, Developing Clients for Common Services,
Developing TCP Client/Server Systems, and Advanced Topics.
Part I, Basics, introduces the fundamentals of TCP/IP network communications.
• Chapters 1 and 2, Input/Output Basics and Processes, Pipes, and Signals, review Perl's functions
and variables for input and output, discuss the exceptions that can occur during I/O operations,
and use the piped filehandle as the basis for introducing sockets. These chapters also review
Perl's process model, including signals and forking, and introduce Perl's object-oriented
extensions.
• Chapter 3, Introduction to Berkeley Sockets, covers the basics of Internet networking, including
IP addresses, network ports, and the principles of client/server applications. It then
turns to the Berkeley Socket API, which provides the programmer's interface to TCP/IP.
• Chapters 4 and 5, The TCP Protocol and The IO::Socket API and Simple TCP Applications,
show the basics of TCP, the networking protocol that provides reliable stream-oriented
communications. These chapters demonstrate how to create client and server applications and then
introduce examples that show the power of the technique as well as some common roadblocks.
Part II, Developing Clients for Common Services, looks at a collection of the best third-party modules
that developers have contributed to the Comprehensive Perl Archive Network (CPAN).
• Chapter 6, FTP and Telnet, introduces modules that provide access to the FTP file-sharing
service, as well as the flexible Net::Telnet module, which allows you to create clients to access
all sorts of network services.
• E-mail is still the dominant application on the Internet, and Chapter 7, SMTP: Sending Mail,
introduces half of the equation. This chapter shows you how to create e-mail messages on the
fly, including binary attachments, and send them to their destinations.
• Chapter 8, POP, IMAP, and NNTP: Processing Mail and Netnews, covers the other half of
e-mail, explaining modules that make it possible to receive mail from mail drop systems and
process their contents, including binary attachments.
• Chapter 9, Web Clients, discusses the LWP module, which provides everything you need to talk
to Web servers, download and process HTML documents, and parse XML.
Part III, Developing TCP Client/Server Systems (the longest part of the book), discusses the
alternatives for designing TCP-based client/server systems. The major example used in these
chapters is an interactive psychotherapist server, based on Joseph Weizenbaum's classic Eliza program.
• Chapter 10, Forking Servers and the inetd Daemon, covers the common type of TCP server that
forks a new process to handle each incoming connection. This chapter also covers the UNIX
and Windows inetd daemons, which allow programs not specifically designed for networking to
act as servers.
• Chapter 11, Multithreaded Applications, explains Perl's experimental multithreaded API, and
shows how it can greatly simplify the design of TCP clients and servers.
• Chapters 12 and 13, Multiplexed Operations and Nonblocking I/O, discuss the select() call,
which enables an application to process multiple I/O streams concurrently without using
multiprocessing or multithreading.
• Chapter 14, Bulletproofing Servers, discusses techniques for enhancing the reliability and
maintainability of network servers. Among the topics are logging, signal handling, and exceptions, as
well as the important topic of network security.
• Chapter 15, Preforking and Prethreading, presents enhancements to the forking and threading
models discussed in earlier chapters. These enhancements increase a server's ability to perform well
under heavy loads.
• Chapter 16, IO::Poll, discusses an alternative to select() available on UNIX platforms. This
module allows applications to multiplex multiple I/O streams using an API that some people find
more natural than select()'s.
Part IV, Advanced Topics, addresses techniques that are useful for specialized applications.
• Chapter 17, TCP Urgent Data, is devoted to TCP urgent or "out of band" data. This technique
is often used in highly interactive applications in which the user urgently needs to signal the
remote server.
• Chapters 18 and 19, The UDP Protocol and UDP Servers, introduce the User Datagram
Protocol, which provides a lightweight, message-oriented communications service. Chapter 18
introduces the protocol, and Chapter 19 shows how to design UDP servers. The major example
in this and the next two chapters is a live online chat and messaging system written entirely
in Perl.
• Chapters 20 and 21, Broadcasting and Multicasting, extend the UDP discussion by showing how
to build one-to-all and one-to-many message broadcasting systems. In these chapters we
extend the chat system to take advantage of automatic server discovery and multicasting.
• Chapter 22, UNIX-Domain Sockets, shows how to create lightweight communications channels
between processes on the same machine. This can be useful for specialized applications such
as loggers.
Cross-Platform Compatibility
More serious are the differences between implementations of Perl on various operating systems.
Perl started out on UNIX (and Linux) systems, but has been ported to many different operating
systems, including Microsoft Windows, the Macintosh, VMS, OS/2, Plan9, and others. In principle, a
script written for the Windows platform will run on UNIX or Macintosh without modifications.
The problem is that the I/O subsystem (the part of the system that manages input and output
operations) is the part that differs most dramatically from operating system to operating system. This
restricts the ability of Perl to make its I/O system completely portable. While Perl's basic I/O
functionality is identical from port to port, some of the more sophisticated operations are either missing
or behave significantly differently on non-UNIX platforms. This affects network programming, of
course, because networking is fundamentally about input and output.
In this book, Chapters 1 through 9 use generic networking calls that will run on all platforms. The
exceptions to this rule are the last example in Chapter 5, which calls fork(), a function that isn't
implemented on the Macintosh, and some of the introductory discussion in Chapter 2 of process
management on UNIX systems. The techniques discussed in these chapters are all you need for the
vast majority of client programs, and are sufficient to get a simple server up and running. Chapters
10 through 22 deal with more advanced topics in server design. The table here shows whether the
features in the chapters are supported by the UNIX, Windows, or Macintosh ports of Perl.
The nice thing is that the non-UNIX ports of Perl are improving rapidly, and there is a good chance
that new features will be available by the time you read this.
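A script that has to run on several ports of Perl can check for a feature such as fork() before relying on it. The following is a minimal sketch, assuming the standard Config module and its d_fork and d_pseudofork build settings; the have_fork() helper is a hypothetical name introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# %Config records how this particular perl was built. d_fork is 'define'
# when the platform has a native fork(); d_pseudofork is 'define' when the
# port emulates fork() instead.
sub have_fork {
    for my $key (qw(d_fork d_pseudofork)) {
        return 1 if defined $Config{$key} && $Config{$key} eq 'define';
    }
    return 0;
}

if ( have_fork() ) {
    print "fork() is available on $^O\n";
}
else {
    print "fork() is not available on $^O\n";
}
```

A server could use such a test to fall back to a single-process design on ports that lack fork().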
Installing Modules
Many of Perl's networking modules are preinstalled in the standard distribution. Others are third-
party modules that you must download and install from the Web. Most third-party modules are written
in pure Perl, but some, including several that are mentioned in this book, are written partly in C and
must be compiled before they can be used.
CPAN is a large Web-based collection of contributed Perl modules. You can get access to it via a
Web or FTP browser, or by using a command-line application built into Perl itself.
Once the archives are unpacked, you'll enter the newly created directory and give the perl
Makefile.PL, make, make test, and make install commands. These will build, test, and install the module.
% cd Digest-MD5-2.00
% perl Makefile.PL
Testing alignment requirements for U32...
Checking if your kit is complete...
Looks good
Writing Makefile for Digest::MD2
Writing Makefile for Digest::MD5
% make
mkdir ./blib
mkdir ./blib/lib
mkdir ./blib/lib/Digest
...
% make test
make[1]: Entering directory '/home/lstein/Digest-MD5-2.00/MD2'
make[1]: Leaving directory '/home/lstein/Digest-MD5-2.00/MD2'
PERL_DL_NONLAZY=1 /usr/local/bin/perl -I./blib/arch -I./blib/lib...
t/digest............ok
t/files.............ok
t/md5-aaa...........ok
t/md5...............ok
t/rfc2202...........ok
t/sha1..............skipping test on this platform
All tests successful.
Files=6, Tests=291, 1 secs ( 1.37 cusr 0.08 csys = 1.45 cpu)
% make install
make[1]: Entering directory '/home/lstein/Digest-MD5-2.00/MD2'
make[1]: Leaving directory '/home/lstein/Digest-MD5-2.00/MD2'
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.so
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.bs
...
On UNIX systems, you may need superuser privileges to perform the final step. If you don't have
such privileges, you can install the modules in your home directory. At the perl Makefile.PL step,
provide a PREFIX= argument with the path of your home directory. For example, assuming your
home directory can be found at /home/jdoe, you would type:
% perl Makefile.PL PREFIX=/home/jdoe
The rest of the install procedure is identical to what was shown earlier.
If you are using a custom install directory, you must tell Perl to look in this directory for installed
modules. One way to do this is to add the name of the directory to the environment variable
PERL5LIB. For example:
setenv PERL5LIB /home/jdoe # C shell
PERL5LIB=/home/jdoe; export PERL5LIB # bourne shell
Another way is to place the following line at the top of each script that uses an installed module.
use lib '/home/jdoe';
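To confirm that Perl will actually search a custom directory, you can print the module search path, which Perl keeps in the @INC array. The sketch below reuses the example directory /home/jdoe from the text. Depending on how a module was installed, its files may end up in a subdirectory of the prefix, so treat the exact path as an assumption to adjust for your own system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use lib '/home/jdoe';    # the example directory used in the text

# "use lib" takes effect at compile time and adds the directory to the
# front of @INC, the list of directories perl searches for modules.
my @matches = grep { $_ eq '/home/jdoe' } @INC;

print "Module search path:\n";
print "  $_\n" for @INC;
print "Custom directory is ", ( @matches ? "present" : "missing" ), "\n";
```

Directories named in PERL5LIB show up in the same list, so this script is also a quick way to check that the environment variable is set as intended.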
Online Documentation
In addition to books and Web sites, Network Programming with Perl refers to two major sources of
online information, Internet RFCs and Perl POD documentation.
Internet RFCs
The specifications of all the fundamental protocols of the Internet are described in a series of
Requests for Comments (RFCs) submitted to the Internet Engineering Task Force (IETF). These
documents are numbered sequentially. For example, RFC 1927, "Suggested Additional MIME Types for
Associating Documents," was the 1927th RFC submitted. Some of these RFCs eventually become
Internet Standards, in which case they are given sequentially numbered STD names. However,
most of them remain RFCs. Even though the RFCs are unofficial, they are the references that people
use to learn the details of networking protocols and to validate that a particular implementation is
correct.
The RFC archives are mirrored at many locations on the Internet, and maintained in searchable
form by several organizations. One of the best archives is maintained at http://www.faqs.org/rfcs/.
To retrieve an RFC from this site, go to the indicated page and type the number of the desired RFC
in the text field labeled "Display the document by number." The document will be delivered in a
minimally HTMLized form. This page also allows you to search for standards documents, and to
search the archive by keywords and phrases. If you prefer a text-only form, the
http://www.faqs.org site contains a link to their FTP site, where you can find and download the RFCs in
their original form.
This will give you a list of other POD pages that you can display.
For a quick summary of a particular Perl function, perldoc accepts the -f flag. For example, to see
a summary of the socket() function, type:
% perldoc -f socket
For Macintosh users, the MacPerl distribution comes with a "helper" application called shuck. This
adds POD viewing facilities to the MacPerl Help menu.
Acknowledgments
They say that the first skill an editor learns on the job is patience, but I think that Karen Gettman
was born with an excess of it. She must have caught on after the second or third time that when I
said "it should be done in just another week," I really was talking about months. Yet she never
betrayed any sign of dismay, even though I'm sure she was fighting an increasingly restive
production and marketing staff. To Karen, all I can say is "thank you!"
Thanks also to Mary Hart, the assistant editor responsible for my book. I have worked with Mary on
other projects, and I know that it is her tireless effort that makes publishing with Addison-Wesley
seem so frictionless.
I am extremely grateful to the technical reviewers who worked so diligently to keep me honest: Jon
Orwant, James Lee, Harry Hochheiser, Robert Kolstad, Sander Wahls, and Megan Conklin. The
book is very much better because of your efforts.
I owe a debt of gratitude to the long-suffering members of my laboratory—Ravi, David, Marco, Hong,
Guanming, Nathalie, and Peter; they have somehow managed to keep things moving forward even
during the last months of manuscript preparation, when my morning absences became increasingly
extended.
And of course I wish to thank my wife, Jean, who has stuck with me through several of these projects
already, and has never, ever, asked for the dining room table back.
Part 1. Basics
The five chapters that follow will provide the fundamental knowledge you need to write networking
applications in Perl using Berkeley sockets. They set the stage for later parts of the book that delve
more deeply into specific network problems and their solutions.
This chapter provides you with the background information you'll need to write TCP/IP applications
in Perl. We review Perl's input/output (I/O) system using the language's built-in function calls, and
then using the object-oriented (OO) extensions of Perl5. This will prepare you to use the object-
oriented constructions in later chapters.
Security
Security is an important aspect of network application development, because by definition a network
application allows a process running on a remote machine to affect its execution. Perl has some
features that increase the security of network applications relative to other languages. Because of
its dynamic memory management, Perl avoids the buffer overflows that lead to most of the security
holes in C and other compiled languages. Of equal importance, Perl implements a powerful "taint"
check system that prevents tainted data obtained from the network from being used in operations
such as opening files for writing and executing system commands, which could be dangerous.
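Taint checking is switched on with Perl's -T command-line flag. Once it is on, data from outside the program has to be validated before it can be used in a risky operation, and the usual way to do that is to extract the acceptable part with a regular expression. The sketch below shows that validation step on its own, so it runs with or without -T. The clean_filename() helper and its rule for what counts as a safe filename are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accept only a plain filename: it must start with a word character and may
# continue with word characters, dots, and hyphens. Path separators and
# shell metacharacters are rejected. Under taint mode, a value captured by
# a regular expression is treated as clean, so returning $1 is the
# conventional way to pass checked input on to risky operations.
sub clean_filename {
    my ($input) = @_;
    if ( defined $input && $input =~ /\A(\w[\w.-]*)\z/ ) {
        return $1;
    }
    return;    # undef in scalar context: the input was rejected
}

for my $candidate ( 'report.txt', '../etc/passwd', 'log; rm -rf /' ) {
    my $clean = clean_filename($candidate);
    printf "%-16s => %s\n", $candidate,
        defined $clean ? "accepted as $clean" : 'rejected';
}
```

The pattern is deliberately strict. It is safer to list what is allowed than to try to enumerate every dangerous character.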
Performance
A last issue is performance. Because Perl is an interpreted language, Perl applications run several times more
slowly than equivalent programs in C and other compiled languages, and about on par with Java and Python. In most
networking applications, however, raw performance is not the issue; the I/O bottleneck is. On I/O-bound
applications Perl runs just as fast (or as slowly) as a compiled program. In fact, it's possible for the
performance of a Perl script to exceed that of a compiled program. In benchmarks, a simple
Perl-based Web server that we develop in Chapter 12 performs several times better than the C-based Apache
Web server.
If execution speed does become an issue, Perl provides a facility for rewriting time-critical portions
of your application in C, using the XS extension system. Or you can treat Perl as a prototyping
language, and implement the real application in C or C++ after you've worked out the architectural
and protocol details.
The lgetl.pl script (for "line get local," Figure 1.1) reads the first line of a local file. Call it with the path
to the file you want to read, and it will print out the top line. For example, here's what I see when I
run the script on a file that contains a quote from James Hogan's "Giants' Star":
% lgetl.pl giants_star.txt
"Reintegration complete," ZORAC advised. "We're back in the universe."
This snippet illustrates the typographic conventions this book uses for terminal (command-line
interpreter) sessions. The "%" character is the prompt printed out by my command-line interpreter.
Bold-faced text is what I (the user) typed. Everything else is regular monospaced font.
The script itself is straightforward:
Lines 1–2: Load modules—We use() the IO::File module, which wraps an object-oriented in-
terface around Perl file operations.
Line 3: Process the command line argument—We shift() the filename off the command line
and store it in a variable named $file.
Line 4: Open the file—We call the IO::File->new() method to open the file, returning a
filehandle, which we store in $fh. Don't worry if the OO syntax is unfamiliar to you; we discuss
it more later in this chapter.
Lines 5–6: Read a line from the filehandle and print it—We use the <> operator to read a line of
text from the filehandle into the variable $line, which we immediately print.
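Figure 1.1 is not reproduced in this text. As a rough sketch of what the walkthrough describes, and not necessarily the book's exact listing, the heart of lgetl.pl might look like the following. The first_line() helper and the temporary demo file are assumptions added so that the example runs on its own; the script described above takes the path from the command line with shift() instead.

```perl
#!/usr/bin/perl
# Sketch of the lgetl.pl idea: print the first line of a local file.
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the book): open $file with IO::File
# and return its first line, end-of-line character included.
sub first_line {
    my $file = shift;
    my $fh   = IO::File->new($file) or die "Can't open $file: $!";
    my $line = <$fh>;
    return $line;
}

# Demo data so that the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "first line\nsecond line\n";
close $tmp_fh or die "Can't close $tmp_name: $!";

print first_line($tmp_name);
```

To turn this into a command-line tool, we would replace the demo section with a call such as print first_line(shift).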
Now we'll look at a very similar script named lgetr.pl (for "line get remote," Figure 1.2). It too fetches
and prints a line of text, but instead of reading from a local file, this one reads from a remote server.
Its command-line argument is the name of a remote host followed by a colon and the name of the
network service you want to access.
To read a line of text from the "daytime" service running on the FTP server wuarchive.wustl.edu,
we use an argument of "wuarchive.wustl.edu:daytime." This retrieves the current time of day at the
remote site:
% lgetr.pl wuarchive.wustl.edu:daytime
Tue Aug 8 06:49:20 2000
To read the welcome banner from the FTP service at the same site, we ask for
"wuarchive.wustl.edu:ftp":
% lgetr.pl wuarchive.wustl.edu:ftp
220 wuarchive.wustl.edu FTP server (Version wu-2.6.1(1) Thu Jul 13
21:24:09 CDT 2000) ready.
Or for a change of hosts, we can read the welcome banner from the SMTP (Internet mail) server
running at mail.hotmail.com like this:
% lgetr.pl mail.hotmail.com:smtp
220-HotMail (NO UCE) ESMTP server ready at Tue Aug 08 05:24:40 2000
Let's turn to the code for the lgetr.pl script in Figure 1.2.
Lines 1–2: Load modules—We use() the IO::Socket module, which provides an object-oriented
interface for network socket operations.
Line 3: Process the command line argument—We shift() the host and service name off the
command line and store it in a variable named $server.
Line 4: Open a socket—We call the IO::Socket::INET->new() method to create a "socket"
connected to the designated service running on the remote machine. IO::Socket::INET is a
filehandle class that is adapted for Internet-based communications. A socket is just a specialized
form of filehandle, and can be used interchangeably with other types of filehandles in I/O
operations.
Lines 5–6: Read a line from the socket and print it—We use the <> operator to read a line of
text from the socket into the variable $line, which we immediately print.
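Figure 1.2 is likewise missing from this text. The sketch below follows the walkthrough but is not the book's listing: first_line_from() is a hypothetical helper, and start_banner_server() is demo scaffolding that stands in for a real remote service so that the example runs without network access. Only the IO::Socket::INET->new() call and the <> read correspond to the steps described above.

```perl
#!/usr/bin/perl
# Sketch of the lgetr.pl idea: print the first line sent by a network service.
use strict;
use warnings;
use IO::Socket;
use POSIX ();

# Hypothetical helper (not from the book): connect to "host:service"
# and return the first line that the server sends.
sub first_line_from {
    my $server = shift;
    my $fh = IO::Socket::INET->new($server)
        or die "Can't connect to $server: $@";
    my $line = <$fh>;
    return $line;
}

# Demo scaffolding only: a one-shot server on the loopback interface that
# sends $banner to its first client. Returns the port and the child's PID.
sub start_banner_server {
    my $banner   = shift;
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,          # let the system pick a free port
        Listen    => 1,
        Proto     => 'tcp',
    ) or die "Can't listen: $@";
    my $port = $listener->sockport;
    my $pid  = fork();
    die "Can't fork: $!" unless defined $pid;
    if ($pid == 0) {             # child: serve one client, then exit
        if (my $client = $listener->accept) {
            print $client $banner;
            close $client;
        }
        POSIX::_exit(0);
    }
    close $listener;             # parent keeps only the port number
    return ($port, $pid);
}

my ($port, $pid) = start_banner_server("220 demo server ready\r\n");
print first_line_from("127.0.0.1:$port");
waitpid($pid, 0);
```

Against a real server, the argument would be a string such as "wuarchive.wustl.edu:daytime", exactly as in the sessions shown earlier.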
Feel free to try the lgetr.pl script on your favorite servers. In addition to the services used in the
examples above, other services to try include "nntp," the Netnews transfer protocol, "chargen," a
test character generator, and "pop3," a protocol for retrieving mail messages. If the script appears
to hang indefinitely, you've probably contacted a service that requires the client to send the first line
of text, such as an HTTP (Web) server. Just interrupt the script and try a different service name.
Although lgetr.pl doesn't do all that much, it is useful in its own right. You can use it to check the
time on a remote machine, or wrap it in a shell script to check the time synchronization of all the
servers on your network. You could use it to generate a summary of the machines on your network
that are running an SMTP mail server and the software they're using.
Notice the similarity between the two scripts. Simply by changing IO::File->new() to
IO::Socket::INET->new(), we have created a fully functional network client. Such is the power
of Perl.
Filehandles
Filehandles are the foundation of networked applications. In this section we review the ins and outs
of filehandles. Even if you're an experienced Perl programmer, you might want to scan this section
to refresh your memory on some of the more obscure aspects of Perl I/O.
Standard Filehandles
A filehandle connects a Perl script to the outside world. Reading from a filehandle brings in outside
data, and writing to one exports data. Depending on how it was created, a filehandle may be con-
nected to a disk file, to a hardware device such as a serial port, to a local process such as a com-
mand-line window in a windowing system, or to a remote process such as a network server. It's also
possible for a filehandle to be connected to a "bit bucket" device that just sucks up data and ignores
it.
A filehandle is any valid Perl identifier that consists of uppercase and lowercase letters, digits, and
the underscore character. Unlike other variables, a filehandle does not have a distinctive prefix (like
"$"). So to make them distinct, Perl programmers often represent them in all capital letters, or caps.
When a Perl script starts, exactly three filehandles are open by default: STDOUT, STDIN, and
STDERR. The STDOUT filehandle, for "standard output," is the default filehandle for output. Data sent
to this filehandle appears on the user's preferred output device, usually the command-line window
from which the script was launched. STDIN, for "standard input," is the default input filehandle. Data
read from this filehandle is taken from the user's preferred input device, usually the keyboard.
STDERR ("standard error") is used for error messages, diagnostics, debugging, and other such in-
cidental output. By default STDERR uses the same output device as STDOUT, but this can be changed
at the user's discretion. The reason that there are separate filehandles for normal and abnormal
output is so that the user can divert them independently; for example, to send normal output to a
file and error output to the screen.
This code fragment will read a line of input from STDIN, remove the terminating end-of-line character
with the chomp() function, and echo it to standard output:
$input = <STDIN>;
chomp($input);
print STDOUT "If I heard you correctly, you said: $input\n";
By taking advantage of the fact that STDIN and STDOUT are the defaults for many I/O operations,
and by combining chomp() with the input operation, the same code could be written more succinctly
like this:
chomp($input = <>);
print "If I heard you correctly, you said: $input\n";
We review the <> and print() functions in the next section. Similarly, STDERR is the default des-
tination for the warn() and die() functions.
The user can change the attachment of the three standard filehandles before launching the script.
On UNIX and Windows systems, this is done using the redirect metacharacters "<" and ">". For
example, given a script named muncher.pl this command will change the script's standard input so
that it comes from the file data.txt, and its standard output so that processed data ends up in
crunched.txt:
% muncher.pl <data.txt >crunched.txt
Standard error isn't changed, so diagnostic messages (e.g., from the built-in warn() and die()
functions) appear on the screen.
On Macintosh systems, users can change the source of the three standard filehandles by selecting
filenames from a dialog box within the MacPerl development environment.
$line = <FILEHANDLE>
@lines = <FILEHANDLE>
$line = <>
@lines = <>
The <> ("angle bracket") operator is sensitive to the context in which it is called. If it is used to assign to a scalar
variable, a so-called scalar context, it reads a line of text from the indicated filehandle, returning the data along
with its terminating end-of-line character. After reading the last line of the filehandle, <> will return undef,
signaling the end-of-file (EOF) condition.
When <> is assigned to an array or used in another place where Perl ordinarily expects a list, it reads all lines
from the filehandle through to EOF, returning them as one (potentially gigantic) list. This is called a list context.
When <> is used by itself as the condition of a while() loop (i.e., without being assigned to a variable), it
copies each line into the $_ global variable. This idiom is often combined with pattern matches and other
operations that use $_ implicitly:
while (<>) {
print "Found a gnu\n" if /GNU/i;
}
The <FILEHANDLE> form of this function explicitly gives the filehandle to read from. However, the <> form is
"magical." If the script was called with a set of file names as command-line arguments, <> will attempt to
open() each argument in turn and will then return lines from them as if they were concatenated into one large
pseudofile.
If no files are given on the command line, or if a single file named "-" is given, then <> reads from standard
input and is equivalent to <STDIN>. See the perlop POD documentation for an explanation of how this works
(perldoc perlop, as explained in the Preface).
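To make the scalar and list behaviors concrete, here is a small sketch. It reads from an in-memory filehandle (available in Perl 5.8 and later) instead of a real file; the helper name read_in_both_contexts() and the sample text are our own inventions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read $text through an in-memory filehandle,
# which stands in here for a file or socket.
sub read_in_both_contexts {
    my $text = shift;
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    my $first = <$fh>;     # scalar context: one line, newline attached
    my @rest  = <$fh>;     # list context: every remaining line
    my $after = <$fh>;     # already at EOF: undef
    close $fh;
    return ($first, \@rest, $after);
}

my ($first, $rest, $after) = read_in_both_contexts("alpha\nbeta\ngamma\n");
print "first line: $first";
print "lines left: ", scalar(@$rest), "\n";
print "after EOF:  ", (defined $after ? "defined" : "undef"), "\n";
```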
$bytes = read (FILEHANDLE,$buffer,$length [,$offset])
$bytes = sysread (FILEHANDLE,$buffer,$length [,$offset])
The read() and sysread() functions read data of arbitrary length from the indicated filehandle. Up to
$length bytes of data will be read, and placed in the $buffer scalar variable. Both functions return the
number of bytes actually read, numeric 0 on the end of file, or undef on an error.
This code fragment will attempt to read 50 bytes of data from STDIN, placing the information in $buffer, and
assigning the number of bytes read to $bytes:
my $buffer;
$bytes = read (STDIN,$buffer,50);
By default, the read data will be placed at the beginning of $buffer, overwriting whatever was already there.
You can change this behavior by providing the optional numeric $offset argument, to specify that read data
should be written into the variable starting at the specified position.
The main difference between read() and sysread() is that read() uses standard I/O buffering, and
sysread() does not. This means that read() will not return until either it can fetch the exact number of bytes
requested or it hits the end of file. The sysread() function, in contrast, can return partial reads. It is guaranteed
to return at least 1 byte, but if it cannot immediately read the number of bytes requested from the filehandle, it
will return what it can. This behavior is discussed in more detail later in the Buffering and Blocking section.
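Because sysread() can return fewer bytes than requested, code that needs a fixed-size block usually calls it in a loop. The following is a minimal sketch of that pattern, using the $offset argument described above; read_exactly() is a hypothetical helper name, and the temporary file merely stands in for a slower source such as a socket.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: keep calling sysread() until $want bytes have
# arrived or the end of file is reached. Returns whatever was read.
sub read_exactly {
    my ($fh, $want) = @_;
    my $buffer = '';
    while (length($buffer) < $want) {
        my $bytes = sysread($fh, $buffer, $want - length($buffer),
                            length($buffer));
        die "sysread error: $!" unless defined $bytes;
        last if $bytes == 0;                 # end of file
    }
    return $buffer;
}

# Demo data so that the sketch is self-contained.
my ($tmp, $name) = tempfile(UNLINK => 1);
print $tmp "0123456789";
close $tmp or die "Can't close $name: $!";

open(my $in, '<', $name) or die "Can't open $name: $!";
print read_exactly($in, 4), "\n";      # 0123
print read_exactly($in, 100), "\n";    # 456789 (EOF reached early)
close $in;
```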
$result = print FILEHANDLE $data1,$data2,$data3...
When reading data as a byte stream with read() or sysread(), a common idiom is to pass
length($buffer) as the offset into the buffer. This will make read() append the new data to
the end of data that was already in the buffer. For example:
my $buffer;
while (1) {
$bytes = read (STDIN,$buffer,50,length($buffer));
last unless $bytes > 0;
}
The EOF condition is signaled differently depending on whether you are reading from the filehandle
one line at a time or as a byte stream. For byte-stream operations with read() or sysread(),
EOF is indicated when the function returns numeric 0. Other I/O errors return undef and set $! to
the appropriate error message. To distinguish between an error and a normal end of file, you can
test the return value with defined():
while (1) {
my $bytes = read(STDIN,$buffer,100);
die "read error" unless defined ($bytes);
last unless $bytes > 0;
}
In contrast, the <> operator doesn't distinguish between EOF and abnormal conditions, and returns
undef in either case. To distinguish them, you can set $! to 0 before performing a series of
reads, and check whether it is nonzero afterward:
$! = 0;
while (defined(my $line = <STDIN>)) {
$data .= $line;
}
die "Abnormal read error: $!" if $!;
When you are using <> inside the conditional of a while() loop, as shown in the most recent code
fragment, you can dispense with the explicit defined() test. This makes the loop easier on the
eyes:
while (my $line = <STDIN>) {
$data .= $line;
}
This will work even if the line consists of a single 0 or an empty string, which Perl would ordinarily
treat as false. Outside while() loops, be careful to use defined() to test the returned value for
EOF.
Finally, there is the eof() function, which explicitly tests a filehandle for the EOF condition:
$eof = eof(FILEHANDLE)
The eof() function returns true if the next read on FILEHANDLE will return an EOF. Called without arguments
or parentheses, as in eof, the function tests the last filehandle read from.
When using while(<>) to read from the command-line arguments as a single pseudofile, eof() has "mag-
ical"—or at least confusing—properties. Called with empty parentheses, as in eof(), the function returns true
at the end of the very last file. Called without parentheses or arguments, as in eof, the function returns true
at the end of each of the individual files on the command line. See the Perl POD documentation for examples
of the circumstances in which this behavior is useful.
In practice, you do not have to use eof() except in very special circumstances, and a reliance on
it is often a signal that something is amiss in the structure of your program.
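For the rare cases in which eof() is needed, the following sketch contrasts the two forms when reading through <>. The helper names make_file() and scan_files() are hypothetical, and the temporary files stand in for files named on the command line. Note the last statement: once eof() has reported the end of the final file, a further <> would start reading standard input.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo scaffolding: write $content to a temporary file and return its name.
sub make_file {
    my $content = shift;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $content;
    close $fh or die "Can't close $name: $!";
    return $name;
}

# Hypothetical helper: read the named files through <> and record where
# eof and eof() report true.
sub scan_files {
    local @ARGV = @_;
    my @events;
    while (my $line = <>) {
        chomp $line;
        push @events, "line $line";
        push @events, "end of $ARGV" if eof;   # end of each individual file
        if (eof()) {                           # end of the very last file
            push @events, "end of all input";
            last;    # otherwise the next <> would fall back to STDIN
        }
    }
    return @events;
}

my @names = (make_file("a\nb\n"), make_file("c\n"));
print "$_\n" for scan_files(@names);
```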
of-line character(s) contained in $/, and return the line of text with the end-of-line sequence still
attached. The chomp() function looks for the end-of-line sequence at the end of a text string and
removes it, respecting the current value of $/.
The string escape \n is the logical newline character, and means different things on different
platforms. For example, \n is equivalent to \012 on UNIX systems, and to \015 on Macintoshes. (On
Windows systems, \n is usually \012, but see the later discussion of DOS text mode.) In a similar
vein, \r is the logical carriage return character, which also varies from system to system.
When communicating with a line-oriented network server that uses CRLF to terminate lines, it won't
be portable to set $/ to \r\n. Use the explicit string \015\012 instead. To make this less obscure,
the Socket and IO::Socket modules, which we discuss in great detail later, have an option to export
globals named $CRLF and CRLF() that return the correct values.
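As a brief illustration, the sketch below imports $CRLF from the Socket module with the :crlf export tag and uses it as the input record separator. The crlf_lines() helper and the sample server responses are made up for the example, and an in-memory filehandle takes the place of a socket.

```perl
use strict;
use warnings;
use Socket qw(:crlf);     # exports $CRLF, CRLF(), and friends

# Hypothetical helper: split CRLF-terminated data into lines, with the
# terminators removed.
sub crlf_lines {
    my $data = shift;
    open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
    local $/ = $CRLF;     # <> and chomp() now work in terms of \015\012
    my @lines;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

print "[$_]\n" for crlf_lines("220 ready${CRLF}250 ok${CRLF}");
```

Using local confines the change to $/ to the helper, so line-oriented reads elsewhere in the program are unaffected.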
There is an additional complication when performing line-oriented I/O on Microsoft Windows and
DOS machines. For historical reasons, Windows/DOS distinguishes between filehandles in "text
mode" and those in "binary mode." In binary mode, what you see is exactly what you get. When you
print to a binary filehandle, the data is output exactly as you specified. Similarly, read operations
return the data exactly as it was stored in the file.
In text mode, however, the standard I/O library automatically translates LF into CRLF pairs on the
way out, and CRLF pairs into LF on the way in. The virtue of this is that it makes text operations on
Windows and UNIX Perls look the same—from the programmer's point of view, the DOS text files
end in a single \n character, just as they do in UNIX. The problem one runs into is when reading or
writing binary files—such as images or indexed databases—and the files become mysteriously cor-
rupted on input or output. This is due to the default line-end translation. Should this happen to you,
you should turn off character translation by calling binmode() on the filehandle.
Another way to avoid confusion over text and binary mode is to use the sysread() and
syswrite() functions, which bypass the character translation routines in the standard I/O library.
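Here is a minimal sketch of the binmode() approach. It writes a few bytes that include CR and LF characters to a temporary file and reads them back, calling binmode() on both filehandles. The helper name roundtrip_binary() and the sample bytes are assumptions made for illustration; on UNIX the calls are harmless no-ops, so the same code can run on either platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $bytes to a temporary file and read them
# back, with line-end translation turned off in both directions.
sub roundtrip_binary {
    my $bytes = shift;
    my ($out, $name) = tempfile(UNLINK => 1);
    binmode($out);
    print $out $bytes;
    close $out or die "Can't close $name: $!";

    open(my $in, '<', $name) or die "Can't open $name: $!";
    binmode($in);
    local $/;                 # undefined $/ reads the whole file at once
    my $copy = <$in>;
    close $in;
    return $copy;
}

my $data = "\x89PNG\015\012\032\012";
my $copy = roundtrip_binary($data);
print $copy eq $data ? "bytes preserved\n" : "bytes altered\n";
```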
A whole bevy of special global variables control other aspects of line-oriented I/O, such as whether
to append an end-of-line character automatically to data output with the print() statement, and
whether multiple data values should be separated by a delimiter. See Appendix B for a brief sum-
mary.
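Two of these globals are $, (the output field separator) and $\ (the output record separator). The sketch below sets both for a single print() call; the csv_line() helper is hypothetical, and the output goes to an in-memory filehandle so that the result can be returned as a string.

```perl
use strict;
use warnings;

# Hypothetical helper: join @fields with commas and end the line with a
# newline, letting print() do the work through $, and $\.
sub csv_line {
    my @fields = @_;
    my $out = '';
    open(my $fh, '>', \$out) or die "Can't open in-memory file: $!";
    {
        local $, = ',';       # printed between the arguments of print()
        local $\ = "\n";      # printed after the last argument
        print $fh @fields;
    }
    close $fh;
    return $out;
}

print csv_line('a', 'b', 'c');    # prints "a,b,c" followed by a newline
```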
We call open() with two arguments: a filehandle name and the name of the file we wish to open.
The filehandle name is any valid Perl identifier consisting of any combination of uppercase and
lowercase letters, digits, and the underscore character. To make them distinct, most Perl program-
mers choose all uppercase letters for filehandles. The " > " symbol in front of the filename tells Perl
to overwrite the file's contents if it already exists, or to create the file if it doesn't. The file will then
be opened for writing.
If open() succeeds, it returns a true value. Otherwise, it returns false, causing Perl to evaluate the
expression to the right of the or operator. This expression simply dies with an error message, using
Perl's $! global variable to retrieve the last system error message encountered.
We call print() twice to write some text to the filehandle. The first argument to print() is the
filehandle, and the second and subsequent arguments are strings to write to the filehandle. Again,
notice that there is no comma between the filehandle and the strings to print. Whatever is printed
to a filehandle shows up in its corresponding file. If the filehandle argument to print() is omitted,
it defaults to STDOUT.
After we have finished printing, we call close() to close the filehandle. close() returns a true
value if the filehandle was closed uneventfully, or false if some untoward event, such as a disk filling
up, occurred. We check this result code using the same type of or test we used earlier.
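The listing that this walkthrough describes does not appear in the text above. A minimal sketch consistent with the description might look like the following; the filehandle name FH, the file name message.txt, and the two strings are assumptions, and the code is wrapped in a hypothetical write_message() helper that writes into a temporary directory so that it can be run safely.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create (or overwrite) the file at $path and write
# two lines to it, checking open() and close() as described above.
sub write_message {
    my $path = shift;
    open (FH,">$path") or die "Can't open $path: $!";
    print FH "This is the first line.\n";
    print FH "And this is the second.\n";
    close (FH) or die "Can't close $path: $!";
    return $path;
}

my $dir = tempdir(CLEANUP => 1);
write_message("$dir/message.txt");
print "wrote $dir/message.txt\n";
```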
Let's look at open() and close() in more detail.
$success = open(FILEHANDLE,$path)
$success = open(FILEHANDLE,$mode,$path)
The open() call opens the file given in $path, associating it with a designated FILEHANDLE. There are both
two- and three-argument versions of open(). In the three-argument version, which is available in Perl versions
5.6 and higher, a $mode argument specifies how the file is to be opened. $mode is a one- or two-character
string chosen to be reminiscent of the I/O redirection operators in the UNIX and DOS shells. Choices are shown
here.
Mode  Description
<     Open file for reading
>     Truncate file to zero length and open for writing
>>    Open file for appending, do not truncate
+>    Truncate file and then open for read/write
+<    Open file for read/write, do not truncate
We can open the file named darkstar.txt for reading and associate it with the filehandle DARKFH like
this:
open(DARKFH,'<','darkstar.txt');
In the two-argument form of open(), the mode is appended directly to the filename, as in:
open(DARKFH,'<darkstar.txt');
For readability, you can put any amount of whitespace between the mode symbol and the filename;
it will be ignored. If you leave out the mode symbol, the file will be opened for reading. Hence the
above examples are all equivalent to this:
open(DARKFH,'darkstar.txt');
If successful, open() will return a true value. Otherwise it returns false. In the latter case, the $!
global will contain a human-readable message indicating the cause of the error.
$success = close(FH);
The close() function closes a previously opened file, returning true if successful, or false otherwise. In the
case of an error, the error message can again be found in $!.
When your program exits, any filehandles that are still open will be closed automatically.
The three-argument form of open() is used only rarely. However, it has the virtue of not scanning
the filename for special characters the way that the two-argument form does. This lets you open
files whose names contain leading or trailing whitespace, ">" characters, and other weird and arbi-
trary data. The filename "-" is special. When opened for reading, it tells Perl to open standard input.
When opened for writing, it tells Perl to open standard output.
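The following sketch shows the three-argument form coping with an awkward file name. The write_then_read() helper and the file name are invented for illustration, and the example assumes a UNIX-like filesystem, since Windows does not allow ">" in file names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write $text to $path and read the first line back,
# using three-argument open() so that $path is taken literally.
sub write_then_read {
    my ($path, $text) = @_;
    open(OUT, '>', $path) or die "Can't write '$path': $!";
    print OUT $text;
    close(OUT) or die "Can't close '$path': $!";
    open(IN, '<', $path) or die "Can't read '$path': $!";
    my $line = <IN>;
    close(IN);
    return $line;
}

my $dir = tempdir(CLEANUP => 1);
my $odd = "$dir/ >notes.txt ";    # leading space, a ">", and a trailing space
print write_then_read($odd, "still readable\n");
```

With the two-argument form, the trailing space in this name would be stripped before the file was opened, so a different file would be created or looked up.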
If you call open() on a filehandle that is already open, it will be automatically closed and then
reopened on the file that you specify. Among other things, this call can be used to reopen one of
the three standard filehandles on the file of your choice, changing the default source or destination
of the <>, print(), and warn() functions. We will see an example of this shortly.
As with the print() function, many programmers drop the parentheses around open() and
close(). For example, this is the most common idiom for opening a file:
open DARKSTAR,"darkstar.txt" or die "Couldn't open darkstar.txt: $!";
I don't like this style much because it leads to visual ambiguity (does the or associate with the string
"darkstar.txt" or with the open() function?). However, I do use this style with close(),
print(), and return() because of their ubiquity.
The two-argument form of open() has a lot of magic associated with it (too much magic, some
would say). The full list of magic behavior can be found in the perlfunc and perlopentut POD
documentation. However, one trick is worth noting because we use it in later chapters. You can
duplicate a filehandle by using it as the second argument to open() with the sequence >& or <&
prepended to the beginning. >& duplicates filehandles used for writing, and <& duplicates those used
for reading:
open (OUTCOPY,">&STDOUT");
open (INCOPY,"<&STDIN");
This example creates a new filehandle named OUTCOPY that is attached to the same device as
STDOUT. You can now write to OUTCOPY and it will have the same effect as writing to STDOUT. This
is useful when you want to replace one or more of the three standard filehandles temporarily, and
restore them later. For example, this code fragment will temporarily reopen STDOUT onto a file,
invoke the system date command (using the system() function, which we discuss in more detail
in Chapter 2), and then restore the previous value of STDOUT. When date runs, its standard output
is opened on the file, and its output appears there rather than in the command window:
#!/usr/bin/perl
# file: redirect.pl
open (STDOUT,">&SAVEOUT");
print "STDOUT restored\n";
Notice how the second print() statement and the output of the date system command went to
the file rather than to the screen because we had reopened STDOUT at that point. When we restored
STDOUT from the copy saved in SAVEOUT, our ability to print to the terminal was restored.
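Only the final restore step of redirect.pl appears in the listing above. The sketch below fills in the whole save, redirect, and restore sequence as the text describes it, but it is our reconstruction and not the book's listing. The helper name run_redirected() is hypothetical, the output file is a temporary file, and a child Perl process is substituted for the date command so that the example does not depend on the platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: send our own output and a child's output to $path
# for a while, then put STDOUT back where it was.
sub run_redirected {
    my $path = shift;
    print "Redirecting STDOUT\n";
    open(SAVEOUT, ">&STDOUT") or die "Can't save STDOUT: $!";
    open(STDOUT, '>', $path)  or die "Can't reopen STDOUT: $!";

    print "STDOUT is redirected\n";                       # goes to the file
    system($^X, '-e', 'print "hello from the child\n"');  # so does this

    open(STDOUT, ">&SAVEOUT") or die "Can't restore STDOUT: $!";
    close(SAVEOUT);
    print "STDOUT restored\n";
}

my ($tmp, $name) = tempfile(UNLINK => 1);
close $tmp;
run_redirected($name);
```

The child process inherits the redirected standard output, which is why its line ends up in the file alongside ours.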
Perl also provides an alternative API for opening files that avoids the magic and obscure syntax of
open() altogether. The sysopen() function allows you to open files using the same syntax as the
C library's open() function.
The $mode argument used in sysopen() is different from the mode used in ordinary open().
Instead of being a set of characters, it is a numeric bitmask formed by ORing together one or more
constants using the bitwise OR operator " | ". For example, the following snippet opens up a file for
writing using a mode that causes it to be created if it doesn't exist, and truncated to length zero if it
does (equivalent to open()'s " > " mode):
sysopen(FH, "darkstar.txt",O_WRONLY|O_CREAT|O_TRUNC)
or die "Can't open: $!";
The standard Fcntl module exports the constants recognized by sysopen(), all of which begin
with the prefix O_. Just use Fcntl at the top of your script to gain access to them.
The mode constants useful for sysopen() are listed in Table 1.1. Each call to sysopen() must
have one (and only one) of O_RDONLY, O_WRONLY, and O_RDWR. The O_WRONLY and O_RDWR
constants may be ORed with one or more of O_CREAT, O_EXCL, O_TRUNC, or O_APPEND.
O_CREAT causes the file to be created if it doesn't already exist. If it isn't specified, and the file
doesn't exist when you try to open it for writing, then sysopen() will fail.
Combining O_CREAT with O_EXCL leads to the useful behavior of creating the file if it doesn't already
exist, but failing if it does. This can be used to avoid accidentally clobbering an existing file.
If O_TRUNC is specified, then the file is truncated to zero length before the first write, effectively
overwriting the previous contents. O_APPEND has the opposite effect, positioning the write pointer
to the end of the file so that everything written to the file is appended to its existing contents.
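As an illustration of the O_CREAT and O_EXCL combination, the sketch below tries to create the same file twice. The helper name create_exclusively() and the file name are hypothetical; the second attempt is expected to fail because the file already exists.

```perl
use strict;
use warnings;
use Fcntl;                          # exports the O_ constants
use File::Temp qw(tempdir);

# Hypothetical helper: create $path and write $text to it, but refuse to
# touch a file that already exists. Returns true on success.
sub create_exclusively {
    my ($path, $text) = @_;
    sysopen(FH, $path, O_WRONLY|O_CREAT|O_EXCL) or return 0;
    print FH $text;
    close(FH) or die "Can't close $path: $!";
    return 1;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/report.txt";
print create_exclusively($path, "first\n")  ? "created\n" : "refused: $!\n";
print create_exclusively($path, "second\n") ? "created\n" : "refused: $!\n";
```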
Table 1.1. sysopen() Mode Constants
Constant     Description
O_RDONLY     Open read only.
O_WRONLY     Open write only.
O_RDWR       Open read/write.
O_CREAT      Create file if it doesn't exist.
O_EXCL       When combined with O_CREAT, create file if it doesn't exist and fail if it does.
O_TRUNC      If file already exists, truncate it to zero length.
O_APPEND     Open file in append mode (equivalent to open()'s " >> ").
O_NOCTTY     If the file is a terminal device, open it without allowing it to become the process's controlling terminal.
O_NONBLOCK   Open file in nonblocking mode.
O_SYNC       Open file for synchronous mode, so that all writes block until the data is physically written.
The O_NOCTTY, O_NONBLOCK, and O_SYNC modes all have specialized uses that are discussed in
later chapters.
If sysopen() needs to create a file, the $perm argument specifies the permissions mode of the
resulting file. File permissions are a UNIX concept that maps imperfectly onto the Windows/DOS
world, and not at all onto the Macintosh world. $perm is an octal value, such as 0644 (which happens to
specify read/write permissions for the owner of the file, and read-only permissions for others).
If $perm is not provided, sysopen() defaults to 0666, which grants read/write access to everyone.
However, whether you specify the permissions or accept the default, the actual permissions of the
created file are determined by performing a bitwise AND between the $perm argument and the
complement of the user's umask (another UNIX concept); that is, every permission bit that is set in
the umask is cleared in the new file. The umask is often set, at the user's discretion, to forbid
access to the file from outside the user's account or group.
In most circumstances, it is best to omit the permissions argument and let the user adjust the
umask. This also increases the portability of the program. See the umask() entry in the perlfunc
POD documentation for information on how you can examine and set the umask programmatically.
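The interaction between $perm and the umask can be checked with a short experiment. The sketch below assumes a UNIX-like system; created_mode() is a hypothetical helper that temporarily sets the umask, creates a file with sysopen(), and reports the permission bits that the file actually received.

```perl
use strict;
use warnings;
use Fcntl;
use File::Temp qw(tempdir);

# Hypothetical helper: create a file with the requested $perm while the
# umask is $mask, and return the file's actual permission bits.
sub created_mode {
    my ($perm, $mask) = @_;
    my $dir  = tempdir(CLEANUP => 1);
    my $path = "$dir/perm_demo.txt";
    my $old  = umask($mask);
    my $ok   = sysopen(FH, $path, O_WRONLY|O_CREAT|O_EXCL, $perm);
    my $err  = $!;
    umask($old);                    # always put the umask back
    die "Can't create $path: $err" unless $ok;
    close(FH);
    return (stat($path))[2] & 07777;
}

# With a umask of 022, a request for 0666 yields 0666 & ~022.
printf "%04o\n", created_mode(0666, 022);
```

On a typical UNIX filesystem this prints 0644: the write bits for group and others, which are set in the umask, have been cleared.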
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
Prepared for [email protected], Stjepan Maric
Copyright © 2001 by Addison-Wesley. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from
the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.
Figure 1.3. Buffers help solve the mismatch between computation speed and I/O speed
Similarly, for input operations, the operating system receives data from active input devices (the
keyboard, disk drive, network card) and writes that data to an input buffer somewhere in memory.
The data remains in the input buffer until your program calls <> or read(), at which point the data
is copied from the operating system's buffer into the memory space corresponding to a variable in
your program.
The advantage of buffering is significant, particularly if your program performs I/O in a "bursty" way;
that is, it performs numerous reads and writes of unpredictable size and timing. Instead of waiting
for each operation to complete at the hardware level, the data is safely buffered in the operating
system and "flushed"—passed on to the output device—whenever the downstream hardware can
accept it.
The buffers in Figure 1.3 are conceptually circular FIFO (first in first out) data structures. When data
is written beyond the end of the buffer memory area, the operating system merely begins writing
new data at the beginning. The operating system maintains two pointers in each of its I/O buffers.
The write pointer is the place where new data enters the buffer. The read pointer is the position from
which stored data is moved out of the buffer to its next destination. For example, on write operations,
each print() you perform adds some data to the output buffer and advances the write pointer
forward. The operating system reads older data starting at the read pointer and copies it to the low-level
hardware device.
The size of the I/O buffer minus the distance between the write pointer and the read pointer is the
amount of free space remaining. If your program is writing faster than the output hardware can
receive it, then the buffer will eventually fill up and the buffer's free space will be zero. What happens
then?
Because there is no room for new data in the buffer, the output operation cannot succeed immediately.
As a result, the write operation blocks. Your program will be suspended at the blocked
print() or syswrite() for an indefinite period of time. When the backlog in the output buffer
clears and there is again room to receive new data, the output operation will unblock and print() or
syswrite() will return.
In a similar fashion, reads will block when the input buffer is empty; that is, they block when the amount
of free space is equal to the size of the buffer. In this case, calls to read() or sysread() will block
until some new data has entered the buffer and there is something to be read.
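The read pointer, write pointer, and free-space bookkeeping described above can be modeled with a toy circular FIFO. This is only a sketch of the concept: the function names are made up for illustration, real operating system buffers are not exposed this way, and instead of blocking, the functions simply report that the operation would block.

```perl
use strict;
use warnings;

# A toy circular FIFO; the names here are illustrative, not an OS API.
sub new_ring {
    my $size = shift;
    return { buf => [ (undef) x $size ], size => $size,
             read => 0, write => 0, used => 0 };
}

# Free space is the buffer size minus the amount of unread data.
sub free_space { my $r = shift; return $r->{size} - $r->{used} }

# Returns 1 on success, 0 if the write "would block" (buffer full).
sub ring_put {
    my ($r, $byte) = @_;
    return 0 if free_space($r) == 0;
    $r->{buf}[ $r->{write} ] = $byte;
    $r->{write} = ($r->{write} + 1) % $r->{size};   # wrap around at the end
    $r->{used}++;
    return 1;
}

# Returns the oldest byte, or undef if the read "would block" (buffer empty).
sub ring_get {
    my $r = shift;
    return undef if $r->{used} == 0;
    my $byte = $r->{buf}[ $r->{read} ];
    $r->{read} = ($r->{read} + 1) % $r->{size};
    $r->{used}--;
    return $byte;
}

my $ring = new_ring(4);
ring_put($ring, $_) for qw(a b c d);
print "buffer full: a real write would block here\n" unless ring_put($ring, 'e');
print "oldest byte out first: ", ring_get($ring), "\n";
print "free space is now ", free_space($ring), "\n";
```

The sketch keeps an explicit count of unread bytes rather than deriving it from the two pointers, which avoids the ambiguity between a completely full and a completely empty buffer when the pointers coincide.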
Blocking is often the behavior you want, but sometimes you need more control over I/O. There are
several techniques to manage blocking. One technique, discussed in Chapter 2 under Timing Out
System Calls, uses signals to abort an I/O operation prematurely if it takes too long. Another technique,
discussed in Chapter 12, uses the four-argument select() system call to test a filehandle
for its readiness to perform I/O before actually making the read or write call. A third technique,
discussed in Chapter 13, is to mark the filehandle as nonblocking, which causes the read or write
operation to return immediately with an error code if the operation would block.
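As a brief preview of the second technique, the sketch below uses the four-argument select() to ask whether a read would block before attempting it. The ready_to_read() helper is a hypothetical name, and the example assumes a UNIX-like system, where select() works on pipes as well as sockets.

```perl
use strict;
use warnings;

# Returns true if $fh has data ready to read within $timeout seconds.
sub ready_to_read {
    my ($fh, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;      # bit vector naming the handle to watch
    my $rout = $rin;                    # select() modifies its arguments in place
    my $nfound = select($rout, undef, undef, $timeout);
    return $nfound > 0;
}

pipe(my $reader, my $writer) or die "pipe failed: $!";
print "before write: ", (ready_to_read($reader, 0) ? "ready" : "would block"), "\n";
syswrite($writer, "ping\n");            # unbuffered write into the pipe
print "after write:  ", (ready_to_read($reader, 0) ? "ready" : "would block"), "\n";
```

A timeout of 0 makes select() poll and return at once; passing undef instead would wait indefinitely until the handle becomes readable.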
Perl's <> operator, read(), and print() all use stdio. When you call print(), the data is transferred
to an output buffer in the stdio layer before being sent to the operating system itself. Likewise,
<> and read() both read data from an stdio buffer rather than directly from the OS. Each filehandle
has its own set of buffers for input and output. For efficiency reasons, stdio waits until its output
buffers reach a certain size before flushing their contents to the OS.
Normally, the presence of the stdio buffering is not a problem, but it can get you into trouble when
doing more sophisticated types of I/O such as network operations. Consider the common case of
an application that requires you to write a small amount of data to a remote server, wait for a response,
and then send more data. You may think that the data has been sent across the network,
but in fact it may not have been. The output data may still be sitting in its local stdio buffer, waiting
for more data to come before flushing the buffer. The remote server will never receive the data, and
so will never return a response. Your program will never receive a response and so will never send
additional data. Deadlock ensues.
In contrast, the lower-level buffering performed by the operating system doesn't have this property.
The OS will always attempt to deliver whatever data is in its output buffers as soon as the hardware
can accept it.
There are two ways to work around stdio buffering. One is to turn on "autoflush" mode for the filehandle.
Autoflush mode applies only to output operations. When active, Perl tells stdio to flush the
filehandle's buffer every time print() is called.
To turn on autoflush mode, set the special variable $| to a true value. Autoflush mode affects the
currently selected filehandle, so to change autoflush mode for a specific filehandle, one must first
select() it, set $| to true, and then select() the previously selected filehandle. For example,
to turn on autoflush mode for filehandle FH:
my $previous = select(FH);
$| = 1;
select($previous);
You will sometimes see this motif abbreviated using the following mind-blowing idiom:
select((select(FH),$|=1)[0]);
However, it is much cleaner to bring in the IO::Handle module, which adds an autoflush()
method to filehandles. With IO::Handle loaded, FH can be put into autoflush mode like this:
use IO::Handle;
FH->autoflush(1);
If the OO syntax confuses you, see the Objects and References section later in this chapter.
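The effect of buffering and flushing can be observed with a pipe. In this sketch, which assumes a UNIX-like system, IO::Select is used only to peek at whether any data has reached the reading end; the variable names are illustrative.

```perl
use strict;
use warnings;
use IO::Handle;     # provides the flush() and autoflush() methods
use IO::Select;     # used here only to check whether data has arrived

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $sel = IO::Select->new($reader);

print $writer "HELLO\n";                # held in the filehandle's output buffer
my @ready  = $sel->can_read(0);         # poll: has anything reached the pipe?
my $before = scalar @ready;

$writer->flush;                         # hand the buffered data to the OS
@ready     = $sel->can_read(0);
my $after  = scalar @ready;

$writer->autoflush(1);                  # from now on, each print() is flushed
print "readable before flush: $before, after flush: $after\n";
```

Until flush() is called, the short message sits in the writer's buffer and the reader sees nothing, which is the situation that leads to the deadlock described earlier.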
The other way to avoid stdio buffering problems is to use the sysread() and syswrite() calls.
These calls bypass the stdio library and go directly to the operating system I/O calls. An important
advantage of these calls is that they interoperate well with other low-level I/O calls, such as the four-argument
select() call, and with advanced techniques such as nonblocking I/O.
Another ramification of the fact that the sys*() functions bypass stdio is the difference in behavior
between read() and sysread() when asked to fetch a larger chunk of data than is available. In
the case of read(), the function will block indefinitely until it can fetch exactly the amount of data
requested. The only exception to this is when the filehandle encounters the end of file before the
full request has been satisfied, in which case read() will return everything to the end of the file. In
contrast, sysread() can and will return partial reads. If it can't immediately read the full amount
of data requested, it will return the data that is available. If no data is available, sysread() will
block until it can return at least 1 byte. This behavior makes sysread() invaluable for use in network
communications, where data frequently arrives in chunks of unpredictable size.
1 You can think of the stdio library as a layer sitting on top of the OS, making the OS look more C-like to programs; similarly, a Pascal standard
I/O library makes the OS look as if it were written in Pascal.
For these reasons, sysread() and syswrite() are preferred for many network applications.
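A partial read is easy to demonstrate with a pipe standing in for a network connection. This is a minimal sketch with illustrative variable names; the 1024-byte request size is arbitrary.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $written = syswrite($writer, "hello");       # put 5 bytes into the pipe
die "syswrite failed: $!" unless defined $written;

# Ask for far more than is available; sysread() returns what is there.
my $bytes = sysread($reader, my $buffer, 1024);
die "sysread failed: $!" unless defined $bytes;
my $first = $buffer;
print "asked for 1024 bytes, got $bytes: '$first'\n";

close($writer);                                 # no more data will arrive
my $eof = sysread($reader, $buffer, 1024);      # 0 signals end of file
print "at end of file, sysread() returned $eof\n";
```

A call to read() in place of the first sysread() would have waited for either 1024 bytes or end of file, so it would not have returned until the writer was closed.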
This technique often works; however, you will run into problems as soon as you try to pass filehandles
to subroutines in other packages, such as functions exported by modules. The reason is
that passing filehandles as strings loses the filehandle package information. If we pass the filehandle
MY_FH from the main script (package main) to a subroutine defined in the MyUtils module, the
subroutine will try to access a filehandle named MyUtils::MY_FH rather than the true filehandle,
which is main::MY_FH. The same problem also occurs, of course, when a subroutine from one
package tries to return a filehandle to a caller from another package.
The correct way to move filehandles around is as a typeglob or a typeglob reference. Typeglobs
are symbol table entries, but you don't need to know much more about them than that in order to
use them (see the perlref POD documentation for the full details). To turn a filehandle into a glob,
put an asterisk ("*") in front of its name:
$fh = *MY_FH;
To turn a filehandle into a typeglob reference, put "\*" in front of its name:
$fh = \*MY_FH;
In either case, $fh can now be used to pass the filehandle back and forth between subroutines and
to store filehandles in data structures. Of the two forms, the glob reference (\*HANDLE) is the safer,
because there's less risk of accidentally writing to the variable and altering the symbol table. This
is the form we use throughout this book, and the one used by Perl's I/O-related modules, such as
IO::Socket.
Typeglob references can be passed directly to subroutines:
hello(\*MY_FH);
Typeglob refs can be used anywhere a bare filehandle is accepted, including as the first argument
to print(), read(), sysread(), syswrite(), or any of the socket-related calls that we discuss
in later chapters.
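Putting these pieces together, a subroutine that receives a typeglob reference might look like the sketch below. The hello() body is an assumption made for illustration, and the example writes to an in-memory file so that the result can be inspected.

```perl
use strict;
use warnings;

# Accepts a typeglob reference (or any filehandle-like scalar) and writes to it.
sub hello {
    my $fh = shift;
    print $fh "Hello, world\n";
}

open(MY_FH, '>', \my $captured) or die "Can't open in-memory file: $!";
hello(\*MY_FH);                     # pass the bareword handle as a glob reference
close(MY_FH);

my %handles = (out => \*STDOUT);    # glob references can be stored in data structures
hello($handles{out});
```

Because the reference carries the package information with it, hello() would work the same way if it were defined in a different package from MY_FH.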
Sometimes you will need to examine a scalar variable to determine whether it contains a valid
filehandle. The fileno() function makes this possible:
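One way to apply fileno() for this purpose is sketched below. The is_open_filehandle() helper is a hypothetical name, not a function from the book; it relies on fileno() returning undef for anything that is not an open filehandle, and wraps the call in eval because strict refs makes fileno() die when handed an ordinary string.

```perl
use strict;
use warnings;

# True if $candidate is an open filehandle (glob, glob reference, or IO object).
sub is_open_filehandle {
    my $candidate = shift;
    return 0 unless defined $candidate;
    no warnings;                                 # silence "unopened filehandle" noise
    my $fd = eval { fileno($candidate) };        # undef if not an open handle
    return defined($fd) ? 1 : 0;
}

print "STDOUT glob ref: ", is_open_filehandle(\*STDOUT), "\n";
print "plain string:    ", is_open_filehandle("not a handle"), "\n";
```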
Detecting Errors
Because of the vicissitudes of Internet communications, I/O errors are common in network applications.
As a rule, each of the Perl functions that performs I/O returns undef, a false value, on
failure. More specific information can be found by examining the special global variable $!.
$! has an interesting dual nature. Treated as a string, it will return a human-readable error message
such as Permission denied. Treated as a number, however, it will return the numeric constant
for the error, as defined by the operating system (e.g., EACCES). It is generally more reliable to use
these error constants to distinguish particular errors, because their names are standardized across
operating systems, whereas the text of the error messages can vary.
You can obtain the values of specific error constants by importing them from the Errno
module. In the use statement, you can import individual constants by name, or all of them at once.
To bring in individual constants, list them in the use statement, as shown here:
use Errno qw(EACCES ENOENT);
my $result = open (FH,">/etc/passwd");
if (!$result) { # oops, something went wrong
    if ($! == EACCES) {
        warn "You do not have permission to open this file.";
    } elsif ($! == ENOENT) {
        warn "File or directory not found.";
    } else {
        warn "Some other error occurred: $!";
    }
}
The qw() operator is used to split a text string into a list of words. The first line above is equivalent
to:
use Errno ('EACCES','ENOENT');
and brings in the EACCES and ENOENT constants. Notice that we use the numeric comparison
operator " == " when comparing $! to numeric constants.
To bring in all the common error constants, import the tag :POSIX. This brings in the error constants
that are defined by the POSIX standard, a cross-platform API that UNIX, Windows NT/2000, and
many other operating systems largely comply with. For example:
use Errno qw(:POSIX);
Do not get into the habit of testing $! to see if an error occurred during the last operation. $! is set
when an operation fails, but is not unset when an operation succeeds. The value of $! should be
relied on only immediately after a function has indicated failure.
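In practice this means testing the function's return value first and consulting $! only on failure. The sketch below follows that pattern; the path is a hypothetical one chosen because it is unlikely to exist.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

my $missing = "/no/such/directory/no-such-file.txt";   # hypothetical path

my $ok = open(my $fh, '<', $missing);
my ($errno, $message) = (0, '');
unless ($ok) {
    # Capture both faces of $! immediately; later calls may overwrite it.
    ($errno, $message) = ($! + 0, "$!");
}

if (!$ok && $errno == ENOENT) {
    warn "File or directory not found: $message\n";
}
```

Copying $! into ordinary variables right after the failure keeps the error available even if subsequent I/O calls change it.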
know much about creating object-oriented modules, you will need a basic understanding of how to
use OO modules and their syntax. This section illustrates the basics of Perl's OO syntax with reference
to the IO::Handle and IO::File modules, which together form the basis of Perl's object-oriented
I/O facilities.
$a = 'hi there';
$a_ref = \$a; # reference to a scalar
@b = ('this','is','an','array');
$b_ref = \@b; # reference to an array
%c = ( first_name => 'Fred', last_name => 'Jones');
$c_ref = \%c; # reference to a hash
Once a reference has been created, you can make copies of it, as you would any regular scalar, or
stuff it into arrays or hashes. When you want to get to the data contained inside a reference, you
dereference it using the prefix appropriate for its contents:
$a = $$a_ref;
@b = @$b_ref;
%c = %$c_ref;
You can index into array and hash references without dereferencing the whole thing by using the
-> syntax:
$b_ref->[2]; # yields "an"
$c_ref->{last_name}; # yields "Jones"
It is also possible to create references to anonymous, unnamed arrays and hashes, using the following
syntax:
$anonymous_array = ['this','is','an','anonymous','array'];
$anonymous_hash = { first_name =>'Jane', last_name => 'Doe' };
If you try to print out a reference, you'll get a string like HASH(0x82ab0e0), which indicates the
type of reference and the memory location where it can be found (which is just short of useless).
An object is a reference with just a little bit extra. It is "blessed" into a particular module's package
in such a way that it carries information about what module created it.2 The blessed reference will
continue to work just like other references. For example, if the object named $object is a blessed
hash reference, you can index into it like this:
$object->{last_name};
What makes objects different from plain references is that they have methods. A method call uses
the -> notation, but followed by a subroutine name and optional subroutine-style arguments:
$object->print_record(); # invoke the print_record() method
You may sometimes see a method called with arguments, like this:
$object->print_record(encoding => 'EBCDIC');
The "=>" symbol is accepted by Perl as a synonym for ','. It makes the relationship between the two
arguments more distinct, and has the added virtue of automatically quoting the argument on the left.
This allows us to write encoding rather than "encoding". If a method takes no arguments, it's often
written with the parentheses omitted, as in:
$object->print_record;
In many cases, print_record() will be a subroutine defined in the object's package. Assuming
that the object was created by a module named BigDatabase, the above is just a fancy way of saying
this:
BigDatabase::print_record($object);
However, Perl is more subtle than this, and the print_record() method definition might actually
reside in another module, which the current module inherits from. How this works is beyond the
scope of this introduction, and can be found in the perltoot, perlobj, and perlref POD pages, as well
as in [Wall 2000] and the other general Perl reference works listed in Appendix D.
To create an object, you must invoke one of its constructors. A constructor is a method call that is
invoked from the module's name. For example, to create a new BigDatabase object:
$object = BigDatabase->new(); # call the new() constructor
Constructors, which are a special case of a class method, are frequently named new(). However,
any subroutine name is possible. Again, this syntax is part trickery. In most cases an equivalent call
would be:
$object = BigDatabase::new('BigDatabase');
This is not quite the same thing, however, because class methods can also be inherited.
2 The function responsible for turning ordinary references into blessed ones is, naturally enough, called bless().
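To make the mechanics concrete, here is a minimal sketch of what a class like BigDatabase could look like. The class body is invented for illustration: the record_line() method and the BigDatabase::Archive subclass are hypothetical, and record_line() returns its text instead of printing so that the result is easy to inspect.

```perl
use strict;
use warnings;

package BigDatabase;            # hypothetical class, named after the text's example

sub new {                       # constructor: a class method
    my ($class, %fields) = @_;
    my $self = { %fields };     # an ordinary hash reference...
    return bless $self, $class; # ...blessed into the class
}

sub record_line {               # hypothetical method; returns text rather than printing
    my ($self, %opts) = @_;
    my $line = "$self->{first_name} $self->{last_name}";
    return $opts{upper} ? uc $line : $line;
}

package BigDatabase::Archive;   # hypothetical subclass; inherits both methods
our @ISA = ('BigDatabase');

package main;

my $object = BigDatabase->new(first_name => 'Fred', last_name => 'Jones');
print $object->record_line, "\n";                 # method call syntax
print BigDatabase::record_line($object), "\n";    # equivalent plain subroutine call
print $object->{last_name}, "\n";                 # still usable as a hash reference

my $old = BigDatabase::Archive->new(first_name => 'Jane', last_name => 'Doe');
print ref($old), ": ", $old->record_line(upper => 1), "\n";
```

The subclass shows why the method syntax is not pure trickery: BigDatabase::Archive defines no new() of its own, so BigDatabase::Archive::new('BigDatabase::Archive') would fail, while BigDatabase::Archive->new() finds the inherited constructor.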
IO::File's elegance does not by itself provide any very compelling reason to choose the object-oriented
syntax over native filehandles. Its main significance is that IO::Socket, IO::Pipe, and other
I/O-related modules also inherit their behavior from IO::Handle. This means that programs that read
and write from local files and those that read and write to remote network servers share a common,
easy-to-use interface.
We'll get a feel for the module by looking at a tiny example of a program that opens a file, counts
the number of lines, and reports its findings (Figure 1.4).
Lines 1–3: Load modules—We turn on strict syntax checking, and load the IO::File module.
Lines 4–5: Initialize variables—We recover from the command line the name of the file to perform
the line count on, and initialize the $counter variable to zero.
Line 6: Create a new IO::File object—We call the IO::File::new() method, using the syntax
IO::File->new(). The argument is the name of the file to open. If successful, new() returns
a new IO::File object that we can use for I/O. Otherwise it returns undef, and we die with an
error message.
Lines 7–9: Main loop—We call the IO::File object's getline() method in the test portion of a
while() loop. This method returns the next line of text, or undef on end of file—just like <>.
Each time through the loop we bump up $counter. The loop continues until getline() returns
undef.
Line 10: Print results—We print out our results by calling STDOUT->print(). We'll discuss why
this surprising syntax works in a moment.
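A script matching the description above could be sketched as follows. This is a reconstruction from the walkthrough, not the listing in Figure 1.4: the counting logic is wrapped in a count_lines() subroutine (a hypothetical name) so that it can be reused, and its line numbers therefore do not correspond to those cited in the text.

```perl
use strict;
use warnings;
use IO::File;

# Count the lines in the named file using IO::File's object-oriented interface.
sub count_lines {
    my $file    = shift;
    my $counter = 0;
    my $fh = IO::File->new($file) or die "Can't open $file: $!\n";
    while ( defined(my $line = $fh->getline) ) {
        $counter++;
    }
    $fh->close;
    return $counter;
}

# When run as a script, take the file name from the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    STDOUT->print("Counted ", count_lines($file), " lines\n");
}
```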
When I ran count_lines.pl on the unfinished manuscript for this chapter, I got the following result:
% count_lines.pl ch1.pod
Counted 2428 lines
IO::File objects are actually blessed typeglob references (see the Passing and Storing Filehandles
section earlier in this chapter). This means that you can use them in an object-oriented fashion, as
in:
$fh->print("Function calls are for the birds.\n");
Many of IO::File's methods are simple wrappers around Perl's built-in functions. In addition to
print() and getline() methods, there are read(), syswrite(), and close() methods,
among others. We discuss the pros and cons of using object-oriented method calls and function
calls in Chapter 5, where we introduce IO::Socket.
When you load IO::File (technically, when IO::File loads IO::Handle, which it inherits from), it adds
methods to ordinary filehandles. This means that any of the methods in IO::File can also be used
with STDIN, STDOUT, STDERR, or even with any conventional filehandles that you happen to create.
This is why line 10 of Figure 1.4 allows us to print to standard output by calling print().
Of the method listings that follow, only the new() and new_tmpfile() methods are actually defined
by IO::File. The rest are inherited from IO::Handle and can be used with other descendants of
IO::Handle, such as IO::Socket. This list is not complete. I've omitted some of the more obscure
methods, including those that allow you to move around inside a file in a record-oriented fashion,
because we won't need them for network communications.
If called with two or three arguments, IO::File treats the second argument as the open mode, and the third
argument as the file creation permissions. $mode may be a Perl-style mode string, such as " +< ", or an octal
numeric mode, such as those used by sysopen(). As a convenience, IO::File automatically imports the Fcntl
O_* constants when it loads. In addition, open() allows for an alternative type of symbolic mode string that
is used in the C fopen() call; for example, it allows " w " to open the file for writing. We won't discuss those
modes further here, because they do not add functionality.
The permission argument given by $perms is an octal number, and has the same significance as the corresponding
parameter passed to sysopen().
If new() cannot open the indicated file, it will return undef and set $! to the appropriate system error message.
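The three calling styles can be compared side by side. The sketch below writes to a scratch file in a temporary directory; the file name is hypothetical and the permission value 0644 is just an example.

```perl
use strict;
use warnings;
use IO::File;                       # also imports the Fcntl O_* constants
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/modes.txt";        # hypothetical scratch file

# Numeric mode plus creation permissions, as with sysopen()
my $out = IO::File->new($path, O_WRONLY|O_CREAT|O_TRUNC, 0644)
    or die "Can't create $path: $!";
$out->print("first line\n");
$out->close;

# Perl-style mode string
my $append = IO::File->new($path, ">>") or die "Can't append to $path: $!";
$append->print("second line\n");
$append->close;

# One-argument form: open for reading
my $in = IO::File->new($path) or die "Can't read $path: $!";
my @lines = $in->getlines;
$in->close;
print "read ", scalar(@lines), " lines\n";
```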
$fh = IO::File->new_tmpfile
The new_tmpfile() constructor, which is called without arguments, creates a temporary file opened for
reading and writing. On UNIX systems, this file is anonymous, meaning that it cannot be seen on the file system.
When the IO::File object is destroyed, the file and all its contents will be deleted automatically.
This constructor is useful for storing large amounts of temporary data.
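A typical use is to write scratch data, rewind, and read it back, as in this short sketch. It relies on the seek() method that IO::File inherits from IO::Seekable; the data written is arbitrary.

```perl
use strict;
use warnings;
use IO::File;

my $tmp = IO::File->new_tmpfile or die "Can't create temporary file: $!";
$tmp->print("scratch data $_\n") for 1 .. 3;

$tmp->seek(0, 0);                   # rewind to the start of the file
my @lines = $tmp->getlines;
print "read back ", scalar(@lines), " lines\n";
$tmp->close;                        # the temporary file disappears with the handle
```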
$result = $fh->close
The close() method closes the IO::File object, returning a true result if successful. If you do not call
close() explicitly, it will be called automatically when the object is destroyed. This happens when the script
exits, if you happen to undef() the object, or if the object goes out of scope such as when a my variable
reaches the end of its enclosing block.
$result = $fh->open($filename [,$mode [,$perms]])
You can reopen a filehandle object on the indicated file by using its open() method. The input arguments are
identical to new(). The method result indicates whether the open was successful.
This is chiefly used for reopening the standard filehandles STDOUT, STDIN, and STDERR. For example:
STDOUT->open(">log.txt") or die "Can't reopen STDOUT: $!";
$boolean = $fh->eof
Returns true if the next read on the filehandle object will return EOF.
$fh->flush
The flush() method immediately flushes any data that is buffered in the filehandle object. If the filehandle
is being used for writing, then its buffered data is written to disk (or to the pipe, or network, as we'll see when
we get to IO::Socket objects). If the filehandle is being used for reading, any data in the buffer is discarded,
forcing the next read to come from disk.
$boolean = $fh->blocking($boolean)
The blocking() method turns on and off blocking mode for the filehandle. We discuss how to use this at
length in Chapter 13.
$fh->clearerr
$boolean = $fh->error
These two methods are handy if you wish to perform a series of I/O operations and check the error status only
after you're finished. The error() method will return true if any errors have occurred on the filehandle since
it was created, or since the last call to clearerr(). The clearerr() method clears this flag.
In addition to the methods listed here, IO::File has a constructor named new_from_fd(), and a
method named fdopen(), both inherited from IO::Handle. These methods can be used to save
and restore objects in much the way that the >&FILEHANDLE syntax does with standard filehandles.
$fh = IO::File->new_from_fd($fd,$mode)
The new_from_fd() method opens up a copy of the filehandle object indicated by $fd using the read/write
mode given by $mode. The object may be an IO::Handle object, an IO::File object, a regular filehandle, or a
numeric file descriptor returned by fileno(). $mode must match the mode with which $fd was originally
opened. For example:
$saveout = IO::File->new_from_fd(STDOUT,">");
$result = $fh->fdopen($fd,$mode)
The fdopen() method is used to reopen an existing filehandle object, making it a copy of another one. The
$fd argument may be an IO::Handle object, a regular filehandle, or a numeric file descriptor. $mode must
match the mode with which $fd was originally opened.
This is typically used in conjunction with new_from_fd() to restore a saved filehandle:
$saveout = IO::File->new_from_fd(STDOUT,">"); # save STDOUT
STDOUT->open('>log.txt'); # reopen on a file
STDOUT->print("Yippy yie yay!\n"); # print something
STDOUT->fdopen($saveout,">"); # reopen on saved value
See the POD documentation for IO::Handle and IO::File for information about the more obscure
features that these modules provide.
Summary
Perl and network programming were made for each other. Perl's strong text-processing abilities
combine with a flexible I/O subsystem to create an environment that is ideal for interprocess communication.
This, combined with its native support for the Berkeley Sockets protocol, makes Perl an excellent
choice for network applications.
In this chapter we reviewed the essential components of Perl's I/O API. Filehandles are the fundamental
object used for Perl input/output operations, and offer both line-oriented and byte-stream-oriented
modes.
The STDIN, STDOUT, and STDERR filehandles are available when a program is started, and correspond
to the standard input, output, and error devices. A script may open up additional filehandles,
or reopen the standard ones on different files.
The standard I/O library, used by the <>, read(), and print() functions, improves I/O efficiency
by adding a layer of buffering to input and output operations. However, this buffering can sometimes
get in the way. One way to avoid buffering problems is to put the filehandle into autoflush mode.
Another way is to use the lower-level syswrite() and sysread() functions.
The IO::File and IO::Handle modules add object-oriented methods to filehandles. They smooth out
some of the inconsistencies in Perl's original design, and pave the way to a smooth transition to
IO::Socket.
This chapter discusses three key Perl features: processes, pipes, and signals. By creating new
processes, a Perl program can run another program or even clone copies of itself in order to divide
the work. Pipes allow Perl scripts to exchange data with other processes, and signals make it possible
for Perl scripts to monitor and control other processes.
Processes
UNIX, VMS, Windows NT/2000, and most other modern operating systems are multitasking. They
can run multiple programs simultaneously, each one running in a separate thread of execution
known as a process. On machines with multiple CPUs, the processes running on different CPUs
really are executing simultaneously. Processes that are running on single-CPU machines only appear
to be running simultaneously because the operating system switches rapidly between them,
giving each a small slice of time in which to execute.
Network applications often need to do two or more things at once. For example, a server may need
to process requests from several clients at once, or handle a request at the same time that it is
watching for new requests. Multitasking simplifies writing such programs tremendously, because it
allows you to launch new processes to deal with each of the things that the application needs to do.
Each process is more or less independent, allowing one process to get on with its work without
worrying about what the others are up to.
Perl supports two types of multitasking. One type, based on the traditional UNIX multiprocessing
model, allows the current process to clone itself by making a call to the fork() function. After
fork() executes, there are two processes, each identical to the other in almost every respect. One
goes off to do one task, and the other does another task.
Another type, based on the more modern concept of a lightweight "thread," keeps all the tasks within
a single process. However, a single program can have multiple threads of execution running through
it, each of which runs independently of the others.
In this section, we introduce fork() and the variables and functions that are relevant to processes.
We discuss multithreading in Chapter 11.
The Perl fork() function takes no arguments and returns a numeric result code. When fork()
is called, it spawns an exact duplicate of the current process. The duplicate, called the child, starts
out with its own copy of the current values of all variables, filehandles (including data in the standard
I/O buffers), and other data structures. In fact, the duplicate process even has the memory of calling fork(). It is like a
man walking into the cloning booth of a science fiction movie. The copy wakes up in the other booth
having all the memories of the original up to and including walking into the cloning booth, but thinking
wait, didn't I start out over there? And who is the handsome gentleman in that other booth?
To ensure peaceful coexistence, it is vital that the parent and child processes know which one is
which. Each process on the system is associated with a unique positive integer, known as its process
ID, or PID.
After a call to fork(), the parent and child examine the function's return value. In the parent proc-
ess, fork() returns the PID of the child. In the child process, fork() returns numeric 0. The code
will go off and do one thing if it discovers it's the parent, and do another if it's the child.
$pid = fork()
Forks a new process. Returns the child's PID in the parent process, and 0 in the child process. In case of error
(such as insufficient memory to fork), returns undef, and sets $! to the appropriate error message.
If the parent and child wish to communicate with each other following the fork, they can do so with
a pipe (discussed later in this chapter in the Pipes section), or via shared memory (discussed in
Chapter 14 in the An Adaptive Preforking Server Using Shared Memory section). For simple
messages, parent and child can send signals to each other's PIDs using the kill() function. The parent
gets the child's PID from fork()'s result code, and the child can get the parent's PID by calling
getppid(). A process can get its own PID by examining the $$ special variable.
$pid = getppid()
Returns the PID of the parent process. Every Perl script has a parent, even those launched directly from the
command line (their parent is the shell process).
$$
The $$ variable holds the current PID for the process. It can be read, but not changed.
We discuss the kill() function later in this chapter, in the Signals section.
If it wishes, a child process can itself fork(), creating a grandchild. The original parent can also
fork() again, as can its children and grandchildren. In this way, Perl scripts can create a whole
tribe of (friendly, we hope) processes. Unless specific action is taken, each member of this tribe
belongs to the same process group.
Each process group has a unique ID, which is usually the same as the process ID of the shared
ancestor. This value can be obtained by calling getpgrp():
$processid = getpgrp([$pid])
For the process specified by $pid, the getpgrp() function returns its process group ID. If no PID is specified,
then the process group of the current process is returned.
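As a quick check of these calls, the following minimal sketch prints the current process's PID, its parent's PID, and its process group ID. The variable names are our own, and calling getpgrp() with an explicit PID assumes a system that supports looking up a process group by PID, as most modern UNIX systems do.

```perl
use strict;
use warnings;

my $own_group  = getpgrp();      # no argument: group of the current process
my $same_group = getpgrp($$);    # explicit PID: here, our own PID from $$

print "PID=$$, parent=", getppid(), ", process group=$own_group\n";
print "getpgrp(\$\$) agrees\n" if $own_group == $same_group;
```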
Each member of a process group shares whatever filehandles were open at the time its parent
forked. In particular, they share the same STDIN, STDOUT, and STDERR. This can be modified by
any of the children by closing a filehandle, or reopening it on some other source. However, the
system keeps track of which children have filehandles open, and will not close the file until the last
child has closed its copy of the filehandle.
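The following sketch illustrates this sharing under a few assumptions of our own: a scratch file created with the standard File::Temp module stands in for any open filehandle, and the parent calls waitpid() so that the order of the output is predictable. The child closes its copy of the filehandle, yet the parent can go on writing through its own copy.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile();          # scratch file (an assumption of this sketch)
$fh->autoflush(1);                     # nothing is left in the buffer at fork time
print $fh "written before fork\n";

my $child = fork();
die "Can't fork: $!" unless defined $child;
if ($child == 0) {                     # child process
    print $fh "written by child\n";
    close $fh;                         # closes only the child's copy
    exit 0;
}

waitpid($child, 0);                    # let the child finish first
print $fh "written by parent after the child closed its copy\n";
close $fh or die "close failed: $!";

# Read the file back to see what both processes wrote.
open(my $in, '<', $path) or die "Can't reopen $path: $!";
my @lines = <$in>;
close $in;
unlink $path;
print @lines;
```

Because parent and child share one underlying open file, the parent's second line lands after the child's line rather than overwriting it.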
Figure 2.1 illustrates the typical idiom for forking. Before forking, we print out the PID stored in $$.
We then call fork() and store the result in a variable named $child. If the result is undefined,
then fork() has failed and we die with an error message.
We now examine $child to see whether we are running in the parent or the child. If $child is
nonzero, we are in the parent process. We print out our PID and the contents of $child, which
contains the child process's PID.
If $child is zero, then we are running in the child process. We recover the parent's PID by calling
getppid() and print it and our own PID.
Here's what happens when you run fork.pl:
% fork.pl
PID=372
Parent process: PID=372, child=373
Child process: PID=373, parent=372
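Figure 2.1 itself is not reproduced here. A script consistent with the description above might look like the following sketch. The $| setting, the waitpid() call, and the explicit exit are our additions and may differ from the original listing.

```perl
#!/usr/bin/perl
# fork.pl (sketch): report the PIDs seen by the parent and the child
use strict;
use warnings;

$| = 1;                        # flush STDOUT at once, so the first line is
                               # not left in the buffer and printed twice
print "PID=$$\n";

my $child = fork();
die "Can't fork: $!" unless defined $child;

if ($child > 0) {              # nonzero: we are the parent
    print "Parent process: PID=$$, child=$child\n";
    waitpid($child, 0);        # our addition: wait for the child to finish
} else {                       # zero: we are the child
    my $ppid = getppid();
    print "Child process: PID=$$, parent=$ppid\n";
    exit 0;
}
```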
The system() function executes a command as a subprocess and waits for it to exit. The command and its
arguments can be specified as a single string, or as a list containing the command and its arguments as
separate elements. In the former case, the string will be passed intact to the shell for interpretation. This allows
you to execute commands that contain shell metacharacters (such as input/output re-directs), but opens up
the possibility of executing shell commands you didn't anticipate. The latter form allows you to execute com-
mands with arguments that contain whitespace, shell metacharacters, and other special characters, but it
doesn't interpret metacharacters at all.
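To make the difference between the two forms concrete, here is a small sketch that assumes a UNIX system with the echo command on the PATH. Shifting the status right by 8 bits to recover the exit code follows the perlvar POD page's description of $?.

```perl
use strict;
use warnings;

# String form: the shell sees the ">" and redirects echo's output.
my $string_status = system("echo hello > /dev/null");

# List form: no shell is involved, so "> /dev/null" is just text handed
# to echo as part of its single argument.
my $list_status = system('echo', 'hello > /dev/null');

# system() returns the same value as $?; the exit code is in the high byte.
printf "string form exit code: %d, list form exit code: %d\n",
    $string_status >> 8, $list_status >> 8;
```

The first call prints nothing, while the second prints the text "hello > /dev/null" verbatim, which shows that the metacharacter was never interpreted.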
The exec() function is like system(), but replaces the current process with the indicated com-
mand. If successful, it never returns because the process is gone. The new process will have the
same PID as the old one, and will share the same STDIN, STDOUT, and STDERR filehandles. How-
ever, other opened filehandles will be closed automatically.1
exec() is often used in combination with fork() to run a command as a subprocess after doing
some special setup. For example, after this code fragment forks, the child reopens STDOUT onto a
file and then calls exec() to run the ls -l command. On UNIX systems, this command generates a
long directory listing. The effect is to run ls -l in the background, and to write its output to the indicated
file.
my $child = fork();
die "Can't fork: $!" unless defined $child;
if ($child == 0) {                      # we are in the child now
    open (STDOUT,">log.txt") or die "open() error: $!";
    exec ('ls','-l');
    die "exec() error: $!";             # shouldn't get here
}
We use exec() in this way in Chapter 10, in the section titled The Inetd Super Daemon.
Pipes
Network programming is all about interprocess communication (IPC). One process exchanges data
with another. Depending on the application, the two processes may be running on the same ma-
chine, may be running on two machines on the same segment of a local area network, or may be
halfway across the world from each other. The two processes may be related to each other—for
example, one may have been launched under the control of the other—or they may have been
written decades apart by different authors for different operating systems.
The simplest form of IPC that Perl offers is the pipe. A pipe is a filehandle that connects the current
script to the standard input or standard output of another process. Pipes are fully implemented on
UNIX, VMS, and Microsoft Windows ports of Perl, and implemented on the Macintosh only in the
MPW environment.
1 You can arrange for some filehandles to remain open across exec() by changing the value of the $^F special variable. See the perlvar
POD document for details.
Opening a Pipe
The two-argument form of open() is used to open pipes. As before, the first argument is the name
of a filehandle chosen by you. The second argument, however, is a program and all its arguments,
either preceded or followed by the pipe symbol ("|"). The command should be entered exactly as
you would type it in the operating system's default shell, which is the Bourne shell ("sh") on UNIX
machines and the DOS/NT command shell on Microsoft Windows systems. You may specify the
full path to the command, for example /usr/bin/ls, or rely on the PATH environment variable to find
the command for you.
If the pipe symbol precedes the program name, then the filehandle is opened for writing and ev-
erything written to the filehandle is sent to the standard input of the program. If the pipe symbol
follows the program, then the filehandle is opened for reading, and everything read from the file-
handle is taken from the program's standard output.
For example, in UNIX the command ls -l will return a listing of the files in the current directory. By
passing an argument of " ls -l | " to open(), we can open a pipe to read from the command:
open (LSFH,"ls -l |") or die "Can't open ls -l: $!";
while (my $line = <LSFH>) {
    print "I saw: $line";   # $line already ends with a newline
}
close LSFH;
This fragment simply echoes each line produced by the ls -l command. In a real application, you'd
want to do something more interesting with the information.
As an example of an output pipe, the UNIX wc -lw command will count the lines (option " -l ") and
words (option " -w ") of a text file sent to it on standard input. This code fragment opens a pipe to
the command, writes a few lines of text to it, and then closes the pipe. When the program runs, the
word and line counts produced by wc are printed in the command window:
open (WC,"| wc -lw") or die "Can't open wordcount: $!";
print WC "This is the first line.\n";
print WC "This is another line.\n";
print WC "This is the last line.\n";
print WC "Oops. I lied.\n";
close WC;
Using Pipes
Let's look at a complete functional example (Figure 2.2). The program whos_there.pl opens up a
pipe to the UNIX who command and counts the number of times each user is logged in. It produces
a report like this one:
% whos_there.pl
jsmith 9
abu 5
lstein 1
palumbo 1
This indicates that users "jsmith" and "abu" are logged in 9 and 5 times, respectively, while "lstein"
and "palumbo" are each logged in once. The users are sorted in descending order of the number
of times they are logged in. This is the sort of script that might be used by an administrator of a busy
system to watch usage.
Lines 1–3: Initialize script—We turn on strict syntax checking with use strict. This catches
mistyped variables, inappropriate use of globals, failure to quote strings, and other potential
errors. We create a local hash %who to hold the set of logged-in users and the number of times
they are logged in.
Line 4: Open pipe to who command—We call open() on a filehandle named WHOFH, using who
| as the second argument. If the open() call fails, we die with an error message.
Lines 5–8: Read the output of the who command —We read and process the output of who one
line at a time. Each line of who looks like this:
jsmith pts/23 Aug 12 10:26 (cranshaw.cshl.org)
The fields are the username, the name of the terminal he's using, the date he logged in, and the
address of the remote machine he logged in from (this format will vary slightly from one dialect
of UNIX to another). We use a pattern match to extract the username, and we tally the names
into the %who hash in such a way that the usernames become the keys, and the number of times
each user is logged in becomes the value.
The <WHOFH> loop will terminate at the EOF, which in the case of pipes occurs when the program
at the other end of the pipe exits or closes its standard output.
Lines 9–11: Print out the results—We sort the keys of %who based on the number of times each
user has logged in, and print out each username and login count. The printf() format used
here, " %10s %d\n ", tells printf() to format its first argument as a string that is right justified
on a field 10 spaces long, to print a space, and then to print the second argument as a decimal
integer.
Line 12: Close the pipe—We are done with the pipe now, so we close() it. If an error is detected
during close, we print out a warning.
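Figure 2.2 is not reproduced here. The following sketch is one way to write a script matching the walkthrough above. Its line numbering will not match the figure, and the tally_users() helper and the exact pattern are our own choices, made so that the counting logic can be exercised separately from the who command.

```perl
#!/usr/bin/perl
# whos_there.pl (sketch): count how many times each user is logged in
use strict;
use warnings;

# Tally the first whitespace-delimited field (the username) of each line
# read from the given filehandle. Hypothetical helper, not from the figure.
sub tally_users {
    my $fh = shift;
    my %count;
    while (my $line = <$fh>) {
        next unless $line =~ /^(\S+)/;
        $count{$1}++;
    }
    return %count;
}

open(WHOFH, "who |") or die "Can't open who: $!";
my %who = tally_users(\*WHOFH);

# Most frequently logged-in users first.
foreach my $user (sort { $who{$b} <=> $who{$a} } keys %who) {
    printf "%10s %d\n", $user, $who{$user};
}

close WHOFH or warn "who: close failed (\$!=$!, \$?=$?)\n";
```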
With pipes, the open() and close() functions are enhanced slightly to provide additional infor-
mation about the subprocess. When opening a pipe, open() returns the process ID (PID) of the
command at the other end of the pipe. This is a unique nonzero integer that can be used to monitor
and control the subprocess with signals (which we discuss in detail later in the Handling Signals
section). You can store this PID, or you can ignore its special meaning and treat the return value
from open() as a Boolean flag.
When closing a pipe, the close() call is enhanced to place the exit code from the subprocess in
the special global variable $?. Contrary to most Perl conventions, $? is zero if the command suc-
ceeded, and nonzero on an error. The perlvar POD page has more to say about the exit code, as
does the section Handling Child Termination in Chapter 10.
Another aspect of close() is that when closing a write pipe, the close() call will block until the
process at the other end has finished all its work and exited. If you close a read pipe before reading
to the EOF, the program at the other end will get a PIPE signal (see The PIPE Signal) the next time
it tries to write to standard output.
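A short sketch can make the enhanced return values visible. It rests on assumptions that go beyond the text: it uses the list form of piped open (a mode of "-|" followed by the command and its arguments) purely to avoid shell quoting, it runs the perl interpreter itself (found in $^X) as the subprocess, and it shifts $? right by 8 bits to extract the exit code, as the perlvar POD page describes.

```perl
use strict;
use warnings;

# The subprocess prints one line and then exits with status 3.
my $pid = open(my $kid, '-|', $^X, '-e', 'print "hello\n"; exit 3')
    or die "Can't open pipe: $!";
my @output = <$kid>;

my $closed    = close $kid;   # false here: the subprocess exited nonzero
my $exit_code = $? >> 8;      # the high byte of $? holds the exit code

print "subprocess PID=$pid said: @output";
print "close() returned ", ($closed ? "true" : "false"),
      ", exit code $exit_code\n";
```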
The backtick operator provides a shortcut for capturing a command's output:
$ls_output = `ls`;
This will run the ls (directory listing) command, capture its output, and assign the output to the
$ls_output scalar.
Internally, Perl opens a pipe to the indicated command, reads everything it prints to standard output,
closes the pipe, and returns the command output as the operator result. Typically at the end of the
result there is a new line, which can be removed with chomp().
Just like double quotes, backticks interpolate scalar variables and arrays. For example, we can
create a variable containing the arguments to pass to ls like this:
$arguments = '-l -F';
$ls_output = `ls $arguments`;
The command's standard error is not redirected by backticks. If the subprocess writes any diagnostic
or error messages, they will be intermingled with your program's diagnostics. On UNIX systems,
you can use the Bourne shell's output redirection system to combine the subprocess's standard
error with its standard output like this:
$ls_output = `ls 2>&1`;
Now $ls_output will contain both the standard error and the standard output of the command.
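The same redirection can be tried with a command whose output we control. This sketch assumes a Bourne-compatible shell and uses the perl interpreter in $^X as the subprocess; the $command variable is ours. We make no claim about the order in which the two lines arrive, since that depends on how the subprocess buffers its output.

```perl
use strict;
use warnings;

# A command that writes one line to each of its output streams.
my $command =
    qq{$^X -e 'print STDOUT qq(to stdout\\n); print STDERR qq(to stderr\\n)'};

my $output = `$command 2>&1`;   # "2>&1" folds standard error into standard output
my $status = $? >> 8;           # backticks also set $?

print "captured:\n$output";
print "exit code: $status\n";
```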
$result = pipe(READHANDLE,WRITEHANDLE)
Opens a pair of filehandles connected by a pipe. The first argument is the name of a filehandle to read from,
and the second is a filehandle to write to. If successful, pipe() returns a true result code.
Why is pipe() useful? It is commonly used in conjunction with the fork() function in order to
create a parent-child pair that can exchange data. The parent process keeps one filehandle and
closes the other, while the child does the opposite. The parent and child process can now commu-
nicate across the pipe as they work in parallel.
A short example will illustrate the power of this technique. Given a positive integer, the facfib.pl script
calculates its factorial and the value of its position in the Fibonacci series. To take advantage of
modern multiprocessing machines, these calculations are performed in two subprocesses so that
both calculations proceed in parallel. The script uses pipe() to create filehandles that the child
processes can use to communicate their findings to the parent process that launched them. When
we run this program, we may see results like this:
% facfib.pl 8
factorial(1) => 1
factorial(2) => 2
factorial(3) => 6
factorial(4) => 24
factorial(5) => 120
fibonacci(1) => 1
factorial(6) => 720
fibonacci(2) => 1
factorial(7) => 5040
fibonacci(3) => 2
factorial(8) => 40320
fibonacci(4) => 3
fibonacci(5) => 5
fibonacci(6) => 8
fibonacci(7) => 13
fibonacci(8) => 21
The results from the factorial and Fibonacci calculation overlap because they are occurring in par-
allel.
Figure 2.3 shows how this program works.
Lines 1–3: Initialize module—We turn on strict syntax checking and recover the command-line
argument. If no argument is given, we default to 10.
Line 4: Create linked pipes—We create linked pipes with pipe(). READER will be used by the
main (parent) process to read results from the children, which will use WRITER to write their
results.
Lines 5–10: Create first child process—We call fork() to clone the current process. In the
parent process, fork() returns the nonzero PID of the child process. In the child process,
fork() returns numeric 0. If we see that the result of fork() is 0, we know we are the child
process. We close the READER filehandle because we don't need it. We select() WRITER,
making it the default filehandle for output, and turn on autoflush mode by setting $| to a true
value. This is necessary to ensure that the parent process gets our messages as soon as we
write them.
We now call the factorial() subroutine with the integer argument from the command line.
After this, the child process is done with its work, so we exit(). Our copy of WRITER is closed
automatically.
Lines 11–16: Create the second child process—Back in the parent process, we invoke
fork() again to create a second child process. This one, however, calls the fibonacci()
subroutine rather than factorial().
Lines 17–19: Process messages from children—In the parent process, we close WRITER be-
cause we no longer need it. We read from READER one line at a time, and print out the results.
This will contain lines issued by both children. READER returns undef when the last child has
finished and closed its WRITER filehandle, sending us an EOF. We could close() READER
and check the result code, or let Perl close the filehandle when we exit, as we do here.
Lines 20–25: The factorial() subroutine—We calculate the factorial of the subroutine ar-
gument in a straightforward iterative way. For each step of the calculation, we print out the
intermediate result. Because WRITER has been made the default filehandle with select(),
each print() statement enters the pipe, where it is ultimately read by the parent process.
Lines 26–34: The fibonacci() subroutine—This is identical to factorial() except for the
calculation itself.
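Figure 2.3 is not reproduced here. The sketch below follows the walkthrough above but is our own reconstruction, so its line numbers will not match the figure. The @results array and the final wait() loop are additions that make the script easier to check, and the variable names inside the two subroutines are assumptions.

```perl
#!/usr/bin/perl
# facfib.pl (sketch): factorial and Fibonacci series in two child processes
use strict;
use warnings;

my $arg = shift || 10;

pipe(READER, WRITER) or die "Can't open pipe: $!";

my $child = fork();
die "Can't fork: $!" unless defined $child;
if ($child == 0) {               # first child
    close READER;                # children only write
    select WRITER;
    $| = 1;                      # autoflush, so the parent sees lines at once
    factorial($arg);
    exit 0;                      # our copy of WRITER is closed on exit
}

$child = fork();
die "Can't fork: $!" unless defined $child;
if ($child == 0) {               # second child
    close READER;
    select WRITER;
    $| = 1;
    fibonacci($arg);
    exit 0;
}

close WRITER;                    # parent only reads; needed to ever see EOF
my @results;                     # our addition: keep a copy of what we read
while (my $line = <READER>) {
    push @results, $line;
    print $line;
}
1 while wait() != -1;            # our addition: reap both children

sub factorial {
    my $target = shift;
    my $result = 1;
    for my $i (1 .. $target) {
        $result *= $i;
        print "factorial($i) => $result\n";
    }
}

sub fibonacci {
    my $target = shift;
    my ($prev, $curr) = (1, 0);
    for my $i (1 .. $target) {
        ($prev, $curr) = ($curr, $prev + $curr);
        print "fibonacci($i) => $curr\n";
    }
}
```

Each line is far shorter than the pipe's atomic write size, so lines from the two children interleave but never mix within a line.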
Instead of merely echoing its children's output, we could have the parent do something more useful
with the information. We use a variant of this technique in Chapter 14 to implement a preforked Web
server. The parent Web server manages possibly hundreds of children, each of which is responsible
for processing incoming Web requests. To tune the number of child processes to the incoming load,
the parent monitors the status of the children through messages that they send via a pipe, launching more
children under conditions of high load and killing excess children when the load is low.
The pipe() function can also be used to create a filehandle connected to another program in much
the way that piped open() does. We don't use this technique elsewhere, but the general idea is
for the parent process to fork(), and for the child process to reopen either STDIN or STDOUT onto
one of the paired filehandles, and then exec() the desired program with arguments. Here's the
idiom:
pipe(READER,WRITER) or die "pipe no good: $!";
my $child = fork();
die "Can't fork: $!" unless defined $child;
if ($child == 0) {                      # child process
    close READER;                       # child doesn't need this
    open (STDOUT,">&WRITER") or die "Can't dup WRITER: $!";   # STDOUT now goes to writer
    exec $cmd,$args;
    die "exec failed: $!";
}
close WRITER; # parent doesn't need this
At the end of this code, READER will be attached to the standard output of the command named
$cmd, and the effect is almost identical to this code:
open (READER,"$cmd $args |") or die "pipe no good: $!";
Bidirectional Pipes
Both piped open() and pipe() create unidirectional filehandles. If you want to both read and write
to another process, you're out of luck. In particular, this sensible-looking syntax does not work:
open(FH,"| $cmd |");
One way around this is to call pipe() twice, creating two pairs of linked filehandles. One pair is
used for writing from parent to child, and the other for child to parent, rather like a two-lane highway.
We won't go into this technique, but it's what the standard IPC::Open2 and IPC::Open3 modules do
to create a set of filehandles attached to the STDIN, STDOUT, and STDERR of a subprocess.
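For readers who simply want a two-way connection to a subprocess, a minimal sketch of IPC::Open2 follows. The subprocess is the perl interpreter itself (from $^X) running a one-line filter that uppercases its input; that choice, and the variable names, are assumptions made only to keep the example self-contained.

```perl
use strict;
use warnings;
use IPC::Open2;

# open2() returns the child's PID. The first handle reads from the child's
# STDOUT; the second writes to the child's STDIN.
my $pid = open2(my $from_child, my $to_child, $^X, '-pe', '$_ = uc');

print $to_child "hello, child\n";
close $to_child;                 # EOF tells the filter to finish up

my $reply = <$from_child>;
close $from_child;
waitpid($pid, 0);                # open2() does not reap the child for us

print "child replied: $reply";
```

Closing the write side before reading matters here. With larger exchanges, a parent that keeps writing while the child is blocked writing its own output can deadlock, so this pattern suits small request-and-reply traffic best.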
A more elegant way to create a bidirectional pipe is with the socketpair() function. This creates
two linked filehandles like pipe() does, but instead of being a one-way connection, both filehandles
are read/write. Data written into one filehandle comes out the other one, and vice versa. Because
the socketpair() function involves the same concepts as the socket() function used for net-
work communications, we defer our discussion of it until Chapter 4.
The -t and -S file tests can distinguish other special types of filehandle. If a filehandle is opened
on a terminal (the command-line window), then -t will return true. Programs can use this to test
STDIN to see if the program is being run interactively or has its standard input redirected from a file:
print "Running in batch mode, confirmation prompts disabled.\n"
unless -t STDIN;
The -S test detects whether a filehandle is opened on a network socket (introduced in Chapter 3):
print "Network active.\n" if -S FH;
There are more than a dozen other file test functions that can give you a file's size, modification
date, ownership, and other information. See the perlfunc POD page for details.
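The following sketch tries these tests on handles of known type. It assumes a UNIX system, uses socketpair() from the standard Socket module to obtain a socket without any network setup, and also exercises the -p (pipe) test documented in perlfunc. The handle and variable names are ours.

```perl
use strict;
use warnings;
use Socket;

pipe(READER, WRITER) or die "pipe failed: $!";
socketpair(LEFT, RIGHT, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";

my $stdin_is_tty   = (-t STDIN)  ? 1 : 0;   # depends on how the script is run
my $reader_is_pipe = (-p READER) ? 1 : 0;
my $left_is_socket = (-S LEFT)   ? 1 : 0;

print "STDIN is ", ($stdin_is_tty ? "" : "not "), "a terminal\n";
print "READER is ", ($reader_is_pipe ? "" : "not "), "a pipe\n";
print "LEFT is ", ($left_is_socket ? "" : "not "), "a socket\n";
```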
The two scripts are shown in Figures 2.4 and 2.5. Of note is that write_ten.pl puts the pipe into
autoflush mode so that each line of text is sent down the pipe immediately, rather than being buffered
locally. write_ten.pl also sleep()s for one second after writing each line of text, giving
read_three.pl a chance to report that the text was received. Together, these steps make it easier
for us to see what is happening.
Figure 2.4. The write_ten.pl script writes ten lines of text to a pipe
Figure 2.5. The read_three.pl script reads three lines of text from standard input
When we run write_ten.pl we see the following:
% write_ten.pl
Writing line 1
Read_three got: This is line number 1
Writing line 2
Read_three got: This is line number 2
Writing line 3
Read_three got: This is line number 3
Writing line 4
Broken pipe
%
Everything works as expected through line three, at which point read_three.pl exits. When
write_ten.pl attempts to write the fourth line of text, the script crashes with a Broken pipe error.
The statement that prints out the number of lines successfully passed to the pipe is never executed.
When a program attempts to write to a pipe and no program is reading at the other end, this results
in a PIPE exception. This exception, in turn, results in a PIPE signal being delivered to the writer.
By default this signal results in the immediate termination of the offending program. The same error
occurs in network applications when the sender attempts to transmit data to a remote program that
has exited or has stopped receiving.
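The default behavior can be observed from the outside by putting the writer in a child process and inspecting how it died. This is a sketch under our own assumptions rather than a reconstruction of write_ten.pl: the writer sends far more lines than a pipe can buffer, so it is certain to be mid-write when the reader closes its end, and the low seven bits of $? are taken as the terminating signal number, as described in the perlvar POD page.

```perl
use strict;
use warnings;
use POSIX qw(SIGPIPE);

pipe(READER, WRITER) or die "pipe failed: $!";

my $writer = fork();
die "Can't fork: $!" unless defined $writer;

if ($writer == 0) {                  # child: the writing side
    close READER;
    $SIG{PIPE} = 'DEFAULT';          # make sure the default action applies
    select WRITER;
    $| = 1;
    # Much more data than the pipe can hold, so the child is still
    # writing (or blocked in a write) when the reader goes away.
    print "This is line number $_\n" for 1 .. 1_000_000;
    exit 0;                          # not reached if PIPE terminates us
}

close WRITER;                        # parent: the reading side
my @got;
for (1 .. 3) {
    my $line = <READER>;
    last unless defined $line;
    push @got, $line;
}
close READER;                        # stop reading after three lines

waitpid($writer, 0);
my $signal = $? & 127;               # low 7 bits: terminating signal number
print "Read ", scalar(@got), " lines; writer terminated by signal $signal",
      ($signal == SIGPIPE ? " (PIPE)" : ""), "\n";
```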
To deal effectively with PIPE, you must install a signal handler, and this brings us to the next major
topic.
Signals
As with filehandles, understanding signals is fundamental to network programming. A signal is a
message sent to your program by the operating system to tell it that something important has oc-
curred. A signal can indicate an error in the program itself such as an attempt to divide by zero, an
event that requires immediate attention such as an attempt by the user to interrupt the program, or
a noncritical informational event such as the termination of a subprocess that your program has
launched.
In addition to signals sent by the operating system, processes can signal each other. For example,
when the user presses control-C (^C) on the keyboard to send an interrupt signal to the currently
running program, that signal is sent not by the operating system, but by the command shell that
processes and interprets keystrokes. It is also possible for a process to send signals to itself.
Common Signals
The POSIX standard defines nineteen signals. Each has a small integer value and a symbolic name.
We list them in Table 2.2 (the gaps in the integer sequence represent nonstandard signals used by
some systems).
The third column of the table indicates what happens when a process receives the signal. Some
signals do nothing. Others cause the process to terminate immediately, and still others terminate
the process and cause a core dump. Most signals can be "caught." That is, the program can install
a handler for the signal and take special action when the signal is received. A few signals, however,
cannot be intercepted in this way.
You don't need to understand all of the signals listed in Table 2.2 because either they won't occur
during the execution of a Perl script, or their generation indicates a low-level bug in Perl itself that
you can't do anything about. However, a handful of signals are relatively common, and we'll look at
them in more detail now.
HUP signals a hangup event. This typically occurs when a user is running a program from the com-
mand line, and then closes the command-line window or exits the interpreter shell. The default action
for this signal is to terminate the program.
INT signals a user-initiated interruption. It is generated when the user presses the interrupt key,
typically ^C. The default behavior of this signal is to terminate the program. QUIT is similar to
INT, but also causes the program to generate a core file (on UNIX systems). This signal is issued
when the user presses the "quit" key, ordinarily ^\.
Table 2.2. POSIX Signals
Signal Name   Value   Notes   Comment
HUP           1       A       Hangup detected
INT           2       A       Interrupt from keyboard
QUIT          3       A       Quit from keyboard
ILL           4       A       Illegal Instruction
ABRT          6       C       Abort
Notes:
A—Default action is to terminate process.
B—Default action is to ignore the signal.
C—Default action is to terminate process and dump core.
D—Default action is to stop the process.
E—Default action is to resume the process.
F—Signal cannot be caught or ignored.
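Signal numbers can differ between systems, so it can be useful to ask Perl which names and values it was built with. The sketch below reads them from the standard Config module; the %signo hash is our own name, and the selection of signals printed simply mirrors the rows of the table that appear above.

```perl
use strict;
use warnings;
use Config;

# %Config lists the signal names and numbers known to this build of Perl,
# as two space-separated strings in matching order.
my @names   = split ' ', $Config{sig_name};
my @numbers = split ' ', $Config{sig_num};

my %signo;
@signo{@names} = @numbers;

printf "%-6s %2d\n", $_, $signo{$_} for qw(HUP INT QUIT ILL ABRT);
```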
By convention, TERM and KILL are used by one process to terminate another. By default, TERM
causes immediate termination of the program, but a program can install a signal handler for TERM
to intercept the terminate request and possibly perform some cleanup actions before quitting. The
KILL signal, in contrast, is uncatchable. It causes an immediate shutdown of the process without
chance of appeal. For example, when a UNIX system is shutting down, the script that handles the
shutdown process first sends a TERM to each running process in turn, giving it a chance to clean
up. If the process is still running a few tens of seconds later, then the shutdown script sends a KILL.
The PIPE signal is sent when a program writes to a pipe or socket but the process at the remote
end has either exited or closed the pipe. This signal is so common in networking applications that
we will look at it closely in the Handling PIPE Exceptions section.
ALRM is used in conjunction with the alarm() function to send the program a prearranged signal
after a certain amount of time has elapsed. Among other things, ALRM can be used to time out
blocked I/O calls. We will see examples of this in the Timing Out Long-Running Operations section.
CHLD occurs when your process has launched a subprocess, and the status of the child has changed
in some way. Typically the change in status is that the child has exited, but CHLD is also generated
whenever the child is stopped or continued. We discuss how to deal with CHLD in much greater
detail in Chapters 4 and 9.
STOP and TSTP both have the effect of stopping the current process. The process is put into suspended
animation indefinitely; it can be resumed by sending it a CONT signal. STOP is generally
used by one program to stop another. TSTP is issued by the interpreter shell when the user presses
the stop key (^Z on UNIX systems). The other difference between the two is that TSTP can be caught,
but STOP cannot be caught or ignored.
Catching Signals
You can catch a signal by adding a signal handler to the %SIG global hash. Use the name of the
signal you wish to catch as the hash key. For example, use $SIG{INT} to get or set the INT signal
handler. As the value, use a code reference: either an anonymous subroutine or a reference to a
named subroutine. For example, Figure 2.6 shows a tiny script that installs an INT handler. Instead
of terminating when we press the interrupt key, it prints out a short message and bumps up a counter.
This goes on until the script counts three interruptions, at which point it finally terminates. In the
transcript that follows, the "Don't interrupt me!" message was triggered each time I typed ^C:
% interrupt.pl
I'm sleeping.
I'm sleeping.
Don't interrupt me! You've already interrupted me 1x.
I'm sleeping.
I'm sleeping.
Don't interrupt me! You've already interrupted me 2x.
I'm sleeping.
Don't interrupt me! You've already interrupted me 3x.
Figure 2.6. Catching the INT signal
$SIG{INT} = sub {
    $interruptions++;
    warn "Don't interrupt me! You've already interrupted me ${interruptions}x.\n";
};
In addition to code references, %SIG recognizes two special cases. The string "DEFAULT" restores
the default behavior of the signal. For example, setting $SIG{INT} to "DEFAULT" will cause the
INT signal to terminate the script once again. The string "IGNORE" will cause the signal to be
ignored altogether.
As previously mentioned, don't bother installing a handler for either KILL or STOP. These signals
can be neither caught nor ignored, and their default actions will always be performed.
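As a small illustration of the two special strings, the sketch below ignores INT around a block of code and then restores the default behavior. The self-directed kill() stands in for a user pressing the interrupt key, and $survived is a name invented for the example.

```perl
use strict;
use warnings;

my $survived = 0;
{
    local $SIG{INT} = 'IGNORE';    # INT is discarded inside this block
    kill INT => $$;                # would normally terminate the script
    $survived = 1;                 # still running, so the signal was ignored
}
$SIG{INT} = 'DEFAULT';             # INT terminates the script once again
print "Survived an ignored INT\n" if $survived;
```

Using local keeps the IGNORE setting confined to the block, so the previous handler comes back automatically even if the block is left early.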
If you wish to use the same routine to catch several different signals, and it is important for the
subroutine to distinguish one signal from another, it can do so by looking at its first argument, which
will contain the name of the signal. For example, for INT signals, the handler will be called with the
string "INT":
$SIG{TERM} = $SIG{HUP} = $SIG{INT} = \&handler;
sub handler {
my $sig = shift;
warn "Handling a $sig signal.\n";
}
When a PIPE signal is received, the handler will undefine the $ok flag, making it false.
The other modification is to replace the simple for() loop in the original version with a more sophisticated
version that checks the status of $ok. If the flag becomes false, the loop exits. When
we run the modified script, we see that the program runs to completion, and correctly reports the
number of lines successfully written:
% write_ten_ph.pl
Writing line 1
Read_three got: This is line number 1
Writing line 2
Read_three got: This is line number 2
Writing line 3
Read_three got: This is line number 3
Writing line 4
Wrote 3 lines of text
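A sketch of the handler-and-flag approach might look like the following. It is not the listing for write_ten_ph.pl: the one-line reader started with perl -e is a hypothetical stand-in for read_three.pl, and the one-second pause between writes is an assumption that gives the reader time to exit before the fourth write.

```perl
use strict;
use warnings;
use IO::Handle;

my $ok = 1;
$SIG{PIPE} = sub { undef $ok };    # flag the broken pipe instead of dying

# Hypothetical stand-in for read_three.pl: read three lines, then exit.
my $reader = 'for (1..3) { my $line = <STDIN>; print "Read_three got: $line" }';
open(my $pipe, '|-', $^X, '-e', $reader) or die "Can't open pipe: $!";
$pipe->autoflush(1);

my $count = 0;
for (my $i = 1; $ok && $i <= 10; $i++) {
    warn "Writing line $i\n";
    print {$pipe} "This is line number $i\n" and $count++;
    sleep 1;
}
close $pipe;    # may report failure, since the reader has already exited
print "Wrote $count lines of text\n";
```

On a typical run the loop stops shortly after the reader exits, so $count ends up well short of ten.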
Another general technique is to set $SIG{PIPE} to 'IGNORE', in order to ignore the PIPE signal
entirely. It is now our responsibility to detect that something is amiss, which we can do by examining
the result code from print(). If print() returns false, we exit the loop.
Figure 2.8 shows the code for write_ten_i.pl, which illustrates this technique. This script begins by
setting $SIG{PIPE} to the string 'IGNORE', suppressing PIPE signals. In addition, we modify the
print loop so that if print() is successful, we bump up $count as before, but if it fails, we issue
a warning and exit the loop via last.
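The IGNORE technique can be sketched along the same lines. This is an approximation of what a script like write_ten_i.pl might contain, not the Figure 2.8 listing; the perl -e reader is again a hypothetical stand-in for read_three.pl, and the wording of the warning is an assumption.

```perl
use strict;
use warnings;
use IO::Handle;

$SIG{PIPE} = 'IGNORE';    # no PIPE signals; failed writes show up in print()

# Hypothetical stand-in for read_three.pl: read three lines, then exit.
my $reader = 'for (1..3) { my $line = <STDIN>; print "Read_three got: $line" }';
open(my $pipe, '|-', $^X, '-e', $reader) or die "Can't open pipe: $!";
$pipe->autoflush(1);

my $count = 0;
for my $i (1 .. 10) {
    warn "Writing line $i\n";
    if (print {$pipe} "This is line number $i\n") {
        $count++;
    } else {
        warn "An error occurred during writing: $!\n";
        last;
    }
    sleep 1;
}
close $pipe;    # may return false, since the reader has already exited
print "Wrote $count lines of text\n";
```

Because the filehandle is set to autoflush, each print() attempts the write immediately, which is what lets the return value reflect the broken pipe.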
Notice that the error message that appears in $! after the unsuccessful print is "Broken pipe." If we
wanted to treat this error separately from other I/O errors, we could explicitly test its value via a
pattern match, or, better still, check its numeric value against the numeric error constant EPIPE. For
example:
use Errno ':POSIX';
...
unless (print PIPE "This is line number $_\n") { # handle write error
last if $! == EPIPE; # on PIPE, just terminate the loop
die "I/O error: $!"; # otherwise die with an error message
}
Sending Signals
A Perl script can send a signal to another process using the kill() function:
$count = kill($signal,@processes)
The kill() function sends signal $signal to one or more processes. You may specify the signal numerically,
for example 2, or symbolically as in "INT". @processes is a list of one or more process IDs to deliver the
signal to. The number of processes successfully signaled is returned as the kill() function result.
One process can only signal another if it has sufficient privileges to do so. In general, a process
running under a normal user's privileges can signal only other processes that are running under the
same user's privileges. A process running with root or superuser privileges, however, can signal
any other process.
The kill() function provides a few tricks. If you use the special signal number 0, then kill()
will return the number of processes that could have been signaled, without actually delivering the
signal. If you use a negative number for the process ID, then kill() will treat the absolute value
of the number as a process group ID and deliver the signal to all members of the group.
A script can send a signal to itself by calling kill() on the variable $$, which holds the current
process ID. For example, here's a fancy way for a script to commit suicide:
kill INT => $$; # same as kill('INT',$$)
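The signal 0 trick is handy for checking whether a process is still around. The sketch below forks a throwaway child purely for demonstration; the variable names are illustrative, and waitpid() is used here only to make sure the child has really gone before the second probe.

```perl
use strict;
use warnings;

my $pid = fork();
die "Can't fork: $!" unless defined $pid;
if ($pid == 0) { sleep 60; exit 0 }    # child: idle until signaled

my $before = kill 0 => $pid;    # signal 0: probe only, nothing is delivered
kill TERM => $pid;              # now really terminate the child
waitpid($pid, 0);               # reap it so the process ID disappears
my $after = kill 0 => $pid;

print "before: $before, after: $after\n";
```

The first probe should count one signalable process and the second none, since the child no longer exists after it has been reaped.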
Indeed, the implementation of signals on Windows systems is currently extremely limited. Simple
things, such as an INT handler to catch the interrupt key, will work. More complex things, such as
CHLD handlers to catch the death of a subprocess, do not work. This is an area of active development
so be sure to check the release notes before trying to write or adapt any code that depends heavily
on signals.
Signal handling is not implemented in MacPerl.
$slept = sleep([$seconds])
Sleep for the indicated number of seconds or until a signal is received. If no argument is provided, this function
will sleep forever. On return, sleep() will return the number of seconds it actually slept.
Another exception is the four-argument version of select(), which can be used to perform a timed
wait until one or more of a set of filehandles are ready for I/O. This function is described in detail in
Chapter 12.
Sometimes the automatic restarting of system calls is not what you want. For example, consider an
application that prompts a user to type her password and tries to read the response from standard
input. You might want the read to time out after some period of time in case the user has wandered
off and left the terminal unattended. This fragment of code might at first seem to do the trick:
my $timed_out = 0;
Here we use the alarm() function to set a timer. When the timer expires, the operating system
generates an ALRM signal, which we intercept with a handler that sets the $timed_out global to
true. In this code we call alarm() with a five-second timeout, and then read a line of input from
standard input. After the read completes, we call alarm() again with an argument of zero, turning
the timer off. The idea is that the user will have five seconds in which to type a password. If she
doesn't, the alarm clock goes off and we fall through to the rest of the program.
$seconds_left = alarm($seconds)
Arrange for an ALRM signal to be delivered to the process after $seconds. The function result is the number
of seconds left from the previous timer, if any. An argument of zero disables the timer.
The problem is that Perl automatically restarts slow system calls, including <>. Even though the
alarm clock has gone off, we remain in the <> call, waiting for the user's keyboard input.
The solution to this problem is to use eval{} and a local ALRM handler to abort the read. The
general idiom is this:
print STDERR "type your password: ";
my $password =
eval {
local $SIG{ALRM} = sub { die "timeout\n" };
alarm (5); # five second timeout
return <STDIN>;
};
alarm (0);
print STDERR "you timed out\n" if $@ =~ /timeout/;
Instead of having an ALRM handler in the main body of the program, we localize it within an
eval{} block. The eval{} block sets the alarm, as before, and attempts to read from STDIN. If
<> returns before the timer goes off, then the line of input is returned from the eval{} block, and
assigned to $password.
However, if the timer goes off before the input is complete, the ALRM handler executes, dying with
the error message "timeout." Because we are dying within an eval{} block, the effect of this
is for eval{} to return undef, setting the variable $@ to the last error message. We pattern match
$@ for the timeout message, and print a warning if found.
In either case, we turn off the timer immediately after returning from the eval{} block in order to
avoid having the timer go off at an inconvenient moment.
We will use this technique several times in later chapters when we need to time out slow network
calls.
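If the idiom is needed in several places, it can be wrapped in a subroutine. The with_timeout() helper below is a hypothetical name, not something defined elsewhere in this book, and the slow operation is simulated with a timed select() rather than a real read. It also cancels the timer inside the eval{} block, a small variation on the idiom above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the eval{}/alarm() idiom.
# Returns (1, $value) if $code finished in time, or (0, undef) on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $value = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        my $result = $code->();
        alarm(0);                 # finished in time: cancel the timer
        $result;
    };
    my $error = $@;
    alarm(0);                     # make sure the timer is off in every case
    return (1, $value) if $error eq '';
    return (0, undef)  if $error eq "timeout\n";
    die $error;                   # some other failure: pass it along
}

# Simulate a slow operation that takes three seconds, with a one-second limit.
my ($ok, $value) = with_timeout(1, sub { select(undef, undef, undef, 3); 'finished' });
print $ok ? "got $value\n" : "timed out\n";
```

Comparing $@ against the exact string "timeout\n" keeps unrelated errors from being mistaken for a timeout.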
Summary
This chapter introduced three topics that we will use throughout this book.
A process is an instance of a running program. Perl can create new processes via its
system() and fork() commands, or replace the current process with a different one with exec().
Pipes are I/O connections between two processes. A pipe looks and acts like a filehandle, but it is
connected to another process rather than to a file. If a pipe is opened for reading, data read from it
is taken from the standard output of the process at the other end. If a pipe is opened for writing, data
printed to it is received by the other process on its standard input.
Signals provide programs with notification of exceptional conditions, among which are PIPE errors
and other I/O-related problems. Signals are also useful for timing out long-running operations and
catching urgent requests from the user. You can manage incoming signals by installing signal handlers
in the %SIG hash, and you can send signals to other processes (or your own) using the
kill() function.
The next chapter goes into the particulars of Berkeley sockets before leading into a full discussion
of TCP networking.
Chapter 3. Introduction to Berkeley Sockets
This chapter introduces Perl's version of the Berkeley socket API, the low-level network API that
underlies all of Perl's networking modules. It explains the different kinds of sockets, demonstrates
basic Berkeley-based clients and servers, and discusses some of the common traps programmers
encounter when first working with the API.
Protocols
We've thrown around the word protocol, but what is it, exactly? A protocol is simply an agreed-upon
set of standards whereby two software components interoperate. There are protocols at every level
of the networking stack (Figure 3.1).
At the lowest level is the hardware or datalink layer, where, for example, the drivers built into Ethernet
network interface cards have a common understanding of how to interpret the pulses of electric
potential on the network wire in terms of Ethernet frames, how to detect when the wire is in use by
another card, and how to detect and resolve collisions between two cards trying to transmit at the
same time.
One level up is the network layer. At this layer, information is grouped into packets that consist of
a header containing the sender's and recipient's addresses, and a payload that consists of the actual
data to send. Payloads are typically in the range of 500 bytes to 1500 bytes. Internet routers act at
this layer by reading packet headers and figuring out how to route them to their destinations. The
main protocol at this layer is the Internet Protocol, or IP.
The transport layer is concerned with creating data packets and ensuring the integrity of their contents.
The two important protocols at this layer are the Transmission Control Protocol (TCP), which
provides reliable connection-oriented communications, and the User Datagram Protocol (UDP),
which provides an unreliable message-oriented service. These protocols are responsible for getting
the data to its destination. They don't care what is actually inside the data stream.
At the top of the stack is the application layer, where the content of the data stream does matter.
There is an abundance of protocols at this level, including such familiar and unfamiliar names as
HTTP, FTP, SMTP, POP3, IMAP, SNMP, XDMCP, and NNTP. These protocols specify, sometimes
in excruciating detail, how a client should contact a server, what messages are allowed, and what
information to exchange with each message.
The combination of the network layer and the transport layer is known as TCP/IP, named after the
two major protocols that operate at those layers.
To understand this, consider exchanging the number 1984. To exchange it as text, one host sends
the other the string 1984, which, in the common ASCII character set, corresponds to the four hexadecimal
bytes 0x31 0x39 0x38 0x34. These four bytes will be transferred in order across the
network, and (provided the other host also speaks ASCII) will appear at the other end as "1984".
However, 1984 can also be treated as a number, in which case it can fit into the two-byte integer
represented in hexadecimal as 0x7C0. If this number is already stored in the local host as a number,
it seems sensible to transfer it across the network in its native two-byte form rather than convert it
into its four-byte text representation, transfer it, and convert it back into a two-byte number at the
other end. Not only does this save some computation, but it uses only half as much network capacity.
Unfortunately, there's a hitch. Different computer architectures have different ways of storing integers
and floating point numbers. Some machines use two-byte integers, others four-byte integers,
and still others use eight-byte integers. This is called word size. Furthermore, computer architectures
have two different conventions for storing integers in memory. In some systems, called big-endian
architectures, the most significant part of the integer is stored in the first byte of a two-byte integer.
On such systems, reading from low to high, 1984 is represented in memory as the two bytes:
0x07 0xC0
low -> high
On little-endian architectures, this convention is reversed, and 1984 is stored in the opposite orientation:
0xC0 0x07
low -> high
These architectures are a matter of convention, and neither has a significant advantage over the
other. The problem comes when transferring such data across the network, because this byte pair
has to be transferred serially as two bytes. Data in memory is sent across the network from low to
high, so for big-endian machines the number 1984 will be transferred as 0x07 0xC0, while for little-endian
machines the bytes will be sent in the reverse order. As long as the machine at the other
end has the same native word size and byte order, these bytes will be correctly interpreted as 1984
when they arrive. However, if the recipient uses a different byte order, then the two bytes will be
interpreted in the wrong order, yielding hexadecimal 0xC007, or decimal 49,159. Even worse, if the
recipient interprets these bytes as the top half of a four-byte integer, it will end up as
0xC0070000, or 3,221,684,224. Someone's anniversary party is going to be very late.
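The arithmetic is easy to check from Perl. The sketch below uses pack() and unpack() with explicit byte-order templates (n for a big-endian short, v for a little-endian short, N for a big-endian long) to build both layouts and then deliberately misread one of them; the variable names are illustrative.

```perl
use strict;
use warnings;

my $big    = pack('n', 1984);    # two bytes, most significant first
my $little = pack('v', 1984);    # two bytes, least significant first

printf "big-endian bytes:    %s\n", join ' ', map { sprintf '0x%02X', ord } split //, $big;
printf "little-endian bytes: %s\n", join ' ', map { sprintf '0x%02X', ord } split //, $little;

# Reading little-endian bytes as if they were big-endian scrambles the value.
my $swapped = unpack('n', $little);
# Treating the same two bytes as the top half of a four-byte integer is worse.
my $widened = unpack('N', $little . "\0\0");
print "misread as $swapped or $widened\n";
```

Because the templates name the byte order explicitly, the results are the same no matter which kind of machine runs the script.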
Because of the potential for such binary chaos, text-based protocols are the norm on the Internet.
All the common protocols convert numeric information into text prior to transferring it, even
though this can result in more data being transferred across the net. Some protocols even convert
data that doesn't have a sensible text representation, such as audio files, into a form that uses the
ASCII character set, because this is generally easier to work with. By the same token, a great many
protocols are line-oriented, meaning that they accept commands and transmit data in the form of
discrete lines, each terminated by a commonly agreed-upon newline sequence.
A few protocols, however, are binary. Examples include Sun's Remote Procedure Call (RPC) system,
and the Napster peer-to-peer file exchange protocol. Such protocols have to be exceptionally
careful to represent binary data in a common format. For integer numbers, there is a commonly
recognized network format. In network format, a "short" integer is represented in two big-endian
bytes, while a "long" integer is represented with four big-endian bytes. As we will see in Chapter
19, Perl's pack() and unpack() functions provide the ability to convert numbers into network
format and back again.
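As a brief preview, a binary message header might be built and taken apart as shown below. The layout (a two-byte type code followed by a four-byte length) is invented for illustration and does not belong to any real protocol.

```perl
use strict;
use warnings;

# Hypothetical message layout: a two-byte type code, then a four-byte length.
my ($type, $length) = (7, 1984);
my $header = pack('n N', $type, $length);    # n = network short, N = network long

printf "header is %d bytes: %s\n", length $header, unpack('H*', $header);

my ($got_type, $got_length) = unpack('n N', $header);
print "type $got_type, length $got_length\n";
```

The same template string is used for packing and unpacking, which is a convenient way to keep the two ends of a protocol in agreement.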
Floating point numbers and more complicated things like data structures have no commonly accepted
network representation. When exchanging binary data, each protocol has to work out its own
way of representing such data in a platform-neutral fashion.
We will stick to text-based protocols for much of this book. However, to give you a taste for what it's
like to use a binary protocol, the UDP-based real-time chat system in Chapters 19 and 20 exchanges
platform-neutral binary messages.
Berkeley Sockets
Berkeley sockets are part of an application programming interface (API) that specifies the data
structures and function calls that interact with the operating system's network subsystem. The name
derives from the origins of the API in release 4.2 of the Berkeley Software Distribution (4.2BSD) of
UNIX. Berkeley sockets act at the transport layer: They help get the data where it's going, but have
nothing to say about the content of the data.
Berkeley sockets are part of an API, not a specific protocol, which defines how the programmer
interacts with an idealized network. Although strongly associated with the TCP/IP network protocol
for which the API was first designed, the Berkeley sockets API is generic enough to support other
types of network, such as Novell NetWare, Windows NT networking, and AppleTalk.
Perl provides full support for Berkeley sockets on most of the platforms it runs on. On certain platforms
such as the Windows and Macintosh ports, extension modules also give you access to the
non-Berkeley APIs native to those machines. However, if you're interested in writing portable applications
you'll probably want to stick to the Berkeley API.
In addition to these domains, there are many others including AF_APPLETALK, AF_IPX, and
AF_X25, each corresponding to a particular addressing scheme. AF_INET6, corresponding to the
long addresses of TCP/IP version 6, will become important in the future, but is not yet supported by
Perl.
The AF_ prefix stands for "address family." In addition, there is a series of "protocol family" constants
starting with the PF_ prefix. For example, there is a PF_INET constant that corresponds to
AF_INET. These constants evaluate to the same value and can, in fact, be used interchangeably.
The distinction between them is a historical artifact, and you'll find that published code sometimes
uses the one and sometimes the other.
Table 3.2. Constants Exported by Socket
Constant      Description
SOCK_STREAM   A continuous stream of data
SOCK_DGRAM    Individual packets of data
SOCK_RAW      Access to internal protocols and interfaces
Perl fully supports the SOCK_STREAM and SOCK_DGRAM socket types. SOCK_RAW is supported
through an add-on module named Net::Raw.
The TCP and UDP protocols are supported directly by the Perl sockets API. You can get access to
the ICMP and raw protocols via the Net::ICMP and Net::Raw third-party modules, but we do not
discuss them in this book (it would be possible, but probably not sensible, to reimplement TCP in
Perl using raw packets).
Generally there exists a single protocol to support a given domain and type. When creating a socket,
you must be careful to set the domain and socket type to match the protocol you've selected. The
possible combinations are summarized in Table 3.4.
Table 3.4. Allowed Combinations of Socket Type and Protocol in the INET and UNIX Domains
Domain    Type          Protocol
AF_INET   SOCK_STREAM   tcp
AF_INET   SOCK_DGRAM    udp
AF_UNIX   SOCK_STREAM   PF_UNSPEC
The allowed combinations of socket domain, type, and protocol are few. SOCK_STREAM goes with
TCP, and SOCK_DGRAM goes with UDP. Also notice that the AF_UNIX address family doesn't use
a named protocol, but a pseudoprotocol named PF_UNSPEC (for "unspecified").
The Perl object-oriented IO::Socket module (discussed in Chapter 5) can fill in the correct socket
type and protocol automatically when provided with partial information.
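With the low-level API, the three combinations in Table 3.4 might be created as in the sketch below. The fallback to the IPPROTO_TCP and IPPROTO_UDP constants is an addition for systems where the protocols database is unavailable, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Socket qw(AF_INET PF_INET AF_UNIX PF_UNSPEC SOCK_STREAM SOCK_DGRAM
              IPPROTO_TCP IPPROTO_UDP);

# Look up protocol numbers by name, falling back to the Socket constants
# if the system's protocols database cannot be consulted.
my $tcp = getprotobyname('tcp') || IPPROTO_TCP;
my $udp = getprotobyname('udp') || IPPROTO_UDP;

socket(my $stream, AF_INET, SOCK_STREAM, $tcp)      or die "TCP socket: $!";
socket(my $dgram,  AF_INET, SOCK_DGRAM,  $udp)      or die "UDP socket: $!";
socket(my $local,  AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "UNIX socket: $!";

print "AF_INET and PF_INET are ",
      (AF_INET == PF_INET ? "the same" : "different"), " here\n";
```

Each call only creates an unconnected socket; nothing is sent over the network until the socket is bound or connected.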
Datagram Sockets
Datagram-type sockets provide for the transmission of connectionless, unreliable, unsequenced
messages. UDP is the chief datagram-style protocol used by the Internet protocol family.
As the diagram in Figure 3.2 shows, datagram services resemble the postal system. Like a letter or
a telegram, each datagram in the system carries its destination address, its return address, and a
certain amount of data. The Internet protocols will make the best effort to get the datagram delivered
to its destination.
Figure 3.2. Datagram sockets provide connectionless, unreliable, unsequenced transmission of messages
There is no long-term relationship between the sending socket and the recipient socket: A client can
send a datagram off to one server, then immediately turn around and send a datagram to another
server using the same socket. But the connectionless nature of UDP comes at a price. Like certain
countries' postal systems, it is very possible for a datagram to get "lost in the mail." A client cannot
know whether a server has received its message until it receives an acknowledgment in reply. Even
if no acknowledgment arrives, it can't know for sure that the message was lost, because the server
might have received the original message and the acknowledgment got lost!
Datagrams are neither synchronized nor flow controlled. If you send a set of datagrams out in a
particular order, they might not arrive in that order. Because of the vagaries of the Internet, the first
datagram may go by one route, and the second one may take a different path. If the second route
is faster than the first one, the two datagrams may arrive in the opposite order from which they were
sent. It is also possible for a datagram to get duplicated in transit, resulting in the same message
being received twice.
Because of the connectionless nature of datagrams, there is no flow control between the sender
and the recipient. If the sender transmits datagrams faster than the recipient can process them, the
recipient has no way to signal the sender to slow down, and will eventually start to discard packets.
Although a datagram's delivery is not reliable, its contents are. Modern implementations of UDP
provide each datagram with a checksum that ensures that its data portion is not corrupted in transit.
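The connectionless style can be seen in a small sketch that keeps everything on the local machine. The udp_socket() helper, the message text, and the use of the loopback address with system-chosen ports are all assumptions made for the demonstration. A real client would also need a timeout, since recv() blocks forever if a datagram is lost.

```perl
use strict;
use warnings;
use Socket qw(AF_INET SOCK_DGRAM IPPROTO_UDP INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in);

# Create a UDP socket bound to the loopback address on a system-chosen port.
sub udp_socket {
    socket(my $sock, AF_INET, SOCK_DGRAM, IPPROTO_UDP) or die "socket: $!";
    bind($sock, pack_sockaddr_in(0, INADDR_LOOPBACK))  or die "bind: $!";
    return $sock;
}

my ($client, $server_a, $server_b) = map { udp_socket() } 1 .. 3;

# One socket, two destinations: each datagram carries its own address.
defined send($client, "hello A", 0, getsockname($server_a)) or die "send: $!";
defined send($client, "hello B", 0, getsockname($server_b)) or die "send: $!";

my %received;
for my $pair ([A => $server_a], [B => $server_b]) {
    my ($name, $sock) = @$pair;
    my $from = recv($sock, my $message, 1024, 0);
    defined $from or die "recv: $!";
    my ($port) = unpack_sockaddr_in($from);    # the datagram's return address
    $received{$name} = $message;
    print "server $name got '$message' from port $port\n";
}
```

Both receivers see the same return port, because both datagrams left from the one client socket.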
Stream Sockets
The other major paradigm is stream sockets, implemented in the Internet domain as the TCP protocol.
Stream sockets provide sequenced, reliable bidirectional communications via byte-oriented
streams. As depicted in Figure 3.3, stream sockets resemble a telephone conversation. Clients
connect to servers using their address, the two exchange data for a period of time, and then one of
the pair breaks off the connection.
Reading and writing to stream sockets is a lot like reading and writing to a file. There are no arbitrary
size limits or record boundaries, although you can impose a record-oriented structure on the stream
if you like. Because stream sockets are sequenced and reliable, you can write a series of bytes into
a socket secure in the knowledge that they will emerge at the other end in the correct order, provided
that they emerge at all ("reliable" does not mean immune to network errors).
TCP also implements flow control. Unlike UDP, where the danger of filling the data-receiving buffer
is very real, TCP automatically signals the sending host to suspend transmission temporarily when
the reading host is falling behind, and to resume sending data when the reading host is again ready.
This flow control happens behind the scenes and is ordinarily invisible.
If you have ever used the Perl open(FH, "| command") syntax to open up a pipe to an external
program, you will find that working with stream sockets is not much different. The major visible
difference is that, unlike pipes, stream sockets are bidirectional.
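Both properties, the absence of record boundaries and the two-way flow, can be seen in a short sketch. It uses socketpair(), which has not been introduced yet, purely as a convenient way to obtain two connected stream sockets inside one process; the handle names are illustrative.

```perl
use strict;
use warnings;
use IO::Handle;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->autoflush(1) for $left, $right;

# Two separate writes arrive as one undifferentiated run of bytes.
print {$left} "This is ";
print {$left} "one line\n";
my $line = <$right>;

# The same pair of sockets carries data in the other direction too.
print {$right} "And this is the reply\n";
my $reply = <$left>;

print $line, $reply;
```

The reader sees a single line even though it was written in two pieces, which is why any record structure has to be imposed by the application itself.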
Although it looks and acts like a continuous byte stream, the TCP protocol is actually implemented
on top of a datagram-style service, in this case the low-level IP protocol. IP packets are just as
unreliable as UDP datagrams, so behind the scenes TCP is responsible for keeping track of packet
sequence numbers, acknowledging received packets, and retransmitting lost packets.
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
Prepared for [email protected], Stjepan Maric
Copyright © 2001 by Addison-Wesley. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from
the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.
Introduction to Berkeley Sockets 53
UDP is also preferred when the interaction between one host and the other is very short. The length
of time to set up and take down a TCP connection is about eightfold greater than the exchange of
a single byte of data via UDP (for details, see [Stevens 1996]). If relatively small amounts of data
are being exchanged, the TCP setup time will dominate performance. Even after a TCP connection
is established, each transmitted byte consumes more bandwidth than UDP because of the additional
overhead for ensuring reliability.
Another common scenario occurs when a host must send the same data to many places; for ex-
ample, it wants to transmit a video stream to multiple viewers. The overhead to set up and manage
a large number of TCP connections can quickly exhaust operating system resources, because a
different socket must be used for each connection. In contrast, sending a series of UDP datagrams
is much more sparing of resources. The same socket can be reused to send datagrams to many
hosts.
Whereas TCP is always a one-to-one connection, UDP also allows one-to-many and many-to-many
transmissions. At one end of the spectrum, you can address a UDP datagram to the "broadcast
address," broadcasting a message to all listening hosts on the local area network. At the other end
of the spectrum, you can target a message to a predefined group of hosts using the "multicast"
facility of modern IP implementations. These advanced features are covered in Chapters 20 and 21.
The Internet's DNS is a common example of a UDP-based service. It is responsible for translating
hostnames into IP addresses, and vice versa, using a loose-knit network of DNS servers. If a client
does not get a response from a DNS server, it just retransmits its request. The overhead of an
occasional lost datagram outweighs the overhead of setting up a new TCP connection for each
request. Other common examples of UDP services include Sun's Network File System (NFS) and
the Trivial File Transfer Protocol (TFTP). The latter is used by diskless workstations during boot in
order to load their operating system over the network. UDP was originally chosen for this purpose
because its implementation is relatively small. Therefore, UDP fit more easily into the limited ROM
space available to workstations at the time the protocol was designed.
The TCP/IP protocol suite is described well in [Stevens 1994, Wright and Stevens 1995, and Stevens
1996], as well as in RFC 1180, A TCP/IP Tutorial.
Socket Addressing
In order for one process to talk to another, each of them has to know the other's address. Each
networking domain has a different concept of what an address is. For the UNIX domain, which can
be used only between two processes on the same host machine, addresses are simply paths on
the host's filesystem, such as /usr/tmp/log. For the Internet domain, each socket address has
three parts: the IP address, the port, and the protocol.
IP Addresses
In the currently deployed version of TCP/IP, IPv4, the IP address is a 32-bit number used to identify
a network interface on the host machine. A series of subnetworks and routing tables enables any
machine on the Internet to send packets to any other machine based on its IP address.
For readability, the four bytes of a machine's IP address are usually spelled out as a series of four
decimal numbers separated by dots to create the "dotted quad address" that network administrators
have come to know and love. For example, 143.48.7.1 is the IP address of one of the servers at my
workplace. Expressed as a 32-bit number in the hexadecimal system, this address is 0x8f300701.
Many of Perl's networking calls require you to work with IP addresses in the form of packed binary
strings. IP addresses can be converted manually to binary format and back again using pack()
and unpack() with a template of "C4" (four unsigned characters). For example, here's how to
convert 18.157.0.125 into its packed form and then reverse the process:
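The listing that belongs at this point is not present in this copy. The following is a minimal sketch of the conversion, using only Perl's built-in pack() and unpack(); the helper names pack_ip() and unpack_ip() are made up for illustration and are not part of any module.

```perl
use strict;
use warnings;

# Pack a dotted-quad string into a 4-byte binary string.
sub pack_ip {
    my ($dotted) = @_;
    return pack 'C4', split /\./, $dotted;
}

# Reverse the process: 4-byte binary string back to dotted-quad form.
sub unpack_ip {
    my ($packed) = @_;
    return join '.', unpack 'C4', $packed;
}

my $packed = pack_ip('18.157.0.125');
print length($packed), " bytes\n";    # 4 bytes
print unpack_ip($packed), "\n";       # 18.157.0.125
```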
You usually won't have to do this, however, because Perl provides convenient high-level functions
to handle this conversion for you.
Most hosts have two addresses, the "loopback" address 127.0.0.1 (often known by its symbolic
name "localhost") and its public Internet address. The loopback address is associated with a device
that loops transmissions back onto itself, allowing a client on the host to make an outgoing con-
nection to a server running on the same host. Although this sounds a bit pointless, it is a powerful
technique for application development, because it means that you can develop and test software
on the local machine without access to the network.
The public Internet address is associated with the host's network interface card, such as an Ethernet
card. The address is either assigned to the host by the network administrator or, in systems with
dynamic host addressing, by a Boot Protocol (BOOTP) or Dynamic Host Configuration Protocol
(DHCP) server. If a host has multiple network interfaces installed, each one can have a distinct IP
address. It's also possible for a single interface to be configured to use several addresses. Chapter
21 discusses IO::Interface, a third-party Perl module that allows a Perl script to examine and alter
the IP addresses assigned to its interface cards.
To describe where the network/host split occurs for routing purposes, networks use a netmask,
which is a bitmask with 1s in the positions of the network part of the IP address. Like the IP address
itself, the netmask is usually written in dotted-quad form. Continuing with our example, CSHL has
a netmask of 255.255.0.0, which, when written in binary, is
11111111,11111111,00000000,00000000.
Historically, IP networks were divided into three classes on the basis of their netmasks (Table
3.5). Class A networks have a netmask of 255.0.0.0 and approximately 16 million hosts. Class B
networks have a netmask of 255.255.0.0 and some 65,000 hosts, and class C networks use the
netmask 255.255.255.0 and support 254 hosts (as we will see, the first and last host numbers in a
network range are unavailable for use as a normal host address).
Table 3.5. Address Classes and Their Netmasks
Class  Netmask        Example Address  Network Part  Host Part
A      255.0.0.0      120.155.32.5     120.          155.32.5
B      255.255.0.0    128.157.32.5     128.157.      32.5
C      255.255.255.0  192.66.12.56     192.66.12.    56
As the Internet has become more crowded, however, networks have had to be split up in more
flexible ways. It's common now to see netmasks that don't end at byte boundaries. For example,
the netmask 255.255.255.128 (binary 11111111,11111111,11111111,10000000) splits the last
byte in half, creating a set of 126-host networks. The modern Internet routes packets based on this
more flexible scheme, called Classless Inter-Domain Routing (CIDR). CIDR uses a concise con-
vention to describe networks in which the network address is followed by a slash and an integer
containing the number of 1s in the mask. For example, CSHL's network is described by the CIDR
address 143.48.0.0/16. CIDR is described in detail in RFCs 1517 through 1520, and in the FAQs
listed in Appendix D.
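The relationship between a CIDR prefix length and a dotted-quad netmask can be sketched in a few lines. This is an illustration only: the function names prefix_to_netmask() and netmask_to_prefix() are hypothetical, and the code assumes the inet_aton() and inet_ntoa() functions from the Socket module.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Turn a prefix length (the number after the slash) into a dotted-quad netmask.
sub prefix_to_netmask {
    my ($bits) = @_;
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return inet_ntoa(pack 'N', $mask);
}

# Count the 1 bits in a dotted-quad netmask to recover the prefix length.
sub netmask_to_prefix {
    my ($netmask) = @_;
    return unpack '%32b*', inet_aton($netmask);
}

print prefix_to_netmask(16), "\n";                  # 255.255.0.0
print netmask_to_prefix('255.255.255.128'), "\n";   # 25
```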
Figuring out the network and broadcast addresses can be confusing when you work with netmasks
that do not end at byte boundaries. The Net::Netmask module, available on CPAN, provides facilities
for calculating these values in an intuitive way. You'll also find a short module that I wrote,
Net::NetmaskLite, in Appendix A. You might want to peruse this code in order to learn the relationships
among the network address, broadcast address, and netmask.
The first and last addresses in a subnet have special significance and cannot be used as ordinary
host addresses. The first address, sometimes known as the all-zeroes address, is reserved for use
in routing tables to denote the network as a whole. The last address in the range, known as the all-
ones address, is reserved for use as the broadcast address. IP packets sent to this address will be
received by all hosts on the subnet. For example, for the network 192.18.4.x (a class C address or
192.18.4.0/24 in CIDR format), the network address is 192.18.4.0 and the broadcast address is
192.18.4.255. We will discuss broadcasting in detail in Chapter 20.
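The arithmetic behind the network and broadcast addresses can be sketched with two bit operations. This is not the code from Net::Netmask or Appendix A; network_and_broadcast() is a hypothetical helper, and the host address 192.18.4.77 is chosen only for illustration.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Return the network and broadcast addresses for a host address and netmask,
# both given in dotted-quad form.
sub network_and_broadcast {
    my ($address, $netmask) = @_;
    my $addr = unpack 'N', inet_aton($address);
    my $mask = unpack 'N', inet_aton($netmask);
    my $network   = $addr & $mask;                     # host bits all zero
    my $broadcast = $network | (~$mask & 0xFFFFFFFF);  # host bits all one
    return map { inet_ntoa(pack 'N', $_) } $network, $broadcast;
}

my ($net, $bcast) = network_and_broadcast('192.18.4.77', '255.255.255.0');
print "$net $bcast\n";    # 192.18.4.0 192.18.4.255
```

The same function works for netmasks that do not end at a byte boundary, because the mask is applied to the address as a single 32-bit number.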
In addition, several IP address ranges have been set aside for special purposes (Table 3.6). The
class A network 10.x.x.x, the 16 class B networks 172.16.x.x through 172.31.x.x, and the 256 class
C networks 192.168.0.x through 192.168.255.x are reserved for use as internal networks. An
organization may use any of these networks internally, but must not connect the network directly to
the Internet. The 192.168.x.x networks are used frequently in testing, or placed behind firewall
systems that translate all the internal network addresses into a single public IP address. The network
addresses 224.x.x.x through 239.x.x.x are reserved for multicasting applications (Chapter 21), and
everything from 240.x.x.x upward is reserved for future expansion.
Table 3.6. Reserved IP Addresses
Address Description
127.0.0.x Loopback interface
10.x.x.x Private class A address
172.16.x.x–172.31.x.x Private class B addresses
192.168.0.x–192.168.255.x Private class C addresses
Finally, IP address 127.0.0.x is reserved for use as the loopback network. Anything sent to an ad-
dress in this range is received by the local host.
IPv6
Although there are more than 4 billion possible IPv4 addresses, the presence of several large re-
served ranges and the way the addresses are allocated into subnetworks reduces the effective
number of addresses considerably. This, coupled with the recent explosion in network-connected
devices, means that the Internet is rapidly running out of IP addresses. The crisis has been fore-
stalled for now by various dynamic host-addressing and address-translation techniques that share
a pool of IP addresses among a larger set of hosts. However, the new drive to put toaster ovens,
television set-top boxes, and cell phones on the Internet is again threatening to exhaust the address
space.
This is one of the major justifications for the new version of TCP/IP, known as IPv6, which expands
the IP address space from 4 to 16 bytes. IPv6 is being deployed on the Internet backbones now,
but this change will not immediately affect local area networks, which will continue to use addresses
backwardly compatible with IPv4. Perl has not yet been updated to support IPv6, but will undoubtedly
do so by the time that IPv6 is widely implemented.
More information about IPv6 can be found in [Stevens 1996] and [Hunt 1998].
Network Ports
Once a message reaches its destination IP address, there's still the matter of finding the correct
program to deliver it to. It's common for a host to be running multiple network servers, and it would
be impractical, not to say confusing, to deliver the same message to them all. That's where the port
number comes in. The port number part of the socket address is an unsigned 16-bit number ranging
from 1 to 65535. In addition to its IP address, each active socket on a host is identified by a unique
port number; this allows messages to be delivered unambiguously to the correct program. When a
program creates a socket, it may ask the operating system to associate a port with the socket. If the
port is not being used, the operating system will grant this request, and will refuse other programs
access to the port until the port is no longer in use. If the program doesn't specifically request a port,
one will be assigned to it from the pool of unused port numbers.
There are actually two sets of port numbers, one for use by TCP sockets, and the other for use by
UDP-based programs. It is perfectly all right for two programs to be using the same port number
provided that one is using it for TCP and the other for UDP.
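Port assignment can be observed directly from Perl. The sketch below is an illustration under stated assumptions: it binds to the loopback address, requests port 0 (the conventional way of asking the operating system to pick an unused port), and then tries to bind a second TCP socket to the same port. The helper names bind_tcp() and local_port() are hypothetical, and bind() and getsockname() are Perl built-ins that this section has not yet introduced.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM INADDR_LOOPBACK sockaddr_in);

# Bind a new TCP socket to the loopback address. Port 0 means "any free port".
# Returns the socket, or undef if the operating system refuses the request.
sub bind_tcp {
    my ($port) = @_;
    my $proto = getprotobyname('tcp') // 6;    # 6 is TCP's assigned number
    socket(my $sock, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    my $addr = sockaddr_in($port, INADDR_LOOPBACK);
    return bind($sock, $addr) ? $sock : undef;
}

# Ask a bound socket which port it ended up with.
sub local_port {
    my ($sock) = @_;
    my ($port) = sockaddr_in(getsockname($sock));   # list context: unpack
    return $port;
}

my $first = bind_tcp(0) or die "bind failed";
my $port  = local_port($first);
print "The operating system assigned port $port\n";

# While $first is still open, a second TCP request for the port is
# normally refused.
my $second = bind_tcp($port);
print defined $second ? "Second bind succeeded\n"
                      : "Second bind to port $port was refused\n";
```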
Not all port numbers are created equal. The ports in the range 0 through 1023 are reserved for the
use of "well-known" services, which are assigned and maintained by ICANN, the Internet Corpora-
tion for Assigned Names and Numbers. For example, TCP port 80 is reserved for use for the HTTP
used by Web servers, TCP port 25 is used for the SMTP used by e-mail transport agents, and UDP
port 53 is used for the domain name service (DNS). Because these ports are well known, you can
be pretty certain that a Web server running on a remote machine will be listening on port 80. On
UNIX systems, only the root user (i.e., the superuser) is allowed to create a socket using a reserved
port. This is partly to prevent unprivileged users on the system from inadvertently running code that will
interfere with the operations of the host's network services.
A list of reserved ports and their associated well-known services is given in Appendix C. Most serv-
ices are either TCP- or UDP-based, but some can communicate with both protocols. In the interest
of future compatibility, ICANN usually reserves both the UDP and TCP ports for each service. How-
ever, there are many exceptions to this rule. For example, TCP port 514 is used on UNIX systems
for remote shell (login) services, while UDP port 514 is used for the system logging daemon.
In some versions of UNIX, the high-numbered ports in the range 49152 through 65535 are reserved
by the operating system for use as "ephemeral" ports to be assigned automatically to outgoing TCP/
IP connections when a port number hasn't been explicitly requested. The remaining ports, those in
the range 1024 through 49151, are free for use in your own applications, provided that some other
service has not already claimed them. It is a good idea to check the ports in use on your machine
by using one of the network tools introduced later in this chapter (Network Analysis Tools) before
claiming one.
$packed_address = inet_aton($dotted_quad)
Given an IP address in dotted-quad form, this function packs it into binary form suitable for use by
sockaddr_in(). The function will also operate on symbolic hostnames. If the hostname cannot be looked up, it
returns undef.
$dotted_quad = inet_ntoa($packed_address)
This function takes a packed IP address and converts it into human-readable dotted-quad form. It does not
attempt to translate IP addresses into hostnames. You can achieve this effect by using gethostbyaddr(),
discussed later.
$socket_addr = sockaddr_in($port,$address)
($port,$address) = sockaddr_in($socket_addr)
When called in a scalar context, sockaddr_in() takes a port number and a binary IP address and packs
them together into a socket address, suitable for use by socket calls such as connect(). When called in a
list context, sockaddr_in() does the opposite, translating a socket address into the port and IP address.
The IP address must still be passed through inet_ntoa() to obtain a human-readable string.
$socket_addr = pack_sockaddr_in($port,$address)
($port,$address) = unpack_sockaddr_in($socket_addr)
If you don't like the confusing behavior of sockaddr_in(), you can use these two functions to pack and
unpack socket addresses in a context-insensitive manner.
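A short round trip through these functions may make the scalar and list behavior easier to see. This is a sketch only; the address 143.48.7.1 and port 80 are arbitrary values reused from earlier in the chapter.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa sockaddr_in
              pack_sockaddr_in unpack_sockaddr_in);

my $packed_ip = inet_aton('143.48.7.1');        # 4-byte binary string
my $sock_addr = sockaddr_in(80, $packed_ip);    # scalar context: pack

my ($port, $ip) = sockaddr_in($sock_addr);      # list context: unpack
print "$port ", inet_ntoa($ip), "\n";           # 80 143.48.7.1

# The context-insensitive pair gives the same results.
my $same = pack_sockaddr_in(80, $packed_ip);
my ($port2, $ip2) = unpack_sockaddr_in($same);
print "$port2 ", inet_ntoa($ip2), "\n";         # 80 143.48.7.1
```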
We'll use several of these functions in the example that follows in the next section.
In some references, you'll see a socket's address referred to as its "name." Don't let this confuse
you. A socket's address and its name are one and the same.
The script gets the dotted IP address of the daytime server from the command line. When we run
it on 64.7.3.43, which is the IP address of the wuarchive.wustl.edu software archive, we get
output like this:
% daytime_cli.pl 64.7.3.43
Sat Jan 6 19:11:22 2001
Lines 1–3: Load modules—We turn on strict type checking with use strict. This will avoid bugs
introduced by mistyped variable names, the automatic interpretation of bare words as strings,
and other common mistakes. We then load the Socket module, which imports several functions
useful for socket programming.
Lines 4–6: Define constants—We now define several constants. DEFAULT_ADDR is the IP ad-
dress of a remote host to contact when an address isn't explicitly provided on the command line.
We use 127.0.0.1, the loopback address.
PORT is the well-known port for the daytime service, port 13, while IPPROTO_TCP is the numeric
protocol for the TCP protocol, needed to construct the socket.
It's inelegant to hard code these constants, and in the case of IPPROTO_TCP may impede port-
ability. The next section shows how to look up these values at run time using symbolic names.
Lines 7–9: Construct the destination address—The next part of the code constructs a destination
address for the socket using the IP address of the daytime host and the port number of the
daytime service. We begin by recovering the dotted-quad address from the command line or, if
no address was specified, we default to the loopback address.
We now have an IP address in string form, but we need to convert it into packed binary form
before passing it to the socket creation function. We do this by using the inet_aton() function,
described earlier.
The last step in constructing the destination address is to create the sockaddr_in structure,
which combines the IP address with the port number. We do this by calling the sock-
addr_in() function with the port number and packed IP address.
Line 10: Create the socket—The socket() function creates the communications endpoint. The
function takes four arguments. The first argument, SOCK, is the filehandle name to use for the
socket. Sockets look and act like filehandles, and like them are capitalized by convention. The
remaining arguments are the address family, the socket type, and the protocol number. With the
exception of the protocol, which we have hard-coded, all constants are taken from the Socket
module.
This call to socket() creates a stream-style Internet-domain socket that uses the TCP protocol.
Line 11: Connect to the remote host— SOCK corresponds to the local communications endpoint.
We now need to connect it to the remote socket. We do this using the built-in Perl function
connect(), passing it the socket handle and the remote address that we built earlier. If con-
nect() is successful, it will return a true value. Otherwise, we again die with an error message.
Line 12: Read data from remote host and print it—We can now treat SOCK like a read/write
filehandle, either reading data transmitted from the remote host with the read() or <> functions,
or sending data to the host using print(). In this case, we use the angle bracket operator
(<>) to read all lines transmitted by the remote host and immediately echo the data to standard
output.
We discuss the socket() and connect() calls in more detail in Chapter 4.
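The script listing itself is not present in this copy. The steps walked through above can be sketched as follows. This is a reconstruction under stated assumptions, not the original figure: its line numbers do not match the walkthrough, it uses a lexical filehandle rather than the bareword SOCK, the constant names DAYTIME_PORT and TCP_PROTOCOL and the helpers destination_for() and fetch_daytime() are made up for illustration, and a failed request is reported rather than ending the program.

```perl
#!/usr/bin/perl
# A sketch of a daytime client with hard-coded constants.
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM inet_aton sockaddr_in);

use constant DEFAULT_ADDR => '127.0.0.1';   # loopback address
use constant DAYTIME_PORT => 13;            # well-known daytime port
use constant TCP_PROTOCOL => 6;             # numeric protocol for TCP

# Build the packed destination address from a dotted-quad string.
sub destination_for {
    my ($address) = @_;
    my $packed_addr = inet_aton($address)
        or die "Can't convert address: $address\n";
    my $destination = sockaddr_in(DAYTIME_PORT, $packed_addr);
    return $destination;
}

# Connect to the daytime server and return everything it sends.
sub fetch_daytime {
    my ($address) = @_;
    socket(my $sock, PF_INET, SOCK_STREAM, TCP_PROTOCOL)
        or die "Can't make socket: $!\n";
    connect($sock, destination_for($address))
        or die "Can't connect: $!\n";
    return join '', <$sock>;
}

# Take the address from the command line, defaulting to the loopback address.
my $reply = eval { fetch_daytime(shift || DEFAULT_ADDR) };
print defined $reply ? $reply : "daytime request failed: $@";
```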
($name,$aliases,$type,$len,$packed_addr) = gethostbyname($name);
$packed_addr = gethostbyname($name);
If the hostname doesn't exist, gethostbyname() returns undef. Otherwise, in a scalar context it returns the
IP address of the host in packed binary form, while in a list context it returns a five-element list.
The first element, $name, is the canonical hostname—the official name—for the requested host. This is fol-
lowed by a list of hostname aliases, the address type and address length, usually AF_INET and 4, and the
address itself in packed form. The aliases field is a space-delimited list of alternative names for the host. If
there are no alternative names, the list is empty.
You can pass the packed address returned by gethostbyname() directly to the socket functions, or translate
it back into a human-readable dotted-quad form using inet_ntoa(). If you pass gethostbyname() a
dotted-quad IP address, it will detect that fact and return the packed version of the address, just as inet_aton()
does.
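The two calling contexts can be compared side by side. In this sketch the helper describe_host() is hypothetical, and the lookups use 127.0.0.1 and localhost only because they normally resolve without a network connection.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Describe a host using the five-element list from gethostbyname().
# Returns undef if the lookup fails.
sub describe_host {
    my ($host) = @_;
    my ($name, $aliases, $type, $len, $packed_addr) = gethostbyname($host)
        or return;
    return sprintf '%s (aliases: %s) => %s',
        $name, $aliases || 'none', inet_ntoa($packed_addr);
}

# Scalar context: just the packed address.
my $packed = gethostbyname('127.0.0.1');
print inet_ntoa($packed), "\n" if defined $packed;

# List context: the full record, formatted by the helper above.
my $summary = describe_host('localhost');
print "$summary\n" if defined $summary;
```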
$name = gethostbyaddr($packed_addr,$family);
($name,$aliases,$type,$len,$packed_addr) = gethostbyaddr($packed_addr, $family);
The gethostbyaddr() call performs the reverse lookup, taking a packed IP address and returning the host-
name corresponding to it.
gethostbyaddr() takes two arguments, the packed address and the address family (usually AF_INET). In
a scalar context, it returns the hostname of the indicated address. In a list context, it returns a five-element list
consisting of the canonical hostname, a list of aliases, the address type, address length, and packed address.
This is the same list returned by gethostbyname().
If the call is unsuccessful in looking up the address, it returns undef or an empty list.
The inet_aton() function can also translate hostnames into packed IP addresses. In general,
you can replace scalar context calls to gethostbyname() with calls to inet_aton():
$packed_address = inet_aton($host);
$packed_address = gethostbyname($host);
Which of these two functions you use is a matter of taste, but be aware that gethostbyname() is
built into Perl, whereas inet_aton() is available only after you load the Socket module. I prefer
inet_aton() because, unlike gethostbyname(), it isn't sensitive to list context.
quad IP addresses (Figure 3.5). Given a file of hostnames stored in hostnames.txt, the script's output
looks like this:
Figure 3.6 shows a short program for performing the reverse operation, translating a list of dotted-
quad IP addresses into hostnames. For each line of input, the program checks that it looks like a
valid IP address (line 6) and, if so, packs the address using inet_aton(). The packed address
is then passed to gethostbyaddr(), specifying AF_INET for the address family (line 7). If suc-
cessful, the translated hostname is printed out.
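The figure's listing is not present in this copy. The reverse lookup described above might be sketched as follows. This is an illustration rather than the original program: it takes addresses from the command line instead of reading them line by line, and the helper names looks_like_ip() and ip_to_host() are hypothetical.

```perl
use strict;
use warnings;
use Socket qw(AF_INET inet_aton);

# True if the string looks like a dotted-quad address with each part 0-255.
sub looks_like_ip {
    my ($text) = @_;
    my @parts = $text =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or return 0;
    return (grep { $_ > 255 } @parts) ? 0 : 1;
}

# Translate one dotted-quad address into a hostname, or undef on failure.
sub ip_to_host {
    my ($ip) = @_;
    return undef unless looks_like_ip($ip);
    my $packed = inet_aton($ip) or return undef;
    return scalar gethostbyaddr($packed, AF_INET);
}

# Translate each address given on the command line.
for my $ip (@ARGV) {
    my $host = ip_to_host($ip);
    print "$ip => ", defined $host ? $host : '?', "\n";
}
```

A typical invocation would pass one or more addresses as arguments; an address that fails the format check or has no reverse entry is printed with a question mark.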
$number = getprotobyname($protocol)
($name,$aliases,$number) = getprotobyname($protocol)
getprotobyname() takes a symbolic protocol name, such as "udp", and converts it into its corresponding
numeric value. In a scalar context, just the protocol number is returned. In a list context, the function returns
the protocol name, a list of aliases, and the number. Multiple aliases are separated by spaces. If the named
protocol is unknown, the function returns undef or an empty list.
$name = getprotobynumber($protocol_number)
($name,$aliases,$number) = getprotobynumber($protocol_number)
The rarely used getprotobynumber() function reverses the previous operation, translating a protocol num-
ber into a protocol name. In a scalar context, it returns the protocol number's name as a string. In a list context,
it returns the same list as getprotobyname(). If passed an invalid protocol number, this function returns
undef or an empty list.
$port = getservbyname($service,$protocol)
($name,$aliases,$port,$protocol) = getservbyname($service,$protocol)
The getservbyname() function converts a symbolic service name, such as "echo," into a port number suit-
able for passing to sockaddr_in(). The function takes two arguments corresponding to the service name
and the desired protocol. The reason for the additional protocol argument is that some services, echo included,
come in both UDP and TCP versions, and there's no guarantee that the two versions of the service use the
same port number, although this is almost always the case.
In a scalar context, getservbyname() returns the port number of the service, or undef if unknown. In a list
context, the function returns a four-element list consisting of the canonical name for the service, a
space-delimited list of aliases, if any, the port number, and the protocol name. If the service is unknown,
the function returns an empty list.
$name = getservbyport($port,$protocol)
($name,$aliases,$port,$protocol) = getservbyport($port,$protocol)
The getservbyport() function reverses the previous operation by translating a port number into the cor-
responding service name. Its behavior in scalar and list contexts is exactly the same as getservbyname().
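These lookup functions can be tried out with a few lines of Perl. The sketch below assumes nothing beyond the built-ins described above; the wrapper names port_for() and protocol_number(), and the idea of supplying a fallback value when the system databases have no entry, are additions for illustration.

```perl
use strict;
use warnings;

# Look up a service's port, falling back to a caller-supplied default
# when the system's service database has no entry.
sub port_for {
    my ($service, $protocol, $fallback) = @_;
    my $port = getservbyname($service, $protocol);
    return defined $port ? $port : $fallback;
}

# Look up a protocol number the same way.
sub protocol_number {
    my ($protocol, $fallback) = @_;
    my $number = getprotobyname($protocol);
    return defined $number ? $number : $fallback;
}

printf "tcp is protocol %d\n", protocol_number('tcp', 6);
printf "udp is protocol %d\n", protocol_number('udp', 17);
printf "daytime/tcp uses port %d\n", port_for('daytime', 'tcp', 13);

# List context returns the full record, if the service is known.
if (my ($name, $aliases, $port, $proto) = getservbyname('echo', 'udp')) {
    print "$name: port $port, protocol $proto\n";
}
```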
Figure 3.7. Daytime client, using symbolic hostnames and service names
Line 5: Look up IP address of the daytime host—We call gethostbyname() to translate the
hostname given on the command line into a packed IP address. If no command-line argument
is given, we default to using the loopback address. If the user provided a dotted-IP address
instead of a hostname, then gethostbyname() simply packs it in the manner of
inet_aton(). If gethostbyname() fails because the provided hostname is invalid, it returns
undef and we exit with an error message.
Line 6: Look up the TCP protocol number—We call getprotobyname() to retrieve the value
of the TCP protocol number, rather than hard coding it as before.
Line 7: Look up the daytime service port number—Instead of hard coding the port number for
the daytime service, we call getservbyname() to look it up in the system service database. If
for some reason this doesn't work, we exit with an error message.
Lines 8–11: Connect and read data—The remainder of the program is as before, with the ex-
ception that we use the TCP protocol number retrieved from getprotobyname() in the call to
socket().
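The listing for this figure is not present in this copy. The steps described above can be sketched as follows. As before, this is a reconstruction under stated assumptions rather than the original script: line numbers do not correspond to the walkthrough, a lexical filehandle replaces SOCK, the helper fetch_daytime() is hypothetical, and a failed request is reported instead of ending the program.

```perl
#!/usr/bin/perl
# A sketch of a daytime client that looks up its constants at run time.
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM sockaddr_in);

use constant DEFAULT_ADDR => '127.0.0.1';   # loopback address

# Resolve host, protocol, and service, then read the server's reply.
sub fetch_daytime {
    my ($host) = @_;
    my $packed_addr = gethostbyname($host)
        or die "Can't look up $host\n";
    my $protocol = getprotobyname('tcp');
    defined $protocol or die "Can't look up the tcp protocol\n";
    my $port = getservbyname('daytime', 'tcp')
        or die "Can't look up the daytime port\n";
    my $destination = sockaddr_in($port, $packed_addr);

    socket(my $sock, PF_INET, SOCK_STREAM, $protocol)
        or die "Can't make socket: $!\n";
    connect($sock, $destination)
        or die "Can't connect: $!\n";
    return join '', <$sock>;
}

# Take the hostname from the command line, defaulting to the loopback address.
my $reply = eval { fetch_daytime(shift || DEFAULT_ADDR) };
print defined $reply ? $reply : "daytime request failed: $@";
```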
You can test the script on the host www.modperl.com [http://www.modperl.com]
Net::DNS
This module offers you much greater control over how hostnames are resolved using the domain
name system. In addition to the functionality offered by gethostbyname() and
gethostbyaddr(), Net::DNS allows you to fetch and iterate over all hosts in a domain; get the e-mail
address of the network administrator responsible for the domain; and look up the machine
responsible for accepting e-mail for a domain (the "mail exchanger," or MX).
Net::NIS
Many traditional UNIX networks use Sun's Network Information System (NIS) to distribute such
things as hostnames, IP addresses, usernames, passwords, automount tables, and cryptographic
keys. This allows a user to have the same login username, password, and home directories on all
the machines on the network. Perl's built-in network information access functions, such as
gethostbyname() and getpwnam(), provide transparent access to much, but not all, of NIS's
functionality. The Net::NIS module provides access to more esoteric information, such as automount
tables.
Net::LDAP
NIS is slowly being displaced by the Lightweight Directory Access Protocol (LDAP), which provides
greater flexibility and scalability. In addition to the types of information stored by NIS, LDAP is often
used to store users' e-mail addresses, telephone numbers, and other "white pages"–type informa-
tion. Net::LDAP gives you access to these databases.
Win32::NetResource
On Windows networks, the NT domain controller provides directory information on hosts, printers,
shared file systems, and other network information. Win32::NetResource provides functions for
reading and writing this information.
ping
The ping utility, available as a preinstalled utility on all UNIX and Windows machines, is the single
most useful network utility. It sends a series of ICMP "ping" messages to the remote IP address of
your choice, and reports the number of responses the remote machine returns.
ping can be used to test if a remote machine is up and reachable across the network. It can also be
used to test network conditions by looking at the length of time between the outgoing ping and the
incoming response, and the number of pings that have no response (due to either loss of the
outgoing message or the incoming response).
For example, this is how ping can be used to test connectivity to the machine at IP address
216.32.74.55 (which happens to be www.yahoo.com):
% ping 216.32.74.55
PING 216.32.74.55: 56 data bytes
64 bytes from 216.32.74.55: icmp_seq=0 ttl=245 time=41.1 ms
64 bytes from 216.32.74.55: icmp_seq=1 ttl=245 time=16.4 ms
64 bytes from 216.32.74.55: icmp_seq=2 ttl=245 time=16.3 ms
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
^C
This session shows good connectivity. The average response time is a snappy 24 ms, and no
packets were lost. You can also give ping a DNS name, in which case it will attempt to resolve the
name before pinging the host.
One thing to watch for is that some firewall systems are configured to block ping. In this case, the
destination machine may be unpingable, although you can reach it via telnet or other means.
There are many variants of ping, each with a different but overlapping set of features.
nslookup
The nslookup utility, available on most UNIX systems, can be used to test and verify the DNS. It
can be used interactively or as a one-shot command-line tool. To use it from the command line, call
it with the DNS name of the host or domain you wish to look up. It will perform the DNS search, and
return IP addresses and other DNS information corresponding to the name. For example:
% nslookup www.yahoo.com
Server: presto.lsjs.org
Address: 64.7.3.44
Non-authoritative answer:
Name: www.yahoo.akadns.net
Addresses: 204.71.200.67, 204.71.200.68, 204.71.202.160, 204.71.200.74, 204.71.200.75
Aliases: www.yahoo.com
This tells us that the host www.yahoo.com has a canonical name of www.yahoo.akadns.net,
and has five IP addresses assigned to it. This is typical of a heavily loaded Web server, where
multiple physical machines balance incoming requests by servicing them in a round-robin fashion.
traceroute
While ping tells you only whether a packet can get from A to B, the traceroute program
displays the exact path a network packet takes to get there. Call it with the IP address or DNS
name of the destination. Each line of the response gives the address of a router along the way. For example:
% traceroute www.yahoo.com
traceroute to www.yahoo.akadns.net (216.32.74.52), 30 hops max, 40 byte packets
1 gw.lsjs.org (192.168.3.1) 2.52 ms 8.78 ms 4.85 ms
2 64.7.3.46 (64.7.3.46) 9.7 ms 9.656 ms 3.415 ms
3 mgp-gw.nyc.megapath.net (64.7.2.1) 19.118 ms 23.619 ms 16.601 ms
4 216.35.48.242 (216.35.48.242) 10.532 ms 10.515 ms 11.368 ms
5 dcr03-g2-0.jrcy01.exodus.net (216.32.222.121) 9.068 ms 9.369 ms 9.08 ms
6 bbr02-g4-0.jrcy01.exodus.net (209.67.45.126) 9.522 ms 11.091 ms 10.212 ms
7 bbr01-p5-0.stng01.exodus.net (209.185.9.98) 15.516 ms 15.118 ms 15.227 ms
8 dcr03-g9-0.stng01.exodus.net (216.33.96.145) 15.497 ms 15.448 ms 15.462 ms
9 csr22-ve242.stng01.exodus.net (216.33.98.19) 16.044 ms 15.724 ms 16.454 ms
10 216.35.210.126 (216.35.210.126) 15.954 ms 15.537 ms 15.644 ms
11 www3.dcx.yahoo.com (216.32.74.52) 15.644 ms 15.582 ms 15.577 ms
traceroute can be invaluable for locating a network outage when a host can no longer be pinged.
The listing will stop without reaching the desired destination, and the last item on the list indicates
the point beyond which the breakage is occurring.
As with ping, some firewalls can interfere with traceroute. Traceroute is preinstalled on most UNIX
systems.
netstat
The netstat utility, preinstalled on UNIX and Windows NT/2000 systems, prints a snapshot of all
active network services and connections. For example, running netstat on an active Web and FTP
server produces the following display (abbreviated for space):
% netstat -t
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 brie.cshl.org:www writer.loci.wisc.e:1402 ESTABLISHED
tcp 0 0 brie.cshl.org:www 157-238-71-168.il.:1215 FIN_WAIT2
tcp 0 0 brie.cshl.org:www 157-238-71-168.il.:1214 FIN_WAIT2
tcp 0 0 brie.cshl.org:www 157-238-71-168.il.:1213 TIME_WAIT
tcp 0 0 brie.cshl.org:6010 brie.cshl.org:2225 ESTABLISHED
tcp 0 0 brie.cshl.org:2225 brie.cshl.org:6010 ESTABLISHED
tcp 0 2660 brie.cshl.org:ssh presto.lsjs.org:64080 ESTABLISHED
tcp 0 0 brie.cshl.org:www 206.169.243.7:1724 TIME_WAIT
tcp 0 20 brie.cshl.org:ftp usr25-wok.cableine:2173 ESTABLISHED
tcp 0 891 brie.cshl.org:www usr25-wok.cableine:2117 FIN_WAIT1
tcp 0 80 brie.cshl.org:ftp soa.sanger.ac.uk:49596 CLOSE
The -t argument restricts the display to TCP connections. The Recv-Q and Send-Q columns
show the number of bytes in the sockets' read and write buffers, respectively. The Local and
Foreign Address columns show the name and port numbers of the local and remote peers,
respectively, and the State column shows the current state of the connection.
netstat can also be used to show services that are waiting for incoming connections, as well as
UDP and UNIX-domain sockets. The netstat syntax on Windows systems is slightly different. To
get a list of TCP connections similar to the one shown above, use the command netstat -p tcp.
tcpdump
The tcpdump utility, available preinstalled on many versions of UNIX, is a packet sniffer. It can be
used to dump the contents of every packet passing by your network card, including those not
directed to your machine. It features a powerful filter language that can be used to detect and display
just those packets you are interested in, such as those using a particular protocol or directed toward
a specific port.
MacTCP Watcher
MacTCP Watcher for the Macintosh combines the functionality of ping, nslookup, and netstat
into one user-friendly application. It can be found by searching the large shareware collection located
at http://www.shareware.com.
scanner.exe
For Windows 98/NT/2000 developers, the small scanner.exe utility, also available from
http://www.shareware.com, combines the functionality of ping and nslookup with the ability to scan
a remote host for open ports. It can be used to determine the services a remote host provides.
net-toolbox.exe
This is a comprehensive set of Windows network utilities that include ping, nslookup, tcpdump,
and netstat functionality. It can be found by anonymous FTP to gatekeeper.dec.com in the
directory /pub/micro/pc/winsite/win95/netutil/.
Summary
A socket is an endpoint for communications. In Perl, a socket looks and acts much like a filehandle.
There are several species of socket, distinguished by their address family, type, and communications
protocol. The most frequently used sockets belong to the AF_INET (Internet) address family,
and use either the stream-oriented TCP protocol or the datagram-oriented UDP protocol.
UNIX-domain sockets use addresses based on local filenames, whereas Internet-domain sockets
use a combination of IP address and port number. Addresses must be packed into binary form
before passing them to any of Perl's built-in network functions.
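As a minimal sketch of the packing step for an Internet-domain address, the following round-trips an address through inet_aton() and sockaddr_in() from the Socket module. The loopback address and port 2007 are arbitrary illustration values, and the variable names are our own.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa sockaddr_in);

# Illustrative values only; any IPv4 address and port would do.
my $packed_ip   = inet_aton('127.0.0.1');          # 4-byte binary IP address
my $packed_addr = sockaddr_in(2007, $packed_ip);   # scalar context: pack port + IP

# In list context, sockaddr_in() unpacks the address again.
my ($port, $ip) = sockaddr_in($packed_addr);
my $dotted      = inet_ntoa($ip);                  # back to dotted-quad text

print "address $dotted, port $port\n";
```

The packed forms are what functions such as connect() and bind() expect; the dotted-quad string and the plain port number are only for human consumption.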
Perl provides a complete set of functions for interconverting the numeric and symbolic forms of host
addresses, protocols, and services. Using the symbolic names makes programs easier to use and
maintain, and promotes portability.
We closed this chapter with a brief list of the utilities that are commonly used for detecting and
diagnosing network configuration problems.
In this chapter we look at TCP, a reliable, connection-oriented byte-stream protocol. These features
make working with TCP sockets similar to working with familiar filehandles and pipes. After opening
a TCP socket, you can send data through it using print() or syswrite(), and read from it using
<>, read(), or sysread().
The TCP Protocol 68
Lines 1–6: Load modules—We turn on strict syntax checking, and load the Socket and IO::Handle
modules. We use Socket for its socket-related constants, and IO::Handle for its
autoflush() method.
Line 7: Declare globals—We create two global variables for keeping track of the number of bytes
we send and receive.
Lines 8–9: Process command-line arguments—We read the destination host and port number
from the command line. If the host isn't provided, we default to "localhost". If the port number
isn't provided, we use getservbyname() to look up the port number for the echo service.
Lines 10–11: Look up protocol and create packed IP address—We use getprotobyname()
to find the TCP protocol number for use with socket(). We then use inet_aton() to turn
the hostname into a packed IP address for use with sockaddr_in().
Line 12: Create the socket—We call socket() to create a socket filehandle named SOCK in
the same way that we did in Figure 3.4 (in Chapter 3). We pass arguments specifying the
AF_INET Internet address family, a stream-based socket type of SOCK_STREAM, and the TCP
protocol number looked up earlier.
Lines 13–14: Create destination address and connect socket to it—We use sockaddr_in()
to create a packed address containing the destination's IP address and port number. This
address is now used as the destination for a connect() call. If successful, connect() returns
true. Otherwise, we die with an error message.
Line 15: Turn on autoflush mode for the socket—We want data written to the socket to be flushed
immediately, rather than hang around in a local buffer. We call the socket's autoflush()
method to turn on autoflush mode. This method is available courtesy of IO::Handle.
Lines 16–22: Enter main loop—We now enter a small loop. Each time through the loop we read
a line of text from standard input, and print it verbatim to SOCK, sending it to the remote host.
We then use the <> operator to read a line of response from the server, and print it to standard
output.
Each time through the loop we tally the number of bytes sent and received. This continues until
we reach EOF on standard input.
Lines 23–24: Close the socket and print statistics—After the loop is done, we close the socket
and print to standard error our summary statistics on the number of bytes sent and received.
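The steps in this walkthrough can be condensed into the sketch below. It is not the listing from Figure 4.1, so the line numbers quoted above do not apply to it. We make three assumptions for illustration: the logic is wrapped in a hypothetical echo_client() subroutine, a lexical filehandle replaces the SOCK bareword, and the script contacts a server only when a host is named on the command line.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

# Hypothetical helper: connect to $host:$port, send each line read from $in,
# and copy each one-line reply to $out. Returns (bytes sent, bytes received).
sub echo_client {
    my ($host, $port, $in, $out) = @_;
    $host ||= 'localhost';
    $port ||= getservbyname('echo', 'tcp');
    my ($bytes_out, $bytes_in) = (0, 0);

    my $protocol = getprotobyname('tcp');
    my $ip       = inet_aton($host) or die "$host: unknown host";

    socket(my $sock, AF_INET, SOCK_STREAM, $protocol)
        or die "socket() failed: $!";
    connect($sock, sockaddr_in($port, $ip))
        or die "connect() failed: $!";
    $sock->autoflush(1);                  # send each line immediately

    while (my $msg_out = <$in>) {
        print {$sock} $msg_out;           # send one line to the server
        my $msg_in = <$sock>;             # read one line of response
        last unless defined $msg_in;      # server closed the connection
        print {$out} $msg_in;
        $bytes_out += length $msg_out;
        $bytes_in  += length $msg_in;
    }
    close $sock;
    return ($bytes_out, $bytes_in);
}

# Assumption: run against a real server only when a host is given.
if (@ARGV) {
    my ($sent, $received) = echo_client(@ARGV[0, 1], \*STDIN, \*STDOUT);
    print STDERR "bytes_sent = $sent, bytes_received = $received\n";
}
```

Passing the input and output filehandles as arguments is purely a design choice of this sketch; it lets the same subroutine be driven from standard input or from a test.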
A session with tcp_echo_cli1.pl looks like this:
% tcp_echo_cli1.pl www.modperl.com
How now brown cow?
How now brown cow?
There's an echo in here.
There's an echo in here.
Yo-de-lay-ee-oo!
Yo-de-lay-ee-oo!
^D
bytes_sent = 61, bytes_received = 61
The ^D on the second-to-last line of the transcript shows where I got tired of this game and pressed
the end-of-input key. (On Windows systems, this would be ^Z.)
The close() call works with sockets just like it does with ordinary filehandles. The socket is closed for
business. Once closed, the socket can no longer be read from or written to. On success, close() returns a true
value. Otherwise, it returns undef and leaves an error message in $!.
The effect of close() on the other end of the connection is similar to the effect of closing a pipe. After the
socket is closed, any further reads on the socket at the other end return an end of file (EOF). Any further writes
result in a PIPE exception.
$boolean = shutdown (SOCK, $how)
shutdown() is a more precise version of close() that allows you to control which part of the bidirectional
connection to shut down. The first argument is a connected socket. The second argument, $how, is a small
integer that indicates which direction to shut down. As summarized in Table 4.1, a $how of 0 closes the socket
for further reads, a value of 1 closes the socket for writes, and a value of 2 closes the socket for both reading
and writing (like close()). A nonzero return value indicates that the shutdown was successful.
Value of HOW   Description
0              Closes socket for reading
1              Closes socket for writing
2              Closes socket completely
In addition to its ability to half-close a socket, shutdown() has one other advantage over
close(). If the process has called fork() at any point, there may be copies of the socket
filehandle in the original process's children. A conventional close() on any of the copies does not
actually close the socket until all other copies are closed as well (filehandles have the same
behavior). In consequence, the client at the other end of the connection won't receive an EOF until
both the parent and its child process(es) have closed their copies. In contrast, shutdown() closes
all copies of the socket, sending the EOF immediately. We'll take advantage of this feature several
times in the course of this book.
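To make the half-close concrete, here is a minimal sketch in which one end of a connection finishes writing with shutdown() and then still reads a reply. As an assumption for illustration, a pair of UNIX-domain sockets created by socketpair() stands in for a real client and server, and both ends live in one process; the variable names are our own.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

# A connected pair of sockets stands in for a client and a server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair() failed: $!";
$_->autoflush(1) for $client, $server;

# The client sends its whole request, then closes only its writing side.
print {$client} "line one\nline two\n";
shutdown($client, 1) or die "shutdown() failed: $!";

# The server reads until EOF, which arrives because of the half-close.
my @request = <$server>;

# The client's reading side is still open, so the server can reply.
print {$server} scalar(@request), " lines received\n";
close $server;

my $reply = <$client>;
close $client;
print $reply;
```

Had the client called close() instead of shutdown(), it would have had no socket left on which to read the reply.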
Figure 4.2. When accept() receives an incoming connection, it returns a new socket connected to the client
5. Perform I/O on the connected socket. The server uses the connected socket to talk to the
peer. When the server is done, it closes the connected socket.
6. Accept more connections. Using the listening socket, the server can accept() as many
connections as it likes. When it is done, it will close() the listening socket and exit.
Our example server is named tcp_echo_serv1.pl. This server is a slightly warped version of the
standard echo server. It echoes everything sent to it, but rather than send it back verbatim, it
reverses each line right to left (but preserves the newline at the end). So if one sends it "Hello world!,"
the line echoed back will be "!dlrow olleH." (There's no reason to do this except that it adds some
spice to an otherwise boring example.)
This server can be used by the client of Figure 4.1, or with the standard Telnet program. Figure
4.3 lists the server code.
Lines 1–9: Load modules, initialize constants and variables—We start out as in the client by
bringing in the Socket and IO::Handle modules. We also define a private echo port of 2007 that
won't conflict with any existing echo server. We set up the $port and $protocol variables as
before (lines 8–9) and initialize the counters.
Lines 10–13: Install INT interrupt handler—There has to be a way to interrupt the server, so we
install a signal handler for the INT (interrupt) signal, which is sent from the terminal when the
user presses ^C. This handler simply prints out the accumulated byte counters' statistics and
exits.
Line 14: Create the socket—Using arguments identical to those used by the TCP client in Figure
4.1, we call socket() to create a stream TCP socket.
Line 15: Set the socket's SO_REUSEADDR option—The next step is to set the socket's
SO_REUSEADDR option to true by calling setsockopt(). This option is commonly used to allow
us to kill and restart the server immediately. Otherwise, there are conditions under which the
system will not allow us to rebind the local address until old connections have timed out.
Lines 16–17: Bind the socket to a local address—We now call bind() to assign a local address
to a socket. We create a local address using sockaddr_in(), passing it our private echo port
for the port, and INADDR_ANY as the IP address. INADDR_ANY acts as a wildcard. It allows the
operating system to accept connections on any of the host's IP addresses (including the
loopback address and any network interface card addresses it might have).
Line 18: Call listen() to make socket ready to accept incoming connections—We call
listen() to alert the operating system that the socket will be used for incoming connections.
The listen() function takes two arguments. The first is the socket, and the second is an
integer indicating the number of incoming connections that can be queued to wait for processing.
It's common for multiple clients to try to connect at roughly the same time; this argument
determines how large the backlog of pending connections can get. In this case, we use a constant
defined in the Socket module, SOMAXCONN, to take the maximum number of queued connections
that the system allows.
Lines 19–34: Enter main loop—The bulk of the code is the server's main loop, in which it waits
for, and services, incoming connections.
Line 21: accept() an incoming connection—Each time through the loop we call accept(),
using the name of the listening socket as the second argument, and a name for the new socket
(SESSION) as the first. (Yes, the order of arguments is a little odd.) If the call to accept() is
successful, it returns the packed address of the remote socket as its function result, and returns
the connected socket in SESSION.
Lines 22–23: Unpack client's address—We call sockaddr_in() in a list context to unpack the
client address returned by accept() into its port and IP address components, and print the
address to standard error. In a real application, we might write this information to a time-stamped
log file.
Lines 24–33: Handle the connection—This section handles communications with the client
using the connected socket. We first put the SESSION socket into autoflush mode to prevent
buffering problems. We now read one line at a time from the socket using the <> operator, reverse
the text of the line, and send it back to the client using print().
This continues until <SESSION> returns undef, which indicates that the peer has closed its end
of the connection. We close the SESSION socket, print a status message, and go back to
accept() to wait for the next incoming connection.
Line 35: Clean up—After the main loop is done we close the listening socket. This part of the
code is never reached because the server is designed to be terminated by the interrupt key.
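The server-side calls described above (socket(), setsockopt(), bind(), listen(), accept()) can be exercised in one self-contained sketch. This is not the Figure 4.3 listing. As assumptions for illustration, it uses lexical filehandles, binds to the loopback address on a system-chosen port (port 0) rather than INADDR_ANY and port 2007, serves a single connection from a forked child, and omits the byte counters and the INT handler. The parent process plays the client.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use POSIX ();

# Create, configure, bind, and listen, as in the walkthrough.
my $protocol = getprotobyname('tcp') || 6;   # assumption: fall back to 6 (TCP)
socket(my $listen, AF_INET, SOCK_STREAM, $protocol) or die "socket() failed: $!";
setsockopt($listen, SOL_SOCKET, SO_REUSEADDR, 1)    or die "setsockopt() failed: $!";
bind($listen, sockaddr_in(0, INADDR_LOOPBACK))      or die "bind() failed: $!";
listen($listen, SOMAXCONN)                          or die "listen() failed: $!";
my ($port) = sockaddr_in(getsockname($listen));     # which port did we get?

my $pid = fork();
die "fork() failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: accept one connection and reverse each line sent to it.
    my $remote = accept(my $session, $listen) or POSIX::_exit(1);
    my ($peer_port, $peer_ip) = sockaddr_in($remote);
    warn "Connection from [", inet_ntoa($peer_ip), ",$peer_port]\n";
    $session->autoflush(1);
    while (my $line = <$session>) {
        chomp $line;
        print {$session} scalar(reverse $line), "\n";
    }
    close $session;
    POSIX::_exit(0);    # leave at once, skipping the parent's cleanup code
}

# Parent: act as a client of the child server.
close $listen;
socket(my $sock, AF_INET, SOCK_STREAM, $protocol)   or die "socket() failed: $!";
connect($sock, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect() failed: $!";
$sock->autoflush(1);
print {$sock} "Hello world!\n";
my $reply = <$sock>;
close $sock;            # the child sees EOF and finishes
waitpid($pid, 0);
print $reply;
```

A long-running server would wrap the accept() call and the per-connection code in a loop, as the walkthrough describes.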
When we run the example server from the command line, it prints out the "waiting for incoming
connections" message and then pauses until it receives an incoming connection. In the session
shown here, there are two connections, one from a local client at the 127.0.0.1 loopback address,
and the other from a client at address 192.168.3.2. After interrupting the server, we see the statistics
printed out by the INT handler.
% tcp_echo_serv1.pl
waiting for incoming connections on port 2007...
Connection from [127.0.0.1,2865]
Connection from [127.0.0.1,2865] finished
Connection from [192.168.3.2,2901]
Connection from [192.168.3.2,2901] finished
bytes_sent = 26, bytes_received = 26
The INT handler in this server violates the recommendation from Chapter 2 that signal handlers not
do any I/O. In addition, by calling exit() from within the handler, it risks raising a fatal exception
on Windows machines as they shut down. We will see a more graceful way of shutting down a server
in Chapter 10.
The getsockname() function returns the packed binary address of the socket at the local side, or undef if
the socket is unbound. The getpeername() function behaves in the same way, but returns the address of
the socket at the remote side, or undef if the socket is unconnected.
In either case, the returned address must be unpacked with sockaddr_in(), as illustrated in this short
example:
if ($remote_addr = getpeername(SOCK)) {
    my ($port,$ip) = sockaddr_in($remote_addr);
    my $host = gethostbyaddr($ip,AF_INET);
    print "Socket is connected to $host at port $port\n";
}
Limitations of tcp_echo_serv1.pl
Although tcp_echo_serv1.pl works as written, it has a number of drawbacks that are addressed in
later chapters. The drawbacks include the following:
1. No support for multiple incoming connections. This is the biggest problem. The server can
accept only one incoming connection at a time. During the period that it is busy servicing an
existing connection, other requests will be queued up until the current connection terminates
and the main loop again calls accept(). If the number of queued clients exceeds the value
specified by listen(), new incoming connections will be rejected.
To avoid this problem, the server would have to perform some concurrent processing with
threads or processes, or cleverly multiplex its input/output operations. These techniques are
discussed in Part III of this book.
2. Server remains in foreground. After it is launched, the server remains in the foreground, where
any signal from the keyboard (such as a ^C) can interrupt its operations. Long-running servers
will want to dissociate themselves from the keyboard and put themselves in the background.
Techniques for doing this are described in Chapter 10, Forking Servers and the inetd Daemon.
3. Server logging is primitive. The server logs status information to the standard error output
stream. However, a robust server will run as a background process and shouldn't have any
standard error to write to. The server should append log entries to a file or use the operating
system's own logging facilities. Logging techniques are described in Chapter 16, Bulletproofing
Servers.
The getsockopt() and setsockopt() functions allow you to examine and change a socket's options. The
first argument is the filehandle for a previously created socket. The second argument, $level, indicates the
level of the networking stack you wish to operate on. You will usually use the constant SOL_SOCKET, indicating
that you are operating on the socket itself. However, getsockopt() and setsockopt() are occasionally
used to adjust options in the TCP or UDP protocols, in which case you use the protocol number returned by
getprotobyname().
The third argument, $option_name, is an integer value selected from a large list of possible constants. The
last argument, $option_value, is used only by setsockopt() and is the value to set the option to. On
success, getsockopt() returns the value of the requested option; it returns undef if the call failed.
setsockopt() returns a true value if the option was successfully set; otherwise, it returns undef.
The value of an option is often a Boolean flag indicating whether the option should be turned on or
off. In this case, no special code is needed to set or examine the value. For example, here is how
to set the value of SO_BROADCAST to true (broadcasting is discussed in Chapter 20):
setsockopt(SOCK,SOL_SOCKET,SO_BROADCAST,1);
However, a few options act on integers or more esoteric data types, such as C timeval structures.
In this case, you must pack the values into binary form before passing them to setsockopt() and
unpack them after calling getsockopt(). To illustrate, here is the way to recover the maximum
size of the buffer that a socket uses to store outgoing data. The SO_SNDBUF option acts on a packed
integer (the "I" format):
$send_buffer_size = unpack("I",getsockopt($sock,SOL_SOCKET,SO_SNDBUF));
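The two calls can be tried together on an unconnected socket. The following is a minimal sketch under our own assumptions: a lexical $sock filehandle, SO_REUSEADDR as the flag to set, and a fallback to protocol number 6 (TCP) if getprotobyname() cannot find the protocol table. Unpacking the flag with the "I" format when reading it back is a cautious choice on our part, since getsockopt() hands back a packed value.

```perl
use strict;
use warnings;
use Socket;

my $protocol = getprotobyname('tcp') || 6;   # assumption: fall back to 6 (TCP)
socket(my $sock, AF_INET, SOCK_STREAM, $protocol)
    or die "socket() failed: $!";

# Setting a Boolean option: a plain true value is enough.
setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, 1)
    or die "setsockopt() failed: $!";

# Reading it back: check for failure, then unpack the packed integer.
my $packed = getsockopt($sock, SOL_SOCKET, SO_REUSEADDR);
die "getsockopt() failed: $!" unless defined $packed;
my $reuse = unpack("I", $packed);

# An integer-valued option is read the same way.
my $send_buffer_size = unpack("I", getsockopt($sock, SOL_SOCKET, SO_SNDBUF));

print "SO_REUSEADDR is ", ($reuse ? "on" : "off"), "\n";
print "send buffer is $send_buffer_size bytes\n";
close $sock;
```

The buffer size printed depends on the operating system, so no particular figure should be expected.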
For example, to make a socket linger for up to 120 seconds you would call:
setsockopt(SOCK,SOL_SOCKET, SO_LINGER,pack("II",1,120))
or die "Can't set SO_LINGER: $!";
SO_BROADCAST is valid for UDP sockets only. If set to a true value, it allows send() to be used
to send packets to the broadcast address for delivery to all hosts on the local subnet. We discuss
broadcasting in Chapter 20.
The SO_OOBINLINE flag controls how out-of-band information is handled. This feature allows the
peer to be alerted of the presence of high-priority data. We describe how this works in Chapter 17.
SO_SNDLOWAT and SO_RCVLOWAT set the low-water marks for the output and input buffers,
respectively. The significance of these options is discussed in Chapter 13, Nonblocking I/O. These
options are integers, and must be packed and unpacked using the I pack format.
SO_TYPE is a read-only option. It returns the type of the socket, such as SOCK_STREAM. You will
need to unpack this value with the I format before using it. The IO::Socket sockopt() method
discussed in Chapter 5 does the conversion automatically.
Table 4.2. Common Socket Options
Option         Description
SO_REUSEADDR   Enable reuse of the local address.
SO_KEEPALIVE   Enable the transmission of periodic "keepalive" messages.
SO_LINGER      Linger on close if data present.
SO_BROADCAST   Allow socket to send messages to the broadcast address.
SO_OOBINLINE   Keep out-of-band data inline.
SO_SNDLOWAT    Get or set the size of the output buffer "low water mark."
SO_RCVLOWAT    Get or set the size of the input buffer "low water mark."
SO_TYPE        Get the socket type (read only).
SO_ERROR       Get and clear the last error on the socket (read only).
Last, the read-only SO_ERROR option returns the error code, if any, for the last operation on the
socket. It is used for certain asynchronous operations, such as nonblocking connects (Chapter
13). The error is cleared after it is read. As before, users of getsockopt() need to unpack the
value with the I format before using it, but this is taken care of automatically by IO::Socket.
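Both read-only options can be inspected on a freshly created socket. In this minimal sketch, int_sockopt() is a hypothetical helper of our own that fetches a socket-level option and unpacks it with the I format; it is not part of the Socket module.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: fetch an integer-valued, socket-level option.
sub int_sockopt {
    my ($sock, $option) = @_;
    my $packed = getsockopt($sock, SOL_SOCKET, $option);
    die "getsockopt() failed: $!" unless defined $packed;
    return unpack("I", $packed);
}

# Assumption: fall back to 6 (TCP) if the protocol table is unavailable.
socket(my $sock, AF_INET, SOCK_STREAM, getprotobyname('tcp') || 6)
    or die "socket() failed: $!";

my $type  = int_sockopt($sock, SO_TYPE);
my $error = int_sockopt($sock, SO_ERROR);   # 0 means no pending error

print "this is a stream socket\n" if $type == SOCK_STREAM;
print "pending error code: $error\n";
close $sock;
```

On a socket that has just been created and never used, we would expect SO_ERROR to report no error.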
The downside of setting SO_REUSEADDR is that it allows you to launch your server twice. Both
processes will be able to bind to the same address without triggering an error, and they will then
compete for incoming connections, leading to confusing results. Servers that we develop later (e.g.,
Chapters 10, 14, and 16) avoid this possibility by creating a file when the program starts up and
deleting it on exit. The server refuses to launch if it sees that this file already exists.
The operating system does not allow a socket address bound by one user's process to be bound
by another user's process, regardless of the setting of SO_REUSEADDR.
The socketpair() function is similar to the pipe() function that we saw in Chapter 2, except
that the connection is bidirectional. Typically a script creates a pair of sockets and then calls
fork(), with the parent closing one socket and the child closing the other. The two sockets can
then be used for bidirectional communication between parent and child.
While in principle socketpair() can be used for the INET protocol, in practice most systems only
support socketpair() for creating UNIX-domain sockets. Here is the idiom:
socketpair(SOCK1,SOCK2,AF_UNIX,SOCK_STREAM,PF_UNSPEC) or die $!;
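A fuller sketch of the fork() pattern described above might look like the following. The lexical handle names ($parent_sock, $child_sock) and the uppercase "protocol" are assumptions made for illustration; only the socketpair() call itself follows the idiom shown.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;   # provides autoflush() on plain filehandles

socketpair(my $parent_sock, my $child_sock, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->autoflush(1) for $parent_sock, $child_sock;

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {                       # child process
    close $parent_sock;                # child keeps only its own end
    my $request = <$child_sock>;       # read one line from the parent
    print {$child_sock} uc $request;   # answer on the same socket
    close $child_sock;
    exit 0;
}

close $child_sock;                     # parent keeps only its own end
print {$parent_sock} "hello from parent\n";
my $reply = <$parent_sock>;            # read the child's answer
close $parent_sock;
waitpid($pid, 0);
print "child replied: $reply";
```

Because each process holds exactly one end after the close() calls, either side sees end-of-file as soon as the other side closes its socket.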
These symbols are not exported by default, but must be brought in with use either individually or
by importing the ":crlf" tag. In the latter case, you probably also want to import
the ":DEFAULT" tag in order to get the default socket-related constants as well:
use Socket qw(:DEFAULT :crlf);
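As a small illustration of what the ":crlf" tag provides, the sketch below uses the $CRLF variable and the CRLF constant that Socket exports under that tag. The sample line is made up for the example; nothing is read from the network.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT :crlf);

# A line as a typical Internet server would send it, terminated by CRLF.
my $line = "Tue Aug 15 07:39:49 2000" . $CRLF;

{
    local $/ = CRLF;   # make chomp() strip the two-byte CRLF terminator
    chomp $line;
}

printf "%s (%d bytes)\n", $line, length $line;
```

Using local keeps the change to $/ confined to the block, which matters because $/ affects every filehandle in the program.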
TCP can't overcome all problems. This section briefly discusses the common exceptions, as well
as some common programming errors.
system can't figure out how to route the message to the desired destination, because of a local
misconfiguration or the failure of a router somewhere along the way. In this case, connect()
fails with a "Network is unreachable" (ENETUNREACH) error.
4. There is a programmer error. —Various errors are due to common programming mistakes.
For example, an attempt to call connect() with a filehandle rather than a socket results in a
"Socket operation on non-socket" (ENOTSOCK) error. An attempt to call connect() on a
socket that is already connected results in a "Transport endpoint is already connected"
(EISCONN) error.
The ENOTSOCK error can also be returned by other socket calls, including bind(), listen(),
accept(), and the sockopt() calls.
One way to avoid blocking indefinitely is to set the SO_KEEPALIVE option on the socket. In
this case, the connection times out after some period of unresponsiveness, and the socket is
closed. The keepalive timeout is relatively long (minutes in some cases) and cannot be
changed.
3. The network goes down while a connection is established. —If a router or network segment
goes down while a connection is established, making the remote host unreachable, the current
I/O operation blocks until connectivity is restored. In this case, however, when the network is
restored the connection usually continues as if nothing happened, and the I/O operation
completes successfully.
There are several exceptions to this, however. If, instead of simply going down, one of the routers
along the way starts issuing error messages, such as "host unreachable," then the connection is
terminated and the effect is similar to scenario (1). Another common situation is that the remote
server has its own timeout system. In this case, it times out and closes the connection as soon as
network connectivity is restored.
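The SO_KEEPALIVE option mentioned in the scenarios above can be switched on with the built-in setsockopt(). The following is a minimal sketch on an unconnected socket, just to show the packing and unpacking involved; in a real program the option would be set on the connected socket.

```perl
use strict;
use warnings;
use Socket;

socket(my $sock, AF_INET, SOCK_STREAM, scalar getprotobyname('tcp'))
    or die "socket: $!";

# Turn keepalive probes on; a packed integer 1 enables the option.
setsockopt($sock, SOL_SOCKET, SO_KEEPALIVE, pack('I', 1))
    or die "setsockopt(SO_KEEPALIVE): $!";

# Read the option back to confirm the setting (nonzero means enabled).
my $keepalive = unpack('I', getsockopt($sock, SOL_SOCKET, SO_KEEPALIVE));
print "keepalive is ", ($keepalive ? "on" : "off"), "\n";
```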
Summary
TCP sockets are created using socket(). Clients call connect() to establish an outgoing
connection to a remote host. Servers call bind() to assign an address to the socket, listen() to
notify the operating system that the socket is ready to accept connections, and accept() to accept
an incoming connection. Once a TCP socket is connected, it can be used like a filehandle to read
and write stream-oriented data.
Sockets have a number of options that can be set and examined with setsockopt() and
getsockopt(), respectively. Unlike filehandles, which when closed allow no further reading or
writing, sockets can be half-closed using shutdown(), which allows the socket to be closed for
reading, writing, or both.
You've now seen the entire Berkeley socket API. The next chapter introduces Perl's object-oriented
extensions, which greatly simplify the task of working with sockets.
The last chapter walked through Perl's built-in interface to Berkeley sockets, which closely mirrors
the underlying C-library calls. Some of the built-in functions, however, are awkward because of the
need to convert addresses and other data structures into the binary forms needed by the C library.
The advent of the object-oriented extensions to Perl5 made it possible to create a more intuitive
interface based on the IO::Handle module. IO::Socket and related modules simplify code and make
it easier to read by eliminating the noisy C-language-related calls and allowing you to focus on the
core of the application.
This chapter introduces the IO::Socket API, and then walks through some basic TCP applications.
Using IO::Socket
Before we discuss IO::Socket in detail, let's get a feel for it by reimplementing some of the examples
from Chapters 3 and 4.
A Daytime Client
The first example is a rewritten version of the daytime service client developed in Chapter 3 (Figure
3.7). As you recall, this client establishes an outgoing connection to a daytime server, reads a line,
and exits.
This is a good opportunity to fix a minor bug in the original examples (left there in the interests of
not unnecessarily complicating the code). Like many Internet servers, the daytime service
terminates its lines with CRLF rather than a single LF as Perl does. Before reading from daytime,
we set the end-of-line character to CRLF. Otherwise, the line we read will contain an extraneous
CR at the end.
Figure 5.1 lists the code.
Lines 1–4: Initialize module—We load IO::Socket and import the ":crlf" constants as well as
the default constants. These constants are conveniently reexported from Socket. We recover
the name of the remote host from the command line.
Line 5: Set the end-of-line separator—To read lines from the daytime server, we set the $/ end-
of-line global to CRLF. Note that this global option affects all filehandles, not just the socket.
Lines 6–7: Create socket—We create a new IO::Socket object by calling IO::Socket::INET's new
method, specifying the destination address in the form $host:service. We will see other ways to
specify the destination address later.
Lines 8–9: Read the time of day and print it—We read a single line from the server by calling
getline(), and remove the CRLF from the end with chomp(). We print this line to STDOUT.
When we run this program, we get the expected result:
% time_of_day_tcp.pl wuarchive.wustl.edu
Tue Aug 15 07:39:49 2000
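The walkthrough above can be approximated by the sketch below. It is a reconstruction from the description, not the book's Figure 5.1: the helper name fetch_daytime() is hypothetical, and $/ is set with local inside the helper rather than globally so that other filehandles are unaffected.

```perl
use strict;
use warnings;
use IO::Socket qw(:DEFAULT :crlf);

# Connect to "host:service", read one CRLF-terminated line, return it bare.
sub fetch_daytime {
    my ($address) = @_;
    local $/ = CRLF;                              # lines end with CRLF
    my $socket = IO::Socket::INET->new($address)
        or die "Can't connect to $address: $!\n";
    my $time = $socket->getline;
    die "No data from $address\n" unless defined $time;
    chomp $time;                                  # strips the CRLF
    return $time;
}

# Run with a hostname as the first command-line argument.
print fetch_daytime("$ARGV[0]:daytime"), "\n" if @ARGV;
```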
Lines 1–8: Initialize script—We load IO::Socket, initialize constants and globals, and process
the command-line arguments.
Line 9: Create socket—We call the IO::Socket::INET->new() method using the $host:
$port argument. If new() is successful, it returns a socket object connected to the remote host.
Lines 10–16: Enter main loop—We now enter the main loop. Each time through the loop we call
getline() on the STDIN filehandle to retrieve a line of input from the user. We send this line
of text to the remote host using print() on the socket, and read the server's response using
the <> operator. We print the response to standard output, and update the statistics.
Lines 17–18: Clean up—The main loop will end when STDIN is closed by the user. We close
the socket and print the accumulated statistics to STDERR.
Did you notice that line 10 uses the object-oriented getline() method with STDIN? This is a
consequence of bringing in IO::Socket, which internally loads IO::Handle. As discussed in Chapter
2, a side effect of IO::Handle is to add I/O object methods to all filehandles used by your program,
including the standard ones.
Unlike with the daytime client, we don't need to worry about what kind of line ends the echo service
uses, because it echoes back to us exactly what we send it.
Note also that we did not need to set autoflush mode on the IO::Socket object, as we did in examples
in Chapter 4. Since IO::Socket version 1.18, autoflush is turned on by default in all sockets created
by the module. This is the version that comes with Perl 5.00503 and higher.
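The echo client described in this walkthrough might be sketched as follows. This is a reconstruction under assumptions, not the book's figure: the helper echo_lines() is hypothetical, and it takes the input filehandle as a parameter so that the loop can be exercised without a terminal. Note the getline() method call on the input handle and the absence of any explicit autoflush() call on the socket.

```perl
use strict;
use warnings;
use IO::Socket;

# Send each line read from $input to $socket and print the echoed reply.
# Returns the number of bytes sent and received.
sub echo_lines {
    my ($socket, $input) = @_;
    my ($bytes_out, $bytes_in) = (0, 0);
    while (defined(my $msg_out = $input->getline)) {
        print {$socket} $msg_out;          # socket is autoflushed by IO::Socket
        my $msg_in = <$socket>;
        last unless defined $msg_in;       # server closed the connection
        print $msg_in;
        $bytes_out += length $msg_out;
        $bytes_in  += length $msg_in;
    }
    return ($bytes_out, $bytes_in);
}

# Run with a hostname and an optional port on the command line.
if (@ARGV) {
    my $host = shift;
    my $port = shift || 'echo';
    my $socket = IO::Socket::INET->new("$host:$port")
        or die "Can't connect: $!\n";
    my ($out, $in) = echo_lines($socket, \*STDIN);
    $socket->close;
    print STDERR "bytes_sent = $out, bytes_received = $in\n";
}
```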
IO::Socket Methods
We'll now look at IO::Socket in greater depth.
One never creates an IO::Socket object directly, but creates either an IO::Socket::INET or an
IO::Socket::UNIX object. We use the IO::Socket::INET subclass both in this chapter and in much of
the rest of this book. Future versions of the I/O library may support other addressing domains.
Other important descendants of IO::Handle include IO::File, which we discussed in Chapter 2, and
IO::Pipe, which provides an object-oriented interface to Perl's pipe() call. From IO::Socket::INET
descends Net::Cmd, which is the parent of a whole family of third-party modules that provide
interfaces to specific command-oriented network services, including FTP and Post Office Protocol.
We discuss these modules beginning in Chapter 6.
Although not directly descended from IO::Handle, other modules in the IO::* namespace include
IO::Dir for object-oriented methods for reading and manipulating directories, IO::Select for testing
sets of filehandles for their readiness to perform I/O, and IO::Seekable for performing random access
on a disk file. We introduce IO::Select in Chapter 12, where we use it to implement network servers
using I/O multiplexing.
$sock = IO::Socket::INET->new('wuarchive.wustl.edu:daytime');
This object is then used for all I/O related to the socket. Because IO::Socket::INET descends from
IO::Handle, its objects inherit all the methods for reading, writing, and managing error conditions
introduced in Chapter 2. To these inherited methods, IO::Socket::INET adds socket-oriented
methods such as accept(), connect(), bind(), and sockopt().
As with IO::File, once you have created an IO::Socket object you have the option of either using the
object with a method call, as in:
$sock->print('Here comes the data.');
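The other option is the function-oriented style, in which the socket object is passed to a built-in such as print() as if it were a plain filehandle. The sketch below shows both styles side by side. It uses a local socket pair in place of a real network connection so that it runs anywhere; the variable names are illustrative.

```perl
use strict;
use warnings;
use IO::Socket;

# A connected pair of local sockets stands in for a real network connection.
my ($writer, $reader) =
    IO::Socket::UNIX->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$writer->autoflush(1);

$writer->print("Here comes the data.\n");    # object-oriented style
print $writer "Here comes more data.\n";     # function-oriented style

my $first  = $reader->getline;               # object-oriented style
my $second = <$reader>;                      # function-oriented style
print $first, $second;
```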
Which syntax you use is largely a matter of preference. For performance reasons discussed at the
end of this chapter, I prefer the function-oriented style whenever there is no substantive difference
between the two.
The IO::Socket::INET->new() constructor is extremely powerful, and is in fact the most
compelling reason for using the object-oriented socket interface.
In addition to specifying the service by name or port number, you can combine the two so that
IO::Socket::INET will attempt to look up the service name first, and if that isn't successful, fall back
to using the hard-coded port number. The format is hostname:service(port). For instance, to connect
to the wuarchive echo service, even on machines that for some reason don't have the echo service
listed in the network information database, we can call:
my $echo = IO::Socket::INET->new('wuarchive.wustl.edu:echo(7)')
or die "Can't connect: $!\n";
The new() method can also be used to construct sockets suitable for incoming connections, UDP
communications, broadcasting, and so forth. For these more general uses, new() accepts a named
argument style that looks like this:
my $echo = IO::Socket::INET->new(PeerAddr => 'wuarchive.wustl.edu',
PeerPort => 'echo(7)',
Type => SOCK_STREAM,
Proto => 'tcp')
or die "Can't connect: $!\n";
Recall from Chapter 1 that the " => " symbol is accepted by Perl as a synonym for ",". The newlines
between the argument pairs are for readability only. In shorter examples, we put all the name/
argument pairs on a single line.
The list of arguments that you can pass to IO::Socket::INET is extensive. They are summarized in
Table 5.1.
Table 5.1. Arguments to IO::Socket::INET->new()
Argument     Description                            Value
PeerAddr     Remote host address                    <hostname or address>[:<port>]
PeerHost     Synonym for PeerAddr
PeerPort     Remote port or service                 <service name or number>
LocalAddr    Local host bind address                <hostname or address>[:<port>]
LocalHost    Synonym for LocalAddr
LocalPort    Local host bind port                   <service name or port number>
Proto        Protocol name (or number)              <protocol name or number>
Type         Socket type                            SOCK_STREAM | SOCK_DGRAM | ...
Listen       Queue size for listen                  <integer>
Reuse        Set SO_REUSEADDR before binding        <boolean>
Timeout      Timeout value                          <integer>
MultiHomed   Try all addresses on multihomed hosts  <boolean>
The PeerAddr and PeerHost arguments are synonyms which are used to specify a host to connect
to. When IO::Socket::INET is passed either of these arguments, it will attempt to connect() to the
indicated host. These arguments accept a hostname, an IP address, or a combined hostname and
port number in the format that we discussed earlier for the simple form of new(). If the port number
is not embedded in the argument, it must be provided by PeerPort.
PeerPort indicates the port to connect to, and is used when the port number is not embedded in the
hostname. The argument can be a numeric port number, a symbolic service name, or the combined
form, such as "ftp(21)."
The LocalAddr, LocalHost, and LocalPort arguments are used by programs that are acting as
servers and wish to accept incoming connections. LocalAddr and LocalHost are synonymous, and
specify the IP address of a local network interface. LocalPort specifies a local port number. If
IO::Socket::INET sees any of these arguments, it constructs a local address and attempts to
bind() to it.
The network interface can be specified as an IP address in dotted-quad form, as a DNS hostname,
or as a packed IP address. The port number can be given as a port number, as a service name, or
using the "service(port)" combination. It is also possible to combine the local IP address with the
port number, as in "127.0.0.1:http(80)." In this case, IO::Socket::INET will take the port number
from LocalAddr, ignoring the LocalPort argument.
If you specify LocalPort but not LocalAddr, then IO::Socket::INET binds to the INADDR_ANY
wildcard, allowing the socket to accept connections from any of the host's network interfaces.
This is usually the behavior that you want.
Stream-oriented programs that wish to accept incoming connections should also specify the
Listen and possibly Reuse arguments. Listen gives the size of the listen queue. If the argument is
present, IO::Socket will call listen() after creating the new socket, using the argument as its
queue length. This argument is mandatory if you wish to call accept() later.
Reuse, if a true value, tells IO::Socket::INET to set the SO_REUSEADDR option on the new socket.
This is useful for connection-oriented servers that need to be restarted from time to time. Without
this option, the server has to wait a few minutes between exiting and restarting in order to avoid
"address in use" errors during the call to bind().
Proto and Type specify the protocol and socket type. The protocol may be symbolic (e.g., "tcp") or
numeric, using the value returned by getprotobyname(). Type must be one of the SOCK_*
constants, such as SOCK_STREAM. If one or more of these options is not provided, IO::Socket::INET
guesses at the correct values from context. For example, if Type is absent, IO::Socket::INET infers
the correct type from the protocol. If Proto is absent but a service name was given for the port, then
IO::Socket::INET attempts to infer the correct protocol to use from the service name. As a last resort,
IO::Socket::INET defaults to "tcp."
Timeout sets a timeout value, in seconds, for use with certain operations. Currently, timeouts are
used for the internal call to connect() and in the accept() method. This can be handy to prevent
a client program from hanging indefinitely if the remote host is unreachable.
The MultiHomed option is useful in the uncommon case of a TCP client that wants to connect to a
host with multiple IP addresses and doesn't know which IP address to use. If this argument is set
to a true value, the new() method uses gethostbyname() to look up all the IP addresses for the
hostname specified by PeerAddr. It then attempts a connection to each of the host's IP addresses
in turn until one succeeds.
To summarize, TCP clients that wish to make outgoing connections should call new() with a
Proto argument of "tcp", and either a PeerAddr with an appended port number, or a
PeerAddr/PeerPort pair. For example:
my $sock = IO::Socket::INET->new(Proto => 'tcp',
PeerAddr => 'www.yahoo.com',
PeerPort => 'http(80)');
TCP servers that wish to accept incoming connections should call new() with a Proto of "tcp",
a LocalPort argument indicating the port they wish to bind to, and a Listen argument indicating
the desired listen queue length:
my $listen = IO::Socket::INET->new(Proto => 'tcp',
LocalPort => 2007,
Listen => 128);
As we will discuss in Chapter 19, UDP applications need to provide only a Proto argument of "udp"
or a Type argument of SOCK_DGRAM. The idiom is the same for both clients and servers:
my $udp = IO::Socket::INET->new(Proto => 'udp');
$connected_socket = $listen_socket->accept()
($connected_socket,$remote_addr) = $listen_socket->accept()
The accept() method performs the same task as the like-named call in the function-oriented API. Valid only
when called on a listening socket object, accept() retrieves the next incoming connection from the queue,
and returns a connected session socket that can be used to communicate with the remote host. The new socket
inherits all the attributes of its parent, except that it is connected.
When called in a scalar context, accept() returns the connected socket. When called in an array context,
accept() returns a two-element list, the first of which is the connected socket, and the second of which is the
packed address of the remote host. You can also recover this address at a later time by calling the connected
socket's peername() method.
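The list-context form of accept() can be tried out on the loopback interface. The sketch below is an illustration only: it binds to 127.0.0.1 with LocalPort set to 0 so that the operating system picks a free port (an assumption made to keep the example self-contained), and it connects a client from the same process before accepting.

```perl
use strict;
use warnings;
use IO::Socket;

# Listen on an ephemeral loopback port (LocalPort => 0 lets the system choose).
my $listen_socket = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 5,
) or die "Can't listen: $!\n";

# The connection is queued by the operating system until accept() is called.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listen_socket->sockport,
    Proto    => 'tcp',
) or die "Can't connect: $!\n";

# List context: connected socket plus the packed address of the peer.
my ($connected_socket, $remote_addr) = $listen_socket->accept
    or die "accept failed: $!\n";
my ($port, $packed_ip) = sockaddr_in($remote_addr);

printf "Connection from [%s,%d]\n", inet_ntoa($packed_ip), $port;
```

The packed address returned in list context holds the same information that the connected socket's peerport() and peerhost() methods report later.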
$return_val = $sock->connect ($dest_addr)
$return_val = $sock->bind ($my_addr)
$return_val = $sock->listen ($max_queue)
These three TCP-related methods are rarely used because they are usually called automatically by new().
However, if you wish to invoke them manually, you can do so by creating a new TCP socket without providing
either a PeerAddr or a Listen argument:
$sock = IO::Socket::INET->new(Proto=>'tcp');
$dest_addr = sockaddr_in(...) # etc.
$sock->connect($dest_addr);
$result = $sock->connected()
The connected() method returns true if the socket is connected to a remote host, false otherwise. It works
by calling peername().
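A complete version of the manual connect() sequence, together with connected(), might look like this. It is a sketch under assumptions: a throwaway listener on the loopback interface supplies something to connect to, and the flag variables $before and $after exist only to show the change in state.

```perl
use strict;
use warnings;
use IO::Socket;

# A local listener gives the example something to connect to.
my $listen = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0, Proto => 'tcp', Listen => 1,
) or die "Can't listen: $!\n";

# No PeerAddr or Listen argument: new() creates the socket and stops there.
my $sock = IO::Socket::INET->new(Proto => 'tcp')
    or die "Can't create socket: $!\n";
my $before = $sock->connected ? 1 : 0;

# Build the packed destination address by hand, then connect explicitly.
my $dest_addr = sockaddr_in($listen->sockport, inet_aton('127.0.0.1'));
$sock->connect($dest_addr) or die "Can't connect: $!\n";
my $after = $sock->connected ? 1 : 0;

print "connected before: $before, after: $after\n";
```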
$protocol = $sock->protocol()
$type = $sock->socktype()
$domain = $sock->sockdomain()
These three methods return basic information about the socket, including its numeric protocol, its type, and its
domain. These methods can be used only to get the attributes of a socket object. They can't be used to change
the nature of an already-created object.
$value = $sock->sockopt($option [,$value])
The sockopt() method can be used to get and/or set a socket option. It is a front end for both
getsockopt() and setsockopt(). Called with a single numeric argument, sockopt() retrieves the
current value of the option. Called with an option and a new value, sockopt() sets the option to the
indicated value, and returns a result code indicating success or failure. There is no need to specify
an option level, as you do with getsockopt(), because the SOL_SOCKET argument is assumed.
Unlike the built-in getsockopt(), the object method automatically converts the packed argument returned
by the underlying system call into an integer, so you do not need to unpack the option values returned by
sockopt(). As we discussed earlier, the most frequent exception to this is the SO_LINGER option, which
operates on an 8-byte linger structure as its argument.
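Both calling forms of sockopt() are shown in the short sketch below, which works on an unconnected TCP socket so that it needs no network access. The choice of SO_REUSEADDR and SO_TYPE is only for illustration.

```perl
use strict;
use warnings;
use IO::Socket;

my $sock = IO::Socket::INET->new(Proto => 'tcp')
    or die "Can't create socket: $!\n";

# Two arguments: set the option. SOL_SOCKET is implied.
$sock->sockopt(SO_REUSEADDR, 1) or die "Can't set SO_REUSEADDR: $!\n";

# One argument: get the option, already unpacked into an integer.
my $reuse = $sock->sockopt(SO_REUSEADDR);
my $type  = $sock->sockopt(SO_TYPE);

print "SO_REUSEADDR is ", ($reuse ? "on" : "off"), "\n";
print "stream socket\n" if $type == SOCK_STREAM;
```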
$val = $sock->timeout([$timeout])
timeout() gets or sets the timeout value that IO::Socket uses for its connect() and accept() methods.
Called with a numeric argument, it sets the timeout value and returns the previous setting. Otherwise, it
returns the current setting. The timeout value is not currently used for calls that send or receive data. The
eval{} trick, described in Chapter 2, can be used to achieve that result.
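For reads, the eval{} approach can be sketched as follows. This is a minimal illustration, assuming a one-second limit and a local socket pair on which nothing is ever written, so that the read is guaranteed to block until the alarm fires.

```perl
use strict;
use warnings;
use IO::Socket;

# A local socket pair; nothing is ever written, so the read below must block.
my ($reader, $writer) =
    IO::Socket::UNIX->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $line = eval {
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm(1);                      # give the read one second
    my $data = $reader->getline;   # blocks: no data is available
    alarm(0);                      # read finished in time, cancel the alarm
    $data;
};
alarm(0);                          # make sure no alarm is left pending
my $timed_out = (!defined $line && $@ eq "timeout\n") ? 1 : 0;

print $timed_out ? "read timed out\n" : "read: $line";
```

The die() inside the ALRM handler is what breaks out of the blocked read; the eval{} block catches it so that the program can carry on.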
$bytes = $sock->send ($data [,$flags ,$destination])
$address = $sock->recv ($buffer,$length [,$flags])
These are front ends for the send() and recv() functions, and are discussed in more detail when we discuss
UDP communications in Chapter 19.
An interesting side effect of the timeout implementation is that setting the IO::Socket::INET timeout
makes the connect() and accept() calls interruptible by signals. This allows a signal handler
to gracefully interrupt a program that is hung waiting on a connect() or accept(). We will see
an example of this in the next section.
Lines 1–7: Initialize script—We turn on strict syntax checking and load the IO::Socket module.
We import the default constants and the newline-related constants by importing the
tags :DEFAULT and :crlf.
We define our local port as a constant, and initialize the byte counters for tracking statistics. We
also set the global $/ variable to CRLF in accordance with the network convention.
Lines 8–9: Install INT signal handler—We install a signal handler for the INT signal, so that the
server will shut down gracefully when the user presses the interrupt key. The improved handler
simply sets the flag named $quit to true.
Lines 10–15: Create the socket object—We recover the port number from the command line or,
if no port number is provided, we default to the hard-coded constant. We now call
IO::Socket::INET->new() with arguments that cause it to create a listening socket bound
to the specified local port. Other arguments set the SO_REUSEADDR option to true, and specify
a 1-hour timeout (60*60 seconds) for the accept() operation.
The Timeout parameter makes each call to the accept() method return undef if an incoming
connection has not been received within the specified time. However, our motivation for
activating this feature is not for its own sake, but for the fact that it changes the behavior of the
method so that it is not automatically restarted after being interrupted by a signal. This allows
us to interrupt the server with the ^C key without bothering to wrap accept() in an eval{}
block (see Chapter 2, Timing Out Slow System Calls).
Lines 16–31: Enter main loop—After printing a status message, we enter a loop that continues
until the INT interrupt handler has set $quit to true. Each time through the loop, we call the
socket's accept() method. If the accept() method completes without being interrupted by a
signal or timing out on its own, it returns a new connected socket object, which we store in a
variable named $session. Otherwise, accept() returns undef, in which case we go back to
the beginning of the loop. This gives us a chance to test whether the interrupt handler has set
$quit to true.
Lines 19–21: Get remote host's name and port—We call the connected socket's peeraddr()
method to get the packed IP address at the other end of the connection, and attempt to translate
it into a hostname using gethostbyaddr(). If this fails, it returns undef, and we call the
peerhost() method to give us the remote host's address in dotted-quad form.
We get the remote host's port number using peerport(), and print the address and port
number to standard error.
Lines 22–30: Handle the connection—We read lines from the connected socket, reverse them,
and print them back out to the socket, keeping track of the number of bytes sent and received
while we do so. The only change from the earlier example is that we now terminate each line
with CRLF.
When the remote host is done, we get an EOF when we try to read from the connected socket.
We print out a warning, close the connected socket, and go back to the top of the loop to
accept() again.
When we run the script, it acts like the earlier version did, but the status messages give hostnames
rather than dotted-IP addresses.
% tcp_echo_serv2.pl
waiting for incoming connections on port 2007...
Connection from [localhost,2895]
Connection from [localhost,2895] finished
Connection from [formaggio.cshl.org,12833]
Connection from [formaggio.cshl.org,12833] finished
^C
bytes_sent = 50, bytes_received = 50
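The structure walked through above can be approximated by the self-contained sketch below. It is a reconstruction under several assumptions, not the book's listing: it binds to the loopback interface on a system-chosen port, skips the gethostbyaddr() lookup, and forks a small demo client that sends one line and then signals the server with INT, so that the whole thing terminates on its own instead of waiting for ^C.

```perl
use strict;
use warnings;
use IO::Socket qw(:DEFAULT :crlf);

$/ = CRLF;                                  # network line-ending convention
my ($bytes_in, $bytes_out) = (0, 0);
my $quit = 0;
$SIG{INT} = sub { $quit++ };                # handler only sets a flag

my $listen = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',               # loopback only, for the demo
    LocalPort => 0,                         # let the system pick a port
    Proto     => 'tcp',
    Listen    => 20,
    Reuse     => 1,
    Timeout   => 60 * 60,                   # makes accept() interruptible
) or die "Can't create listening socket: $!\n";
my $port = $listen->sockport;

# Demo client: send one line, read the reply, then interrupt the server.
my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    my $client = IO::Socket::INET->new("127.0.0.1:$port")
        or die "connect: $!\n";
    print {$client} "hello", CRLF;
    my $reply = <$client>;
    kill INT => getppid();                  # stands in for the user's ^C
    close $client;
    exit 0;
}

until ($quit) {
    # accept() returns undef on timeout or when interrupted by a signal
    my $session = $listen->accept or next;
    warn "Connection from [", $session->peerhost, ",", $session->peerport, "]\n";
    while (defined(my $line = <$session>)) {
        $bytes_in += length $line;
        chomp $line;
        my $reversed = scalar(reverse $line) . CRLF;
        print {$session} $reversed;
        $bytes_out += length $reversed;
    }
    close $session;
}
waitpid($pid, 0);
print STDERR "bytes_sent = $bytes_out, bytes_received = $bytes_in\n";
```

Keeping the signal handler down to a single flag assignment means that all cleanup happens in the main loop, at a point where no session is half-finished.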
A Web Client
In this section, we develop a tiny Web client named web_fetch.pl. It reads a Uniform Resource
Locator (URL) from the command line, parses it, makes the request, and prints the Web server's
response to standard output. Because it returns the raw response from the Web server without
processing it, it is very useful for debugging misbehaving CGI (Common Gateway Interface) scripts
and other types of Web-server dynamic content.
The Hypertext Transfer Protocol (HTTP) is the main protocol for Web servers. Part of the power
and appeal of the protocol is its simplicity. A client wishing to fetch a document makes a TCP
connection to the desired Web server, sends a brief request, and then receives the response from
the server. After the document is delivered, the Web server breaks the connection. The hardest
part is parsing the URL. HTTP URLs have the following general format:
http://hostname:port/path/to/document#fragment
All HTTP URLs begin with the prefix http://. This is followed by a hostname such as
www.yahoo.com, a colon, and the port number that the Web server is listening on. The colon and port may
be omitted, in which case the standard server port 80 is assumed. The hostname and port are
followed by the path to the desired document using UNIX-like file path conventions. The path may
be followed by a "#" sign and a fragment name, which indicate a subsection in the document that
the Web browser should scroll to.
Our client will parse the components of this URL into the hostname:port combination and the path.
We ignore the fragment name. We then connect to the designated server using a TCP socket and
send an HTTP request of this form:
GET /path/to/document HTTP/1.0 CRLF CRLF
The request consists of the request method "GET" followed by a single space and the designated
path, copied verbatim from the URL. This is followed by another space, the protocol version number
HTTP/1.0, and two CRLF pairs. After the request is sent, we wait for a response from the server. A
typical response looks like this:
HTTP/1.1 200 OK
Date: Wed, 01 Mar 2000 17:00:41 GMT
Server: Apache/1.3.6 (UNIX)
Last-Modified: Mon, 31 Jan 2000 04:28:15 GMT
Connection: close
Content-Type: text/html
The response is divided into two parts: a header containing information about the returned
document, and the requested document itself. The two parts are separated by a blank line formed by
two CRLF pairs.
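The URL format and request line described above can be sketched in a few lines of Perl. This is only an illustration under our own assumptions: the subroutine names parse_http_url() and build_request() are hypothetical, and the pattern is one plausible way to split a URL, not necessarily the one used in web_fetch.pl.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";    # network line ending, independent of platform

# Split an HTTP URL into "host[:port]" and path, discarding any #fragment.
# Returns an empty list if the string does not look like an HTTP URL.
# (Hypothetical helper, written for this illustration.)
sub parse_http_url {
    my ($url) = @_;
    my ($hostport, $path) = $url =~ m{^http://([^/\#]+)(/[^\#]*)?}i
        or return;
    $path = '/' unless defined $path;    # "http://host" means the root document
    return ($hostport, $path);
}

# Build the minimal HTTP/1.0 request: method, path, version, two CRLF pairs.
sub build_request {
    my ($path) = @_;
    return "GET $path HTTP/1.0" . $CRLF . $CRLF;
}

my ($hostport, $path) = parse_http_url('http://www.cshl.org:80/index.html#top');
print "host: $hostport\n";
print "path: $path\n";
```

The host and port are kept together as a single string because, as the walkthrough of the script explains, that combined form can be passed directly as the peer address when the socket is opened.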
We will delve into the structure of HTTP responses in more detail in Chapter 9, where we discuss
the LWP library, and Chapter 13, where we develop a more sophisticated Web client capable of
retrieving multiple documents simultaneously. The only issue to worry about here is that, whereas
the header is guaranteed by the HTTP protocol to be nice human-readable lines of text, each
terminated by a CRLF pair, the document itself can have any format. In particular, we must be prepared
to receive binary data, such as the contents of a GIF or MP3 file.
Figure 5.5 shows the web_fetch.pl script in its entirety.
Lines 1–5: Initialize module—We turn on strict syntax checking and load the IO::Socket module,
importing the default and newline-related constants. As in previous examples, we are dealing
with CRLF-delimited data. However, in this case, we set $/ to be a pair of CRLF sequences.
Later, when we call the <> operator, it will read the entire header down through the CRLF pair
that terminates it.
Lines 6–8: Parse URL—We read the requested URL from the command line and parse it using
a pattern match. The match returns the hostname, possibly followed by a colon and port number,
and the path up through, but not including, the fragment.
Lines 9–10: Open socket—We open a socket connected to the remote Web server. If the URL
contained the port number, it is included in the hostname passed to PeerAddr, and the
PeerPort argument is ignored. Otherwise, PeerPort specifies that we should connect to the standard
"http" service, port 80.
Line 11: Send the request—We send an HTTP request to the server using the format described
earlier.
Lines 12–14: Read and print the header—Our first read is line-oriented. We read from the socket
using the <> operator. Because $/ is set to a pair of CRLF sequences, this read grabs the entire
header up through the blank line. We now print the header, but since we don't particularly want
extraneous CRs to mess up the output, we first replace all occurrences of $CRLF with a logical
newline ("\n"), which will evaluate to whatever is the appropriate newline character for the current
platform.
Line 15: Read and print the document—Our subsequent reads are byte-oriented. We call
read() in a tight loop, reading up to 1024 bytes with each operation, and immediately printing
them out with print(). We exit when read() hits the EOF and returns 0.
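The two reading styles described for lines 12 through 15 can be tried without a network connection. The following is a minimal sketch, not the script from Figure 5.5: the read_response() subroutine and the canned response are our own inventions, and an in-memory filehandle stands in for the connected socket.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Read a response from any filehandle: first the header, as one "line"
# that ends at the blank line, then the body in chunks of up to 1024 bytes.
# (Hypothetical helper, written for this illustration.)
sub read_response {
    my ($fh) = @_;
    binmode $fh;                      # the body may be binary data
    my $header = do {
        local $/ = $CRLF . $CRLF;     # a "line" now ends at the blank line
        <$fh>;
    };
    $header = '' unless defined $header;
    $header =~ s/$CRLF/\n/g;          # logical newlines for display

    my ($body, $chunk) = ('');
    $body .= $chunk while read($fh, $chunk, 1024);   # read() returns 0 at EOF
    return ($header, $body);
}

# Stand-in for a socket: an in-memory filehandle holding a canned response.
my $canned = join $CRLF,
    'HTTP/1.1 200 OK', 'Content-Type: text/html', '', "<HTML>\n</HTML>\n";
open my $fh, '<', \$canned or die "Can't open in-memory handle: $!";
my ($header, $body) = read_response($fh);
print $header, $body;
```

Here $/ is changed only inside a do block, so the rest of the program keeps the default separator. A short script such as web_fetch.pl can simply set $/ once at the top, as the text describes.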
Here is an example of what web_fetch.pl looks like when it is asked to fetch the home page for
www.cshl.org:
% web_fetch.pl http://www.cshl.org/
HTTP/1.1 200 OK
Server: Netscape-Enterprise/3.5.1C
Date: Wed, 16 Aug 2000 00:46:12 GMT
Content-type: text/html
Last-modified: Fri, 05 May 2000 13:19:29 GMT
Content-length: 5962
Accept-ranges: bytes
Connection: close
<HTML>
<HEAD>
<TITLE>Cold Spring Harbor Laboratory</TITLE>
Although it seems like an accomplishment to fetch a Web page in a mere 15 lines of code, client
scripts that use the LWP module can do the same thing—and more—with just a single line. We
discuss how to use LWP to access the Web in Chapter 9.
I will write:
syswrite ($socket,"A man, a plan, a canal, panama!");
This has exactly the same effect as the method call, but avoids its overhead.
For methods that do improve on the function call, don't hesitate to use them. For example, the
accept() method is an improvement over the built-in function because it returns a connected
IO::Socket object rather than a plain filehandle. The method also has a syntax that is more Perl-like
than the built-in.
Concurrent Clients
This chapter concludes by introducing a topic that is one of the central issues of Part III, the problem
of concurrency.
Lines 1–6: Initialize module—We load IO::Socket as before and recover the desired remote
hostname and port number from the command-line arguments.
Lines 7–8: Create socket—We create the socket using IO::Socket::INET->new() or, if
unsuccessful, die with an error message.
Lines 9–21: Enter main loop—We enter a loop. Each time through the loop we read one line
from the socket and print it to standard output. Then we read a line of input from the user and
print it to the socket.
Because the remote server uses CRLF pairs to end its lines, but the user types conventional
newlines, we need to keep setting and resetting $/. The easiest way to do this is to place the
code that reads a line from the socket in a little block, and to localize (with local) the $/ variable,
so that its current value is saved on entry into the block, and restored on exit. Within the block,
we set $/ to CRLF.
If we get an EOF from either the user or the server, we leave the loop by calling last.
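The idiom of localizing $/ around the socket read can be shown in isolation. This is a small sketch under our own assumptions: read_crlf_line() is a hypothetical helper, and an in-memory filehandle holding a canned FTP banner stands in for the socket.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Read one CRLF-terminated line from $fh and return it without the terminator.
# local() saves the current value of $/ and restores it when the sub returns.
sub read_crlf_line {
    my ($fh) = @_;
    local $/ = $CRLF;
    my $line = <$fh>;
    chomp $line if defined $line;    # chomp removes the current $/, here CRLF
    return $line;
}

my $banner = "220 phage.cshl.org FTP server ready.$CRLF";
open my $server, '<', \$banner or die "Can't open in-memory handle: $!";
my $from_server = read_crlf_line($server);
print "$from_server\n";
print "separator restored\n" if $/ eq "\n";
```

Because the change to $/ is confined to the subroutine, a later read of the user's input from standard input still ends at a conventional newline.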
At first, this straightforward script seems to work. For example, this transcript illustrates a session
with an FTP server. The first thing we see on connecting with the server is its welcome banner
(message code 220). We type in the FTP USER command, giving the name "anonymous," and get
an acknowledgment. We then provide a password with PASS, and get another acknowledgment.
Everything seems to be going smoothly.
% gab1.pl phage.cshl.org ftp
220 phage.cshl.org FTP server ready.
USER anonymous
331 Guest login ok, send your complete e-mail address as password.
PASS [email protected]
230 Guest login ok, access restrictions apply.
Unfortunately, things don't last that way for long. The next thing we try is the HELP command, which
is supposed to print a multiline summary of FTP commands. This doesn't go well. We get the first
line of the expected output, and then the script stops, waiting for us to type the next command. We
type another HELP, and get the second line of the output from the first HELP command. We type
QUIT, and get the third line of the HELP command.
HELP
214-The following commands are recognized (* =>'s unimplemented).
HELP
USER PORT STOR MSAM* RNTO NLST MKD CDUP
QUIT
PASS PASV APPE MRSQ* ABOR SITE XMKD XCUP
QUIT
ACCT* TYPE MLFL* MRCP* DELE SYST RMD STOU
QUIT
...
Clearly the script has gotten out of synch. As it is written, it can deal with only the situation in which
a single line of input from the user results in a single line of output from the server. Having no way
of dealing with multiline output, it can't catch up with the response to the HELP command.
What if we changed the line that reads from the server to something like this?
while ($from_server = <$socket>) {
    chomp $from_server;
    print $from_server,"\n";
}
Unfortunately, this just makes matters worse. Now the script hangs after it reads the first line from
the server. The FTP server is waiting for us to send it a command, but the script is waiting for another
line from the server and hasn't even yet asked us for input, a situation known as deadlock.
In fact, none of the straightforward rearrangements of the read and print orders fix this problem. We
either get out of synch or get hopelessly deadlocked.
The beauty of this is that the child doesn't see the EOF until after it has finished processing any
queued server data. This guarantees that no data is lost. In addition, the scheme works equally well
when the termination of the connection is initiated by the server. The risk of this scheme is that the
server may not cooperate and close its end of the connection when it receives an EOF. However,
most servers are well behaved in this respect. If you encounter one that isn't, you can always kill
both the parent and the child by pressing the interrupt key.
There is one subtle aspect to this scheme. The parent process can't simply close() its copy of the
socket in order to send an EOF to the remote host. There is a second copy of the socket in the child
process, and the operating system won't actually close a filehandle until its last copy is closed. The
solution is for the parent to call shutdown(1) on the socket, forcefully closing it for writing. This
sends EOF to the server without interfering with the socket's ability to continue to read data coming
in the other direction. This strategy is implemented in Figure 5.8, in a script named gab2.pl.
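The effect of closing a socket for writing only can be observed in a single process. The sketch below is not part of gab2.pl. It assumes a UNIX-like system where socketpair() with AF_UNIX is available, and it uses the two ends of the pair as stand-ins for the client and the server.

```perl
use strict;
use warnings;
use Socket;

# Two connected endpoints in one process stand in for client and server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";

syswrite($client, "QUIT\n");
shutdown($client, 1) or die "shutdown failed: $!";   # 1 = no more writes

# The "server" sees the queued data first and only then the EOF.
my $request = <$server>;
my $at_eof  = !defined(scalar <$server>);

# The read side of the client's socket is still open.
syswrite($server, "221 Goodbye.\n");
close $server;
my $reply = <$client>;

print "server got: $request";
print "server saw EOF: ", ($at_eof ? "yes" : "no"), "\n";
print "client got: $reply";
```

The point of the sketch is the ordering: data written before the shutdown is delivered ahead of the EOF, and traffic in the opposite direction is unaffected.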
Lines 1–7: Initialize module—We turn on strict syntax checking, load IO::Socket, and fetch the
host and port from the command line.
Line 8: Create the socket—We create the connected socket in exactly the same way as before.
Lines 9–10: Call fork—We call fork(), storing the result in the variable $child. Recall that if
successful, fork() duplicates the current process. In the parent process, fork() returns the
PID of the child; in the child process, fork() returns numeric 0.
In case of error, fork() returns undef. We check for this and exit with an error message.
Lines 11–15: Parent process copies from standard input to socket—The rest of the script is
divided into halves. One half is the parent process, and is responsible for reading lines from
standard input and writing to the server; the other half is the child, which is responsible for reading
lines from the server and writing them to standard output.
In the parent process, $child is nonzero. For the reasons described earlier, we set up a signal
handler for the CHLD signal. This handler simply calls exit(). We then call the
user_to_host() subroutine, which copies user data from standard input to the socket.
When standard input is closed, user_to_host() returns. We call the socket's shutdown()
method, closing it for writing. Now we go to sleep indefinitely, awaiting the expected CHLD signal
that will terminate the process.
Lines 16–19: Child process copies from socket to standard output—In the child process, we call
host_to_user() to copy data from the socket to standard output. This subroutine will return
when the remote host closes the socket. We don't do anything special after that except to warn
that the remote host has closed the connection. We allow the script to exit normally and let the
operating system generate the CHLD message.
Lines 20–26: The user_to_host() subroutine—This subroutine is responsible for copying
lines from standard input to the socket. Our loop reads a line from standard input, removes the
newline, and then prints to the socket, appending a CRLF to the end. We return when standard
input is closed.
Lines 27–34: The host_to_user() subroutine—This subroutine is almost the mirror image of
the previous one. The only difference is that we set the $/ input record separator global to
CRLF before reading from the socket. Notice that there's no reason to localize $/ in this case
because changes made in the child process won't affect the parent. When we've read the last
line from the socket, we return.
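Putting the pieces of this walkthrough together gives something like the following. It is a sketch reconstructed from the description above, not a copy of Figure 5.8, so its line numbers will not match those cited. Two details are our own assumptions: the subroutines take their filehandles as arguments so that they can be exercised without a network, and the run() wrapper starts a session only when a host and port are given on the command line.

```perl
use strict;
use warnings;
use IO::Socket;

my $CRLF = "\015\012";

# Copy newline-terminated lines from $in to $out, ending each with CRLF.
sub user_to_host {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        chomp $line;
        print {$out} $line, $CRLF;
    }
}

# Copy CRLF-terminated lines from $in to $out, ending each with a newline.
sub host_to_user {
    my ($in, $out) = @_;
    local $/ = $CRLF;
    while (defined(my $line = <$in>)) {
        chomp $line;
        print {$out} $line, "\n";
    }
}

sub run {
    my ($host, $port) = @_;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host, PeerPort => $port, Proto => 'tcp',
    ) or die "Can't connect: $@";

    my $child = fork();
    die "Can't fork: $!" unless defined $child;

    if ($child) {                       # parent: standard input to socket
        $SIG{CHLD} = sub { exit 0 };
        user_to_host(\*STDIN, $socket);
        $socket->shutdown(1);           # close for writing only
        sleep;                          # wait for the CHLD signal
    }
    else {                              # child: socket to standard output
        host_to_user($socket, \*STDOUT);
        warn "Connection closed by foreign host.\n";
    }
}

run(@ARGV) if @ARGV == 2;
```

In this version $/ is localized inside host_to_user() even though, as noted above, the child's changes cannot affect the parent. The extra care costs nothing and keeps the subroutine safe to call from other code.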
You may wonder why the parent goes to sleep rather than simply exit after it has shutdown() its
copy of the socket. The answer is simply esthetic. As soon as the parent exits, the user will see the
command-line prompt reappear. However, the child may still be actively reading from the socket
and writing to standard output. The child's output will intermingle in an ugly way with whatever the
user is doing at the command line. By sleeping until the child exits, the parent avoids this behavior.
You may also wonder about the call to exit() in the CHLD signal handler. While this is a problematic
construction on Windows platforms because it causes crashes, the sad fact is that the Windows
port of Perl does not generate or receive CHLD signals when a child process dies, so this issue is
moot. To terminate gab2.pl on Windows platforms, press the interrupt key.
When we try to connect to an FTP server using the revised script, the results are much more
satisfactory. Multiline results now display properly, and there is no problem of synchronization or
deadlocking.
% gab2.pl phage.cshl.org ftp
220 phage.cshl.org FTP server ready.
USER anonymous
331 Guest login ok, send your complete e-mail address as password.
PASS [email protected]
230 Guest login ok, access restrictions apply.
HELP
214-The following commands are recognized (* =>'s unimplemented).
USER PORT STOR MSAM* RNTO NLST MKD CDUP
PASS PASV APPE MRSQ* ABOR SITE XMKD XCUP
ACCT* TYPE MLFL* MRCP* DELE SYST RMD STOU
SMNT* STRU MAIL* ALLO CWD STAT XRMD SIZE
REIN* MODE MSND* REST XCWD HELP PWD MDTM
QUIT RETR MSOM* RNFR LIST NOOP XPWD
214 Direct comments to [email protected]
QUIT
221 Goodbye.
Connection closed by foreign host.
This client is suitable for talking to many line-oriented servers, but there is one Internet service that
you cannot successfully access via this client—the Telnet remote login service itself. This is because
Telnet servers initially exchange some binary protocol information with the client before starting the
conversation. If you attempt to use this client to connect to a Telnet port (port 23), you will just see
some funny characters and then a pause as the server waits for the client to complete the protocol
handshake. The Net::Telnet module (Chapter 6) provides a way to talk to Telnet servers.
Summary
The object-oriented IO::Socket library significantly simplifies network programming by eliminating
much of the "noise code" handed down from the socket API's C heritage and replacing it with a
clean and convenient interface. Throughout the remainder of this book we create IO::Socket objects
using the class's flexible new() method, and use its object methods whenever they offer a clear
syntactic advantage (e.g., $sock->accept()). In cases where there is no clear syntactic advantage,
for example between read ($sock,$data,1024) and $sock->read ($data,1024), we
use the Perl built-ins in order to benefit from the performance gains.
With the IO::Socket module it is easy to write simple client and server programs, such as the Web
client and the reverse echo server demonstrated in this chapter. However, things became
unexpectedly complicated when we tried to write a seemingly trivial tool to echo user commands to a
server and back again. We solved this by creating two processes using fork(). The struggle
against deadlock is a recurring theme in this book, which we deal with again in later chapters.
You have now learned how to build network applications on top of the low-level socket libraries. The
next part of this book moves up a level, showing you how to interact with application-level protocols.
Perl's ease of use in writing network applications has spawned a large number of high-level modules
for everything from sending e-mail to accessing Web servers. There are many dozens of
user-contributed network client modules, ranging in complexity from a few to a few thousand lines of
code. Part II covers some of the popular client modules and shows how you can use them to solve
typical problems. They are built on top of the Berkeley socket API that we discussed in earlier
chapters.
Two of the oldest Internet protocols are the File Transfer Protocol, FTP, and Telnet, for remote login.
They illustrate the two extremes of network protocols: An FTP session is a highly structured and
predictable set of transactions; a Telnet session is unpredictable and highly interactive. Perl has
modules that can tame them both.
Net::FTP
There's a directory on a remote FTP server that changes every few weeks. You want to mirror a
copy of the directory on your local machine and update your copy every time it changes. You can't
use one of the many "mirror" scripts to do this because the directory name contains a timestamp,
and you need to do a pattern match to identify the right directory. Net::FTP to the rescue.
Net::FTP is part of the libnet utilities by Graham Barr. In addition to Net::FTP, libnet includes
Net::SMTP, Net::NNTP, and Net::POP3, discussed in later chapters. When you install the libnet
modules, the install script prompts you for various default configuration parameters used by the
Net::* modules. This includes such things as an FTP firewall proxy and the default mail exchanger
for your domain. See the documentation for Net::Config (also part of the libnet utilities) for
information on how to override the defaults later.
Net::FTP, like many of the client modules, uses an object-oriented interface. When you first log in
to an FTP server, the module returns a Net::FTP object to you. You then use this object to get
directory listings from the server, to transfer files, and to send other commands.
A Net::FTP Example
Figure 6.1 is a simple example that uses Net::FTP to connect to ftp.perl.org and download the file
named RECENT from the directory /pub/CPAN/. If the program runs successfully, it creates a file
named RECENT in the current directory. This file contains the names of all files recently uploaded
to CPAN.
Lines 1–5: Initialize—We load the Net::FTP module and define constants for the host to connect
to and the file to download.
Line 6: Connect to remote host—We connect to the FTP host by invoking the Net::FTP new()
method with the name of the host to connect to. If successful, new() returns a Net::FTP object
connected to the remote server. Otherwise, it returns undef, and we die with an error message.
In case of failure, new() leaves a diagnostic error message in $@.
Line 7: Log in to the server—After connecting to the server, we still need to log in by calling the
Net::FTP object's login() method with a username and password. In this case, we are using
anonymous FTP, so we provide the username "anonymous" and let Net::FTP fill in a reasonable
default password. If login is successful, login() returns a true value. Otherwise, it returns false
and we die, using the FTP object's message() method to retrieve the text of the server's last
message.
Line 8: Change to remote directory—We invoke the FTP object's cwd() ("change working
directory") method to enter the desired directory. If this call fails, we again die with the server's
last message.
Line 9: Retrieve the file—We call the FTP object's get() method to retrieve the desired file. If
successful, Net::FTP copies the remote file to a local one of the same name in the current
directory. Otherwise we die with an error message.
Lines 10–11: Quit—We call the FTP object's quit() method to close the connection.
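The steps described above might be assembled as follows. This is a sketch reconstructed from the line-by-line description, not a copy of Figure 6.1, so its line numbers will not match. The fetch_file() subroutine and the command-line handling are our own assumptions, added so that nothing touches the network unless a host, directory, and filename are supplied.

```perl
use strict;
use warnings;
use Net::FTP;

# Download one file from an FTP server using anonymous login.
# Dies with the server's last message if any step fails.
# (Hypothetical wrapper, written for this illustration.)
sub fetch_file {
    my ($host, $dir, $file) = @_;
    my $ftp = Net::FTP->new($host) or die "Can't connect: $@\n";
    $ftp->login('anonymous')       or die "Can't log in: ",   $ftp->message;
    $ftp->cwd($dir)                or die "Can't cd: ",       $ftp->message;
    $ftp->get($file)               or die "Can't get file: ", $ftp->message;
    $ftp->quit;
    return $file;
}

# Runs only when host, directory, and file are given on the command line,
# for example: ftp.perl.org /pub/CPAN RECENT
fetch_file(@ARGV) if @ARGV == 3;
```

Note the two different sources of error text: a failed new() reports through $@ because no object exists yet, while later failures are reported through the object's message() method.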
Each response from the server to the client consists of one or more CRLF-delimited lines. The first
line always begins with a three-digit numeric result code indicating the outcome of the command.
This is usually followed by a human-readable message. For example, a successful USER command
will result in the following server response:
331 Guest login ok, send your complete e-mail address as password.
Sometimes a server response will stretch over several lines. In this case, the numeric result code
on the first line will end in a "-", and the result code will be repeated (without the dash) on the last
line. The FTP protocol's response to the HELP command illustrates this:
HELP
214-The following commands are recognized (* =>'s unimplemented).
USER PORT STOR MSAM* RNTO NLST MKD CDUP
PASS PASV APPE MRSQ* ABOR SITE XMKD XCUP
ACCT* TYPE MLFL* MRCP* DELE SYST RMD STOU
SMNT* STRU MAIL* ALLO CWD STAT XRMD SIZE
REIN* MODE MSND* REST XCWD HELP PWD MDTM
QUIT RETR MSOM* RNFR LIST NOOP XPWD
214 Direct comments to [email protected]
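The rule for recognizing single-line and multiline responses is mechanical enough to express in a short subroutine. The following sketch is our own illustration, not the code that Net::Cmd uses: read_reply() is a hypothetical name, and the canned replies are read from an in-memory filehandle rather than from a server.

```perl
use strict;
use warnings;

# Read one complete reply from $fh. Returns (code, message text), or an
# empty list at EOF or if the first line does not start with a result code.
sub read_reply {
    my ($fh) = @_;
    local $/ = "\015\012";                    # replies are CRLF-delimited
    my $first = <$fh>;
    return unless defined $first;
    chomp $first;
    my ($code, $sep, $text) = $first =~ /^(\d{3})([- ])(.*)$/ or return;
    my @lines = ($text);
    if ($sep eq '-') {                        # dash after the code: more lines follow
        while (defined(my $line = <$fh>)) {
            chomp $line;
            if ($line =~ /^$code (.*)$/) {    # same code, no dash: last line
                push @lines, $1;
                last;
            }
            push @lines, $line;
        }
    }
    return ($code, join("\n", @lines));
}

# Canned replies standing in for a server (the contact line is invented).
my $sample = join "\015\012",
    "214-The following commands are recognized (* =>'s unimplemented).",
    "   USER    PORT    STOR    MSAM*   RNTO    NLST    MKD     CDUP",
    "214 Direct comments to the server administrator.",
    "331 Guest login ok, send your complete e-mail address as password.",
    "";
open my $fh, '<', \$sample or die "Can't open in-memory handle: $!";
while (my ($code, $message) = read_reply($fh)) {
    print "[$code]\n$message\n";
}
```

A reader built this way never falls out of step with the server, because it keeps reading until it has seen the line that closes the reply.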
Commonly the client and server need to exchange large amounts of non-command data. To do this,
the client sends a command to warn the server that the data is coming, sends the data, and then
terminates the information by sending a lone dot (".") on a line by itself. We will see an example of
this in the next chapter when we examine the interaction between an e-mail client and an SMTP
server.
Server result codes are arbitrary but generally follow a simple convention. Result codes between
100 and 199 are used for informational messages, while those in the 200–299 range are used to
indicate successful completion of a command. Codes in the 300–399 range are used to indicate
that the client must provide more information, such as the password that accompanies a username.
Result codes of 400 or greater indicate various errors: in FTP and SMTP, the 400–499 codes signal
transient failures, where the same command may succeed if retried later, while codes of 500 and
greater signal permanent failures, such as an unrecognized command.
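Since the category of a reply depends only on the first digit of its result code, a client can classify replies with a few comparisons. This is a minimal sketch of that convention. The subroutine name reply_class() and the category labels are our own, not part of any libnet module.

```perl
use strict;
use warnings;

# Map a three-digit result code to the broad category given by its first digit.
sub reply_class {
    my ($code) = @_;
    return 'invalid' unless defined $code && $code =~ /^[1-9]\d\d$/;
    my $digit = substr $code, 0, 1;
    return 'informational'     if $digit == 1;
    return 'success'           if $digit == 2;
    return 'more input needed' if $digit == 3;
    return 'error';                             # 400 and greater
}

printf "%s => %s\n", $_, reply_class($_) for qw(220 331 230 500);
```

For the codes seen in the FTP transcripts earlier, 220 and 230 fall in the success range and 331 asks for more input, which is exactly the point at which the client must send the PASS command.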
Because command-based servers are so common, the libnet package comes with a generic building
block module called Net::Cmd. The module doesn't actually do anything by itself, but adds
functionality to descendants of the IO::Socket module that allows them to easily communicate with this
type of network server. Net::FTP, Net::SMTP, Net::NNTP, and Net::POP3 are all derived from
Net::Cmd.
The two major methods provided by Net::Cmd objects are command() and response():
Subclasses of Net::Cmd build more sophisticated methods on top of command() and
response(). For example, the Net::FTP login() method calls command() twice: once to issue
the USER command and again to issue the PASS command. You will not ordinarily call
command() and response() yourself, but use the more specialized (and convenient) methods
provided by the subclass. However, command() and response() are available should you need
access to functionality that isn't provided by the module.
Several methods provided by Net::Cmd are commonly used by end-user applications. These are
code(), message(), and ok():
$code = $obj->code
Returns the three-digit numeric result code from the last response.
$message = $obj->message
Returns the text of the last message from the server. This is particularly useful for diagnosing errors.
$ok = $obj->ok
The ok() method returns true if the last server response indicated success, false otherwise. It returns true if
the result code is greater than 0 but less than 400.
Option     Description
Firewall   Name of the FTP proxy to use when your machine is behind certain types of firewalls
BlockSize  Block size of transfers (default 10240)
Port       FTP port to connect to (default 21)
Timeout    Timeout value, in seconds, for various operations (default 120 seconds)
Debug      Debug level; set to greater than zero for verbose debug messages
Passive    Use FTP passive mode for all file transfers; required by some firewalls
Hash       Prints a hash mark to STDERR for each 1024 bytes of data transferred
@items = $ftp->dir([$directory])
Gets a long-format directory list of all the files and subdirectories in the indicated directory or, if not specified,
in the current working directory. In a scalar context, dir() returns a reference to an array rather than the list
itself.
In contrast to ls(), each member of the returned list is a line of a directory listing that provides the file modes,
ownerships, and sizes. It is equivalent to calling the ls command with the -lg options.
$success = $ftp->get($remote [,$local [, $offset]])
The get() method retrieves the file named $remote from the FTP server. You may provide a full pathname
or one relative to the current working directory.
The $local argument specifies the local pathname to store the retrieved file to. If not provided, Net::FTP
creates a file with the same name as the remote file in the current directory. You may also pass a filehandle
in $local, in which case the contents of the retrieved file are written to that handle. This is handy for sending
files to STDOUT:
$ftp->get('RECENT',\*STDOUT)
The $offset argument can be used to restart an interrupted transmission. It gives the position in the file that
the FTP server should seek to before transmitting. Here's an idiom for using it to resume such a transfer:
my $offset = (stat($file))[7] || 0;
$ftp->get($file,$file,$offset);
The call to stat() fetches the current size of the local file or, if none exists, 0. This is then used as the offset
to get().
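The offset calculation can be tried without an FTP server. The following is a minimal sketch, in which the helper name resume_offset() and the temporary file are assumptions made for illustration; it only computes the value that would be passed to get() as $offset:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: size of the partial local file, or 0 if it is missing.
sub resume_offset {
    my $file = shift;
    return (stat($file))[7] || 0;
}

# Simulate a download that stopped after 10 bytes.
my ($fh, $partial) = tempfile(UNLINK => 1);
binmode $fh;
print $fh 'x' x 10;
close $fh;

print resume_offset($partial), "\n";             # prints 10
print resume_offset("$partial.missing"), "\n";   # prints 0
```

A missing file makes stat() return an empty list, so the slice yields undef and the || 0 supplies the starting offset for a fresh transfer.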
$fh = $ftp->retr($filename)
Like get(), the retr() method can be used to retrieve a remote file. However, rather than writing the file to
a filehandle or disk file, it returns a filehandle that can be read from to retrieve the file directly. For example,
here is how to read the file named RECENT located on a remote FTP server without creating a temporary local
file:
$fh = $ftp->retr('RECENT') or die "can't get file ",$ftp->message;
print while <$fh>;
% ftp_mirror.pl ftp.perl.org:/pub/CPAN/RECENT
The next example mirrors the entire contents of the CPAN modules directory, recursively copying
the remote directory structure into the current local working directory (don't try this verbatim unless
you have a fast network connection and a lot of free disk space):
% ftp_mirror.pl ftp.perl.org:/pub/CPAN/
The script's command-line options include --user and --pass, to provide a username and password
for non-anonymous FTP, --verbose for verbose status reports, and --hash to print out hash marks
during file transfers.
Lines 1–5: Load modules—We load the Net::FTP module, as well as File::Path and
Getopt::Long. File::Path provides the mkpath() routine for creating a subdirectory with all its
intermediate parents. Getopt::Long provides functions for managing command-line arguments.
Lines 6–19: Process command-line arguments—We process the command-line arguments, us-
ing them to set various global variables. The FTP host and the directory or file to mirror are
stored into the variables $HOST and $PATH, respectively.
Lines 20–23: Initialize the FTP connection—We call Net::FTP->new() to connect to the de-
sired host, and login() to log in. If no username and password were provided as command-
line arguments, we attempt an anonymous login. Otherwise, we attempt to use the authentica-
tion information to log in.
After successfully logging in, we set the file transfer type to binary, which is necessary if we want
to mirror the remote site exactly, and we turn on hashing if requested.
Lines 24–26: Initiate mirroring—If all has gone well, we begin the mirroring process by calling
an internal subroutine do_mirror() with the requested path. When do_mirror() is done,
we close the connection politely by calling the FTP object's quit() method and exit.
Lines 27–36: do_mirror() subroutine—The do_mirror() subroutine is the main entry point
for mirroring a file or directory. When first called, we do not know whether the path requested
by the user is a file or directory, so the first thing we do is invoke a utility subroutine to make that
determination. Given a path on a remote FTP server, find_type() returns a single-character
code indicating the type of object the path points to, a "-" for an ordinary file, or a "d" for a
directory.
Having determined the type of the object, we split the path into the directory part (the prefix) and
the last component of the path (the leaf; either the desired file or directory). We invoke the FTP
object's cwd() method to change into the parent of the file or directory to mirror.
If the find_type() subroutine indicated that the path is a file, we invoke get_file() to mirror
the file. Otherwise, we invoke get_dir().
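Splitting the path into prefix and leaf can be done with a single pattern match. The sketch below is one way to do it; the helper name split_path() and the exact regular expression are assumptions, not necessarily what the script itself uses:

```perl
use strict;
use warnings;

# Hypothetical helper: split a remote path into (prefix, leaf).
# A trailing slash on a directory path is ignored.
sub split_path {
    my $path = shift;
    my ($prefix, $leaf) = $path =~ m{^(.*?)([^/]+)/?$};
    return ($prefix, $leaf);
}

for my $path ('/pub/CPAN/RECENT', '/pub/CPAN/', 'RECENT') {
    my ($prefix, $leaf) = split_path($path);
    print "$path => prefix '$prefix', leaf '$leaf'\n";
}
```

The prefix is what would be handed to cwd(), and the leaf is the name passed on to get_file() or get_dir(). A bare name such as RECENT yields an empty prefix, meaning that no directory change is needed.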
Lines 37–53: get_file() subroutine—This subroutine is responsible for fetching a file, but
only if it is newer than the local copy, if any. After fetching the file, we try to change its mode to
match the mode on the remote site. The mode may be provided by the caller; if not, we determine
the mode from within the subroutine.
We begin by fetching the modification time and the size of the remote file using the FTP object's
mdtm() and size() methods. Remember that these methods might return undef if we are
talking to an older server that doesn't support these calls. If the mode hasn't been provided by
the caller, we invoke the FTP object's dir() method to generate a directory listing of the re-
quested file, and pass the result to parse_listing(), which splits the directory listing line into
a three-element list consisting of the file type, name, and mode.
We now look for a file on the local machine with the same relative path and stat() it, capturing
the local file's size and modification time information. We then compare the size and modification
time of the remote file to the local copy. If the files are the same size, and the remote file is as
old or older than the local one, then we don't need to freshen our copy. Otherwise, we invoke
the FTP object's get() method to fetch the remote file. After the file transfer is successfully
completed, we change the file's mode to match the remote version.
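The freshness test described above reduces to a small predicate. This is a sketch only: the name needs_update() is hypothetical, and treating a missing size or modification time as "fetch anyway" is our assumption about how to handle servers that lack these calls.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the rule in the text: skip the transfer only
# when the sizes match and the remote file is not newer than the local copy.
sub needs_update {
    my ($rsize, $rtime, $lsize, $ltime) = @_;
    return 1 unless defined $lsize && defined $ltime;   # no local copy yet
    return 1 unless defined $rsize && defined $rtime;   # assumption: unknown means stale
    return 0 if $rsize == $lsize && $rtime <= $ltime;
    return 1;
}

print needs_update(1024, 100, 1024, 200) ? "fetch\n" : "current\n";   # same size, remote older
print needs_update(2048, 100, 1024, 200) ? "fetch\n" : "current\n";   # size differs
print needs_update(1024, 300, 1024, 200) ? "fetch\n" : "current\n";   # remote newer
```

In the real script the remote values would come from size() and mdtm(), and the local ones from stat().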
Lines 54–73: get_dir() subroutine, recursive directory mirroring—The get_dir() subroutine
is more complicated than get_file() because it must call itself recursively in order to
make copies of directories nested within it. Like get_file(), this subroutine is called with the
path of the directory and, optionally, the directory mode.
We begin by creating a local copy of the directory in the current working directory if there isn't
one already, using mkpath() to create intermediate directories if necessary. We then enter the
newly created directory with the chdir() Perl built-in, and change the directory mode if re-
quested.
We retrieve the current working directory at the remote end by calling the FTP object's pwd()
method. This path gets stored into a local variable for safekeeping. We now enter the remote
copy of the mirror directory using cwd().
We need to copy the contents of the mirrored directory to the local machine. We invoke the FTP
object's dir() method to generate a full directory listing. We parse each line of the listing into
its type, pathname, and mode using the parse_listing() subroutine. Plain files are passed
to get_file(), symbolic links to make_link(), and subdirectories are passed
recursively to get_dir().
Having dealt with each member of the directory listing, we put things back the way they were
before we entered the subroutine. We call the FTP object's cwd() method to make the saved
remote working directory current, and chdir('..') to move up a level in the local directory
structure as well.
Lines 74–84: find_type() subroutine — find_type() is a not-entirely-satisfactory subrou-
tine for guessing the type of a file or directory given only its path. We would prefer to use the
FTP dir() method for this purpose, as in the preceding get_dir() call, but this is unreliable
because of slight differences in the way that the directory command works on different servers
when you pass it the path to a file versus the path to a directory.
Instead, we test whether the remote path is a directory by trying to cwd() into it. If cwd() fails,
we assume that the path is a file. Otherwise, we assume that the path is a directory. Note that
by this criterion, a symbolic link to a file is treated as a file, and a symbolic link to a directory is
treated as a directory. This is the desired behavior.
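The cwd() probe is easy to sketch. In the code below, find_type() follows the description above, while the FakeFTP class is purely a stand-in invented so that the example runs without a server. Restoring the previous remote directory after a successful probe is our assumption.

```perl
use strict;
use warnings;

# Sketch of the cwd() probe. $ftp is any object whose pwd() and cwd()
# methods behave like Net::FTP's.
sub find_type {
    my ($ftp, $path) = @_;
    my $pwd  = $ftp->pwd;
    my $type = '-';                 # assume an ordinary file
    if ($ftp->cwd($path)) {         # cwd() succeeded, so it is a directory
        $ftp->cwd($pwd);            # assumption: go back where we started
        $type = 'd';
    }
    return $type;
}

# Invented stand-in for a live connection; it knows only a list of directories.
{
    package FakeFTP;
    sub new { my ($class, %dirs) = @_; return bless { cwd => '/', dirs => {%dirs} }, $class }
    sub pwd { return $_[0]{cwd} }
    sub cwd {
        my ($self, $path) = @_;
        return 0 unless $self->{dirs}{$path};
        $self->{cwd} = $path;
        return 1;
    }
}

my $ftp = FakeFTP->new('/' => 1, '/pub/CPAN' => 1);
print find_type($ftp, '/pub/CPAN'), "\n";          # prints d
print find_type($ftp, '/pub/CPAN/RECENT'), "\n";   # prints -
```

With a real Net::FTP object the same subroutine would also report "-" for paths that do not exist at all, which is one reason the text calls the approach not entirely satisfactory.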
Lines 85–92: make_link() subroutine—The make_link() subroutine tries to create a local
symbolic link that mirrors a remote link. It works by assuming that the entry in the remote directory
listing denotes the source and target of a symbolic link, like this:
README.html -> index.html
We split the entry into its two components and pass them to the symlink() built-in. Only
symbolic links that point to relative targets are created. We don't attempt to link to absolute paths
(such as "/CPAN") because these will probably not be valid on the local machine. Besides, it's a
security issue.
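The splitting and filtering step can be sketched as follows. The helper name parse_link() is hypothetical, and the example prints the symlink() call it would make rather than touching the filesystem:

```perl
use strict;
use warnings;

# Hypothetical helper: split a listing entry such as "README.html -> index.html"
# into (link name, target). Returns an empty list for entries to skip: those
# that are not links, and those whose target is an absolute path.
sub parse_link {
    my $entry = shift;
    my ($name, $target) = split /\s*->\s*/, $entry, 2;
    return unless defined $target && length $target;
    return if $target =~ m{^/};
    return ($name, $target);
}

for my $entry ('README.html -> index.html', 'CPAN -> /CPAN') {
    if (my ($name, $target) = parse_link($entry)) {
        print "symlink($target, $name)\n";   # a real script would call symlink() here
    } else {
        print "skipping $entry\n";
    }
}
```

Note the argument order: symlink() takes the existing target first and the name of the new link second.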
Lines 93–106: parse_listing() subroutine—The parse_listing() subroutine is invoked
by get_dir() to process one line of the directory listing retrieved by Net::FTP->dir(). This
subroutine is necessitated by the fact that the vanilla FTP protocol doesn't provide any other
way to determine the type or mode of an element in a directory listing. The subroutine parses
the directory entry using a regular expression that allows variants of common directory listings.
The file's type code is derived from the first character of the symbolic mode field (e.g., the "d" in
drwxr-xr-x), and its mode from the remainder of the field. The filename is whatever follows
the date field.
The type, name, and mode are returned to the caller, after first converting the symbolic file mode
into its numeric form.
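To make the parsing step concrete, here is a sketch of a listing parser for UNIX-style "ls -l" output. The regular expression is our own and is an assumption; it is not the one used in the script and handles fewer server variants. It also returns the symbolic mode unchanged, leaving the numeric conversion to a separate step.

```perl
use strict;
use warnings;

# Sketch: parse one UNIX-style listing line into (type, name, symbolic mode).
# Returns an empty list if the line does not look like a directory entry.
sub parse_listing {
    my $line = shift;
    my ($type, $mode, $name) = $line =~ m{
        ^([a-z-])([a-zA-Z-]{9})          # type character, then nine mode characters
        \s+ \d+                          # link count
        .*?                              # owner, group, and size
        \s [A-Z][a-z]{2} \s+ \d{1,2}     # month and day
        \s+ (?:\d{1,2}:\d{2}|\d{4})      # time of day, or year for old files
        \s+ (.+) $                       # everything after the date is the name
    }x or return;
    return ($type, $name, $mode);
}

my @lines = (
    'drwxr-xr-x   2 root  wheel   512 Jun 26 12:00 authors',
    'lrwxrwxrwx   1 root  wheel    10 Jun 26  1999 README.html -> index.html',
);
for my $line (@lines) {
    my ($type, $name, $mode) = parse_listing($line) or next;
    print "$type  $mode  $name\n";
}
```

For a symbolic link, the returned name still contains the "->" arrow and target, which is the form that make_link() expects.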
Lines 107–122: filemode() subroutine—This subroutine is responsible for converting a sym-
bolic file mode into its numeric equivalent. For example, the symbolic mode rw-r--r-- be-
comes octal 0644. We treat the setuid or setgid bits as if they were execute bits. It would be a
security risk to create a set-id file locally.
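The conversion itself is a matter of turning nine characters into nine bits. The sketch below is one possible implementation, not necessarily the one in the script. It treats every character other than "-" as a set bit, which has the side effect described above: an "s" in an execute position becomes a plain execute bit.

```perl
use strict;
use warnings;

# Sketch: convert a nine-character symbolic mode such as "rw-r--r--" to a number.
sub filemode {
    my $symbolic = shift;
    my $mode = 0;
    for my $char (split //, $symbolic) {
        $mode = ($mode << 1) | ($char eq '-' ? 0 : 1);
    }
    return $mode;
}

printf "%04o\n", filemode('rw-r--r--');   # prints 0644
printf "%04o\n", filemode('rwsr-xr-x');   # prints 0755
```

The result can be passed directly to Perl's chmod() built-in.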
When we run the mirror script in verbose mode on CPAN, the beginning of the output looks like the
following:
% ftp_mirror.pl --verbose ftp.perl.org:/pub/CPAN
Getting directory CPAN/
Symlinking CPAN.html -> authors/Jon_Orwant/CPAN.html
Symlinking ENDINGS -> .cpan/ENDINGS
Getting file MIRRORED.BY
Getting file MIRRORING.FROM
Getting file README
Symlinking README.html -> index.html
Symlinking RECENT -> indices/RECENT-print
Getting file RECENT.html
Getting file ROADMAP
Getting file ROADMAP.html
Getting file SITES
Getting file SITES.html
Getting directory authors/
Getting file 00.Directory.Is.Not.Maintained.Anymore
Getting file 00upload.howto
Getting file 00whois.html
Getting file 01mailrc.txt.gz
Symlinking Aaron_Sherman -> id/ASHER
Symlinking Abigail -> id/ABIGAIL
Symlinking Achim_Bohnet -> id/ACH
Symlinking Alan_Burlison -> id/ABURLISON
...
When we run it again a few minutes later, we see messages indicating that most of the files are
current and don't need to be updated:
% ftp_mirror.pl --verbose ftp.perl.org:/pub/CPAN
Getting directory CPAN/
Symlinking CPAN.html -> authors/Jon_Orwant/CPAN.html
Symlinking ENDINGS -> .cpan/ENDINGS
Getting file MIRRORED.BY: not newer than local copy.
Getting file MIRRORING.FROM: not newer than local copy.
Getting file README: not newer than local copy.
...
The major weak point of this script is the parse_listing() routine. Because the FTP directory
listing format is not standardized, server implementations vary slightly. During development, I tested
this script on a variety of UNIX FTP daemons as well as on the Microsoft IIS FTP server. However,
this script may well fail with other servers. In addition, the regular expression used to parse directory
entries will probably fail on filenames that begin with whitespace.
Net::Telnet
FTP is the quintessential line-oriented server application. Every command issued by the client takes
the form of a single, easily parsed line, and each response from the server to the client follows a
predictable format. Many of the server applications that we discuss in later chapters, including POP,
SMTP, and HTTP, are similarly simple. This is because the applications were designed to interact
primarily with software, not with people.
Telnet is almost exactly the opposite. It was designed to interact directly with people, not software.
The output from a Telnet session is completely unpredictable, depending on the remote host's con-
figuration, the shell the user has installed, and the setup of the user's environment.
Telnet does some things that make it easy for human beings to use: It puts its output stream into a
mode that echoes back all commands that are sent to it, allowing people to see what they type, and
it puts its input stream into a mode that allows it to read and respond to one character at a time.
This allows command-line editing and full-screen text applications to work.
While these features make it easy for humans to use Telnet-based applications, they make scripting
such applications a challenge. Because the Telnet protocol is more complex than sending
commands and receiving responses, you can't simply connect a socket to port 23 (Telnet's default) on
a remote machine and start exchanging messages. Before the Telnet client and server can talk,
they must engage in a handshake procedure to negotiate communications session parameters. Nor
is it possible for a Perl script to open a pipe to the Telnet client program, because the Telnet client, like
many interactive programs, expects to be opened on a terminal device and tries to change the
characteristics of the device using various ioctl() calls.
Given these factors, it is best not to write clients for interactive applications. Sometimes, though, it's
unavoidable. You may need to automate a legacy application that is available only as an interactive
terminal application. Or you may need to remotely drive a system utility that is only accessible in
interactive form. A classic example of the latter is the UNIX passwd program for changing users'
login passwords. Like Telnet, passwd expects to talk directly to a terminal device, and you must do
special work to drive it from a Perl script.
The Net::Telnet module provides access to Telnet-based services. With its facilities, you can log
into a remote host via the Telnet protocol, run commands, and act on the results using a
straightforward pattern-matching idiom. Combined with the IO::Pty module, Net::Telnet can also
control local interactive programs.
Net::Telnet was written by Jay Rogers and is available on CPAN. It is a pure Perl module, and will
run unmodified on Windows and Macintosh systems. Although it was designed to interoperate with
UNIX Telnet daemons, it is known to work with the Windows NT Telnet daemon available on the
Windows NT Network Resource Kit CD and several of the freeware daemons.
Figure 6.3. remoteps.pl logs into a remote host and runs the "ps" command
Lines 1–3: Load modules—We load the Net::Telnet module. Because it is entirely object-orien-
ted, there are no symbols to import.
Lines 4–6: Define constants—We hard-code constants for the host to connect to, and the user
and password to log in as (no, this isn't my real password!). You'll need to change these as
appropriate for your system.
Line 7: Create a new Net::Telnet object—We call Net::Telnet->new() with the name of the
host. Net::Telnet attempts to connect to the host, returning a new Net::Telnet object if successful
or, if a connection could not be established, undef.
Line 8: Log in to remote host—We call the Telnet object's login() method with the username
and password. login() will attempt to log in to the remote system, and will return true if suc-
cessful.
Lines 9–10: Run the "ps" command—We invoke the cmd() method with the command to run,
in this case ps -ef. If successful, this method returns an array of lines containing the output of
the command (including the newlines). We print the result to standard output.
When we run the remoteps.pl script, there is a brief pause while the script logs into the remote host,
and then the output of the ps command appears, as follows:
% remoteps.pl
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jun26 ? 00:00:04 init
root 2 1 0 Jun26 ? 00:00:15 [kswapd]
root 3 1 0 Jun26 ? 00:00:00 [kflushd]
root 4 1 0 Jun26 ? 00:00:01 [kupdate]
root 34 1 0 Jun26 ? 00:00:01 /sbin/cardmgr
root 114 1 30 Jun26 ? 19:18:46 [kapmd]
root 117 1 0 Jun26 ? 00:00:00 [khubd]
bin 130 1 0 Jun26 ? 00:00:00 /usr/sbin/rpc.portmap
root 134 1 0 Jun26 ? 00:00:25 /usr/sbin/syslogd
...
Net::Telnet API
To accommodate the many differences between Telnet implementations and shells among oper-
ating systems, the Net::Telnet module has a large array of options. We only consider the most
frequently used of them here. See the Net::Telnet documentation for the full details.
Net::Telnet methods generally have both a named-argument form and a "shortcut" form that takes
a single argument only. For example, new() can be called either this way:
my $telnet = Net::Telnet->new('phage.cshl.org');
or like this:
my $telnet = Net::Telnet->new(Host=>'phage.cshl.org', Timeout=>5);
$telnet = Net::Telnet->new($host)
$telnet = Net::Telnet->new(Option1=>$value1,Option2=>$value2 ..)
The new() method creates a new Net::Telnet object. It may be called with a single argument containing the
name of the host to connect to, or with a series of option/ value pairs that provide finer control over the object.
new() recognizes many options, the most common of which are shown in Table 6.2.
The Host and Port options are the host and port to connect to, and Timeout is the period in seconds
that Net::Telnet will wait for an expected pattern before declaring a timeout.
Binmode controls whether Net::Telnet will perform CRLF translation. By default (Binmode=>0), ev-
ery newline sent from the script to the remote host is translated into a CRLF pair, just as the Telnet
client does it. Likewise, every CRLF received from the remote host is translated into a newline. With
Binmode set to a true value, this translation is suppressed and data is transmitted verbatim.
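The effect of the default translation can be illustrated in a few lines of plain Perl. This is only an illustration of the behavior described above, with helper names of our own choosing; it is not Net::Telnet's code.

```perl
use strict;
use warnings;

# Illustration only: the kind of newline translation described for Binmode=>0.
sub to_network   { my $data = shift; $data =~ s/\n/\015\012/g; return $data }
sub from_network { my $data = shift; $data =~ s/\015\012/\n/g; return $data }

my $sent = to_network("ls -lF\n");
printf "%d bytes on the wire\n", length $sent;   # 8: the newline became CR LF
print from_network("total 0\015\012") eq "total 0\n" ? "round trip ok\n" : "mismatch\n";
```

With Binmode set to a true value, neither substitution would be applied, so a script that sends binary data or its own line terminators sees exactly what it wrote.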
Cmd_remove_mode controls the removal of echoed commands. Most implementations of the Telnet
server echo back all user input. As a result, text you send to the server reappears in the data read
back from the remote host. If Cmd_remove_mode is set to true, the first line of all data received
from the server will be stripped. A false value prevents stripping, and a value of "auto" allows
Net::Telnet to decide for itself whether to strip based on the "echo" setting during the initial Telnet
handshake.
Errmode determines what happens when an error occurs, typically an expected pattern not being
seen before the timeout. The value of Errmode can be one of the strings "die" (the default) or "return".
When set to "die", Net::Telnet dies on an error, aborting your program. A value of "return" modifies
this behavior, so that instead of dying the failed method returns undef. You can then recover the
specific error message using errmsg(). In addition to these two strings, Errmode accepts either a
code reference or an array reference. Both of these forms are used to install custom handlers that
are invoked when an error occurs. The Net::Telnet documentation provides further information.
The value for Input_log should be a filename or a filehandle. All data received from the server is
echoed to this file or filehandle. Since the received data usually contains the echoed command, this
is a way to capture a transcript of the Net::Telnet session and is invaluable for debugging. If the
argument is a previously opened filehandle, then the log is written to that filehandle. Otherwise, the
argument is treated as the name of a file to open or create.
The Fhopen argument can be used to pass a previously opened filehandle to Net::Telnet for it to
use in communication. Net::Telnet will use this filehandle instead of trying to open its own connection.
We use this later to coerce Net::Telnet into working across a Secure Shell link.
Prompt sets the regular expression that Net::Telnet uses to identify the shell command-line prompt.
This is used by the login() and cmd() methods to determine that the command ran to completion.
By default, Prompt is set to a pattern that matches the default sh, csh, ksh, and tcsh prompts.
Once a Net::Telnet object is created, you control it with several object methods:
$result = $telnet->login($username,$password)
$result = $telnet->login(Name => $username,
Password => $password,
[Prompt => $prompt,]
[Timeout=> $timeout])
The login() method attempts to log into the remote host using the provided username and password. In the
named-parameter form of the method call, you may override the values of Prompt and Timeout provided to
new().
If the Errmode is "die" and the login method encounters an error, the call aborts your script with an error
message. Otherwise, login() returns false.
$result = $telnet->print(@values)
Print a value or list of values to the remote host. A newline is automatically added for you unless you explicitly
disable this feature (see the Net::Telnet documentation for details). The method returns true if all of the data
was successfully written.
It is also possible to bypass Net::Telnet's character translation routines and write directly to the remote host
by using the Net::Telnet object as a filehandle:
print $telnet "ls -lF\015\012";
$result = $telnet->waitfor($pattern)
($before,$match) = $telnet->waitfor($pattern)
($before,$match) = $telnet->waitfor([Match=>$pattern,]
[String=>$string,]
[Timeout=>$timeout])
The waitfor() method is the workhorse of Net::Telnet. It waits up to Timeout seconds for the specified string
or pattern to appear on the data stream coming from the remote host. In a scalar context, waitfor() returns
a true value if the desired pattern was seen. In a list context, the method returns a two-element list consisting
of the data seen before the match and the matched string itself.
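The relationship between the two return values is easiest to see on a fixed buffer. The sketch below imitates the list-context result using Perl's match offset arrays; the helper split_at_match() is hypothetical and stands in for what waitfor() does with data arriving from the network.

```perl
use strict;
use warnings;

# Illustration of the ($before, $match) pair; not Net::Telnet's own code.
sub split_at_match {
    my ($buffer, $regex) = @_;
    return unless $buffer =~ $regex;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    return (substr($buffer, 0, $-[0]), substr($buffer, $-[0], $+[0] - $-[0]));
}

my $buffer = "total 2\nREADME\nbash> ";
my ($before, $match) = split_at_match($buffer, qr/bash> $/);
print "before: $before";
print "match:  '$match'\n";
```

Here $before holds the two lines of command output and $match holds the prompt itself.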
You can give waitfor() a regular expression to pattern match or a simple string, in which case
Net::Telnet uses index() to scan for it in incoming data. In the method's named-argument form,
use the Match argument for a pattern match, and String for a simple string match. You can specify
multiple alternative patterns or strings to match simply by providing more than one Match and/or
String arguments.
The strings used for Match must be correctly delimited Perl pattern match operators. For example,
"/bash> $/" and "m(bash> $)" will both work, but "bash> $" won't because of the absence
of pattern match delimiters.
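One way to see why the delimiters matter is to treat the string as a fragment of Perl source, which is roughly how a string containing a match operator has to be interpreted. The sketch below makes that assumption explicit with a string eval; try_match() is a hypothetical helper, and string eval should only ever be applied to patterns you wrote yourself.

```perl
use strict;
use warnings;

# Illustration: compile the string as a Perl match operator applied to $text.
# Returns 1 or 0 for match or no match, and undef if the string is not a
# valid match operator.
sub try_match {
    my ($pattern, $text) = @_;
    my $result = eval "\$text =~ $pattern";
    return if $@;                 # not valid Perl: most likely missing delimiters
    return $result ? 1 : 0;
}

for my $pattern ('/bash> $/', 'm(bash> $)', 'bash> $') {
    my $outcome = try_match($pattern, 'lstein@chiron bash> ');
    printf "%-12s %s\n", $pattern,
        defined $outcome ? ($outcome ? 'matches' : 'no match') : 'rejected';
}
```

The first two strings compile and match the sample prompt; the third fails to compile because, without delimiters, it is not a pattern match operator at all.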
In the single-argument form of waitfor(), the argument is a pattern match. The Timeout argument
may be used to override the default timeout value.
This code fragment will issue an ls -lF command, wait for the command line prompt to appear, and
print out what came before the prompt, which ought to be the output of the ls command:
$telnet->print('ls -lF');
($before,$match) = $telnet->waitfor('/[$%#>] $/');
print $before;
To issue a command to the remote server and wait for a response, you can use one of several
versions of cmd():
$result = $telnet->cmd($command)
@lines = $telnet->cmd($command)
@lines = $telnet->cmd(String=>$command,
[Output=>$ref,] [Prompt=>$pattern,]
[Timeout=>$timeout,] [Cmd_remove_mode=>$mode])
The cmd() method is used to send a command to the remote host and return its output, if any. It is equivalent
to a print() of the command, followed by a waitfor() using the default shell prompt pattern.
In a scalar context, cmd() returns true if the shell prompt was seen before the timeout expired, false if the method
timed out first. In a list context, this method returns all the lines received
prior to matching the prompt.
In the named-argument form of the call, the Output argument designates either a scalar reference
or an array reference to receive the lines that preceded the match. The Prompt, Timeout, and
Cmd_remove_mode arguments allow you to override the corresponding settings.
Note that a true result from cmd() does not mean that the command executed successfully. It only
means that the command completed in the time allotted for it.
To receive data from the server without scanning for patterns, use get(), getline(), or
getlines():
$data = $telnet->get([Timeout=>$timeout])
The get() method performs a timed read on the Telnet session, returning any data that is available. If no data
is received within the allotted time, the method dies if Errmode is set to "die" or returns undef otherwise. The
get() method also returns undef on end-of-file (indicating that the remote host has closed the Telnet ses-
sion). You can use eof() and timed_out() to distinguish these two possibilities.
$line = $telnet->getline([Timeout=>$timeout])
The getline() method reads the next line of text from the Telnet session. Like get(), it returns undef on
either a timeout or an end-of-file. You may change the module's notion of the input record separator using the
input_record_separator() method, described below.
@lines = $telnet->getlines([Timeout=>$timeout])
Return all available lines of text, or an empty list on timeout or end-of-file.
Finally, several methods are useful for debugging and for tweaking the communications session:
$msg = $telnet->errmsg
This method returns the error message associated with a failed method call. For example, after a timeout on
a waitfor(), errmsg() returns "pattern match timed-out."
$line = $telnet->lastline
This method returns the last line read from the object. It's useful to examine this value after the remote host
has unexpectedly terminated the connection because it might contain clues to the cause of this event.
$value = $telnet->input_record_separator([$newvalue])
$value = $telnet->output_record_separator([$newvalue])
These two methods get and/or set the input and output record separators. The input record separator is used
to split input into lines, and is used by the getline(), getlines(), and cmd() methods. The output record
separator is printed at the end of each line output by the print() method. Both values default to \n.
$value = $telnet->prompt([$newvalue])
$value = $telnet->timeout([$newvalue])
$value = $telnet->binmode([$newvalue])
$value = $telnet->errmode([$newvalue])
These methods get and/or set the corresponding settings, and can be used to examine or change the defaults
after the Telnet object is created.
$telnet->close
The close() method severs the connection to the remote host.
This command line requests the script to change the current user's password on the three machines
chiron, masdorf, and sceptre. The script reports success or failure to change the password on each
of the indicated machines.
The script uses the UNIX passwd program to do its work. In order to drive passwd, we need to
anticipate its various prompts and errors. Here's a sample of a successful interaction:
% passwd
Changing password for lstein
Old password: xyzzy
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and numbers.
New password: plugn
Re-enter new password: plugn
Password changed.
At the three password: prompts I typed my current and new passwords. However, the passwd
program turns off terminal echo so that the passwords don't actually display on the screen.
A number of errors may occur during execution of passwd. In order to be robust, the password-
changing script must detect them. One error occurs when the original password is typed incorrectly:
% passwd
Changing password for lstein
Old password: xyzyy
Incorrect password for lstein.
The password for lstein is unchanged.
Another error occurs when the new password doesn't satisfy the passwd program's criteria for a
secure, hard-to-guess password:
% passwd
Changing password for lstein
Old password: xyzzy
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and numbers.
New password: hi
Bad password: too short. Try again.
New password: aaaaaaaaaa
Bad password: a palindrome. Try again.
New password: 12345
Bad password: too simple. Try again.
This example shows several attempts to set the password, each one rejected for a different reason.
The common part of the error message is "Bad password." We don't have to worry about a third
common error in running passwd, which is failing to retype the password correctly at the confirmation
prompt.
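The prompts and error messages in these transcripts can be collected into a small table of patterns. What follows is a minimal sketch, not the listing from Figure 6.4: the subroutine name classify_passwd_output() and the state labels are invented for illustration, and the patterns assume the passwd variant shown above.

```perl
use strict;
use warnings;

# Ordered [label, pattern] pairs. Error patterns come first, so that
# "Bad password: ... Try again." followed by a fresh "New password:"
# prompt is reported as an error rather than as a plain prompt.
my @PATTERNS = (
    [ bad_old     => qr/Incorrect password/i     ],
    [ bad_new     => qr/Bad password/i           ],
    [ changed     => qr/Password changed/i       ],
    [ confirm_new => qr/Re-enter new password:/i ],
    [ new_prompt  => qr/New password:/i          ],
    [ old_prompt  => qr/Old password:/i          ],
);

# Return the label of the first pattern that matches a chunk of passwd
# output, or 'unknown' if none matches.
sub classify_passwd_output {
    my ($text) = @_;
    for my $entry (@PATTERNS) {
        my ($label, $re) = @$entry;
        return $label if $text =~ $re;
    }
    return 'unknown';
}
```

A script that drives passwd can use the returned label to decide whether to send the next password or to give up with an error message.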
The change_passwd.pl script is listed in Figure 6.4.
Lines 1–4: Load modules—We load Net::Telnet and the Getopt::Long module for command-line
option parsing.
Lines 5–12: Define constants—We create a DEBUG flag. If this is true, then we instruct the
Net::Telnet module to log all its input to a file named passwd.log. This file contains password
information, so be sure to delete it promptly. The USAGE constant contains the usage statement
printed when the user fails to provide the correct command-line options.
Lines 13–19: Parse command-line options—We call GetOptions() to parse the command-line
options. If no user name is given explicitly, we default to the current user's login name, taken
from the LOGNAME environment variable. The old and new password options are mandatory.
Line 20: Invoke change_passwd() subroutine—For each of the machines named on the com-
mand line, we invoke an internal subroutine named change_passwd(), passing it the name
of the machine, the user login name, and the old and new passwords.
Lines 21–41: change_passwd() subroutine—Most of the work happens in change_passwd().
We begin by opening up a new Net::Telnet object on the indicated host, and then
store the object in a variable named $shell. If DEBUG is set, we turn on logging to a hard-coded
file. We also set errmode() to "return" so that Net::Telnet calls will return false rather than dying
on an error.
We now call login() to attempt to log in with the user's account name and password. If this fails,
we return with a warning constructed from the Telnet object's errmsg() routine.
Otherwise we are at the login prompt of the user's shell. We invoke the passwd command and wait
for the expected "Old password:" prompt. If the prompt appears within the timeout limit, we send
the old password to the server. Otherwise, we return with an error message.
Two outcomes are possible at this point. The passwd program may accept the password and prompt
us for the new password, or it may reject the password for some reason. We wait for either of the
prompts to appear, and then examine the match string returned by waitfor() to determine which
of the two patterns we matched. In the former case, we proceed to provide the new password. In
the latter, we return with an error message.
After the new desired password is printed (line 33), there are again two possibilities: passwd may
reject the proposed password because it is too simple, or it may accept it and prompt us to confirm
the new password. We handle this in the same way as before.
The last step is to print the new password again, confirming the change. We do not expect any errors
at this point, but we do wait for the "Password changed" confirmation before reporting success.
Because there is little standardization among passwd programs, this script is likely to work only with
those variants of UNIX that use a passwd program closely derived from the BSD version. To handle
other passwd variants, you will need to modify the pattern matches appropriately by including other
Match patterns in the calls to waitfor().
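The prompt-and-response sequence just described can be sketched as follows. This is an illustration rather than the Figure 6.4 listing: the subroutine name drive_passwd() and its return convention are assumptions, the object passed in is assumed to be an already logged-in Net::Telnet object whose errmode() is "return", and the patterns match only the passwd variant shown earlier.

```perl
use strict;
use warnings;

# Drive an interactive passwd session through $shell, which is assumed
# to be a Net::Telnet object (or anything with compatible print() and
# waitfor() methods). Returns a (success flag, message) pair.
sub drive_passwd {
    my ($shell, $old, $new) = @_;

    $shell->print('passwd');
    $shell->waitfor(Match => '/Old password:/')
        or return (0, 'no "Old password:" prompt');
    $shell->print($old);

    # Two outcomes: the old password is rejected or we are asked for a new one.
    my ($before, $match) = $shell->waitfor(
        Match => '/Incorrect password/',
        Match => '/New password:/',
    );
    return (0, 'timed out after sending the old password') unless defined $match;
    return (0, 'old password was rejected') if $match =~ /Incorrect/;
    $shell->print($new);

    # Again two outcomes: the new password is rejected or must be confirmed.
    ($before, $match) = $shell->waitfor(
        Match => '/Bad password/',
        Match => '/Re-enter new password:/',
    );
    return (0, 'timed out after sending the new password') unless defined $match;
    return (0, 'new password was rejected') if $match =~ /Bad/;
    $shell->print($new);

    $shell->waitfor(Match => '/Password changed/')
        or return (0, 'no confirmation from passwd');
    return (1, 'password changed');
}
```

To support another passwd variant, only the Match strings need to change; the control flow stays the same.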
Running change_passwd.pl on a network of Linux systems gives output like this:
% change_passwd.pl --user=george --old=m00nd0g --new=swampH0und \
localhost pesto prego romano
Password changed for george on localhost.
Password changed for george on pesto.
Password changed for george on prego.
Password changed for george on romano.
While change_passwd.pl is running, the old and new passwords are visible to anyone who runs a
ps command to view the command lines of running programs. If you wish to use this script in pro-
duction, you will probably want to modify it so as to accept this sensitive information from standard
input. Another consideration is that the password information is passed in the clear, and therefore
vulnerable to network sniffers. The SSH-enabled password-changing script in the next section
overcomes this difficulty.
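One way to keep the passwords off the command line is to read them from standard input, one per line. The sketch below assumes that convention; the subroutine name read_passwords() is hypothetical and is not part of change_passwd.pl.

```perl
use strict;
use warnings;

# Read the old and new passwords, one per line, from a filehandle
# (standard input by default) instead of taking them from @ARGV.
sub read_passwords {
    my ($fh) = @_;
    $fh ||= \*STDIN;
    my @secrets;
    for my $label ('old', 'new') {
        my $line = <$fh>;
        die "missing $label password on input\n" unless defined $line;
        chomp $line;
        push @secrets, $line;
    }
    return @secrets;    # ($old, $new)
}
```

Called as my ($old, $new) = read_passwords(); this would replace the mandatory old and new password options, so the secrets never appear in the output of ps.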
The ssh client takes an optional -l command-line switch to set the name of the user to log in as, and
the name of the remote host (we use the short name rather than the fully qualified DNS name in
this case). ssh prompts for the password on the remote host, and then attempts to log in.
To work with ssh, we have to make two changes to change_passwd.pl: (1) we open a pseudoter-
minal on the ssh client and pass the controlling filehandle to Net::Telnet->new() as the
Fhopen argument and (2) we replace the call to login() with our own pattern matching routine so
as to handle ssh's login prompt.
The IO::Pty module, available on CPAN, has a simple API:
$pty = IO::Pty->new
The new() method takes no arguments and returns a new IO::Pty pseudoterminal object. The returned object
is a filehandle corresponding to the controlling end of the pipe. Your script will ordinarily use this filehandle to
send commands and read results from the program you're driving.
$tty = $pty->slave
Given a pseudoterminal created with a call to IO::Pty->new(), the slave() method returns the TTY half
of the pipe. You will ordinarily pass this filehandle to the program you want to control.
Figure 6.5 shows the idiom for launching a program under the control of a pseudoterminal. The
do_cmd() subroutine accepts the name of a local command to run and a list of arguments to pass
it. We begin by creating a pseudoterminal filehandle with IO::Pty->new() (line 3). If successful,
we fork(), and the parent process returns the pseudoterminal to the caller. The child process,
however, has a little more work to do. We first detach from the current controlling TTY by calling
POSIX::setsid() (see Chapter 10 for details). The next step is to recover the TTY half of the
pipe by calling the IO::Pty object's slave() method, and then close the pseudoterminal half (lines
7–8).
We now reopen STDIN, STDOUT, and STDERR on the new TTY object using fdopen(), and close
the now-unneeded copy of the filehandle (lines 9–12). We make STDOUT unbuffered and invoke
exec() to run the desired command and arguments. When the command runs, its standard input
and output will be attached to the new TTY, which in turn will be attached to the pseudo-tty controlled
by the parent process.
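The steps described above can be sketched as a subroutine along the following lines. It is a reconstruction from the prose, not a copy of Figure 6.5, so its line numbers will not correspond to those cited in the text. IO::Pty is loaded at run time here, an assumption made only so that the file compiles on systems where the module is not installed.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX 'setsid';

# Launch $cmd under the control of a pseudoterminal and return the
# controlling filehandle to the caller.
sub do_cmd {
    my ($cmd, @args) = @_;
    require IO::Pty;                      # CPAN module, loaded on first use
    my $pty = IO::Pty->new or die "Can't create pty: $!";
    defined(my $child = fork) or die "Can't fork: $!";
    return $pty if $child;                # parent: hand back the controlling end

    # Everything below runs in the child process only.
    setsid();                             # detach from the current controlling TTY
    my $tty = $pty->slave;                # recover the TTY half of the pipe
    close $pty;                           # the child does not need the controlling half
    STDIN->fdopen($tty, '<')  or die "STDIN: $!";
    STDOUT->fdopen($tty, '>') or die "STDOUT: $!";
    STDERR->fdopen($tty, '>') or die "STDERR: $!";
    close $tty;                           # the standard handles now hold their own copies
    $| = 1;                               # make STDOUT unbuffered
    exec $cmd, @args;
    die "Couldn't exec $cmd: $!";
}
```

A caller would write something like my $pty = do_cmd('ssh', '-l', $user, $host); and then read from and write to $pty as if it were the terminal attached to the program.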
With do_cmd() written, the other changes to change_passwd.pl are relatively minor. Figure
6.6 shows the revised script written to use the ssh client, change_passwd_ssh.pl.
Lines 1–6: Load modules—We load IO::Pty and the setsid() routine from the POSIX module.
Lines 7–23: Process command-line arguments and call change_passwd()—The only change
here is a new constant, PROMPT, that contains the pattern match that we will expect from the
user's shell command prompt.
Lines 24–27: Launch ssh subprocess—We invoke do_cmd() to run the ssh program using the
requested username and host. If do_cmd() is successful, it returns a filehandle connected to
the pseudoterminal driving the ssh subprocess.
Lines 28–31: Create and initialize Net::Telnet object—In the change_passwd() routine, we
create a new Net::Telnet object, but now instead of allowing Net::Telnet to open a connection
to the remote host directly, we pass it the ssh filehandle using the Fhopen argument. After
creating the Net::Telnet object, we configure it by putting it into binary mode with binmode(),
setting the input log for debugging, and setting the error mode to "return". The use of binary
mode is a small but important modification of the original script. Since the SSH protocol termi-
nates its lines with a single LF character rather than CRLF pairs, the default Net::Telnet CRLF
translation is inappropriate.
Lines 32–34: Log in—Instead of calling Net::Telnet's built-in login() method, which expects
Telnet-specific prompts, we roll our own by waiting for the ssh "password:" prompt and then
providing the appropriate response. We then wait for the user's command prompt. If, for some
reason, this fails, we return with an error message.
Lines 35–49: Change password—The remainder of the change_passwd() subroutine is iden-
tical to the earlier version.
Lines 50–65: do_cmd() subroutine—This is the same subroutine that we examined earlier.
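The two changes that matter, wrapping an existing filehandle with Fhopen and replacing login() with our own prompt handling, can be sketched as follows. The subroutine names shell_from_handle() and ssh_login() are hypothetical, the "password: " pattern assumes the prompt printed by the ssh client described here, and Net::Telnet is loaded at run time so that the file compiles without the module.

```perl
use strict;
use warnings;

# Wrap an already-open filehandle (for example, the pseudoterminal
# returned by a do_cmd()-style launcher) in a Net::Telnet object.
sub shell_from_handle {
    my ($fh, $logfile) = @_;
    require Net::Telnet;                        # CPAN module, loaded on first use
    my $shell = Net::Telnet->new(Fhopen => $fh);
    $shell->binmode(1);                         # binary mode: no CRLF translation
    $shell->input_log($logfile) if defined $logfile;
    $shell->errmode('return');                  # return false instead of dying
    return $shell;
}

# Answer the ssh password prompt, then wait for the user's shell prompt.
# $prompt is a Net::Telnet match operator written as a string,
# such as '/[\$%#>] $/'. Returns true on success, false otherwise.
sub ssh_login {
    my ($shell, $password, $prompt) = @_;
    $shell->waitfor(Match => '/password: /') or return;
    $shell->print($password);
    $shell->waitfor(Match => $prompt) or return;
    return 1;
}
```

Because ssh_login() relies only on print() and waitfor(), the same routine works with any object that provides those two methods.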
The change_passwd_ssh.pl program now uses the Secure Shell to establish connections to the
indicated machines and change the user's password. This is a big advantage over the earlier ver-
sion, which was prone to network eavesdroppers who could intercept the new password as it passed
over the wire in unencrypted form. On multiuser systems you will still probably want to modify the
script to read the passwords from standard input rather than from the command line.
For completeness, Figure 6.7 lists a routine, prompt_for_passwd(), that uses the UNIX stty program
to disable terminal echo temporarily while the user is typing the password. You can use it like
this:
$old = prompt_for_passwd('old password');
$new = prompt_for_passwd('new password');
A slightly more sophisticated version of this subroutine, which takes advantage of the Term::ReadKey
module, if available, appears in Chapter 20.
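A routine of this kind might look like the sketch below. It is not the Figure 6.7 listing: the optional second argument (an input filehandle) and the check for a terminal are additions made for illustration, and in interactive use the handle is assumed to be STDIN, because stty acts on the standard input of the process.

```perl
use strict;
use warnings;

# Prompt for a password with terminal echo turned off. The prompt goes
# to STDERR so that it is visible even when STDOUT is redirected.
sub prompt_for_passwd {
    my ($prompt, $in) = @_;
    $in ||= \*STDIN;
    my $is_tty = -t $in;                 # only touch stty on a real terminal
    print STDERR "$prompt: ";
    system('stty', '-echo') if $is_tty;  # disable echo
    my $answer = <$in>;
    system('stty', 'echo')  if $is_tty;  # restore echo
    print STDERR "\n";
    return unless defined $answer;       # end of file: nothing was typed
    chomp $answer;
    return $answer;
}
```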
Summary
This chapter covered Perl client modules for two of the most widespread application-level protocols,
FTP and Telnet. Together they illustrate the extremes of application protocols, from a rigidly defined
command language designed to interact with client programs to a loose interactive environment
designed for people.
The Net::FTP module allows you to write scripts to automatically connect to FTP sites, explore their
holdings, and selectively download or upload files. Net::Telnet's flexible pattern matching facilities
give you the ability to write scripts to automate processes that were designed primarily for the con-
venience of people rather than software.
E-mail is one of the oldest Internet applications, and it should come as no surprise that many client-
side modules have been written to enable Perl to interoperate with the mail system. Various modules
allow you to send and receive mail, manipulate various mailbox formats, and work with MIME at-
tachments.
Net::SMTP
Net::SMTP operates at the lowest level of the e-mail access modules. It interacts directly with the
SMTP daemons to transmit e-mail across the Internet. To use it effectively, you must know a bit
about the innards of SMTP. The payoff for this added complexity is that Net::SMTP is completely
portable, and works as well from Macintoshes and Windows machines as from UNIX systems.
The language spoken by SMTP servers is a simple human-readable line-oriented protocol. Figure
7.1 shows the interaction needed to send a complete e-mail manually using Telnet as the client (the
client's input is in bold).
After connecting to the SMTP port, the server sends us a code "220" message containing a banner
and greeting. We issue a HELO command, identifying the hostname of the client machine, and the
server responds with a "250" message, which essentially means "OK."
After this handshake, we are ready to send some mail. We issue a MAIL command with the argument
From:<sender's address> to designate the sender. If the sender is OK, the server
responds with another "250" reply. We now issue a RCPT ("recipient") command with the argument
To:<recipient's address> to indicate the recipient. The server again acknowledges the
command. Some SMTP servers have restrictions on the senders and recipients they will service;
for example, they may refuse to relay e-mail to remote domains. In this case, they respond with one
of a variety of error codes in the 500 to 599 range. It is possible to issue multiple RCPT commands for
e-mail that has several recipients at the site(s) served by the SMTP server.
Having established that the sender and recipient(s) are OK, we send the DATA command. The server
responds with a message prompting us for the e-mail message. The server will accept lines of input
until it sees a line containing just a ".".
Internet mail has a standard format consisting of a set of header lines, a blank line, and the body of
the message. Even though we have already specified the sender and recipient, we must do so again
in order to create a valid e-mail message. A minimal mail header has a From: field, indicating the
sender, a To: field, indicating the recipient, and a Subject: field. Other standard fields, such as the
date, are filled in automatically by the mail daemon.
We add a blank line to separate the header from the body, enter the e-mail message text, and
terminate the message with a dot. The server's code 250 acknowledgment indicates that the mes-
sage was queued successfully for delivery.
We could now send additional messages by issuing further MAIL commands, but instead we dis-
connect politely by issuing the QUIT command. The full specification of the SMTP protocol can be
found in RFC 821. The standard format for Internet mail headers is described in RFC 822.
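To make the order of the exchange concrete, the following sketch builds the lines a client would send for a single message. The subroutine name smtp_client_lines() is invented for illustration. The code never talks to a server or checks reply codes; it shows only the client half of the conversation, including the doubling of a leading dot so that a line of message text is not mistaken for the terminating ".".

```perl
use strict;
use warnings;

# Return, in order, the lines a client sends to deliver one message.
# $to is an array reference of recipient addresses.
sub smtp_client_lines {
    my ($client_host, $from, $to, $message) = @_;
    my @lines = ("HELO $client_host", "MAIL FROM:<$from>");
    push @lines, "RCPT TO:<$_>" for @$to;      # one RCPT command per recipient
    push @lines, 'DATA';

    (my $text = $message) =~ s/\r?\n\z//;      # drop the final line ending
    for my $line (split /\r?\n/, $text, -1) {
        $line =~ s/^\./../;                    # escape a leading dot
        push @lines, $line;
    }
    push @lines, '.', 'QUIT';                  # end of message, then disconnect
    return @lines;
}
```

In a real session the client waits for the server's reply after each command (and after the final ".") before sending the next one.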
If the connection is refused (or times out), new() returns false. Here's an example of contacting the
mail server for the cshl.org domain with a timeout of 60 seconds.
$smtp = Net::SMTP->new('mail.cshl.org',Timeout=>60);
Once the object is created, you can send information to the server, or retrieve information from it,
by calling object methods. Some are quite simple:
$banner = $smtp->banner()
$domain = $smtp->domain()
Immediately after connecting to an SMTP server, you can retrieve the banner and/or domain name with which
it identified itself by calling these two methods.
To send mail, you will first call the mail() and recipient() methods to set up the exchange:
If successful, this method returns a true value. Otherwise, it returns undef, and the inherited message()
method can be used to return the text of the error message.
$success = $smtp->recipient($address1,$address2,$address3,...)
@ok_addr = $smtp->recipient($addr1,$addr2,$addr3,...,{SkipBad=>1})
The recipient() method issues an RCPT command to the server. The arguments are a list of valid e-mail
addresses to which the mail is to be delivered. The list of addresses may be followed by a hash reference
containing various options.
The addresses passed to recipient() must all be acceptable to the server, or the entire call will
return false. To modify this behavior, pass the option SkipBad in the options hash. The module now
ignores addresses rejected by the server, and returns the list of accepted addresses as its result.
For example:
@ok=$smtp->recipient('[email protected]','[email protected]',{SkipBad=>1})
Provided that the server has accepted the sender and recipient, you may now commence sending
the message text using the data(), datasend(), and dataend() methods.
$success = $smtp->data([$text])
The data() method issues a DATA command to the server. If called with a scalar argument, it transmits the
value of the argument as the content (header and body) of the e-mail message. If you wish to send the message
one chunk at a time, call data without an argument and make a series of calls to the datasend() method.
This method returns a value indicating success or failure of the command.
$success = $smtp->datasend(@data)
After calling data() without an argument, you may call datasend() one or more times to send lines of e-
mail text to the server. Lines starting with a dot are automatically escaped so as not to terminate the trans-
mission prematurely.
You may call datasend() with an array reference, if you prefer. This method and dataend() are both
inherited from the Net::Cmd base class.
$success = $smtp->dataend
When your e-mail message is sent, you should call dataend() to transmit the terminal dot. If the message
was accepted for delivery, the return value is true.
Three methods are useful for more complex interactions with SMTP servers:
$smtp->reset
This sends an RSET command to the server, aborting mail transmission operations in progress. You might call
this if one of the desired recipients is rejected by the server; it resets the server so you can try again.
$valid = $smtp->verify($address)
@recipients = $smtp->expand($address)
The expand() and verify() methods can be used to check that a recipient address is valid prior to trying
to send mail. verify() returns true if the specified address is accepted.
expand() does something more interesting. If the address is valid, it expands it into one or more aliases, if
any exist. This can be used to identify forwarding addresses and mailing list recipients. The method returns a
list of aliases or, if the specified address is invalid, an empty list. For security reasons, many mail administrators
disable this feature, in which case, the method returns an empty list.
Finally, when you are done with the server, you will call the quit() method:
$smtp->quit
This method politely breaks the connection with the server.
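Putting the methods together, a message can be sent one chunk at a time with a sketch like the one below. The subroutine names send_in_chunks() and deliver() and the error strings are assumptions made for illustration; the Net::SMTP calls themselves are the ones described above. On success the routine returns undef, otherwise a short description of what went wrong.

```perl
use strict;
use warnings;
use Net::SMTP;

# Connect to $host, deliver the message, and disconnect politely.
# $to is an array reference; @chunks holds pieces of the message text.
sub send_in_chunks {
    my ($host, $from, $to, @chunks) = @_;
    my $smtp = Net::SMTP->new($host, Timeout => 60)
        or return "can't connect to $host";
    my $error = deliver($smtp, $from, $to, @chunks);
    $smtp->quit;
    return $error;                       # undef means success
}

# Run the MAIL, RCPT, and DATA steps on an open Net::SMTP object.
sub deliver {
    my ($smtp, $from, $to, @chunks) = @_;
    $smtp->mail($from)
        or return 'sender rejected: ' . $smtp->message;
    my @accepted = $smtp->recipient(@$to, { SkipBad => 1 });
    return 'no recipients accepted' unless @accepted;
    $smtp->data
        or return 'DATA refused: ' . $smtp->message;
    $smtp->datasend($_) for @chunks;     # header, blank line, and body
    $smtp->dataend
        or return 'message not accepted: ' . $smtp->message;
    return;
}
```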
Using Net::SMTP
With Net::SMTP we can write a one-shot subroutine for sending e-mail. The mail() subroutine
takes two arguments: the text of an e-mail message to send (required), and the name of the SMTP
host to use (optional). Call it like this:
$msg = <<'END';
From: John Doe <[email protected]>
To: L Stein <[email protected]>
Cc: [email protected], [email protected]
Subject: hello there

Regards, JD
END
We create the text of the e-mail message using the here-document (<<) syntax and store it in the variable
$msg. The message must contain an e-mail header with (at a minimum) the From: and To: fields.
We pass the message to the mail() subroutine, which extracts the sender and recipient fields and
invokes Net::SMTP to do the dirty work. Figure 7.2 shows how mail() works.
Lines 1–9: Parse the mail message—We split the message into the header and the body by
splitting on the first blank line. Header fields frequently contain continuation lines that begin with
a blank, so we fold those into a single line.
We parse the header into a hash using a simple pattern match, and store the From: and To:
fields in local variables. The To: field can contain multiple recipients, so we isolate the individual
addressees by splitting on the comma character (this will fail in the unlikely case that any of the
addresses contain commas). We do likewise if the header contained a Cc: field.
Lines 10–16: Send messages—We create a new Net::SMTP object and call its mail() and
recipient() methods to initiate the message. The call to recipient() uses the SkipBad
option so that the method will try to deliver the mail even if the server rejects some of the recip-
ients. We compare the number of recipients the server accepted to the number we attempted,
returning from the subroutine if none were accepted, or just printing a warning if only some were
rejected.
We call data() to send the complete e-mail message to the server, and quit() to terminate
the connection.
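The header-parsing step of this walk-through can be sketched on its own. The subroutine name parse_mail_text() and its return values are assumptions, and the code is a simplified illustration rather than the Figure 7.2 listing. Like the original, it splits address lists on commas and so shares the limitation noted above.

```perl
use strict;
use warnings;

# Split a message into header and body, and pull out the sender and the
# list of recipients. Returns ($from, \@recipients, $body), or an empty
# list if the message has no blank line separating header from body.
sub parse_mail_text {
    my ($text) = @_;
    my ($header, $body) = split /\n\n/, $text, 2;
    return unless defined $body;

    $header =~ s/\n[ \t]+/ /g;           # fold continuation lines into one line
    my %field;
    for my $line (split /\n/, $header) {
        my ($name, $value) = $line =~ /^([\w-]+):\s*(.*)$/ or next;
        $field{lc $name} = $value;       # field names are case-insensitive
    }

    my @recipients = map  { split /\s*,\s*/, $_ }
                     grep { defined } @field{qw(to cc)};
    return ($field{from}, \@recipients, $body);
}
```

The sender and recipient list returned here are what the mail() and recipient() methods of Net::SMTP need; the unmodified message text goes to data().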
Although this subroutine does its job, it lacks some features. For example, it doesn't handle the Bcc:
field, which causes mail to be delivered to a recipient without that recipient appearing in the header.
The MailTools module, described next, corrects the deficiencies.
MailTools
The MailTools module, also written by Graham Barr, is a high-level object-oriented interface to the
Internet e-mail system. MailTools, available on CPAN, provides a flexible way to create and ma-
nipulate RFC 822-compliant e-mail messages. Once the message is composed, you can send it off
using SMTP or use one of several UNIX command-line mailer programs to do the dirty work. This
might be necessary on a local network that does not have direct access to an SMTP server.
Using MailTools
A quick example of sending an e-mail from within a script will give you the flavor of the MailTools
interface (Figure 7.3).
Lines 1–2: Load modules—We bring in the Mail::Internet module. It brings in other modules that
it needs, including Mail::Header, which knows how to format RFC 822 headers, and Mail::Mailer,
which knows how to send mail by a variety of methods.
Lines 3–8: Create header—We call Mail::Header->new to create a new header object, which
we will use to build the RFC 822 header. After creating the object, we call its add() method
several times to add the From:, To:, Cc:, and Subject: lines. Notice that we can add the same
header multiple times, as we do with the Cc: line. Mail::Header will also insert other required
RFC 822 headers on its own.
Lines 9–13: Create body—We create the body text, which is just a block of text.
Lines 14–16: Create the Mail::Internet object—We now create a new Mail::Internet object by
calling the package's new() method. The named arguments include Header, to which we pass
the header object that we just created, and Body, which receives the body text. The Body ar-
gument expects an array reference containing discrete lines of body text, so we wrap $body
into an anonymous array reference. Modify, the third argument to new(), flags Mail::Internet
that it is OK to reformat the header lines to meet restrictions on line length that some SMTP
mailers impose.
Line 17: Send mail—We call the newly created Mail::Internet object's send() method with an
argument indicating the sending method to use. The "sendmail" argument indicates that Mail::Internet
should try to use the UNIX sendmail program to deliver the mail.
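The steps in this walk-through can be sketched as follows. This is an approximation, not the Figure 7.3 listing, so the line numbers cited above do not apply to it. The subroutine name build_message() and its named arguments are invented for illustration, and the MailTools modules are loaded at run time so that the file compiles on systems where they are not installed.

```perl
use strict;
use warnings;

# Build a Mail::Internet object from a few named arguments:
# from, to, subject, body (scalars) and cc (an array reference).
sub build_message {
    my (%arg) = @_;
    require Mail::Header;                # both modules are part of MailTools
    require Mail::Internet;

    my $head = Mail::Header->new;
    $head->add(From    => $arg{from});
    $head->add(To      => $arg{to});
    $head->add(Cc      => $_) for @{ $arg{cc} || [] };   # same field, added repeatedly
    $head->add(Subject => $arg{subject});

    return Mail::Internet->new(
        Header => $head,
        Body   => [ $arg{body} ],        # Body expects an array reference
        Modify => 1,                     # allow header lines to be reformatted
    );
}
```

The returned object can then be handed to the mail system with a call such as $mail->send('sendmail').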
Although at first glance Mail::Internet does not hold much advantage over the Net::SMTP-based
mail() subroutine we wrote in the previous section, the ability to examine and manipulate
Mail::Header objects gives MailTools its power. Mail::Header is also the base class for MIME::Head,
which manipulates MIME-compliant e-mail headers that are too complex to be handled manually.
Mail::Header
E-mail headers are more complex than they might seem at first. Some fields occur just once, others
occur multiple times, and some allow multiple values to be strung together by commas or another
delimiter. A field may occupy a single line, or may be folded across multiple lines with leading white-
space to indicate the presence of continuation lines. The mail system also places an arbitrary limit
on the length of a header line. Because of these considerations, you should be cautious of con-
structing e-mail headers by hand for anything much more complicated than the simple examples
shown earlier.
The Mail::Header module simplifies the task of constructing, examining, and modifying RFC 822
headers. Once constructed, a Mail::Header object can be passed to Mail::Internet for sending.
Mail::Header controls the syntax but not the content of the header, which means that you can con-
struct a header with fields that are not recognized by the mail subsystem. Depending on the mailer,
a message with invalid headers might make it through to its destination, or it might get bounced. To
avoid this, be careful to limit headers to the fields listed in the Internet mail format and MIME RFCs
(RFC 822 and RFC 2045, respectively). Table 7.2 gives some of the common headers in e-mail messages.
Fields that begin with X- are meant to be used as extensions. You can safely build a header con-
taining any number of X- fields, and the fields will be passed through unmodified by the mail system.
For example:
$header = Mail::Header->new(Modify=>1);
$header->add('X-Mailer' => "Fido's mailer v1.0");
$header->add('X-HiMom' => 'Hi mom!');
Mail::Header supports a large number of methods. The following list gives the key methods. To
create a new object, call the Mail::Header new() method.
@options, if provided, is a list of named arguments that control various header options. The one used most
frequently is Modify, which if set true allows Mail::Header to reformat header lines to make them fully RFC 822-
compliant. For example:
open HEADERS,"./mail.msg" or die "Can't open mail.msg: $!";
$head = Mail::Header->new(\*HEADERS, Modify=>1);
Once a Mail::Header object is created, you may manipulate its contents in several ways:
$head->read(FILEHANDLE)
As an alternative way to populate a header object, you can create an empty object by calling new() with no
arguments, and then read in the headers from a filehandle using read().
$head->add($name,$value [,$index])
$head->replace($name,$value [,$index])
$head->delete($name [,$index])
The add(), replace(), and delete() methods allow you to modify the Mail::Header object. Each takes
the name of the field to operate on and, optionally, an index that selects a member of a
multivalued field; add() and replace() also take the value for the field.
The add() method appends a field to the header. If $index is provided, it inserts the field into the indicated
position; otherwise, it appends the field to the end of the list.
The replace() method replaces the named field with the indicated value. If the field is multivalued, then
$index is used to select which value to replace; otherwise, the first field is replaced.
The delete() method removes the indicated field.
All three of these methods accept a shortcut form that allows you to specify the field name and value
in a single line. This shortcut allows you to replace the Subject line like this:
$head->replace('Subject: returned to sender')
To retrieve information about a header object, you use get() to get the value of a single field, or
tags() and count() to get information about all the available fields.
Finally, three methods are useful for exporting the header in various forms:
$string = $head->as_string
Returns the entire header as a string in the form that will appear in the message.
$hashref = $head->header_hashref([\%headers])
The header_hashref() method returns the headers as a hash reference. Each key is the unique name of
a field, and each value is an array reference containing the header's contents. This form is suitable for passing
to Mail::Mailer->open(), as described later in this chapter.
You may also use this method to set the header by passing it a hash reference of your own devising. The
composition of \%headers is similar to header_hashref()'s result, but the hash values can be simple
scalars if they are not multivalued.
$head->print([FILEHANDLE])
Prints header to indicated filehandle or, if not specified, to STDOUT. Equivalent to:
print FILEHANDLE $head->as_string
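The following is a minimal sketch of how the Mail::Header methods described above fit together. It assumes the MailTools distribution is installed; the addresses are placeholders chosen for illustration and are not taken from the text.

```perl
use strict;
use warnings;
use Mail::Header;

# Start with an empty header; Modify => 1 allows lines to be reformatted.
my $head = Mail::Header->new(Modify => 1);

# Placeholder addresses, chosen for illustration only.
$head->add('From',     'jdoe@example.com');
$head->add('To',       'lstein@example.com');
$head->add('Subject',  'hello there');
$head->add('X-Mailer', "Fido's mailer v1.0");

$head->replace('Subject', 'returned to sender');  # overwrite the existing value
$head->delete('X-Mailer');                        # drop the field entirely

# get() returns the field value with its trailing newline still attached.
my $subject = $head->get('Subject');
chomp $subject;

my @tags    = $head->tags;             # names of the fields now present
my $hashref = $head->header_hashref;   # { Field => [values, ...], ... }

print "Subject is now: $subject\n";
print $head->as_string;
```

Note the chomp after get(): the value comes back as it appears in the header, newline included, which is easy to overlook when comparing strings.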
Mail::Internet
The Mail::Internet class is a high-level interface to e-mail. It allows you to create messages, ma-
nipulate them in various ways, and send them out. It was designed to make it easy to write autor-
esponders and other mail-processing utilities.
As usual, you create a new object using the Mail::Internet new() method:
Once the object is created, several methods allow you to examine and modify its contents:
$arrayref = $mail->body
The body() method returns the body of the e-mail message as a reference to an array of lines of text. You
may manipulate these lines to modify the body of the message.
$header = $mail->head
The head() method returns the message's Mail::Header object. Modifying this object changes the message
header.
$string = $mail->as_string
$string = $mail->as_mbox_string
The as_string() and as_mbox_string() methods both return the message (both header and body) as
a single string. The as_mbox_string() function returns the message in a format suitable for appending to
UNIX mbox-format mailbox files.
$mail->print([FILEHANDLE])
$mail->print_header([FILEHANDLE])
$mail->print_body([FILEHANDLE])
These three methods print all or part of the message to the designated filehandle or, if not otherwise specified,
STDOUT.
$mail->add_signature([$file])
$mail->remove_sig([$nlines])
These two methods manipulate the signatures that are often appended to the e-mail messages. The
add_signature() function appends the signature contained in $file to the bottom of the e-mail message.
If $file is not provided, then the method looks for the file $ENV{HOME}/.signature.
remove_sig() scans the last $nlines of the message body looking for a line consisting of the characters
"--", which often sets the body off from the signature. The line and everything below it is removed. If not speci-
fied, $nlines defaults to 10.
$reply = $mail->reply
The reply() method creates a new Mail::Internet object with the header initialized to reply to the original
message, and the body text indented. This is suitable for autoreply applications.
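As a rough illustration of the examination and modification methods above, the sketch below builds a Mail::Internet object from an in-memory array of lines rather than from a filehandle. The message text and addresses are made up for the example, and MailTools is assumed to be installed.

```perl
use strict;
use warnings;
use Mail::Internet;

# A tiny message held in memory; the addresses are placeholders.
my @raw = (
    "From: jdoe\@example.com\n",
    "To: lstein\@example.com\n",
    "Subject: hello there\n",
    "\n",
    "This is just a simple e-mail message.\n",
    "Regards, JD\n",
    "-- \n",
    "John Doe\n",
);

# new() accepts either a filehandle or a reference to an array of lines.
my $mail = Mail::Internet->new(\@raw);

# head() gives access to the Mail::Header object.
my $subject = $mail->head->get('Subject');
chomp $subject;

# Strip the signature, then edit the body through the array reference.
$mail->remove_sig;
my $body = $mail->body;
unshift @$body, "[archived copy]\n";

print "Subject: $subject\n";
print $mail->as_string;
```

Because body() hands back a reference to the message's own array of lines, the unshift() changes the message itself; no separate "set" call is needed.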
Finally, the send() method sends the message via the e-mail system:
This autoreply script takes advantage of a feature of the UNIX mail system that allows incoming e-
mail to be piped to a program. Provided that you're using such a system, you may activate the script
by creating a .forward file in your home directory that contains lines like the following:
lstein
| /usr/local/bin/autoreply.pl
Replace the first line with your login name, and the second with the path to the autoreply script. This
tells the mail subsystem to place one copy of the incoming mail in the user-specific inbox, and to
send another copy to the standard input of the autoreply.pl script.
Let's step through autoreply.pl.
Lines 1–3: Load modules—We turn on strict type checking and load the Mail::Internet module.
Lines 4–7: Define constants—One problem with working with programs run by the mailer dae-
mon is that the standard user environment isn't necessarily set up. This means that
$ENV{HOME} and other standard environment variables may not exist. Our first action, therefore,
is to look up the user's home directory and login name and store them in appropriate constants.
Lines 4 and 5 use the getpwuid() function to retrieve this information. We then use the
HOME constant to find the locations of the .vacation and .signature files.
Lines 8–9: Create a Mail::Internet object—We check that the .vacation file is present and, if
it is not, exit. Otherwise, we create a new Mail::Internet object initialized from the message sent
us on STDIN.
Lines 10–19: Check that the message should be replied to—We shouldn't autoreply to certain
messages, such as those sent to us in the Cc: line, or those distributed to a mailing list. Another
type of message we should be very careful not to reply to are bounced messages; replying to
those has the potential to set up a nasty infinite loop. The next section of the code tries to catch
these situations.
We recover the header by calling the Mail::Internet object's head() method, and perform a
series of pattern matches on its fields. First we check that our username is mentioned on the
To: line. If not, we may be receiving this message as a Cc: or as a member of a mailing list. We
next check the Precedence: field. If it's "bulk," then this message is probably part of a mass
mailing. If the Subject: line contains the strings "returned mail" or "bounced mail", or if the sender
is the mail system itself (identified variously as "mailer daemon," "mail subsystem," or "post-
master"), then we are likely dealing with returned mail and we shouldn't reply or risk setting up
a loop. In each of these cases, we just exit normally.
Lines 20–21: Generate reply—To create a new message initialized as a reply to the original, we
call the mail message object's reply() method.
Lines 22–26: Prepend vacation message to text—The reply() method will have created body
text consisting of the original message quoted and indented. We prepend the contents of
the .vacation file to this. We open the contents of .vacation, call the mail message's
body() method to return a reference to the array of body lines, and then use unshift() to
insert the contents of .vacation in front of the body. We could replace the body entirely, if we
preferred.
Lines 27–28: Add signature—We call the reply's add_signature() method to append the
contents of the user's signature file, if any, to the bottom of the message body.
Lines 29–30: Send message—We call the reply's send() method to send the message by the
most expedient means.
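The checks described for lines 10–19 can be sketched as a standalone function. This is not the script's own code: should_reply() is a hypothetical helper, it takes the relevant header fields as a plain hash instead of a Mail::Header object, and the patterns are one plausible reading of the rules given above.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether an incoming message deserves an autoreply.
# $fields maps header names to their string values (missing fields are allowed).
sub should_reply {
    my ($fields, $user) = @_;

    my %f;
    $f{$_} = defined $fields->{$_} ? $fields->{$_} : ''
        for qw(To From Subject Precedence);

    # Not addressed to us directly: probably a Cc: or a mailing list.
    return 0 unless $f{To} =~ /\b\Q$user\E\b/i;

    # Mass mailings announce themselves in the Precedence: field.
    return 0 if $f{Precedence} =~ /bulk/i;

    # Bounces: replying to these risks an infinite loop.
    return 0 if $f{Subject} =~ /(?:returned|bounced) mail/i;
    return 0 if $f{From}    =~ /mailer[-_ ]?daemon|mail subsystem|postmaster/i;

    return 1;
}

my %incoming = (
    To      => 'lstein@example.com',      # placeholder addresses
    From    => 'jdoe@example.com',
    Subject => 'hello there',
);
print should_reply(\%incoming, 'lstein') ? "reply\n" : "skip\n";
```

Keeping the decision in one function that returns early makes it easy to add further exclusions later without touching the code that builds and sends the reply.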
Here is an example of a reply issued by the autoreply.pl script in response to the sample mes-
sage we composed with Net::SMTP in the previous section. The text at the top came from
~/.vacation and the signature at the bottom from ~/.signature. The remainder is quoted from
the original message.
To: John Doe <[email protected]>
From: L Stein <[email protected]>
Subject: Re: hello there
Date: Fri, 7 Jul 2000 08:12:17 -0400
Message-Id: <200007071212.IAA12128@pesto>
Hello,
Lincoln
> Regards, JD
--
======================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
======================================================================
If you adapt this autoreply program to your own use, you might want to check the size of the quoted
body and delete it if it is unusually large. Otherwise, you might inadvertently echo back a large binary
enclosure.
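One way to add such a size check is sketched below. The trim_quoted_body() helper and the 4,096-byte limit are assumptions made for illustration; the function operates on the array reference that body() returns and would be called on the reply before the vacation text is prepended.

```perl
use strict;
use warnings;

# Arbitrary limit chosen for illustration; tune it to your needs.
use constant MAX_QUOTED_BYTES => 4096;

# Hypothetical helper: replace an oversized quoted body with a one-line note.
# $body is an array reference of lines, as returned by body().
sub trim_quoted_body {
    my ($body, $limit) = @_;
    $limit = MAX_QUOTED_BYTES unless defined $limit;

    my $size = 0;
    $size += length($_) for @$body;

    if ($size > $limit) {
        @$body = ("> [original message omitted: $size bytes]\n");
    }
    return $body;
}

# A short body is left alone; a large one is collapsed to a single line.
my $small = trim_quoted_body(["> Regards, JD\n"]);
my $large = trim_quoted_body([ ("> " . ("x" x 76) . "\n") x 100 ]);
print scalar(@$small), " line(s) kept, ", scalar(@$large), " line(s) after trimming\n";
```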
For complex e-mail-processing applications, you should be sure to check out the procmail program,
which uses a special-purpose programming language to parse and manipulate e-mail. A number of
sophisticated applications have been written on top of procmail, including autoresponders, mailing
list generators, and filters for spam mail.
Mail::Mailer
The last component of MailTools that we consider is Mail::Mailer, which is used internally by
Mail::Internet to deliver mail. Mail::Mailer provides yet another interface for sending Internet mail.
Although it doesn't provide Mail::Internet's header- and body-handling facilities, I find it simpler and
more elegant to use in most circumstances.
Unlike Net::SMTP and Mail::Internet, which use object methods to compose and send mail, the
Mail::Mailer object acts like a filehandle. This short code fragment shows the idiom:
use Mail::Mailer;
my $mailer = Mail::Mailer->new;
$mailer->open({To => '[email protected]',
From => '[email protected]',
CC => ['[email protected]','[email protected]'],
Subject => 'hello there'});
print $mailer "This is just a simple e-mail message.\n";
print $mailer "Nothing to get excited about.\n\n";
print $mailer "Regards, JD\n";
$mailer->close;
After creating the object with new(), we initialize it by calling open() with a hash reference
containing the contents of the e-mail header. We then use the mailer object as a filehandle to print
several lines of the body text. Then we call the object's close() method to finish processing the
message and send it out.
The complete list of Mail::Mailer methods is relatively short.
The contents of @args depend on the method. In the "mail" and "sendmail" methods, whatever
you provide in @args is appended to the command line used to invoke the mail and sendmail
programs. For the "smtp" method, you can pass the named argument Server to specify the SMTP
server to use. For example:
$mailer = Mail::Mailer->new('smtp',Server => 'mail.lsjs.org')
Internally, Mail::Mailer opens up a pipe to the indicated mailer program unless "smtp" is specified,
in which case it uses Net::SMTP to send the message. If no method is explicitly provided, then
Mail::Mailer scans the command PATH looking for the appropriate executables and chooses the
first method it finds, beginning with "mail." The Mail::Mailer documentation describes how you can
alter this search order by setting the PERL_MAILERS environment variable.
Once created, you initialize the Mail::Mailer object with a set of header fields:
$fh = $mailer->open(\%headers)
The open() method begins a new mail message with the specified headers. For the "mail", "sendmail", and
"test" mailing methods, this call forks and execs the mailer program and then returns a pipe opened on the
mailer. For the "smtp" method, open() returns a tied filehandle that intercepts calls to print() and passes
them to the datasend() method of Net::SMTP. The returned filehandle is identical to the original Mail::Mailer
object, so you are free to use it as a Boolean indicating success or failure of the open() call.
The argument to open() is a hash reference whose keys are the fields of the mail header, and
whose values can be scalars containing the contents of the corresponding field, or array references
containing the values for multivalued fields such as Cc: or To:. This format is compatible with the
header_hashref() method of the Mail::Header class. For example:
$mailer->open({To => ['[email protected]','[email protected]'],
From => '[email protected]'}) or die "can't open: $!";
Once the object is initialized, you will print the body of the message to it using it as a filehandle:
print $mailer "This is the first line of the mail message.\n";
When the body is done, you should call the object's close() method:
$mailer->close
close() tidies up and sends the message. You should not use the close() Perl built-in for this purpose,
because some of the Mail::Mailer methods need to do postprocessing on the message before sending it.
MIME-Tools
Net::SMTP and MailTools provide the basic functionality to create simple text-only e-mail messages.
The MIME-Tools package takes this a step further by allowing you to compose multipart messages
that contain text and nontext attachments. You can also parse MIME-encoded messages to extract
the attachments, add or remove attachments, and resend the modified messages.
1. Every message body has a type. —In the MIME world, the body of every message has a type
that describes its nature; this type is given in the Content-Type: header field. MIME uses a
type/subtype nomenclature in which type indicates the category of document, and subtype
gives its specific format. Table 7.4 lists some common types and subtypes. The major media
categories are "audio," "video," "text," and "image." The "message" category is used for e-mail
enclosures, such as when you forward an e-mail onward to someone else, and the "applica-
tion" category is a hodgepodge of things that could not be classified otherwise. We'll talk about
"multipart" momentarily.
2. Every message body has an encoding. —Internet e-mail was originally designed to handle
messages consisting entirely of 7-bit ASCII text broken into relatively short lines; some parts
of the e-mail system are still limited to this type of message. However, as the Internet became
global, it became necessary to accommodate non-English character sets that have 8- or even
16-bit characters. Another problem was binary attachments such as image files, which are not
even text-oriented.
To accommodate the full range of messages that people want to send without rewriting the
SMTP protocol and all supporting software, MIME provides several standard encoding algo-
rithms that can encapsulate binary data in a text form that conventional mailers can handle.
Each header has a Content-Transfer-Encoding: field that describes the message body's en-
coding. Table 7.5 lists the five standard encodings.
If you are dealing with 8-bit data, only the quoted-printable and base64 encodings are guar-
anteed to make it through e-mail gateways.
3. Any message may have multiple parts. —The multipart/* MIME types designate messages
that have multiple parts. Each part has its own content type and MIME headers. It's even
possible for a part to have its own subparts. The multipart/alternative MIME type is used when
the various subparts correspond to the same document repeated in different formats. For
example, some browser-based mailers send their messages in both text-only and HTML form.
multipart/mixed is used when the parts are not directly related to each other, for example an
e-mail message and a JPEG enclosure.
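The encodings mentioned in item 2 can be tried directly with the MIME::Base64 and MIME::QuotedPrint modules. The sketch below is only a demonstration of what the two 8-bit-safe encodings do to data; the sample strings are invented for the example.

```perl
use strict;
use warnings;
use MIME::Base64      qw(encode_base64 decode_base64);
use MIME::QuotedPrint qw(encode_qp decode_qp);

# Arbitrary binary data: every possible byte value once.
my $binary = join '', map { chr($_) } 0 .. 255;

# base64 turns it into short lines of plain ASCII text.
my $b64 = encode_base64($binary);
print "base64 output is ", length($b64), " bytes of ASCII\n";

# quoted-printable leaves ASCII readable and escapes only the 8-bit bytes.
my $text = "Caf\xe9 au lait\n";       # Latin-1 e-acute is byte 0xE9
my $qp   = encode_qp($text);
print $qp;                            # prints "Caf=E9 au lait"

# Both encodings are reversible.
my $binary_again = decode_base64($b64);
my $text_again   = decode_qp($qp);
```

As a rule of thumb, quoted-printable suits text that is mostly ASCII, because it stays legible, while base64 is the better choice for genuinely binary data such as images.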
Table 7.4. Common MIME Types
Type                          Description
audio/*                       A sound
audio/basic                   Sun Microsystems' audio "au" format
audio/mpeg                    An MP3 file
audio/midi                    A MIDI file
audio/x-aiff                  AIFF sound format
audio/x-wav                   Microsoft's "wav" format
image/*                       An image
image/gif                     CompuServe GIF format
image/jpeg                    JPEG format
image/png                     Portable Network Graphics format
image/tiff                    TIFF format
message/*                     An e-mail message
message/news                  Usenet news message format
message/rfc822                Internet e-mail message format
multipart/*                   A message containing multiple parts
multipart/alternative         The same information in alternative forms
multipart/mixed               Unrelated pieces of information mixed together
text/*                        Human-readable text
text/html                     Hypertext Markup Language
text/plain                    Plain text
text/richtext                 Enriched text in RFC 1523 format
text/tab-separated-values     Tables
video/*                       Moving video or animation
video/mpeg                    MPEG movie format
video/quicktime               QuickTime movie format
video/msvideo                 Microsoft "avi" movie format
application/*                 None of the above
application/msword            Microsoft Word format
application/news-message-id   News posting format
application/octet-stream      A raw binary stream
application/postscript        PostScript
application/rtf               Microsoft rich text format
application/wordperfect5.1    WordPerfect 5.1 format
application/gzip              Gzip file compression format
application/zip               PKZip file compression format
Any part of a multipart MIME message may contain a Content-Disposition: header, which is a hint
to the mail reader as to how to handle the part. Possible dispositions include attachment, which tells
the reader to treat the part's body as an enclosure to be saved to disk, and inline, which tells the
reader to try to display the part as a component of the document. For example, a mail reader
application may be able to display an inline image in the same window as the textual part of the
message. The Content-Disposition: field can also suggest a filename to store attachments under.
Another field, Content-Description:, provides an optional human-readable description of the part.
Notice that an e-mail message with a JPEG attachment is really a multipart MIME message con-
taining two parts, one for the text of the message and the other for the JPEG image.
Without going into the format of a MIME message in detail, Figure 7.5 shows a sample multipart
message to give you a feel for the way they work. This message has four parts: a 7-bit text message
that appears at the top of the message, a base64-encoded audio file that uses the Microsoft WAV
format, a base64-encoded JPEG file, and a final 7-bit part that contains some parting words and
the e-mail signature. (The binary enclosures have been truncated to save space.)
Notice that each part of the message has its own header and body, and that the parts are delimited
by a short unique boundary string beginning with a pair of hyphens. The message as a whole has
its own header, which is a superset of the RFC 822 Internet mail header, and includes a Content-
Type: field of multipart/mixed.
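To make the layout concrete, the sketch below assembles a small two-part message by hand using nothing but string concatenation and MIME::Base64. It is a simplified illustration, not a substitute for the MIME modules: the boundary string, addresses, and "image" bytes are all invented, and real messages use CRLF line endings in transit.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical boundary; it must not occur anywhere inside the parts.
my $boundary = 'example-boundary-42';

my @parts = (
    { type => 'text/plain', encoding => '7bit',
      body => "Here is the picture I promised.\n" },
    { type => 'image/jpeg', encoding => 'base64',
      body => encode_base64("\xff\xd8\xff\xe0 not really a JPEG") },
);

# Top-level header: ordinary RFC 822 fields plus the MIME fields.
my $message = "From: jdoe\@example.com\n"
            . "To: lstein\@example.com\n"
            . "Subject: picture\n"
            . "MIME-Version: 1.0\n"
            . "Content-Type: multipart/mixed; boundary=\"$boundary\"\n"
            . "\n";

# Each part is introduced by "--" plus the boundary and has its own header.
for my $part (@parts) {
    $message .= "--$boundary\n"
              . "Content-Type: $part->{type}\n"
              . "Content-Transfer-Encoding: $part->{encoding}\n"
              . "\n"
              . $part->{body};
}

# The final delimiter carries a trailing "--" as well.
$message .= "--$boundary--\n";

print $message;
```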
This is pretty much all you need to know about MIME. The MIME modules will do all the rest of the
work for you.
Lines 1–3: Load modules—We turn on strict type checking and load the MIME::Entity module.
It brings in the other modules it needs, including MIME::Head and MIME::Body.
Lines 4–8: Create top-level MIME::Entity—Using the MIME::Entity->build() method, we
create a "top-level" multipart MIME message that contains the two subparts. The arguments to
build() include the From: and To: fields, the Subject: line, and a MIME Type of
multipart/mixed. This returns a MIME::Entity object.
Lines 9–18: Attach the text of the message—We create the text of the message and store it in
a scalar variable. Then, using the top-level MIME entity's attach() method, we incorporate
the text data into the growing multipart message, specifying a MIME Type of text/plain, an
Encoding of 7bit, and the message text as the Data.
Lines 19–23: Attach the audio file—We again call attach(), but this time specify a Type of
audio/wav and an Encoding of base64. We don't want to read the whole audio file into memory,
so we use the Path argument to direct MIME::Entity to the file where the audio data can be found.
The Description argument adds a human-readable description of the attachment to the outgoing
message.
Lines 24–25: Sign the message—We call the MIME entity object's sign() utility to append our
signature file to the text of the message.
Lines 26–27: Send the message—We call the send() method to format and mail the completed
message using the smtp method.
That's all there is to it. In the next sections we will look at the MIME modules more closely.
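The build-and-attach steps walked through above can be sketched in a few lines. This is not the script under discussion: the addresses and message text are placeholders, the audio data is supplied inline through the Data argument so that the example does not depend on a file, and the signing and sending steps are left out so that nothing is actually mailed. MIME-Tools is assumed to be installed.

```perl
use strict;
use warnings;
use MIME::Entity;

# Top-level multipart container; addresses are placeholders.
my $msg = MIME::Entity->build(
    From    => 'jdoe@example.com',
    To      => 'lstein@example.com',
    Subject => 'Greetings!',
    Type    => 'multipart/mixed',
);

# First part: the text of the message.
my $text = "Hi Lincoln,\n\nThe attached recording is for you.\n";
$msg->attach(
    Type     => 'text/plain',
    Encoding => '7bit',
    Data     => $text,
);

# Second part: stand-in bytes for an audio file. A real script would more
# likely use Path => '/some/file.wav' to avoid loading the file into memory.
$msg->attach(
    Type        => 'audio/wav',
    Encoding    => 'base64',
    Description => 'A short greeting',
    Filename    => 'greeting.wav',
    Data        => "RIFF\0\0\0\0WAVEfmt placeholder",
);

# Show the assembled message instead of sending it.
print $msg->stringify;
```

Printing the result is a convenient way to inspect the generated boundaries and per-part headers before wiring in a call that really delivers the message.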
MIME::Entity
MIME::Entity is a subclass of Mail::Internet and, like it, represents an entire e-mail message.
However, there are some important differences between Mail::Internet and MIME::Entity. Whereas
Mail::Internet contains just a single header and body, the body of a MIME::Entity can be composed
of multiple parts, each of which may be composed of subparts. Each part and subpart is itself a
MIME::Entity (Figure 7.7). Because of these differences, MIME::Entity adds several methods for
manipulating the message's body in an object-oriented fashion.
Figure 7.7. A MIME message can contain an unlimited number of nested attachments
This summary omits some obscure methods. See the MIME::Entity POD documentation for the full
details.
The main constructor for MIME::Entity is build(), which accepts a large number of named
arguments.
The build() method is the main constructor for MIME::Entity. It takes a series of named arguments and
returns an initialized MIME::Entity object. The following arguments are the most common.
Field name. Any of the RFC 822 or MIME-specific fields can be used as arguments, and the pro-
vided value will be incorporated into the message header. As in Mail::Header, you can use an array
reference to pass a multivalued field. You should probably confine yourself to using RFC 822 fields,
such as From: and To:, because any MIME fields that you provide will override those generated by
MIME::Entity.
Data. For single-part entities only, the data to use as the message body. This can be a scalar or
an array reference containing lines to be joined to form the body.
Path. For single-part entities only, the path to a file where the data for the body can be found. This
can be used to attach to the outgoing message a file that is larger than you could store in main
memory.
Boundary. The boundary string to place between parts of a multipart message. MIME::Entity will
choose a good default for you; ordinarily you won't want to use this argument.
Description. A human-readable description of the body used as the value of the Content-Descrip-
tion: field.
Disposition. This argument becomes the value of the header's Content-Disposition: field. It may be
either attachment or inline, defaulting to inline if the argument is not specified.
Encoding. The value of this argument becomes the Content-Transfer-Encoding: field. You should provide
one of 7bit, 8bit, binary, quoted-printable, or base64. Include this argument even if you are sending
a simple text message because, if you don't, MIME::Entity defaults to binary. You may also provide
a special value of -SUGGEST to have MIME::Entity make a guess based on a byte-by-byte
inspection of the entire body.
Filename. The recommended filename for the mail reader to use when saving this entity to disk. If
not provided, the recommended filename will be derived from the value of Path.
Type. The MIME type of the entity, text/plain by default. MIME::Entity makes no attempt to guess
the MIME type from the file name indicated by the Path argument or from the contents of the Data
argument.
Here's the idiom for creating a single-part entity (which may later be attached to a multipart entity):
$part = MIME::Entity->build(To => '[email protected]',
Type => 'image/jpeg',
Encoding => 'base64',
Path => '/tmp/pictures/oranges.jpg');
And here's the idiom for creating a multipart entity, to which subparts will be added:
$multipart = MIME::Entity->build(To => '[email protected]',
Type => 'multipart/mixed');
Notice that single-part entities should have a body specified using either the Data or the Path ar-
guments. Multipart entities should not.
Once the MIME::Entity is created, you will attach new components to it using add_part() or
attach():
If you attempt to add a part to a single-part entity, MIME::Entity automagically converts the entity into type
multipart/mixed, and reattaches the original contents as a subpart. The entity you are adding then becomes
the second subpart on the list. This feature allows you to begin to compose a single-part message and later
add attachments without having to start anew.
$part = $entity->attach(arg1 => $val1, arg2 => $val2, ...)
The attach() method is a convenience function that first creates a new MIME::Entity object using
build(), and then calls $entity->add_part() to insert the newly created part into the message. The
arguments are identical to those of build(). If successful, the method returns the new MIME::Entity.
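The automatic conversion from single-part to multipart can be observed with attach(). The sketch below starts with a plain text entity and adds an attachment afterward; the addresses, filename, and data are invented for the example, and MIME-Tools is assumed to be installed.

```perl
use strict;
use warnings;
use MIME::Entity;

# Begin with an ordinary single-part text message (placeholder addresses).
my $entity = MIME::Entity->build(
    From     => 'jdoe@example.com',
    To       => 'lstein@example.com',
    Subject  => 'status report',
    Type     => 'text/plain',
    Encoding => '7bit',
    Data     => "The report is attached.\n",
);
my $was_multipart = $entity->is_multipart;

# Attaching to a single-part entity turns it into multipart/mixed; the
# original text becomes the first subpart and the new part the second.
my $part = $entity->attach(
    Type     => 'text/tab-separated-values',
    Encoding => 'quoted-printable',
    Filename => 'report.tsv',
    Data     => "name\tcount\napples\t3\n",
);

my @parts = $entity->parts;
print "multipart before: ", ($was_multipart        ? "yes" : "no"), "\n";
print "multipart after:  ", ($entity->is_multipart ? "yes" : "no"), "\n";
print "number of parts:  ", scalar(@parts), "\n";
```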
$head = $entity->head([$newhead])
The head() method returns the MIME::Head object associated with the entity. You can then call methods in
the head object to examine and change fields. The optional $newhead argument, if provided, can be used to
replace the header with a different MIME::Head object.
$body = $entity->bodyhandle([$newbody])
The bodyhandle() method gets or sets the MIME::Body object associated with the entity. You can then use
this object to retrieve or modify the unencoded contents of the body. The optional $newbody argument can
be used to replace the body with a different MIME::Body object. Don't confuse this method with body(), which
returns an array ref containing the text representation of the encoded body.
If the entity is multipart, then there will be no body, in which case bodyhandle() returns undef. Before trying
to fetch the body, you can use the is_multipart() method to check for this possibility.
$pseudohandle = $entity->open($mode)
The open() method opens the body of the entity for reading or writing, and returns a MIME pseudohandle.
As described later in the section on the MIME::Body class, MIME pseudohandles have object methods similar
to those in the IO::Handle class (e.g., read(), getline(), and print()), but they are not handles in the
true sense of the word. The pseudohandle can be used to retrieve or change the contents of the entity's body.
$mode is one of "r" for reading, or "w" for writing.
@parts = $entity->parts
$part = $entity->parts($index)
@parts = $entity->parts(\@parts)
The parts() method returns the list of MIME::Entity parts in a multipart entity. If called with no arguments,
the method returns the entire list of parts; if called with an integer index, it returns the designated part. If passed
the reference to an array of parts, the method replaces the current parts with the contents of the array. This
allows you to delete parts or rearrange their order.
For example, this code fragment reverses the order of the parts in the entity:
$entity->parts([reverse $entity->parts])
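Deleting parts works the same way as reordering them: build the list you want and pass it back as an array reference. The following is a minimal sketch, assuming MIME-Tools is installed; the part contents are invented for illustration.

```perl
use strict;
use warnings;
use MIME::Entity;

# Build a small multipart message with three placeholder parts.
my $entity = MIME::Entity->build(
    To   => 'recipient@example.com',
    Type => 'multipart/mixed',
);
$entity->attach(Type => 'text/plain', Data => "Greetings.\n");
$entity->attach(Type => 'image/jpeg', Encoding => 'base64',
                Data => "not really a JPEG");
$entity->attach(Type => 'text/plain', Data => "Best wishes.\n");

# Keep everything except image parts, then install the filtered list.
my @kept = grep { $_->mime_type !~ m{^image/} } $entity->parts;
$entity->parts(\@kept);

my @types = map { $_->mime_type } $entity->parts;
print join(", ", @types), "\n";
```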
$type = $entity->mime_type
$type = $entity->effective_type
The mime_type() and effective_type() methods both return the MIME type of the entity's body. Although
the two methods usually return the same value, there are some error conditions in which MIME::Parser
cannot decode the entity and is therefore unable to return the body in its native form. In this case,
mime_type() returns the type that the body is supposed to be, and effective_type() returns the type
that you actually get when you retrieve or save the body data (most probably application/octet-stream). To be
safe, use effective_type() when retrieving the body of an entity created by MIME::Parser. For entities
you created yourself with MIME::Entity->build(), there's no difference.
$boolean = $entity->is_multipart
The is_multipart() method is a convenience routine that returns true if the entity is multipart, false if it
contains a single part only.
$entity->sign(arg1 => $val1, arg2=> $val2, ...)
The sign() method attaches a signature to the message. If the message contains multiple parts, MIME::Entity
searches for the first text entity and attaches the signature to that.
The method adds some improvements to the version implemented in Mail::Internet. However, you must provide
at least one set of named arguments. Possibilities include:
File. This argument allows you to use the signature text contained in a file. Its value should be the path to
a local file.
Signature. This argument uses the indicated text as the signature. Its value can be a scalar or a reference
to an array of lines.
Force. Sign the entity even if its content type isn't text/*. The value is treated as a Boolean.
Remove. Call remove_sig() to scan for an existing signature and remove it before adding the new
signature. The value of this argument is passed to remove_sig(). Provide 0 to disable signature removal
entirely.
For example, here's how to add a signature using a scalar value:
$entity->sign(Signature => "That's all folks!");
$entity->remove_sig([$nlines])
Remove_sig() scans the last $nlines of the message body as it looks for a line consisting of the characters
"--". The line and everything below it is removed. $nlines defaults to 10.
$entity->dump_skeleton([FILEHANDLE])
Dump_skeleton() is a debugging utility. It dumps a text representation of the structure of the entity and its
subparts to the indicated filehandle, or, if no filehandle is provided, to standard output.
Finally, several methods are involved in exporting the entity as text and mailing it:
$entity->print([FILEHANDLE])
$entity->print_header([FILEHANDLE])
$entity->print_body([FILEHANDLE])
These three methods, inherited from Mail::Internet, print the encoded text representations of the whole message,
the header, or the body, respectively. The parts of a multipart entity are also printed. If no filehandle is
provided, they print to STDOUT.
$arrayref = $entity->header
The header() method, which is inherited from Mail::Internet, returns the text representation of the header as
a reference to an array of lines. Don't confuse this with the head() method, which returns a MIME::Head object.
$arrayref = $entity->body
This method, which is inherited from Mail::Internet, returns the body of the message as a reference to an array
of lines. The lines are encoded in a form suitable for passing to a mailer. Don't confuse this method with
bodyhandle() (discussed earlier), which returns a MIME::Body object.
$string = $entity->as_string
$string = $entity->stringify_body
$string = $entity->stringify_header
The as_string() method converts the message into a string, encoding any parts that need to be. The
stringify_body() and stringify_header() methods respectively operate on the body and header
only.
$result = $entity->send([$method])
The send() method, which is inherited from Mail::Internet, sends off the message using the selected method.
I have noticed that some versions of the UNIX mail program have problems with MIME headers, and so it's
best to set $method explicitly to either "sendmail" or "smtp".
$entity->purge
If you have received the MIME::Entity object from MIME::Parser, it is likely that the body of the entity or one of
its subparts is stored in a temporary file on disk. After you are finished using the object, you should call
purge() to remove these temporary files, reclaiming the disk space. This does not happen automatically when
the object is destroyed.
MIME::Head
The MIME::Head class contains information about a MIME entity's header. It is returned by the
MIME::Entity head() method.
MIME::Head is a subclass of Mail::Header and inherits most of its methods from there. It is a historical
oddity that one module is called "Head" and the other "Header." MIME::Head adds a few utility
methods to Mail::Header, the most useful of which are read() and from_file():
$head = MIME::Head->read(FILEHANDLE)
In addition to creating a MIME::Head object manually by calling add() for each header field, you can create
a fully initialized header from an open filehandle by calling the read() method. This supplements Mail::Header's
read() method, which allows you to read a file only into a previously created object.
$head = MIME::Head->from_file($file)
The from_file() constructor creates a MIME::Head object from the indicated file by opening it and passing
the resulting filehandle to read().
All other functions behave as they do in Mail::Header. For example, here is one way to retrieve and
change the subject line in a MIME::Entity object:
$old_subject = $entity->head->get('Subject');
$new_subject = "Re: $old_subject";
$entity->head->replace(Subject => $new_subject);
Like Mail::Header, MIME::Head->get() also returns newlines at the ends of retrieved field values.
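Because of that trailing newline, it is usually worth calling chomp() on a retrieved value before interpolating it into a new field. A minimal sketch, assuming MIME-Tools is installed and using an invented subject line:

```perl
use strict;
use warnings;
use MIME::Entity;

# A throwaway entity with a placeholder subject.
my $entity = MIME::Entity->build(
    Subject => 'testing mime parser',
    Type    => 'text/plain',
    Data    => "Hello\n",
);

# get() returns the field value with its trailing newline, so strip it
# before building the new subject.
my $old_subject = $entity->head->get('Subject');
chomp $old_subject;
$entity->head->replace(Subject => "Re: $old_subject");

my $new_subject = $entity->head->get('Subject');
print $new_subject;
```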
MIME::Body
The MIME::Body class contains information on the body part of a MIME::Entity. MIME::Body objects
are returned by the MIME::Entity bodyhandle() method, and are created as needed by the
MIME::Entity build() and attach() methods. You will need to interact with MIME::Body objects
when parsing incoming MIME-encoded messages.
Because MIME-encoded data can be quite large, an important feature of MIME::Body is its ability
to store the data on disk or in memory ("in core" as the MIME-Tools documentation calls it). The
methods available in MIME::Body allow you to control where the body data is stored, to read and
write it, and to create new MIME::Body objects.
MIME::Body has three subclasses, each specialized for storing data in a different manner:
MIME::Body::File: This subclass stores its body data in a disk file. This is suitable for large binary
objects that wouldn't easily fit into main memory.
MIME::Body::Scalar: This subclass stores its body data in a scalar variable in main memory.
It's suitable for small pieces of data such as the text part of an e-mail message.
MIME::Body::InCore: This subclass stores its body data in an array reference kept in main
memory. It's suitable for larger amounts of text on which you will perform multiple reads or writes.
Normally MIME::Parser creates MIME::Body::File objects to store body data on disk while it is parsing.
$body = MIME::Body::File->new($path)
To create a new MIME::Body object that stores its data to a file, call the MIME::Body::File->new()
method with the path to the file. The file doesn't have to exist, but will be created when you open the body for
writing.
$body = MIME::Body::Scalar->new(\$string)
The MIME::Body::Scalar->new() method returns a body object that stores its data in a scalar reference.
$body = MIME::Body::InCore->new($string)
$body = MIME::Body::InCore->new(\$string)
$body = MIME::Body::InCore->new(\@string)
The MIME::Body::InCore class has the most flexible constructor. Internally it stores its data in an array reference,
but it can be initialized from a scalar, a reference to a scalar, or a reference to an array.
Once you have a MIME::Body object, you can access its contents by opening it with the open()
method.
$pseudohandle = $body->open($mode)
This method takes a single argument that indicates whether to open the body for reading ("r") or writing ("w").
The returned object is a pseudohandle that implements the IO::Handle methods read(), print(), and
getline(). However, it is not a true filehandle, so be careful not to pass the returned pseudohandle to any
of the built-in procedures such as <> or read().
The following code fragment illustrates how to read the contents of a large MIME::Body stored in a
MIME::Entity object and print it to STDOUT. The contents recovered in this way are in their native
form, free of any MIME encoding:
$body = $entity->bodyhandle or die "no body";
$handle = $body->open("r");
print $data while $handle->read($data,1024);
For line-oriented data, we would have used the getline() method instead.
Another code fragment illustrates how to write a MIME::Body's contents using its print() method.
If the body is attached to a file, the data is written there. Otherwise, it is written to an in-memory data
structure:
$body = $entity->bodyhandle or die "no body";
$handle = $body->open("w");
$handle->print($_) while <>;
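The two fragments can be combined into one round trip that also shows the line-oriented getline() variant. This is a sketch only: it uses a standalone in-memory MIME::Body::InCore object rather than a body taken from a parsed entity, and the two lines of text are invented.

```perl
use strict;
use warnings;
use MIME::Body;

# An empty in-memory body; nothing is written to disk.
my $body = MIME::Body::InCore->new('');

# Open for writing and add two lines through the pseudohandle.
my $out = $body->open("w");
$out->print("first line\n");
$out->print("second line\n");
$out->close;

# Reopen for reading and fetch the data one line at a time.
my $in    = $body->open("r");
my $count = 0;
while (defined(my $line = $in->getline)) {
    $count++;
    print "$count: $line";
}
$in->close;
```

Testing the result of getline() with defined() avoids ending the loop early on a line that happens to be "0".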
@lines = $body->as_lines
$string = $body->as_string
as_lines() and as_string() are convenience functions that return the entire contents of the body in a
single operation. as_lines() opens the body and calls getline() repeatedly, returning an array of newline-terminated
lines. as_string() reads the entire body into a scalar. Because either method can read a
large amount of data into memory, you should exercise some caution before calling them.
$path = $body->path([$newpath])
If the body object is attached to a file, as in MIME::Body::File, then path() returns the path to the file or sets
it if the optional $newpath argument is provided. If the data is kept in memory, then path() returns undef.
$body->print([FILEHANDLE])
The print() method prints the unencoded body to the indicated filehandle, or, if none is provided, to the
currently selected filehandle. Do not confuse this with the print() method provided by the pseudohandles
returned by the open() method, which is used to write data into the body object.
$body->purge
Purge unlinks the file associated with the body object, if any. It is not called automatically when the object is
destroyed.
MIME::Parser
The last major component of MIME-Tools is the MIME::Parser class, which parses the text representation
of a MIME message into its various components. The class is simple enough to use, but
has a large number of options that control various aspects of its operation. The short example in
Figure 7.8 will give you the general idea.
Lines 1–3: Load modules—We turn on strict type checking and load the MIME::Parser module.
It brings in the other modules it needs, including MIME::Entity.
Lines 4–5: Open a message—We recover the name of a file from the command line, which
contains a MIME-encoded message, and open it. This filehandle will be passed to the parser
later.
Lines 6–8: Create and configure the parser—We create a new parser object by calling
MIME::Parser->new(). We then call the newly created object's output_dir() method to
set the directory where the parser will write the body data of extracted enclosures.
Lines 9–10: Parse the file—We pass the open filehandle to the parser's parse() method. The
value returned from the method is a MIME::Entity object corresponding to the top level of the
message.
Lines 11–14: Print information about the top-level entity—To demonstrate that we parsed the
message, we recover and print the From: and Subject: lines of the header, calling the entity's
head() method to get the MIME::Head object each time. We also print the MIME type of the
whole message, and the number of subparts, which we derive from the entity's parts()
method.
Lines 15–17: Print information about the parts—We loop through each part of the message. For
each, we call its mime_type() method to retrieve the MIME type, and the path() method of
the corresponding MIME::Body to get the name of the file that contains the data.
Line 18: Clean up—When we are finished, we call purge() to remove all the parsed body data
files.
When I ran the program on a MIME message stored in the file mime.test, this was the result:
% simple_parse.pl ~/mime.test
From = Lincoln Stein <[email protected]>
Subject = testing mime parser
MIME type = multipart/mixed
Parts = 5
text/plain /tmp/msg-1857-1.dat
audio/wav /tmp/assimilated.wav
image/jpeg /tmp/aw-2-19.jpg
audio/mpeg /tmp/NorthwestPassage.mp3
text/plain /tmp/msg-1857-2.dat
This multipart message contains five parts. The first and last parts contain text data and correspond
to the salutation and the signature. The remaining parts are enclosures, consisting of an audio/wav
sound file, a JPEG image, and a ripped MP3 track.
We will walk through a more complex example of MIME::Parser in Chapter 8, where we deal with
writing Post Office Protocol clients. The example developed there will spawn external viewers to
view image and audio attachments.
Because MIME files can be quite large, MIME::Parser's default is to store the parsed MIME::Body
parts as files using the MIME::Body::File class. You can control where these files are stored using
either the output_dir() or the output_under() methods. The output_dir() method tells
MIME::Parser to store the parts directly inside a designated directory. output_under(), on the
other hand, creates a two-tier directory. For each parsed e-mail message, MIME::Parser creates a
subdirectory under the base directory specified by output_under(), and then writes the
MIME::Body::File data there.
In either case, all the temporary files are cleared when you call the top-level MIME::Entity's
purge() method. You can instead keep some or all of the parts. To keep some parts, step through
the message parts and call purge() selectively on those that you don't want to keep. You can
either leave the other parts where they are or move them to a different location for safekeeping. To
keep all parsed parts, don't call purge() at all.
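Selective purging might look like the following sketch. The helper name purge_unwanted() and the sample message are hypothetical, and the parts are written to a temporary directory from File::Temp so that the example cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use MIME::Parser;

# Hypothetical helper: purge every leaf part whose effective type does
# not match $keep, and return the paths of the files that were removed.
sub purge_unwanted {
    my ($entity, $keep) = @_;
    my @purged;
    for my $part ($entity->parts) {
        if ($part->is_multipart) {
            push @purged, purge_unwanted($part, $keep);
            next;
        }
        next if $part->effective_type =~ $keep;
        my $body = $part->bodyhandle or next;
        my $path = $body->path;          # remember the path before purging
        $body->purge;
        push @purged, $path if defined $path;
    }
    return @purged;
}

# A made-up two-part message: one text part, one binary enclosure.
my $message = join "\n",
    'From: sender@example.com',
    'Subject: selective purge',
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="frontier"',
    '',
    '--frontier',
    'Content-Type: text/plain',
    '',
    'Keep this note.',
    '--frontier',
    'Content-Type: application/octet-stream',
    'Content-Transfer-Encoding: base64',
    '',
    'AAECAw==',
    '--frontier--',
    '';

my $parser = MIME::Parser->new;
$parser->output_dir(tempdir(CLEANUP => 1));
my $entity = $parser->parse_data($message);

# Keep the text parts, purge everything else.
my @purged = purge_unwanted($entity, qr{^text/});
print "purged: $_\n" for @purged;
```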
Parsing is complex, and the parse() method may die if it encounters any of a number of exceptions.
You can catch such exceptions and attempt to perform some error recovery by wrapping the call to
parse() in an eval{} block:
$entity = eval { $parser->parse(\*F) };
warn $@ if $@;
Here is a brief list of the major functions in MIME::Parser, starting with the constructor.
$parser = MIME::Parser->new
The new() method creates a new parser object with default settings. It takes no arguments.
$dir = $parser->output_dir
$previous = $parser->output_dir($newdir)
The output_dir() method gets or sets the output directory for the parse. This is the directory in which the
various parts and enclosures of the parsed message are (temporarily) stored.
If called with no arguments, it returns the current value of the output directory. If called with a directory path, it
sets the output directory and returns its previous value. The default setting is ".", the current directory.
$dir = $parser->output_under
$parser->output_under($basedir [,DirName=>$dir [,Purge=>$purge]])
output_under() changes the temporary file strategy to use a two-tier directory. MIME::Parser creates a
subdirectory inside the specified base directory and then places the parsed MIME::Body::File data in the newly
created subdirectory.
$entity = $parser->parse(\*FILEHANDLE)
The parse() method parses a MIME message by reading its text from an open filehandle. If successful, it
returns a MIME::Entity object. Otherwise, parse() can throw any number of run-time exceptions. To catch
those exceptions, wrap parse() in an eval{} block as described earlier.
$entity = $parser->parse_data($data)
The parse_data() method parses a MIME message that is contained in memory. $data can be a scalar
holding the text of the message, a reference to a scalar, or a reference to an array of scalars. The latter is
intended to be used on an array of the message's lines, but can be any array which, when concatenated, yields
the text of the message. If successful, parse_data() returns a MIME::Entity object. Otherwise, it can throw
any of a number of run-time exceptions.
$entity = $parser->parse_open($file)
The parse_open() method is a convenience function. It opens the file provided, and then passes the resulting
filehandle to parse(). It is equivalent to:
open (F,$file);
$entity = $parser->parse(\*F);
Because parse_open() uses Perl's open() function, you can play the usual tricks with pipes. For
example:
$entity = $parser->parse_open("zcat ./mailbox.gz |");
This uncompresses the compressed mailbox using the zcat program and pipes the result to
parse().
Several other methods control the way the parse operates:
$flag = $parser->output_to_core
$parser->output_to_core($flag)
The output_to_core() method controls whether MIME::Parser creates files to hold the decoded body data
of MIME::Entity parts, or attempts to keep the data in memory. If $flag is false (the default), then the parts
are parsed into disk files. If $flag is true, then MIME::Parser stores the body parts in main memory as
MIME::Body::InCore objects.
Since enclosures can be quite large, you should be cautious about doing this. With no arguments, this method
returns the current setting of the flag.
$flag = $parser->ignore_errors
$parser->ignore_errors($flag)
The ignore_errors() method controls whether MIME::Parser tolerates certain syntax errors in the MIME
message during parsing. If true (the default), then errors generate warnings, but if not, they cause a fatal
exception during parse().
$error = $parser->last_error
$head = $parser->last_head
These two methods are useful for dealing with unparseable MIME messages. last_error() returns the last
error message generated during the most recent parse. It is set when an error was encountered, and either
ignore_errors() is true, or the call to parse() was wrapped in an eval{}.
last_head() returns the top-level MIME::Head object from the last stream we attempted to parse. Even
though the body of the message wasn't successfully parsed, we can use the header returned by this method
to salvage some information, such as the subject line and the name of the sender.
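Several of these methods can be combined in one short sketch: parse a message held in memory, keep the decoded parts in core, and fall back to last_error() and last_head() if the parse fails. The message text is invented for illustration, and the recovery branch is one possible approach rather than a prescribed one.

```perl
use strict;
use warnings;
use MIME::Parser;

# A made-up multipart message held entirely in memory.
my $message = join "\n",
    'From: sender@example.com',
    'Subject: parser sketch',
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="XX"',
    '',
    '--XX',
    'Content-Type: text/plain',
    '',
    'Hello.',
    '--XX',
    'Content-Type: text/plain',
    '',
    'Goodbye.',
    '--XX--',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);     # keep body data in memory, not in files

# parse_data() may die, so trap exceptions with eval{}.
my $entity = eval { $parser->parse_data($message) };

my @parts;
if ($entity) {
    @parts = $entity->parts;
    print "MIME type = ", $entity->mime_type, "\n";
    print "Parts     = ", scalar(@parts), "\n";
}
else {
    # Salvage what we can from the header of the failed parse.
    my $error = $parser->last_error || $@;
    warn "parse failed: $error\n";
    my $head = $parser->last_head;
    print "Subject = ", $head->get('Subject') if $head;
}
```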
Lines 1–4: Load modules—We turn on strict syntax checking and load the Net::FTP and
MIME::Entity modules.
Lines 5–9: Define constants—We set constants corresponding to the FTP site to connect to, the
CPAN directory, and the name of the RECENT file itself. We also declare a constant with the e-mail
address of the recipient of the message (in this case, my local username), and a DEBUG
constant to turn on verbose progress messages.
Lines 10–11: Declare globals—The %RETRIEVE global contains the list of files to retrieve from
CPAN. $TMPDIR contains the path of a directory in which to store the downloaded files temporarily
before mailing them. This is derived from the TMPDIR environment variable, or, if not otherwise
specified, from /usr/tmp. Windows and Macintosh users have to check and modify this
for their systems.
Lines 12–15: Log into CPAN and fetch the RECENT file—We create a new Net::FTP object and
log into the CPAN mirror. If successful, we change to the directory that contains the archive and
call the FTP object's retr() method to return a filehandle from which we can read the RECENT
file.
Lines 17–23: Parse the RECENT file— RECENT contains a list of all files on the CPAN archive
that are new or have changed recently, but we don't want to download them all. The files we're
interested in have lines that look like this:
modules/by-module/Apache/Apache-Filter-1.011.tar.gz
modules/by-module/Apache/Apache-iNcom-0.09.tar.gz
modules/by-module/Audio/Audio-Play-MPG123-0.04.tar.gz
modules/by-module/Bundle/Bundle-WWW-Search-ALL-1.09.tar.gz
We open the file for reading and scan through it one line at a time, looking for lines that match
the appropriate pattern. We store the filename and its CPAN path in %RETRIEVE.
After processing the filehandle, we close it.
Lines 24–32: Begin the mail message—We begin the outgoing mail message with a short text
message that gives the number of enclosures. We create a new MIME::Entity object by calling
the build() constructor with the introduction as its initial contents.
Notice that the arguments we pass to build() create a single-part document of type text/plain.
Later, when we add the enclosures, we rely on MIME::Entity's ability to convert the message
into a multipart message when needed.
Lines 33–44: Retrieve modules and attach them to the mail—We loop through the filenames
stored in %RETRIEVE. For each one, we call the FTP object's get() method to download the
file to the temporary directory. If successful, we use the Filename argument to attach the file to
the outgoing mail message by calling the top-level entity's attach() method. Other
attach() arguments set the encoding to base64, and the MIME type to application/x-gzip.
CPAN files are gzipped by convention. We also add a short description to the attachment; currently
it is just a copy of the filename.
Line 45: Add signature to the outgoing mail —If there is a file named .signature in the current
user's home directory, we call the MIME entity's sign() method to attach it to the end of the
message.
Lines 46–49: Send the mail—We call the entity's send() method to MIME-encode the message
and send it via the SMTP protocol. When this is done, we call the entity's purge() method,
deleting the downloaded files in the temporary directory. This works because the files became
the basis for the MIME-entity bodies via the MIME::Body::File subclass when they were attached
to the outgoing message, and purge() recursively deletes these files.
Note that the send() method relies on libnet being correctly configured to find a working SMTP
server. If this is not the case, check and fix the Libnet.cfg file.
Line 51: Close FTP connection —Our last step is to close the FTP connection by calling the FTP
object's quit() method.
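The RECENT-parsing step in the walkthrough above can be sketched on its own, without an FTP connection. The pattern below is an assumption about what "the appropriate pattern" might look like, not the script's actual code, and the two non-matching lines are invented to show what gets skipped.

```perl
use strict;
use warnings;

# Sample lines: two module distributions from the text, plus two
# invented entries that should be ignored.
my @recent = (
    'modules/by-module/Apache/Apache-Filter-1.011.tar.gz',
    'modules/by-module/Audio/Audio-Play-MPG123-0.04.tar.gz',
    'authors/00whois.html',
    'MIRRORED.BY',
);

# Map each distribution filename to its full CPAN path.
my %RETRIEVE;
for my $line (@recent) {
    next unless $line =~ m{^modules/by-module/.+/([^/]+\.tar\.gz)$};
    $RETRIEVE{$1} = $line;
}

print "$_ => $RETRIEVE{$_}\n" for sort keys %RETRIEVE;
```

Keying the hash on the bare filename gives a convenient local name for each downloaded file while the value keeps the path needed to fetch it.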
Figure 7.10 shows a screenshot of Netscape Navigator displaying the resulting MIME message.
Clicking on one of the enclosures will prompt you to save it to disk so that you can unpack and build
the module.
A deficiency in the program is that the CPAN filenames can be cryptic, and it isn't always obvious
what a package does. A nice enhancement to this script would be to unpack the package, scan
through its contents looking for the POD documentation, and extract the description line following
the NAME heading. This information could then be used as the MIME::Entity Description: field rather
than the filename itself. A simpler alternative would be to enclose the .readme file that frequently
(but not always) accompanies a package's .tar.gz file.
Summary
The Net::SMTP, Mail::Internet, and Mail::Mailer modules make it possible, and convenient, to send
properly formatted Internet mail. The MIME-Tools package builds on these classes to construct and
process complex messages that contain MIME attachments.
The next chapter shows the other side of the equation: how to receive and process incoming messages.
In addition, it contains practical examples of processing message attachments using
MIME::Parser.
POP, IMAP, and NNTP 162
Lines 1–6: Load modules—We bring in the Net::POP3 module to contact the remote POP server,
and Mail::Header to parse the retrieved mail headers. We also bring in a new home-brewed
utility module, PromptUtil, which provides the get_passwd() function, along with a few other
user prompting functions.
Lines 6–8: Get username, host, and password—We get the username and host from the com-
mand line, and prompt the user to enter his or her password using the get_passwd() function.
The latter turns off terminal echo so that the password is not visible on the screen.
Line 9: Connect to mailbox host—We call the Net::POP3 new() method to connect to the indi-
cated host, giving the server 30 seconds in which to respond with the welcome banner. The
new() constructor returns a Net::POP3 object.
Lines 10–13: Log in and count messages—We call the POP3 object's login() method to log
in with the user's name and password. If the login is successful, it returns the total number of
messages in the user's mailbox; if there are no messages in the mailbox, it returns 0E0 ("zero
but true"). This value evaluates as true when tested in a logical context to check whether login was
successful, and is equal to 0 when used to count the number of available messages.
Next we call the POP3 object's last() method to return the number of the last message the
user read (0 if none read). We will use this to list the unread messages. Because the message
count returned by login() can be 0E0, we add zero to it to convert it into a more familiar number.
We then print the total number of old and new messages.
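The dual nature of 0E0 is easy to check in isolation. The following minimal sketch uses a hard-coded stand-in for the value that login() returns for an empty mailbox; the variable names are ours, not the script's.

```perl
use strict;
use warnings;

# Stand-in for the value login() returns when the mailbox is empty.
my $count = '0E0';

# In a logical context the string "0E0" is true, so the login test passes.
my $login_ok = $count ? 1 : 0;

# In a numeric context it is zero; adding 0 yields a plain integer.
my $messages = $count + 0;

print "login succeeded\n" if $login_ok;
print "messages: $messages\n";
```

Adding zero is only cosmetic: it makes the printed count read "0" rather than "0E0".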
Lines 14–21: Summarize messages—Each message is numbered from 1 to the total of mes-
sages in the mailbox. For each one, we call the POP object's top() method to retrieve the
message header as a reference to an array of lines, and pass this to Mail::Header->new()
for parsing. We call the parsed header's get() method twice to retrieve the Subject:
and From: lines, and pass the sender's address to the clean_from() utility subroutine to clean
it up a bit. We then print out the message number, sender's name, and subject.
Line 22: Log out—The POP object's quit() method logs out cleanly.
Lines 23–29: Clean up with the clean_from() subroutine—This subroutine cleans up sender
addresses a bit, by extracting the sender's name from these three common address formats:
"Lincoln Stein" <[email protected]>
Lincoln Stein <[email protected]>
[email protected] (Lincoln Stein)
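As an illustration, a subroutine in the spirit of clean_from() might look like the sketch below. It is not the script's actual code: the regular expressions are our own guess at a reasonable implementation, and the example.org addresses used to exercise it are placeholders.

```perl
use strict;
use warnings;

# Hypothetical clean_from()-style helper: pull a display name out of
# a From: line, falling back to the raw string if no name is found.
sub clean_from {
    my $from = shift;
    return 'unknown' unless defined $from;
    $from =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace

    # "Name" <address>  or  Name <address>
    return $1 if $from =~ /^"?([^"<]+?)"?\s*<[^>]*>$/;

    # address (Name)
    return $1 if $from =~ /\(([^)]+)\)\s*$/;

    return $from;                            # nothing recognizable; leave as is
}

print clean_from('"Lincoln Stein" <lstein@example.org>'), "\n";
print clean_from('Lincoln Stein <lstein@example.org>'), "\n";
print clean_from('lstein@example.org (Lincoln Stein)'), "\n";
```

Real-world From: lines are messier than these three patterns; a production client would more likely hand the job to an address-parsing module.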
Net::POP3 API
The Net::POP3 API is simple. You can log in, log out, list messages, retrieve message headers,
retrieve the entire message, and delete messages.
If successful, login() returns the total number of messages in the user's mailbox. If there are no
messages, login() returns the floating-point number 0E0, which will be treated as true when
used in a logical context to test whether login was successful, but evaluate to zero when treated in
a numeric context to count the number of available messages. If an error occurs, login() returns
undef and $pop->message() contains an error message.
If the login fails, you may try again or try to log in using apop(). Some servers close the connection
after a number of unsuccessful login attempts. With the exception of quit(), none of the other
methods will be accepted until the server accepts the login.
Some POP servers support the APOP command.
$messages = $pop->apop($username,$password)
APOP is similar to a standard login, but instead of sending passwords across the network in the clear, it uses
a challenge/response system to authenticate the user without transmitting cleartext passwords. Unlike
login(), .netrc is not consulted if the username and password are absent. The value returned from
apop() is the same as that from login().
Many POP3 servers need special configuration before the APOP command will authenticate cor-
rectly. In particular, most UNIX servers need a password file distinct from the system password file.
Once login is successful, you can use a variety of methods to access the mailbox:
$last_msgnum = $pop->last
POP messages are numbered from 1 through the total number of messages in the inbox. At any time, the user
may have read one or more messages using the RETR command (see below), but not deleted them from the
inbox. The last() method returns the highest number from the set of retrieved messages, or 0 if no messages have been
retrieved. New messages begin at $last_msgnum+1.
Many POP servers store the last-read information between connections; however, a few discard this informa-
tion.
$arrayref = $pop->get($msgnum [,FILEHANDLE])
Following a successful login, the get() method retrieves the message indicated by its message number, using
the POP3 RETR command. It can be called with a filehandle, in which case the contents of the message (both
header and body) are written to the filehandle. Otherwise, the get() method returns an array reference con-
taining the lines of the message.
$handle = $pop->getfh($msgnum)
This is similar to get(), but the return value is a tied filehandle. Reading from this handle returns the contents
of the message. When the handle returns end-of-file, it should be closed and discarded.
$flag = $pop->delete($msgnum)
delete() marks the indicated message for deletion. Marked messages are not removed until the quit()
method is called, and can be unmarked by calling reset().
$arrayref = $pop->top($msgnum[,$lines])
The top() method returns the header of the indicated message as a reference to an array of lines. This format
is suitable for passing to the Mail::Header->new() method. If the optional $lines argument is provided,
then the indicated number of lines of the message body are included.
$hashref = $pop->list
$size = $pop->list($msgnum)
The list() method returns information on the size of mailbox messages. Called without arguments, it returns
a hash reference in which the keys are message IDs, and the values are the sizes of the messages, in bytes.
Called with a message ID, the method returns the size of the indicated message, or if an invalid message
number was provided, it returns undef.
($msg_count,$size) = $pop->popstat
popstat() returns a two-element list that consists of the number of undeleted messages in the mailbox and
the size of the mailbox in bytes.
$uidl = $pop->uidl([$msgnum])
The uidl() method returns a unique identifier for the given message number. Called without an argument,
it returns a hash reference in which the keys are the message numbers for the entire mailbox, and the values
are their unique identifiers. This method is intended to help clients track messages across sessions, since the
message numbers change as the mailbox grows and shrinks.
When you call the quit() method, messages marked for deletion are removed unless you
reset() first.
$pop->reset
This method resets the mailbox, unmarking the messages marked for deletion.
$pop->quit
The quit() method quits the remote server and disconnects. Any messages marked for deletion are removed
from the mailbox.
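To see how these calls fit together, here is a hedged sketch that tries APOP, falls back to a plain login, and prints the size of each unread message. The subroutine name report_new_messages() and its arguments are hypothetical; only the Net::POP3 methods described above are used, and the code has to be pointed at a real POP3 server to do anything.

```perl
use strict;
use warnings;
use Net::POP3;

# Hypothetical helper: list the sizes of unread messages and return
# how many there are. Dies if the connection or login fails.
sub report_new_messages {
    my ($host, $user, $pass) = @_;

    my $pop = Net::POP3->new($host, Timeout => 30)
        or die "Can't connect to $host\n";

    # Try APOP first, then the cleartext login. Both return a true
    # value (possibly 0E0) on success.
    my $count = $pop->apop($user, $pass) || $pop->login($user, $pass);
    unless ($count) {
        my $why = $pop->message;
        $pop->quit;
        die "Login failed: $why";
    }
    $count += 0;                          # turn 0E0 into a plain 0

    my $last  = $pop->last || 0;          # highest message already read
    my $sizes = $pop->list || {};         # message number => size in bytes

    for my $msgnum ($last + 1 .. $count) {
        printf "%4d  %8d bytes\n", $msgnum, $sizes->{$msgnum} // 0;
    }

    $pop->quit;                           # no deletions were marked
    return $count - $last;
}
```

A caller would invoke it as report_new_messages($host, $user, $password). Keep in mind the caveat above: some servers drop the connection after repeated failed logins, so the APOP-then-login fallback should not be retried in a loop.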
I choose to read the message, causing the program to display the message header and the text
part of the body. It then reports that the message has two attachments (technically, two non-text/
plain MIME parts). For each one, the program prompts me for the disposition of the attachment. For
the first attachment, of type image/jpeg, I choose to view the attachment, causing my favorite image
viewer (the XV application, written by John Bradley) to pop up in a new window and show the picture.
After I quit the viewer, the script prompts me again for the disposition. This time I choose to save
the image under its default name.
The next attachment is a Microsoft Word document. No viewer is defined for this document type,
so the prompt only allows the attachment to be saved to disk.
After dealing with the last attachment, the program prompts me to keep or delete the entire message
from the inbox, or to quit. I quit. The program then moves on to the next unprocessed message.
Lines 1–6: Activate taint checking and load modules—Since we will be launching external ap-
plications (the viewers) based on information from untrusted sources, we need to be careful to
check for tainted variables. The -T switch turns on taint checking. (See Chapter 10 for more
information.)
We load PopParser and PromptUtil, two modules developed for this application.
Lines 7–11: Define viewers—We define constants for certain external viewers. For example,
HTML files are displayed with the command lynx %s, where %s is replaced by the name of the
HTML file to view. For variety, some of the viewers are implemented as pipes. For example, the
player for MP3 audio files is invoked as mpg123 -, where the - symbol tells the player to take its
input from standard input.
At the end of the code walkthrough, we'll discuss replacing this section of code with the standard
mailcap facility.
Lines 12–13: Taint check precautions —As explained in more depth in Chapter 10, taint checking
will not let us run with an untrusted path or with several other environment variables set. We set
PATH to a known, trusted state, and delete four other environment variables that affect the way
that commands are processed.
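A minimal sketch of these precautions follows. The text does not list the variables at this point, so the PATH value and the four names deleted below are assumptions, borrowed from the idiom in the perlsec manual page. In the real script the -T switch would also appear on the #! line.

```perl
use strict;
use warnings;

# Restrict PATH to directories we trust (assumed value; adjust per system).
$ENV{PATH} = '/bin:/usr/bin';

# Remove variables that can change how the shell runs commands.
# These four names follow the perlsec idiom and are an assumption here.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

print "PATH is now $ENV{PATH}\n";
```

Under taint mode, system() and piped open() refuse to run until PATH has been set to an untainted value like this.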
Lines 14–20: Recover username and mailbox host—We process the command-line arguments
to recover the name of the user and the POP3 host.
The $entity global holds the most recent parsed MIME::Entity object. We make it global so
that the script's END{} block can detect it and call its purge() method in case the user quits
the program prematurely. This will delete all temporary files from disk. For similar reasons, we
intercept the INT signal to exit gracefully if the user hits the interrupt key.
Lines 21–26: Log in to mailbox server—The PopParser.pm module defines a new subclass of
Net::POP3 that inherits all the behavior of the base class, but returns parsed MIME::Entity ob-
jects from the get() method rather than the raw text of the message. We create a new Pop-
Parser object connected to the mailbox host. If this is successful, we call get_passwd() (im-
ported from the PromptUtil module) to get the user's login password.
Next, we authenticate ourselves to the remote host. We don't know a priori whether the server
accepts APOP authentication or the less secure cleartext authentication method, so we try them
both. If the apop() method fails, then we try login(). If that also fails, we die with an error
message.
If login is successful, we print the number of messages returned by the apop() or login()
method. We add 0 to the message count to convert the 0E0 result code into a more user-friendly
integer.
Lines 27–38: Enter the main message-processing loop —We now enter the main message-
processing loop. For each message, we fetch its header by calling the PopParser object's
top() method (which is inherited without modification from Net::POP3). The header text is then
passed to our print_header() method to display it as a one-line message summary.
We ask the user if he or she wants to read the message, and if so, we call the PopParser object's
get() method, which fetches the indicated message, parses it, and returns a MIME::Entity
object. This object is passed to our display_entity() subroutine in order to display it and
its subparts. When display_entity() is finished, we delete the entity's temporary files by
calling its purge() method.
The last step is to ask the user if he or she wants to delete the message from the remote mailbox,
and if the answer is affirmative, we call the PopParser's delete() method.
parts. Otherwise, we invoke a subroutine called display_part() to display the contents of
the entity.
Lines 61–78: The handle_multipart() subroutine—The handle_multipart() subroutine
loops through and processes each part of a multipart MIME::Entity object. We begin by
calling the entity's parts() method to fetch each of the subparts as a MIME::Entity object. We
then call Perl's grep() built-in twice to sort the parts into those that we can display directly and
those that are to be treated as attachments that must be displayed using an external application.
Since we know how to display only plain text, we sort on the MIME type text/plain.
For each of the text/plain parts, we call the display_part() subroutine to print the message
body to the screen. If there are nontext attachments, we prompt the user for permission to display
them, and if permission is given, invoke display_entity() recursively on each attachment. This
recursive invocation of display_entity() allows for attachments that are themselves multipart
messages, such as forwarded e-mails.
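The two-pass grep() can be sketched without any mail modules at all. In the example below, FakePart is a hypothetical stand-in that mimics only the mime_type() method of MIME::Entity; the real subroutine works on the objects returned by parts().

```perl
use strict;
use warnings;

# Hypothetical stand-in for MIME::Entity, exposing only mime_type().
package FakePart;
sub new       { my ($class, $type) = @_; return bless { type => $type }, $class }
sub mime_type { return $_[0]{type} }

package main;

my @parts = map { FakePart->new($_) }
            qw(text/plain image/jpeg application/msword);

# First pass: parts we can print directly.
my @text        = grep { $_->mime_type eq 'text/plain' } @parts;
# Second pass: everything else is treated as an attachment.
my @attachments = grep { $_->mime_type ne 'text/plain' } @parts;

printf "%d text part(s), %d attachment(s)\n",
       scalar @text, scalar @attachments;
```

Two passes over the list are slightly wasteful, but for the handful of parts in a typical message the clarity is worth it.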
Lines 79–99: The display_part() subroutine—The display_part() subroutine is invoked
to display a single-part MIME::Entity. Depending on the user's wishes, its job is to display, save,
or ignore the part.
We begin by retrieving the part's header, MIME type, description, and suggested filename for
saving (derived from the Content-Disposition: header, if present). We also recover the part's
MIME::Body object by calling its bodyhandle() method. This object gives us access to the
body's unencoded content.
If the part's MIME type is text/plain, we do not need an external viewer to display it. We simply
call the body object's print() method to print the contents to standard output. Otherwise, we
call get_viewer() to return the name of an external viewer that can display this MIME type.
We print a summary that contains the part's MIME type, description, and suggested filename,
and then prompt the user to view or save the part. Depending on the user's response, we invoke
save_body() to save the part's content to disk, or display_body() to launch the external
viewer to display it. This continues in a loop until the user chooses "n" to go to the next part.
If no viewer is defined for the part's MIME type, the user's only option is to save the content to
disk.
Lines 100–114: The save_body() subroutine—The save_body() subroutine accepts a
MIME::Body object and a default filename. It gives the user the opportunity to change the file-
name, opens the file, and writes the contents of the part to disk.
The most interesting feature of this subroutine is the way that we treat the default filename for
the attachment. This filename is derived from the Content-Disposition: header, and as such is
untrusted data. Someone who wanted to spoil our day could choose a malicious pathname, such
as one that would overwrite a treasured configuration file. For this reason we forbid absolute
pathnames and those that contain the ".." relative path component. We also forbid filenames
that contain unusual characters such as shell metacharacters. Having satisfied these tests, we
extract the filename using a pattern match, thereby untainting it. Perl will now allow us to open
the file for writing. We do so and write the attachment's contents to it by calling the MIME::Body
object's print() method.
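The checks just described can be sketched as a small filter. The subroutine name safe_filename() and the exact set of permitted characters are our assumptions, not the script's; the point is that the value returned comes from a pattern-match capture, which is what untaints it.

```perl
use strict;
use warnings;

# Hypothetical helper: return a vetted (and untainted) filename,
# or nothing if the suggested name looks dangerous.
sub safe_filename {
    my $name = shift;
    return unless defined $name && length $name;

    return if $name =~ m{^/};       # no absolute pathnames
    return if $name =~ /\.\./;      # no ".." path components

    # Allow only word characters, dots, hyphens, and slashes.
    # Returning the capture ($1) is what untaints the value.
    return $name =~ m{^([\w.\-/]+)\z} ? $1 : ();
}

for my $candidate ('photo.jpg', '/etc/passwd', '../.profile', 'a;rm -rf x') {
    my $ok = safe_filename($candidate);
    printf "%-14s %s\n", $candidate, defined $ok ? 'accepted' : 'rejected';
}
```

This version is deliberately strict: it also rejects harmless names containing spaces, which a friendlier client might prefer to rewrite rather than refuse.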
Lines 116–128: The display_body() subroutine—The display_body() subroutine is
called to launch an external viewer to display an attachment. It is passed a MIME::Body object,
and a command to launch an external viewer to display it.
To make this application a bit more interesting, we allow for two types of viewers: those that
read the body data from a file on disk and those that read from standard input. The former are
distinguished from the latter by containing the symbol %s, which will be replaced by the filename
before execution (this is a standard convention in the UNIX mailcap file).
We begin by calling the MIME::Body object's path() method to obtain the path to the temporary
file in which the object's data is stored. We then use this in a pattern substitution to replace any
occurrence of %s in the viewer command. If the substitution is successful, it returns a true value,
and we call system() to invoke the command.
Otherwise, we assume that the viewer will read the data from standard input. In this case, we
use open() to open a pipe to the viewer command, and invoke the body object's print()
method to print to the pipe filehandle. Before doing this, however, we set the PIPE handler to
IGNORE to avoid the program terminating unexpectedly because of a recalcitrant viewer.
This subroutine works correctly both for line-oriented applications, such as the Lynx HTML
viewer, and for windowing applications, such as XV.
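The file-versus-pipe decision can be sketched as follows. The helper names build_viewer_command() and run_viewer() are hypothetical, and run_viewer() takes the attachment data as a plain string, whereas the script itself calls the print() method of a MIME::Body object.

```perl
use strict;
use warnings;

# Hypothetical helper: substitute the temporary file's path for %s.
# Returns the command and a flag telling whether a substitution happened.
sub build_viewer_command {
    my ($viewer, $path) = @_;
    my $uses_file = ($viewer =~ s/%s/$path/g) ? 1 : 0;
    return ($viewer, $uses_file);
}

# Hypothetical helper: run the viewer either on the file or through a pipe.
sub run_viewer {
    my ($viewer, $path, $data) = @_;
    my ($cmd, $uses_file) = build_viewer_command($viewer, $path);

    # Viewer reads the file named on its command line.
    return system($cmd) == 0 if $uses_file;

    # Viewer reads standard input; ignore SIGPIPE in case it exits early.
    local $SIG{PIPE} = 'IGNORE';
    open my $pipe, '|-', $cmd or return;
    print {$pipe} $data;
    return close $pipe;
}

my ($cmd, $uses_file) = build_viewer_command('lynx %s', '/tmp/part1.html');
print "$cmd\n";
```

Because the command string is handed to the shell, the path substituted for %s must be one the program created itself, such as the parser's temporary file, and never a name taken from the message.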
Lines 129–137: The get_viewer() subroutine— get_viewer() is an extremely simple sub-
routine that uses a pattern match to examine the MIME type of the attachment and selects a
hard-coded viewer for it.
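A subroutine of this kind might look like the sketch below. The constant names, the MIME patterns, and the xv command line are assumptions made for illustration; only lynx %s and mpg123 - are taken from the walkthrough above.

```perl
use strict;
use warnings;

# Viewer commands; %s marks where a filename is substituted.
use constant HTML_VIEWER  => 'lynx %s';
use constant IMAGE_VIEWER => 'xv %s';       # assumed command line
use constant MP3_PLAYER   => 'mpg123 -';    # reads from standard input

# Return the viewer command for a MIME type, or nothing if none is defined.
sub get_viewer {
    my $type = shift;
    return HTML_VIEWER  if $type =~ m{^text/html}i;
    return IMAGE_VIEWER if $type =~ m{^image/}i;
    return MP3_PLAYER   if $type =~ m{^audio/(?:mpeg|mp3)}i;
    return;
}

for my $type (qw(text/html image/jpeg application/msword)) {
    my $viewer = get_viewer($type);
    printf "%-20s %s\n", $type, defined $viewer ? $viewer : '(save only)';
}
```

When no viewer is returned, the caller offers only the option of saving the part to disk.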
Lines 138–140: END{} block—This script's END{} block takes care of calling any leftover
MIME::Entity's purge() method. This deletes temporary files that might be left around if the
user interrupted the script's execution unexpectedly.
Lines 1–6: Load modules—We turn on strict checking and load the Net::POP3 and MIME::Parser
modules. We use the global @ISA array to tell Perl that PopParser is a subclass of Net::POP3.
Lines 7–15: Override the new() method—We override the Net::POP3 new() method in order
to create and initialize a MIME::Parser for later use. We first invoke our parent's new() method
to create the basic object and connect to the remote host, create and configure a MIME::Parser
object, and store the parser for later use by invoking our parser() accessor method.
Lines 16–21: The parser() method—This method is an accessor for the MIME::Parser object
created during the call to new(). If we are called with a parser object on our subroutine stack,
we store it among our instance variables. Otherwise, we return the current parser object to the
caller.
The way we stash the parser object among our instance variables looks weird, but it is the
conventional way to store instance variables in filehandle objects:
${*$self}{'pp_parser'} = shift
What this is doing is referencing a hash in the symbol table that happens to have the same name
as our filehandle. We then index into that as if it were a conventionally created hash. We need
to store our instance variables this way because Net::POP3 ultimately descends from IO::Han-
dle, which creates and manipulates blessed filehandles, rather than more conventional blessed
hash references.
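The idiom is easier to follow in a self-contained class. In this sketch, GlobObject is a hypothetical class built on an anonymous glob from the standard Symbol module, not on Net::POP3, and a plain string stands in for the MIME::Parser object.

```perl
use strict;
use warnings;
use Symbol ();

package GlobObject;

# Construct an object that is a blessed glob reference,
# the same kind of reference that IO::Handle objects use.
sub new {
    my $class = shift;
    my $self  = Symbol::gensym();
    return bless $self, $class;
}

# Combined getter/setter that keeps its value in the glob's hash slot.
sub parser {
    my $self = shift;
    ${*$self}{'pp_parser'} = shift if @_;
    return ${*$self}{'pp_parser'};
}

package main;

my $obj = GlobObject->new;
$obj->parser('stand-in parser');     # a real module would store an object here
print $obj->parser, "\n";
```

Prefixing the key with the module's initials, as pp_parser does, reduces the chance of colliding with keys that parent classes keep in the same hash.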
Lines 22–30: Override the get() method—The last part of this module overrides the Net::POP3
get() method. We are called with the number of the message to retrieve, which we pass to
getfh() to obtain a tied filehandle from which to read the desired message. The returned
filehandle is immediately passed to our stored MIME::Parser object to parse the message and
return a MIME::Entity object.
The nice thing about the design of the PopParser module is that message retrieval and message
parsing occur in tandem, rather than downloading the entire message and parsing it in two steps.
This saves considerable time for long messages.
There are a number of useful enhancements one could make to pop_fetch.pl. The one with the
greatest impact would be to expand the range and flexibility of the viewers for nontext attachments.
The best way to do this would be to provide support for the system /etc/mailcap and per-
user .mailcap files, which on UNIX systems map MIME types to external viewers. This would allow
the user to install and customize viewers without editing the code. Support for the mailcap system
can be found in the Mail::Cap module, which is part of Graham Barr's MailTools package. To use
Mail::Cap in the pop_fetch.pl script, replace lines 7 through 11 of Figure 8.3 with these lines:
use Mail::Cap;
my $mc = Mail::Cap->new;
This brings in the Mail::Cap module and creates a new Mail::Cap object that we can use to fetch
information from the mailcap configuration files.
Replace line 90, which invokes the get_viewer() subroutine, with the equivalent call from
Mail::Cap:
my $viewer = $mc->viewCmd($type);
This takes a MIME type and returns the command to invoke to view it if one is defined.
The last modification is to replace line 97, which invokes the display_body() subroutine to
invoke the viewer on the body of an attachment, with the Mail::Cap equivalent:
$mc->view($type,$body->path);
This call looks up the appropriate view command for the specified MIME type, does any needed
string substitutions, and invokes the command using system().
We no longer need the get_viewer() and display_body() subroutines, because Mail::Cap
takes care of their functionality. You can delete them.
Other potential enhancements to this script include:
• the ability to reply to messages
• the ability to list old and new messages and jump directly to messages of interest
• a full windowing display using the text-mode Curses module or the graphical PerlTK package,
both available from CPAN
With a little work, you could turn this script into a full-featured e-mail client!
Lines 1–5: Load modules—We load Net::IMAP::Simple, Mail::Header, and the PromptUtil module
used in earlier examples.
Lines 6–9: Process command-line arguments—We parse out the username and mailbox host
from the first command-line argument, and recover the mailbox name from the second. If no
mailbox name is provided, we default to INBOX, which is the default mailbox name on many
UNIX systems. We then prompt for the user's password.
@mailboxes = $imap->mailboxes
The mailboxes() method returns a list of all the user's mailboxes.
$messages = $imap->select($mailbox)
The select() method selects a mailbox by name, making it current. If the mailbox exists, select() returns
the number of messages it contains (0 for a mailbox that happens to be empty). If the mailbox does not exist,
the method returns undef and the current mailbox is not changed.
$success = $imap->create_mailbox($mailbox)
$success = $imap->delete_mailbox($mailbox)
$success = $imap->rename_mailbox($old_name,$new_name)
The create_mailbox(), delete_mailbox(), and rename_mailbox() methods attempt to create, de-
lete, and rename the named mailbox, respectively. They return true if successful, and false otherwise.
Once you have selected a mailbox, you can examine and retrieve its contents.
$last_msgnum = $imap->last
The last() method returns the highest number of the read messages in the current mailbox, just as
Net::POP3 does. You can also get this information by calling the seen() method, as described below.
$arrayref = $imap->get($msgnum)
The get() method retrieves the message indicated by the provided message number from the current mail-
box. The return value is a reference to an array containing the message lines.
$handle = $imap->getfh($msgnum)
This is similar to get() but the return value is a filehandle that can be read from in order to retrieve the indicated
message. This method differs from the similarly named Net::POP3 method by returning a filehandle opened
on a temporary file, rather than a tied filehandle. This means that the entire message is transferred from the
remote server to the local machine behind the scenes before you can begin to work with it.
$flag = $imap->delete($msgnum)
The delete() method marks the indicated message for deletion from the current mailbox. Marked messages
are not removed until the quit() method is called. However, there is no reset() call to undo a deletion.
$arrayref = $imap->top($msgnum)
The top() method returns the header of the indicated message as a reference to an array of lines. This format
is suitable for passing to the Mail::Header->new() method. There is no option for fetching a certain number
of lines from the body text.
$hashref = $imap->list
$size = $imap->list($msgnum)
The list() method returns information on the size of mailbox messages. Called without arguments, it returns
a hash reference in which the keys are message IDs, and the values are the sizes of the messages, in bytes.
Called with a message ID, the method returns the size of the indicated message, or if an invalid message
number was provided, it returns undef.
$flag = $imap->seen($msgnum)
The seen() method returns true if the indicated message has been read (by calling the get() method), or
false if it has not.
$success = $imap->copy($msgnum,$mailbox_destination)
The copy() method attempts to copy the indicated message from the current mailbox to the indicated desti-
nation mailbox. If successful, the method returns a true value and the indicated message is appended to the
end of its destination. You may wish to call delete() to remove the message from its original mailbox.
When you are finished, the quit() method will clean up:
$imap->quit()
quit() takes no arguments. It deletes all marked messages and logs off.
Because of its sheer size (more than 34,000 newsgroups and daily news flow rates measured in
the gigabytes), Usenet has been diminishing in favor among Internet users. However, there has
been a resurgence of interest recently in using Netnews for private discussion servers, helpdesk
applications, and other roles in corporate intranets.
Netnews is organized in a two-level hierarchy. At the upper level are the newsgroups. These have
long meaningful names like comp.graphics.rendering.raytracing. Each newsgroup, in turn, contains
zero or more articles. Users post articles to their local Netnews server, and the Netnews distribution
software takes care of distributing the article to other servers. Within a day or so, a copy of the article
appears on every Netnews server in the world. Articles live on Netnews for some period before they
expire. Depending on its storage capacity, a server may hold a message for a few days
or a few weeks before expiring it. A few large Netnews servers, such as the one at
www.deja.com, hold news articles indefinitely.
Newsgroups are organized using a hierarchical namespace. For example, all newsgroups beginning
with comp. are supposed to have something to do with computers or computer science, and all
those beginning with soc.religion. are supposed to concern religion in society. The creation and
destruction of newsgroups, by and large, is controlled by a number of senior administrators. The
exception is the alt hierarchy, in which newsgroups can be created willy-nilly by anyone who desires
to do so. Some very interesting material resides in these groups.
Regardless of its position in the namespace hierarchy, a newsgroup can be moderated or unmod-
erated. Moderated groups are "closed." Only a small number of people (typically a single moderator)
have the right to post to the newsgroup. When others attempt to post to the newsgroup, their posting
is automatically forwarded to the moderator via e-mail. The moderator then posts the message at
his or her discretion. Anyone can post to unmoderated groups. The posted article is visible imme-
diately on the local server, and diffuses quickly throughout the system.
Articles are structured like e-mails, and in fact share the same RFC 822 specification. Figure 8.6
shows a news article recently posted to comp.lang.perl.modules. The article consists of a message
header and body. The header contains several fields that you will recognize from the standard e-
mail, such as the Subject: and From: lines, and some fields that are specific to news articles, such
as Article:, Path:, Message-ID:, Distribution:, and References:. Many of these fields are added au-
tomatically by the Netnews server.
To construct a valid Netnews article, you need only take a standard e-mail message and add a
Newsgroups: header containing a comma-delimited list of newsgroups to post to. Another frequently
used article header is Distribution:, which limits the distribution of an article. Valid values for Distri-
bution: depend on the setup of your local Netnews server, but they are typically organized geo-
graphically. For example, the usa distribution limits message propagation to the political boundaries
of the United States, and nj limits distribution to New Jersey. The most common distribution is
world, which allows the article to propagate globally.
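As a rough illustration of the paragraph above, the following sketch assembles a minimal article as a list of lines. The helper name make_article(), the address, the subject, and the group names are all made up for the example; only the header names come from the text.

```perl
use strict;
use warnings;

# Hypothetical helper: build a minimal Netnews article as a list of lines.
# An article is an RFC 822-style message plus a Newsgroups: header.
sub make_article {
    my %arg = @_;
    my @lines = (
        "From: $arg{from}\n",
        "Subject: $arg{subject}\n",
        # comma-delimited list of newsgroups to post to
        "Newsgroups: " . join(',', @{ $arg{groups} }) . "\n",
    );
    # Distribution: is optional; valid values depend on the local server
    push @lines, "Distribution: $arg{distribution}\n"
        if defined $arg{distribution};
    push @lines, "\n";                                  # blank line ends the header
    push @lines, map { $_ . "\n" } split /\n/, $arg{body};
    return @lines;
}

my @article = make_article(
    from         => 'someone@example.com',
    subject      => 'Test posting',
    groups       => ['misc.test', 'alt.test'],
    distribution => 'nj',
    body         => "This is only a test.\nPlease ignore.",
);
print @article;
```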
Other article header fields have special meaning to the Netnews system, and can be used to create
control messages that cancel articles, add or delete newsgroups, and perform other special func-
tions. See [Spencer and Lawrence 1998] for information on constructing your own control mes-
sages.
Netnews interoperates well with MIME. An article can have any number of MIME-specific headers,
parts, and subparts, and MIME-savvy news readers are able to decode and display the parts.
Articles can be identified in either of two ways. Within a newsgroup, an article can be identified by
its message number within the group. For example, the article shown in Figure 8.6 is message
number 36,166 of the newsgroup comp.lang.perl.modules. Because articles are constantly expiring
and being replaced by new ones, the number of the first message in a group is usually not 1, but
more often a high number. The message number for an article is stable on any given news server.
On two subsequent days, you can retrieve the same article by entering a particular newsgroup and
retrieving the same message number. However, message numbers are not stable across servers.
An article's number on one news server may be quite different on another server.
The other way to identify articles is by the message ID. The message ID of the sample article is
<[email protected]>, including the angle brackets at either
side. Message IDs are unique, global identifiers that remain the same from server to server.
Net::NNTP
Historically, Netnews has been distributed in a number of ways, but the dominant mode is now the
Network News Transfer Protocol, or NNTP, described in RFC 977. NNTP is used both by Netnews
servers to share articles among themselves and by client applications to scan and retrieve articles
of interest. Graham Barr's Net::NNTP module, part of the libnet utilities, provides access to NNTP
servers.
Like other members of the libnet clan, Net::NNTP descends from Net::Cmd and inherits that mod-
ule's methods. Its API is similar to Net::POP3 and Net::IMAP::Simple. You connect to a remote
Netnews server, creating a new Net::NNTP object, and use this object to communicate with the
server. You can list and filter newsgroups, make a particular newsgroup current, list articles, download
them, and post new articles.
newsgroup_stats.pl is a short script that uses Net::NNTP to find all newsgroups that match a pattern
and count the number of articles in each. For example, to find all the newsgroups that have some-
thing to do with Perl, we could search for the pattern "*.perl*" (the output has been edited slightly
for space):
% newsgroup_stats.pl '*.perl*'
alt.comp.perlcgi.freelance 454 articles
alt.flame.marshal.perlman 3 articles
alt.music.perl-jam 11 articles
alt.perl.sockets 45 articles
comp.lang.perl.announce 43 articles
comp.lang.perl.misc 18940 articles
comp.lang.perl.moderated 622 articles
comp.lang.perl.modules 2240 articles
comp.lang.perl.tk 779 articles
cz.comp.lang.perl 63 articles
de.comp.lang.perl.cgi 1989 articles
han.comp.lang.perl 174 articles
it.comp.lang.perl 715 articles
japan.comp.lang.perl 53 articles
Notice that the pattern match wasn't perfect, and we matched alt.music.perl-jam as well as news-
groups that have to do with the language. Figure 8.7 lists the code.
Lines 1–3: Load modules—We turn on strict checking and load the Net::NNTP module.
Line 4: Create new Net::NNTP object—We call Net::NNTP->new() to connect to a Netnews
host. If the host isn't specified explicitly, then Net::NNTP chooses a suitable host from environ-
ment variables or the default NNTP server specified when libnet was installed.
Lines 5–6: Print stats and quit—For each argument on the command line, we call the
print_stats() subroutine to look up the pattern and print out matching newsgroups.
We then call the NNTP object's quit() method.
Lines 7–17: print_stats() subroutine—In the print_stats() subroutine we invoke the
NNTP object's newsgroups() method to find newsgroups that match a pattern. If successful,
newsgroups() returns a hash reference in which the keys are newsgroup names and the
values are brief descriptions of the newsgroup.
If the value returned by newsgroups() is undef or empty, we return. Otherwise, we sort the
groups alphabetically by name, and loop through them. For each group, we call the NNTP ob-
ject's group() method to return a list containing information about the number of articles in the
group and the message numbers of the first and last articles. We print the newsgroup name and
the number of articles it contains.
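The walkthrough above can be sketched roughly as follows. This is a reconstruction from the description, not the book's listing, so its line numbers will not match the ones cited. The subroutine name print_stats() comes from the text; the output format and the guard on @ARGV are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::NNTP;

# Print the article count of every newsgroup that matches $pattern.
# $nntp is any object that provides Net::NNTP's newsgroups() and group().
sub print_stats {
    my ($nntp, $pattern) = @_;
    my $groups = $nntp->newsgroups($pattern);     # hashref: name => description
    return unless $groups && %$groups;
    for my $name (sort keys %$groups) {
        my ($count) = $nntp->group($name);        # (count, first, last, name)
        next unless defined $count;               # group could not be selected
        printf "%-40s %6d articles\n", $name, $count;
    }
}

# Connect only when patterns were given on the command line.
if (@ARGV) {
    my $nntp = Net::NNTP->new()
        or die "Can't connect to a Netnews server\n";
    print_stats($nntp, $_) for @ARGV;
    $nntp->quit;
}
```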
the "^" character may be used to invert a set—for example, "[^A–Z]" to match any character that is
not in the range A through Z. Any other character matches itself exactly once. As in the shell (and
unlike Perl's regular expression operations), NNTP patterns are automatically anchored to the be-
ginning and end of the target string.
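To make the anchoring rule concrete, here is a small sketch that translates a wildcard pattern into an equivalent Perl regular expression. The helper wildcard_to_regex() is hypothetical and assumes the usual shell-style rules ("*" for any run of characters, "?" for any single character, bracketed sets); it does not handle every corner case a server might.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a shell-style NNTP wildcard into an anchored
# Perl regex. Assumes "*" = any run of characters, "?" = any one character,
# "[...]" = a character set in which a leading "^" inverts the set.
sub wildcard_to_regex {
    my $pattern = shift;
    my $re = '';
    while ($pattern =~ /\G(\[\^?[^\]]+\]|.)/gs) {
        my $tok = $1;
        if    ($tok eq '*')       { $re .= '.*' }
        elsif ($tok eq '?')       { $re .= '.'  }
        elsif (length($tok) > 1)  { $re .= $tok }            # character set, kept as is
        else                      { $re .= quotemeta $tok }  # literal character
    }
    return qr/^$re$/s;    # anchored at both ends, unlike a bare Perl match
}

my $re = wildcard_to_regex('*.perl*');
for my $group (qw(comp.lang.perl.misc alt.music.perl-jam comp.lang.python)) {
    printf "%-22s %s\n", $group, ($group =~ $re ? 'matches' : 'no match');
}
```

Because of the anchoring, the pattern "perl" on its own would match only a group named exactly perl, which is why the example in this chapter uses "*.perl*".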
Articles can be referred to by their number in the current newsgroup, by their unique message IDs,
or, for some methods, by a range of numbers. In the latter case, the range is specified by providing
a reference to a two-element array containing the first and last message numbers of the range.
Some methods allow you to search for particular articles by looking for wildcard patterns in the
header or body of the message using the same syntax as newsgroup name wildcards.
Other methods accept times and dates, as for example, the newgroups() method that searches
for newsgroups created after a particular date. In all cases, the time is expressed in its native Perl
form as seconds since the epoch, the same as that returned by the time() built-in.
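The argument forms described above look like this in practice. The helper spec_type() and the sample values are invented for illustration; the point is only the shape of each argument.

```perl
use strict;
use warnings;

# Hypothetical helper: classify an article specifier of the kinds described above.
sub spec_type {
    my $spec = shift;
    return 'range'      if ref($spec) eq 'ARRAY' && @$spec == 2;
    return 'message-id' if !ref($spec) && $spec =~ /^<[^<>\s]+>$/;  # brackets included
    return 'number'     if !ref($spec) && $spec =~ /^\d+$/;
    return 'unknown';
}

my $by_number = 36166;                       # message number in the current group
my $by_id     = '<unique-id@example.com>';   # made-up message ID
my $range     = [36100, 36166];              # first and last message numbers

# Times are plain epoch seconds, as returned by time()
my $one_week_ago = time() - 7 * 24 * 60 * 60;

for my $spec ($by_number, $by_id, $range) {
    printf "%-12s %s\n", spec_type($spec), (ref($spec) ? "@$spec" : $spec);
}
print "since: ", scalar localtime($one_week_ago), "\n";
```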
In addition to the basic NNTP functions, many servers implement a number of extension commands.
These extensions make it easier to search a server for articles that match certain criteria and to
summarize quickly the contents of a discussion group. Naturally, not all servers support all extensions,
and in such cases the corresponding method usually returns undef. In the discussion that
follows, methods that depend on NNTP extensions are marked.
We look first at methods that affect the server itself.
$nntp = Net::NNTP->new([$host],[$option1=>$val1,$option2=>$val2…])
The new() method attempts to connect to an NNTP server. The $host argument is the DNS name or IP
address of the server. If not specified, Net::NNTP looks for the server name in the NNTPSERVER and
NEWSHOST environment variables first, and then in the Net::Config nntp_hosts key. If none of these variables is
set, the Netnews host defaults to news.
$nntp->slave()
$nntp->reader() [extension]
The slave() method puts the NNTP server into a mode in which it expects to engage in bulk transfer with
the client. The reader() method engages a mode more suitable for the interactive transfer of individual
articles. Unless explicitly disabled, reader() is issued automatically by the new() method.
$nntp->quit()
The quit() method cleans up and severs the connection with the server. This is also issued automatically
when the NNTP object is destroyed.
Once created, you can query an NNTP object for information about newsgroups. The following
methods deal with newsgroup-level functions.
$group_info = $nntp->list()
The list() method returns information about all active newsgroups. The return value is a hash reference in
which each key is the name of a newsgroup, and each value is a reference to a three-element array that
contains group information. The elements of the array are [$last,$first,$postok], where $last and
$first are the message numbers of the last and first articles in the group, and $postok is "y" if posting
is allowed to the group, "n" if it is not, or "m" if the group is moderated.
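A short sketch of walking the structure that list() returns. It looks only at the third element of each array (the posting flag), and the sample data is made up rather than fetched from a server; moderated_groups() is a hypothetical helper.

```perl
use strict;
use warnings;

# Return the sorted names of the moderated groups in a list()-style hash.
sub moderated_groups {
    my $group_info = shift;      # hashref: name => [ number, number, $postok ]
    my @names = grep { $group_info->{$_}[2] eq 'm' } keys %$group_info;
    return sort @names;
}

# Made-up data in the shape returned by $nntp->list(). Each sample group
# holds a single article, so its two article numbers coincide.
my $sample = {
    'comp.lang.perl.misc'      => [18940, 18940, 'y'],
    'comp.lang.perl.moderated' => [622,   622,   'm'],
    'comp.lang.perl.announce'  => [43,    43,    'm'],
};
print "$_\n" for moderated_groups($sample);
```

With a live connection you would pass the result of $nntp->list() in place of $sample.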
$group = $nntp->group([$group])
($articles,$first,$last,$name) = $nntp->group([$group])
The group() method gets or sets the current group. Called with a group name as its argument, it sets the
current group used by the various article-retrieval methods.
Called without arguments, the method returns information about the current group. In a scalar context, the
method returns the group name. In a list context, the method returns a four-element list that contains the number
of articles in the group, the message numbers of the first and last articles, and the name of the group.
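For example, a small wrapper might select a group and summarize it using the list-context return value. The name group_summary() and the host in the usage comment are placeholders.

```perl
use strict;
use warnings;

# Select a newsgroup and summarize it.
# $nntp is assumed to be a connected Net::NNTP object.
sub group_summary {
    my ($nntp, $name) = @_;
    # List context: (number of articles, first number, last number, group name)
    my ($articles, $first, $last, $group) = $nntp->group($name)
        or return;                       # empty list: the group could not be selected
    return "$group: $articles articles, numbered $first through $last";
}

# Usage, given a live connection (placeholder host name):
#   my $nntp = Net::NNTP->new('news.example.com') or die "no connection";
#   print group_summary($nntp, 'comp.lang.perl.misc'), "\n";
```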
$group_info = $nntp->newgroups($since [,$distributions])
The newgroups() method works like list(), but returns only newsgroups that have been created more
recently than the date specified in $since. The date must be expressed in seconds since the epoch as returned
by time().
The $distributions argument, if provided, limits the returned list to those newsgroups that are restricted
to the specified distribution(s). You may provide a single distribution name as a string, such as nj, or a reference
to an array of distributions, such as ['nj','ct','ny'] for the New York tristate region.
$new_articles = $nntp->newnews($since [,$groups [,$distributions]])
The newnews() method returns a list of articles that have been posted since the time value indicated by
$since. You may optionally provide a group pattern or a reference to an array of patterns in $groups, and
a distribution pattern or reference to an array of distribution patterns in $distributions.
If successful, the method returns a reference to an array that contains the message IDs of all the matching
articles. You may then use the article() and/or articlefh() methods described below to fetch the con-
tents of the articles. This method is chiefly of use for mirroring an entire group or set of groups.
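A mirroring loop along these lines might look like the sketch below. fetch_new_articles() is a hypothetical name, and real mirroring code would write each article to disk rather than hold everything in memory.

```perl
use strict;
use warnings;

# Fetch every article posted to $groups since $since (epoch seconds).
# $nntp is assumed to be a connected Net::NNTP object. Returns a hashref
# that maps message IDs to array references of article lines.
sub fetch_new_articles {
    my ($nntp, $since, $groups) = @_;
    my $ids = $nntp->newnews($since, $groups) or return {};
    my %articles;
    for my $id (@$ids) {
        my $lines = $nntp->article($id) or next;   # expired or cancelled meanwhile
        $articles{$id} = $lines;
    }
    return \%articles;
}

# Usage, given a live connection:
#   my $new = fetch_new_articles($nntp, time() - 24*60*60, 'comp.lang.perl.*');
#   print scalar(keys %$new), " new articles\n";
```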
$group_info = $nntp->active([$pattern]) [extension]
The active() method works like list(), but limits retrieval to those newsgroups that match the wildcard
pattern $pattern. If no pattern is specified, active() is functionally equivalent to list().
This method and the ones that follow all use common extensions to the NNTP protocol, and are not guaranteed
to work with all NNTP servers.
Once a group is selected using the group() method, you can list and retrieve articles. Net::NNTP
gives you the option of retrieving a specific article by specifying its ID or message number, or iter-
atively fetching articles in sequence, starting at the current message number and working upward.
Should something go wrong, article() returns undef and $nntp->message contains an error
message from the server. A common error is "no such article number in this group", which can be
issued even when the message number is in range because of articles that expire or are cancelled
while the NNTP session is active.
Other article-retrieval methods are more specialized.
$fh = $nntp->articlefh([$message])
$fh = $nntp->headfh([$message])
$fh = $nntp->bodyfh([$message])
These three methods act like article(), head(), and body(), but return a tied filehandle from which the
contents of the article can be retrieved. After using the filehandle, you should close it. For example, here is
one way to read message 10000 of the current newsgroup:
$fh = $nntp->articlefh(10000) or die $nntp->message;
while (<$fh>) {
    print;
}
close $fh;
$msgid = $nntp->next()
$msgid = $nntp->last()
$msgid = $nntp->nntpstat($message)
The next(), last(), and nntpstat() methods control the current article pointer. next() advances the
current article pointer to the next article in the newsgroup, and last() moves the pointer to the previous entry.
The nntpstat() method moves the current article pointer to the position indicated by $message, which
should be a valid message number. After setting the current article pointer, all three methods return the mes-
sage ID of the current article.
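The pointer methods combine naturally into a forward scan. The sketch below positions the pointer with nntpstat() and then steps with next() until the server reports no further article; message_ids_from() and its $max limit are assumptions made for the example.

```perl
use strict;
use warnings;

# Collect up to $max message IDs, starting at message number $start.
# $nntp is assumed to be a connected Net::NNTP object with a group selected.
sub message_ids_from {
    my ($nntp, $start, $max) = @_;
    my @ids;
    my $id = $nntp->nntpstat($start);        # position the article pointer
    while (defined $id && @ids < $max) {
        push @ids, $id;
        $id = $nntp->next;                   # undef once there is no next article
    }
    return @ids;
}

# Usage, given a live connection with a current group:
#   print "$_\n" for message_ids_from($nntp, 36100, 10);
```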
Net::NNTP allows you to post new articles using the post(), postfh(), and ihave() methods.
$success = $nntp->post([$message])
The post() method posts an article to Netnews. The posted article does not have to be directed to the current
newsgroup; in fact, the news server ignores the current newsgroup when accepting an article and looks only
at the contents of its Newsgroups: header. The article may be provided as an array containing the lines of the
article or as a reference to such an array. Alternatively, you may call post() with no arguments and use the
datasend() and dataend() methods inherited from Net::Cmd to send the article one line at a time.
If successful, post() returns a true value. Otherwise, it returns undef and $nntp->message contains an
error message from the server.
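The line-at-a-time variant can be sketched like this. post_line_by_line() is a hypothetical wrapper; @lines is assumed to hold a complete article (header, blank line, body) with a newline at the end of each element.

```perl
use strict;
use warnings;

# Post an article one line at a time.
# $nntp is assumed to be a connected Net::NNTP object.
sub post_line_by_line {
    my ($nntp, @lines) = @_;
    unless ($nntp->post) {                   # no arguments: just open the transfer
        warn "Server refused the posting: ", $nntp->message;
        return;
    }
    $nntp->datasend($_) for @lines;          # inherited from Net::Cmd
    return $nntp->dataend;                   # true if the server accepted the article
}
```

When the whole article is already in memory, passing a reference to the array, as in $nntp->post(\@lines), does the same job in a single call.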
$fh = $nntp->postfh()
The postfh() method provides an alternative interface for posting an article. If the server allows posting, this
method returns a tied filehandle to which you can print the contents of the article. After finishing, be sure to
close the filehandle. The result code from close() indicates whether the article was accepted by the server.
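A minimal sketch of the filehandle interface follows. The wrapper name post_via_filehandle() is made up, and the article lines are supplied by the caller.

```perl
use strict;
use warnings;

# Post an article by printing it to the filehandle returned by postfh().
# $nntp is assumed to be a connected Net::NNTP object.
sub post_via_filehandle {
    my ($nntp, @lines) = @_;
    my $fh = $nntp->postfh or return;    # undef if the server will not accept a post
    print $fh @lines;
    return close $fh;                    # result of close() tells us if it was accepted
}
```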
$wants_it = $nntp->ihave($messageID[,$message])
The ihave() method is chiefly of use for clients that are acting as news relays. The method asks the Netnews
server whether it wishes to accept the article whose ID is $messageID.
If the server indicates its assent, it returns a true result. The article must then be transferred to the server, either
by providing the article's contents in the $message argument or by sending the article one line at a time using
the Net::Cmd datasend() and dataend() methods. $message can be an array of article lines or a reference
to such an array.
Last, several methods allow you to search for particular articles of interest.
The header field is case-insensitive. However, not all headers can be retrieved in this way because NNTP
servers typically index only that subset of the headers used to generate overview listings (see the next method).
The xpat() method is similar to xhdr(), but it filters the articles returned for those with $header fields that
match the wildcard pattern in $pattern. The xrover() method returns the cross-reference fields for articles
in the specified range. It is functionally identical to:
$xref = $nntp->xhdr('References',[$start,$end]);
The result of this call is a hash reference in which the keys are message numbers and the values are the
message IDs that the article refers to. These are typically used to reconstruct discussion threads.
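As a sketch of how such a hash might feed thread reconstruction, the code below extracts each article's immediate parent. It relies on the usual convention that the last message ID in a References: line is the article being replied to; thread_parents() and the sample data are invented.

```perl
use strict;
use warnings;

# Map each message number to the message ID of its immediate parent.
sub thread_parents {
    my $refs = shift;        # hashref: message number => References: value
    my %parent;
    for my $num (keys %$refs) {
        my @ids = $refs->{$num} =~ /(<[^<>\s]+>)/g;
        $parent{$num} = @ids ? $ids[-1] : undef;    # undef marks a thread root
    }
    return \%parent;
}

# Made-up data in the shape described above
my $sample = {
    100 => '',
    101 => '<root@example.com>',
    102 => '<root@example.com> <reply1@example.com>',
};
my $parent = thread_parents($sample);
for my $num (sort { $a <=> $b } keys %$parent) {
    printf "%d -> %s\n", $num, defined $parent->{$num} ? $parent->{$num} : '(thread root)';
}
```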
$overview_hashref = $nntp->xover($message_range) [extension]
$format_arrayref = $nntp->overview_fmt() [extension]
The overview_fmt() and xover() methods return newsgroup "overview" information. The overview is a
summary of selected article header fields; it typically contains the Subject: line, References:, article Date:, and
article length. It is used by newsreaders to index, sort, and thread articles.
Pass the xover() method a message range (a single message number or a reference to an array containing
the extremes of the range). If successful, the method's return value is a hash reference in which each key is
a message number and each value is a reference to an array of the overview fields.
To discover what these fields are, call the overview_fmt() method. It returns an array reference containing
field names in the order in which they appear in the arrays returned by xover(). Each field is followed by a
colon and, occasionally, by a server-specific modifier. For example, my laboratory's Netnews server returns
the following overview fields:
('Subject:','From:','Date:','Message-ID:','References:',
'Bytes:','Lines:','Xref:full')
If you would prefer the values of the overview array to be a hash reference rather than an array
reference, you can use the small subroutine shown here to do the transformation. The trick is to use
the list of field names returned by overview_fmt() to create a hash slice to which we assign the
article overview array:
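sub get_overview {
    my ($nntp,$range) = @_;
    # keep the name before the colon, hyphens included ("Message-ID:", "Xref:full")
    my @fields = map {/^([\w-]+):/ && $1} @{$nntp->overview_fmt};
    my $over = $nntp->xover($range) || return;
    foreach (keys %$over) {
        my $h = {};
        @{$h}{@fields} = @{$over->{$_}};   # hash slice: field names => field values
        $over->{$_} = $h;
    }
    return $over;
}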
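The hash-slice step is easier to see in isolation. The sketch below uses made-up overview data and needs no server; the field-name cleanup shown here (everything before the first colon) is one reasonable choice, not necessarily the one a given script uses.

```perl
use strict;
use warnings;

# Made-up overview data in the shapes described above
my @format = ('Subject:', 'From:', 'Date:', 'Message-ID:', 'References:',
              'Bytes:', 'Lines:', 'Xref:full');
my @values = ('Re: socket timeouts', 'someone@example.com',
              'Mon, 01 Jan 2001 12:00:00 GMT', '<unique-id@example.com>',
              '<parent-id@example.com>', 1234, 20,
              'Xref: news.example.com comp.lang.perl.misc:36166');

# Keep everything before the first colon ("Xref:full" becomes "Xref")
my @fields = map { /^([^:]+):/ ? $1 : () } @format;

my %article;
@article{@fields} = @values;     # hash slice: names on the left, values on the right

print "$article{'Message-ID'}: $article{Subject}\n";
```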
A News-to-Mail Gateway
The last code example of this chapter is a custom news-to-mail gateway. It periodically scans Net-
news for articles of interest, bundles them into a MIME message, and mails them via Internet mail.
Each time the script is run it keeps track of the messages it has previously sent and only sends
messages that haven't been seen before.
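The "only send what is new" bookkeeping can be sketched with a hash tied to a DBM file. This is an illustration under assumptions, not the script's own code: it writes to a temporary directory rather than the user's home directory, the unseen() helper and the message IDs are made up, and the DBM search order is restricted to libraries that accept the same tie() arguments.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Prefer DBM libraries that take (file, flags, mode) arguments
BEGIN { @AnyDBM_File::ISA = qw(DB_File NDBM_File SDBM_File) }
use AnyDBM_File;

# Return the IDs not yet recorded in %$seen, recording them as a side effect.
sub unseen {
    my ($seen, @ids) = @_;
    return grep { !$seen->{$_}++ } @ids;
}

my $dir   = tempdir(CLEANUP => 1);       # scratch location for the demo
my $cache = "$dir/newscache";

tie my %Seen, 'AnyDBM_File', $cache, O_RDWR|O_CREAT, 0640
    or die "Can't tie $cache: $!";
my @first  = unseen(\%Seen, '<a@example.com>', '<b@example.com>');
my @second = unseen(\%Seen, '<b@example.com>', '<c@example.com>');
untie %Seen;

print scalar(@first), " new on the first pass, ",
      scalar(@second), " new on the second\n";
```

Because the hash lives in a file, IDs recorded during one run are still known on the next run.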
You control the script's scope by specifying a list of newsgroups and, optionally, one or more patterns
to search for in the subject lines of the articles contained in the newsgroups. If you don't specify any
subject-line patterns, the script fetches the entire contents of the listed newsgroups.
The subject-line patterns take advantage of Perl's pattern-matching engine, and can be any regular
expression. For performance reasons, however, we use the built-in NNTP wildcard patterns for
newsgroup names.
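One way to turn a list of subject patterns into a reusable test is sketched below. It precompiles each pattern with qr// and wraps them in a closure, which is a substitute technique chosen for brevity and not necessarily how the script builds its matcher. It assumes an article qualifies only when every pattern matches; make_matcher() is a hypothetical name.

```perl
use strict;
use warnings;

# Build a code reference that returns true only if all patterns match.
sub make_matcher {
    my ($insensitive, @patterns) = @_;
    my @compiled = map {
        my $re = eval { $insensitive ? qr/$_/i : qr/$_/ };
        die "Bad pattern '$_': $@" unless defined $re;
        $re;
    } @patterns;
    return sub {
        my $subject = shift;
        for my $re (@compiled) {
            return 0 unless $subject =~ $re;
        }
        return 1;                # also the result when no patterns were given
    };
}

my $match = make_matcher(0, '[sS]ocket');
for my $subject ('Problem with Socket timeouts', 'Tk canvas question') {
    printf "%-30s %s\n", $subject, ($match->($subject) ? 'match' : 'no match');
}
```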
The following command searches the comp.lang.perl.* newsgroups for articles that have the word
"Socket" or "socket" in the subject line. Matching articles will be mailed to the local e-mail address
lstein. Options include -subject, to specify the subject pattern match, -mail to set the mail recipient(s),
and -v to turn on verbose progress messages.
% scan_newsgroups.pl -v -mail lstein -subj '[sS]ocket' 'comp.lang.perl.*'
Searching comp.lang.perl.misc for matches
Fetching overview for comp.lang.perl.misc
found 39 matching articles
Searching comp.lang.perl.announce for matches
Fetching overview for comp.lang.perl.announce
found 0 matching articles
Searching comp.lang.perl.tk for matches
Fetching overview for comp.lang.perl.tk
found 1 matching articles
Searching comp.lang.perl.modules for matches
Fetching overview for comp.lang.perl.modules
found 4 matching articles
44 articles, 40 unseen
sending e-mail message to lstein
The received e-mail message contains a brief prologue that describes the search and newsgroup
patterns, followed by the matching articles. Each article is attached as an enclosure of MIME type
message/rfc822. Depending on the reader's mail-reading software, the enclosures are displayed
as either in-line components of the message or attachments. The result is particularly nice in the
Netscape mail reader (Figure 8.8) because each article is displayed using fancy fonts and hyper-
links.
Lines 1–7: Load modules—We load the Net::NNTP and MIME::Entity modules, as well as the
Getopt::Long module for argument processing. We need to keep track of all the messages that
we have found during previous runs of the script, and the easiest way to do that is to keep the
message IDs in an indexed DBM database. However, we don't know a priori what DBM library
is available, so we import the AnyDBM_File module, which chooses a library for us. The code
contained in the BEGIN{} block changes the DBM library search order, as described in the
AnyDBM_File documentation.
We also load the Fcntl module in order to have access to several constants needed to initialize
the DBM file.
Lines 9–22: Define constants—We choose a name for the DBM file, a file named .newscache
in the user's home directory, and create a usage message.
Lines 23–25: Declare globals—The first line of globals corresponds to command-line options. The
second line holds various data structures manipulated by the script. The %Seen hash
will be tied to the DBM file. Its keys are the message IDs of articles that we have previously
retrieved. %Articles contains information about the articles recovered during the current
search. Its keys are message IDs, and its values are hash references of header fields derived
from the overview index. Last, @Fields contains the list of header fields returned by the
xover() method.
Lines 26–34: Process command-line arguments—We call GetOptions() to process the com-
mand-line options, and then check consistency of the arguments. If the e-mail recipient isn't
explicitly given on the command line, we default to the user's login name.
Lines 35–36: Open connection to Netnews server—We open a connection to the Netnews server
by calling Net::NNTP->new(). If the server isn't explicitly given on the command line, the
$SERVER option is undefined and Net::NNTP picks a suitable default.
Lines 37–39: Open DBM file—We tie %Seen to the .newscache file using the AnyDBM_File
module. The options passed to tie() cause the file to be opened read/write and to be created
with file mode 0640 (-rw-r-----), if it doesn't already exist.
Lines 40–41: Compile the pattern match—For efficiency's sake, we compile the pattern matches
into an anonymous subroutine. This subroutine takes the text of a subject line and returns true
if all the patterns match, and false otherwise. The match_code() subroutine takes the list of
pattern matches, compiles them, and returns an appropriate code reference.
Lines 42–43: Expand newsgroup patterns—We pass the list of newsgroups to a subroutine
named expand_newsgroups(). It calls the NNTP server to expand the wildcards in the list of
newsgroups and returns the expanded list of newsgroup names.
Lines 44–45: Search for matching articles—We loop through the expanded list of newsgroups
and call grep_group() for each one. The arguments to grep_group() consist of the newsgroup
name and a code reference that filters its articles. Internally, grep_group() accumulates the
matched articles' message IDs into the %Articles hash. We do it this way because the same
article may be cross-posted to several related newsgroups; using the article IDs in a hash avoids
accumulating duplicates.
Lines 46–48: Filter out articles already seen—We use Perl's grep() function to filter out articles
whose message IDs are already present in the tied %Seen hash. New article IDs are added to
the hash so that on subsequent runs we will know that we've seen them. The unseen article IDs
are assigned to the @to_fetch array.
If the user ran the script with the -all option, we short-circuit the grep() operation so that all
articles are retrieved, including those we've seen before. This does not affect the updating of
the tied %Seen hash.
Lines 49–52: Add articles to an outgoing mail message and quit—We pass the list of article IDs
to send_mail(), which retrieves their contents and adds them to an outgoing mail message.
We then call the NNTP object's quit() method to disconnect from the server, and exit our-
selves.
Lines 53–62: The match_code() subroutine—The match_code() subroutine takes a list of
zero or more patterns and constructs a code reference on the fly. The subroutine is built up line-
by-line in a scalar variable called $code. The subroutine is designed to return true only if all the
patterns match the passed subject line. If no patterns are specified, the subroutine returns true
by default. If the -insensitive option was passed to the script, we do case-insensitive pattern
matches with the i flag. Otherwise, we do case-sensitive matches.
After constructing the subroutine code, we eval() it and return the result to the caller. If the
eval() fails (presumably because of an error in one or more of the regular expressions), we
propagate the error message and die.
Lines 63–73: The expand_newsgroups() subroutine—The expand_newsgroups() subroutine
takes a list of newsgroup patterns and calls the NNTP object's newsgroups() method
on each of them in turn, expanding them to a list of valid newsgroup names. If a newsgroup
contains no wildcards, we just pass it back unchanged.
Lines 74–85: The grep_group() subroutine— grep_group() scans the specified newsgroup
for articles whose subject lines match a set of patterns. The patterns are provided in the
form of a code reference that returns true if the subject line matches.
We call the get_overview() subroutine to return the server's overview index for the newsgroup.
get_overview() returns a hash reference in which each key is a message number
and each value is a hash of indexed header fields. We step through each message, recover its
Subject: and Message-ID: fields, and pass the subject field to the pattern-matching code reference.
If the code reference returns false, we go on to the next article. Otherwise, we add the
article's message ID and overview data to the %Articles global.
When all articles have been examined, we return to the caller the number of those that matched.
Lines 89–102: The get_overview() subroutine—The get_overview() subroutine used
here is a slight improvement over the version shown earlier. We start by calling the NNTP object's
group() method, recovering the newsgroup's first and last message numbers. We then call
the object's overview_fmt() method to retrieve the names of the fields in the overview index.
Since this information isn't going to change during the lifetime of the script, however, we cache
it in the @Fields global and call overview_fmt() only if the global is empty. Before assigning
to @Fields, we clean up the field names by removing the ":" and anything following it.
We recover the overview for the entire newsgroup by calling the xover() method for the range
spanning the first and last article numbers. We now loop through the keys of the returned overview
hash, replacing its array reference values, which list fields by position, with anonymous
hashes that list fields by name. In addition to recording the header fields that occur in the article
itself, we record a pseudofield named Message-Number: that contains the group name and
message number in the form group.name:number. We use this information during e-mail
construction to create the default name for the article enclosure.
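The conversion from positional arrays to named hashes can be sketched without contacting a server. In this illustration the field names in @raw_fields and the helper name name_fields() are assumptions made for the example; in the script itself the names come from overview_fmt() and the data from xover().

```perl
use strict;
use warnings;

# Field names as a server might report them (hypothetical sample data).
my @raw_fields = ('Subject:', 'From:', 'Date:', 'Message-ID:', 'Xref:full');

# Remove the ":" and anything following it.
my @Fields = map { my $f = $_; $f =~ s/:.*//; $f } @raw_fields;

# Convert { number => [values by position] } into { number => { name => value } }.
sub name_fields {
    my ($group, $overview) = @_;
    foreach my $number (keys %$overview) {
        my %named;
        @named{@Fields} = @{ $overview->{$number} };    # hash slice assignment
        $named{'Message-Number'} = "$group:$number";    # pseudofield
        $overview->{$number} = \%named;
    }
    return $overview;
}

my $over = name_fields('comp.lang.perl.misc',
                       { 42 => ['Hello', 'someone', 'today', '<id1@host>', 'xref'] });
print $over->{42}{Subject}, "\n";
print $over->{42}{'Message-Number'}, "\n";
```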
Lines 103–124: The send_mail() subroutine— send_mail() is called with an array of article
IDs to fetch, and is responsible for constructing a multipart MIME message containing each
article as an attachment.
We create a short message prologue that summarizes the program's run-time options and create
a new MIME::Entity by calling the build() method. The message starts as a single-part message
of type text/plain, but is automatically promoted to a multipart message as soon as we start
attaching articles to it.
We then call attach_article() for each article listed in $to_fetch. This array may be
empty, in which case we make no attachments. When all articles have been attached, we call
the MIME entity's smtpsend() method to send out the mail using the Mail::Mailer SMTP
method, and clean up any temporary files by calling the entity's purge() method.
Lines 125–134: The attach_article() subroutine—For the indicated message ID we fetch
the entire article's contents as an array of lines by calling the NNTP object's article() method.
We then attach the article to the outgoing mail message, specifying a MIME type of message/rfc822,
a description corresponding to the article's subject line, and a suggested filename derived
from the article's newsgroup and message number (taken from the global %Articles hash).
An interesting feature of this script is the fact that because we are storing unique global message
IDs in the .newscache hashed database, we can switch to a different NNTP server without worrying
about retrieving articles we have already seen.
Summary
Net::POP3 and Net::IMAP::Simple allow client programs to receive and process Internet mail.
Net::NNTP provides access to the Netnews system via the NNTP protocol. These modules can be
combined with MIME-Tools to perform sophisticated mail processing and sorting tasks.
The ease with which the Net::*, Mail::*, and MIME::* modules interoperate is a tribute to the design
skills of the authors of those modules as well as to the elegance of the Internet mail system itself.
Chapter 9. Web Clients
In the previous chapters we reviewed client modules for sending and receiving Internet mail,
transferring files via FTP, and interacting with Netnews servers. In this chapter we look at LWP, the Library
for Web access in Perl. LWP provides a unified API for interacting with Web, FTP, News, and Mail
servers, as well as with more obscure services such as Gopher.
With LWP you can (1) request a document from a remote Web server using its URL; (2) POST data
to a Web server, emulating the submission of a fill-out form; (3) mirror a document on a remote Web
server in such a way that the document is transferred only if it is more recent than the local copy;
(4) parse HTML documents to recover links and other interesting features; (5) format HTML documents
as text and PostScript; and (6) handle cookies, HTTP redirects, proxy servers, and HTTP user
authentication. Indeed, LWP implements all the functionality one needs to write a Web browser in
Perl, and if you download and install the Perl/Tk distribution, you'll find it contains a fully functional
graphical Web browser written on top of LWP.
The base LWP distribution contains 35 modules, and another dozen modules are required for HTML
parsing and formatting. Because of its size and scope, we will only skim the surface of LWP. For an
exhaustive treatment, see LWP's POD documentation, or the excellent, but now somewhat dated,
Web Client Programming with Perl [Wong 1999].
Installing LWP
The first version of LWP appeared in 1995, and was written by Martijn Koster and Gisle Aas. It has
since been maintained and extended by Gisle Aas, with help from many contributors.
The basic LWP library, distributed via CPAN in the file libwww-perl-X.XX.tar.gz (where X.XX is the most
recent version number), provides support for the HTTP, FTP, Gopher, SMTP, NNTP, and HTTPS
(HTTP over Secure Sockets Layer) protocols. However, before you can install it, you must install a
number of prerequisite modules:
You could download and install each of these modules separately, but the easiest way is to install
LWP and all its prerequisites in batch mode using the standard CPAN module. Here is how to do
this from the command line:
% perl -MCPAN -e 'install Bundle::LWP'
This loads the CPAN module and then calls the install() function to download, build, and install
LWP and all the ancillary modules that it needs to run.
The HTML-parsing and HTML-formatting modules were once bundled with LWP, but are now
distributed as separate packages named HTML-Parser and HTML-Formatter, respectively. They each
have a number of prerequisites, and again, the easiest way to install them is via the CPAN module
using this command:
% perl -MCPAN -e 'install HTML::Parser' -e 'install HTML::Formatter'
If you want to install these libraries manually, here is the list of the packages that you need to
download and install:
To use the HTTPS (secure HTTP) protocol, you must install one of the Perl SSL modules,
IO::Socket::SSL, as well as OpenSSL, the open source SSL library that IO::Socket::SSL depends
on. OpenSSL is available from http://www.openssl.org/.
LWP is pure Perl. You don't need a C compiler to install it. In addition to the module files, when you
install LWP you get four scripts, which serve as examples of how to use the library, as well as useful
utilities in their own right. The scripts are:
• lwp-request—Fetch a URL and display it.
• lwp-download—Download a document to disk, suitable for files too large to hold in memory.
• lwp-mirror—Mirror a document on a remote server, updating the local copy only if the remote
one is more recent.
• lwp-rget—Copy an entire document hierarchy recursively.
LWP Basics
Figure 9.1 shows a script that downloads the URL given on the command line. If successful, the
document is printed to standard output. Otherwise, the script dies with an appropriate error message.
For example, to download the HTML source for Yahoo's weather page, located at
http://www.yahoo.com/r/wt, you would call the script like this:
% get_url.pl http://www.yahoo.com/r/wt
The script can just as easily be used to download a file from an FTP server like this:
% get_url.pl ftp://www.cpan.org/CPAN/RECENT
The script will even fetch news articles, provided you know the message ID:
% get_url.pl news:[email protected]
In fact, we could reduce Figure 9.1 even further to this one-line command:
% perl -MLWP::Simple -e 'getprint shift' http://www.yahoo.com/r/wt
The procedural interface is suitable for fetching and mirroring Web documents when you do not
need control over the outgoing request and you do not wish to examine the response in detail. The
object-oriented interface is there when you need to customize the outgoing request by providing
authentication information and data to post to a server script, or by changing other header information
passed to the server. The object-oriented interface also allows you to interrogate the response
to recover detailed information about the remote server and the returned document.
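As a rough sketch of the procedural style, the following uses the get() and getstore() functions of LWP::Simple. To keep the example runnable without a network connection it fetches a file: URL pointing at a temporary file; the file contents and variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;
use LWP::Simple qw(get getstore);
use HTTP::Status qw(is_success);
use URI::file;
use File::Temp qw(tempfile);

# Create a small local document so that no network access is needed.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print $fh "hello from LWP\n";
close $fh;
my $url = URI::file->new_abs($path);

# get() returns the document body, or undef on failure.
my $content = get($url);
print defined $content ? $content : "fetch failed\n";

# getstore() saves the document to a file and returns the status code.
my ($copy_fh, $copy) = tempfile(UNLINK => 1);
close $copy_fh;
my $status = getstore($url, $copy);
print "stored copy\n" if is_success($status);
```

Neither call exposes the request or response objects, which is the trade-off the procedural interface makes for brevity.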
HTTP::Request
The Web paradigm generalizes all client/server interactions to a client request and a server response.
The client request consists of a Uniform Resource Locator (URL) and a request method.
The URL, which is known in the LWP documentation by its more general name, URI (for Uniform
Resource Identifier), contains information on the network protocol to use and the server to contact.
Each protocol uses different conventions in its URLs. The protocols supported by LWP include:
HTTP—The Hypertext Transfer Protocol, the "native" Web protocol described in RFCs 1945 and
2616, and the one used by all Web servers. HTTP URLs have this familiar form:
http://server.name:port/path/to/document
The http: at the beginning identifies the protocol. This is followed by the server's DNS name or
IP address and, optionally, the port the server is listening on. The remainder of the URL is the path to
the document.
FTP—A document stored on an FTP server. FTP URLs have this form:
ftp://server.name:port/path/to/document
GOPHER —A document stored on a server running the now rarely used gopher protocol. Gopher
URLs have this form:
gopher://server.name:port/path/to/document
SMTP—LWP can send mail messages via SMTP servers using mailto: URLs. These have the form:
mailto:[email protected]
where [email protected] is the recipient's e-mail address. Notice that the location of the SMTP server
isn't part of the URL. LWP uses local configuration information to identify the server.
NNTP—LWP can retrieve a news posting from an NNTP server given the ID of the message you
wish to retrieve. The URL format is:
news:message-id
As in mailto: URLs, there is no way to specify the particular NNTP server. A suitable server is identified
automatically using Net::NNTP's rules (see Chapter 8).
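The URI module that LWP relies on can take such URLs apart. The short sketch below is illustrative only; server.name and port 8080 are placeholder values, not real endpoints.

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new('http://server.name:8080/path/to/document');
print $uri->scheme, "\n";   # protocol part
print $uri->host,   "\n";   # server name or IP address
print $uri->port,   "\n";   # explicit port
print $uri->path,   "\n";   # path to the document

# When the port is omitted, port() reports the default for the scheme.
my $ftp = URI->new('ftp://server.name/path/to/document');
print $ftp->port, "\n";
```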
In addition to the URL, each request has a method. The request method indicates the type of transaction
that is requested. A number of methods are defined, but the most frequent ones are:
GET—Fetch a copy of the document indicated by the URL. This is the most common way of fetching
a Web page.
PUT—Replace or create the document indicated by the URL with the document contained in the
request. This is most commonly seen in the FTP protocol when uploading a file, but is also used by
some Web page editors.
POST—Send some information to the indicated URL. It was designed for posting e-mail messages
and news articles, but was long ago appropriated for use in sending fill-out forms to CGI scripts and
other server-side programs.
DELETE—Delete the document indicated by the URL. This is used to delete files from FTP servers
and by some Web-based editing systems.
HEAD—Return information about the indicated document without changing or downloading it.
HTTP protocol requests can also contain other information. Each request includes a header that
contains a set of RFC 822-like fields. Common fields include Accept:, indicating the MIME type(s)
the client is prepared to receive, User-agent:, containing the name and version of the client software,
and Content-type:, which describes the MIME type of the request content, if any. Other fields handle
user authentication for password-protected URLs.
For the PUT and POST methods, but not for GET, HEAD, and DELETE, the request also contains
content data. For PUT, the content is the document to upload to the location indicated by the URL.
For POST, the content is some data to send, such as the contents of a fill-out form to send to a CGI
script.
The LWP library uses a class named HTTP::Request to represent all requests, even those that do
not use the HTTP protocol. You construct a request by calling HTTP::Request->new() with the
name of the desired request method and the URL you wish to apply the request to. For HTTP
requests, you can then add or alter the outgoing headers to do such things as add authentication
information or HTTP cookies. If the request method expects content data, you'll normally add the
data to the request object using its content() method.
The API description that follows lists the most frequently used HTTP::Request methods. Some of
them are defined in HTTP::Request directly, and others are inherited.
One begins by creating a new request object with HTTP::Request->new().
new() also accepts optional header and content arguments. $header should be a reference to an
HTTP::Headers object. However, we will not go over the HTTP::Headers API because it's easier to
allow HTTP::Request to create a default headers object and then customize it after the object is
created. $content is a string containing whatever content you wish to send to the server.
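A minimal sketch of both call styles follows. The www.example.com URLs and the query string are placeholders chosen for the illustration.

```perl
use strict;
use warnings;
use HTTP::Request;

# Method and URL only; a default headers object is created for us.
my $get = HTTP::Request->new(GET => 'http://www.example.com/index.html');

# Passing undef for the headers and a string for the content.
my $post = HTTP::Request->new(POST => 'http://www.example.com/cgi-bin/search',
                              undef, 'query=perl');

print $get->method, ' ', $get->uri, "\n";
print $post->method, ' ', $post->uri, "\n";
print $post->content, "\n";
```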
Once the request object is created, the header() method can be used to examine or change
header fields.
This example sets the Referer: field, which indicates the URL of the document that referred to the
one currently being requested:
An HTTP header field can be multivalued. For example, a client may have a Cookie: field for each
cookie assigned to it by the server. You can set multivalued field values by using an array reference
as the value, or by passing a string in which values are separated by commas. This example sets
the Accept: field, which is a multivalued list of the MIME types that the client is willing to accept:
$request->header(Accept => ['text/html','text/plain','text/rtf'])
Alternatively, you can use the push_header() method described later to set multivalued fields.
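Putting the header calls together, a short sketch might look like this. The URLs are placeholders; header() is called in list context at the end to recover each value of the multivalued field separately.

```perl
use strict;
use warnings;
use HTTP::Request;

my $request = HTTP::Request->new(GET => 'http://www.example.com/page2.html');

# A single-valued field.
$request->header(Referer => 'http://www.example.com/page1.html');

# A multivalued field, set with an array reference and then extended.
$request->header(Accept => ['text/html', 'text/plain']);
$request->push_header(Accept => 'text/rtf');

my @types = $request->header('Accept');         # list context: one element per value
my $all   = $request->header('Accept');         # scalar context: values joined by ", "
print scalar(@types), " types: $all\n";
```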
$request->scan(\&sub)
The scan() method iterates over each of the HTTP headers in turn, invoking the code reference provided in
\&sub. The subroutine you provide will be called with two arguments consisting of the field name and its value.
For multivalued fields, the subroutine is invoked once for each value.
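For instance, scan() can be used to collect every header into a list for logging. This is a sketch with placeholder header values; the order in which fields are visited is chosen by the headers object, so the example does not rely on it.

```perl
use strict;
use warnings;
use HTTP::Request;

my $request = HTTP::Request->new(GET => 'http://www.example.com/');
$request->header(Accept => ['text/html', 'text/plain']);
$request->header('User-Agent' => 'example/1.0');

# The callback receives the field name and one value per invocation.
my @seen;
$request->scan(sub {
    my ($field, $value) = @_;
    push @seen, "$field: $value";
});
print "$_\n" foreach @seen;
```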
$request->date()
$request->expires()
$request->last_modified()
$request->if_modified_since()
$request->content_type()
$request->content_length()
$request->referer()
$request->user_agent()
These methods belong to a family of 19 convenience methods that allow you to get and set a number of
common unique-valued fields. Called without an argument, they return the current value of the field. Called
with a single argument, they set it. The methods that deal with dates use system time format, as returned by
time().
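The date handling can be seen by setting a field with a time() value and reading it back both ways. This is a sketch; the one-day offset and the agent name are arbitrary example values.

```perl
use strict;
use warnings;
use HTTP::Request;

my $request   = HTTP::Request->new(GET => 'http://www.example.com/');
my $yesterday = time() - 24 * 60 * 60;

$request->if_modified_since($yesterday);   # set using system time format
$request->user_agent('example/1.0');

# The raw header holds a formatted date string ...
print $request->header('If-Modified-Since'), "\n";
# ... while the convenience method converts it back to system time format.
print $request->if_modified_since, "\n";
```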
Three methods allow you to set and examine the request's content.
$request->content([$content])
$request->content_ref
The content() method sets the content of the outgoing request. If no argument is provided, it returns the
current content value, if any. content_ref() returns a reference to the content, and can be used to
manipulate the content directly.
When POSTing a fill-out form query to a dynamic Web page, you use content() to set the query string, and
call content_type() to set the MIME type to either application/x-www-form-urlencoded or
multipart/form-data.
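A sketch of a form POST built this way is shown below. Using a throwaway URI object to escape the form fields is one convenient idiom, not the only one; the field names and the URL are placeholders for the example.

```perl
use strict;
use warnings;
use HTTP::Request;
use URI;

# Let the URI module escape the form fields into a query string.
my $form = URI->new('http:');
$form->query_form(name => 'Lincoln Stein', topic => 'perl & sockets');

my $request = HTTP::Request->new(POST => 'http://www.example.com/cgi-bin/register');
$request->content_type('application/x-www-form-urlencoded');
$request->content($form->query);
$request->content_length(length $request->content);

print $request->content, "\n";
```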
It is also possible to generate content dynamically by passing content() a reference to a piece of code that
returns the content. LWP invokes the subroutine repeatedly until it returns an empty string. This facility is useful
for PUT requests to FTP servers, and POST requests to mail and news servers. However, it's inconvenient to
use with HTTP servers because the Content-Length: field must be filled out before sending the request. If you
know the length of the dynamically generated content in advance, you can set it using the
content_length() method.
$request->add_content($data)
This method appends some data to the end of the existing content, if any. It is useful when reading content
from a file.
Finally, several methods allow you to change the URL and method.
$request->uri([$uri])
This method gets or sets the outgoing request's URI.
$request->method([$method])
The method() method gets or sets the outgoing request's method.
$string = $request->as_string
The as_string() method returns the outgoing request as a string, often used during debugging.
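The following sketch builds up a request in stages and dumps it for inspection. The URLs and content lines are placeholders.

```perl
use strict;
use warnings;
use HTTP::Request;

my $request = HTTP::Request->new(PUT => 'http://www.example.com/notes.txt');

# Append the content piece by piece, as one might while reading a file.
$request->add_content("first line\n");
$request->add_content("second line\n");

# Change our mind about the method and the destination.
$request->method('POST');
$request->uri('http://www.example.com/cgi-bin/notes');

print $request->as_string;
```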
HTTP::Response
Once a request is issued, LWP returns the server's response in the form of an HTTP::Response
object. HTTP::Response is used even for non-HTTP protocols, such as FTP.
HTTP::Response objects contain status information that reports the outcome of the request, and
header information that provides meta-information about the transaction and the requested document.
For GET and POST requests, the HTTP::Response usually contains content data.
The status information is available both as a numeric status code and as a short human-readable
message. When using the HTTP protocol, there are more than a dozen status codes, the most
common of which are listed in Table 9.1. Although the text of the messages varies slightly from
server to server, the codes are standardized and fall into the following general categories:
• Informational codes, in the range 100 through 199, are issued before the request is complete.
• Success codes, which occupy the 200 through 299 range, indicate successful outcomes.
• Redirection codes, in the 300 through 399 range, indicate that the requested URL has
moved elsewhere. These are commonly encountered when a Web site has been reorganized
and the administrators have installed redirects to avoid breaking incoming external links.
• Error codes in the 400 through 499 range indicate various client-side errors, and those 500 and up
are server-side errors.
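The HTTP::Status module that accompanies LWP provides predicate functions for these ranges. The category() helper below is a hypothetical wrapper written for this illustration.

```perl
use strict;
use warnings;
use HTTP::Status qw(is_info is_success is_redirect
                    is_client_error is_server_error status_message);

# Map a numeric status code to the name of its category.
sub category {
    my $code = shift;
    return 'informational' if is_info($code);
    return 'success'       if is_success($code);
    return 'redirection'   if is_redirect($code);
    return 'client error'  if is_client_error($code);
    return 'server error'  if is_server_error($code);
    return 'unknown';
}

foreach my $code (100, 200, 302, 404, 500) {
    printf "%d %-22s %s\n", $code, status_message($code), category($code);
}
```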
When dealing with non-HTTP servers, LWP synthesizes appropriate status codes. For example,
when requesting a file from an FTP server, LWP generates a 200 ("OK") response if the file was
downloaded, and 404 ("Not Found") if the requested file does not exist.
The LWP library handles some status codes automatically. For example, if a Web server returns a
redirection response indicating that the requested URL can be found at a different location (codes
301 or 302), LWP automatically generates a new request directed at the indicated location. The
response that you receive corresponds to the new request, not the original. If the response requests
authorization (status code 401), and authorization information is available, LWP reissues the request
with the appropriate authorization headers.
HTTP::Response headers describe the server, the transaction, and the enclosed content. The most
useful headers include Content-type: and Content-length:, which provide the MIME type and length
of the returned document, if any, Last-modified:, which indicates when the document was last modified,
and Date:, which tells you the server's idea of the time (since client and server clocks are not
necessarily synchronized).
Table 9.1. Common HTTP Status Codes and Messages

Code  Message                  Description
1XX codes: informational
100   Continue                 Continue with request.
101   Switching Protocols      The server is switching to a newer version of HTTP.
2XX codes: success
200   OK                       The URL was found. Its content follows.
201   Created                  A URL was created in response to a POST.
202   Accepted                 The request was accepted for processing at a later date.
204   No Content               The request is successful, but there's no content.
3XX codes: redirection
301   Moved Permanently        The URL has permanently moved to a new location.
302   Found                    The URL can be temporarily found at a new location.
4XX codes: client errors
400   Bad Request              There's a syntax error in the request.
401   Authorization Required   Password authorization is required.
403   Forbidden                This URL is forbidden, and authorization won't help.
404   Not Found                It isn't here.
5XX codes: server errors
500   Internal Error           The server encountered an unexpected error.
501   Not Implemented          Used for unimplemented features.
503   Service Unavailable      The server is temporarily overloaded.

Like the request object, HTTP::Response inherits from HTTP::Message, and delegates unknown
method calls to the HTTP::Headers object contained within it. To access header fields, you can call
header(), content_type(), expires(), and all the other header-manipulation methods
described earlier.
Similarly, the response content can be accessed using the content() and content_ref()
methods. Because some documents can be quite large, LWP also provides methods for saving the
content directly to disk files and spooling them to subroutines in pieces.
Although HTTP::Response has a constructor, you will not usually construct it yourself, so it isn't
listed here. For brevity, a number of other infrequently used methods are also omitted. See the
HTTP::Response documentation for the full API.
$status_code = $response->code
$status_message = $response->message
The code() and message() methods return information about the outcome of the request. code() returns
a numeric status code, and message() returns its human-readable equivalent. You can also provide these
methods with an argument in order to set the corresponding field.
$text = $response->status_line
The status_line() method returns the status code followed by the message in the same format returned
by the Web server.
$boolean = $response->is_success
$boolean = $response->is_redirect
$boolean = $response->is_info
$boolean = $response->is_error
These four methods return true if the response was successful, is a redirection, is informational, or is an error,
respectively.
$html = $response->error_as_HTML
If is_error() returns true, you can call error_as_HTML() to return a nicely formatted HTML document
describing the error.
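These status methods can be tried out on a hand-built response. Constructing an HTTP::Response directly is unusual in real programs and is done here only so that the sketch runs without a server.

```perl
use strict;
use warnings;
use HTTP::Response;

# A synthetic response, standing in for one returned by a server.
my $response = HTTP::Response->new(404, 'Not Found');

print $response->code, "\n";
print $response->message, "\n";
print $response->status_line, "\n";

if ($response->is_success) {
    print "Document retrieved\n";
}
elsif ($response->is_error) {
    # error_as_HTML() wraps the status line in a small HTML page.
    print $response->error_as_HTML;
}
```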
$base = $response->base
The base() method returns the base URL for the response. This is the URL to use to resolve relative links
contained in the returned document. The value returned by base() is actually a URI object, and can be used
to "absolutize" relative URLs. See the URI module documentation for details.
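As a brief sketch of absolutizing a link, the following resolves a relative URL against a base. Here $base is built by hand as a stand-in for the value that $response->base would return; the URLs are placeholders.

```perl
use strict;
use warnings;
use URI;

# Stand-in for the URI object returned by $response->base.
my $base = URI->new('http://www.example.com/docs/index.html');

# Resolve a relative link found in the document against the base.
my $absolute = URI->new_abs('../images/logo.gif', $base);
print "$absolute\n";
```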
$request = $response->request
The request() method returns a copy of the HTTP::Request object that generated this response. This may
not be the same HTTP::Request that you constructed. If the server generated a redirect or authentication
request, then the request returned by this method is the object generated internally by LWP.
$previous = $response->previous
previous() returns the HTTP::Response object that preceded the current one. This can be used
to follow a chain of redirect responses back to the original request. If there is no previous response, this method
returns undef.
Figure 9.3 shows a simple script named follow_chain.pl that uses the previous() method
to show all the intermediate redirects between the requested URL and the retrieved URL. It begins
just like the get_url.pl script of Figure 9.1, but uses the HEAD method to retrieve information about
the URL without fetching its content. After retrieving the HTTP::Response, we call previous()
repeatedly to retrieve all intermediate responses. Each response's URL and status line are prepended
to a growing list of URLs, forming a response chain. At the end, we format the response chain a bit
and print it out.
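The chain-walking step can be sketched on its own. This is not the listing from Figure 9.3: the response_chain() helper is a hypothetical name, and the two responses are constructed by hand so that the example runs without a network connection.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Walk back through previous() and return the chain, oldest response first.
sub response_chain {
    my $response = shift;
    my @chain;
    while ($response) {
        unshift @chain, $response->request->uri . ' (' . $response->status_line . ')';
        $response = $response->previous;
    }
    return @chain;
}

# Synthetic redirect followed by a successful response.
my $first = HTTP::Response->new(302, 'Found');
$first->request(HTTP::Request->new(HEAD => 'http://www.example.com/old'));
my $last = HTTP::Response->new(200, 'OK');
$last->request(HTTP::Request->new(HEAD => 'http://www.example.com/new/'));
$last->previous($first);

print "Response chain:\n ", join("\n -> ", response_chain($last)), "\n";
```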
Here is the result of fetching a URL that has been moved around a bit in successive reorganizations
of my laboratory's Web site:
% follow_chain.pl http://stein.cshl.org/software/WWW
Response chain:
http://stein.cshl.org/software/WWW (302 Found)
-> http://stein.cshl.org/software/WWW/ (301 Moved Permanently)
-> http://stein.cshl.org/WWW/software/ (200 OK)
LWP::UserAgent
The LWP::UserAgent class is responsible for submitting HTTP::Request objects to remote servers,
and encapsulating the response in a suitable HTTP::Response. It is, in effect, a Web browser engine.
In addition to retrieving remote documents, LWP::UserAgent knows how to mirror them so that the
remote document is transferred only if the local copy is not as recent. It handles Web pages that
require password authentication, stores and returns HTTP cookies, and knows how to negotiate
HTTP proxy servers and redirect responses.
Unlike HTTP::Response and HTTP::Request, LWP::UserAgent is frequently subclassed to customize
the way that it interacts with the remote server. We will see examples of this in a later section.
$agent = LWP::UserAgent->new
The new() method constructs a new LWP::UserAgent object. It takes no arguments. You can reuse one user
agent multiple times to fetch URLs.
$response = $agent->request ($request, [$dest [,$size]])
The request() method issues the provided HTTP::Request, returning an HTTP::Response. A response is
returned even on failed requests. You should call the response's is_success() or code() methods to
determine the exact outcome.
The optional $dest argument controls where the response content goes. If it is omitted, the content is placed
in the response object, where it can be recovered with the content() and content_ref() methods.
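A sketch of the basic request cycle follows. So that it runs without network access, the request is directed at a file: URL for a temporary file; with an http: URL the calls would be the same. The file contents are placeholder data.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::file;
use File::Temp qw(tempfile);

# A local document to fetch, so the example needs no server.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print $fh "hello from LWP\n";
close $fh;

my $agent    = LWP::UserAgent->new;
my $request  = HTTP::Request->new(GET => URI::file->new_abs($path));
my $response = $agent->request($request);

# A response object comes back even on failure, so check the outcome.
if ($response->is_success) {
    print $response->content;
}
else {
    warn "Request failed: ", $response->status_line, "\n";
}
```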
If $dest is a scalar, it is treated as a filename. The file is opened for writing, and the retrieved
document is stored to it. Because LWP prepends a > symbol to the filename, you cannot use
command pipes or other tricks. Because the content is stored to the file, the response object indicates
successful completion of the task, but content() returns undef.
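A sketch of saving a document straight to disk is shown below. As an assumption made to keep it runnable offline, the source document is a temporary file fetched through a file: URL, and the destination is a second temporary file.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::file;
use File::Temp qw(tempfile);

# Source document (placeholder data) and an empty destination file.
my ($src_fh, $src) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print $src_fh "line $_\n" for 1 .. 100;
close $src_fh;
my ($dst_fh, $dest) = tempfile(UNLINK => 1);
close $dst_fh;

my $agent    = LWP::UserAgent->new;
my $request  = HTTP::Request->new(GET => URI::file->new_abs($src));

# A scalar second argument is treated as the name of a file to write to.
my $response = $agent->request($request, $dest);

print $response->status_line, "\n";
print "Saved ", -s $dest, " bytes to $dest\n" if $response->is_success;
```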
$dest can also be a reference to a callback subroutine. In this case, the content data is passed to
the indicated subroutine at regular intervals, giving you a chance to do something with the data, like
pass it to an HTML parser. The callback subroutine should look something like this:
sub handle_content {
my ($data,$response,$protocol) = @_;
...
}
The three arguments passed to the callback are the current chunk of content data, the current
HTTP::Response object, and an LWP::Protocol object. The response object is provided so that the
subroutine can make intelligent decisions about how to process the content, such as piping data of
type image/jpeg to an image viewer. The LWP::Protocol object implements protocol-specific access
methods that are used by LWP internally. It is unlikely that you will need it.
If you use a code reference for $dest, you can exercise some control over the content chunk size
by providing a $size argument. For example, if you pass 512 for $size, the callback will be called
repeatedly with 512-byte chunks of the content data.
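As a rough illustration of the callback form of request(), the sketch below fetches a small temporary file through a file: URL so that it runs without a network connection. The temporary file, its 2,000-byte size, and the two counter variables are assumptions made for the example; only request(), the three callback arguments, and the $size hint come from the description above.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::file;
use File::Temp qw(tempfile);

# Hypothetical test document: 2,000 bytes written to a temporary file.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh 'x' x 2000;
close $fh or die "close: $!";

my $agent   = LWP::UserAgent->new;
my $request = HTTP::Request->new(GET => URI::file->new_abs($path));

my ($chunks, $bytes) = (0, 0);

# The callback receives each chunk of content as it arrives.
sub handle_content {
    my ($data, $response, $protocol) = @_;
    $chunks++;
    $bytes += length $data;
}

# Ask for chunks of about 512 bytes; the size is a hint, not a guarantee.
my $response = $agent->request($request, \&handle_content, 512);

if ($response->is_success) {
    print "received $bytes bytes in $chunks chunk(s)\n";
} else {
    print "request failed: ", $response->status_line, "\n";
}
```

Because the content is handed to the callback, it is not stored in the response object; the counters are the only record of what arrived.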
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
Prepared for [email protected], Stjepan Maric
Copyright © 2001 by Addison-Wesley. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from
the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.
Web Clients 203
Two methods allow you to set time and space limits on requests.
$timeout = $agent->timeout([$timeout])
timeout() gets or sets the timeout on requests, in seconds. The default is 180 seconds (3 minutes). If the
timeout expires before the request completes, the returned response has a status code of 500, and a message
indicating that the request timed out.
$bytes = $agent->max_size([$bytes])
The max_size() method gets or sets a maximum size on the response content returned by the remote server.
If the content exceeds this size, then the content is truncated and the response object contains an
X-Content-Range: header indicating the portion of the document returned. Typically, this header has the format bytes
start-end, where start and end are the start and endpoints of the document portion.
By default, the size is undef, meaning that the user agent will accept content of any length.
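A minimal sketch of both limits follows. The particular values (30 seconds and 64K) are arbitrary choices for illustration, not recommendations.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $agent = LWP::UserAgent->new;

$agent->timeout(30);            # give up after 30 seconds instead of 180
$agent->max_size(64 * 1024);    # keep at most 64K of response content

# Called without arguments, both methods return the current setting.
printf "timeout=%d seconds, max_size=%d bytes\n",
    $agent->timeout, $agent->max_size;
# prints: timeout=30 seconds, max_size=65536 bytes
```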
$id = $agent->agent([$id])
The agent() method gets or sets the User-Agent: field that LWP will send to HTTP servers. It has the form
name/x.xx (comment), where name is the client software name, x.xx is the version number, and (comment) is
an optional comment field. By default, LWP uses libwww-perl/x.xx, where x.xx is the current module version
number.
You may need to change the agent ID to trigger browser-specific behavior in the remote server. For example,
this line of code changes the agent ID to Mozilla/4.7, tricking the server into thinking it is dealing with a Netscape
version 4.X series browser running on a Palm Pilot:
$agent->agent('Mozilla/4.7 [en] (PalmOS)')
$address = $agent->from([$address])
The from() method gets or sets the e-mail address of the user responsible for the actions of the user agent.
It is incorporated into the From: field used in mail and news postings, and will be issued, along with other fields,
to HTTP servers. You do not need to provide this information when communicating with HTTP servers, but it
can be provided in Web crawling robots as a courtesy to the remote site.
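For a Web robot, the two identification methods might be used together as in this sketch. The robot name and the e-mail address are hypothetical placeholders.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $agent = LWP::UserAgent->new;

$agent->agent('example-robot/0.1');       # hypothetical name/version
$agent->from('webmaster@example.com');    # hypothetical contact address

# Called without arguments, both methods return the current value.
print "User-Agent: ", $agent->agent, "\n";
print "From: ",       $agent->from,  "\n";
```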
A number of methods control how the agent interacts with proxies, which are commonly used when
the client is behind a firewall that doesn't allow direct Internet access, or in situations where
bandwidth is limited and the organization wishes to cache frequently used URLs locally.
The proxy() method sets or gets the proxy servers used for requests. The first argument, $protocol, is
either a scalar containing the name of a protocol to proxy, such as "ftp", or an array reference that lists several
protocols to proxy, such as ['ftp','http','gopher']. The second argument, $proxy, is the URL of the
proxy server to use. For example:
$agent->proxy([qw(ftp http)] => 'http://proxy.cshl.org:8080')
You may call this method several times if you need to use a different proxy server for each protocol:
$agent->proxy(ftp => 'http://proxy1.cshl.org:8080');
$agent->proxy(http => 'http://proxy2.cshl.org:9000');
As this example shows, HTTP servers are commonly used to proxy FTP requests as well as HTTP requests.
$agent->no_proxy(@domain_list)
Call the no_proxy() method to deactivate proxying for one or more domains. You would typically use this to
turn off proxying for intranet servers that you can reach directly. This code fragment disables proxying for the
"localhost" server and all machines in the "cshl.org" domain:
$agent->no_proxy('localhost','cshl.org')
Calling no_proxy() with an empty argument list clears the list of proxyless domains. It cannot be used to
return the current list.
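Putting proxy() and no_proxy() together, a configuration along the lines described above might look like the following sketch. It only configures the agent and reads the settings back; no request is sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $agent = LWP::UserAgent->new;

# One proxy per protocol, as in the text.
$agent->proxy(ftp  => 'http://proxy1.cshl.org:8080');
$agent->proxy(http => 'http://proxy2.cshl.org:9000');

# Reach local machines directly.
$agent->no_proxy('localhost', 'cshl.org');

# Called with only a protocol name, proxy() returns the current setting.
for my $protocol (qw(ftp http)) {
    print "$protocol => ", $agent->proxy($protocol), "\n";
}
```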
$agent->env_proxy
env_proxy() is an alternative way to set up proxies. Instead of taking proxy information from its argument
list, this method reads proxy settings from *_proxy environment variables. These are the same environment
variables used by UNIX and Windows versions of Netscape. For example, a C-shell initialization script might
set the FTP and HTTP proxies this way:
setenv ftp_proxy http://proxy1.cshl.org:8080
setenv http_proxy http://proxy2.cshl.org:9000
setenv no_proxy localhost,cshl.org
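To see env_proxy() at work without touching your shell startup files, you can set the variables from inside the script. This sketch replaces %ENV temporarily so that the result does not depend on the surrounding environment; that localization is a device of the example, not something env_proxy() requires.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $agent = LWP::UserAgent->new;

{
    # Replace the environment for this block only, so the outcome does not
    # depend on whatever proxy variables the calling shell has set.
    local %ENV = (
        ftp_proxy  => 'http://proxy1.cshl.org:8080',
        http_proxy => 'http://proxy2.cshl.org:9000',
        no_proxy   => 'localhost,cshl.org',
    );
    $agent->env_proxy;
}

# The settings picked up from the environment persist after the block.
print "http proxy: ", ($agent->proxy('http') // '(not set)'), "\n";
print "ftp proxy:  ", ($agent->proxy('ftp')  // '(not set)'), "\n";
```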
Lastly, the agent object offers several methods for controlling authentication and cookies.
We won't go through the complete HTTP::Cookies API, which allows you to examine and manipulate
cookies, but here is the idiom to use if you wish to accept cookies for the current session, but not
save them between sessions:
$agent->cookie_jar(HTTP::Cookies->new);
Here is the idiom to use if you wish to save cookies automatically in a file named .lwp-cookies for
use across multiple sessions:
my $file = "$ENV{HOME}/.lwp-cookies";
$agent->cookie_jar(HTTP::Cookies->new(file=>$file,autosave=>1));
Finally, here is how to tell LWP to use an existing Netscape-format cookies file, assuming that it is
stored in your home directory in the file ~/.netscape/cookies (Windows and Mac users must modify
this accordingly):
my $file = "$ENV{HOME}/.netscape/cookies";
$agent->cookie_jar(HTTP::Cookies::Netscape->new(file=>$file,
autosave=>1));
LWP Examples
Now that we've seen the LWP API, we'll look at some practical examples that use it.
We ask the script to retrieve RFCs 2616, 1945 and 11. The status reports indicate that RFC 2616
was retrieved OK, RFC 1945 did not need to be retrieved because the local copy is current, and
that RFC 11 could not be retrieved because no such file exists on the remote server (there is, in
fact, no RFC 11).
The code, shown in Figure 9.5, is only 15 lines long.
Lines 1–8: Load modules and create user agent—The setup of the LWP::UserAgent is identical
to the previous example, except that we modify the usage message and the user agent ID
appropriately.
Lines 9–15: Main loop—We read RFC numbers from the command line. For each RFC, we
construct a local filename of the form rfcXXXX.html, where XXXX is the number of the requested
document. We append this to the RFC server's base URL in order to obtain the full remote URL.
In contrast with the previous example, we don't need to create an HTTP::Request in order to do
mirroring. We simply pass the remote URL and local filename to the agent's mirror() method,
obtaining an HTTP::Response in return. We then print the status message returned by the
response object's message() method.
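The main loop just described can be sketched roughly as follows. This is not the listing from Figure 9.5: the base URL, the local_name() helper, and the output format are assumptions made for the illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumed base URL for the RFC documents; adjust for the server you use.
use constant RFCS => 'http://www.faqs.org/rfcs/';

# Local filename of the form rfcXXXX.html for a given RFC number.
sub local_name {
    my ($rfc) = @_;
    return "rfc$rfc.html";
}

my $ua = LWP::UserAgent->new;

for my $rfc (@ARGV) {
    my $filename = local_name($rfc);
    my $url      = RFCS . $filename;

    # mirror() transfers the document only if the local copy is out of date.
    my $response = $ua->mirror($url, $filename);
    print "RFC $rfc: ", $response->message, "\n";
}
```

Run with no arguments, the script does nothing; each RFC number given on the command line produces one mirror() call and one status line.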
You can simulate the submission of a fill-out form from within LWP provided that you know what
arguments the remote server is expecting and how it is expecting to receive them. Sometimes the
remote Web site documents how to call its server-side scripts, but more often you have to reverse
engineer the script by looking at the fill-out form's source code.
For example, the Internet FAQ Consortium provides a search page at http://www.faqs.org/rfcs/ that
includes, among other things, a form for searching the RFC archive with text search terms. By
navigating to the page in a conventional browser and selecting the "View Source" command, I
obtained the HTML source code for the page. Figure 9.6 shows an excerpt from this page, which
contains the definition for the search form (it's been edited slightly to remove extraneous formatting
tags).
Figure 9.6. Definition of the HTML form used by the FAQ Consortium's RFC search script
In HTML, fill-out forms start with a <FORM> tag and end with </FORM>. Between the two tags are
one or more <INPUT> tags, which create simple fields like text entry fields and buttons,
<SELECT> tags, which define multiple-choice fields like scrolling lists and pop-up menus, and
<TEXTAREA> tags, which create large text entry fields with horizontal and vertical scrollbars.
Form elements have a NAME attribute, which assigns a name to the field when it is sent to the Web
server, and optionally a VALUE attribute, which assigns a default value to the field. <INPUT> tags
may also have a TYPE attribute that alters the appearance of the field. For example, TYPE="text"
creates a text field that the user can type in, TYPE="checkbox" creates an on/off checkbox, and
TYPE="hidden" creates an element that isn't visible in the rendered HTML, but nevertheless has
its name and value passed back to the server when the form is submitted.
The <FORM> tag itself has two required attributes. METHOD specifies how the contents of the fill-out
form are to be sent to the Web server, and may be either GET or POST. We'll talk about the
implications of the method later. ACTION specifies the URL to which the form fields are to be sent.
It may be a full URL or an abbreviated form relative to the URL of the HTML page that contains the
form.
Occasionally, the ACTION attribute may be missing entirely, in which case the form fields should be
submitted to the URL of the page in which the form is located. Strictly speaking, this is not valid
HTML, but it is widely used.
In the example in Figure 9.6, the RFC search form consists of two elements. A text field named
"query" prompts the user for the text terms to search for, and a menu named "archive" specifies
which part of the archive to search in. The various menu choices are specified using a series of
<OPTION> tags, and include the values "rfcs", "rank", and "rfcindex". There is also a submission
button, created using an <INPUT> tag with a TYPE attribute of "submit". However, because it has
no NAME attribute, its contents are not included in the information to the server. Figure 9.7 shows
what this looks like when rendered by a browser.
When the form is submitted, the browser bundles the current contents of the form into a "query
string" using a MIME format known as application/x-www-form-urlencoded. This format consists of
a series of name=value pairs, where the names and values are taken from the form elements and
their current values. Each pair is separated by an ampersand (&) or semicolon (;). For example, if
we typed "MIME types" into the RFC search form's text field and selected "Search RFC Index" from
the pop-up menu, the query string generated by the browser would be:
query=MIME%20types&archive=rfcindex
Notice that the space in "MIME types" has been turned into the string %20. This is a hexadecimal
escape for the space character (0x20 in ASCII). A number of characters are illegal in query strings,
and must be escaped in this way. As we shall see, the URI::Escape module makes it easy to create
escaped query strings.
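As a small preview, here is one way to build the query string above by hand with URI::Escape. The %field hash and the fixed field order are choices made for this sketch; each name and value is escaped on its own before the pairs are joined.

```perl
use strict;
use warnings;
use URI::Escape;    # exports uri_escape() and uri_unescape()

my %field = (query => 'MIME types', archive => 'rfcindex');

# Escape each name and value separately, then join the pairs with "&".
my $query_string = join '&',
    map { uri_escape($_) . '=' . uri_escape($field{$_}) } qw(query archive);

print "$query_string\n";                     # query=MIME%20types&archive=rfcindex
print uri_unescape('MIME%20types'), "\n";    # MIME types
```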
The way the browser sends the query string to the Web server depends on whether the form
submission method is GET or POST. In the case of GET, a "?" followed by the query string is appended
directly to the end of the URL indicated by the <FORM> tag's ACTION attribute. For example:
http://www.faqs.org/cgi-bin/rfcsearch?query=MIME%20types&archive=rfcindex
In the case of a form that specifies the POST method, the correct action is to POST a request to
the URL indicated by ACTION, and pass the query string as the request content.
It is very important to send the query string to the remote server in the way specified by the
<FORM> tag. Some server-side scripts are sufficiently flexible to recognize and deal with both GET
and POST requests in a uniform way, but many are not.
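The difference between the two methods is easy to see by building both kinds of request with HTTP::Request and printing them. This sketch constructs the requests but does not send them; the variable names are illustrative only.

```perl
use strict;
use warnings;
use HTTP::Request;

my $action = 'http://www.faqs.org/cgi-bin/rfcsearch';
my $query  = 'query=MIME%20types&archive=rfcindex';

# GET: the query string rides on the URL after a "?".
my $get = HTTP::Request->new(GET => "$action?$query");

# POST: the query string becomes the request content.
my $post = HTTP::Request->new(POST => $action);
$post->content_type('application/x-www-form-urlencoded');
$post->content($query);

print $get->as_string, "\n", $post->as_string;
```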
In addition to query strings of type application/x-www-form-urlencoded, some fill-out forms use a
newer encoding system called multipart/form-data. We will talk about dealing with such forms in the
section File Uploads Using multipart/form-data.
Our next sample script is named search_rfc.pl. It invokes the server-side script located at
http://www.faqs.org/cgi-bin/rfcsearch to search the RFC index for documents having some relevance to
the search terms given on the command line. Here's how to search for the term "MIME types":
search_rfc.pl works by simulating a user submission of the fill-out form shown in Figures 9.6
and 9.7. We generate a query string containing the query and archive fields, and POST it to the
server-side search script. We then extract the desired information from the returned HTML
document and print it out.
To properly escape the query string, we use the uri_escape() function, provided by the LWP
module named URI::Escape. uri_escape() replaces disallowed characters in URLs with their
hexadecimal escapes. Its companion, uri_unescape(), reverses the process.
Figure 9.8 shows the code for the script.
Lines 1–4: Load modules—We turn on strict syntax checking and load the LWP and URI::Escape
modules. URI::Escape imports the uri_escape() and uri_unescape() functions
automatically.
Lines 5–7: Define constants—We define one constant for the URL of the remote search script,
and another for the page on which the fill-out form is located. The latter is needed to properly fill
out the Referer: field of the request, for reasons that we will explain momentarily.
Lines 8–10: Create user agent—This code is identical to the previous examples, except for the
user agent ID.
Lines 11–12: Construct query string—We interpolate the command-line arguments into a string
and use it as the value of the fill-out form's query field. We are interested in searching the
archive's RFC index, so we use "rfcindex" as the value of the archive field. These are incorporated
into a properly formatted query string and escaped using uri_escape().
Lines 13–15: Construct request—We create a new POST request on the remote search script,
and use the returned request object's content() method to set the content to the query string.
We also alter the request object's Referer: header so that it contains the fill-out form's URL. This
is a precaution. For consistency, some server-side scripts check the Referer: field to confirm
that the request came from a fill-out form located on their own server, and refuse to service
requests that do not contain the proper value. Although the Internet FAQ Consortium's search
script does not seem to implement such checks, we set the Referer: field here in case they
decide to do so in the future.
As an aside, the ease with which we are able to defeat the Referer: check illustrates why this
type of check should never be relied on to protect server-side Web scripts from misuse.
Lines 16–17: Submit request—We pass the request to the LWP::UserAgent's request()
method, obtaining a response object. We check the response status with is_success(), and
die if the method indicates a failure of some sort.
Lines 18–21: Fetch and parse content—We retrieve the returned HTML document by calling the
response object's content() method and assign it to a scalar variable. We now need to extract
the RFC name and title from the document's HTML. This is easy to do because the document
has the predictable structure shown in Figures 9.9 (screenshot) and 9.10 (HTML source). Each
matching RFC is an item in an ordered list (HTML tag <OL>) in which the RFC number is
contained within an <A> tag that links to the text of the RFC, and the RFC title is contained between
a pair of <STRONG> tags.
Figure 9.10. HTML code for the RFC Index Search results
We use a simple global regular expression match to find and match all lines referring to RFCs,
extract the RFC name and title, and print the information to standard output.
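A global match of this kind might look like the sketch below. The HTML fragment is a hypothetical sample in the general shape described above (an ordered list, an <A> tag around the RFC name, <STRONG> tags around the title), and the pattern is one plausible choice rather than the expression used in Figure 9.8; the real page's markup may differ.

```perl
use strict;
use warnings;

# Hypothetical fragment in the general shape described in the text.
my $html = <<'END';
<OL>
<LI><A HREF="/rfcs/rfc2045.html">RFC 2045</A> - <STRONG>MIME Part One: Format of Internet Message Bodies</STRONG>
<LI><A HREF="/rfcs/rfc2046.html">RFC 2046</A> - <STRONG>MIME Part Two: Media Types</STRONG>
</OL>
END

# Collect [name, title] pairs; /g walks through every match in turn.
my @hits;
while ($html =~ m{<A\s[^>]*>(RFC\s+\d+)</A>.*?<STRONG>([^<]+)</STRONG>}gis) {
    push @hits, [$1, $2];
}

print "$_->[0]\t$_->[1]\n" for @hits;
```

Pattern matching against HTML is fragile, which is one reason the script is sensitive to changes in the page layout.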
An enhancement to this script would be to provide an option to fetch the text of each RFC returned
by the search. One way to do this would be to insert a call to $ua->request() for each matched
RFC. Another, and more elegant, way would be to modify get_rfc.pl from Figure 9.4 so as to
accept its list of RFC numbers from standard input. This would allow you to fetch the content of each
RFC returned by a search by combining the two commands in a pipeline:
% search_rfc.pl MIME type | get_rfc.pl
Because the Internet FAQ Consortium has not published the interface to its search script, there is
no guarantee that they will not change either the form of the query string or the format of the HTML
document returned in response to searches. If either of these things happens, search_rfc.pl will break.
This is a chronic problem for all such Web client scripts and a compelling reason to check at each
step of a complex script that the remote Web server is returning the results you expect.
This script contains a subtle bug in the way it constructs its query strings. Can you find it? The bug
is revealed in the next section.
Using POST(), here's how we could construct a request to the Internet FAQ Consortium's RFC index
search engine:
my $request = POST('http://www.faqs.org/cgi-bin/rfcsearch',
[ query => 'MIME types',
archive => 'rfcindex' ]
);
And here's how to do the same thing but setting the Referer: header at the same time:
my $request = POST('http://www.faqs.org/cgi-bin/rfcsearch',
[ query => 'MIME types',
archive => 'rfcindex' ],
Referer => 'http://www.faqs.org/rfcs');
Notice that the field/value pairs of the request content are contained in an array reference, but the
name/value pairs of the request headers are a simple list.
As an alternative, you may provide the form data as the argument to a pseudoheader field named
Content:. This looks a bit cleaner when setting both request headers and form content:
my $request = POST('http://www.faqs.org/cgi-bin/rfcsearch',
Content => [ query => 'MIME types',
archive => 'rfcindex' ],
Referer => 'http://www.faqs.org/rfcs');
POST() will take care of URI escaping the form fields and constructing the appropriate query string.
Using HTTP::Request::Common, we can rewrite search_rfc.pl as shown in Figure 9.11. The new
version is identical to the old except that it uses POST() to construct the fill-out form submission
and to set the Referer: field of the outgoing request (lines 12–17). Compared to the original version
of the search_rfc.pl script, the new script is easier to read. More significantly, however, it is less prone
to bugs. The query-string generator from the earlier version contains a bug that causes it to
generate broken query strings when given a search term that contains either of the characters "&" or
"=". For example, given the search term "mime&types", the original version generates the string:
query=mime&types&archive=rfcindex
The manual fix would be to replace "&" with "%26" and "=" with "%3D" in the search terms
before constructing the query string and passing it to uri_escape(). However, the POST()-
based version handles this automatically, and generates the correct content:
query=mime%26types&archive=rfcindex
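The contrast can be reproduced without contacting the server. In this sketch the $naive string stands in for plain interpolation of the search term (an illustration of the bug, not the code from Figure 9.8), while the request built by POST() is only inspected, never sent.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

my $terms = 'mime&types';

# Naive interpolation: the "&" inside the search term splits the field.
my $naive = "query=$terms&archive=rfcindex";

# POST() escapes each field value before joining the pairs.
my $request = POST('http://www.faqs.org/cgi-bin/rfcsearch',
                   Content => [ query => $terms, archive => 'rfcindex' ],
                   Referer => 'http://www.faqs.org/rfcs');

print "naive:  $naive\n";
print "POST(): ", $request->content, "\n";
```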
The POST method is always used with this type of encoding. multipart/form-data uses an encoding
scheme that is extremely similar to the one used for multipart MIME enclosures. Each form element
is given its own subpart with a Content-Disposition: of "form-data", a name containing the field name,
and body data containing the value of the field. For uploaded files, the body data is the content of
the file.
Although conceptually simple, it's tricky to generate the multipart/form-data format correctly.
Fortunately, the POST() function provided by HTTP::Request::Common can also generate requests
compatible with multipart/form-data. The key is to provide POST() with a Content_Type: header
argument of "form-data":
my $request = POST('http://www.faqs.org/cgi-bin/rfcsearch',
Content_Type => 'form-data',
Referer => 'http://www.faqs.org/rfcs',
Content => [ query => 'MIME types',
archive => 'rfcindex' ]
);
This generates a request to the RFC search engine using the multipart/form-data encoding scheme.
But don't try it: the RFC FAQ site doesn't know how to handle this scheme.
To tell LWP to upload a file, the value of the corresponding form field must be an array reference
containing at least one element:
$fieldname => [ $file, $filename, header1=>$value.... ]
The mandatory first element in the array, $file, is the path to the file to upload. The optional
$filename argument is the suggested name to use for the file, and is similar to the MIME::Entity
Filename argument. This is followed by any number of additional MIME headers. The one used
most frequently is Content_Type:, which gives the server script the MIME type of the uploaded file.
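The following sketch shows the array-reference form in use. The temporary file, the suggested name sample.txt, and the text/plain type are assumptions for the example; the request is built and printed but not sent.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use File::Temp qw(tempfile);

# Hypothetical file to upload, created here so the sketch is self-contained.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "hello upload\n";
close $fh or die "close: $!";

my $request = POST(
    'http://stein.cshl.org/WWW/software/CGI/examples/file_upload.cgi',
    Content_Type => 'form-data',
    Content      => [
        # [ $file, $filename, header => value ... ]
        filename => [ $path, 'sample.txt', Content_Type => 'text/plain' ],
        count    => 'count lines',
    ],
);

# Nothing is sent yet; we only inspect the request that POST() built.
print $request->as_string;
```

Printing the request shows one MIME subpart per form field, with the file's contents as the body of the upload part.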
To illustrate how this works, we'll write a client for the CGI script located at
http://stein.cshl.org/WWW/software/CGI/examples/file_upload.cgi. This is a script that I wrote some
years ago to illustrate how CGI scripts accept and process uploaded files. The form that drives the
script (Figures 9.12 and 9.14) contains a single file field named filename, and three checkboxes
named count with the values "count lines", "count words", and "count characters". There's also a
hidden field named .cgifields with a value of "count".
After form submission, the script reads the uploaded file and counts its lines, words, and/or
characters, depending on which checkboxes are selected. It prints these statistics, along with the name
of the file and its MIME type, if any (Figure 9.13).
We will now develop an LWP script to drive this CGI script. remote_wc.pl reads a file from the
command line or standard input and uploads it to file_upload.cgi. It parses the HTML result and
prints the word count returned by the remote server:
% remote_wc.pl ~/public_html/png.html
lines = 20; words = 47; characters = 362
This is a pretty difficult way to perform a word count, but it does illustrate the technique! Figure
9.15 gives the code for remote_wc.pl.
Lines 1–4: Load modules—We turn on strict syntax checking and load the LWP and
HTTP::Request::Common modules.
Lines 5–7: Process arguments—We define a constant for the URL of the CGI script and recover
the name of the file to upload from the command line.
Lines 8–21: Create user agent and request—We create the LWP::UserAgent in the usual way.
We then create the request using the POST() function, passing the URL of the CGI script as
the first argument, a Content_Type argument of "form-data", and a Content argument containing
the various fields used by the upload form.
Notice that the count field appears three times in the Content array, once for each of the
checkboxes in the form. The value of the filename field is an anonymous array containing the file path
provided on the command line. We also provide values for the .cgifields hidden field and the
submit button, even though it isn't clear that they are necessary (they aren't, but unless you have
the documentation for the remote server script, you won't know this).
Lines 22–23: Issue request—We call the user agent's request() method to issue the POST,
and get a response object in return. As in earlier scripts, we check the is_success() method
and die if an error occurs.
Lines 24–27: Extract results—We call the response's content() method to retrieve the HTML
document generated by the remote script, and perform a pattern match on it to extract the values
for the line, word, and character counts (this regular expression was generated after some
experimentation with sample HTML output). Before exiting, we print the extracted values to
standard output.
If you wish to try this script with the URL given in the example, the username is "perl" and the
password is "programmer".
Figure 9.16 shows the code for get_url2.pl. Except for an odd little idiom, it's straightforward. We
are going to declare a subclass of LWP::UserAgent, but we don't want to create a whole module
file just to override a single method. Instead, we arrange for the script itself (package "main") to be
a subclass of LWP::UserAgent, and override the get_basic_credentials() method directly in
the main script file. This is a common, and handy, trick.
Lines 1–6: Load modules—We turn on strict syntax checking and load LWP. We also load the
PromptUtil module (listed in Appendix A), which provides us with the get_passwd() function
for prompting the user for a password without echoing it to the screen.
We set the @ISA array to make sure that the current package is a subclass of LWP::UserAgent.
Lines 7–12: Issue request, print content—The main section of the script is identical to the original
get_url.pl, with one exception. Instead of calling LWP::UserAgent->new() to create a new
user agent object, we call __PACKAGE__->new(). The Perl interpreter automatically replaces the
__PACKAGE__ token with the name of the current package ("main" in this case), creating the
desired LWP::UserAgent subclass.
Lines 13–20: Override get_basic_credentials() method—This section of the code overrides
get_basic_credentials() with a custom subroutine. The subclass behaves exactly like
LWP::UserAgent until it needs to fetch authentication information, at which point this subroutine
is invoked.
We are called with three arguments, consisting of the user agent object, the authentication realm,
and the URL that has been requested. We prompt the user for a username, and then call
get_passwd() to prompt and fetch the user's password. These are returned to the caller as a
two-element list.
An interesting characteristic of this script is that if the username and password aren't entered
correctly the first time, LWP invokes get_basic_credentials() once more and the user is
prompted to try again. If the credentials still aren't accepted, the request fails with an "Authorization
Required" status. This nice "second try" feature appears to be built into LWP.
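The subclassing idiom can be sketched in a few lines. This is a minimal sketch, not the get_url2.pl listing: instead of prompting with PromptUtil, it assumes the credentials are supplied in two environment variables, HTTP_USER and HTTP_PASS, whose names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Make the script's own package ("main") a subclass of LWP::UserAgent.
our @ISA = ('LWP::UserAgent');

# __PACKAGE__ evaluates to the current package name, so this builds our subclass.
my $agent = __PACKAGE__->new;

# LWP calls this method when a server demands Basic authentication.
# Assumption: credentials come from the environment rather than a prompt.
sub get_basic_credentials {
    my ($self, $realm, $url) = @_;
    return ($ENV{HTTP_USER}, $ENV{HTTP_PASS});
}

print ref($agent), "\n";
```

Everything else about the agent is inherited, so request() and the other LWP::UserAgent methods behave as usual.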
In this section, we demonstrate how to use HTML::Formatter to transform HTML into nicely
formatted plain text or postscript. Then we show some examples of using HTML::Parser for the more
general task of extracting information from HTML files.
Formatting HTML
The HTML::Formatter module is the base class for a family of HTML formatters. Only two members
of the family are currently implemented. HTML::FormatText takes an HTML document and produces
nicely formatted plain text, and HTML::FormatPS creates postscript output. Neither subclass of
HTML::Formatter handles inline images, forms, or tables. In some cases, this can be a big limitation.
There are two steps to formatting an HTML file. The first step is to parse the HTML into a parse tree,
using a specialized subclass of HTML::Parser named HTML::TreeBuilder. The second step is to
pass this parse tree to the desired subclass of HTML::Formatter to output the formatted text.
Figure 9.17 shows a script named format_html.pl that uses these modules to read an HTML file from
the command line or standard input and format it. If given the --postscript option, the script produces
postscript output suitable for printing. Otherwise, it produces plain text.
Lines 1–4: Load modules—We turn on strict syntax checking and load the Getopt::Long and
HTML::TreeBuilder modules. The former processes the command-line arguments, if any. We
don't load any HTML::Formatter modules at this time because we don't know yet whether to
produce plain text or postscript.
Lines 5–7: Process command-line options—We call the GetOptions() function to parse the
command-line options. This sets the global variable $PS to true if the --postscript option is
specified.
Lines 8–15: Create appropriate formatter—If the user requested postscript output, we load the
HTML::FormatPS module and invoke the class's new() method to create a new formatter object.
Otherwise, we do the same thing with the HTML::FormatText class. When creating an
HTML::FormatPS formatter, we pass the new() method a PaperSize argument of "Letter" in
order to create output compatible with the common 8½ × 11" letter stock used in the United
States.
Lines 16–18: Parse HTML—We create a new HTML::TreeBuilder parser by calling the class's
new() method. We then read the input HTML one line at a time using the <> operator and pass
it to the parser object. When we are done, we tell the parser so by calling its eof() method.
This series of operations leaves the HTML parse tree in the parser object itself, in a variable
named $tree.
Lines 19–20: Format and output the tree—We pass the parse tree to the formatter's format()
method, yielding a formatted string. We print this, and then clean up the parse tree by calling its
delete() method.
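The two-step process of parsing and then formatting can be sketched as follows. This is a minimal sketch under stated assumptions, not the format_html.pl listing: the HTML is a hypothetical in-memory sample fed to the parser in chunks, and the margins are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical sample document, split into chunks as if read line by line.
my @chunks = (
    "<html><head><title>Demo</title></head>\n",
    "<body><h1>Heading</h1>\n",
    "<p>Hello <b>world</b>, this is a test.</p></body></html>\n",
);

# Step 1: build the parse tree.
my $tree = HTML::TreeBuilder->new;
$tree->parse($_) for @chunks;
$tree->eof;                       # no more data is coming

# Step 2: hand the tree to a formatter.
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
my $text = $formatter->format($tree);
print $text;

$tree->delete;                    # release the tree explicitly
```

Swapping HTML::FormatText for another HTML::Formatter subclass changes only step 2.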
$formatter = HTML::FormatText->new([leftmargin=>$left,rightmargin=>$right])
HTML::FormatText->new() takes two optional arguments, leftmargin and rightmargin, which set the left
and right page margins, respectively. The margins are measured in characters. If not specified, the left and
right margins default to 3 and 72, respectively. It returns a formatter object ready for use in converting HTML
to text.
$formatter = HTML::FormatPS->new([option1=>$val1, option2=>$val2...])
Similarly, HTML::FormatPS->new() creates a new formatter object suitable for rendering HTML into post-
script. It accepts a larger list of argument/value pairs, the most common of which are listed here:
• PaperSize sets the page height and width appropriately for printing. Acceptable values are A3, A4, A5, B4,
B5, Letter, Legal, Executive, Tabloid, Statement, Folio, 10x14, and Quarto. United States users take note!
The default PaperSize is the European A4. You should change this to Letter if you wish to print on common
8½ x 11" paper.
• LeftMargin, RightMargin, TopMargin, and BottomMargin control the page margins. All are given in point
units.
• FontFamily sets the font family to use in the output. Recognized values are Courier, Helvetica, and
Times, the default.
• FontScale allows you to increase or decrease the font size by some factor. For example, a value of 1.5 will
scale the font size up by 50 percent.
Once a formatter is created, you can use it as many times as you like to format HTML::TreeBuilder
objects.
$text = $formatter->format($tree)
Pass an HTML parse tree to the format() method. The returned value is a scalar containing the formatted
output, which you can then print, save to disk, or send to a print spooler.
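The HTML::FormatPS options listed above can be combined in one call to new(). The following is a minimal sketch: the sample HTML and the particular option values are our own choices, and it assumes that HTML::FormatPS and the font metrics it depends on are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatPS;

# Hypothetical sample document.
my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><body><h1>Report</h1><p>Printed on Letter paper.</p></body></html>');
$tree->eof;

my $formatter = HTML::FormatPS->new(
    PaperSize  => 'Letter',      # override the A4 default
    FontFamily => 'Helvetica',
    FontScale  => 1.2,           # 20 percent larger than normal
    LeftMargin => 72,            # margins are in points; 72 points = 1 inch
);

my $ps = $formatter->format($tree);
$tree->delete;

print length($ps), " bytes of PostScript generated\n";
```

The string in $ps can be written to a file or piped to a print spooler.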
$tree = HTML::TreeBuilder->new
The new() method takes no arguments. It returns a new, empty HTML::TreeBuilder object.
$result = $tree->parse_file($file)
The parse_file() method accepts a filename or filehandle and parses its contents, storing the parse tree
directly in the HTML::TreeBuilder object. If the parse was successful, the result is the tree object itself; if
something went wrong (check $! for the error message), the result is undef.
$result = $tree->parse($data)
With the parse() method, you can parse an HTML file in chunks of arbitrary size. $data is a scalar that
contains the HTML text to process. Typically you will call parse() multiple times, each time with the next
section of the document to process. We will see later how to take advantage of this feature to begin HTML
parsing while the file is downloading. If parse() is successful, it returns the HTML::TreeBuilder
object; if something goes wrong during parsing, it returns undef.
$tree->eof
Call this method when using parse(). It tells HTML::TreeBuilder that no more data is coming and allows it to
finish the parse.
Figure 9.17 (format_html.pl) is a good example of using parse() and eof() to parse the HTML file
on standard input one line at a time.
$tree->delete
When you are finished with an HTML::TreeBuilder tree, call its delete() method to clean up. Unlike other
Perl objects, which are automatically destroyed when they go out of scope, you must be careful to call
delete() explicitly when working with HTML::TreeBuilder objects or risk memory leaks. The HTML::Element
POD documentation explains why this is so.
Many scripts combine HTML::TreeBuilder object creation with file parsing using this idiom:
$tree = HTML::TreeBuilder->new->parse_file('rfc2010.html');
However, the HTML::TreeBuilder object created this way will never be deleted, and will leak
memory. If you are parsing files in a loop, always create the HTML::TreeBuilder object, call its
parse_file() method, and then call its delete() method.
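The create, parse, delete sequence can be sketched as a small helper called in a loop. This is a minimal sketch: the html_title() helper is a hypothetical name, and extracting the <title> text stands in for whatever work a real script would do with the tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the <title> text of an HTML file, or undef.
sub html_title {
    my ($file) = @_;
    my $tree  = HTML::TreeBuilder->new;      # 1. create
    my $ok    = $tree->parse_file($file);    # 2. parse
    my $title;
    if ($ok) {
        my $element = $tree->look_down(_tag => 'title');
        $title = $element->as_text if $element;
    }
    $tree->delete;                           # 3. delete, even on failure
    return $title;
}

for my $file (@ARGV) {
    my $title = html_title($file);
    print "$file: ", (defined $title ? $title : '(no title)'), "\n";
}
```

Because each tree is deleted before the next file is opened, memory use stays flat no matter how many files are named on the command line.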
The parse tree returned by HTML::TreeBuilder is actually a very feature-rich object. You can
recursively descend through its nodes to extract information from the HTML file, extract hypertext
links, modify selected HTML elements, and then convert the whole thing back into printable HTML.
However, the same functionality is also available in a more flexible form in the HTML::Parser class,
which we cover later in this chapter. For details, see the HTML::TreeBuilder and HTML::Element
POD documentation.
Lines 1–6: Load modules—We bring in LWP, PromptUtil, HTML::FormatText, and the
HTML::TreeBuilder modules.
Lines 7–11: Set up request—We set up the HTTP::Request as we did in earlier iterations of this
script. Because the remote server may require authentication, the script is again made a
subclass of LWP::UserAgent so that we can override the get_basic_credentials() method
and prompt the user when necessary.
Lines 12–14: Send the request—We send the request using the agent's request() method.
However, instead of allowing LWP to leave the returned content in the HTTP::Response object
for retrieval, we give request() a second argument containing a reference to the
process_document() subroutine. This subroutine is responsible for parsing incoming HTML
documents.
process_document() leaves the HTML parse tree, if any, in the global variable
$html_tree, which we declare here. After the request() is finished, we check the status of
the returned HTTP::Response object and die with an explanatory error message if the request
failed for some reason.
Lines 15–20: Format and print the HTML—If the requested document is HTML, then
process_document() has parsed it and left the tree in $html_tree. We check to see whether the
tree is nonempty. If so, we call its eof() method to tell the parser to finish, and pass the tree
to a newly created HTML::FormatText object to create a formatted string that we immediately
print. We are now done with the parse tree, so we call its delete() method.
As we shall see, process_document() prints all non-HTML documents immediately, so
there's no need to take further action for non-HTML documents.
Lines 21–29: The process_document() subroutine—LWP::UserAgent invokes callbacks
with three arguments consisting of the downloaded data, the current HTTP::Response object,
and an LWP::Protocol object.
We call the response object's content_type() method to get the MIME type of the incoming
document. If the type is text/html, then we pass the data to the parse tree's parse() method.
If necessary, we create the HTML::TreeBuilder first, using the ||= operator so that the call to
HTML::TreeBuilder->new() is executed only if the $html_tree variable is undefined.
If the content type is something other than text/html, then we immediately print the data. This is
a significant improvement to earlier versions of get_url.pl because it means that non-HTML data
starts to appear on standard output as soon as it arrives from the remote server.
Lines 30–38: The get_basic_credentials() subroutine—This is the same subroutine we
looked at in get_url2.pl.
This script does not check for the case in which the response does not provide a content type. Strictly
speaking it should do so, as the HTTP specification allows (but strongly discourages) Web servers
to omit this field. Run the script with the -w switch to detect and report this case. Useful
enhancements to get_url3.pl might include using HTML::FormatPS for printing support, or adapting the script
to use external viewers to display non-HTML MIME types the way we did in the pop_fetch.pl script
of Chapter 8.
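The callback approach can be sketched as follows. This is a minimal sketch rather than the get_url3.pl listing: it leaves out the authentication subclass, takes the URL from the command line, and, as an addition of our own, treats a missing content type as an empty string so that the comparison never sees an undefined value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::TreeBuilder;
use HTML::FormatText;

my $html_tree;    # filled in by process_document() for HTML responses

# Called by LWP for each chunk of data as it arrives.
sub process_document {
    my ($data, $response, $protocol) = @_;
    my $type = $response->content_type || '';   # guard against a missing header
    if ($type eq 'text/html') {
        $html_tree ||= HTML::TreeBuilder->new;  # create the tree on first use
        $html_tree->parse($data);
    }
    else {
        print $data;                            # pass non-HTML straight through
    }
}

if (my $url = shift @ARGV) {
    my $agent    = LWP::UserAgent->new;
    my $request  = HTTP::Request->new(GET => $url);
    my $response = $agent->request($request, \&process_document);
    die "$url: ", $response->status_line, "\n" unless $response->is_success;

    if ($html_tree) {
        $html_tree->eof;
        print HTML::FormatText->new->format($html_tree);
        $html_tree->delete;
    }
}
```

Run without arguments the script does nothing; given a URL it prints formatted text for HTML and raw data for everything else.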
has the name img and the two attributes src and alt.
In HTML, tags can be paired or unpaired. Paired tags enclose some content, which can be plain
text or can contain other tags. For example, this fragment of HTML
consists of a paragraph section, starting with the <p> tag and ending with its mate, the </p> tag.
Between the two is a line of text, a portion of which is itself enclosed in a pair of <strong> tags
(indicating strongly emphatic text). HTML and XML both constrain which tags can occur within
others. For example, a <title> section, which designates some text as the title of a document, can
occur only in the <head> section of an HTML document, which in turn must occur in an <html>
section. See Figure 9.19 for a very minimal HTML document.
In addition to tags, an HTML document may contain comments, which are ignored by rendering
programs. Comments begin with the characters <!-- and end with --> as in:
<!-- ignore this -->
HTML files may also contain markup declarations, contained within the characters <! and >. These
provide meta-information to validators and parsers. The only HTML declaration you are likely to see
is the <!DOCTYPE ...> declaration at the top of the file that indicates the version of HTML the
document is (or claims to be) using. See the top of Figure 9.19 for an example.
Because the "<" and ">" symbols have special significance, all occurrences of these characters in
proper HTML have to be escaped to the "character entities" &lt; and &gt;, respectively. The
ampersand has to be escaped as well, to &amp;. Many other character entities are used to represent
nonstandard symbols such as the copyright sign or the German umlaut.
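Escaping and unescaping need not be done by hand. The following minimal sketch uses the HTML::Entities module, which is distributed alongside HTML::Parser but is not otherwise discussed in this section; the sample string is our own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

my $raw     = 'if (a < b && b > c)';
my $escaped = encode_entities($raw);      # safe to embed in an HTML page
my $back    = decode_entities($escaped);  # recover the original text

print "$escaped\n";    # prints: if (a &lt; b &amp;&amp; b &gt; c)
print "$back\n";       # prints: if (a < b && b > c)
```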
XML syntax is a stricter and more regular version of HTML's. Instead of allowing both paired and
unpaired tags, XML requires all tags to be paired. Tag and attribute names are case sensitive
(HTML's are not), and all attribute values must be enclosed in quotes. If an element is empty,
meaning that there is nothing between the start and end tags, XML allows you to abbreviate this as
an "empty element" tag. This is a start tag that begins with <tagname and ends with />. As an
illustration of this, consider these two XML fragments, both of which have exactly the same meaning:
<img src="/icons/arrow.gif" alt="arrow"></img>
<img src="/icons/arrow.gif" alt="arrow" />
Using HTML::Parser
HTML::Parser is event driven. It parses through an HTML document, starting at the top and
traversing the tags and subtags in order until it reaches the end. To use it, you install handlers for
events that you are interested in processing, such as encountering a start tag. Your handler will be
called each time the desired event occurs.
Before we get heavily into the HTML::Parser, we'll look at a basic example. The print_links.pl script
parses the HTML document presented to it on the command line or standard input, extracts all the
links and images, and prints out their URLs. In the following example, we use get_url2.pl to fetch
the Google search engine's home page and pipe its output to print_links.pl:
Lines 1–3: Load modules—After turning on strict syntax checking, we load HTML::Parser. This
is the only module we need.
Lines 4–5: Create and initialize the parser object—We create a new HTML::Parser object by
calling its new() method. For reasons explained in the next section, we tell new() to use the
version 3 API by passing it the api_version argument.
After creating the parser, we configure it by calling its handler() method to install a handler
for start tag events. The start argument points to a reference to our print_link() subroutine;
this subroutine is invoked every time the parser encounters a start tag. The third argument to
handler() tells HTML::Parser what arguments to pass to our handler when it is called. We
request that the parser pass print_link() the name of the tag (tagname) and a hash
reference containing the tag's attributes (attr).
Lines 6–7: Parse standard input—We now call the parser's parse() method, passing it lines
read via the <> operator. When we reach the end of file, we call the parser's eof() method to
tell it to finish up. The parse() and eof() methods behave identically to the HTML::TreeBuilder
methods we looked at earlier.
Lines 8–15: The print_link() callback—Most of the program logic occurs in
print_link(). This subroutine is called during the parse every time the parser encounters a
start tag. As we specified when we installed the handler, the parser passes the subroutine the
name of the tag and a hash reference containing the tag's attributes. Both the tag name and all
the attribute names are automatically transformed to lowercase letters, making it easier to deal
with the rampant variations in case used in most HTML.
We are interested only in hypertext links, the <a> tag, and inline images, the <img> tag. If the tag
name is "a", we print a line labeled "link:" followed by the contents of the href attribute. If, on the
other hand, the tag name is "img", we print "img:" followed by the contents of the src attribute. For
any other tag, we do nothing.
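The logic just described can be sketched as follows. This is a minimal sketch, not the print_links.pl listing: to keep it self-contained it parses a hypothetical HTML string instead of reading from <>, and it collects the results in an array before printing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical input; mixed case shows that names arrive in lowercase.
my $html = '<A HREF="/about.html">About</A> '
         . '<IMG SRC="/logo.gif" ALT="logo"> '
         . '<a name="top">top</a>';

my @found;
my $parser = HTML::Parser->new(api_version => 3);
$parser->handler(start => \&print_link, 'tagname,attr');
$parser->parse($html);
$parser->eof;

print "$_\n" for @found;

# Called for every start tag with the tag name and an attribute hash reference.
sub print_link {
    my ($tag, $attr) = @_;
    push @found, "link: $attr->{href}" if $tag eq 'a'   && defined $attr->{href};
    push @found, "img: $attr->{src}"   if $tag eq 'img' && defined $attr->{src};
}
```

The named anchor in the sample has no href attribute, so it is skipped.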
$parser = HTML::Parser->new(@options)
The new() method creates a new HTML::Parser. @options is a series of option/value pairs that change
various parser settings. The most used option is api_version, which can be "2" to create a version 2 parser, or
"3" to create a version 3 parser. For backward compatibility, if you do not specify any options new() creates
a version 2 parser.
Once the parser is created, you will call handler() one or more times to install handlers.
$parser->handler($event => \&handler, $args)
The event name is one of start, end, text, comment, declaration, process, or default. The first three
events are the most common. A start event is generated whenever the parser encounters a start
tag, such as <strong>. An end event is triggered when the parser encounters an end tag, such as
</strong>. text events are generated for the text between tags. The comment event is generated
for HTML comments. declaration and process events apply primarily to XML elements. Last, the
default event is a catchall for anything that is not explicitly handled elsewhere.
$args is a string containing a comma-delimited list of information that you want the parser to pass
to the handler. The information will be passed as subroutine arguments in the exact order that they
appear in the $args list. There are many possible arguments. Here are some of the most useful:
• tagname—the name of the tag
• text—the full text that triggered the event, including the markup delimiters
• dtext—decoded text, with markup removed and entities translated
• attr—a reference to a hash containing the tag attributes and values
• self—a copy of the HTML::Parser object itself
• "string" —the literal string (single or double quotes required!)
For example, this call causes the get_text() handler to be invoked every time the parser
processes some content text. The arguments passed to the handler will be a three-element list that
contains the parser object, the literal string "TEXT", and the decoded content text:
$parser->handler('text'=>\&get_text, "self,'TEXT',dtext");
• tagname is most useful in conjunction with start and end events. Tags are automatically
downcased, so that <UL>, <ul>, and <Ul> are all given to the handler as "ul". In the case of end
tags, the "/" is suppressed, so that an end handler receives "ul" when a </ul> tag is encountered.
• dtext is used most often in conjunction with text events. It returns the nontag content of the
document, with all character entities translated to their proper values.
• The attr hash reference is useful only with start events. If requested for other events, the hash
reference will be empty.
Passing handler() a second argument of undef removes the handler for the specified event,
restoring the default behavior. An empty string causes the event to be ignored entirely.
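The argument string can be seen at work in a short script. This is a minimal sketch built around the handler() call shown above: the body of get_text() and the sample HTML are our own, chosen to show that dtext arrives with entities already decoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my @seen;    # [label, text] pairs collected by the handler

# Receives the arguments in the order given in the argument string.
sub get_text {
    my ($self, $label, $text) = @_;
    return unless $text =~ /\S/;        # skip whitespace-only text
    push @seen, [$label, $text];
}

my $parser = HTML::Parser->new(api_version => 3);
$parser->handler(text => \&get_text, "self,'TEXT',dtext");
$parser->parse('<p>Fish &amp; chips &lt;hot&gt;</p>');
$parser->eof;

print "$_->[0]: $_->[1]\n" for @seen;

# Passing undef removes the handler again.
$parser->handler(text => undef);
```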
$result = $parser->parse_file($file)
$result = $parser->parse($data)
$parser->eof
The parse_file(), parse(), and eof() methods work exactly as they do for HTML::TreeBuilder. A
handler that wishes to terminate parsing early can call the parser object's eof() method.
$bool = $parser->unbroken_text([$bool])
When processing chunks of content text, HTML::Parser ordinarily passes them to the text handler one chunk
at a time, breaking text at word boundaries. If unbroken_text() is set to a true value, this behavior changes
so that all the text between two tags is passed to the handler in a single operation. This can make some pattern
matches easier.
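The effect of unbroken_text() shows up most clearly when a document arrives in pieces. In this minimal sketch, a hypothetical paragraph is deliberately split in the middle of a word across two parse() calls; with unbroken_text() enabled, the handler should still receive the text between the tags in one piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my @pieces;    # one entry per text event

my $parser = HTML::Parser->new(api_version => 3);
$parser->unbroken_text(1);
$parser->handler(text => sub { push @pieces, $_[0] }, 'dtext');

# The chunk boundary falls inside the word "world".
$parser->parse('<p>Hello wor');
$parser->parse('ld, again</p>');
$parser->eof;

print scalar(@pieces), " text event(s): @pieces\n";
```

Commenting out the unbroken_text() call lets you compare how the same input is delivered by default.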
$bool = $parser->xml_mode([$bool])
The xml_mode() method puts the parser into a mode compatible with XML documents. This has two major
effects. First, it allows the empty element construct, <tagname/>. When the parser encounters a tag like this
one, it generates two events, a start event and an end event.
Second, XML mode disables the automatic conversion of tag and attribute names into lowercase. This is
because XML, unlike HTML, is case sensitive.
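Both effects of XML mode can be observed with a one-element document. This is a minimal sketch using a hypothetical <Item/> element; the handlers simply record each event together with the tag name they were given.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my @events;

my $parser = HTML::Parser->new(api_version => 3);
$parser->xml_mode(1);
$parser->handler(start => sub { push @events, "start:$_[0]" }, 'tagname');
$parser->handler(end   => sub { push @events, "end:$_[0]" },   'tagname');

# An empty-element tag with a mixed-case name.
$parser->parse('<Item Id="1"/>');
$parser->eof;

print join(',', @events), "\n";
```

The single tag yields a start event followed by an end event, and the name keeps its original capitalization.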
<OL>
<LI><A HREF="ref1">rfc name 1</A> - <STRONG>description 1</STRONG>
<LI><A HREF="ref2">rfc name 2</A> - <STRONG>description 2</STRONG>
...
</OL>
We want the parser to extract and print the text located within <A> and <STRONG> elements, but
only those located within an <OL> section. The text from other parts of the document, even text
in other <A> and <STRONG> elements, is to be ignored. The strategy that we will adopt is to have
the start handler detect when an <OL> tag has been encountered, and to install a text handler to
intercept and print the content of any subsequent <A> and <STRONG> elements. An end handler
will detect the </OL> tag, and remove the text handler, so that other text is not printed.
Figure 9.21 shows this new version, named search_rfc3.pl.
Lines 1–5: Load modules—In addition to the LWP and HTTP::Request::Common modules, we
load HTML::Parser.
Lines 6–18: Set up search—We create an LWP::UserAgent and a new HTTP::Request in the
same way as in the previous incarnation of this script.
Lines 19–20: Create HTML::Parser—We create a new version 3 HTML::Parser object, and install
a handler for the start event. The handler will be the start() subroutine, and it will receive a
copy of the parser object and the name of the tag.
Lines 21–22: Issue request and parse—We call the user agent's request() method to process
the request. As in the get_url3.pl script, we use a code reference as the second
argument to request() so that we can begin processing incoming data as soon as it arrives.
In this case, the code reference is an anonymous subroutine that invokes the parser's
parse() method.
After the request is finished, we call the parser's eof() method to have it finish up.
Line 23: Warn of error conditions—If the response object's is_success() method returns
false, we die with an error message. Otherwise, we do nothing: The parser callbacks are
responsible for extracting and printing the relevant information from the document.
Lines 24–31: The start() subroutine—The start() subroutine is the callback for the start
event. It is called whenever the parser encounters a start tag. We begin by recovering the parser
object and the tag name from the stack. We need to remember the tag later when we are
processing text, so we stash it in the parser object under the key last_tag. (The HTML::Parser POD
documentation informs us that the parser is a blessed hash reference, and specifically invites
us to store information there in this manner.)
If the tag is anything other than "ol", we do nothing and just return. Otherwise, we install two new
handlers. One is a handler for the text event. It will be passed the parser object and the decoded
text. The other is a handler for the end event. Like start(), it will be passed the parser object
and the name of the end tag.
Lines 32–38: The end() subroutine—The end() subroutine is the handler for the end event.
It begins by resetting the last_tag key in the parser object. If the end tag isn't equal to "ol",
we just return, doing nothing. Otherwise, we set both the text and the end handlers to undef,
disabling them.
Lines 39–45: The extract() subroutine— extract() is the handler for the text event, and
is the place where the results from the search are extracted and printed. We get a copy of the
parser object and the decoded text on the subroutine call stack. After stripping whitespace from
the text, we examine the value of the last_tag key stored in the parser object. If the last tag is
"a", then we are in the <A> section that contains the name of the RFC. We print the text, followed
by a tab. If the last tag is "strong", then we are in the section of the document that contains the
title of the RFC. We print that, followed by a newline.
The new version of search_rfc.pl is more than twice as long as the original, but it adds no new
features, so what good is it? In this case, a full-blown parse of the search results document is overkill.
However, there will be cases when you need to parse a complex HTML document and regular
expressions will become too cumbersome to use. In these cases, HTML::Parser is a life saver.
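The technique of installing and removing handlers while the parse is under way can be sketched offline. This is a minimal sketch, not the search_rfc3.pl listing: the LWP request is replaced by a hypothetical HTML string, and the output is accumulated in $out before being printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical stand-in for the search results page.
my $html = <<'END';
<p><a href="/">Home</a> <strong>not in the list</strong></p>
<OL>
<LI><A HREF="ref1">rfc1000</A> - <STRONG>First sample title</STRONG>
<LI><A HREF="ref2">rfc2000</A> - <STRONG>Second sample title</STRONG>
</OL>
<p><strong>also ignored</strong></p>
END

my $out = '';
my $parser = HTML::Parser->new(api_version => 3);
$parser->handler(start => \&start, 'self,tagname');
$parser->parse($html);
$parser->eof;
print $out;

sub start {
    my ($p, $tag) = @_;
    $p->{last_tag} = $tag;                 # remember the tag for extract()
    return unless $tag eq 'ol';
    $p->handler(text => \&extract, 'self,dtext');
    $p->handler(end  => \&end,     'self,tagname');
}

sub end {
    my ($p, $tag) = @_;
    undef $p->{last_tag};
    return unless $tag eq 'ol';
    $p->handler(text => undef);            # stop collecting text
    $p->handler(end  => undef);
}

sub extract {
    my ($p, $text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    my $last = $p->{last_tag} || '';
    $out .= "$text\t" if $last eq 'a';
    $out .= "$text\n" if $last eq 'strong';
}
```

Only the two entries inside the <OL> section are printed; the links and emphasized text before and after it never reach extract().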
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
Prepared for [email protected], Stjepan Maric
Copyright © 2001 by Addison-Wesley. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from
the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.
As the script runs, it prints the local name for the image. For example, here's what happened when
I pointed the script at http://www.yahoo.com:
% mirror_images.pl http://www.yahoo.com
m5v2.gif: OK
messengerpromo.gif: OK
sm.gif: OK
Running it again immediately gives three "Not Modified" messages. Figure 9.22 gives the complete
code listing for the script.
Lines 1–7: Load modules—We turn on strict syntax checking and load the LWP, PromptUtil,
HTTP::Cookies, HTML::Parser, and URI modules. The last module is used for its ability to
resolve relative URLs into absolute URLs.
Lines 8–11: Create the user agent—We again use the trick of subclassing LWP::UserAgent to
override the get_basic_credentials() method. The agent is stored in a variable named
$agent. Some of the remote sites we contact might require HTTP cookies, so we initialize an
HTTP::Cookies object on a file in our home directory and pass it to the agent's
cookie_jar() method. This allows the script to exchange cookies with the remote sites
automatically.
Lines 12–15: Create the request and the parser—We enter a loop in which we shift URLs off the
command line and process them. For each URL, we create a new GET request using
HTTP::Request->new(), and an HTML::Parser object to parse the document as it comes in.
We install the subroutine start() as the parse handler for the start event. This handler will
receive a copy of the parser object, the name of the start tag, and a hash reference containing
the tag's attributes and their values.
Lines 16–24: Issue the request—We call the agent's request() method to issue the request,
returning a response object. As in the last example, we provide request() with a code
reference as the second argument, causing the agent to pass the incoming data to this subroutine
as it arrives.
In this case, the code reference is an anonymous subroutine. We first check that the MIME type
of the response is text/html. If it isn't, we die with an error message. This doesn't cause the script
as a whole to die, but does abort processing of the current URL and leaves the error message
in a special X-Died: field of the response header.
Otherwise, the incoming document is parseable as an HTML file. Our handler is going to need
two pieces of extra information: the base URL of the current response for use in resolving relative
URLs, and the user agent object so that we can issue requests for inline images. We use the
same technique as in Figure 9.21, and stash this information into the parser's hash reference.
Lines 25–27: Warn of error conditions—After the request has finished, we check the response
for the existence of the X-Died: header and, if it exists, issue a warning. Likewise, we print the
response's status message if the is_success() method returns false.
Lines 28–37: The start() handler—The start() subroutine is invoked by the parser to
handle start tags. As called for by the argument list passed to handler(), the subroutine receives
a copy of the parser object, the name of the current tag, and a hash reference containing tag
attributes.
We check whether we are processing an <IMG> tag. If not, we return without taking further
action. We then check that the tag's src attribute is defined, and if so, copy it to a local variable.
The src attribute contains the URL of the inline image, and may be an absolute URL like
http://www.yahoo.com/images/messengerpromo.gif, or a relative one like
images/messengerpromo.gif. To fetch image source data, we must resolve relative URLs into absolute
URLs so that we can request them via the LWP user agent. We must also construct a local
filename for our copy of the image.
Absolutizing relative URLs is an easy task thanks to the URI module. The URI->new_abs()
method constructs a complete URL given a relative URL and a base. We obtain the base URL
of the document containing the image by retrieving the "base" key from the parser hash where
we stashed it earlier. This is passed to new_abs() along with the URL of the image (line 33),
obtaining an absolute URL. If the URL was already absolute, calling new_abs() doesn't hurt.
The method detects this fact and passes the URL through unchanged.
Constructing the local filename is a matter of extracting the filename part of the path (line 34),
using a pattern match to extract the rightmost component of the image URL.
We now call the user agent's mirror() method to copy the remote image to our local filesystem
and print the status message. Notice how we obtain a copy of the user agent from the parser
hash reference. This avoids having to create a new user agent.
Lines 38–46: The get_basic_credentials() method—This is identical to earlier versions.
There is a slight flaw in mirror_images.pl as it is now written. All images are mirrored to the same
directory, and no attempt is made to detect image name clashes between sites, or even within the
same site when the image paths are flattened (as might occur, for example, when mirroring remote
images named /images/whats_new.gif and /news/hot_news/whats_new.gif).
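The clash is easy to reproduce. The following is a minimal sketch, not the script's own code: flat_name() is a hypothetical helper that keeps only the rightmost path component, and the host name is borrowed from the earlier example purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the rightmost component of the URL path,
# which is how a flattened local filename would be derived.
sub flat_name {
    my $url = shift;
    my ($name) = $url =~ m!([^/]+)$!;
    return $name;
}

my %seen;    # local filename => number of images mapped onto it
for my $url ('http://www.yahoo.com/images/whats_new.gif',
             'http://www.yahoo.com/news/hot_news/whats_new.gif') {
    my $name  = flat_name($url);
    my $clash = $seen{$name}++ ? ' (clashes with an earlier image)' : '';
    print "$url -> $name$clash\n";
}
```

Both URLs map to whats_new.gif, so the second image would overwrite the first.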
To make the script fully general, you might want to save each image in a separate subdirectory
named after the remote hostname and the path of the image within the site. We can do this relatively
painlessly by combining the URI host() and path() methods with the mkpath() and
dirname() functions imported from the File::Path and File::Basename modules, respectively. The
relevant section of start() would now look like this:
...
use File::Path 'mkpath';
use File::Basename 'dirname';
...
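sub start {
    ...
    # resolve the image URL against the base URL of the enclosing document
    my $remote_name = URI->new_abs($url,$parser->{base});
    # name the local copy after the remote host and the image's path on the site
    my $local_name = $remote_name->host . $remote_name->path;
    # create any missing directories (mode 0711) before mirroring
    mkpath(dirname($local_name),0,0711);
    ...
}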
For the image URL http://www.yahoo.com/images/whats_new.gif, this will mirror the file
into the subdirectory www.yahoo.com/images.
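To try the directory logic without LWP or URI installed, the sketch below makes two labeled assumptions: a regular expression stands in for the URI host() and path() methods, and the tree is created under a scratch directory from File::Temp rather than the current directory.

```perl
use strict;
use warnings;
use File::Path 'mkpath';
use File::Basename 'dirname';
use File::Temp 'tempdir';

my $root = tempdir(CLEANUP => 1);   # assumption: mirror into a scratch directory
my $url  = 'http://www.yahoo.com/images/whats_new.gif';

# Stand-in for URI's host() and path(): split "scheme://host[:port]/path".
my ($host, $path) = $url =~ m!^\w+://([^/:]+)(?::\d+)?(/[^?#]*)!
    or die "Can't parse $url";

my $local_name = "$root/$host$path";
mkpath(dirname($local_name), 0, 0711);   # creates .../www.yahoo.com/images
print "would save the image as $local_name\n";
```

In the real script the URI object is the better choice, since it also copes with relative URLs and unusual syntax that this pattern ignores.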
Summary
The LWP module allows you to write scripts that act as World Wide Web clients. You can retrieve
Web pages, simulate the submission of fill-out forms, and easily negotiate more obscure aspects
of the HTTP protocol, such as cookies and user authentication.
The HTML-Formatter and HTML-Parser modules enhance LWP by giving you the ability to format
and parse HTML files. These modules allow you to transform HTML into text or PostScript for printing,
and to extract interesting information from HTML files without resorting to error-prone regular
expressions. As an added benefit, HTML::Parser can parse XML.
There's more to LWP than can be covered in a single book chapter. A good way to learn more about
the package is to examine the lwp-request, lwp-download, and lwp-rget scripts, and other examples
that come with the package.
The next seven chapters lead you through the process of creating novel TCP-based network
services. We develop a variety of real-life applications and illustrate the tradeoffs to consider when
choosing server architectures.
Although the simple TCP servers developed in Chapters 4 and 5 (Figures 4.2 and 5.4) are
straightforward, they suffer from a significant deficiency. Both of these servers work by servicing one
client at a time. While they are working on one client, other clients can't connect.¹
Connection-oriented servers must overlap their I/O by providing some sort of concurrency among
the multiple sessions. This chapter discusses various techniques for doing so.
Forking Server
The server spends its time in an accept() loop. Each time a new incoming connection is accepted,
the server forks, creating an identical child process. The task of handling the new connection's I/O
is handed off to the child, and the parent goes back to listening for new connections (Figure 10.1).
When the child is finished handling the connection, it simply exits.
¹ Technically, they do connect, but the operating system queues them until the script calls accept(). No I/O can occur until the server has
finished servicing the previous connection.
In a forking server, the multitasking nature of the operating system allows parent and child to run
simultaneously. At any point in time there is a single parent process and multiple child processes,
each child dedicated to handling a different client connection.
This technique is available on platforms that implement fork(): all UNIX versions of Perl, and
version 5.6 and higher on Win32 platforms. The Macintosh port of Perl does not currently support
fork().
A special case of the multitasked server is the Inetd "super daemon," which can be used to write
simple concurrent servers without worrying too much about the details. We look at Inetd at the end
of this chapter.
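The accept-and-fork pattern of the forking server can be sketched in a few lines. This is a minimal illustration, not the server developed in this chapter: it accepts a single connection instead of looping, listens only on the loopback address with a port chosen by the operating system, uses a toy service that upper-cases each line, and forks a throwaway client so the example is self-contained. All of these are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Demo assumptions: loopback only, OS-chosen port, upper-casing "service".
my $listen = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
    Proto     => 'tcp',
    Timeout   => 10,      # keeps accept() from hanging forever
) or die "Can't create listening socket: $!";
my $port = $listen->sockport;

# A throwaway client in its own process.
my $client = fork();
die "Can't fork: $!" unless defined $client;
if ($client == 0) {
    my $s = IO::Socket::INET->new(PeerAddr => '127.0.0.1',
                                  PeerPort => $port,
                                  Proto    => 'tcp') or POSIX::_exit(2);
    $s->print("hello\n");
    my $reply = <$s>;
    POSIX::_exit(defined($reply) && $reply eq "HELLO\n" ? 0 : 1);
}

# The server proper: accept, fork, and let the child do the talking.
my $connected = $listen->accept or die "accept() failed or timed out";
my $child = fork();
die "Can't fork: $!" unless defined $child;
if ($child == 0) {                  # child: handle this one client
    close $listen;                  # the child never calls accept()
    while (my $line = <$connected>) {
        $connected->print(uc $line);
    }
    POSIX::_exit(0);
}
close $connected;                   # parent: the child owns the session now

waitpid($client, 0);                # reap the demo client
my $client_status = $? >> 8;
waitpid($child, 0);                 # reap the session handler
print "client exited with status $client_status\n";
```

A real server would wrap the accept() and fork() steps in a loop and reap its children from a CHLD handler rather than with blocking waitpid() calls.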
Multithreaded Server
Next in complexity is multithreading. Conceptually similar to the previous solution, the server calls
accept() in a tight loop. Each time accept() returns a connected socket, the server launches a
new thread of execution to handle the client session. Threads are similar to processes, but threads
share the same memory and other resources. When the thread is done, it exits. In this model, there
are multiple simultaneous threads of execution, one handling the main accept() loop, and the
others handling client sessions.
This technique is available in Perl versions 5.005 and higher, and only on platforms that support
threads. The Windows version of Perl supports threads, as do many (but not all) UNIX versions.
MacPerl does not currently support multithreading. We discuss multithreading in Chapter 11.
Multiplexed Server
The most complex technique uses the select() function to interweave communications sessions.
This technique takes advantage of the fact that the network is slower than the CPU, and that most
of the time in a network server is spent waiting for a socket to become ready for reading or writing.
In this technique, the server creates and maintains a pool of filehandles, one for the listen socket,
and one for each connected client. Each time through the loop, the server checks the sockets using
the select() function to ascertain whether any of them is ready for reading or writing. If so, the server
handles the I/O for that socket and then goes back to waiting with select().
select() is available on all major Perl platforms. Unfortunately, the technique is also the trickiest
to use correctly. We discuss multiplexing in Chapters 12 and 13.
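The readiness test at the heart of a multiplexed server can be tried out with the standard IO::Select wrapper around select(). In this minimal sketch two pipes stand in for two client sockets, an assumption made so that the example needs no network, and only the second one has data waiting.

```perl
use strict;
use warnings;
use IO::Select;

# Two pipes stand in for two connected clients.
pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";
syswrite($w2, "hello\n");               # only the second "client" speaks

my $pool  = IO::Select->new($r1, $r2);  # the pool of handles to watch
my @ready = $pool->can_read(1);         # wait at most 1 second
my $got   = '';
for my $fh (@ready) {
    sysread($fh, my $data, 1024);       # will not block: select() said so
    $got .= $data;
}
print scalar(@ready), " handle(s) ready, read: $got";
```

Only the handle with pending data is returned, so the loop never blocks on the silent one.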
Built on top of these basic techniques are a number of variations, including preforking, thread pools,
and nonblocking I/O. There are also more esoteric methods for achieving concurrency, including
signal-driven I/O, asynchronous I/O, and others. We don't cover those here; for further information
on these techniques, see [Stevens 1998].
Our running example for this and the next two chapters uses the Chatbot::Eliza module, John
Nolan's marvelous pure-Perl clone of Joseph Weizenbaum's classic psychotherapist simulation.
Chatbot::Eliza works very simply. It accepts a line of input from the user, transforms it according to a
clever set of transformational rules, and echoes it back to the user in the form of a question. The
effect is a lot like talking to a particularly robotlike Freudian psychoanalyst.
The Chatbot::Eliza module is available on CPAN. It is simple to use. You load it, create a new
Chatbot::Eliza object with the new() method, and then invoke the object's command_interface()
method. This method conducts a conversation with the user, reading from standard input
and writing to standard output. The lines of code in Figure 10.2 are all that are needed to run a
psychotherapist session.
% eliza.pl
Eliza: Hello, I am a computer program.
you: I am a human being.
Eliza: How long have you been a human being?
you: About 39 years.
Eliza: That is interesting. Please continue.
you: It started when I was born...
Eliza: Why do you tell me you were born now?
you: Because you asked.
Eliza: We were discussing you -- not me.
you: bye
Eliza: Goodbye. It was nice talking to you.
The psychotherapist application calls for a connection-oriented interface because the interaction
between client and server is long-lived and because there is state associated with each connection.
The Chatbot::Eliza object keeps track of the history of the conversation in order to construct its
utterances based on the user's previous statements.
Zombies
We've already used fork(): In Chapter 2 we used it in a toy example to distribute the load of a
computation across two child processes (Figure 2.5), and in Chapter 5 we used it to avoid
synchronization and deadlock problems in the gab2.pl script (Figure 5.8).
One difference between those examples and the forking server examples in this chapter is the
relative longevity of the parent and the child processes. In the earlier examples, the parent process
does not survive the demise of its children for any significant length of time. The parent exits soon
after its children do.
In forking servers, however, the parent process is very long-lived. Web servers, for example, run
for months at a time. The children, however, live only as long as a client connection, and a server
may spawn thousands of children during its lifetime. Under this scenario, the issue of "zombie
processes" becomes important.
Once fork() is called, parent and child processes are almost, but not quite, free to go their own
ways. The UNIX system maintains a tenuous connection between the two processes. If the child
exits before the parent does, the child process does not disappear, but instead remains in the system
process table in a mummified form known as a "zombie." The zombie remains in the process table
for the sole purpose of being able to deliver its exit status code to the parent process when and if
the parent process asks for it using the wait() or waitpid() call, a process known as "reaping."
This is a limited form of IPC that allows the parent to find out whether a process it launched exited
successfully, and if not, why.
If a parent process forks a lot of children and does not reap them in a timely manner, zombie
processes accumulate in the process table, ultimately creating a virtual Night of the Living Dead, in
which the table fills up with defunct processes. Eventually, the parent process hits a system-imposed
limitation on the number of subprocesses it can launch, and subsequent calls to fork() fail.
To avoid this eventuality, any program that calls fork() must be prepared to reap its children by
calling wait() or waitpid() at regular intervals, preferably immediately after a child exits.
UNIX makes it convenient to call wait() or waitpid() at the appropriate time by providing the
CHLD signal. The CHLD signal is sent to a parent process whenever the state of any of its children
changes. Possible state changes include the child exiting (which is the event we're interested in)
and the child being suspended by a STOP signal. The CHLD signal does not provide information
beyond the bare-bones fact that some child's state changed. The parent must call wait() or
waitpid() to determine which child was affected and what happened to it.
$pid = wait()
This function waits for any child process to exit and then returns the PID of the terminated child. If no child is
immediately ready for reaping, the call hangs (blocks) until there is one.
If you wish to determine whether the child exited normally or because of an error, you may examine the special
$? variable, which contains the child's exit status code. A code of 0 indicates that the child exited normally.
Anything else indicates an abnormal termination. See the perlvar POD page for information on how to interpret
the contents of $?.
$pid = waitpid($pid, $flags)
This version waits for a particular child to exit and returns its PID, placing the exit status code in $?. If the child
named by $pid is not immediately available for reaping, waitpid() blocks until it is. To wait for any child to
be available as wait() does, use a $pid argument of -1.
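A short experiment makes the relationship between wait() and $? concrete. This is a sketch under simple assumptions: a single child that exits at once with an arbitrary status of 3, and POSIX::_exit() used in the child so that it terminates without running any of the parent's cleanup code.

```perl
use strict;
use warnings;
use POSIX ();

my $pid = fork();
die "Can't fork: $!" unless defined $pid;
POSIX::_exit(3) if $pid == 0;     # child: exit at once with status 3

my $reaped = wait();              # parent: blocks until the child exits
my $exit   = $? >> 8;             # high byte: the child's exit code
my $signal = $? & 127;            # low 7 bits: signal that killed it, if any
print "child $reaped exited with code $exit, signal $signal\n";
```

The same decoding of $? applies after waitpid().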
The behavior of waitpid() can be modified by the $flags argument. There are a number of
handy constants defined in the :sys_wait_h group of the standard POSIX module. These
constants can be bitwise ORed together to combine them. The most frequently used flag is WNOHANG,
which, if present, puts waitpid() into nonblocking mode. waitpid() then returns the PID of a
terminated child if one is available; if children exist but none is ready for reaping, it returns 0
immediately instead of blocking, and if there are no children at all, it returns -1. Another
occasionally useful flag is WUNTRACED, which tells waitpid() to return the PIDs of
stopped children as well as terminated ones.
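The simplest possible CHLD handler does nothing but call wait(). A minimal sketch of such a handler follows; the $signals counter and the short-lived demonstration child are assumptions added so that the fragment can be run on its own.

```perl
use strict;
use warnings;
use POSIX ();

my $signals = 0;                          # counter added for the demo only
$SIG{CHLD} = sub { wait(); $signals++ };  # reap one child per CHLD signal

my $pid = fork();
die "Can't fork: $!" unless defined $pid;
POSIX::_exit(0) if $pid == 0;             # child exits immediately

# Sleep in short steps until the handler has run (give up after about 5 s).
my $tries = 0;
select(undef, undef, undef, 0.1) until $signals || ++$tries > 50;
print "CHLD handler ran $signals time(s)\n";
```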
The effect of this is to call wait() every time the server receives a CHLD signal, immediately reaping
the child and ignoring its result code. This code works most of the time, but there are a number of
unusual situations that will break it. One such event is when a child is stopped or restarted by a
signal. In this case, the parent gets a CHLD signal, but no child has actually exited. The wait() call
stalls indefinitely, bringing the server to a halt—not at all a desirable state of affairs.
Another event that can break this simple signal handler is the nearly simultaneous termination of
two or more children. The UNIX signal mechanism can deal with only one signal of a particular type
at a time. The two termination events are bundled into a single CHLD event and delivered to the
server. Although two children need to be reaped, the server calls wait() only once, leaving an
unreaped zombie. This "zombie leak" becomes noticeable after a sufficiently long period of time.
The last undesirable situation occurs when the parent process makes calls that spawn
subprocesses, including the backtick operator (`), the system() function, and piped open()s. For these
functions Perl takes care of calling wait() for you before returning to the main body of the code.
On some platforms, however, extraneous CHLD signals leak through even though there's no
unreaped child to wait for. The wait() call again hangs.
The solution to these three problems is to call waitpid() with a PID of -1 and a flag of
WNOHANG. The first argument tells waitpid() to reap any available child. The second argument
prevents the call from hanging if no children are available for reaping. To avoid leaking zombies, you
should call waitpid() in a loop until it indicates, by returning a result code of 0 or -1, that there are no
more children to reap.
Here's the idiom:
use POSIX 'WNOHANG';
$SIG{CHLD} = \&reaper;
sub reaper {
    while ((my $kid = waitpid(-1,WNOHANG)) > 0) {
        warn "Reaped child with PID $kid\n";
    }
}
In this case we print the PID of the reaped child for the purpose of debugging. In many cases you
will ignore the child PID, but in others you'll want to examine the child PID and status code and
perform some action in case of a child that exited abnormally. We'll see examples of this in later
sections.
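To see the loop cope with several children at once, the sketch below forks three children that exit immediately with different status codes and records each status in the handler. The %status hash, the exit codes, and the polling loop at the end are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %status;                       # PID => exit code of each reaped child
$SIG{CHLD} = sub {
    # Keep reaping until no terminated child is left, because several
    # exits may be reported by a single CHLD signal.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $status{$kid} = $? >> 8;
    }
};

my @kids;
for my $code (1 .. 3) {
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;
    POSIX::_exit($code) if $pid == 0;     # child: exit with its own code
    push @kids, $pid;
}

# Sleep in short steps until every child has been reaped (about 5 s at most).
my $tries = 0;
select(undef, undef, undef, 0.1) until keys(%status) == @kids || ++$tries > 50;
printf "child %d exited with status %d\n", $_, $status{$_}
    for sort { $a <=> $b } keys %status;
```

Every child is accounted for even if the operating system delivers fewer than three CHLD signals.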
Lines 1–5: Bring in modules—We begin by loading the Chatbot::Eliza and IO::Socket modules,
and importing the WNOHANG constant from the POSIX module. We also define the port our server
will listen to, in this case 12000.
Lines 6–7: Define constants and variables—We define the default port to bind to, and initialize
a global variable, $quit, to false. When this variable becomes true, the main server loop exits.
Lines 8–11: Install signal handlers—We install a signal handler for CHLD events using a variant
of the waitpid() idiom previously discussed.
We want the server to clean up gracefully after interruption from the command line, so we create
an INT handler. This handler just sets $quit to true and returns.
Lines 12–19: Create listening socket—We create a new listening socket by calling
IO::Socket::INET->new() with the LocalPort and Listen arguments. We also specify a
Proto argument of "tcp" and a true value for Reuse, allowing this server to be killed and
relaunched without the otherwise mandatory wait for the port to be freed.
In addition to these standard arguments, we declare a Timeout of 1 hour. As we did in the reverse
echo server of Figure 5.4, this is done in order to make accept() interruptible by signals. We
want accept() to return prematurely when interrupted by INT so that we can check the status
of $quit.
Lines 20–21: Accept incoming connections—We now enter a while() loop. Each time through
the loop we call accept() to get an IO::Socket object connected to a new client.
Lines 22–27: Fork: child handles connection—Once accept() returns, instead of talking
directly to the connected socket, we immediately call fork() and save the result code in the
variable $child. If $child is undefined, then the fork() failed for some reason and we die
with an error message.
Otherwise, if the value of $child is equal to numeric 0, then we know we are inside the child
process and will be responsible for handling the communications session. As the child, we will
not call accept() again, so we close our copy of the listening socket. This closing is not strictly
necessary, but it's always a good idea to tidy up unneeded resources, and it avoids the possibility
of the child inadvertently trying to perform operations on the listen socket.
We now call a subroutine named interact(), passing it the connected socket object.
interact() manages the Eliza conversation and returns when the user terminates the connection
(by typing "bye" for example). After interact() returns, the child terminates by calling
exit().
Lines 28–29: Parent cleans up—If $child was nonzero, then we are the parent process. In this
case, we just close our copy of the connected socket and go back to the top of the loop to
accept() another connection. While we are waiting for a new connection, the child is taking
care of the old one.
Lines 30–38: interact() subroutine—The interact() subroutine is called by the child
process to handle all interaction between the client and the server. The Chatbot::Eliza
command_interface() method is hardwired to read from STDIN and write to STDOUT. But we
want it to read and write to the socket instead.
This is actually an easy problem to finesse. When we loaded IO::Socket, it also brought in methods
from its parent class, IO::Handle. Among these methods is fdopen(), which we looked at in
Chapter 1 (The IO::Handle and IO::File Modules). The fdopen() method closes an existing
filehandle and then reopens it using information from another filehandle that you give it. It works with
any filehandle, including the standard three. We call fdopen() three times, once each for STDIN,
STDOUT, and STDERR. Each time we call fdopen(), we pass it the socket object and a symbolic
access mode. STDIN is reopened for reading with a mode of <, while STDOUT and STDERR are each
reopened for writing with a mode of >. Now, almost as if by magic, writing to STDOUT or
STDERR sends data flying down the connected socket, and reading from STDIN performs a read on
the socket.
The remainder of this subroutine is identical to the command-line psychotherapist from Figure
10.2. We turn on autoflush mode so that data sent to STDOUT by Chatbot::Eliza is immediately visible
at the remote end, create a new Chatbot::Eliza object, and call its command_interface() method.
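The fdopen() trick can be tried in isolation. In this minimal sketch a pipe stands in for the connected socket, and only STDOUT is redirected; both are assumptions made to keep the example small. The redirection happens in a child process so that the parent's own STDOUT is left alone.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

pipe(my $reader, my $writer) or die "pipe: $!";

my $pid = fork();
die "Can't fork: $!" unless defined $pid;
if ($pid == 0) {                           # child: plays the part of interact()
    close $reader;
    STDOUT->fdopen($writer, ">") or POSIX::_exit(1);
    STDOUT->autoflush(1);                  # make output visible at once
    print "Hello from the child's STDOUT\n";   # goes down the pipe
    POSIX::_exit(0);
}

close $writer;                             # parent keeps only the read end
my $line = <$reader>;
waitpid($pid, 0);
print "parent read: $line";
```

Code that prints to STDOUT, as command_interface() does, needs no changes to have its output delivered to the other end.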
With the server running, you may confirm that you can telnet to port 12000 and regale the psychiatrist
simulation with your personal hang-ups and pet peeves. Without closing the previous session, open
a new window and telnet to port 12000 again. You should be able to conduct multiple simultaneous
sessions. Run the ps program to confirm that each active session creates a new copy of the server,
and that when you close a session, the corresponding process disappears immediately without
leaving a zombie. When you are done, kill the server with ^C.
The following is an example of the server in action:
% telnet localhost 12000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Eliza: How do you do. Please tell me your problem.
you: hey this really works!
Eliza: I'm not sure I understand you fully.
you: you are a server
Eliza: Perhaps you would like to be a server.
you: no
Eliza: Why not?
you: quit
Eliza: Goodbye. It was nice talking to you.
Connection closed by foreign host.
is appropriate (which happens to be the logical newline "\n" character). For another, the "you:"
prompt that the server transmits after each utterance does not end with a newline. The combined
effect of these problems is that when we point gab2.pl at the psychotherapist server's port, we see
no output.
What we need is a more general bytestream-oriented client that reads and writes its data in arbitrary
chunks as they become ready, rather than waiting for complete lines. As it turns out, very few
modifications to gab2.pl are needed to turn it into this type of client.
Figure 10.4 shows the revised script, gab3.pl. The significant changes are in the
user_to_host() and host_to_user() subroutines. Instead of the line-oriented read and write
calls of the earlier version, these subroutines now consist of tight loops using sysread() and
syswrite(). For example, here is the code fragment from host_to_user() that reads from the
socket and writes to STDOUT:
Similarly, the user_to_host() subroutine, which is responsible for copying user data to the
socket, is modified to look like this:
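A minimal sketch of such a loop follows. It is not the listing from Figure 10.4: the helper name copy_stream() is hypothetical, BUFSIZE is assumed to be 1024, and the inner loop that retries partial writes is an addition made for safety.

```perl
use strict;
use warnings;

use constant BUFSIZE => 1024;

# Hypothetical helper: copy bytes from $in to $out in whatever
# chunks arrive, until end of file. Returns the number of bytes copied.
sub copy_stream {
    my ($in, $out) = @_;
    my ($data, $total) = ('', 0);
    while (my $bytes = sysread($in, $data, BUFSIZE)) {
        my $offset = 0;
        while ($offset < $bytes) {    # syswrite() may accept only part
            my $written = syswrite($out, $data, $bytes - $offset, $offset);
            die "write failed: $!" unless defined $written;
            $offset += $written;
        }
        $total += $bytes;
    }
    return $total;
}
```

In a client built this way, one process would call copy_stream($socket, \*STDOUT) to play the host-to-user role while the other calls copy_stream(\*STDIN, $socket) for the user-to-host direction.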
The BUFSIZE parameter is relatively arbitrary. For performance it should be roughly as large as the
largest chunk of text that can be emitted by the psychotherapist server, but it will work just fine if it's
smaller or larger. In this case, I chose 1024 for the constant, which seems to work pretty well.
The significance of using sysread() here rather than read(), its buffered alternative, is that
sysread() allows partial reads. If fewer than BUFSIZE bytes are ready to be read, sysread()
returns whatever is available. read(), in contrast, blocks until it can satisfy the entire request,
delaying the psychotherapist's responses indefinitely. The same argument doesn't apply to
syswrite() versus print(), however. Since IO::Socket objects are autoflushed by default,
syswrite() and print() have exactly the same effect.
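The partial-read behavior is easy to see in isolation. The following sketch uses a pipe as a stand-in for the socket (an assumption made to keep the example self-contained) and writes a short prompt with no trailing newline, as the psychotherapist server does.

```perl
use strict;
use warnings;

use constant BUFSIZE => 1024;

# Put a short, unterminated prompt into a pipe.
pipe(my $reader, my $writer) or die "pipe failed: $!";
syswrite($writer, "you: ") or die "write failed: $!";

# Ask for up to BUFSIZE bytes; only five are waiting, and
# sysread() hands them over without waiting for more.
my $data;
my $bytes = sysread($reader, $data, BUFSIZE);
print "got $bytes bytes: '$data'\n";
```

A buffered read() of BUFSIZE bytes on the same pipe would instead wait until 1024 bytes had arrived or the writer had closed its end.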
Notes on gab3.pl
After developing gab3.pl, I was interested in how it performed relative to the send-wait-read version
of Figure 5.6 (gab1.pl) and the forking line-oriented version of Figure 5.7 (gab2.pl). To do this, I
timed the three scripts while transmitting a large text file to a conventional echo server. This test
allowed both the line-oriented scripts and the byte-stream script to function properly.
Relative to gab1.pl, I found an approximately 5-fold increase in speed, and relative to gab2.pl, a
1.5-fold increase. The efficiency gain from switching from the single-process design to the multitasking
design was dramatic. It reflects the fact that the multitasking design keeps the network pipe full and running
in both directions simultaneously, while the send-wait-read design uses only half the bandwidth at
any time, and waits to receive the entire transmission before sending a response.
Another interesting benchmark result is that when I tried replacing the built-in calls to
syswrite() and sysread() in gab3.pl with their object-oriented wrappers, I found a 20 percent
decrease in efficiency, reflecting Perl's method call overhead. This probably won't make a significant
difference in most networking applications, which are dominated by network speeds, but is worth
keeping in mind for those tight inner loops where efficiency is critical.
As an aside, while testing gab3.pl with eliza_server.pl, I discovered an apparent bug in the Eliza
module's command_interface() method. When it reads a line of input from STDIN, it never
checks for end of file. As a result, if you terminate the connection at the client side,
command_interface() goes into a very unattractive infinite loop that wastes CPU time.
The easy solution is to replace the Chatbot::Eliza::_testquit() method, which checks the
input string for words like "quit" and "bye." By checking whether the string is undefined,
_testquit() can detect end of file. Insert this definition somewhere near the bottom of the Eliza server:
sub Chatbot::Eliza::_testquit {
    my ($self,$string) = @_;
    return 1 unless defined $string; # test for EOF
    foreach (@{$self->{quit}}) { return 1 if $string =~ /\b$_\b/i };
}
The server will now detect and respond correctly to the end-of-file condition.
Autobackgrounding
In this section, we develop a routine for autobackgrounding network daemons and performing tasks
1 through 4. In Chapter 16 we discuss techniques for implementing items 5 through 7.
Figure 10.5 lists the become_daemon() subroutine, which a server process should call very early
during its initialization phase. This subroutine uses a standard UNIX trick for backgrounding and
dissociating from the controlling terminal. It forks itself (line 2) and the parent process exits, leaving
only the child in control.
2 This discussion relies heavily on the UNIX process model, and will not translate to Macintosh or Windows systems. Windows NT and 2000
users can turn Perl scripts into background services using a utility called srvany.exe. See the section Background on Windows and Macintosh
Systems later in this chapter.
The child process now starts a new process session by calling the setsid() function, provided by
the POSIX module (line 4). A session is a set of processes that share the same terminal. At any
given time only one member of the set has the privilege of reading and writing to the terminal and
is said to be in the foreground, while other members of the group remain in the background (and if
they try to do I/O to the terminal, are suspended until they are brought to the foreground). This
system is used by command shells to implement job control.
A session group is related, but not identical, to a process group. A process group is the set of
subprocesses that have been launched by a single parent, and is identified by an integer corresponding to the
PID of the group's shared ancestor. You can use the Perl getpgrp() function to fetch the process
group for a particular process, and pass kill() the negative of a process group to send the same
signal simultaneously to all members of the group. This is how the shell sends a
HUP signal to all its subprocesses just prior to exiting. A newly forked child belongs to the same
session group and process group as its parent.
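As a small illustration of the last point, the following sketch forks a child and has it report, through its exit status, whether getpgrp() gives the same answer in the child as in the parent. The variable names are chosen for this example only.

```perl
use strict;
use warnings;

my $parent_group = getpgrp();    # process group of the current process

defined(my $child = fork) or die "Can't fork: $!";
if ($child == 0) {
    # The child exits with 0 if it inherited the parent's group.
    exit(getpgrp() == $parent_group ? 0 : 1);
}
waitpid($child, 0);
my $same_group = ($? >> 8) == 0;
print "child shares the parent's process group\n" if $same_group;

# Signaling the whole group would look like this:
#   kill HUP => -$parent_group;
# It is left as a comment because it would also signal this script.
# perlfunc notes that negating the signal instead, as in
#   kill '-HUP', $parent_group;
# is the more portable spelling.
```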
setsid() does several things. It creates both a new session and a new process group, and makes
the current process the session leader. At the same time, it dissociates the current process from
the controlling terminal. The effect is to make the child process completely independent of the shell.
setsid() fails if the calling process is already a process group leader (as a command launched directly from
the shell typically is), but the earlier fork ensures that this is not the case.
After calling setsid(), we reopen the STDIN and STDOUT filehandles onto the "do nothing" special
device, /dev/null, and make STDERR a copy of STDOUT (lines 5–7). This maneuver prevents output
from the daemon from appearing on the terminal. We then call chdir() to change the current working
directory to the root directory, reset the file creation mask to 0, and set the PATH environment
variable to a small number of standard directories (line 10). We return the new process ID from the
$$ global. Because we forked, the process ID is now different from its value when the subroutine
was called, and returning the new PID explicitly in this way is a good way to remind ourselves of
that fact.
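A minimal sketch of a routine that follows these steps is shown below. It is written from the description above rather than copied from Figure 10.5, so the line numbers cited in the text do not apply to it, and the particular PATH value is an assumption.

```perl
use strict;
use warnings;
use POSIX 'setsid';

# Sketch of a daemonizing routine: fork, start a new session,
# detach the standard filehandles, and tidy up the environment.
sub become_daemon {
    defined(my $child = fork) or die "Can't fork: $!";
    exit 0 if $child;                       # parent exits; child carries on
    my $sid = setsid();                     # new session, no controlling tty
    die "Can't start a new session: $!" if !defined $sid || $sid < 0;
    open(STDIN,  '<', '/dev/null') or die "Can't reopen STDIN: $!";
    open(STDOUT, '>', '/dev/null') or die "Can't reopen STDOUT: $!";
    open(STDERR, '>&', \*STDOUT)   or die "Can't reopen STDERR: $!";
    chdir '/' or die "Can't chdir to /: $!";
    umask(0);                               # reset the file creation mask
    $ENV{PATH} = '/bin:/sbin:/usr/bin:/usr/sbin';   # assumed directory list
    return $$;                              # PID of the daemonized child
}
```

Because the parent exits inside the subroutine, only the child ever sees the return value, which is why the caller should treat the result as its new process ID.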
There are a number of variations on the become_daemon() subroutine. Stevens [1998] recommends
forking not once but twice, warning that otherwise it is possible for the first child to reacquire
a controlling terminal by deliberately reopening the /dev/tty device. However, this event is unlikely,
and few production servers bother with the second fork.
Instead of reopening the standard filehandles onto /dev/null, you may want to simply close them:
close $_ foreach (\*STDIN,\*STDOUT,\*STDERR);
However, this strategy may confuse subprocesses that expect the standard filehandles to be open,
so it is best avoided.
Finally, a few older UNIX systems, such as ULTRIX, do not have a working setsid(). On such
systems, the call to setsid() fails with a run-time error, and you can instead use the
Proc::Daemon module, available on CPAN, which contains the appropriate workarounds.
PID Files
Another feature we can add at this time is a PID file for the psychotherapist server. By convention,
servers and other system daemons write their process IDs into a file named something like /var/run/
servername.pid. Before exiting, the server removes the file. This allows the system administrator
and other users to send signals to the daemon via this shortcut:
kill -TERM `cat /var/run/servername.pid`
A clever daemon checks for the existence of this file during startup, and refuses to run if the file
exists, which might indicate that the server is already running. Very clever daemons go one step
further, and check that the process referred to by the PID file is still running. It is possible that a
previous server crashed or was killed before it had a chance to remove the file. The
open_pid_file() subroutine listed in Figure 10.6 implements this strategy.
Lines 1–3: Check whether old PID file exists— open_pid_file() is called with the path to the
PID file. Our first action is to apply the -e file test to the file to determine whether it already exists.
Lines 4–6: Check whether old PID file is valid—If the PID file exists, we go on to check whether
the process it indicates is still running. We use IO::File to open the old PID file and read the
numeric PID from it. To determine if this process is still running, we use kill() to send signal
number 0 to the indicated process. This special signal number 0 doesn't actually send a signal,
but instead returns true if the indicated process (or process group) can receive signals. If
kill() returns true, we know that the process is still running and exit with an error message.
Otherwise, if kill() returns false, then we know that the previous server process either exited
uncleanly without cleaning up its PID file, or that it is running under a different user ID and the
current process lacks the privileges to send the signal. We ignore this latter case, assuming that
the server is always launched by the same user. If this assumption is false, then our attempt to
unlink the old PID file in the next step will fail and no harm will be done.
Lines 7–9: Unlink old PID file—We write a warning to standard error and attempt to unlink the
old PID file, first checking with the -w file test operator that it is writable. If either the -w test or
the unlink() fails, we abort.
Lines 10–12: Create new PID file—The last two steps are to create a new PID file and open it
for writing. We call IO::File->new() with a combination of flags that creates the file and
opens it, but only if it does not previously exist. This prevents the file from being clobbered in
the event that the server is launched twice in quick succession, both instances check for the PID
file and find it absent, and both try to create a new PID file at about the same time. If successful,
we return the open filehandle to the caller.
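The four steps above can be sketched as follows. This is a reconstruction from the walkthrough, not the text of Figure 10.6, so the error messages and the 0644 file mode are assumptions and the line numbers in the walkthrough do not map onto it.

```perl
use strict;
use warnings;
use IO::File;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Sketch of the PID-file check: returns an open filehandle on a
# newly created PID file, or dies with an explanation.
sub open_pid_file {
    my $file = shift;
    if (-e $file) {                                  # old PID file present?
        my $old = IO::File->new($file) or die "Can't read $file: $!\n";
        my $pid = <$old>;
        close $old;
        ($pid) = defined $pid ? $pid =~ /^(\d+)/ : ();
        die "Server already running with PID $pid\n"
            if $pid && kill(0 => $pid);              # signal 0 only probes
        warn "Removing PID file for defunct server process.\n";
        die "Can't unlink PID file $file\n"
            unless -w $file && unlink $file;
    }
    # O_EXCL makes creation fail if another instance got there first.
    my $fh = IO::File->new($file, O_WRONLY | O_CREAT | O_EXCL, 0644)
        or die "Can't create $file: $!\n";
    return $fh;
}
```

The O_CREAT and O_EXCL flags together are what close the race described in the last step: if two servers start at once, only one of the two creation attempts can succeed.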
open_pid_file() should be invoked before autobackgrounding the server. This gives it a chance
to issue error messages before standard error is closed. The caller should then call become_daemon()
to get the new process ID, and write that PID to the PID file using the filehandle returned by
open_pid_file(). Here's the complete idiom:
use constant PID_FILE => '/var/run/servername.pid';
$SIG{TERM} = $SIG{INT} = sub { exit 0; };
my $fh = open_pid_file(PID_FILE);
my $pid = become_daemon();
print $fh $pid;
close $fh;
END { unlink PID_FILE if defined $pid && $pid == $$; }
By convention, the /var/run directory is used by many UNIX systems to write PID files for running
daemons. Solaris systems use /etc or /usr/local/etc.
The END{} block guarantees that the server will remove the PID file before it exits. The file is
unlinked only if the current process ID matches the process ID returned by become_daemon(). This
prevents any of the server's children from inadvertently unlinking the file.
The reason for installing signal handlers for the TERM and INT signals is to ensure that the program
exits normally when it receives these signals. Otherwise, the END{} block would not be executed
and the PID file would remain around after the server had exited.
Figure 10.7 puts all these techniques together in a new and improved forking server,
eliza_daemon.pl. There should be no surprises in this code, with the minor exception that instead of placing
the PID file inside the standard /var/run directory, the example uses /var/tmp/eliza.pid. /var/run is a
privileged directory, and to write into it we would have to be running with root privileges. However,
this carries security implications that are not discussed until Chapter 16. It is not a particularly good
idea for a root process to write into a world-writable directory such as /var/tmp for reasons discussed
in that chapter, but there's no problem doing so as an unprivileged user. This script also incorporates
the fix to the Chatbot::Eliza::_testquit() subroutine discussed earlier.
Figure 10.7. The Eliza server (forking version) with daemon code
Another point is that we create the listen socket before calling become_daemon(). This gives us
a chance to die with an error message before become_daemon() closes standard error. Chapter
16 discusses how daemons can log errors to a file or via the syslog system.
This is a bit of Bourne shell-scripting language which says that if the file /usr/sbin/xntpd exists and
is executable, then echo a message to the console and run the program.
To start our eliza_daemon.pl script at boot time, we would add a section like this one:
# start psychotherapist server
if [ -x /usr/local/bin/eliza_daemon.pl ]; then
    echo "Starting psychotherapist server..."
    /usr/local/bin/eliza_daemon.pl
fi
This assumes that eliza_daemon.pl has been installed into the /usr/local/bin directory. Before you
reboot your system, you should try executing this fragment of the shell script a few times to make
sure you've got it right.
The other boot script organizational style found on UNIX systems is derived from the AT&T family
of UNIX. In this style, startup scripts are sorted into subdirectories with names like rc0.d, rc1.d, and
so on. Depending on the operating system, these directories may be located in /etc, /etc/rc.d, or
/sbin. Each subdirectory is named after a runlevel, which controls the level of service the system will
provide. For example, in runlevel 1 (corresponding to directory rc1.d) the system may provide single-
user services only, blocking all network logins, and in runlevel 3 (rc3.d) it may allow full network
login, filesharing, and a host of other multiuser services.
You will need to determine what runlevel your system commonly runs at. This can be done by
examining /etc/inittab for the initdefault entry, or by running the runlevel command if your system
provides one.
The next step is to enter the rc*.d directory that corresponds to this runlevel. You will see a host of
scripts with names that begin with either "S" or "K"— for example, S15nfs.server and K20lp. The
scripts that begin with "S" correspond to services that are started when the system enters that
runlevel; those that begin with "K" are services that the system kills when it leaves the runlevel. On
Solaris systems, S15nfs.server starts up NFS filesharing services, and K20lp shuts down line
printing. The numbers in the script name are used to control the order in which the startup scripts are
executed, since the boot system sorts the scripts alphabetically before invoking them.
Frequently, the scripts are just symbolic links to general-purpose shell scripts that can start and stop
the service. The real script is located in a directory named init.d, located variously in /etc/init.d,
/etc/rc.d/init.d, or /sbin/init.d. When the boot system wants to launch or kill the service, it follows the link
to its location, and then invokes the script with the arguments 'start' or 'stop'.
On systems with AT&T-style boot scripts, the strategy again is to see how another service already
does it, clone and rename its startup script, and then modify it to invoke your script at startup time.
Here is an extremely simple script that can be used on many systems to start and stop the
psychotherapist daemon. Name it eliza, make it executable, and store it in /etc/init.d (or whatever is the
proper location for such scripts on your system). Then create a link from /etc/rc3.d/S20eliza to this
script, again modifying the exact path as appropriate for your operating system.
#!/bin/sh
# psychotherapist startup script
case "$1" in
  'start')
    if [ -x /usr/local/bin/eliza_daemon.pl ]; then
      echo -n "Starting psychotherapist: "
      /usr/local/bin/eliza_daemon.pl
    fi
    ;;
  'stop')
    if [ -e /var/tmp/eliza.pid ]; then
      echo -n "Shutting down psychotherapist"
      kill -TERM `cat /var/tmp/eliza.pid`
    fi
    ;;
  *)
    echo "usage: $0 {start|stop}"
    ;;
esac
Again, it's a good idea to test this script from the command line before committing it to your boot
scripts directory.
One thing to watch for is that the boot scripts run as the superuser, so your network application also
runs with superuser privileges. This is generally an undesirable feature. Chapter 14 describes how
scripts started with superuser privileges can relinquish those privileges to become an ordinary user.
Alternatively, you can use the su command to launch the script using the privileges of an ordinary
user. In the two shell scripts mentioned, replace the calls to /usr/local/bin/eliza_daemon.pl with:
su nobody -c /usr/local/bin/eliza_daemon.pl
(Under MacPerl, the $^E global returns Macintosh-specific error information.) To show the
application again, launch MacPerl, bringing it to the foreground.
Microsoft Windows offers a more generic way to turn applications into background daemons, using
its system of "services." Services are available only on Windows NT and 2000 systems. To do this,
you need two utilities: instsrv.exe and srvany.exe. These utilities are not part of the standard
Windows NT/2000 distributions, but are add-ons provided by the Windows NT/2000 Resource Kits.
There are two steps to the process. In the first step, you use instsrv.exe to define the name of the
new service. In the second step, you use the registry editor to associate the newly defined service
with the name and command-line arguments of the Perl script.
The first step is to define the new service using instsrv.exe. From the DOS command window, type
the following:
% C:\rkit\instsrv.exe PSYCHOTHERAPIST C:\rkit\srvany.exe
Replace C:\rkit with the path of the actual instsrv.exe and srvany.exe files, and PSYCHOTHERAPIST
with the name that you wish to use when referring to the network service. The next step is to
edit the registry using the Registry Editor. The usual caveats and dire warnings apply to this process.
Launch regedt32.exe and locate the following key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PSYCHOTHERAPIST
Modify this as appropriate for the service name you selected earlier. Now you add a key named
Parameters, and two values under it named Application and AppParameters. Application contains the path
to the Perl executable, and AppParameters contains the arguments passed to Perl, including the
script name and any script arguments.
Click on the PSYCHOTHERAPIST key and choose Add Key from the Edit menu. When prompted,
enter a key name of Parameters and leave the class field blank. Now select the newly created
Parameters key and invoke Add Value from the Edit menu. When prompted, enter a value name of
Application, a data type of REG_SZ (a null-terminated string), and a string containing the correct
path to the Perl executable, such as C:\Perl\bin\perl5.6.0.exe.
Select the Parameters key once again and invoke Add Value. This time enter a value name of
AppParameters, a data type of REG_SZ, and a value containing the complete path of the script and
any arguments you wish to pass to it, for example C:\scripts\eliza_server.pl (see footnote 3).
3 Don't use the version of the server that autobackgrounds itself and dissociates from the session group, because these tricks are UNIX specific.
Use the forking server from Figure 10.3 with the interact() subroutine modified for Windows systems.
Close the Registry Editor. You should now be able to go to the Services control panel and set the new
service to start automatically at system startup time. From the list of NT/2000 services, select the
psychotherapist server, and press the button labeled Startup. When prompted, change the startup type to
Automatic, and set the LogOnAs field to the name of the user you wish the server to run as. A
common choice is "System Account." Also clear the checkbox labeled "Allow service to interact with
users."
The Services control panel allows you to manually start and stop the server. If you prefer, you can
use the DOS commands NET START PSYCHOTHERAPIST and NET STOP PSYCHOTHERAPIST
to the same effect.
The psychotherapist daemon is pretty generic in its handling of incoming connections and forking.
In fact, interact() is the only place where application-specific code appears.
Now consider this version of interact():
sub interact {
    my $sock = shift;
    STDIN->fdopen($sock,"<") or die "Can't reopen STDIN: $!";
    STDOUT->fdopen($sock,">") or die "Can't reopen STDOUT: $!";
    STDERR->fdopen($sock,">") or die "Can't reopen STDERR: $!";
    exec "eliza.pl";
}
After reopening STDIN, STDOUT, and STDERR onto the socket, we simply exec() the original
command-line eliza.pl script from Figure 10.2. Assuming that eliza.pl is on the command path, Perl
launches it and replaces the current process with the new one. The command-line version of
eliza.pl runs, reading user input from STDIN and sending the psychotherapist's responses to
STDOUT. But STDIN, STDOUT, and STDERR are inherited from the parent process, so the program
is actually reading and writing to the socket. We've converted a command-line program into a server
application without changing a line of source code!
In fact, we can make this even more general by adding arguments to interact() that contain the
name and command-line arguments of a command to execute:
sub interact {
    my ($sock,@command) = @_;
    STDIN->fdopen($sock,"<") or die "Can't reopen STDIN: $!";
    STDOUT->fdopen($sock,">") or die "Can't reopen STDOUT: $!";
    STDERR->fdopen($sock,">") or die "Can't reopen STDERR: $!";
    exec @command;
}
Now any program that reads from STDIN and writes to STDOUT can be run as a server. For example,
on UNIX systems, you could rig up a simple echo server just by passing /bin/cat as the argument
to interact(). Since cat reads from STDIN and writes a copy to STDOUT, it will echo everything
it reads from the socket back to the peer.
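To make the idea concrete, here is a sketch of a tiny generic server built around the generalized interact(). The serve() wrapper is hypothetical, and it ignores CHLD signals to avoid zombies, which is a simplification rather than the reaping strategy used by the chapter's forking server.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Handle;

# Generalized interact() from the text: remap the standard
# filehandles onto the socket, then run an arbitrary command.
sub interact {
    my ($sock, @command) = @_;
    STDIN->fdopen($sock, "<")  or die "Can't reopen STDIN: $!";
    STDOUT->fdopen($sock, ">") or die "Can't reopen STDOUT: $!";
    STDERR->fdopen($sock, ">") or die "Can't reopen STDERR: $!";
    exec @command or die "Can't exec @command: $!";
}

# Hypothetical wrapper: accept connections on $listen forever,
# forking one child per connection to run @command.
sub serve {
    my ($listen, @command) = @_;
    local $SIG{CHLD} = 'IGNORE';    # let the system discard dead children
    while (my $connection = $listen->accept) {
        defined(my $child = fork) or die "Can't fork: $!";
        if ($child == 0) {
            $listen->close;
            interact($connection, @command);
        }
        $connection->close;         # parent keeps only the listen socket
    }
}
```

With a listening socket created by IO::Socket::INET->new(LocalPort => 12000, Listen => 20, Proto => 'tcp', Reuse => 1), calling serve($listen, '/bin/cat') would give the echo server described above.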
This simple way of creating network servers has not gone unnoticed by operating system designers.
UNIX (and Linux) systems have a standard daemon called inetd, which is little more than a
configurable version of this generic server capable of launching and running a variety of network services
on demand.
inetd is launched at boot time. It reads a configuration file named /etc/inetd.conf, which is essentially
a list of ports to monitor and programs to associate with each port. When a connection comes in to
one of its monitored ports, inetd calls accept() to get a new connected socket, forks, remaps the
three standard filehandles to the socket, and finally launches the appropriate program.
The advantage of this system is that instead of launching a dozen occasionally used services
manually or at boot time, inetd launches them only when they are needed. Another nice feature of
inetd is that it can be reconfigured on the fly by sending it a HUP signal. When such a signal arrives,
it rereads its configuration file and reconfigures itself if needed. This allows you to add services
without rebooting the machine.
Unfortunately, inetd is not standard on Win32 or Macintosh machines. For Windows, you can get
inetd lookalikes at the following locations:
• Cygnus Win32 tools
ftp://go.cygnus.com/pub/ftp.cygnus.com/gnu-win32
http://www.cygnus.com/misc/gnu-win32/
• Ockham Technology inetd for Windows NT (commercial)
http://www.ockham.be/inetd.html
Many years ago I used an inetd lookalike for the Macintosh, which used Apple Events to simulate
a true inetd daemon, but it no longer seems to be available on the Web.
Using inetd
With inetd we can turn the command-line psychotherapist program of Figure 10.2 into a server
without changing a line of code. Just add the following line to the bottom of the /etc/inetd.conf
configuration file:
12000 stream tcp nowait nobody /usr/local/bin/eliza.pl eliza.pl
You must have superuser access to edit this file. If there is no account named nobody, replace it
with your login name (or another of your choosing). Adjust the path to the eliza.pl script to reflect its
actual location (I suggest you use a version of the script that includes the _testquit() patch
described earlier). When you're done editing the file, restart the inetd daemon by sending it a HUP
signal. You can do this by finding its process ID (PID) using the ps command and then using the
kill command to send the signal. For example:
% ps aux | grep inetd
root 657 0.0 0.8 1220 552 ? S 07:07 0:00 inetd
lstein 914 0.0 0.5 948 352 pts/1 S 08:07 0:00 grep inetd
% kill -HUP 657
Now you can use either the standard telnet program or the gab3.pl client developed earlier in this
chapter to talk to the psychotherapist server.
Let's look at the inetd.conf entry in more detail. It's divided into seven fields delimited by whitespace
(tabs or spaces):
12000 —This is the service name or port number that the server will listen to. Be sure to check that
your system isn't already using a port before you take it (you can use the netstat program for this
purpose).
Some versions of inetd require you to use a symbolic service name in this field, such as eliza rather
than 12000. On such systems, you must manually edit the file /etc/services, add the name and port
number you desire, and then use that symbolic name in inetd.conf. For the psychotherapist daemon,
an appropriate /etc/services line would be:
eliza 12000/tcp
We would then use eliza instead of the port number as the first field in inetd.conf.
stream —This field specifies the server type, and can be either stream for connection-oriented
services that send and receive data as continuous streams, or dgram for services that send
and receive connectionless messages. Any program that reads STDIN and writes STDOUT is a
stream-based service, so we use stream here.
tcp —This specifies the communications protocol, and may be either tcp or udp (many systems
also support more esoteric protocols, but we won't discuss them here). Stream-based services use
tcp.
nowait —This tells inetd what to do after launching the server program. It can be wait, to tell
inetd to wait until the server is done before launching the program again to handle a new incoming
connection, or nowait, which allows inetd to launch the program multiple times to handle several
incoming connections at once. The most typical value for stream-based services is nowait, which
makes inetd act as a forking server. If multiple clients connect simultaneously, inetd launches a copy
of the program to deal with each one. Some versions of inetd allow you to put a ceiling on the number
of processes that can run simultaneously.
The fifth field names the user account under which inetd runs the program.
/usr/local/bin/eliza.pl—This is the full path to the program.
eliza.pl—The seventh and subsequent fields are command-line arguments to pass to the script.
This can be any number of space-delimited command-line arguments and switches. By convention,
the first argument is the name of the program. You can use the actual script name, as shown here,
or make up a different name. This value shows up in the script in the $0 variable. Other command-
line arguments appear in the @ARGV array in the usual manner.
The main "gotcha" with inetd-launched programs is that stdio buffering may cause the data to flow
unpredictably. For example, you might not see the psychotherapist's initial greeting until the program
has output a few more lines of text. This is solved by turning on autoflush, as we did in Figure 10.2.
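As a minimal sketch of what an inetd-ready program looks like, the following uses a hypothetical serve() routine (an illustration, not the psychotherapist server itself). It reads lines from one handle, writes replies to another, and turns on autoflush so that each reply leaves immediately. Under inetd the two handles would be STDIN and STDOUT.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical line-oriented service: greets the client, then echoes each
# line back in upper case until end of file.
sub serve {
    my ($in, $out) = @_;
    $out->autoflush(1);                    # send every print immediately
    print {$out} "Hello, talk to me\n";
    while (defined(my $line = <$in>)) {
        print {$out} uc($line);
    }
}

# Under inetd the socket is already attached to STDIN and STDOUT:
# serve(\*STDIN, \*STDOUT);
```

Passing the handles as arguments keeps the routine testable from the command line as well as under inetd.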
A nice compromise between convenience and performance is to use inetd in wait mode. In this
mode it launches your server when the first incoming connection arrives, and waits for the server
to finish. Your server will do everything an ordinary server does, including forking to handle new
connections. The only difference is that it will not create the listening socket itself, but inherit it from
one created by inetd. Since inetd duplicates the socket onto the three standard filehandles, you can
recover it from any one of them, typically STDIN.
inetd thus nicely relieves you of the responsibility of launching the server by hand without incurring
a performance penalty. In addition, you can write the server to exit under certain conditions—for
example, if it has been idle for a certain number of minutes, or after servicing a set number of
connections. After it exits, inetd will relaunch it when it is next needed. This means that you need
not keep an occasionally used server running all the time.
A new version of the psychotherapist server designed to be run from inetd in wait mode is given in
Figure 10.8. The corresponding entry in /etc/inetd.conf is almost identical to the original, except that
it uses wait in the fourth field and has a different script name in the sixth field:
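The entry itself is not reproduced in this text. As an illustration only, an entry of that shape might read as follows. The user name nobody and the script name eliza_inetd.pl are assumptions, not values taken from the original; the port, socket type, protocol, and directory follow the earlier discussion.

```
12000 stream tcp wait nobody /usr/local/bin/eliza_inetd.pl eliza_inetd.pl
```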
In addition to inheriting its listening socket from inetd, this server differs from previous versions in
having a one-minute timeout on the call to accept(). If no new connections arrive within the timeout
period, the parent process exits. inetd will relaunch the server again if needed. The changes required
to the basic forking server are small.
Lines 1–7: Define timeout values—We recover the timeout from the command line, or default to
one minute if no value is supplied. Notice that we no longer read the port number from the
command line; it is supplied implicitly by inetd.
Lines 10–13: Recover the listening socket—We recover the listening socket from STDIN. First
we check that we are indeed running under inetd by testing that STDIN is a socket using the -S
file test. If STDIN passes this test, we turn it back into an IO::Socket object by calling IO::Socket's
new_from_fd() method. This method, inherited from IO::Handle, is similar to fdopen() except
that instead of reopening an existing handle on the specified filehandle, it creates a new
handle that is a copy of the old one. In this case, we create a new IO::Socket object that is a
copy of STDIN, opened for reading and writing with the "+<" mode.
Lines 15–21: Call accept() with a timeout—We now enter a standard accept() loop, except
that the call to accept() is wrapped in an eval{} block. Within the eval, we create a local
ALRM signal handler that calls die(), and use alarm() to set a timer that will go off after
$timeout minutes have expired. We then call the listen socket's accept() method. If an incoming
connection is received before the timeout expires, then the result from the eval{} block
is the connected socket. Otherwise, the ALRM signal handler is called, and the eval{} block
is aborted, returning an undefined value. In the latter case, we call exit(), terminating the
whole server. Otherwise, we call alarm(0) to cancel the timeout.
Lines 22–43: The remainder of the server is unchanged. We also include the
Chatbot::Eliza::_testquit() workaround in order to avoid problems when the user closes
the connection unexpectedly.
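The figure is not reproduced here, so the following is a sketch of the technique just described rather than the original listing. The names accept_with_timeout() and run_under_inetd() are hypothetical. The first wraps accept() in an eval{} with a local ALRM handler; the second shows how the listening socket might be recovered from STDIN.

```perl
use strict;
use warnings;
use IO::Socket;

# Hypothetical helper: wait up to $seconds for a connection on $listen.
# Returns the connected socket, or undef if the alarm fired first.
sub accept_with_timeout {
    my ($listen, $seconds) = @_;
    my $connected = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        my $socket = $listen->accept;
        alarm(0);                      # cancel the pending alarm
        $socket;
    };
    alarm(0);                          # make sure no alarm is left pending
    return $connected;
}

# Hypothetical main routine for a server launched by inetd in wait mode.
sub run_under_inetd {
    my $minutes = shift || 1;
    die "STDIN is not a socket; not running under inetd?\n" unless -S STDIN;
    my $listen = IO::Socket->new_from_fd(\*STDIN, "+<")
        or die "Can't recover listening socket: $!";
    while (my $connected = accept_with_timeout($listen, $minutes * 60)) {
        # fork a child here to handle $connected, as in the basic forking server
        close $connected;
    }
    exit 0;    # idle too long; inetd relaunches the server when needed
}
```

Note that this sketch treats any undefined result from accept() as a timeout. A production server would probably want to distinguish a real timeout from an interrupted call.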
When I first wrote this program, I thought that I could simply use IO::Socket's built-in timeout mechanism,
rather than roll my own ALRM-based timeout. However, there turned out to be a problem.
With the built-in timeout activated, accept() returned undef both when the legitimate timeout
occurred and when it was interrupted by the CHLD signal that accompanies every child process's
termination. After some trial and error, I decided there was no easy way to distinguish between the
two events, and went with the technique shown here.
inetd can also be used to launch UDP applications. In this case, when the program is launched, it
finds STDIN already opened on an appropriate UDP socket. recv() and send() can then be used
to communicate across the socket in the normal way. See Chapters 18 and 19 for more details.
Summary
Concurrent I/O is essential for connection-oriented servers to service multiple clients. Concurrency
can also be useful in client code to avoid deadlock situations.
This chapter introduced forking, the most common technique for achieving concurrency. Forking is
generally easy to use, but it has a few things to watch for, the most important being the need to wait
on exited child processes. It is also common for production servers to detach themselves from the
controlling terminal and autobackground themselves. In this chapter we developed a become_daemon()
subroutine to do this.
On UNIX systems, the inetd superdaemon provides a simple way to turn ordinary command-line
applications into forking servers. You can also use it as a handy way to launch a conventional forking
server when needed, thus avoiding having to start the daemon manually.
The next chapters look at other techniques for handling concurrent connections, beginning with
multithreading and continuing with multiplexing.
Chapter 11. Multithreaded Applications
This chapter discusses network application development using Perl's lightweight thread API.
Threads provide a program architecture that in many ways is easier to use than multiprocessing.
About Threads
Multithreading is quite different from multiprocessing. Instead of two or more processes,
each with its own memory space, signal handlers, and global variables, a multithreaded program
has a single process in which several "threads of execution" run. Each thread runs independently;
it can loop or perform I/O without worrying about the other threads that are running. However, all
threads share global variables, filehandles, signal handlers, and other resources.
While this sharing of resources enables threads to interact in a much more intimate way than the
separate processes created by fork(), it creates the possibility of resource contention. For example,
if two threads try to modify a variable at the same time, the result may not be what you expect.
For this reason, resource locking and control becomes an issue in threaded programs. Although
multithreaded programming simplifies your programming in some ways, it complicates it in others.
The Thread module was introduced in Perl 5.005. To use it, you must run an operating system that
supports threads (including most versions of UNIX and Microsoft Windows) and have compiled Perl
with threading enabled. In Perl 5.005, you do this by running the Configure installation program with
the option -Dusethreads. With Perl 5.6.0 and higher, the option becomes
-Duse5005threads. No precompiled Perl binaries come with threading support activated.
The threads API, which is described in the Thread, Thread::Queue, Thread::Semaphore, and
attrs manual pages, seems simple but hides many complexities. Each program starts with a single
thread, called the main thread. The main thread starts at the beginning of the program and runs to
the end (or until exit() or die() is called).
To create a new thread, you call Thread->new(), passing it a reference to a subroutine to execute
and an optional set of arguments. This creates a new concurrent thread, which immediately executes
the indicated subroutine. When the subroutine is finished, the thread exits. For example, here's
how you might launch a new thread to perform a time-consuming calculation:
my $thread = Thread->new(\&calculate_pi, precision => 190);
The new thread executes calculate_pi(), passing it the two arguments "precision" and
"190". If successful, the call immediately returns with a new Thread object, which the calling thread
usually stashes somewhere. The calling thread can now invoke the Thread object's detach() method,
which frees the main thread from any responsibility for dealing with it.
Alternatively, the thread can remain in its default attached state, in which case the main thread (or
any other thread) should at some point call the Thread object's join() method to retrieve the
subroutine's return value. This is sometimes done just before exiting the program, or at the time the
return value is needed. If the thread has not yet finished, join() blocks until it does. To continue
with the previous example, at some point the main thread may wish to retrieve the value of pi computed
by the calculate_pi() subroutine. It can do this by calling:
my $pi = $thread->join;
Unlike the case with parent and child processes, where only a parent can wait() on its children,
there is no strict familial relationship between threads. Any thread can call join() on any other
thread (but a thread cannot join() itself).
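The Thread module described here was later superseded. On current Perls the closest equivalent is the threads module, which requires a Perl built with thread support. The sketch below shows the create-then-join pattern using that newer module, which is a substitution for the Thread API in this chapter. The sum_range() routine is a hypothetical stand-in for calculate_pi().

```perl
use strict;
use warnings;

# Hypothetical worker: add up the integers from $from to $to.
sub sum_range {
    my ($from, $to) = @_;
    my $sum = 0;
    $sum += $_ for $from .. $to;
    return $sum;
}

# Run sum_range() in a separate thread and collect its return value.
# Returns undef if this Perl was built without thread support.
sub threaded_sum {
    my ($from, $to) = @_;
    return undef unless eval { require threads; 1 };
    my $thread = threads->create(\&sum_range, $from, $to);
    my $sum = $thread->join;          # blocks until the thread finishes
    return $sum;
}

my $sum = threaded_sum(1, 100);
print defined $sum ? "sum is $sum\n" : "threads are not available\n";
```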
For a thread to exit, it need only return() from its subroutine, or just let control fall naturally through
to the bottom of the subroutine block. Threads should never call Perl's exit() function, because
that would kill both the current thread and all other threads (usually not the intended effect!). Nor
should any thread other than the main one try to install a signal handler. There's no way to ensure
that a signal will be delivered to the thread you intend to receive it, and it's more than likely that Perl
will crash.
A thread can also exit abnormally by calling die() with an error message. However, the effect of
dying in a thread is not what you would expect. Instead of raising some sort of exception immediately,
the effect of die() is postponed until the main thread tries to join() the thread that died. At that
point, the die() takes effect, and the program terminates. If a non-main thread calls join() on a
thread that has died, the effect is postponed until that thread itself is joined.
You can catch this type of postponed death and handle it using eval(). The error message passed
to die() will be available in the $@ global.
my $pi = eval { $thread->join };
warn "Got an error: $@" if $@;
#!/usr/bin/perl
# hello.pl: run two threads that print concurrently
use Thread;
my $thread1 = Thread->new(\&hello, "I am thread 1", 3);
my $thread2 = Thread->new(\&hello, "I am thread 2", 6);
$_->join foreach ($thread1, $thread2);   # wait for both threads to finish

sub hello {
  my ($message, $loop) = @_;
  for (1..$loop) { print $message, "\n"; sleep 1; }
}
When you run this program, you'll see output like this:
% perl hello.pl
I am thread 1
I am thread 2
I am thread 1
I am thread 2
I am thread 1
I am thread 2
I am thread 2
I am thread 2
I am thread 2
Locking
The problem with threads appears as soon as two threads attempt to modify the same variable
simultaneously. To illustrate the problem, consider this deceptively simple bit of code:
my $bytes_sent = 0;
my $socket = IO::Socket->new(....);
sub send_data {
  my $data = shift;
  my $bytes = $socket->syswrite($data);
  $bytes_sent += $bytes;
}
The problem occurs in the last line of the subroutine, where the $bytes_sent variable is incremented.
If there are multiple simultaneous connections running, then the following scenario can
occur:
1. Thread 1 fetches the value of $bytes_sent and prepares to increment it.
2. A context switch occurs. Thread 1 is suspended and thread 2 takes control. It fetches the value
of $bytes_sent and increments it.
3. A context switch again occurs, suspending thread 2 and resuming thread 1. However, thread
1 is still holding the value of $bytes_sent it fetched from step 1. It increments the original
value and stores it back into $bytes_sent, overwriting the changes made by thread 2.
This chain of events won't happen every time but will happen in a rare, nondeterministic fashion,
leading to obscure bugs that are hard to track down.
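Because the real race is nondeterministic, it helps to replay the three steps by hand. The sketch below involves no threads at all. It models each "thread" as a fetch followed later by a store, in the order described above. The routine name and the byte counts are made up for illustration.

```perl
use strict;
use warnings;

# Deterministic replay of the lost-update scenario, with no real threads.
sub replay_lost_update {
    my ($sent_by_1, $sent_by_2) = @_;
    my $bytes_sent = 0;

    my $seen_by_1 = $bytes_sent;               # step 1: thread 1 fetches
    my $seen_by_2 = $bytes_sent;               # step 2: thread 2 fetches...
    $bytes_sent   = $seen_by_2 + $sent_by_2;   # ...and stores its update
    $bytes_sent   = $seen_by_1 + $sent_by_1;   # step 3: thread 1 stores a stale sum

    return $bytes_sent;
}

print replay_lost_update(100, 50), "\n";   # prints 100; thread 2's 50 bytes are lost
```

The correct total would be 150. The stale value fetched in step 1 is what causes the second update to vanish.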
The fix for this is to use the lock() call to lock the $bytes_sent variable before trying to use it.
With this small modification, the example now works properly:
my $bytes_sent = 0;
my $socket = IO::Socket->new(....);
sub send_data {
  my $data = shift;
  my $bytes = $socket->syswrite($data);
  lock($bytes_sent);
  $bytes_sent += $bytes;
}
lock() creates an "advisory" lock on a variable. An advisory lock prevents another thread from
calling lock() to lock the variable until the thread that currently holds the lock has relinquished it.
However, the lock doesn't prevent access to the variable, which can still be read and written even
if the thread doesn't hold a lock on it. Locks are generally used to prevent two threads from trying
to update the same variable at the same time.
If a variable is locked and another thread tries to lock it, that thread is suspended until such time as
the lock is available. A lock remains in force until the lock goes out of scope, just like a local variable.
In the preceding example, $bytes_sent is locked just before it's incremented, and the lock remains
in force throughout the scope of the subroutine.
If a number of variables are changed at the same time, it is common to create an independent
variable that does nothing but manage access to the variables. In the following example, the
$ok_to_update variable serves as the lock for two related variables, $bytes_sent and
$bytes_left:
my $ok_to_update;
sub send_data {
  my $data = shift;
  my $bytes = $socket->syswrite($data);
  lock($ok_to_update);
  $bytes_sent += $bytes;
  $bytes_left -= $bytes;
}
It is also possible to lock an entire subroutine using the notation lock(\&subroutine). When a
subroutine is locked, only one thread is allowed to run it at one time. This is recommended only for
subroutines that execute very quickly; otherwise, the multiple threads serialize on the subroutine
like cars backed up at a traffic light, obliterating most of the advantages of threads in the first place.
Variables that are not shared, such as the local variables $data and $bytes in the preceding
example, do not need to be locked. Nor do you need to lock object references, unless two or more
threads share the object.
When using threads in combination with Perl objects, object methods often need to lock the object
before changing it. Otherwise, two threads could try to modify the object simultaneously, leading to
chaos. This object method, for example, is not thread safe, because two threads might try to modify
the $self object simultaneously:
sub acknowledge { # NOT thread safe
  my $self = shift;
  print {$self->{socket}} "200 OK\n";
  $self->{acknowledged}++;
}
You can lock the object explicitly within the method, as in this revised version:
sub acknowledge { # thread safe
  my $self = shift;
  lock($self);
  print {$self->{socket}} "200 OK\n";
  $self->{acknowledged}++;
}
Since $self is a reference, you might wonder whether the call to lock() is locking the $self
reference or the thing that $self points to. The answer is that lock() automatically follows references
up one level (and one level only). The call to lock($self) is exactly equivalent to calling
lock(%$self), assuming that $self is a hash reference.
Threading versions of Perl provide a new syntax for adding attributes to subroutines. With this syntax,
the subroutine name is followed by a colon and a set of attributes, for example
sub acknowledge : locked method { ... }.
To create a locked method, use the attributes locked and method. If both attributes are present, as
in the preceding example, then the first argument to the subroutine (the object reference) is locked
on entry into the method and released on exit. If only locked is specified, then Perl locks the subroutine
itself, as if you had specifically written lock(\&acknowledge). The key difference here is
that when the attributes are set to locked method, it's possible for multiple threads to run the subroutine
simultaneously so long as they're working with different objects. When a subroutine is
marked locked only, then only one thread can gain access to the subroutine at a time, even if they're
working with different objects.
You do not need to explicitly import the Thread module to use lock(). It is built into the core of all
versions of Perl that support multithreading. On versions of Perl that don't support multithreading,
lock() has no effect. This allows you to write thread-safe modules that will work equally well on
threading and nonthreading versions of Perl.
The next five items are functions that must be imported explicitly from the Thread module:
use Thread qw(async yield cond_wait cond_signal cond_broadcast);
We will use cond_wait() and cond_broadcast() in Chapter 14, when we develop an adaptive
prethreaded server.
However, you should be aware that Thread::Signal changes the semantics of signals so that they
can no longer be used to interrupt long-running system calls. Hence, this trick will no longer work:
alarm(10);
my $bytes =
  eval {
    local $SIG{ALRM} = sub { die };
    sysread($socket,$data,1024);
  };
In some cases, you can work around this limitation by replacing the eval{} section with a call to
select(). We use this trick in Chapter 15.
In practice, Thread::Signal sometimes seems to make programs less stable rather than more so,
depending on which version of Perl and which threading libraries you are using. My advice for
experimenting with threading features is to first write the program without Thread::Signal and add
it later if unexpected crashes or other odd behavior occurs.
Lines 1–5: Load modules—We begin by loading IO::Socket and the Thread module. We also
bring in a specialized version of Chatbot::Eliza in which the command_interface() method
has been rewritten to work well in a multithreaded environment (Figure 11.2).
Figure 11.2. The Chatbot::Eliza::Server class
Lines 6–12: Create listening socket—As in the previous examples, we create a new listening
socket with IO::Socket::INET->new(). If a listening socket can't be created, we die with
the error message IO::Socket leaves in $@.
Lines 12–15: Accept loop—We now enter the server's main loop. Each time through the loop
we call accept(), yielding a new socket connected to the incoming client. We launch a new
thread to handle the connection by calling Thread->new() with a reference to the interact()
subroutine and the connected socket as its single argument. We then go back to waiting
on accept().
Notice that there is no need to close the listen or accept socket, as we did in the forking server
examples. This is because duplicate socket handles are never created.
Lines 16–31: The interact() subroutine—This subroutine handles the conversation with the
user and runs in a separate thread. Since the main server never checks the return value of the
connection threads it launches, there's no need to keep this status information; so we begin by
detaching ourselves from the main thread.
A Multithreaded Client
To go along with the multithreaded psychiatrist server, this section develops a multithreaded client
named gab4.pl (Figure 11.3). It is similar to the byte stream-oriented forking client gab3.pl in Chapter
10 (Figure 10.4); but instead of forking a child process to read from the remote server, the read loop
is done inside a thread running the do_read() subroutine.
The other major difference between this client and the previous version is the termination process.
In both clients, when do_write() detects that standard input has been closed, the subroutine
closes the transmission half of the socket by calling shutdown(1). This sends an end of file to the
server, causing it to close its side of the socket, and this event propagates back to the
do_read() thread.
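The half-close described here can be tried without a network. The sketch below assumes a UNIX-like system and uses socketpair() to stand in for the client and server ends. The variable names and messages are illustrative. After shutdown(1) the peer sees end of file, yet the reverse direction still carries data.

```perl
use strict;
use warnings;
use Socket;

# A connected pair of local sockets stands in for client and server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

syswrite($client, "goodbye\n");
shutdown($client, 1);                          # close only the sending half

my $received = '';
while (sysread($server, my $chunk, 1024)) {    # returns 0 at end of file
    $received .= $chunk;
}

# The other direction is still open, so the server can answer.
syswrite($server, "farewell\n");
shutdown($server, 1);
my $reply = '';
while (sysread($client, my $chunk, 1024)) {
    $reply .= $chunk;
}

print "server got: $received", "client got: $reply";
```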
So far so good, but what happens when the server is the one to initiate the disconnection? The
do_read() thread detects the end-of-file condition and exits. However, the do_write() loop
running in the main thread is usually blocked waiting for data from standard input and will not be
notified of anything untoward until it tries to write a line of text to the socket and triggers a PIPE
signal. In the forking client, we finessed this by having the CHLD handler call exit(). In the threading
example, there is no CHLD signal to catch, and so the easiest course of action is just to have
the host_to_user() thread call exit().
Summary
Multithreading provides an elegant way to achieve concurrency in connection-oriented network applications.
It is unfortunate that the Perl implementation is not entirely reliable and that the API is
subject to change.
Perl 6 is expected to provide robust multithreading, and although the API will likely be different in
detail from what is presented here, the basic concepts of thread creation, destruction, and locking
will not change.
Chapter 12. Multiplexed Applications
The forking and threading techniques discussed in the last two chapters allow a program to handle
multiple concurrent connections. The last general technique that we cover is I/O multiplexing. Multiplexing
doesn't take advantage of any operating system tricks to achieve the illusion of concurrency.
Instead, multiplexed applications handle all connections in one main loop. For example, a
server that is currently servicing ten clients reads from each connected socket in turn, handles the
request, and then services the next client.
The big problem with interleaving I/O in this way is the risk of blocking. If you try to read from a
socket that doesn't have data ready, the read() and sysread() calls will block until new data is
received. When you're serving multiple connections, this is unacceptable because it causes all the
connections to stall until the connection you're waiting on becomes ready. Another potential problem
is that if the client on the other end isn't ready to read, then calls to syswrite() or print() will
also block. The performance of the server is held hostage to the performance of the slowest client.
The key to multiplexing is a built-in function called select() and its object-oriented equivalent, the
IO::Select module. With select() you can check whether an I/O operation on a filehandle will
block before performing the operation. This chapter discusses how to use these facilities.
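To make the idea concrete before the full client, here is a small sketch that asks IO::Select whether a read would block. It uses a pipe rather than a socket so that it is self-contained, and it assumes a UNIX-like system. The helper name readable_now() is hypothetical.

```perl
use strict;
use warnings;
use IO::Select;

# Return the handles from @handles that can be read without blocking,
# waiting at most $timeout seconds for one to become ready.
sub readable_now {
    my ($timeout, @handles) = @_;
    my $select = IO::Select->new(@handles);
    return $select->can_read($timeout);
}

pipe(my $reader, my $writer) or die "pipe: $!";

my @before = readable_now(0, $reader);     # nothing has been written yet
syswrite($writer, "ping\n");
my @after  = readable_now(1, $reader);     # data is now waiting

printf "ready before write: %d, after write: %d\n",
    scalar @before, scalar @after;
if (@after) {
    sysread($reader, my $buffer, 1024);    # will not block
    print "read: $buffer";
}
```

A timeout of zero turns can_read() into a poll. Omitting the timeout makes it wait until at least one handle is ready.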
A Multiplexed Client
Before addressing the details of how select() works, let's rewrite our "gab" client to use multiplexing.
gab5.pl, like its previous incarnations, accepts lines from standard input, transmits them to
a remote server, and then relays the response from the server to standard output. Figure 12.1 shows
the code.
Lines 1–9: Load modules and process command-line arguments. We turn on strict checking
and load the IO::Select and IO::Socket modules. We read the host and port to connect to
from the command line, or if a host is not specified, we assume the echo service on the local host.
Line 10: Create a new connected socket—We create a socket connected to the specified peer
by calling IO::Socket::INET->new() with the one-argument shortcut form.
Lines 11–13: Create a new IO::Select set—We will multiplex our reads on standard input and
on the socket. This means that we will read from standard input only when the user has some
data for us and read from the socket only when there's server data to be read.
To do this, we create a new IO::Select object by calling IO::Select->new(). An IO::Select
object holds one or more filehandles that can be monitored for their readiness to do I/O. After
creating the select object, we add STDIN and the socket by calling the select object's add()
method.
Lines 14–17: Main I/O loop—We now enter a while() loop. Each time through, we call the
select object's can_read() method to return the list of handles ready for reading. This list may
contain the socket handle, STDIN, or both. Our task is to loop through the list of ready handles
and take the appropriate action for each. If STDIN is ready for reading, we copy data from it to
the socket. If the socket is ready, we copy data from it to STDOUT.
Lines 18–24: Handle data on STDIN—If STDIN is ready to be read, we use sysread() to read
up to 2K bytes of data into a string variable named $buffer. If sysread() returns a positive
value, we write a copy of what we received to the socket. Otherwise, we have encountered an
end of file on standard input. We shutdown() the write half of the socket, sending the remote
server an end of file.
Lines 25–32: Handle data on the socket—If there is data to be read from the connected socket,
then we call sysread() on the socket to read up to 2K bytes. If the read is successful, we
immediately print it to STDOUT. Otherwise, the remote host has closed the connection, so we
write a message to that effect and exit.
You can use gab5.pl to talk to a variety of network servers, including those that are line oriented
and those that produce less predictable output. Because this script doesn't rely on either forking or
threading, it runs on practically all operating systems where Perl is available, including the Macintosh.
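The walkthrough above can be sketched roughly as follows. This is a reconstruction from the description, not the original listing: the relay() subroutine, its parameter names, and the BUFSIZE constant are assumptions introduced here so that the loop can be exercised with any pair of handles. As in the description, an undefined result from sysread() is treated the same way as end of file.

```perl
use strict;
use warnings;
use IO::Select;
use IO::Socket;

use constant BUFSIZE => 2048;   # read "up to 2K bytes" at a time

# relay() is a hypothetical helper: it copies bytes from $in to $sock and
# from $sock to $out, and returns when the remote end closes the connection.
sub relay {
    my ($in, $sock, $out) = @_;
    my $select = IO::Select->new($in, $sock);
    my $buffer;

    while (1) {
        # can_read() may return an empty list (for example after a signal);
        # in that case the for loop does nothing and we simply wait again.
        for my $handle ($select->can_read) {
            if ($handle == $in) {
                if (sysread($in, $buffer, BUFSIZE)) {
                    syswrite($sock, $buffer);       # pass input to the server
                } else {
                    $select->remove($in);           # stop polling a finished input
                    $sock->shutdown(1);             # send the server an end of file
                }
            } else {
                if (sysread($sock, $buffer, BUFSIZE)) {
                    syswrite($out, $buffer);        # relay the server's response
                } else {
                    return;                         # remote host closed the connection
                }
            }
        }
    }
}
```

A script in the style of gab5.pl would create the socket with IO::Socket::INET->new("$host:$port") and then call relay(\*STDIN, $socket, \*STDOUT), printing a message when it returns.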
$select = IO::Select->new([@handles])
The IO::Select new() class method creates a new IO::Select object. It can be called with a list of handles, in
which case they will be added to the set that IO::Select monitors, or it can be called with an empty argument
list, in which case the monitor set will be initially empty.
The handle list can be composed of any type of filehandle including IO::Handle objects, globs, and
glob references. You may also add and remove handles after the object is created.
$select->add(@handles)
This adds the list of handles to the monitored set and returns the number of unique handles successfully added.
If you try to add the same filehandle multiple times, the redundant entries are ignored.
$select->remove(@handles)
This removes the list of filehandles from the monitored set. IO::Select indexes its handles by file number, so
you can refer to the handle one way when you add it (e.g., STDOUT) and another way when you remove it (e.g.,
\*STDOUT).
$value = $select->exists($handle)
$count = $select->count
These are utility routines. The exists() method returns a true value if the handle is currently a member of
the monitored set. The count() method returns the number of handles contained in the IO::Select set.
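As a small illustration of the set-management methods described so far, the following sketch builds a set around the two ends of a pipe. The pipe and the variable names are assumptions chosen only to have some handles to work with.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe: $!";

my $select = IO::Select->new();          # start with an empty set
$select->add($reader, $writer);          # monitor both ends of the pipe
$select->add($reader);                   # a redundant add is ignored

my $count_after_add = $select->count;                    # 2
my $has_reader      = $select->exists($reader) ? 1 : 0;  # 1

$select->remove($writer);
my $count_after_remove = $select->count;                 # 1

print "$count_after_add handles, then $count_after_remove\n";
```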
The can_read(), can_write(), and has_exception() methods monitor the handle list for
changes in status.
@readable = $select->can_read([$timeout])
@writable = $select->can_write([$timeout])
@exceptional = $select->has_exception([$timeout])
The can_read(), can_write(), and has_exception() methods each call select() on your behalf,
returning an array of filehandles that are ready for reading or writing or have exceptional conditions pending.
The call blocks until one of the handles in the IO::Select object becomes ready, or until the optional timeout
given by $timeout is reached. In the latter case, the call returns an empty list. The timeout is expressed in
seconds, and can be fractional.
Any of these methods can return an empty list if the process is interrupted by a signal. Therefore, always check
the returned list for filehandles, even if you have provided no timeout.
If you wish to select for readers and writers simultaneously, use the select() method.
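Here is a minimal sketch of selecting for readers and writers at the same time. It assumes a pipe as a convenient source of one readable and one writable handle; the variable names are illustrative. IO::Select->select() takes up to three IO::Select sets plus an optional timeout and returns three array references, or an empty list if nothing becomes ready in time.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "x");                  # give the reader something to read

my $readers = IO::Select->new($reader);
my $writers = IO::Select->new($writer);

# Returns (readable, writable, exceptional) as array references,
# or an empty list on timeout or error.
my ($can_read, $can_write, $has_exception) =
    IO::Select->select($readers, $writers, undef, 0.5);

my $n_readable = $can_read  ? scalar @$can_read  : 0;
my $n_writable = $can_write ? scalar @$can_write : 0;
print "readable: $n_readable, writable: $n_writable\n";
```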
Exceptional conditions on sockets are not what this term may imply. An exceptional condition occurs
when a TCP socket receives urgent data (we talk about how to generate and handle urgent data in
Chapter 17). An I/O error on a socket does not generate an exceptional condition, but instead makes
the socket both readable and writable. The nature of the error can then be detected by performing
a read or write on the socket and checking the $! variable.
IO::Select->select() can be used to put the current process to sleep for a fractional period
of seconds. Simply call the method using undef for all three IO::Select sets and the number of
seconds you wish to sleep. This code fragment causes the program to pause for 0.25 seconds:
IO::Select->select(undef,undef,undef,0.25);
As with sleep(), select() returns prematurely if it is interrupted by a signal. Also don't count on
getting a pause of exactly 250 milliseconds, because select() is limited by the underlying resolution
of the system clock, which might not provide millisecond resolution. To get a version of sleep
that has microsecond resolution, use the Time::HiRes module, available from CPAN.
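For comparison, a fractional sleep with Time::HiRes might look like the sketch below. Time::HiRes has been bundled with Perl since version 5.8, so on a modern installation no separate download should be needed; the variable names are illustrative.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);   # replaces sleep() and time() in this file

my $start = time();
sleep(0.25);                      # fractional seconds are accepted
my $elapsed = time() - $start;

printf "slept for %.3f seconds\n", $elapsed;
```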
Sadly, select()'s argument-passing scheme is archaic and unPerlish, involving complex manipulation
of bit vectors. You may see it in older scripts, but IO::Select is both easier to use and less
prone to error. However, you might want to use select() to achieve a fractional sleep without
importing IO::Select:
select(undef,undef,undef,0.25);
Don't confuse the four-argument version of select() with the one-argument version discussed in
Chapter 1. The latter is used to select the default filehandle used with print().
The perlfunc manual pages give full details on the built-in select() function.
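To give a feel for the bit-vector interface, here is a minimal sketch of the four-argument select() checking a single handle for readability. The pipe and variable names are assumptions made for the example. Because select() overwrites its vector arguments with the result, the sketch passes a copy.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "ping");               # make the read end ready

# Build a bit vector with one bit set per file descriptor of interest.
my $rin = '';
vec($rin, fileno($reader), 1) = 1;

# select() modifies its arguments in place, so work on a copy.
my $rout   = $rin;
my $nfound = select($rout, undef, undef, 0.5);

# A set bit in the result means that descriptor is ready for reading.
my $is_readable = vec($rout, fileno($reader), 1);
print "ready handles: $nfound, reader ready: $is_readable\n";
```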
2. A nonblocking connect has been initiated, and the attempt completes. When a TCP socket is
nonblocking and you attempt to connect it, the call to connect() returns immediately and
the connection attempt continues in the background. When the connect attempt eventually
completes (either successfully or with an error), select() indicates that the socket is ready
for writing. This is discussed in more detail in Chapter 13.
Exceptional conditions apply only to sockets. There is only one common exception, which occurs
when a connected TCP socket has urgent data to be read. We discuss how urgent data works in
Chapter 17.
4. When the select() loop returns, examine the list of sockets ready for reading. If the listen
socket is among them, call accept() and add the resulting connected socket to the IO::Select
set.
5. If other sockets are ready for reading, perform I/O on them.
6. As client connections finish, remove them from the IO::Select set.
This version of the Eliza server illustrates how this works in practice.
Compared to the previous versions of this server, the major design change is the need to break up
the Chatbot::Eliza object's command_interface() method. The reason is that command_interface()
has its own I/O loop, which doesn't relinquish control until the conversation with the
client is done. We can't allow this, because it would lock out other clients.
Instead, we again subclass Chatbot::Eliza to create a "polite" version, which adds three new methods
named welcome(), one_line(), and done(). The first method returns a string containing
the greeting that the user sees when he or she first connects. The second takes a line of user input,
transforms it, and returns the string containing the psychiatrist's response, along with a new prompt
for the user. done() returns a true value if the user's previous line consisted of one of the quit
phrases such as "bye," "goodbye," or "exit."
Another change is necessary to keep track of multiple Chatbot::Eliza instances. Because each object
maintains an internal record of the user's utterances, we have to associate each connected
socket with a unique Eliza object. We do this by creating a global hash named %SESSIONS in which
the indexes are the socket objects and the values are the associated Chatbot::Eliza objects. When
can_read() returns a socket that is ready for I/O, we use %SESSIONS to look up the corresponding
Chatbot::Eliza object.
We'll walk through the main part first.
Lines 1–6: Load modules—We bring in IO::Socket, IO::Select, and Chatbot::Eliza::Polite. We
also declare the %SESSIONS hash for mapping IO::Socket instances to Chatbot objects.
Lines 7–12: Create listen socket—We create a listen socket on our default port.
Lines 13–15: Add listen socket to IO::Select object—We create a new IO::Select object and add
the listen socket to it.
Lines 16–18: Main select() loop—We now enter the main loop. Each time through the loop,
we call the IO::Select object's can_read() method. This blocks indefinitely until the listen
socket becomes ready for accept() or a connected socket (none of which have yet been
added) becomes ready for reading.
Line 18: Loop through ready handles—When can_read() returns, its result is a list of handles
that are ready for reading. It's now our job to loop through this list and figure out what to do with
each one.
Lines 19–24: Handle the listen socket—If the handle is the listen socket, then we call its
accept() method, returning a new connected socket. We create a new Chatbot::Eliza::Polite
object to handle the connection and add the socket and the Chatbot object to the %SESSIONS
hash. By indexing the hash with the unique name of the socket object, we can recover the
corresponding Chatbot object whenever we need to do I/O on that particular socket.
After creating the Chatbot object, we invoke its welcome() method. This returns a welcome
message that we syswrite() to the newly connected client. After this is done, we add the
connected socket to the IO::Select object by calling IO::Select->add(). The connected
socket will now be monitored for incoming data the next time through the loop.
Lines 25–27: Handle I/O on connected socket—If a handle is ready for reading, but it is not the
listen socket, then it must be a connected socket accepted during a previous iteration of the
loop. We recover the corresponding Chatbot object from the %SESSIONS hash. If the lookup is
unsuccessful (which logically shouldn't happen), we just ignore the socket and go on to the next
ready socket.
Otherwise, we want to read a line of input from the client. Reading a line of input from the user
is actually a bit of a nuisance because Perl's line-oriented reads, including the socket's
getline() method, use stdio buffering and are thus incompatible with calls to select().
We'll see in the next chapter how to roll our own line-reading function that is compatible with
select(), but in this case we punt on the issue by doing a byte-oriented sysread() of up to
1,024 characters and treating that as if it were a full line. This is usually the case if the user is
interactively typing messages, so it's good enough for this server.
Lines 28–32: Send response to client— sysread() returns either the number of bytes read, or
0 on end of file. Because the call is unbuffered, the number of bytes returned may be greater
than 0 but less than the number we requested.
If $bytes is positive, then we have data to process. We clean the data up and pass it to the
Eliza object's one_line() method, which takes a line of user input and returns a response.
We call syswrite() to send this response back to the client. If $bytes is 0 or undef, then
we treat it as an end of file and allow the next section of code to close the session.
Lines 33–38: Handle session termination—The last part of the loop is responsible for closing
down sessions.
A session should be closed when either of two things occurs. First, a result code of 0 from
sysread() signifies that the client has closed its end of the connection. Second, the user may
enter one of several termination phrases that Eliza recognizes, such as "bye," "quit," or "goodbye."
In this case, Eliza's done() method returns true.
We check for both eventualities. In either case, we remove the socket from the list of handles
being monitored by the IO::Select object, close it, and remove it from the %SESSIONS hash.
Note that we treat a return code of undef from sysread(), which indicates an I/O error of some
sort, in the same way as an end of file. This is often sufficient, but a server that was processing
mission-critical data would want to distinguish between a client that deliberately shut down the
connection and an error. In this case, you could pass $bytes to defined() to distinguish the two
possibilities.
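As a rough sketch of the main loop described above, the following code performs one pass over the ready handles. It is not the original listing. To stay self-contained it replaces Chatbot::Eliza::Polite with a hypothetical stand-in class, Parrot::Session, that offers the same three methods (welcome(), one_line(), and done()); the subroutine name service_ready_handles() is also an assumption.

```perl
use strict;
use warnings;
use IO::Socket;
use IO::Select;

# Hypothetical stand-in for Chatbot::Eliza::Polite: same three methods.
package Parrot::Session;
sub new      { return bless { done => 0 }, shift }
sub welcome  { return "Hello. Say 'bye' to quit.\n> " }
sub done     { return $_[0]{done} }
sub one_line {
    my ($self, $line) = @_;
    if ($line =~ /^(?:bye|quit|goodbye)$/i) {
        $self->{done} = 1;                  # remember that the user wants out
        return "Goodbye.\n";
    }
    return "You said: $line\n> ";
}

package main;

my %SESSIONS;   # maps each connected socket to its session object

# One pass of the select loop; a real server would call this inside while (1).
sub service_ready_handles {
    my ($select, $listen, $timeout) = @_;
    for my $handle ($select->can_read($timeout)) {
        if ($handle == $listen) {                    # new incoming connection
            my $connect = $listen->accept or next;
            my $session = Parrot::Session->new;
            $SESSIONS{$connect} = $session;
            syswrite($connect, $session->welcome);
            $select->add($connect);
            next;
        }
        my $session = $SESSIONS{$handle} or next;    # shouldn't happen
        my $bytes = sysread($handle, my $input, 1024);
        if ($bytes) {
            $input =~ s/\r?\n\z//;                   # treat the chunk as one line
            syswrite($handle, $session->one_line($input));
        }
        if (!$bytes or $session->done) {             # EOF, error, or quit phrase
            $select->remove($handle);                # remove before closing
            delete $SESSIONS{$handle};
            close $handle;
        }
    }
}
```

A server would create the listen socket, place it in an IO::Select set, and then call service_ready_handles($select, $listen) repeatedly; omitting the timeout makes can_read() block indefinitely, as in the description.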
Lines 1–3: Module setup—We load the Chatbot::Eliza module and declare the current package
to be its subclass by placing the name of the parent in the @ISA array.
Lines 4–14: The welcome() method—The welcome() method is copied from the top part of
the old command_interface() method. It sets the two prompts (the one printed before the
psychiatrist's utterances and the one printed in front of the user's inputs) to reasonable defaults
and then returns a greeting randomly selected from an internally defined list. The user prompt
is appended to this string.
Lines 15–34: The one_line() method—The one_line() method takes a string as input and
returns a response. We start by checking the user input for one of the quit phrases. If there is a
quit phrase, then we generate an exiting remark from a random list of final phrases, set an
internal flag that the user is done, and return the reply. Otherwise, we invoke our inherited
transform() method to turn the user's input into a suitably cryptic utterance and return the
response along with the next prompt.
Lines 35–36: The done() method—In this method we simply check the internal exit flag set in
one_line() and return true if the user wants to exit.
Win32 Issues
Another problem arises when using multiplexed applications on Microsoft Windows platforms. When
I originally developed the scripts in these chapters, I tested them on Windows98 using ActiveState
Perl, and everything seemed to work fine. Later, however, I discovered that the gab5.pl script (Figure
12.1) was consuming a large amount of CPU time, even when it was apparently doing nothing but
waiting for keyboard input.
When I tracked down this problem, I learned that the Win32 port of Perl does not support
select() on non-socket filehandles, including STDIN and STDOUT. So client-side scripts that
multiplex across STDIN do not wait for the filehandle to be ready, but just loop. This problem affects
the scripts in Figures 13.1 and 13.7. The IO::Poll call is affected as well (Chapter 16), and Figure
16.1 will exhibit the same problem.
Multiplexing across sockets works just fine on Win32 platforms, and so all the server examples work
as expected. The Macintosh port of Perl has no problem with select().
Summary
This chapter completes our survey of all three of the major techniques for handling concurrent
stream-oriented connections. Each has advantages and disadvantages.
Multitasking using fork() is currently available only on UNIX and Win32 platforms (using Perl 5.6
or higher). It has more overhead than the other methods due to the necessity of launching a new
process to deal with each connection, but programs written using the technique tend to be simple
and reliable.
Multithreading is available on Win32 and many UNIX platforms and requires a version of Perl that
is built with threading support. Threaded applications must be careful to lock and release shared
variables and other resources, adding to the complexity of the code. We were able to get away
without using locking in the examples so far, but most real network applications have to deal with
this issue. Unfortunately, threading is not stable in current versions of Perl, and its API is likely to
change.
Multiplexing is available on all platforms on which Perl runs, including the Macintosh. Its drawback
is that it makes the program logic more difficult to follow due to the necessity of interleaving the I/O
from multiple sessions. Furthermore, it isn't bulletproof unless combined with nonblocking I/O, which
adds significant complexity to the code.
By default, on most operating systems I/O is blocking. If a request to read or write some data can't
be satisfied immediately, the operating system puts the program to sleep until the call completes or
generates an error. For most programming tasks, this does not cause a problem, because disk
drives, terminal windows, and other I/O devices are relatively fast, at least in human terms. As we
saw in the last chapter, however, blocking presents a problem for client/server programming because
a single blocked network call can make the whole program hang while other requests wait.
To mitigate the effect of blocking in read or write calls, one can either have several concurrent
threads of execution, as with forking and multithreading servers, or use select() to determine
which filehandles are ready for I/O. The latter strategy presents a problem, however, because a
socket or other filehandle may still block on syswrite() if you attempt to write more data than it
is ready to accept. At this point the write attempt blocks and the program stalls. To avoid this, you
may use nonblocking I/O.
This chapter describes how to set up and use nonblocking I/O. In addition to avoiding blocking during
reads and writes, nonblocking I/O can also be used to avoid long waits during the connect() call.
As we will see, nonblocking I/O avoids the problems associated with managing threads and processes
but introduces its own complexities.
sysopen() works only with local files and cannot be used to open pipes or sockets. Therefore,
unless you are dealing with slow local devices such as tape drives, you'll probably never create a
nonblocking filehandle in this way. More typically, you'll take a handle that has been opened with
socket() or open() and mark it as nonblocking after the fact. This is what the fcntl() function
can do.
The call takes three arguments. The first two arguments are a previously opened handle and a numeric constant
specifying a command to perform on the handle. The third argument is a numeric parameter to pass to
the command. Some commands don't need additional data, in which case passing a third argument of 0 will
do. If successful, fcntl() returns a true value. Otherwise, it sets $! and returns undef.
The Fcntl module provides constants for all the fcntl() commands. The two commands relevant
to nonblocking handles are F_GETFL and F_SETFL, which are used to retrieve and modify a handle's
flags after creation. When you call fcntl() with a command of F_GETFL, it returns a bitmask
containing the handle's current flags. Call fcntl() with the F_SETFL command to change the
handle's flags to the value set in $operand. You will want $operand to include the O_NONBLOCK
flag. The result code will indicate success or failure in changing the flags.
There is a subtlety to using fcntl() to set the nonblocking status of a filehandle. Because nonblocking
behavior is just one of several options that can be set in the flag bitmap, you should call
F_GETFL first to find out what options are already set, set the O_NONBLOCK bit using a bitwise OR,
and then call F_SETFL to apply the modified flags to the handle.
Here's a small subroutine named blocking() that illustrates this. The routine's first argument is
a handle, and its optional second argument is a Boolean value that can be used to turn blocking
behavior on or off. If called without a second argument, the subroutine returns true if the handle is
blocking; otherwise, it returns false:
use Fcntl;
sub blocking {
   my ($handle,$blocking) = @_;
   # F_GETFL returns the current flag bitmask ("0 but true" if no flags are set)
   my $flags = fcntl($handle,F_GETFL,0)
      or die "Can't fcntl(F_GETFL): $!";
   my $current = ($flags & O_NONBLOCK) == 0;   # true if the handle is blocking
   if (defined $blocking) {
      $flags &= ~O_NONBLOCK if $blocking;      # clear the bit to restore blocking
      $flags |= O_NONBLOCK unless $blocking;   # set the bit to make it nonblocking
      die "Can't fcntl(F_SETFL): $!" unless fcntl($handle,F_SETFL,$flags);
   }
   return $current;
}
Notice that sockets start blocking by default. To make them nonblocking, you need to call
blocking() with a 0 argument.
warn "making socket nonblocking";
blocking($sock,0);
The Perl perlfunc POD pages contain more information on the fcntl() function.
$blocking_status = $handle->blocking([$boolean])
Called without an argument, blocking() returns the current status of blocking I/O for the handle. A true value
indicates that the handle is in normal blocking mode; a false value indicates that nonblocking I/O is active. You
can change the blocking status of a handle by providing a Boolean value to blocking(). A false value makes
the socket nonblocking; a true argument restores the normal blocking behavior.
Remember that socket objects start blocking by default. To make a socket nonblocking, you must
call blocking() with a false argument:
$socket->blocking(0);
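The following sketch shows the method in action. It assumes a Unix-like system, where IO::Socket->socketpair() can provide a connected pair of sockets without any network setup; the variable names are illustrative. With nothing to read and nonblocking mode active, sysread() returns undef immediately instead of waiting.

```perl
use strict;
use warnings;
use IO::Socket;

my ($left, $right) = IO::Socket->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $was_blocking = $left->blocking;     # true: sockets start out blocking
$left->blocking(0);                     # switch to nonblocking mode
my $is_blocking  = $left->blocking;     # now false

# Nothing has been written to $right, so a blocking read would hang here.
my $rc = sysread($left, my $data, 1024);
my $would_block = !defined($rc) && ($!{EWOULDBLOCK} || $!{EAGAIN}) ? 1 : 0;

print "would block: $would_block\n";
```

Some systems report this condition as EAGAIN rather than EWOULDBLOCK (on many they are the same number), which is why the sketch checks both.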
When reading from nonblocking handles, you must correctly distinguish the end-of-file condition
from EWOULDBLOCK. The first will return numeric 0 from sysread(), and the latter will return
undef. Code to read from a nonblocking socket should look something like this:
my $rc = sysread(SOCK,$data,$bytes);
This code fragment calls sysread() and stores its result code into the variable $rc. We first check
whether the result code is defined. If so, we know that the call was successful. A positive result code
indicates some number of bytes was read, while a numeric 0 flags the end-of-file condition. We
handle both cases in whatever way is appropriate for the application.
If the result code is undefined, then some error occurred and the specific error code can be found
in $!. We first check whether $! is numerically equal to EWOULDBLOCK, and if so handle the error.
In most cases, we just jump back to the top of the program's main loop and try the read again later.
In the case of other errors, we die with an error message.
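The checks just described can be sketched as a small subroutine. The name read_some() and its return convention (a status word, plus the data when there is any) are assumptions made for illustration; the order of the tests follows the explanation above. EAGAIN is checked alongside EWOULDBLOCK because some systems report the condition under that name.

```perl
use strict;
use warnings;
use Errno qw(EWOULDBLOCK EAGAIN);

# read_some() is a hypothetical helper that classifies one sysread() call.
# Returns ('data', $bytes_read_as_string), ('eof'), or ('retry').
sub read_some {
    my ($handle, $bytes) = @_;
    my $rc = sysread($handle, my $data, $bytes);
    if (defined $rc) {                        # the call succeeded
        return $rc > 0 ? ('data', $data)      # some bytes were read
                       : ('eof');             # numeric 0 flags end of file
    }
    elsif ($! == EWOULDBLOCK || $! == EAGAIN) {
        return ('retry');                     # nothing available; try again later
    }
    die "sysread() error: $!";                # any other error is unexpected
}
```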
We call syswrite() to write out the contents of the scalar variable $data and check the call's
result code. If at least one byte was written, then we truncate the variable using this trick:
substr($data,0,$rc) = '';
substr() is one of several Perl functions that can be used on the left side of an assignment statement.
Everything from the beginning of the string up to the number of bytes written is replaced with an empty string,
leaving the variable containing just the data that wasn't written. In the case in which syswrite()
was able to write the entire contents of the variable, this substr() expression leaves $data empty.
If the result code is 0 or undef, then we again compare the error code to EWOULDBLOCK and take
appropriate action, typically returning to the program's main loop and trying the write again later. On
other errors, we die with an error message.
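A single write attempt of the kind described above might be sketched like this. The subroutine name write_some() and its return convention are assumptions introduced here: it returns the number of bytes written, or the string "0E0" when the write would block, and it trims the written bytes from the caller's buffer by way of the @_ aliasing.

```perl
use strict;
use warnings;
use Errno qw(EWOULDBLOCK EAGAIN);

# write_some() is a hypothetical helper: it makes one syswrite() attempt and
# removes whatever was written from the front of the caller's buffer.
sub write_some {
    my $handle = shift;                  # $_[0] now aliases the caller's buffer
    return 0 unless length $_[0];        # nothing left to write

    my $rc = syswrite($handle, $_[0]);
    if ($rc) {                           # at least one byte was written
        substr($_[0], 0, $rc) = '';      # keep only the unwritten remainder
        return $rc;
    }
    return '0E0' if $! == EWOULDBLOCK || $! == EAGAIN;   # try again later
    die "syswrite() error: $!";
}
```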
This code fragment needs to be executed repeatedly until $data is entirely written. You could just
put a loop around the whole thing:
while (length $data > 0) {
   my $rc = syswrite(SOCK,$data);
   # ... etc....
}
However, this is not terribly efficient because repeated writes to the same socket may just result in
the same EWOULDBLOCK error. It's best to incorporate the syswrite() call into a select() loop
and to do other work while waiting for the socket to become ready to accept more data. The next
section shows how to do this.
Using Nonblocking Handles with Line-Oriented I/O
As explained in Chapter 12, it's dangerous to mix line-oriented reads with select() because the
select call doesn't know about the contents of the stdio buffers. Another problem is that a line-oriented
read blocks if there isn't a complete line to read; as soon as any I/O operation blocks, a multiplexed
program stalls.
What we would like to do is to change the semantics of the getline() call so that we can distinguish
among three distinct conditions:
1. A complete line was successfully read from the filehandle.
2. The filehandle has an EOF or an error.
3. The filehandle does not yet have a complete line to read.
The standard Perl <> operator and getline() functions handle conditions 1 and 2 well, but they
block on condition 3. Our goal is to change this behavior so that getline() returns immediately if
a complete line isn't ready for reading but distinguishes this event from an I/O error.
The IO::Getline module that we develop here is a wrapper around a filehandle or IO::Handle object.
It has a constructor named new() and a single object method named getline().
$error = $wrapped->error
This returns the last I/O error on the wrapper, or 0 if no error has been encountered.
$wrapped->flush
This returns the object to a known state, discarding any partially buffered data.
$fh = $wrapped->handle
This returns the filehandle used to construct the wrapper.
Notice that getline() acts more like read() or sysread() than the traditional <> operator.
Instead of returning the read line directly, it copies the line into the $data argument and returns a
result code. Table 13.1 gives the possible result codes from getline().
getline() returns the length of the line (including the newline) if it successfully read a line of text,
0 if it encountered the end of file, and undef on other errors. However, there is an additional result
code returned when getline() detects that the operation would block. In this case, the method
returns the string 0E0.
As described in Chapter 8, when evaluated in a numeric context, 0E0 acts like 0 (it is treated as the
floating point number 0.0E0). However, when used in a logical context, 0E0 is true. You can
interpret this result code as meaning "Zero but true." In other words, "no error yet; try again later."
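The short sketch below shows how the four kinds of result code can be told apart using only definedness, truth, and numeric comparison. The classify() function is a hypothetical helper written for this illustration, not part of IO::Getline.

```perl
use strict;
use warnings;

# Classify a result code of the kind getline() is described as returning.
sub classify {
    my ($rc) = @_;
    return 'error' unless defined $rc;   # undef
    return 'eof'   unless $rc;           # plain 0 is false
    return 'line'  if $rc > 0;           # positive byte count
    return 'would block';                # '0E0': true, yet numerically 0
}

for my $rc (12, 0, undef, '0E0') {
    my $shown = defined $rc ? "'$rc'" : 'undef';
    print "$shown => ", classify($rc), "\n";
}
```

Because 0E0 is a well-formed number, comparing it with == or > does not trigger a "not numeric" warning even under use warnings.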
In addition to the getline() method, you can call any method of the wrapped filehandle object.
IO::Getline simply passes the method call to the underlying object. This lets you call methods such
as sysread() and close() directly on the getline object.
Using IO::Getline
IO::Getline is designed to be used in conjunction with select(). Because it never blocks, you can't
use it simply as a plug-in replacement for the <> operator.
To illustrate the intended use of IO::Getline, Figure 13.1 shows a small program that combines
select() with IO::Getline to read from STDIN in a line-oriented way. We load the IO::Getline and
IO::Select modules and create an IO::Select set containing the STDIN filehandle. We then call
IO::Getline->new() to create a new nonblocking getline object wrapped around STDIN.
We now enter a select loop. Each time through the loop we call the select object's can_read()
method, which returns true when STDIN has data to read from.
Rather than read from STDIN with <>, we call the getline object's getline() method to read the
line into $data. getline() may return a false value, in which case we exit the loop because we
have reached the end of file. Or it may return a true result. If the result code is greater than 0, then
we have a line to print, so we copy it to standard output. Otherwise, we know that a complete line
hasn't yet been read, so we go back to the top of the select loop.
At the end of the loop, we call the wrapper's error() method to see if the loop terminated
abnormally. If so, we die with an error message that contains the error code.
IO::Getline objects can also be used in blocking fashion. To do this, simply call the
object's blocking() method, which is automatically passed down to the underlying filehandle:
$stdin->blocking(1); # turn blocking behavior back on
We use this module in more substantial programs in Chapter 17's TCP Urgent Data section, and in
Chapter 18's The UDP Protocol section.
Lines 1–9: Set up module —We load the IO::Handle and Carp modules and bring in the
EWOULDBLOCK error code from Errno. Another constant sets the size of the chunks that we will
sysread() from the underlying filehandle. The Carp module provides error messages that
indicate the location of the error from the caller's point of view, and is therefore preferred for use
inside modules.
Lines 10–22: The new() method—This is the constructor for new objects. We take the handle
passed to us from the caller, mark it nonblocking, and incorporate it into a new blessed hash
under the handle field. In addition, we define an internal area for buffering incoming data, stored
in the buffer field, an index to use when searching for the end-of-line sequence stored in the
index field, and two flags. The eof flag is set when we encounter an end-of-file condition, while
error is set when we get an error.
Lines 23–30: The AUTOLOAD method—AUTOLOAD is a subroutine that Perl invokes automatically
when the caller tries to invoke a method that isn't defined in the module. We define this as
a courtesy. The code just passes on the method call and arguments to the wrapped filehandle
and returns an error if the method call fails.
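The delegation idea can be sketched in a few lines. The package names Sketch::Wrapper and Sketch::Counter below are invented for this illustration, and the code is one plausible way to write such a method rather than the module's actual listing.

```perl
use strict;
use warnings;

{
    package Sketch::Counter;              # hypothetical wrapped object
    sub new { my $class = shift; return bless { n => 0 }, $class }
    sub add { my ($self, $n) = @_; return $self->{n} += $n }
}

{
    package Sketch::Wrapper;              # hypothetical wrapper class
    use Carp;
    our $AUTOLOAD;

    sub new {
        my ($class, $inner) = @_;
        return bless { inner => $inner }, $class;
    }

    # Called by Perl for any method this package does not define.
    sub AUTOLOAD {
        my $self = shift;
        (my $method = $AUTOLOAD) =~ s/.*:://;    # strip the package name
        return if $method eq 'DESTROY';          # never forward destructors
        my $inner = $self->{inner};
        croak "Can't locate method \"$method\" via wrapped ", ref $inner
            unless $inner->can($method);
        return $inner->$method(@_);              # pass the call through
    }
}
```

Using croak() rather than die() reports the failure at the caller's line, which matches the reason given above for loading Carp.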
Line 31: The handle() accessor—This method returns the wrapped filehandle if the caller
wishes to gain low-level access to it.
Line 32: The error() accessor—If an error occurs during a getline() operation, this method
returns its error number.
Lines 33–37: The flush() method—The flush() method returns the object to a known state,
emptying any partially buffered lines in the buffer field and setting index to 0.
Lines 38–77: The getline() method—This is the interesting part of the module. At the time
of entry, $_[0] (the first argument in the @_ array of subroutine arguments) contains the
scalar variable that will receive the read line. To change the variable in the caller's code, we
refer to $_[0] directly rather than copy it into a local variable in the usual way.
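The difference between copying an argument and using $_[0] directly is easy to see in isolation. Both subroutines below are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Copies the argument: the caller's variable is left untouched.
sub fill_copy {
    my ($buf) = @_;
    $buf = "data\n";
    return length $buf;
}

# Uses the alias in @_: the caller's variable receives the data.
sub fill_alias {
    $_[0] = "data\n";
    return length $_[0];
}
```

After fill_copy($line) the caller's $line is unchanged, whereas after fill_alias($line) it holds the new string. This is the same convention that the built-in read() and sysread() follow.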
Because we operate in a buffered way, we must be prepared to report to the caller conditions
that occurred earlier. We start by checking our eof and error flags. If we encountered the EOF
on the last call, we return numeric 0. Otherwise, if there was an error, we return undef.
There may already be a complete line in our internal buffer left over from a previous read. We
use Perl's built-in index() function to find the next end-of-line sequence in the buffer, returning
its position. Instead of hard coding the newline character, we use the current contents of the
$/ global. In addition, we can optimize the search somewhat by remembering where we left off
the previous time. This information will be stored in the index field. We store the result of
index() into a local variable, $i.
Lines 49–59: Read more data and handle errors—If the end-of-line sequence isn't in our buffered
data, then $i will be −1. In this case, we need to read more data from the filehandle and
try again. We remember in index where the line-end search left off the previous time, and invoke
sysread(), using arguments that cause the newly read data to be appended to the end of the
buffer.
If sysread() returns undef, it may be for any of a variety of reasons. Because it is nonblocking,
one possibility is that we got an EWOULDBLOCK error. In this case, we cannot return a complete
line at the current time, so we return 0E0 to the caller.
Otherwise, we've encountered some other kind of I/O error. In this case, we return whatever is
left in the buffer, even if it isn't a complete line. This is identical to the behavior of the <> operator,
which returns a partial line on an error. We set our error flag and return the length of the result.
Note that the caller won't see the undef result that signals the error until the next call to getline().
Lines 54–59: Handle EOF—We take a similar strategy on EOF. In this case the sysread()
result code is defined but 0. We return what we have left in the buffer, remember the condition
in our eof flag, and return the size of the buffer contents.
Lines 65–77: Try for the end of line again—If we get to this point, then sysread() appended
one or more new bytes of data to our buffer. We now call index() again to see if an end-of-line
sequence has appeared. If not, we remember where we stopped the search the last time
and return 0E0 to the caller.
Otherwise, we've found the end of line. We copy everything from the beginning of the buffer up
through and including the end-of-line sequence into the caller's scalar, and then delete the part
of the buffer we've used. We reset the index field to 0 and return the length of the line.
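To make the walkthrough concrete, here is a condensed sketch that follows the same steps. It is a reconstruction from the description above, not the module's listing: the package name Sketch::Getline and the 8192-byte chunk size are assumptions, and $/ is assumed to be a non-empty string.

```perl
use strict;
use warnings;
use IO::Handle;

{
    package Sketch::Getline;              # hypothetical name
    use Errno qw(EWOULDBLOCK EAGAIN);
    use constant CHUNK => 8192;           # assumed read size

    sub new {
        my ($class, $fh) = @_;
        $fh->blocking(0);
        return bless { handle => $fh, buffer => '', index => 0,
                       eof => 0, error => 0 }, $class;
    }

    sub error { return $_[0]{error} }

    sub getline {
        my $self = shift;                 # $_[0] is now the caller's scalar
        return 0     if $self->{eof};     # report conditions seen earlier
        return undef if $self->{error};

        my $eol = $/;
        my $i = index($self->{buffer}, $eol, $self->{index});
        if ($i < 0) {
            # Resume later searches near the current end of the buffer.
            my $resume = length($self->{buffer}) - length($eol) + 1;
            $self->{index} = $resume > 0 ? $resume : 0;

            my $rc = sysread($self->{handle}, $self->{buffer}, CHUNK,
                             length $self->{buffer});   # append to buffer
            if (!defined $rc) {
                return '0E0' if $! == EWOULDBLOCK || $! == EAGAIN;
                $self->{error} = $! + 0;  # real error: hand back the rest
            }
            elsif ($rc == 0) {
                $self->{eof} = 1;         # end of file: hand back the rest
            }
            if ($self->{eof} || $self->{error}) {
                $_[0] = $self->{buffer};
                $self->{buffer} = '';
                return length $_[0];
            }
            $i = index($self->{buffer}, $eol, $self->{index});
            return '0E0' if $i < 0;       # still no complete line
        }
        # Copy the line out and delete it from the buffer in one step.
        $_[0] = substr($self->{buffer}, 0, $i + length($eol), '');
        $self->{index} = 0;
        return length $_[0];
    }
}
```

A caller would typically invoke getline() only after select() reports the handle readable, treat a false result as the end of the stream, and print the line only when the result is greater than 0.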
1. This server does not reverse lines, as previous echo server examples did, because it is byte-stream oriented rather than line-oriented. We discuss a
line-oriented example in the next section.
Lines 1–4: Load modules—We begin by loading IO::SessionSet. IO::SessionSet loads
IO::SessionData automatically.
Lines 5–9: Create listen socket—We create a listen socket in the normal way, dying in case of
error.
Line 10: Create new IO::SessionSet object—We create an IO::SessionSet object by calling
IO::SessionSet->new(), using the listen socket as its argument. This tells IO::SessionSet
to perform an automatic accept() on the socket whenever a new client tries to connect.
Lines 11–13: Main loop—The rest of the server is about a dozen lines of code. The body of the
server is an infinite loop. Each time through the loop, we call the IO::SessionSet object's
wait() method, which returns a list of handles that are ready for reading. It is roughly equivalent
to IO::Select's can_read() method, but returns a list of IO::SessionData objects rather than
raw IO::Socket objects.
wait() handles the listening socket completely internally. If an incoming connection is detected,
wait() calls the listen socket's accept() method, turns the returned connected socket into a
new IO::SessionData object, and adds the object to its list of monitored sockets. This new
session object is now returned to the caller along with any other IO::SessionData objects that are
ready for I/O.
Internally, wait() also finishes partial writes that may have occurred during previous iterations
of the loop. If no sessions are ready for reading, wait() blocks indefinitely.
Lines 14–21: Handle sessions—We now loop through each of the SessionData objects returned
by wait() and handle each object in turn.
For each session object, we call its read() method, which reads up to 4K bytes of data into
a local variable. If read() returns a true value, we immediately send the data to the session's
write() method, writing it back to the client.
If read() returns a false value, we treat it as an end of file. We close the session by calling its
close() method and continue looping.
Although IO::SessionData->read() looks and acts much like IO::Socket->read(), there
is a crucial difference. Whereas the IO::Socket method will return either the number of bytes read
or undef on failure, IO::SessionData->read() can also return 0E0 if the call would block, in
the same way as the Getline module's getline() method.
In the main loop of Figure 13.3, we first test the result code in a logical if() statement. In this
context, an EWOULDBLOCK result code is treated as a true value, telling us that no error occurred.
Then, before we call write(), we treat the result code as a byte count and look to see whether it
is greater than 0. In this case, 0E0 is used in a numeric context and so evaluates to a byte count of
0. We skip the write and try to read from the object later.
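The two-stage test reads naturally in code. In the sketch below, handle_session() mirrors the loop body just described, while Sketch::FakeSession is a hypothetical stand-in that replays scripted result codes so that the example runs without a network connection.

```perl
use strict;
use warnings;

{
    package Sketch::FakeSession;          # hypothetical stand-in for a session
    sub new {
        my ($class, @script) = @_;
        return bless { script => [@script], log => [] }, $class;
    }
    sub read {
        my $self = shift;                 # $_[0] is now the caller's buffer
        my ($rc, $data) = @{ shift @{ $self->{script} } };
        $_[0] = $data;
        return $rc;
    }
    sub write {
        my ($self, $data) = @_;
        push @{ $self->{log} }, "write:$data";
        return length $data;
    }
    sub close {
        my $self = shift;
        push @{ $self->{log} }, 'close';
        return 1;
    }
}

# Echo one chunk, skip a would-block, or close on EOF or error.
sub handle_session {
    my ($session) = @_;
    my $data;
    if (my $rc = $session->read($data, 4096)) {   # true: bytes or '0E0'
        $session->write($data) if $rc > 0;        # numeric: skip '0E0'
    }
    else {
        $session->close;                          # 0 or undef
    }
    return;
}
```

Feeding the fake session the result codes '0E0', 5, and 0 in turn produces no action, one write, and one close, respectively.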
The IO::SessionData->write() method has the same syntax as IO::Socket->write().
The method sends as much of the data as it can, and buffers whatever data is left over from partial
writes. The remainder of the queued data is written out automatically during subsequent calls to
wait().
The write() method returns the number of bytes written on success, 0E0 if the operation would
block, or undef on error. Since the vast majority of I/O errors encountered during writes are
unrecoverable, write() also automatically closes the IO::SessionData object and removes it from the
session set when it encounters an error. (If you don't like this, you can subclass IO::SessionData
and override the method that does this.) Check $! to learn which specific error occurred.
Because it's possible that there is buffered outgoing data in the session at the time we call its
close() method, the effect of close() may be delayed. Subsequent calls to wait() will attempt
to send the remaining queued data in the SessionData object and only close the socket when the
outgoing buffer is empty. However, even if there is buffered data left, close() immediately removes
the session from the IO::SessionSet so that wait() never returns it again.
Another important difference between IO::Socket and IO::SessionData is that IO::SessionData
objects are not filehandles. You cannot directly call sysread() or syswrite() using a SessionData
object as the target. You must always go through the read() and write() method calls.
Lines 1–14: Initialize script—The script begins in much the way other members of the family do.
The major difference is that we import the IO::LineBufferedSet module and create a new session
set using this class.
Lines 15–17: Main loop—The main loop starts with a call to the session set's wait() method.
This returns a list of SessionData objects that are ready for reading. Some of them are
SessionData objects that we have seen on previous iterations of the loop; others are new sessions
that were created when wait() called accept() on a new incoming connection.
Lines 18–23: Create new Chatbot objects—We distinguish between new and old sessions by
consulting the %SESSIONS hash.
If this is a new incoming connection, then it lacks an entry in %SESSIONS, in which case we
create a fresh Chatbot::Eliza::Polite object and store it into %SESSIONS indexed by the
SessionData object. We call Eliza's welcome() method to get the greeting and pass it to the
SessionData object's write() method, queuing the message to be written to the client.
Lines 24–30: Handle old sessions—If %SESSIONS indicates that this is a session we have seen
before, then we retrieve the corresponding Eliza object.
We read a line of input by calling the SessionData's getline() method. This method acts like
the IO::Getline->getline() method that we developed earlier, returning a result code that
indicates the number of bytes read and placing the data into its scalar argument.
If the number of bytes read is positive, then we got a complete line. We remove the terminal
newline, pass the user input to the Eliza object's one_line() method, and hand the result off
to the session object's write() method.
Line 31: Close defunct sessions—If getline() returns a false value, then it indicates that the
client has closed its end of the connection. We call the current session's close() method,
removing it from the list of sessions monitored by the IO::LineBufferedSet object. We do the
same in the case that the user terminated the session by typing "goodbye" or another exit word.
Just like IO::SessionData->read(), IO::LineBufferedSet->getline() returns 0 in
case of end of file, 0E0 if the read would block, and undef for various error conditions.
Notice that we never explicitly check for the 0E0 result code on the reads. If getline() is
unsuccessful, it returns a false value (0 for end of file and undef for an error). "Would block" is treated
as a true value that just happens to result in a read of 0 bytes. The easiest strategy is to do nothing
in this case and just go back to waiting for I/O in IO::SessionSet->wait().
Similarly, we don't check the result code from write(), because the SessionData object handles
blocked write calls by queuing the data in an internal buffer and writing it bit by bit whenever the
socket can accept it.
When IO::SessionData->read() is used in the way shown in these two examples, it is unlikely
that it will ever return 0E0. This is because IO::SessionSet->wait() uses select() to ensure
there will be at least 1 byte to read from any SessionData object it returns. The exception to this
rule occurs when the SessionData object has just been created as the result of an incoming
connection. In this case, there may very well be no data to read from it immediately. This is why we skip
the getline() attempt when dealing with a new session (lines 19–23).
If you were to call the read() method several times without an intervening
IO::SessionSet->wait(), the "would block" condition might very well occur. It is good practice to check that
read() or getline() returns a positive byte count before trying to work with the data returned.
2. These modules use many object-oriented tricks and other Perl idioms. If you find the code hard to follow, look at the implementation of the
gab7.pl client in Chapter 16 ( Figure 16.1). Although it uses IO::Poll rather than IO::Select, this code handles the problems of nonblocking
I/O using the same strategy as the more general modules presented here.
IO::SessionData is a wrapper around a single IO::Socket object. In addition to the socket, it
maintains an internal buffer called the outbuffer, which holds data that has been queued for sending but
has not yet been sent across the socket. Other internal data includes a pointer to the SessionSet
that manages the current SessionData object, a write-only flag, and some variables that manage
what happens when the outgoing buffer fills up. IO::SessionData calls the associated SessionSet
to tell it when it is ready to accept new data from the remote socket and when it has outgoing data
to write.
Because the outgoing data is buffered, there is a risk of the outbuffer ballooning if the remote side
stops reading data for an extended period. IO::SessionData deals with this problem by defining a
choke method that is called whenever the outbuffer exceeds its limit, and called again when the
buffer returns to an acceptable size.
choke() is application specific. In some applications it might be appropriate to discard the extra
buffered data, while in others the program might want to terminate the connection to the remote
host. IO::SessionData allows the application to determine what choke() does by setting a callback
subroutine that is invoked when outbuffer fills up. If no callback is set, then choke()'s default action
is to flag the session so that it will no longer accept incoming data. When the write buffer returns to
a reasonable size, the session is allowed to accept incoming data again. This is appropriate for
many server applications in which the server reads some data from the session, processes it, and
writes information back to the session.
IO::SessionData also allows you to create write-only sessions. This is designed to allow you to wrap
write-only filehandles like STDOUT inside an IO::SessionData and use it in a nonblocking fashion.
At the end of this chapter we give an example of how this works.
To summarize, the public API for IO::SessionData is as follows:
$bytes = $session->read($scalar,$length[,$offset])
Like sysread(), except that on EWOULDBLOCK errors, it returns 0E0.
$bytes = $session->write($scalar)
Like syswrite(), except that on EWOULDBLOCK errors, it returns 0E0.
$bytes = $session->pending
Returns the number of unsent bytes pending in outbuffer.
$bytes = $session->write_limit([$limit])
Gets or sets the write limit, which is the maximum number of unsent bytes that can be queued in outbuffer.
$coderef = $session->set_choke([$coderef])
Gets or sets a code reference to be invoked when outbuffer exceeds the write limit. The code will also be
invoked when outbuffer returns to an allowed size.
$result = $session->close()
Closes the session, forbidding further reads. The actual filehandle will not be closed until all pending output
data is written.
$fh = $session->handle()
Returns the underlying file handle.
$session_set = $session->sessions
Returns the associated IO::SessionSet.
Lines 1–7: Initialize module—We begin by importing the EWOULDBLOCK constant from Errno and
loading code from the IO::SessionSet module. We also define a constant default value for the
maximum size of the outgoing buffer.
Lines 11–29: The new() method—The new() method constructs a new IO::SessionData
object. This method is intended to be called from IO::SessionSet, not directly.
The new() method takes three arguments: the IO::SessionSet that's managing it, an IO::Handle
object (typically an IO::Socket), and an optional flag that indicates whether the handle is to be
treated as write-only. This last feature makes it possible to manage one-way filehandles such
as STDOUT.
We put the handle into nonblocking mode by calling its blocking() method with an argument
of 0 and set up our state variables in a hash reference. This reference is now blessed with the
bless() function. The effect is that the reference is turned into an object that can invoke any
of our methods. When our methods are invoked, the blessed reference is passed to us as the
first argument. By convention, our methods store this object in a variable named $self.
Unless the handle is marked write-only, we now call our internal readable() method with a
true argument to tell the associated IO::SessionSet that the handle is ready for reading. The
object is returned to the caller.
Lines 30–46: The handle(), sessions(), pending(), and write_limit() methods
—The next part of the module consists of a series of methods that provide access to the object's
internal state. The handle() method returns the stored filehandle object; the sessions()
method returns the associated IO::SessionSet object; pending() returns the number of bytes
that are queued to be written; and write_limit() gets or sets the size limit on the outbuffer.
The code for write_limit() may look a bit cryptic, but it is a common Perl idiom for getting
or setting a state variable in a Perl object. If the method is called with no arguments, then it
returns the value of the write_limit state variable. Otherwise it uses the passed argument to
update the value of write_limit.
Lines 47–51: The set_choke() method—The set_choke() method retrieves or sets the
callback subroutine that is invoked whenever the outgoing buffer exceeds its limit. The structure
of this method is identical to write_limit().
We expect to get a code reference as the argument, and a more careful implementation of this
method would check that this is the case.
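The get-or-set idiom, together with the argument check that a more careful set_choke() might perform, can be sketched as follows. The package name Sketch::Accessors, the choke_action field name, and the default limit of 8192 are assumptions made for this illustration.

```perl
use strict;
use warnings;

{
    package Sketch::Accessors;            # hypothetical class
    use Carp;

    sub new {
        my $class = shift;
        return bless { write_limit => 8192, choke_action => undef }, $class;
    }

    # With no argument, return the current value; otherwise update it first.
    sub write_limit {
        my $self = shift;
        $self->{write_limit} = shift if @_;
        return $self->{write_limit};
    }

    # Same idiom, plus a check that the argument really is a code reference.
    sub set_choke {
        my $self = shift;
        if (@_) {
            my $code = shift;
            croak "set_choke() expects a code reference"
                unless ref($code) eq 'CODE';
            $self->{choke_action} = $code;
        }
        return $self->{choke_action};
    }
}
```

This variant returns the new value when setting. Other implementations return the previous value instead, and the text above does not say which convention the module follows.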
Lines 52–60: The write() method, queuing data —Now we come to the more interesting part
of the module. The write() method is responsible for sending data over the handle. If part or
all of the data can't be sent immediately, then it is queued in outbuffer for a later attempt.
write() can be called with just a single argument that contains data to be written, as in
$session->write($data), or called with no arguments, as in $session->write(). In the latter
case, the method tries to send any queued data it has from previous attempts.
We begin by recovering the object from the subroutine stack and sanity checking that the
filehandle and outbuffer are defined. If these checks pass, and if the caller asked for more data to
be queued for output, we append the new data to outbuffer. Notice that outbuffer is allowed to
grow as large as the data to be passed to write(). The write limit only comes into play when
marking the IO::SessionData object as ready for reading or writing additional data.
Lines 61–79: The write() method, writing data—The next section of the write() method
tries to do I/O. If data is pending in the outbuffer, then we call syswrite() with the handle and
the contents of outbuffer and save the result code. However, before calling syswrite(), we
localize $SIG{PIPE} and set it to IGNORE. This prevents the program from getting a fatal signal
if the filehandle is closed prematurely. After the method exits, the PIPE handler is automatically
restored to its previous state so that this adjustment does not interfere with user code.
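The effect of localizing the PIPE handler can be demonstrated with a pipe whose read end has been closed. The careful_write() function below is a hypothetical helper. It returns the errno value alongside the result code only so that the outcome is easy to inspect.

```perl
use strict;
use warnings;
use Errno qw(EPIPE);

# Write once with SIGPIPE ignored for the duration of the call.
sub careful_write {
    my ($fh, $data) = @_;
    local $SIG{PIPE} = 'IGNORE';          # undone automatically on return
    my $rc    = syswrite($fh, $data);
    my $errno = defined $rc ? 0 : $! + 0; # capture $! before anything else
    return ($rc, $errno);
}
```

Writing to a pipe with no reader then fails with EPIPE rather than killing the process, and the caller's own $SIG{PIPE} setting is back in place as soon as the subroutine returns.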
If syswrite() returns a defined result code, then it was at least partially successful, and the
result code holds the number of bytes written. We use substr() to truncate the outbuffer by
the number of bytes written. This might leave outbuffer empty if all bytes were written, or might
leave it containing the unwritten remainder if syswrite() reported a partial write.
Otherwise, the result code is undef, indicating an error of some sort. We check the error code
stored in $! and take appropriate action.
If the error code is EWOULDBLOCK, then we return 0E0. Otherwise, some other type of write error
occurred, most likely a pipe error. We deal with this situation by deferring to an internal method
named bail_out(). In the current implementation, bail_out() simply closes the handle and
returns undef. To get more sophisticated behavior (such as logging or taking different actions
depending on the error), create a subclass of IO::SessionData and override bail_out().
If we happen to be called when outbuffer is empty and there is no data to queue, then we just
return 0E0. This won't ordinarily happen.
Finally, before we exit, we call an internal method named adjust_state(). This synchronizes
the IO::SessionData object with the IO::SessionSet object that manages it. We finish by returning
our result code.
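The queue-then-write logic can be condensed into a single function. This is a sketch under stated assumptions rather than the module's code: queue_and_write() is a hypothetical name, a plain hash with handle and outbuffer keys stands in for the session object, a bare close() stands in for bail_out(), and the adjust_state() step is omitted.

```perl
use strict;
use warnings;
use IO::Handle;                           # for ->blocking() on the handle
use Errno qw(EWOULDBLOCK EAGAIN);

# $state is a hash reference with 'handle' and 'outbuffer' keys (assumed).
sub queue_and_write {
    my ($state, $data) = @_;
    $state->{outbuffer} .= $data if defined $data;   # queue new data
    return '0E0' unless length $state->{outbuffer};  # nothing to send

    local $SIG{PIPE} = 'IGNORE';
    my $rc = syswrite($state->{handle}, $state->{outbuffer});
    if (defined $rc) {
        substr($state->{outbuffer}, 0, $rc) = '';    # keep unsent remainder
        return $rc;
    }
    return '0E0' if $! == EWOULDBLOCK || $! == EAGAIN;

    close $state->{handle};               # stand-in for bail_out()
    return undef;
}
```

Calling queue_and_write($state) with no data argument retries whatever is still queued, which corresponds to the no-argument form of write() described earlier.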
Lines 80–90: The read() method—In contrast, the read() method is short. This method has
the same syntax as Perl's built-in read() and sysread() functions. It is, in fact, a simple
wrapper around sysread() that intercepts the result code and returns 0E0 on an
EWOULDBLOCK error.
The only tricky feature is that we reference elements in the subroutine argument list directly (as
$_[0], $_[1], etc.) rather than copy them into local variables. This allows us to pass these
values directly to sysread() so that it can modify the caller's data buffer in place.
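A wrapper of this kind might look like the sketch below. The function name nb_read() is hypothetical, and it takes the filehandle as its first argument instead of being a method on a session object.

```perl
use strict;
use warnings;
use IO::Handle;                           # for ->blocking() on the handle
use Errno qw(EWOULDBLOCK EAGAIN);

# Like sysread(), but returns '0E0' when the read would block.
sub nb_read {
    my $fh = shift;                       # @_ is now (buffer, length[, offset])
    my $rc = sysread($fh, $_[0], $_[1], $_[2] || 0);  # fills caller's buffer
    return $rc   if defined $rc;          # bytes read, or 0 at end of file
    return '0E0' if $! == EWOULDBLOCK || $! == EAGAIN;
    return undef;                         # some other error; see $!
}
```

Because $_[0] is an alias for the caller's variable, the data lands directly in the caller's buffer without an intermediate copy.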
Lines 91–102: The close() method—The close() method is responsible for closing our
filehandle and cleaning up. There's a slight twist here because of the potential for pending data
in the outgoing write buffer, in which case we can't close the filehandle immediately, but only
mark it so that the true close happens after all pending data is written.
We call the pending() method to determine if there is still data in the write buffer. If not, then
we immediately close the filehandle and alert the IO::SessionSet that manages this session to
delete the object from its list. Otherwise, we flag this session as no longer readable by calling
the readable() method with a false argument (we will see more of readable() later) and
set a delayed close flag named closing.
Lines 103–116: The adjust_state() method—The next method, adjust_state(), is the
way the session communicates with its associated IO::SessionSet.
We begin by calling two internal methods that are named writable() and readable(), which
alert the IO::SessionSet that the session is ready to write data and read data, respectively. Our
first step is to examine the outgoing buffer by calling the pending() method. If there is data
there, we call our writable() method with a true flag to indicate that we have data to write.
Our second step is to call the choke() method if a nonzero write_limit has been defined. We
pass choke() a true flag if the write buffer limit has been exceeded. The default choke() action
is to disallow further reading on us by setting readable() to false.
Finally, if the closing flag is set, we attempt to close the session by invoking the close()
method. This may actually close the session, or may just result in deferring the closing if there
is pending outgoing data.
Lines 117–130: The choke() method—The next method is choke(), which is called when
the amount of data in the outgoing buffer exceeds write_limit or when the amount of data in the
buffer has shrunk to below the limit.
Network Programming with Perl. Network Programming with Perl, ISBN: 0-201-61571-1
Prepared for [email protected], Stjepan Maric
Copyright © 2001 by Addison-Wesley. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from
the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.
We begin by looking for a callback code reference. If one is defined, we invoke it, passing it a
reference to the current SessionData object and a flag indicating whether the session should be
choked or released from choke.
If no callback is defined, we simply call the session's readable() method with a false flag to
disallow further input on this session until the write buffer is again an acceptable length.
Lines 131–145: The readable() and writable() methods—The next two methods are
readable() and writable(). They are front ends to the IO::SessionSet object's
activate() method. As we will see in the next section, the first argument to activate() is the
current IO::SessionData object; the second is one of the strings "read" or "write"; and the third
is a flag indicating whether the indicated type of I/O should be activated or inactivated.
The only detail here is that if our session is flagged write only, then readable() does not try
to activate it.
Lines 146–157: The bail_out() method—The final method in the module is bail_out(),
which is called when a write error occurs. In this implementation, bail_out() drops all buffered
outgoing data and closes the session. The reason for dropping pending data is so that the close
will occur immediately, rather than wait indefinitely for a write that we know is likely to fail.
bail_out() receives a copy of the error code that occurred during the unsuccessful write. The
current implementation of this method ignores it, but you might wish to use the error code if you
subclass IO::SessionData.
That's a lot of code! But we're not finished yet. The IO::SessionData module is only half of the picture.
The other half is the IO::SessionSet module, which manages a set of nonblocking sessions.
$set = IO::SessionSet->new([$listen])
Creates a new IO::SessionSet. If a listen socket is provided in $listen, then the module automatically accepts
incoming connections.
$session = $set->add($handle[,$writeonly])
Adds the filehandle to the list of handles monitored by the SessionSet. If the optional $writeonly flag is true,
then the handle is treated as a write-only filehandle. This is suitable for STDOUT and other output-only
filehandles. add() wraps the filehandle in an IO::SessionData object and returns the object as its result.
$set->delete($handle)
Deletes the filehandle or IO::SessionData object from the monitored set.
@sessions = $set->wait([$timeout])
select()s over the set of monitored filehandles and returns the corresponding sessions that are ready for
reading. Incoming connections on the listen socket, if provided, are handled automatically, as are queued
writes. If $timeout is provided, wait() returns an empty list if the timeout expires before any handles are
ready for reading.
@sessions = $set->sessions()
Returns all the IO::SessionData objects that have been registered with this set.
Lines 1–7: Initialize module—We begin by bringing in the necessary modules and by defining a
global variable, $DEBUG, that may be set to enable verbose debugging. This facility was
invaluable to me while I was developing this module, and you may be interested in activating it to see
what exactly the module is doing.
To activate debugging, simply place the statement $IO::SessionSet::DEBUG=1 at the top
of your program.
Lines 8–27: The new() constructor—The new() method is the constructor for this class. We
define three state variables, each of which is a key in a blessed hash. One, named sessions,
holds the set of sessions. The other two, readers and writers, hold IO::Select objects that will
be used to select handles for reading and writing, respectively.
If the new() method was called with a listening IO::Socket object, then we store the socket in
a fourth state variable and call IO::Select's add() method to add the listen socket to the list of
handles to be monitored for reading. This allows us to make calls to accept() behind the
scenes.
Lines 28–30: The sessions() method—The sessions() method returns the list of
IO::SessionData objects that have been registered with this module. Because this class needs to
interconvert between IO::SessionData objects and the underlying handles that they wrap,
the sessions state variable is actually a hash in which the keys are IO::Handle objects (typically
sockets) and the values are the corresponding IO::SessionData wrappers. sessions() returns
the values of the hash.
Lines 31–39: The add() method—The add() method is called to add a handle to the monitored
set. It takes a filehandle and an optional write-only flag.
We call IO::SessionData->new() to create a new session object, and add the handle and
its newly created session object to the list of handles the IO::SessionSet monitors. We then
return the session object as our function result.
This method has one subtle feature. Because we want to be able to subclass IO::SessionData
in the future, add() doesn't hard code the session class name. Instead it creates the session
indirectly via an internal method named SessionDataClass(). This method returns the string
that will be used as the session object class, in this case "IO::SessionData". To make
IO::SessionSet use a different wrapper, subclass IO::SessionSet and override (redefine) the
SessionDataClass() method. We use this feature in the line-oriented version of this module
discussed in the next section.
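The indirection through SessionDataClass() is a small factory-method pattern. The following sketch shows the shape of it; the Demo:: package names and the constructor arguments are hypothetical and far simpler than those of the real modules.

```perl
use strict;
use warnings;

package Demo::Session;            # hypothetical wrapper class
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package Demo::LineSession;        # hypothetical line-oriented subclass
our @ISA = ('Demo::Session');

package Demo::Set;
sub new { return bless { sessions => {} }, shift }

# Name of the class that add() will instantiate.
sub SessionDataClass { return 'Demo::Session' }

sub add {
    my ($self, $handle) = @_;
    # The class name comes from a method call, so subclasses can change it.
    my $session = $self->SessionDataClass->new(handle => $handle);
    $self->{sessions}{$handle} = $session;
    return $session;
}

package Demo::LineSet;            # overrides only the class name
our @ISA = ('Demo::Set');
sub SessionDataClass { return 'Demo::LineSession' }

package main;
```

Demo::LineSet inherits add() unchanged, yet every session it creates is a Demo::LineSession.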
Lines 40–52: The delete() method—Next comes the delete() method, which removes a
session from the list of monitored objects. In the interests of flexibility, this method accepts either
an IO::SessionData object to delete or an IO::Handle. We call two internal methods,
to_handle() and to_session(), to convert our argument into a handle or a session, respectively.
We then remove all references to the handle and session from our internal data structures.
Lines 53–61: The to_handle() method—The to_handle() method accepts either an
IO::SessionData object or an IO::Handle object. To distinguish these possibilities, we use Perl's
built-in isa() method to determine whether the argument is a subclass of IO::SessionData. If
this returns true, we call the object's handle() method to fetch its underlying filehandle and
return it.
If isa() returns false, we test whether the argument is a filehandle by testing the return value
of fileno(), and if so, return the argument unmodified. If neither test succeeds, we throw up
our hands in despair and return undef.
Lines 62–70: The to_session() method—The to_session() method performs the inverse
function. We check to see whether the argument is an IO::SessionData object, and if so, return it
unchanged. Otherwise, we test the argument with fileno(), and if it looks like a filehandle, we use it to
index into our sessions hash, fetching the IO::SessionData object that corresponds to the handle.
Lines 71–92: The activate() method—The activate() method is responsible for adding
a handle to the appropriate IO::Select object when the handle's corresponding IO::SessionData
object indicates that it wants to do I/O. The method can also be used to deactivate an active
handle.
Our first argument is either an IO::SessionData object or a filehandle, so we begin with a call to
to_handle() to turn the argument—whatever it is—into a filehandle. Our second argument is
either of the strings "read" or "write". If it's "read", we operate on the readers IO::Select object.
Otherwise, we operate on the writers object. The appropriate IO::Select object gets copied into
a local variable.
Depending on whether the caller wants to activate or inactivate the handle, we either add the
filehandle to the IO::Select set or delete it from the set. In either case, we return the previous activation setting
for the filehandle.
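The add-or-remove logic can be sketched with IO::Select's own exists(), add(), and remove() methods. This is a simplified stand-alone function, not the module's method: it receives the IO::Select object directly rather than choosing between the readers and writers sets, and its name and argument order are assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: switch monitoring of $handle on or off in $select
# and report whether the handle was being monitored beforehand.
sub activate {
    my ($select, $handle, $on) = @_;
    my $previous = defined $select->exists($handle) ? 1 : 0;
    if ($on) {
        $select->add($handle);
    }
    else {
        $select->remove($handle);
    }
    return $previous;
}
```

Returning the previous setting lets a caller temporarily disable a handle and restore its earlier state afterward.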
Lines 93–110: The wait() method: handle pending writes—Finally we get to the guts of the
module, the wait() method. Our job is to call IO::Select->select() for the handles whose
sessions have declared them ready for I/O, to call write() for those sessions that have queued
outgoing data, and to call accept() on the listening handle if the IO::Select object indicates
that it is ready for reading. Any other filehandles that are ready for reading are used to look up
the corresponding IO::SessionData objects and returned to the caller.
The first part of this subroutine calls IO::Select->select(), returning a two-element list of
readers and writers that are ready for I/O. Our next task is to handle the writers with queued
data. We now loop through each of the writable handles, finding its corresponding session and
calling the session object's write() method to syswrite() as much pending data as it can.
The IO::SessionData->write() method, as you recall, will remove itself from the list of
writable handles when its outgoing buffer is empty.
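For reference, the class-method form of select() can be exercised on its own. The sketch below is not taken from the module; ready_handles() is a hypothetical helper. IO::Select->select() returns array references for the readable, writable, and exceptional handles, or an empty list if the timeout expires, and only the first two are used here.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: returns (\@readable, \@writable).
# Both lists are empty if select() times out or fails.
sub ready_handles {
    my ($readers, $writers, $timeout) = @_;
    my ($can_read, $can_write) =
        IO::Select->select($readers, $writers, undef, $timeout);
    return ($can_read || [], $can_write || []);
}
```

A wait() loop built on this would call write() on the session for each writable handle and collect the sessions that correspond to the readable ones.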
Lines 111–127: The wait() method: handle pending reads—The next part of wait() deals
with each of the readable filehandles returned by IO::Select->select(). If one of the
readable filehandles is the listen socket, we call its accept() method to get a new connected socket
and add this socket to our session set by invoking the add() method. The resulting
IO::SessionData object is added to the list of readable sessions that we return to the caller.
If, on the other hand, the readable handle corresponds to any of the other handles, we look up
its corresponding session and add it to the list of sessions to be returned to the caller.
Lines 128–132: The SessionDataClass() method—The last method is
SessionDataClass(), which returns the name of the SessionData class that the add() method will create
when it adds a filehandle to the session set. In this module, SessionDataClass() returns the
string "IO::SessionData".
There's a small but subtle semantic inconsistency in IO::SessionSet->wait(). The new
session that is created when an incoming connection comes in is returned to the caller regardless of
whether it actually has data to read. This gives the caller a chance to write outgoing data to the
handle—for example, to print a welcome banner when the client connects.
If the caller invokes the new session object's read() method, it may have nothing to return.
However, because the socket is nonblocking, this doesn't pose a practical problem. The read() method
will return 0E0, and the caller should ignore the read and try again later.
$set = IO::LineBufferedSet->new([$listen])
Creates a new IO::LineBufferedSet object. As in IO::SessionSet->new(), an optional listen socket will be
monitored for incoming connections.
@sessions = $set->wait([$timeout])
As in IO::SessionSet->wait(), select()s across the monitored filehandles and returns those
sessions that are ready for reading. However, the returned sessions are IO::LineBufferedSessionData objects that
support line-oriented I/O.
$bytes = $session->getline($data)
Reads a line of data from the associated filehandle, placing it in $data and returning the length of the line. On
end of file, it returns 0. On EWOULDBLOCK, it returns 0E0. On other I/O errors, it returns undef.
The code for these modules is essentially an elaboration of the simpler IO::Getline module that we
discussed earlier in this chapter. Because it doesn't add much to what we have already learned, we
won't walk through the code in detail. Appendix A shows the full code listing for these two modules.
As IO::Getline did, IO::LineBufferedSessionData uses a strategy of maintaining an internal buffer
of data to hold partial lines. When its getline() method is called, we look here first for a full line
of text. If one is found, then getline() returns it. Otherwise, getline() calls sysread() to add
data to the end of the buffer and tries again.
However, maintaining this internal buffer leads to the same problem that standard I/O has when
used in conjunction with select(). The select() call may indicate that there is no new data to
read from a handle when in fact there is a full line of text saved in the buffer. This means that we
must modify our select() strategy slightly. This is done by IO::LineBufferedSet, a subclass of
IO::SessionSet modified to work correctly with IO::LineBufferedSessionData. IO::LineBufferedSet
overrides its parent's wait() method to look like this:
sub wait {
    my $self = shift;
    # look for old buffered data first
    my @sessions = grep { $_->has_buffered_data } $self->sessions;
    return @sessions if @sessions;
    return $self->SUPER::wait(@_);
}
The wait() method calls sessions() to return the list of session objects being monitored. It now
filters this list by calling a new has_buffered_data() method, which returns true if the
getline() method's internal data buffer contains one or more complete lines to read.
If there are sessions with whole lines to read, wait() returns them immediately. Otherwise, it falls
back to the inherited version of wait() (by invoking its superclass's method, SUPER::wait()),
which checks select() to see if any of the low-level filehandles has new data to read.
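A buffer check of this kind might look like the sketch below. The package name, the linebuffer key, and the append() and next_line() helpers are all hypothetical; the real IO::LineBufferedSessionData code is listed in Appendix A and may differ in detail.

```perl
use strict;
use warnings;

package Demo::LineBuffer;   # hypothetical; illustrates the buffer test only

sub new { my $class = shift; return bless { linebuffer => '' }, $class }

# Add freshly read data to the end of the buffer.
sub append { my ($self, $data) = @_; $self->{linebuffer} .= $data; return }

# True if at least one complete, newline-terminated line is buffered.
sub has_buffered_data {
    my $self = shift;
    return index($self->{linebuffer}, "\n") >= 0;
}

# Remove and return the first complete line, or undef if there is none.
sub next_line {
    my $self = shift;
    return undef unless $self->{linebuffer} =~ s/^(.*?\n)//s;
    return $1;
}

package main;
```

Because the test looks for a line terminator rather than for any buffered bytes, a session holding only a partial line is left to the regular select() path.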
Lines 1–8: Initialize script and process the command-line arguments—We begin by bringing in
the appropriate modules. To see status messages from IO::SessionSet as it manages the flow
of data, try setting $IO::SessionSet::DEBUG to a true value.
If read() returns a false result, however, this indicates that STDIN has been closed. We
proceed by calling the socket's shutdown() method to close the write side of the connection. This
causes the remote server to see an end-of-file condition and shut down its side of the socket,
causing $connection->read() to return a false result on a subsequent iteration of the loop.
This is a similar strategy to previous versions of this client.
This version of gab is 45 lines long, compared with 28 lines for the forking version of Figure 10.3
and 27 lines for the multithreaded version of Figure 11.3. This might not seem to be a large increase
in complexity, but it is supported by another 300 lines of code in the IO::SessionData and
IO::SessionSet modules! This increase in size and complexity is typical of what happens when moving from
a blocking, multithreaded, or multitasking architecture to a nonblocking single-threaded design.
The timeout for accepts is applied by IO::Socket at the time that accept() is called. The following
bit of code creates a listening socket with a timeout of 5 seconds and then enters a loop awaiting
incoming connections. Because of the timeout, accept() waits at most 5 seconds for an incoming
connection, returning either the connected socket object, if one is available, or undef. In the latter
case, the loop prints a warning and returns to the top of the loop. Otherwise, it processes the
connected socket as usual.
my $sock = IO::Socket::INET->new( LocalPort => 8000,
                                  Listen    => 20,
                                  Reuse     => 1,
                                  Timeout   => 5 ) or die "Can't listen: $@";
while (1) {
    my $connected = $sock->accept();
    unless ($connected) {
        warn "timeout! ($@)\n";
        next;
    }
    # otherwise process connected socket
    ...
}
If accept() times out before returning a connection, $@ will contain "IO::Socket::INET: Operation
now in progress."
Nonblocking Connect()
In this section we look at how IO::Socket implements timeouts on the connect() call. This will help
you understand how to use nonblocking connect() in more sophisticated applications.
To accomplish a nonblocking connect using the IO::Socket module, you need to create an
IO::Socket object without allowing it to connect automatically, put it into nonblocking mode, and then
make the connect() call manually. This code fragment illustrates the idiom:
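use IO::Socket;
use Errno qw(EWOULDBLOCK EINPROGRESS);
use IO::Select;
my $sock = IO::Socket::INET->new(Proto => 'tcp', Type => SOCK_STREAM) or die $@;
$sock->blocking(0);    # nonblocking mode
my $result = connect($sock, sockaddr_in($port, inet_aton($host)));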
Because we're going to do the connect manually, we don't pass PeerAddr or PeerHost arguments
to the IO::Socket new() method, either of which would trigger a connection attempt. Instead we
provide Proto and Type arguments to ensure that a TCP socket is created. If the socket was created
successfully, we put it into nonblocking mode by passing a false argument to the blocking()
method. We now need to connect it explicitly by passing it to the connect() function. Because
connect() doesn't accept any of the naming shortcuts that the object-oriented new() method
does, we must explicitly create a packed Internet address structure using the sockaddr_in() and
inet_aton() functions discussed in Chapter 3 and use that as the second argument to
connect().
Recall that connect() will return a result code indicating whether the connection was successful.
In a few cases, such as when connecting to the loopback address, a nonblocking connect succeeds
immediately and returns a true result. In most cases, however, the call returns a false result and
sets $! to one of several error codes. The most likely is EINPROGRESS, which indicates simply that the nonblocking
connect is in progress and should be checked periodically for completion. However, various failure
codes are also possible; ECONNREFUSED, for instance, indicates that the remote host has refused
the connection.
If the connect() is immediately successful, we can proceed to use the socket without further ado.
Otherwise, we check the result code. If it is anything other than EINPROGRESS, the connect was
unsuccessful and we die:
unless ($result) { # potential failure
die "Can't connect: $!" unless $! == EINPROGRESS;
Otherwise, if the result code indicates EINPROGRESS, the connect is still in progress. We now have
to wait until the connection completes. Recall from Chapter 12 that select() will indicate that a
socket is marked as writable immediately after a nonblocking connect completes. We take
advantage of this feature by creating a new IO::Select object, adding the socket to it, and calling its
can_write() method with a timeout. If the socket completes its connect before the timeout,
can_write() returns a one-element list containing the socket. Otherwise, it returns an empty list
and we die with an error message:
my $s = IO::Select->new($sock);
die "timeout!" unless $s->can_write($TIMEOUT);
If can_write() returns the socket, we know that the connect has completed, but we don't know
whether the connection was actually successful. It is possible for a nonblocking connect to return a
delayed error such as ECONNREFUSED. We can determine whether the connect was successful by
calling the socket object's connected() method, which returns true if the socket is currently
connected and false otherwise:
unless ($sock->connected) {
$! = $sock->sockopt(SO_ERROR);
die "Can't connect: $!"
}
}
If the result from connected() is false, then we probably want to know why the connect failed.
However, we can't simply check the contents of $!, because that will contain the error message
from the most recent system call, not the delayed error. To get this information, we call the socket's
sockopt() method with an argument of SO_ERROR to recover the socket's delayed error. This
returns a standard numeric error code, which we assign to $!. Now when we die with an error
message, the magical behavior of $! ensures that the error code will be displayed as a
human-readable message when used in a string context.
At the end of this block, we have a connected socket. We turn its blocking mode back on and proceed
to work with it as usual:
$sock->blocking(1);
# handle IO on the socket, etc.
...
Figure 13.8 shows the complete code fragment in the form of a subroutine named
connect_with_timeout(). You can call it like this:
my $socket = connect_with_timeout($host,$port,$timeout);
If you examine the source code for IO::Socket, you will see that a very similar technique is used to
implement the Timeout option.
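Assembled into one subroutine, the fragments discussed in this section might look like the following. Treat it as a sketch pieced together from those fragments rather than as the figure's exact listing. It uses Perl's built-in connect() function so that the false-plus-EINPROGRESS result is visible to the caller, and the inet_aton() check and the error message wording are additions for the example.

```perl
use strict;
use warnings;
use IO::Socket;
use IO::Select;
use Errno qw(EINPROGRESS);

sub connect_with_timeout {
    my ($host, $port, $timeout) = @_;

    # Create an unconnected TCP socket: no PeerAddr or PeerHost argument.
    my $sock = IO::Socket::INET->new(Proto => 'tcp', Type => SOCK_STREAM)
        or die "Can't create socket: $@";
    $sock->blocking(0);                          # nonblocking mode

    my $ip = inet_aton($host) or die "Can't resolve $host";
    my $result = connect($sock, sockaddr_in($port, $ip));

    unless ($result) {                           # potential failure
        die "Can't connect: $!" unless $! == EINPROGRESS;

        # Wait until the socket becomes writable or the timeout expires.
        my $s = IO::Select->new($sock);
        die "timeout!" unless $s->can_write($timeout);

        # Writable does not mean connected: fetch any delayed error.
        unless ($sock->connected) {
            $! = $sock->sockopt(SO_ERROR);
            die "Can't connect: $!";
        }
    }

    $sock->blocking(1);                          # back to blocking mode
    return $sock;
}
```

Because the subroutine dies on failure, callers that want to continue after a failed connect should wrap the call in eval { }.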
Because it isn't fancy, we won't do any rendering or browsing, but instead just retrieve a series of
URLs specified on the command line and store copies to disk. You might use this application to
mirror a set of pages locally. The program has the following structure:
1. Parse URLs specified on the command line, retrieving the hostnames and port numbers.
2. Create a set of nonblocking IO::Socket handles.
3. Initiate nonblocking connects to each of the handles and deal with any immediate errors.
4. Add each handle to an IO::Select set that will be monitored for writing, and select() across
them until one or more becomes ready for writing.
5. Send the request for the appropriate Web document and add the handle to an IO::Select set
that will be monitored for reading.
6. Read the document data from each of the handles in a select() loop, and write the data to
local files as the sockets become ready for reading.
In practice, steps 4, 5, and 6 can be combined in a single select() loop to increase parallelism
even further.
The script is basically an elaboration of the web_fetch.pl script that we developed in Chapter 5
(Figure 5.5). In addition to the nonblocking connects and the parallel downloads, we improve on the
first version by storing each retrieved document in a directory hierarchy based on its URL. For
example, the URL http://www.cshl.org/meetings/index.html will be stored in the current directory in
the file www.cshl.org/meetings/index.html.
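One way to derive the local path from a URL is sketched below. The function name url_to_path() is hypothetical, and two details are assumptions of this example rather than documented behavior of the script: the port number is dropped from the path, and index.html is substituted when the URL ends in a slash or has no path at all.

```perl
use strict;
use warnings;

# Hypothetical helper: map an http URL to a relative local file path.
# Returns undef if the URL is not a plain http:// URL.
sub url_to_path {
    my $url = shift;
    my ($host, $path) = $url =~ m{^http://([^/:]+)(?::\d+)?(/.*)?$}
        or return undef;
    $path = '/' unless defined $path;
    $path .= 'index.html' if $path =~ m{/$};   # assumed default document name
    return $host . $path;
}
```

For the example URL above, url_to_path() yields www.cshl.org/meetings/index.html; the directories along the way still have to be created before the file is opened.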
In addition to generating the appropriate GET request, we will perform minimal parsing of the
returned HTTP header to determine whether the request was successful. A typical response looks
like this:
HTTP/1.1 200 OK
Date: Wed, 01 Mar 2000 17:00:41 GMT
Server: Apache/1.3.6 (UNIX)
Last-Modified: Mon, 31 Jan 2000 04:28:15 GMT
Connection: close
Content-Type: text/html
The important part of the response is the topmost line, which indicates the success or the failure
status of the request. The line begins with a protocol version code, in this case HTTP/1.1, followed
by the status code and the status message.
The status code is a three-digit integer indicating the outcome of the request. As described in
Chapter 9, there are a large number of status codes, but the one that we care about is 200, which
indicates that the request was successful and the requested document follows. If the client sees a
200 status code, it will read to the end of the header and copy the document body to disk. Otherwise,
it treats the response as an error. We will not attempt to process redirects or other fancy HTTP
features.
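The status-line check can be sketched with a single regular expression. The function name parse_status_line() is hypothetical, and the pattern is a minimal one for the three fields described above, not a complete HTTP parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split "HTTP/1.1 200 OK" into (version, code, message).
# Returns an empty list if the line is not an HTTP status line.
sub parse_status_line {
    my $line = shift;
    return unless $line =~ m{^HTTP/(\d+\.\d+)\s+(\d{3})\s*(.*?)\r?$};
    return ($1, $2, $3);
}
```

A caller would treat the response as successful only when the second element equals 200, and as an error in every other case, including an empty return list.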
The script, dubbed web_fetch_p.pl, comes in two parts. The main script reads URLs from the
command line and runs the select() loop. A helper module, named HTTPFetch, is used to track the
status of each URL fetch. It creates the outgoing connection, reads and parses the HTTP header,
and copies the returned document to disk. We'll look at the main script first (see Figure 13.9).
Figure 13.9. The web_fetch script uses nonblocking connects to parallelize URL fetches
Lines 1–6: Initialize script—We begin by bringing in the IO::Socket, IO::Select, and HTTPFetch
modules. We also declare a global hash named %CONNECTIONS, which will be responsible for
maintaining the correspondence between sockets and HTTPFetch objects.
Lines 7–9: Create IO::Select objects—We now create two IO::Select sets, one for monitoring
sockets for reading and the other for monitoring sockets for writing.
Lines 10–15: Create the HTTPFetch connection objects—In the next section of the code, we
read a set of URLs from the command line. For each one, we create a new HTTPFetch object
by calling HTTPFetch->new() with the URL to fetch.
Behind the scenes, HTTPFetch->new() does a lot. It parses the URL, creates a TCP socket,
and initiates a nonblocking connection to the corresponding Web server host. If any of these
steps fail, new() returns undef and we skip to the next URL. Otherwise, new() returns a new
HTTPFetch object.
Each HTTPFetch object has a method called socket() that returns its underlying IO::Socket.
We will monitor this socket for the completion of the nonblocking connect. We add the socket to
the $writers IO::Select set, and remember the association between the socket and the
HTTPFetch object in the %CONNECTIONS hash.
Line 16: Start the select loop—The remainder of the script is a select() loop. Each time
through the loop, we call IO::Select->select() on the $readers and $writers select
sets. Initially $readers is empty, but it becomes populated as each of the sockets completes
its connection.
Lines 17–22: Handle sockets that are ready for writing—We first deal with the sockets that are
ready for writing. This comprises those sockets that have either completed their connections or
have tried and failed. We index into %CONNECTIONS to retrieve the corresponding HTTPFetch
object and invoke the object's send_request() method.
This method checks first to see that its socket is connected, and if so, submits the appropriate
GET request. If the request was submitted successfully, send_request() returns a true result,
and we add the socket to the list of sockets to be monitored for reading. In either case, we don't
need to write to the socket again, so we remove it from the $writers select set.
Lines 23–30: Handle sockets that are ready for reading—The next section handles readable
sockets. These correspond to HTTPFetch sessions that have successfully completed their
connections and submitted their requests to the server.
Again, we use the socket as an index to recover the HTTPFetch object and call its read()
method. Internally, read() takes care of reading the header and body and copying the body
data to a local file. This is done in such a way that the read never blocks, preventing one slow
Web server from holding all the rest up.
The read() call returns a true value if it successfully read from the socket, or false in case of
a read error or an end of file. When it returns false, we're done with the socket, so we remove it from
the $readers set and delete the socket from the %CONNECTIONS hash.
Line 31: Finish up—The loop is done when no more handles remain in the $readers or
$writers sets. We check for this by calling the select objects' count() methods.
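The listing for this figure is not reproduced here. As a rough sketch of the bookkeeping the walk-through describes (not the author's web_fetch script), the loop below moves a socket from the $writers set to the $readers set and drops it at end of file. A socket pair stands in for a Web server connection, and the @finished array and the canned request and reply are assumptions made so that the sketch runs without a network.

```perl
use strict;
use warnings;
use IO::Select;
use IO::Socket;

# A socket pair stands in for one connection to a Web server (an assumption
# made so that the sketch runs without a network).
my ($client, $server) = IO::Socket->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my %CONNECTIONS = ($client => '');      # socket => data received so far
my @finished;                           # data from completed sessions
my $readers = IO::Select->new;
my $writers = IO::Select->new($client);

while ($readers->count or $writers->count) {
    my ($readable, $writable) =
        IO::Select->select($readers, $writers, undef, 5) or last;   # 5-second timeout

    foreach my $sock (@$writable) {
        # Send the request once; on success, start watching for the reply.
        $readers->add($sock) if defined syswrite($sock, "GET / HTTP/1.0\r\n\r\n");
        $writers->remove($sock);

        # The stand-in server reads the request, replies, and hangs up.
        sysread($server, my $request, 1024);
        syswrite($server, "hello");
        close $server;
    }

    foreach my $sock (@$readable) {
        my $bytes = sysread($sock, my $data, 1024);
        if ($bytes) {
            $CONNECTIONS{$sock} .= $data;
        } else {                        # end of file or read error
            $readers->remove($sock);
            push @finished, delete $CONNECTIONS{$sock};
            close $sock;
        }
    }
}
```

The loop condition plays the role of the count() check in the last step: once both sets are empty there is nothing left to wait for.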
Lines 1–7: Load modules—We begin by bringing in the IO::Socket, IO::File, and Carp modules.
We also import the EINPROGRESS constant from the Errno module and load the File::Path and
File::Basename modules. These import the mkpath() and dirname() functions, which we
use to create the path to the local copy of the downloaded file.
Lines 8–31: The new() constructor—The new() method creates the HTTPFetch object. Its
single argument is the URL to fetch. We begin by parsing the URL into its host, port, and path
parts using an internal routine named parse_url(). If the URL can't be parsed, we call an
internal method called error(), which sends an error message to STDERR and returns undef.
If the URL was successfully parsed, then we call our connect() method to initiate the
nonblocking connect. If an error occurs at this point, we again issue an error message and return
undef.
The next task is to turn the URL path into a local filename. In this implementation, we create a
local path based on the remote hostname and remote path. The local path is stored relative to
the current working directory. In the case of a URL that ends in a slash, we set the local filename
to index.html, simulating what Web servers normally do. This local filename ultimately becomes
an instance variable named localpath.
We now stash the original URL, the socket object, and the local filename into a blessed hash.
We also set up an instance variable named status, which will keep track of the state of the
connection. The status starts out at "waiting." After the completion of the nonblocking connect,
it will be set to "reading header," and then to "reading body" after the HTTP header is received.
Line 32: The socket() accessor—The socket() method is a public routine that returns the
HTTPFetch object's socket.
Lines 33–41: The parse_url() method—The parse_url() method breaks an HTTP URL
into its components in two steps, first splitting the host:port and path parts, and then splitting the
host:port part into its two components. It returns a three-element list containing the host, port
number, and path.
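A minimal sketch of such a two-step parse follows. It is not the module's own code: the regular expression, the default port of 80, and the default path of "/" are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Split an http:// URL into (host, port, path); returns an empty list on failure.
sub parse_url {
    my $url = shift;
    # Step 1: separate the host:port part from the path.
    my ($hostport, $path) = $url =~ m!^http://([^/]+)(/.*)?$!i or return;
    # Step 2: separate the host from the optional port.
    my ($host, $port) = split /:/, $hostport, 2;
    return ($host, $port || 80, $path || '/');
}
```

For example, parse_url('http://www.example.com:8080/a/b.html') yields the list ('www.example.com', 8080, '/a/b.html').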
Lines 42–55: The connect() method—The connect() method initiates a nonblocking
connect in the manner described earlier. We create an unconnected IO::Socket object, set its
blocking status to false, and call its connect() method with the desired destination address. If
connect() indicates immediate success, or if connect() returns undef but $! is equal to
EINPROGRESS, we return the socket. Otherwise, some error has occurred and we return false.
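The steps just described might be sketched as follows. The function name start_connect() and its argument list are hypothetical, and inet_aton() can itself block on a DNS lookup, which this sketch makes no attempt to avoid.

```perl
use strict;
use warnings;
use IO::Socket;
use Errno qw(EINPROGRESS);

# Begin a nonblocking TCP connect; returns the socket, or nothing on failure.
sub start_connect {
    my ($host, $port) = @_;
    my $addr = inet_aton($host) or return;
    my $sock = IO::Socket::INET->new(Proto => 'tcp', Type => SOCK_STREAM)
        or return;
    $sock->blocking(0);
    return $sock if $sock->connect(sockaddr_in($port, $addr));  # immediate success
    return $sock if $! == EINPROGRESS;                          # still in progress
    return;                                                     # genuine failure
}
```

The caller would then add the returned socket to an IO::Select set and wait for it to become writable before using it.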
Lines 56–68: The send_request() method—The send_request() method is called when
the socket has become writable, either because it has completed the nonblocking connect or
because an error occurred and the connection failed.
We first test the status instance variable and die if it isn't the expected "waiting" state—this would
represent a programming error, not that this could ever happen ;-). If the test passes, we check
that the socket is connected. If not, we recover the delayed error, stash it into $!, and return an
error message to the caller.
Otherwise the connection has completed successfully. We put the socket back into blocking
mode and attempt to write an appropriate GET request to the Web server. In the event of a write
error, we issue an error message and return undef. Otherwise, we can conclude that the request
was sent successfully and set the status variable to "reading header."
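One way to perform the connection check and recover the delayed error is to read the socket's SO_ERROR option. The sketch below assumes that approach; the helper name check_connected() is hypothetical and is not part of HTTPFetch.

```perl
use strict;
use warnings;
use IO::Socket;

# Return true if the socket finished connecting; otherwise set $! to the
# delayed connect error (if any) and return false.
sub check_connected {
    my $sock = shift;
    return 1 if $sock->connected;       # getpeername() succeeded
    $! = $sock->sockopt(SO_ERROR);      # fetch the pending error code
    return 0;
}
```

After a true result the caller can safely call $sock->blocking(1) and write the request.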
Lines 69–74: The read() method—The read() method is called when the HTTPFetch object's
socket has become ready for reading, indicating that the server has begun to send the HTTP
response. We look at the contents of the status variable. If it is "reading header," we call the
read_header() method. Otherwise, we call read_body().
Nonblocking accept()
Aside from its use in implementing timeouts, nonblocking accept() is infrequently used. One
application of nonblocking accept() is in a server that must listen on multiple ports. In this case, the
server creates multiple listening sockets and select()s across them. select() indicates that
the socket is ready for reading if accept() can be called without blocking.
This code fragment indicates the idiom. It creates three sockets, bound to ports 80, 8000, and 8080,
respectively (these ports are typically used by Web servers):
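The setup fragment itself is not reproduced here. A minimal sketch of what it needs to do, assuming a hypothetical helper named make_listeners() and a listen queue of 20, is:

```perl
use strict;
use warnings;
use IO::Socket;
use IO::Select;

# Create one nonblocking listening socket per port and gather them in an
# IO::Select set.
sub make_listeners {
    my @ports = @_;
    my $listeners = IO::Select->new;
    foreach my $port (@ports) {
        my $sock = IO::Socket::INET->new(
            LocalPort => $port,
            Listen    => 20,
            Proto     => 'tcp',
            Reuse     => 1,
        ) or die "Can't listen on port $port: $@";
        $sock->blocking(0);             # accept() must never block
        $listeners->add($sock);
    }
    return $listeners;
}
```

A server would call it as my $listeners = make_listeners(80, 8000, 8080); note that binding to port 80 normally requires root privileges on UNIX systems.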
The main loop calls the IO::Select can_read() method, returning the list of sockets that are ready
to accept(). We call each ready socket's accept() method, and handle the connected socket
that is returned by turning on blocking again and passing it to some routine that handles the
connection.
It is possible for accept() to return undef and set $! to EWOULDBLOCK even if
select() indicates that the listening socket is readable. This can happen if the remote host terminated the connection
between the time that select() returned and accept() was called. In this case, we simply skip
back to the top of the loop and try again later.
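while (1) {
    my @ready = $listeners->can_read;    # listening sockets with pending connections
    foreach my $listener (@ready) {
        # accept() may still fail with EWOULDBLOCK; skip and retry on a later pass
        next unless my $connected = $listener->accept();
        $connected->blocking(1);         # handle the client in blocking mode
        handle_connection($connected);
    }
}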
Summary
Nonblocking I/O is a double-edged sword. On the one hand, it makes it possible to write servers
that can process multiple simultaneous connections without spawning new processes or threads.
Compared to the multiprocessing solutions, nonblocking I/O has a slight performance edge and
consumes fewer system resources. Nonblocking I/O is also the only viable solution for creating
multiconnection servers that will run on platforms that do not support the fork() or thread APIs,
such as the Macintosh.
On the other hand, nonblocking I/O significantly increases the complexity of networking software.
Most of this complexity comes from the overhead of keeping track of partial writes and handling
EWOULDBLOCK errors from syswrite() and sysread() calls. The example programs presented
in this chapter are among the longest in this book and took a significant amount of time to develop
and debug. In my own development efforts, I almost always prefer multiprocessing or thread-based
solutions to nonblocking I/O.
Nonblocking I/O can also be used to avoid blocking during calls to connect() and accept().
These techniques allow you to implement timeouts on these calls and to parallelize connection
attempts without incurring a substantial increase in the size or complexity of your software.
Chapter 14. Bulletproofing Servers
In Chapter 10 we developed subroutines that perform some of the startup time tasks that are
common among production servers in the UNIX environment, including disconnecting from the
controlling terminal, autobackgrounding, and writing a copy of the server's PID to a run-time file. Together,
these help to make network servers more manageable.
Because they act as a gateway into the host, network daemons are particularly prone
to opening security holes. There is much more that we can do to make network daemons
bulletproof. In addition to the techniques already discussed, a production server often implements one or
more of the following useful features:
1. Log status messages to the system error log.
2. Change its UID to that of an unprivileged user.
3. Activate taint checking.
4. Use the chroot() call to isolate itself in a safe subdirectory.
5. Handle the HUP signal by reinitializing itself.
We cover these techniques in this chapter and talk more generally about security problems with
network daemons and how to avoid introducing them into your scripts.
Most of the techniques discussed here are UNIX-specific. However, users of the Windows and
Macintosh ports should read the subsection Direct Logging to a File in the first part of this chapter
and the Taint Mode section, which discusses security issues that are common to all platforms.
Here we see messages from four daemons. Each message consists of a time stamp, the name of
the host the daemon is running on (pesto), the name and optional PID of the daemon, and a
one-line log status message.
The syslog system is a standard part of all common UNIX and Linux distributions. Windows NT/2000
has a similar facility known as the Event Log, but it is less straightforward to use because its
log files use a binary format. However, the Win32::EventLog module, available from CPAN, makes
it possible for Perl scripts to read and write NT event logs. Alternatively, the free NTsyslog package
is a Windows interface to the UNIX syslog service. It is available at
http://www.sabernet.net/software/ntsyslog.html
A Facility
The "facility" describes the type of program that is sending the message. The facility is used to sort
the message into one or more log files or other destinations. Syslog defines these facilities:
• auth—user authorization messages
• authpriv—privileged user authorization messages
• cron—messages from the cron daemon
• daemon—messages from miscellaneous system daemons
• ftp—messages from the FTP daemon
• kern—kernel messages
• local0-local7—facilities reserved for local use
• lpr—messages from the printer system
• mail—messages from the mail system
• news—messages from the news system
• syslog—internal syslog messages
• uucp—messages from the uucp system
• user—messages from miscellaneous user programs
Network daemons generally use the daemon facility or one of the local0 through local7 facilities.
A Priority
Each syslog message is associated with a priority, which indicates its urgency. The syslog daemon
can sort messages by priority as well as by facility, with the intent that urgent messages get flagged
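for immediate attention. The following priorities exist, listed from most to least urgent:
• emerg: the system is unusable
• alert: action must be taken immediately
• crit: a critical condition
• err: an error
• warning: a warning
• notice: a normal but significant condition
• info: an informational message
• debug: a debugging message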
Message Text
Each message to syslog also carries a human-readable text message describing the problem. For
best readability in the log file, messages should not contain embedded newlines, tabs, or control
characters (a single newline at the end of the message is OK).
The syslog daemon can accept messages from either of two sources: local programs via a UNIX
domain socket (Chapter 22) and remote programs via an Internet domain socket using UDP
(Chapter 18). The former strategy is more efficient, but the latter is more flexible, because it allows several
hosts on the local area network to log to the same logging host. The syslog daemon may need to
be configured explicitly to accept remote connections; remote logging has been a source of security
breaches in the past.
Sys::Syslog
You can send messages to the syslog daemon from within Perl using the Sys::Syslog module, a
standard part of the Perl distribution. When you use Sys::Syslog, it imports the following four
functions:
The openlog() options consist of a space- or comma-separated list of the following key words:
• cons—write directly to the system console if the message can't be sent to the syslogd
• ndelay—open connection to syslogd immediately, rather than waiting for the first message to
be logged
• pid—include the process ID of the program in the log entry
• nowait—do not wait for log message to be delivered; return immediately
For example, to log entries under the name of "eliza," with PIDs printed and a facility of
local0, we would call openlog() this way:
openlog('eliza','pid','local0');
The syntax of the format string is identical to the format strings used by printf() and
sprintf() with the exception that the %m format sequence will be automatically replaced with the
value of $! and does not need an argument. The POD documentation for sprintf() explains the
syntax of the format string.
For example, this sends a message to the system log using the err priority:
syslog('err',"Couldn't open %s for writing: %m",$file);
If successful, syslog() returns the number of bytes written. Otherwise, it returns undef.
closelog()
This function severs the connection to syslogd and tidies up. Call it when you are through sending log
messages. It is not strictly necessary to call closelog() before exiting.
setlogsock($socktype)
The setlogsock() function controls whether Sys::Syslog will connect to the syslog daemon via an Internet
domain socket or via a local UNIX domain socket. The $socktype argument may be either "inet," the default,
or "unix." You may need to call this function with the "unix" argument if your version of syslogd is not configured
to allow network messages.
setlogsock() is not imported by default. You must import it along with the default Sys::Syslog
functions in this manner:
use Sys::Syslog qw(:DEFAULT setlogsock);
For best results, call setlogsock() before the first call to openlog() or syslog().
In addition to these four subroutines, there is a fifth one called setlogmask(), which allows you
to set a mask on outgoing messages so that only those of a certain priority will be sent to syslogd.
Unfortunately, this function requires you to translate priority names into numeric bitmasks, which
makes it difficult to use.
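Releases of Sys::Syslog newer than the one described here ease this somewhat: the :macros export tag supplies the priority constants along with LOG_MASK() and LOG_UPTO() helpers. The sketch below assumes that your installed version provides that tag; check its documentation before relying on it.

```perl
use strict;
use warnings;
use Sys::Syslog qw(:standard :macros);

# Pass only messages of priority "warning" or more urgent to syslogd.
# setlogmask() returns the mask that was previously in effect.
my $previous = setlogmask(LOG_UPTO(LOG_WARNING));
```

Because setlogmask() returns the old mask, the original setting can be restored later with setlogmask($previous).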
There is also an internal variable named $Sys::Syslog::host, which controls the name of the
host that the module will log to in "inet" mode. By default, this is set to the name of the local host. If
you wish to log to a remote host, you may set this variable manually before calling openlog().
However, because this variable is undocumented, use it at your own risk.
Lines 1–12: Module initialization —We load the Sys::Syslog module, along with POSIX, Carp,
IO::File, and File::Basename. The latter will be used to generate the program name used for
logging error messages. The rest of this section exports five functions: init_server(),
log_debug(), log_notice(), log_warn(), and log_die(). init_server()
autobackgrounds the daemon, opens syslog, and does other run-time initialization. log_debug()
and its brethren will write log messages to the syslog at the indicated priority.
Lines 13–15: Define constants —We choose a default path for the PID file and a log facility of
local0.
Lines 16–24: init_server() subroutine —The init_server() subroutine performs server
initialization. We get a path for the PID file from the subroutine argument list, or if no path is
provided, we generate one internally. We then call open_pid_file() to open a new PID file
for writing, or abort if the server is already running.
Provided everything is successful so far, we autobackground by calling become_daemon() and
write the current PID to the PID file. At this point, we call init_log() to initialize the syslog
system. We then return the current PID to the main program.
Lines 25–37: become_daemon() subroutine —This is almost the same subroutine we looked
at in Chapter 10, Figure 10.4. It autobackgrounds the server, closes the three standard
filehandles, and detaches from the controlling TTY. The only new feature is that the subroutine now
installs the CHLD signal handler for us, rather than relying on the main program to do so. The
CHLD handler is a subroutine named reap_child().
Lines 38–42: init_log() subroutine —This subroutine is responsible for initializing the syslog
connection. We begin by setting the connection type to a local UNIX-domain socket; this may
be more portable than the default "inet" type of connection. We recover the program's base
filename and use it in a call to openlog().
Lines 43–55: log_* subroutines —Rather than use the syslog() call directly, we define four
shortcut functions called log_debug(), log_notice(), log_warn(), and log_die().
Each function takes one or more string arguments in the manner of warn(), reformats them,
and calls syslog() to log the message at the appropriate priority. log_die() is slightly
different. It logs the message at the crit level and then calls die() to exit the program.
The _msg() subroutine is used internally to format the log messages. It follows the conventions
of warn() and die(). The arguments are first concatenated using the current value of the
output record separator variable, $\, to create the error message. If the message does not end
in a newline, we append the phrase " at $filename line $line" to it, where the two
variables are the filename and line number of the line of the calling code derived from the built-
in caller() function.
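A formatter in this style might be sketched as follows. It is an illustration rather than the module's code: it joins its arguments with the empty string for simplicity, the "Something's wrong" fallback for an empty message is an assumption, and log_warn() here simply returns the formatted text where the real function would hand it to syslog().

```perl
use strict;
use warnings;

# Format a log message following the conventions of warn() and die().
sub _msg {
    my $msg = join('', @_) || "Something's wrong";
    my ($pack, $filename, $line) = caller(1);   # whoever called the log_* function
    $msg .= " at $filename line $line\n" unless $msg =~ /\n$/;
    return $msg;
}

# Stand-in for a log_* function; it returns the text instead of logging it.
sub log_warn { return _msg(@_) }
```

Using caller(1) rather than caller(0) skips over the log_* wrapper, so the reported line is the one in the server script that issued the message.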
Lines 56–59: getpidfilename() subroutine —This subroutine returns a default name for the
PID file, where we store the PID of the server while it is running. We invoke basename() to remove
the directory and ".pl" extension from the script name, and concatenate the result with the PIDPATH
directory.
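As a sketch of this step, the function below builds the name from a script path. The PIDPATH value of /var/run and the ".pid" suffix are assumptions, since the constant's definition is not shown here, and the optional argument is a hypothetical addition that makes the function easy to try out.

```perl
use strict;
use warnings;
use File::Basename;

use constant PIDPATH => '/var/run';     # assumed location

# Build a default PID file name from the script name (defaults to $0).
sub getpidfilename {
    my $script   = @_ ? shift : $0;
    my $basename = basename($script, '.pl');    # strip directory and ".pl"
    return PIDPATH . "/$basename.pid";
}
```

With these assumptions, a script installed as /usr/local/bin/eliza_log.pl would use /var/run/eliza_log.pid.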
Lines 60–71: open_pid_file() subroutine —This subroutine is identical to the original
version that we developed in Chapter 10, Figure 10.5.
Lines 72–74: reap_child() subroutine —This is the now-familiar CHLD handler that calls
waitpid() until all children have been reaped.
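For reference, a handler of this kind is usually only a line or two long. The version below is a generic sketch of the idiom, not a copy of the module's subroutine.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# CHLD handler: collect every child that has already exited, without blocking.
sub reap_child {
    1 while waitpid(-1, WNOHANG) > 0;
}

$SIG{CHLD} = \&reap_child;
```

The loop matters because several children may exit before the handler runs, and the signal is delivered only once for the batch.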
Line 75: END{} block —The package's END{} block unlinks the PID file automatically when the
server exits. Since the server forks, we have to be careful to remove the file only if its current
PID matches the PID saved during server initialization.
With the Daemon module done, we can simplify the psychotherapist daemon code and add event
logging at the same time (Figure 14.2).
Lines 1–6: Load modules —We load the Chatbot::Eliza and IO::Socket modules, as well as the
new Daemon module. We also define the default port to listen on.
Lines 7–8: Install signal handlers —We install a signal handler for the TERM and INT signals,
which causes the server to shut down normally. This gives the Daemon module time to unlink
the PID file in its END{} block.
Note that we no longer install a CHLD handler, because this is now done in the
init_server() subroutine.
Lines 9–15: Open listening socket and initialize server —We open a listening TCP socket on the
port indicated on the command line and die on failure. We then call init_server() to initialize
logging and autobackground, and store the returned PID into a global variable. Once this
subroutine returns, we are in the background and can no longer write to standard error.
Line 16: Log startup message —We call log_notice() to write an informational message to
the system log.
Lines 17–28: Accept loop —We now enter the server's accept loop. As in previous iterations of
this server, we accept an incoming connection and fork a new process to handle it. A new feature,
however, is that we log each new incoming connection using this fragment of code:
my $host = $connection->peerhost;
log_notice("Accepting a connection from %s\n",$host);
We call the connected IO::Socket object's peerhost() method to return the dotted-quad form
of the remote host's IP address and send syslog a message indicating that we've accepted a
connection from that host. Later, after the child process finishes processing the connection with
interact(), using a similar idiom we log a message indicating that the connection is complete.
The other change from the original version of the server is that we indicate a failure of the
fork() call by invoking log_die() to log a critical message and terminate the process.
Lines 29–42: The interact() and _testquit() subroutines —These are identical to the
subroutines introduced in Chapter 10.
Lines 43–45: END{} block —At shutdown time, we log an informational message indicating that
the server is exiting. As in the earlier versions, we must be careful to check that the process ID
matches the parent's. Otherwise, each child process will invoke this code as well and generate
confusing log messages. The Daemon module's END{} block takes care of unlinking the PID file.
When we run this program, we see log entries just like the following:
Jun 2 23:12:36 pesto eliza_log.pl[14893]:
Server accepting connections on port 12005
Jun 2 23:12:42 pesto eliza_log.pl[14897]:
Accepting a connection from 127.0.0.1
Jun 2 23:12:48 pesto eliza_log.pl[14897]:
Connection from 127.0.0.1 finished
Jun 2 23:12:49 pesto eliza_log.pl[14899]:
Accepting a connection from 192.168.3.5
Jun 2 23:13:02 pesto eliza_log.pl[14901]:
Accepting a connection from 127.0.0.1
Jun 2 23:13:19 pesto eliza_log.pl[14899]:
Connection from 192.168.3.5 finished
Jun 2 23:13:26 pesto eliza_log.pl[14901]:
Connection from 127.0.0.1 finished
Jun 2 23:13:39 pesto eliza_log.pl[14893]:
Server exiting normally
Notice that the log messages indicating that the server is starting and stopping are logged with the
parent's PID, while the messages about individual connections are logged with various child PIDs.
This represents a warning from Perl regarding a missing DESTROY subroutine in the Chatbot::Eliza
object. The fix is trivial; I just add a dummy DESTROY definition to the bottom of the server script file:
sub Chatbot::Eliza::DESTROY { }
I hadn't been aware of this warning in the earlier incarnations of the server because the standard
error was closed and the diagnostic was lost. This illustrates the perils of not logging everything!
EventType indicates the type and severity of the error. It should be one of the following constants:
$log->Close()
The Close() method closes and cleans up the EventLog object.
This example writes an informational message to the Application log on the local machine.
use Win32::EventLog;
$boolean = flock(FILEHANDLE,$how);
The first argument is a filehandle open on the file you wish to lock, and the second is a numeric constant
indicating the locking operation you wish to perform (Table 14.1).
Shared locks created with LOCK_SH can be held by several processes simultaneously and are used
when there are multiple readers on a file. LOCK_EX is the type of lock we will use, because it can
only be held by a single process at a time and is suitable for locking a file that you wish to write to.
These three constants can be imported from the Fcntl module using the :flock tag.
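As a brief illustration of exclusive locking, the sketch below appends a line to a file while holding LOCK_EX. The function name append_locked() is hypothetical, and opening the file on every call is a simplification; a logging module would more likely keep the filehandle open.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use IO::File;

# Append one line to a log file while holding an exclusive lock.
sub append_locked {
    my ($path, $line) = @_;
    my $fh = IO::File->new($path, O_WRONLY|O_APPEND|O_CREAT, 0644) or return;
    flock($fh, LOCK_EX) or return;      # wait until no one else holds the lock
    print {$fh} $line or return;
    return close $fh;                   # flushes the data and releases the lock
}
```

Several processes can call append_locked() on the same path without their lines becoming interleaved, because each waits for the lock before writing.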
We rewrite the log_debug(), log_notice(), and log_warn() functions to write suitably
formatted messages to the filehandle. As an added frill, we'll make these functions respect an
internally defined $PRIORITY package variable so that only those messages that equal or exceed
the priority are written to the log. This allows you to log verbosely during development and debugging
but restricts logging to error messages after deployment.
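The filtering idea can be sketched in a few lines. This is not the LogFile module: the numeric values of the constants, the _log() helper, and the explicit filehandle argument are all assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

use constant { DEBUG => 0, NOTICE => 1, WARNING => 2, CRITICAL => 3 };
my @LABEL = qw(debug notice warning critical);
our $PRIORITY = DEBUG;                  # log everything by default

# Get or set the threshold below which messages are discarded.
sub log_priority { $PRIORITY = shift if @_; return $PRIORITY }

# Write a time-stamped message to $fh only if its priority reaches the threshold.
sub _log {
    my ($fh, $priority, @msg) = @_;
    return if $priority < $PRIORITY;
    my $time = localtime;
    print {$fh} "$time [$LABEL[$priority]] ", @msg;
}
```

After log_priority(NOTICE), a call such as _log($fh, DEBUG, "...") writes nothing, while messages at NOTICE or above are written with a label such as [warning].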
An example of this scheme is shown in Figure 14.3, which defines a small module called LogFile.
Here is a synopsis of its use:
#!/usr/bin/perl
use LogFile;
init_log('/usr/local/logs/mylog.log') or die "Can't log!";
log_priority(NOTICE);
log_debug("This low-priority debugging statement will not be seen.\n");
log_notice("This will appear in the log file.\n");
log_warn("This will appear in the log file as well.\n");
die "This is an overridden function.\n";
After loading LogFile, we call init_log() and pass it the pathname of the log file to use. We then
call log_priority with an argument of NOTICE, to suppress all messages of a lower priority. We
then log some messages at different priorities, and finally die() to demonstrate that warn() and
die() have been overridden. After running this test program, the log file shows the following entries:
Wed Jun 7 09:09:52 2000 [notice] This will appear in the log file.
Wed Jun 7 09:09:52 2000 [warning] This will appear in the log file as well.
Wed Jun 7 09:09:52 2000 [critical] This is an overridden function.
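The priority filtering behind this output can be sketched roughly as follows. This is not the LogFile module itself: the numeric ranking of the constants, the @LABEL array, and the format_entry() helper are all assumptions chosen to reproduce the entry format shown above.

```perl
use strict;
use warnings;

# Assumed ranking; the real module may assign different values.
use constant { DEBUG => 0, NOTICE => 1, WARNING => 2, CRITICAL => 3 };
my @LABEL = ('debug', 'notice', 'warning', 'critical');

my $PRIORITY = NOTICE;    # current threshold

# Hypothetical helper: return the formatted entry, or nothing if the
# message falls below the current threshold.
sub format_entry {
    my ($level, $message) = @_;
    return if $level < $PRIORITY;
    return scalar(localtime) . " [$LABEL[$level]] $message";
}

for my $entry ([DEBUG, "not written\n"], [NOTICE, "written\n"]) {
    my $line = format_entry(@$entry);
    print $line if defined $line;
}
```

Only the second message is printed, because DEBUG ranks below the NOTICE threshold.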
• To open a socket on a privileged port, you want the application to bind a well-known port in the
reserved 1–1023 range, for example the HTTP (Web) port number 80. On UNIX systems the
application must be running as root to do this.
• To open a log or PID file, you want to create a log or PID file in a privileged location, such as
/var/run. The application must be running as root to create the file and open it for writing.
Even though a particular network application must start as root in order to open privileged ports or
files, it generally isn't desirable to remain running as root. Because of their accessibility to the
outside, network servers are extremely vulnerable to exploitation by untrusted individuals. Even minor
bugs, if exploited in the proper way, can lead to security breaches. For example, the server can be
fooled into executing system commands on the untrusted user's behalf or inadvertently passing
information about the system back to the remote user.
The severity of these breaches increases dramatically if the server is running as root. Now the
remote user can exploit the server to run system commands with root privileges or to read and write
files that the nonprivileged user would not ordinarily have access to, such as the system password
file.
In general, it is a good idea to relinquish root privileges as soon as possible, and at the very least
before processing any incoming data. Once the socket or file in question is opened, the application
can relinquish its privileges, becoming an ordinary user. However, the socket or filehandle opened
during initialization will continue to be functional.
This allows setuid programs to switch back and forth between the real UIDs and EUIDs. A setuid
program may relinquish its ability to switch between real UIDs and EUIDs by doing a simple
assignment of its EUID to its real UID. Then the program is no longer allowed to change its EUID:
$< = $>;
The previous discussion of swapping the real and effective UIDs is valid only for UNIX variants that
support the setreuid() C library call. In addition, the setuid bit is only effective when Perl has
been configured to recognize and honor it.
There is a similar distinction between real and effective group IDs. The root user is free to change
the effective group ID to anything it pleases. Anything it does thereafter will take place with the
privileges of the effective GID. An unprivileged user cannot, in general, change the effective group
ID. However, setgid programs, which take on the effective group of their group ownership by virtue
of the setgid permission bit being set, can swap their real and effective group IDs.
Most modern UNIX systems also support the idea of supplementary groups, which are groups to
which the user has privileges, but which are not the user's primary group. On such systems, when
you retrieve the value of $( or $), you get a space-delimited string of numeric GIDs. The first GID
is the user's real or effective primary group, and the remainder are the supplementary groups.
Changing group IDs in Perl can be slightly tricky. To change the process's real primary group, assign
a single number (not a list) to the $( variable. To change the effective group ID, assign a single
number to $). To change the list of supplementary groups as well, assign a space-separated list of
group IDs to $). The first number will become the new effective GID, and the remainder, if any, will
become the supplementary groups. You may force the list of supplementary groups to an empty list
by repeating the effective GID twice, as in:
$) = '501 501';
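Reading the group variables is safe for any user, so the following minimal sketch only inspects them. It splits each space-delimited string into the primary GID and the remaining supplementary groups; the variable names are illustrative only. Assigning to $) as shown above generally requires root privileges and is deliberately left out.

```perl
use strict;
use warnings;

# $( and $) each hold a space-delimited string of numeric GIDs.
# The first number is the primary group; the rest are supplementary.
my ($real_gid, @real_rest)     = split ' ', $(;
my ($eff_gid,  @supplementary) = split ' ', $);

print "real GID:      $real_gid\n";
print "effective GID: $eff_gid\n";
print "supplementary: @supplementary\n";
```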
1. New USER and GROUP constants—At the top of the file, we change the PORT constant to
1002 and the PIDFILE constant to a file located in /var/run. We then define two new
constants, USER and GROUP, which contain names of the user and group that the server will run
as. These must correspond to valid entries in your /etc/passwd and /etc/group files—
change them as necessary for your system.
use constant PORT => 1002;
use constant PIDFILE => '/var/run/eliza_root.pid';
use constant USER => 'nobody';
use constant GROUP => 'nogroup';
2. Pass USER and GROUP to init_server()—After opening the listening socket, we call
init_server() using its new three-argument form, passing it the PID filename and the
values of the USER and GROUP constants.
my $pid = init_server(PIDFILE,USER,GROUP);
3. Children set real UID to effective UID before processing connections—This is the most
important modification. After accepting an incoming connection and forking, but before reading
any data from the connected socket, the child process sets the real UID to the effective UID,
thereby permanently relinquishing its ability to regain root privileges.
while (my $connection = $listen_socket->accept) {
my $host = $connection->peerhost;
log_die("Can't fork: $!") unless defined (my $child = fork());
if ($child == 0) {
$listen_socket->close;
$< = $>; # set real UID to effective UID
log_notice("Accepting a connection from $host\n");
interact($connection);
...
If we try to launch the modified server as an unprivileged user, it fails with an error message when
it is unable to open the reserved port. If we log in as the superuser and then launch the server, it
successfully opens the port and creates the PID file (which will be owned by the root user and group).
If we run the ps command after launching the server, we see that the main server and its children
run as nobody:
nobody 2279 1.0 6.6 5320 4172 S 10:07 0:00 /usr/bin/perl eliza_root.pl
nobody 2284 0.5 6.7 5368 4212 S 10:07 0:00 /usr/bin/perl eliza_root.pl
nobody 2297 1.0 6.7 5372 4220 S 10:08 0:00 /usr/bin/perl eliza_root.pl
The risk of the server's inadvertently damaging your system while running as root is now restricted
to those files, directories, and commands that the nobody user has access to.
Taint Mode
Consider a hypothetical network server whose job includes generating e-mail to designated
recipients. Such a server might accept e-mail addresses from a socket and pass those addresses to the
UNIX mail program. The code fragment to do that might look like this:
chomp($email = <$sock>);
system "/bin/mail $email <Mail_Message.txt";
After reading the e-mail address from the socket, we call system() to invoke /bin/mail with
the desired recipient's address as argument. The standard input to mail is redirected from a
canned mail message file.
This script contains a security hole. A malicious individual who wanted to exploit this hole could pass
an e-mail address like this one:
Because system() invokes a subshell (a command interpreter such as /bin/sh) to do its work, all
shell metacharacters, including the semicolon and redirection symbols, are honored. Instead of
doing what its author intended, this command mails the entire system password file to the indicated
e-mail address!
This type of error is easy to make. One way to alleviate it is to pass system() and exec() a list
of arguments rather than giving them the command and its arguments as a single string. When you do
this, the command is executed directly rather than through a shell. As a result, shell metacharacters
are ignored. For example, the fragment we just looked at can be made more secure by replacing it
with this:
chomp($email = <$sock>);
open STDIN, '<', 'Mail_Message.txt' or die "Can't open message: $!";
system "/bin/mail", $email;
We now call system() using two separate arguments for the command name and the e-mail
address. Before we invoke system(), we reopen STDIN on the desired mail message so that the
mail program inherits it.
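To see that list-form invocation really does bypass the shell, consider the following sketch. It uses the list form of a pipe open, which runs the command directly just as the list form of system() does, and it substitutes the running Perl interpreter ($^X) for the mail program so that nothing is actually mailed. The sample address is made up for the demonstration.

```perl
use strict;
use warnings;

# An "address" carrying shell metacharacters, as an attacker might send.
my $email = 'user@example.com; echo INJECTED';

# List-form pipe open: the command is executed directly, with no shell.
# The child is a harmless stand-in that just prints its first argument.
open(my $child, '-|', $^X, '-e', 'print "arg: $ARGV[0]\n"', $email)
    or die "Can't run child: $!";
my $seen = <$child>;
close($child);

# The semicolon and everything after it arrive as ordinary text.
print $seen;
```

Had the same string been interpolated into a single command string, the shell would have treated the text after the semicolon as a second command.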
Other common traps include creating or opening files in a world-writable directory, such as /tmp.
A common intruder's trick is to create a symbolic link leading from a file he knows the server will try
to write to a file he wants to overwrite. This is a problem particularly for programs that run with root
privileges. Consider what would happen if, while running as root, the psychiatrist server tried to open
its PID file in /usr/tmp/eliza.pid and someone had made a symbolic link from that filename
to /etc/passwd—the server would overwrite the system file, with disastrous results. This is one
reason that our PID-file-opening routines always use a mode that allows the attempt to succeed only if
the file does not already exist.
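One way to get this behavior is to combine the O_CREAT and O_EXCL flags in a sysopen() call, which refuses to open a name that is already occupied, including one occupied by a symbolic link. The sketch below is an illustration under that assumption rather than a copy of the chapter's PID file routine; create_exclusively() and the file name are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# Hypothetical helper: create a file only if nothing already has that name.
# Returns a filehandle on success, or nothing on failure.
sub create_exclusively {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0644) or return;
    return $fh;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.pid";            # placeholder name

my $first  = create_exclusively($path);   # succeeds: the file did not exist
my $second = create_exclusively($path);   # fails: the file now exists

print "first attempt:  ", ($first  ? "ok" : "refused"), "\n";
print "second attempt: ", ($second ? "ok" : "refused"), "\n";
```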
Unfortunately, there are many other places that such bugs can creep in, and it's difficult to identify
them all manually. For this reason, Perl offers a security feature called "taint mode." Taint mode
consists of a series of checks on your script's data processing. Every variable that contains data
received from outside the script is marked as tainted, and every variable that such tainted data
touches becomes tainted as well.
Tainted variables can be used internally, but Perl forbids them from being used in any way that might
affect the world outside the script. For example, you can perform a numeric calculation on some
data received from a socket, but you can't pass the data to the system() command.
Tainted data includes the following:
• The contents of %ENV
• Data read from the command line
• Data read from a socket or filehandle
• Data obtained from the backticks operator
• Locale information
• Results from the readdir() and readlink() functions
• The gecos field of the getpw* functions, since this field can be set by users
Tainted data cannot be used in any function that affects the outside world, or Perl will die with an
error message. Such functions include:
Chances are the first time you try this, the script will fail at an early phase with the
message "Insecure path..." or "Insecure dependency...". To avoid messages about PATH
and other tainted environment variables, you need to explicitly set or delete them during initialization.
For the psychotherapist server, we can do this during the become_daemon() subroutine, since we
are already explicitly setting PATH:
sub become_daemon {
...
$ENV{PATH} = '/bin:/sbin:/usr/bin:/usr/sbin';
delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
...
}
Having made this change, the psychotherapist daemon seems to run well until one particular
circumstance arises. If the daemon is terminated abnormally, say by a kill -9, the next time we try to
run it, the open_pid_file() routine will detect the leftover PID file and check whether the old
process is still running by calling kill() with a 0 signal:
my $pid = <$fh>;
croak "Server already running with PID $pid" if kill 0 => $pid;
The reason for this error is clear. The value of $pid was read from the leftover PID file, and since
it is from outside the script, is considered tainted. kill() affects the outside world, and so is
prohibited from operating on tainted variables. In order for the script to work, we must somehow untaint
$pid.
There is one and only one way to untaint a variable. You must pattern match it using one or more
parenthesized subexpressions and extract the subexpressions using the numbered variables $1,
$2, and so forth. Seemingly equivalent operations, such as pattern substitution and assigning a
pattern match to a list, will not work. Perl assumes that if you explicitly perform a pattern match and
then refer to the numbered variables, then you know what you're doing. The extracted substrings
are not considered tainted and can be passed to kill() and other unsafe calls. In our case, we
expect $pid to contain a positive integer, so we untaint it like this:
sub open_pid_file {
...
my $pid = <$fh>;
croak "Invalid PID file" unless $pid =~ /^(\d+)$/;
croak "Server already running with PID $1" if kill 0 => $1;
...
}
We pattern match $pid to /^(\d+)$/ and die if it fails. Otherwise, we call kill() to send the
signal to the matched expression, using the untainted $1 variable. We will use taint mode in the last
iteration of the psychotherapist server at the end of this chapter.
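The same match-and-extract idiom can be wrapped in a small helper so that every caller validates input the same way. The untaint_pid() function below is a hypothetical sketch, not part of the Daemon module; it returns the captured digits or nothing at all.

```perl
use strict;
use warnings;

# Hypothetical helper: return the PID extracted through $1, or nothing
# if the input is not a plain string of digits.
sub untaint_pid {
    my ($raw) = @_;
    return unless defined $raw && $raw =~ /^(\d+)$/;
    return $1;    # the captured substring is not considered tainted
}

for my $candidate ("1234", "1234; rm -rf /", "abc") {
    my $pid = untaint_pid($candidate);
    if (defined $pid) { print "accepted $pid\n" }
    else              { print "rejected '$candidate'\n" }
}
```

Only the first candidate is accepted. The others fail the pattern match and never reach a call such as kill().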
As this example shows, even tiny programs like the psychotherapist server can contain security
holes (although in this case the holes were very minor). Taint mode is recommended for all nontrivial
network applications, particularly those running with superuser privileges.
Using chroot()
Another common technique for protecting the system against buggy servers involves the
chroot() call. chroot() takes a single argument containing a directory path and changes the
current process so that this path becomes the top-level directory ("/"). The effects of chroot()
are irrevocable. Once the new top-level directory has been established, the program cannot see
outside it or affect files or directories above it. This is a very effective technique for insulating the
script from sensitive system files and binaries.
chroot() does not change the current working directory. Ordinarily you will want to chdir() into
part of the restricted space before calling chroot(). chroot() can be called only when the
program is running with root privileges and is available only on UNIX systems. It is most frequently used
by programs that need to run a lot of external commands or are particularly powerful. For example,
the FTP daemon can be configured to allow anonymous users access to a restricted part of the
filesystem. To enforce this restriction, FTP calls chroot() soon after the anonymous user logs in,
changing the top-level directory to the designated restricted area.
For the purposes of this example, we use /home/ftp as the directory to chroot() to. This is the
same directory used for anonymous FTP on Linux systems and is unlikely to contain confidential
material or vulnerable files.
After calling chroot(), the script is quite effectively sealed off from the rest of the system. Like an
explorer entering an undeveloped wilderness, your script must bring with it everything it needs,
including configuration files, external utilities, and Perl libraries. These need to be placed in the
chroot() destination directory, and all hard-coded path names in your script have to be adjusted
to reflect what the filesystem will look like after the destination directory becomes top level. For
example, the file that lived at /home/ftp/bin/ls before chroot() becomes /bin/ls after a
chroot() to the /home/ftp directory.
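When adjusting hard-coded paths, it can help to compute what each one will look like after the call. The following sketch does only that arithmetic. It never calls chroot() itself, since that requires root privileges, and the path_after_chroot() helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given the chroot() directory and a path as seen
# before the call, return the path as the script will see it afterward,
# or nothing if the file lies outside the new top-level directory.
sub path_after_chroot {
    my ($root, $path) = @_;
    $root =~ s{/+$}{};                            # drop trailing slashes
    return '/' if $path eq $root;
    return unless index($path, "$root/") == 0;    # must live under $root
    return substr($path, length($root));
}

my $inside  = path_after_chroot('/home/ftp', '/home/ftp/bin/ls');
my $outside = path_after_chroot('/home/ftp', '/etc/passwd');

print "inside:  $inside\n";
print "outside: ", (defined $outside ? $outside : "not reachable"), "\n";
```

A file that maps to nothing, such as /etc/passwd here, must be copied under the destination directory if the script still needs it.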
If the script launches other programs during its operation, they too will be subject to the
chroot() restrictions. This means that any dependencies that they have, including configuration
files and dynamically linked libraries, must be copied into the chroot() directory.
As a concrete example of this, when I first ran the program with this modification, everything seemed
to be fine until the Chatbot::Eliza module tried to issue a warning message, at which point a message
appeared in the system log warning me that Perl couldn't load Carp::Heavy, an internal component
of the Carp module. Apparently this module isn't loaded automatically when you use Carp but is
loaded dynamically the first time that Carp is needed. However, because the Perl library tree became
unavailable as soon as chroot() was called, it could not be loaded. The solution I chose was to
explicitly use Carp::Heavy in the Daemon module thereby preloading it. Another solution would
have been to copy this file into the appropriate location under /home/ftp/lib.
Watch for this, particularly if you use Perl's Autoloader facility. Autoloader's strategy of delaying
compilation of .pm files until needed means that all Autoloader-processed .al files must be
accessible to the script within its chroot() environment.
Lines 1–12: Module initialization and constants—We add the -T switch to the top line of the file,
turning on Perl's taint mode. We define ELIZA_HOME and other constants.
Lines 13–14: Install TERM and HUP handler—We install the subroutines do_term() and
do_hup() as the handlers for the TERM and HUP signals, respectively. We also install
do_term() as the handler for INT.
Line 15: Fetch port from command line—We modify this line slightly so that the port argument
remains in @ARGV rather than being shifted out of it. This is so that the do_relaunch() routine
(which we look at later) will continue to have access to the command-line arguments.
Lines 16–43: Socket initialization, main loop, and connection handling—The only change is in
line 25, where instead of calling fork() directly, we call launch_child(), a new function
defined in Daemon. This subroutine forks, calls chroot(), and abandons root privileges, as in
previous versions of the script. In addition to these functions, launch_child() keeps track of
the spawned child PIDs so that we can terminate them gracefully when the server receives a
HUP or termination signal.
launch_child() takes two optional arguments: a callback routine to invoke when the child
dies and a directory path to chroot() to. The first argument is a code reference. It is invoked
by the Daemon module's CHLD handler after calling waitpid() to give our code a chance to
do any additional processing. We don't need this feature in this example, so we leave the first argument
blank (we'll use it in Chapter 16, when we revisit Daemon). We do, however, want
launch_child() to chroot() for us, so we provide ELIZA_HOME in the second argument.
Lines 44–48: do_term() TERM handler—The TERM handler logs a message to the system log
and calls a new subroutine named kill_children() to terminate all active connections. This
subroutine is defined in the revised Daemon module. After kill_children() returns, we exit
the server.
Lines 49–58: do_hup() HUP handler—We close the listening socket, terminate active
connections with kill_children(), and then call do_relaunch(), another new subroutine
defined in the Daemon module. do_relaunch() will try to reexecute the script and won't return
if it is successful. If it does return, we die with an error message.
Lines 59–65: Patches to Chatbot::Eliza—As we've done before, we redefine the
Chatbot::Eliza::_testquit() subroutine in order to correct a bug in its end-of-file detection.
We also define an empty Chatbot::Eliza::DESTROY() subroutine to quash an annoying
warning that appears when running this script under some versions of Perl.
Lines 66–68: Log normal termination—We log a message when the server terminates, as in
earlier versions.
• SIG_BLOCK—The signals indicated by the signal set are added to the process signal mask, blocking them.
• SIG_UNBLOCK—The signals indicated by the signal set are removed from the signal mask, unblocking
them.
• SIG_SETMASK—The process signal mask is cleared completely and replaced with the signals indicated
by the signal set.
Signal sets can be created and examined using a small utility class called POSIX::SigSet, which
manipulates sets of signals in much the same way that IO::Select manipulates sets of filehandles.
To create a new signal set, call POSIX::SigSet->new() with a list of signal constants. The
constants are named SIGHUP, SIGTERM, and so forth:
$signals = POSIX::SigSet->new(SIGINT,SIGTERM,SIGHUP);
sigprocmask() returns a true value if successful; otherwise, it returns false. See the POSIX POD
pages for other set operations that one can perform with the POSIX::SigSet class.
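Putting these pieces together, a minimal sketch of blocking signals around a critical section might look like the following. The variable names are illustrative; the calls themselves are the standard POSIX module functions imported by the :signal_h tag.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h);   # sigprocmask(), SIG_BLOCK, SIGINT, and so on

my $signals  = POSIX::SigSet->new(SIGINT, SIGTERM, SIGHUP);
my $previous = POSIX::SigSet->new;

# Block the three signals, saving the old mask in $previous.
sigprocmask(SIG_BLOCK, $signals, $previous)
    or die "Can't block signals: $!";

# Critical section: INT, TERM, and HUP are held pending here.
# Query the current mask without changing it by adding an empty set.
my $current = POSIX::SigSet->new;
sigprocmask(SIG_BLOCK, POSIX::SigSet->new, $current)
    or die "Can't query signal mask: $!";
my $was_blocked = ($current->ismember(SIGINT) == 1);
print "SIGINT is blocked\n" if $was_blocked;

# Restore the mask that was in effect before we started.
sigprocmask(SIG_SETMASK, $previous)
    or die "Can't restore signal mask: $!";
```

Restoring the saved mask with SIG_SETMASK, rather than unblocking with SIG_UNBLOCK, leaves alone any signals that were already blocked before the section began.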
Let's walk through the new Daemon module (Figure 14.7).
Figure 14.7. Daemon module with support for restarting the server
Lines 1–21: Module setup—The only change is the importation of a new set of POSIX functions
designated the ":signal_h" group. These functions provide the facility for temporarily blocking
signals that we will use in the launch_child() subroutine.
Lines 22–33: init_server() subroutine—This subroutine is identical to previous versions.
Lines 34–47: become_daemon() subroutine—This subroutine is identical to previous versions
in all but one respect. Before calling chdir() to make the root directory our current working
directory, we remember the current directory in the package global $CWD. This allows us to put
things back the way they were before we relaunch the server.
Lines 48–55: change_privileges() subroutine—This is identical to previous versions.
Lines 56–70: launch_child() subroutine—The various operations of forking and initializing
the child server processes are now consolidated into a launch_child() subroutine. This
subroutine takes a single argument, a directory path which, if provided, is passed to
prepare_child() for the chroot() call.
We begin by creating a new POSIX::SigSet containing the INT, CHLD, TERM, and HUP signals,
and try to fork. On a fork error, we log a message. If the returned PID is greater than 0, we are
in the parent process, so we add the child's PID to %CHILDREN. In the child process, we reset
the four signal handlers to their default actions and call prepare_child() to set user privileges
and change the root directory.
Before exiting, we unblock any signals that have been received during this period and return the
child PID, if any, to the caller. This happens in both the parent and the child.
Lines 71–79: prepare_child() subroutine—This subroutine is identical to the previous
versions, except that the chroot() functionality is now conditional on the function's being passed
a directory path. In any case, the subroutine overwrites the real UID with the effective UID,
abandoning any privileges the child process inherited from its parent.
Lines 80–85: reap_child() subroutine—This subroutine is the CHLD handler. We call
waitpid() in a tight loop, retrieving the PIDs of exited children. Each process reaped in this way is
deleted from the %CHILDREN global in order to maintain an accurate tally of the active
connections.
Lines 86–90: kill_children() subroutine—We send a TERM signal to each of the PIDs of
active children. We then enter a loop in which we sleep() until the %CHILDREN hash contains
no more keys. The sleep() call is interrupted only when a signal is received, typically after an
incoming CHLD. This is an efficient way for the parent to wait until all the child connections have
terminated.
Lines 91–99: do_relaunch() subroutine—The job of do_relaunch() is to restore the
environment to a state as close as possible to the way it was when the server was first launched,
and then to call exec() to replace the current process with a new instance of the server.
We begin by regaining root privileges by setting the effective UID to the real UID. We now want
to restore the original working directory. However, we are running in taint mode, and the
chdir() call is taint sensitive. So we pattern match on the working directory saved in $CWD and
call chdir() on the extracted directory path.
Next we must set up the arguments to exec(). We get the server name from $0 and the port
number argument from $ARGV[0]. However, these are also tainted and cannot be passed
directly to exec(), so we must pattern match and extract them in a similar manner. When the
new server starts up, it will complain if there is already a PID file present, so we unlink the file.
Finally, we invoke exec() with all the arguments needed to relaunch the server. The first ar-
gument is the name of the Perl interpreter, which exec() will search for in the (safe) PATH
environment variable. The second is the -T command-line argument to turn on taint mode. The
remaining arguments are the script name, which we extracted from $0, and the port argument.
If successful, exec() does not return. Otherwise, we die with an error message.
Lines 100–142: Remainder of module—The remainder of the module is identical to earlier ver-
sions.
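As a rough sketch of the untainting step in do_relaunch(), the following code pattern-matches each tainted value and assembles the argument list for exec(). The untaint() and relaunch_args() helpers and the regular expressions are assumptions made for illustration; the real module may use different patterns. The sketch stops short of calling chdir() and exec() so that it can be run safely.

```perl
use strict;
use warnings;

# Return the captured (and therefore untainted) part of $value,
# or die if $value does not match $pattern.
sub untaint {
    my ($value, $pattern) = @_;
    my ($clean) = $value =~ $pattern
        or die "Unsafe value: $value\n";
    return $clean;
}

# Return the directory to chdir() to, followed by the exec() argument
# list: interpreter name, -T switch, script name, and port.
sub relaunch_args {
    my ($script, $port, $cwd) = @_;
    my $dir  = untaint($cwd,    qr{^([/\w.-]+)$});   # assumed pattern
    my $prog = untaint($script, qr{^([/\w.-]+)$});   # assumed pattern
    my $p    = untaint($port,   qr{^(\d+)$});
    return ($dir, 'perl', '-T', $prog, $p);
}

my ($dir, @args) = relaunch_args('./eliza_hup.pl', '1002', '/tmp');
print "would chdir to $dir and exec: @args\n";

# A real do_relaunch() would go on to call:
#   chdir $dir  or die "chdir failed: $!";
#   exec @args  or die "Couldn't exec: $!";
```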
The following is a transcript of the system log showing the entries generated when I ran the revised
server, connected a few times from the local host, and then sent the server an HUP signal. After
connecting twice more to confirm that the relaunched server was operating properly, I sent it a
TERM signal to shut it down entirely.
Jun 13 05:54:57 pesto eliza_hup.pl[8776]:
Server accepting connections on port 1002
Jun 13 05:55:51 pesto eliza_hup.pl[8808]:
Accepting a connection from 127.0.0.1
Jun 13 05:56:01 pesto eliza_hup.pl[8810]:
Accepting a connection from 127.0.0.1
Jun 13 05:56:08 pesto eliza_hup.pl[8776]:
HUP signal received, reinitializing...
Jun 13 05:56:08 pesto eliza_hup.pl[8776]:
Closing listen socket...
Jun 13 05:56:08 pesto eliza_hup.pl[8776]:
Terminating children...
Jun 13 05:56:08 pesto eliza_hup.pl[8776]:
Trying to relaunch...
Jun 13 05:56:10 pesto eliza_hup.pl[8811]:
Server accepting connections on port 1002
Jun 13 05:56:14 pesto eliza_hup.pl[8815]:
Accepting a connection from 127.0.0.1
Jun 13 05:56:19 pesto eliza_hup.pl[8815]:
Connection from 127.0.0.1 finished
Jun 13 05:56:26 pesto eliza_hup.pl[8817]:
Accepting a connection from 127.0.0.1
Jun 13 05:56:28 pesto eliza_hup.pl[8811]:
TERM signal received, terminating children...
Jun 13 05:56:28 pesto eliza_hup.pl[8811]:
Server exiting normally
You can easily extend this technique to other signals. For example, you could use USR1 as a mes-
sage to activate verbose logging and USR2 to go back to normal logging.
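A minimal sketch of that idea follows. The $VERBOSE flag and the log_debug() helper are hypothetical names chosen for this example; the handlers do nothing but set the flag, leaving the real work to the main program.

```perl
use strict;
use warnings;

our $VERBOSE = 0;

$SIG{USR1} = sub { $VERBOSE = 1 };   # switch verbose logging on
$SIG{USR2} = sub { $VERBOSE = 0 };   # go back to normal logging

# Hypothetical logging helper: debug messages appear only in verbose mode.
# Returns true if the message was written.
sub log_debug {
    my $msg = shift;
    return 0 unless $VERBOSE;
    warn "debug: $msg\n";
    return 1;
}
```

With these handlers installed, sending the process a USR1 signal (for example, with the kill command) turns the extra messages on, and USR2 turns them off again.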
Summary
Because network daemons are intended to run in an unattended fashion for long periods of time,
it's worth investing a little extra time to make the code bullet-proof. This chapter presented some of
the common techniques for increasing the stability, manageability, and security of network dae-
mons.
Logging, whether directly to a file or to a standard logging daemon, allows you to monitor the status
of the daemon and to detect exceptional conditions.
Privilege manipulation enables daemons to perform certain startup and shutdown tasks as privi-
leged users, but to abandon those privileges before interacting with untrusted network clients. This
avoids the daemon's inadvertently damaging the host (whether on its own or encouraged by a hostile
attacker).
Taint checking activates a mode in which the script checks for common unsafe operations, such as
passing untrusted data from the network to an external command. This closes the most common
security hole in Perl-based network servers.
The chroot() call seals the server into a subdirectory, insulating it from the rest of the filesystem.
This helps to harden servers that manipulate files.
Finally, one often needs some way to reconfigure a running server. For those servers that run from
configuration files, the most common technique is to send them an HUP signal. The chapter closed with
an example of how to handle HUP in a forking server by the simple expedient of relaunching it.
Chapter 15. Preforking and Prethreading
Chapters 9 through 12 demonstrated several techniques for handling concurrent incoming connec-
tions to a server application:
1. Serial —The server processes connections one at a time. This is typical of UDP servers,
because each transaction is short-lived, but it is distinctly uncommon for connection-oriented
servers.
2. Accept-and-fork —The server accepts connections and forks a new child process to handle
each one. This is the most common server design on UNIX systems and includes servers
launched by the inetd super daemon.
3. Accept-and-thread —The server accepts connections and creates new threads of execution
to handle each one. This can have better performance than accept-and-fork because the sys-
tem overhead to launch new threads is often less than it would be to launch new processes.
4. Multiplexed —The server uses select() and its own session state maintenance logic to
interweave the processing of multiple connections. This has excellent performance because
there's no process-launching overhead, but there is the cost of increased code complexity,
particularly if nonblocking I/O is used.
In most cases, one of these four architectures will meet your requirements. However, in certain
circumstances, particularly those in which the server must manage a heavy load, you should con-
sider more esoteric designs. This chapter discusses two additional server architectures: preforking
and prethreading.
Preforking
It's easiest to understand how a preforked server works by contrasting it with an accept-and-fork
server. As you recall from Chapter 6, accept-and-fork servers spend most of their time blocking in
accept(), waiting for a new incoming connection. When the connection comes in, the parent
wakes up just long enough to call fork() and pass the connected socket to its child. After forking,
the child process goes on to handle the connection, while the parent process goes back to waiting
for accept().
The core of an accept-and-fork server is these lines of code:
while ( my $c = $socket->accept ) {
    my $child = fork;
    die "Can't fork: $!" unless defined $child;
    if ($child == 0) {           # in child process
        handle_connection($c);
        exit 0;
    }
    close $c;                    # in parent process
}
This technique works well under typical conditions, but it can be a problem for heavily loaded
servers. Here, connections come in so rapidly that the overhead of the fork() call has a noticeable
impact, and the server may not be able to keep up with incoming connections. This is particularly
the case for Web server applications, which process many short requests that arrive in rapid-fire
succession.
A common solution to this problem is a technique called preforking. As the name implies, preforking
servers fork() themselves multiple times soon after launch. Each forked child calls accept()
individually, handles the incoming connection completely, and then goes back to waiting on
accept(). Each child may continue to run indefinitely or may exit after processing a predetermined
number of requests. The original parent process, meanwhile, acts as a supervisor for the whole
process, forking off new children when old ones die and shutting down all the children when the
time comes to terminate.
At its heart, a preforking server looks like this:
for (1..PREFORK_CHILDREN) {
    my $pid = fork;
    die "Can't fork: $!" unless defined $pid;
    next if $pid;                # parent process
    do_child($socket);           # child process
    exit 0;                      # child never loops
}

sub do_child {
    my $socket = shift;
    while (my $c = $socket->accept) {
        handle_connection($c);
        close $c;
    }
}
The main loop forks a number of children, passing the listening socket to each one. Each child
process calls accept() on the socket and handles the connection.
That's it in a nutshell, but many details make implementing a preforking server more complex than
this. The parent process has to wait on its children and launch new ones when they die; it has to
shut down its children gracefully when the time comes to terminate; signal handlers must be written
carefully so that signals intended for the parent don't get handled by the children and vice versa.
The server gets more complicated still if you want it to adapt dynamically to the network load,
maintaining fewer children when incoming traffic is light and more children when the traffic is heavy.
The next sections take you through the evolution of a preforking server from a simple but functional
version to a reasonably complex beast.
A Web Server
For the purposes of illustration, we will write a series of Web servers. These servers will respond to
requests for static files only and recognize only a handful of file extensions. Although limited, the
final product will be a fully functional server that you can communicate with through any standard
Web browser.
Each version of the server contains a few subroutines that handle the interaction with the client by
implementing a portion of the HTTP core protocol. Since they're invariant, we'll put these subroutines
together into a module called Web.
We discussed the HTTP protocol from the client's point of view in Chapters 9 and 12. When a
browser connects to the server, it sends an HTTP request consisting of a request method (typically
"GET") and the URL it wishes to fetch. This may be followed by optional header fields; the whole
request is then terminated by two carriage return/linefeed (CRLF) pairs. The server reads the re-
quest and translates the URL into the path to a physical file somewhere on the filesystem. If the file
exists and the client is allowed to fetch it, then the server sends a brief header followed by the file
contents. The header begins with a numeric status code indicating the success or failure of the
request, followed by optional fields describing the nature of the document that follows. The header
is separated from the file contents by another pair of CRLF sequences. The HEAD request is treated
in a similar fashion, but instead of returning the entire document, the server returns just the header
information.
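To make the request format concrete, here is a small sketch that reads one request and picks apart its first line. It is an illustration only and is not taken from the Web module: read_request() is a hypothetical name, and an in-memory filehandle stands in for the connected socket.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Read one request (everything up to the blank line) from a filehandle
# and return (method, URL, protocol), or an empty list if the first
# line is malformed.
sub read_request {
    my $fh = shift;
    local $/ = "$CRLF$CRLF";     # a request ends with two CRLF pairs
    my $request = <$fh>;
    return unless defined $request;
    my ($method, $url, $proto) =
        $request =~ m{^(\w+)\s+(\S+)\s+(HTTP/1\.[01])\s*$CRLF}
        or return;
    return ($method, $url, $proto);
}

# Simulate a browser's request with an in-memory filehandle.
my $raw = "GET /index.html HTTP/1.0${CRLF}Accept: text/html${CRLF}${CRLF}";
open my $fh, '<', \$raw or die "Can't open in-memory handle: $!";
my ($method, $url, $proto) = read_request($fh);
print "method=$method url=$url protocol=$proto\n";
```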
Figure 15.1 lists the Web module.
Lines 1–8: Module setup—The module declares the handle_connection() and docroot()
functions for export. The former is the main entry point for Web transaction handling.
The latter is used to set the location of the "document root", the physical directory that corre-
sponds to the URL "/".
Lines 9–10: Declare global variables—Our only global variable is $DOCUMENT_ROOT, which
contains the path to the physical directory that corresponds to the topmost URL at the site. All
files served by the Web server will reside under this directory. We default to /home/www/
htdocs, but your script can call docroot() to change this location.
Like many line-oriented network protocols, HTTP terminates its lines with the CRLF sequence.
For readability, we define a $CRLF global that contains the correct character sequence.
Lines 11–32: The handle_connection() subroutine—Most of the work happens in
handle_connection(), which takes a connected socket as its argument and handles the entire
HTTP transaction. The first part of the subroutine reads the request by setting the line-end
character ($/) to "$CRLF$CRLF" and invoking the <> operator.
Lines 16–19: Process request—The next section processes the request. It attempts first to parse
out the topmost line and extract the requested URL. If the request method isn't GET or HEAD,
or if the protocol the browser is using isn't HTTP/1.0 or HTTP/1.1, then the function sends an
error message to the browser by calling a subroutine named invalid_request(), and re-
turns. Otherwise, it calls the lookup_file() subroutine to try to open the requested file for
reading.
If lookup_file() is successful, it returns a three-element list that contains an open filehandle,
the type of the file, and its length. Otherwise, it returns an empty list, and handle_connection()
calls not_found() to send an appropriate error message to the browser.
Another exceptional condition that the subroutine needs to deal with is the case of the browser
requesting a URL that ends in a directory name rather than a filename. Such URLs must end
with a slash, or else relative links in HTML documents, such as ../service_info.html, won't work
correctly. If the browser requests a URL that ends in a directory and the URL has no terminating
slash, then lookup_file() reports this case by returning a file type of "directory." In this
eventuality, the server calls a function named redirect() to tell the browser to reissue its
request using the URL with a slash appended.
Lines 20–24: Print header—If the requested document was opened successfully,
handle_connection() produces a simple HTTP header by sending a status line with a result code of 200,
followed by headers indicating the length and type of the document. This is terminated by a
CRLF pair. A real Web server would send other information as well, such as the name of the
server software, the current date and time, and the modification time of the requested file.
Lines 25–32: If the request was HEAD, then we're finished and we exit from the routine. Oth-
erwise, we copy the contents of the filehandle to the socket using a tight while() loop. When
the entire file has been copied to the socket, we close its filehandle and return.
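The header-then-body sequence can be sketched as follows. This is an illustration and not the module's own code: send_document() is a hypothetical name, the exact header spellings and the 1K buffer size are assumptions, and in-memory filehandles stand in for the document and the socket.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Write a minimal HTTP response to $out: status line, two header fields,
# a blank line, and (for anything but HEAD) the document body.
sub send_document {
    my ($out, $fh, $type, $length, $method) = @_;
    print $out "HTTP/1.0 200 OK$CRLF";
    print $out "Content-length: $length$CRLF";
    print $out "Content-type: $type$CRLF";
    print $out $CRLF;
    return if $method eq 'HEAD';     # header only

    my $buffer;
    while (read($fh, $buffer, 1024)) {   # copy the file in 1K chunks
        print $out $buffer;
    }
    close $fh;
    return;
}

my $body = "<HTML><BODY>Hello</BODY></HTML>\n";
open my $in, '<', \$body or die "Can't open in-memory handle: $!";
my $response = '';
open my $out, '>', \$response or die "Can't open in-memory handle: $!";
send_document($out, $in, 'text/html', length($body), 'GET');
close $out;
print $response;
```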
Lines 33–48: lookup_file() subroutine—The lookup_file() subroutine is responsible for
translating a requested URL into a physical file path, gathering some information about the se-
lected file, and opening it, if possible. The subroutine is also responsible for making sure that
the browser doesn't try to play malicious tricks with the URL, such as incorporating double dots
into the path in order to move into a part of the filesystem that it doesn't have permission to
access.
Lines 35–39: Process URL— lookup_file() begins by turning the URL into a physical path
by prepending the contents of $DOCUMENT_ROOT to the URL. We then do some cleanup on the
URL. For example, the path may contain a query string (a "?" followed by text) and possibly
an HTML fragment (a "#" followed by text). We strip out this information.
The path may terminate with a slash, indicating that it is a directory. In this case, we append
index.html to the end of the path in order to retrieve the automatic "welcome page."
The last bit of path cleanup is to prevent the remote user from tricking us into retrieving files
outside the document root space by inserting relative path elements (such as "..") into the
URL. We defeat this by refusing to process paths that contain relative elements.
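The cleanup steps just described might look something like this. The url_to_path() name and the regular expressions are assumptions made for the sake of the example, and this sketch refuses only ".." elements.

```perl
use strict;
use warnings;

my $DOCUMENT_ROOT = '/home/www/htdocs';

# Translate a URL into a path under the document root. Returns nothing
# if the URL contains a ".." path element.
sub url_to_path {
    my $url  = shift;
    my $path = "$DOCUMENT_ROOT$url";
    $path =~ s/\?.*$//;                          # strip query string
    $path =~ s/\#.*$//;                          # strip fragment
    $path .= 'index.html' if $path =~ m{/$};     # directory: welcome page
    return if $path =~ m{(?:^|/)\.\.(?:/|$)};    # refuse relative elements
    return $path;
}

for my $url ('/', '/docs/intro.html?lang=en#top', '/../etc/passwd') {
    my $path = url_to_path($url);
    print "$url => ", (defined $path ? $path : '(refused)'), "\n";
}
```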
Line 40: Handle directory requests—Now we need to deal with requests for paths that end in
directory names (without the terminating slash). In this case, we must alert the caller of the fact
so that it can generate a redirect. We apply the -d directory test operator to the path; if the
operator returns true, we return a phony document type of "directory" to the caller.
Lines 41–45: Determine MIME type and size of document—The next part of the subroutine
determines the MIME type of the requested document. A real Web server would have a long
lookup table of file extensions. We look for HTML, GIF, and JPEG files only and default to
text/plain for anything else.
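A short sketch of such an extension test is shown below. The mime_type() name and the exact patterns are assumptions for illustration; only the four document types come from the description above.

```perl
use strict;
use warnings;

# Guess a MIME type from the file extension, defaulting to text/plain.
sub mime_type {
    my $path = shift;
    return 'text/html'  if $path =~ /\.html?$/i;
    return 'image/gif'  if $path =~ /\.gif$/i;
    return 'image/jpeg' if $path =~ /\.jpe?g$/i;
    return 'text/plain';
}

print "$_: ", mime_type($_), "\n" for qw(index.html logo.GIF photo.jpeg README);
```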
The routine now retrieves the size of the requested file in bytes by calling stat(). Perl already
called stat() internally when it processed the -d switch, so there isn't any reason to repeat
the system call. The idiom stat(_) retrieves the buffered status information from that earlier
invocation, saving a small amount of CPU time. The file may not exist, in which case stat()
returns undef.
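The following sketch shows the stat(_) idiom in isolation. The size_or_directory() helper is a hypothetical name invented for this example, and a temporary file stands in for a requested document.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return 'directory' for a directory, the size in bytes for any other
# existing path, and undef for a path that does not exist.
sub size_or_directory {
    my $path = shift;
    return 'directory' if -d $path;   # -d performs the stat() system call
    my $size = (stat(_))[7];          # reuse the buffered result; field 7 is the size
    return $size;
}

my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "hello\n";
close $fh;
print "$filename: ", size_or_directory($filename), " bytes\n";
print ".: ", size_or_directory('.'), "\n";
```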
Lines 46–48: Open document—The last step is to open the file by calling IO::File->new().
There is another hidden trap here if the remote user includes shell metacharacters (such as
">" or "|") in the URL. Instead of calling new() with a single argument, which will pass these
metacharacters to the shell for processing, we call new() with two arguments: the filename and
the file mode ("<" for read). This inhibits metacharacter processing and avoids our inadvertently
launching a subprocess or clobbering a file if we're passed a maliciously crafted URL. If
new() fails, we return undef. Otherwise, the function returns a three-element list of the open
filehandle, the file type, and the file length.
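To see the effect of the two-argument form, consider this small experiment. The open_for_reading() wrapper and the hostile filename are made up for the example, and a temporary directory keeps the experiment away from real files.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempdir);

# Open $path for reading only. Because the mode is given explicitly,
# characters such as ">" or "|" are treated as part of the filename.
sub open_for_reading {
    my $path = shift;
    return IO::File->new($path, '<');
}

my $dir     = tempdir(CLEANUP => 1);
my $hostile = ">$dir/clobbered.txt";   # would create a file if ">" were taken as a mode
my $fh      = open_for_reading($hostile);
print((defined $fh) ? "opened\n" : "refused: $!\n");
print((-e "$dir/clobbered.txt") ? "file was created\n" : "no file was created\n");
```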
Lines 49–66: redirect() function—The redirect() function is responsible for sending a
redirection message to the browser. It's called when the browser asks for a URL that ends in a
directory and no terminal slash. The ultimate goal of the function is to transmit a document like
this one:
HTTP/1.0 301 Moved permanently
Location: http://192.168.2.1:8080/service_records/
Content-type: text/html

<HTML>
<HEAD><TITLE>301 Moved</TITLE></HEAD>
<BODY><H1>Moved</H1>
<P>The requested document has moved
<A HREF="http://192.168.2.1:8080/service_records/">here</A>.</P>
</BODY>
</HTML>
The important parts of the document are the status code, 301 for "moved permanently," and the
Location field, which gives the full URL where the document can be found. The remainder of the
document produces a human-readable page for the benefit of some (extremely old) browsers
that don't recognize the redirect command.
The logic of redirect() is very straightforward. We recover the IP address of the server host
and the listening port by calling the connected socket's sockhost() and sockport() meth-
ods. We then generate an appropriate document based on these values.
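One way to generate such a document is sketched below. The redirect_document() name is hypothetical, and the host and port are passed in as plain arguments; in the server they would come from the connected socket's sockhost() and sockport() methods.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Build the 301 response for a directory URL requested without its
# terminating slash.
sub redirect_document {
    my ($host, $port, $url) = @_;
    my $moved_to = "http://${host}:${port}${url}/";
    return join '',
        "HTTP/1.0 301 Moved permanently$CRLF",
        "Location: $moved_to$CRLF",
        "Content-type: text/html$CRLF",
        $CRLF,
        qq(<HTML>\n<HEAD><TITLE>301 Moved</TITLE></HEAD>\n),
        qq(<BODY><H1>Moved</H1>\n<P>The requested document has moved\n),
        qq(<A HREF="$moved_to">here</A>.</P>\n</BODY>\n</HTML>\n);
}

print redirect_document('192.168.2.1', 8080, '/service_records');
```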
This version of redirect() suffers the minor esthetic deficiency of replacing the name of the
server host with its dotted IP address. You could fix this by calling gethostbyaddr() (Chapter
3) to turn this address into a hostname, probably caching the result in a global for performance
considerations.
Lines 67–93: invalid_request() and not_found() subroutines—The
invalid_request() and not_found() functions are very similar. invalid_request() returns
a status code of 400, which is the blanket code for "bad request". This is followed by a little HTML
document that explains the problem in human-readable terms. not_found() is similar but has
a status code of 404, used when the requested document is not available.
Lines 94–98: docroot() subroutine—The docroot() subroutine either returns the current
value of $DOCUMENT_ROOT or changes it if an argument is provided.
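A combined getter and setter of this kind takes only a couple of lines. The sketch below is one plausible way to write it and is not necessarily how the module does it.

```perl
use strict;
use warnings;

our $DOCUMENT_ROOT = '/home/www/htdocs';

# With an argument, change the document root; in either case return
# the current value.
sub docroot {
    $DOCUMENT_ROOT = shift if @_;
    return $DOCUMENT_ROOT;
}

print "default root: ", docroot(), "\n";
docroot('/tmp/htdocs');
print "new root: ", docroot(), "\n";
```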
I used this "baseline" server to verify that the Web module was working properly. After creating the
socket, the server enters an accept() loop. Each time through the loop it calls the Web module's
handle_connection() to handle the request.
If you run this server and point your favorite Web browser at port 8080 of the host, you'll see that it
is perfectly capable of fetching HTML files and following links. However, pages with multiple inline
images will be slow to display, because the browser tries to open a new connection for each image
but the Web server can handle connections only in a serial fashion.
Daemon won't work on Win32 systems because it makes various UNIX-specific calls. Appendix A
lists a simple DaemonDebug module, which has the same interface calls as Daemon but doesn't
autobackground, open the syslog, or make other UNIX-specific calls. Instead, the process remains
in the foreground and writes its error and debugging messages to standard error. In the following
code examples, just replace "Daemon" with "DaemonDebug" and everything should work fine on
Win32 systems. You might do this on UNIX systems as well if you want the server to remain in the
foreground or you are having problems getting the Sys::Syslog module to work.
We've looked at accept-and-fork servers before, but we do things a bit differently in this one, so we'll
step through it.
Lines 1–7: Load modules—We load the standard IO::* modules, Daemon, and Web. The latter
two modules must be installed in the current directory or somewhere else in your Perl @INC path.
Line 8: Define constants—We choose a filename for the PID file used by Daemon. After auto-
backgrounding, this file will contain the PID of the server process.
Line 9: Declare globals—The $DONE global variable is used to flag the main loop to exit.
Line 10: Install signal handlers—We create a handler for INT and TERM to bump up the
$DONE variable, causing the main loop to exit. During initialization, Daemon installs a CHLD
handler as well.
Lines 11–14: Create listening socket—We create a listening IO::Socket::INET object in the usual
way.
Line 15: Create IO::Select object—We create an IO::Select object containing the socket for use
in the main accept loop. The rationale for this will be explained in a moment.
Lines 16–18: Initialize server—We call the Daemon module's init_server() routine to create
the PID file for the server, autobackground, and initialize logging.
Lines 19–30: Main accept loop—We enter a loop in which we call accept(), fork off a child to
handle the connection, and continue looping. The loop will only terminate when the INT or
TERM interrupt handler sets the $DONE global to true.
The problem with this strategy is that the loop spends most of its time blocking in the call to
accept(), making it likely that the termination signal will be received during this system call.
However, accept() is one of the slow I/O calls that is automatically restarted when interrupted
by a signal. Although $DONE is set to true, the server accepts one last incoming connection
before it realizes that it's time to quit. We would prefer that the server exit immediately.
In previous versions of the forking server we have either (1) let the interrupt handler kill the server
immediately or (2) used IO::Socket's timeout mechanism to make accept() interruptible. For
variety, this version of the server uses a different strategy. Rather than block in accept(), we
block in a call to IO::Select->can_read(). Unlike the I/O calls, select() is not automat-
ically restarted. When the INT or TERM signal is received, the can_read() method is interrup-
ted and returns undef. We detect this and return to the top of the loop, where the change in
$DONE is detected.
If, instead, can_read() returns true, then we know we have an incoming connection. We go
on to call the socket object's accept() method. If this is successful, then we call the
launch_child() function exported by the Daemon module.
Recall that launch_child() is a wrapper around fork() that launches children in a signal-
safe manner and updates a package global containing the PIDs of all active children.
launch_child() can take a number of arguments, including a callback to be invoked when
the child is reaped. In this case, we're not interested in handling that event, so we pass no
arguments.
If launch_child() returns a child PID of 0, then we know we are in the child process. We
close our copy of the listening socket and call the Web module's handle_connection()
method on the connected socket. Otherwise, we are the parent. We close our copy of the con-
nected socket and continue looping.
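The heart of this loop, stripped of forking and daemon setup, can be sketched as follows. The accept_loop() name and its callback argument are assumptions made to keep the example self-contained; the server itself calls launch_child() at the point where the callback is invoked here.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

our $DONE = 0;
$SIG{INT} = $SIG{TERM} = sub { $DONE++ };

# Accept connections on $listen until $DONE is set, passing each
# connected socket to $handler.
sub accept_loop {
    my ($listen, $handler) = @_;
    my $in = IO::Select->new($listen);
    until ($DONE) {
        next unless $in->can_read;             # interrupted by a signal: recheck $DONE
        next unless my $c = $listen->accept;
        $handler->($c);
        close $c;
    }
    return;
}

# Typical use (not run here because it would block waiting for clients):
#   my $socket = IO::Socket::INET->new(LocalPort => 8080, Listen => 5,
#                                      Reuse => 1) or die "Can't listen: $!";
#   accept_loop($socket, sub { my $c = shift; print $c "hello\n" });
```

Because can_read() returns an empty list when select() is interrupted, the until test sees the new value of $DONE without waiting for one more connection.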
The subjective performance of the accept-and-fork server is significantly better than the serial ver-
sion, particularly when handling pages with inline images.
Lines 1–6: Load modules—We load the IO::* modules, Daemon, and Web.
Lines 6–7: Define constants—In addition to the PIDFILE constant needed by the
init_server() routine, we declare PREFORK_CHILDREN to be the number of child server
processes we will fork.
Lines 8–11: Create listening socket—We create the listening socket in the usual way.
Lines 12–13: Initialize the server—We call the Daemon module's init_server() function to
autobackground the server, set up logging, and create the PID file. The server will actually exit
soon after this, and the PID file will disappear; this problem will be fixed in the next iteration of
the server.
Lines 14–15: Prefork children—We call our make_new_child() subroutine
PREFORK_CHILDREN times to spawn the required number of children. The main server process then exits,
leaving the children to run the show.
Our strategy is to create and maintain a temporary lock file to use for flock() serialization. Each
child will attempt to lock the file before calling accept() and release the lock immediately afterward.
The result of this is to protect the call to accept() so that only one process can call it at any time.
The others are blocked in flock() and waiting for the lock to become available.
We discussed the syntax of flock() in the Chapter 14 section, Direct Logging to a File.
Conveniently enough, we don't have to create a separate lock file because we can use our PID file
for this purpose. On entry to the do_child() subroutine, we call IO::File's open() method to open
the PID file, using the O_RDONLY flag to open it in a read-only fashion.
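In outline, the serialization looks something like the sketch below. The serialized_accept() helper, the argument lists, and the $handler callback are assumptions made so that the example stands on its own; they are not taken from the figure.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDONLY);
use IO::File;

# Call accept() while holding an exclusive lock on $lock_fh, so that
# only one process at a time is inside accept().
sub serialized_accept {
    my ($socket, $lock_fh) = @_;
    flock($lock_fh, LOCK_EX) or die "Can't lock: $!";
    my $c = $socket->accept;
    flock($lock_fh, LOCK_UN) or die "Can't unlock: $!";
    return $c;
}

# Hypothetical child loop; $pidfile is the path to the server's PID file.
sub do_child {
    my ($socket, $pidfile, $handler) = @_;
    my $lock = IO::File->new($pidfile, O_RDONLY)
        or die "Can't open $pidfile: $!";
    while (my $c = serialized_accept($socket, $lock)) {
        $handler->($c);
        close $c;
    }
    close $lock;
    return;
}
```

Each child opens its own handle on the file, so the lock taken by one child blocks the others in flock() until it is released.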
In this version of the preforking Web server, we make the necessary modifications to serialize
accept() and to relaunch child processes to replace exited ones. We also arrange for the parent
process to kill its children cleanly when it exits. Figure 15.5 shows the server with both sets of
modifications in place.
Figure 15.5. This preforking server serializes accept() and relaunches new children to replace old ones
Lines 1–7: Import modules—We import the Fcntl module in addition to those we imported in
earlier versions. This module exports several constants we need to perform file locking and
unlocking.
Lines 8–11: Define constants—In addition to PREFORK_CHILDREN and PIDFILE, we define a
MAX_REQUEST constant. This constant determines the number of transactions each child will
handle before it exits. By setting this to a low value, you can watch children exit and the parent
spawn new ones to replace them. We also define DEBUG, which can be set to generate verbose
log messages.
Lines 12–13: Declare global variables— $CHILD_COUNT is updated to reflect the number of
children active at any given time. $DONE is used as before to flag the parent server that it is time
to exit.
Line 14: Signal handlers—The INT and TERM handlers process requests to terminate. As before,
we will rely on the Daemon module to install a handler for CHLD.
Lines 15–20: Create listening socket, initialize server—We create the listening socket and call
the Daemon module's init_server() routine to write the PID file and go into the background.
Lines 21–24: Main loop—We now enter a loop in which we launch children until PREFORK_CHILDREN
of them are running and then go to sleep until a signal is received. As we will see, make_new_child()
increments the $CHILD_COUNT global by one each time it creates a child, and the CHLD callback
routine decrements $CHILD_COUNT each time a child dies. The effect of the loop is to wait until
CHLD or another signal is received and then to call make_new_child() as many times as
necessary to bring the number of children up to the limit set by PREFORK_CHILDREN.
This continues indefinitely until the parent server receives an INT or TERM signal and sets
$DONE to true.
Lines 25–27: Kill children and exit—When the main loop is finished, we kill all the children by
calling the Daemon module's kill_children() subroutine. The essence of this routine is the
line of code:
kill TERM => keys %CHILDREN;
where %CHILDREN is a hash containing the PIDs of the active children launched by
launch_child(). kill_children() waits until the last child has died before terminating.
Lines 28–37: make_new_child() subroutine—As in the last version, the
make_new_child() subroutine is invoked to create a new server child process. One change
from the previous version is that when we call the launch_child() subroutine, we pass it a
reference to a subroutine to be invoked whenever Daemon reaps the child. In this case, our
callback is cleanup_child(), which decrements the $CHILD_COUNT global by one. The
other new feature is that after the parent launches a new child, it increments $CHILD_COUNT
by one. Together, these changes allow $CHILD_COUNT to reflect an accurate count of active
child processes.
Lines 38–52: do_child() subroutine—The do_child() subroutine, which runs each child's
accept() loop, is modified to serialize accepts. On entry to the subroutine, we open the PID
file read-only, creating a filehandle that we can use for locking. Before each call to accept(),
we call flock() on the filehandle with an argument of LOCK_EX to gain an exclusive lock. We
then release this lock following accept() by calling flock() again with the LOCK_UN argument.
After accepting the connection, we call the Web module's handle_connection() routine as
before.
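The shutdown step described for lines 25–27 can be sketched in a self-contained form. This is only an approximation of what the Daemon module does: launch_idle_child() and kill_all_children() are hypothetical stand-ins for launch_child() and kill_children(), and the children here do nothing but sleep.

```perl
use strict;
use warnings;

my %CHILDREN;   # PID => 1 for each live child (stand-in for Daemon's hash)

# Hypothetical stand-in for launch_child(): fork a child that just idles.
sub launch_idle_child {
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;
    if ($pid == 0) {                 # child: wait to be terminated
        $SIG{TERM} = 'DEFAULT';
        sleep 60;
        exit 0;
    }
    $CHILDREN{$pid} = 1;             # parent: remember the child
    return $pid;
}

# Send TERM to every child, then block until each one has been reaped.
# Returns the number of children still recorded (0 when all are gone).
sub kill_all_children {
    kill TERM => keys %CHILDREN;
    for my $pid (keys %CHILDREN) {
        waitpid($pid, 0);
        delete $CHILDREN{$pid};
    }
    return scalar keys %CHILDREN;
}
```

Waiting on each PID in turn keeps the parent alive until the last child has exited, which is the behavior the text attributes to kill_children().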
There are five children, and each one (as indicated by the WCHAN column) is in a system call named
tcp_parse. This routine is presumably called by accept() while waiting for an incoming connection.
In contrast, the latest version of the preforking server shows a different profile:
  PID  SIZE  WCHAN       STAT  %CPU  %MEM  TIME  COMMAND
15313  2984  pause       S      0.0   4.6  0:00  web_prefork2.pl
15314  2980  flock_lock  S      0.0   4.6  0:00  web_prefork2.pl
15315  2980  tcp_parse   S      0.0   4.6  0:00  web_prefork2.pl
15316  2980  flock_lock  S      0.0   4.6  0:00  web_prefork2.pl
15317  2980  flock_lock  S      0.0   4.6  0:00  web_prefork2.pl
15318  2980  flock_lock  S      0.0   4.6  0:00  web_prefork2.pl
The process at the top of the list (PID 15313) is the parent. Top shows it in pause because that's
the system call invoked by sleep(). The other five processes (15314–15318) are the children.
Only one of them is performing an accept(). The others are blocked in the flock_lock system
call. As the children process incoming connections, they take turns, with never more than one calling
accept() at any given time.
There are two common solutions to this problem. One is for the parent and children to send messages
via a filehandle. The other technique is to use shared memory so that the parent and child
processes share a Perl variable. When the variable is changed in a child process, the changes
become visible in the parent as well. In this section, we show an example of an adaptive preforking
server that uses a pipe for child-to-parent communications. We'll look at the shared memory solution
in the next section.
Chapter 2 demonstrated how unidirectional pipes created with the pipe() call can be used by a
set of child processes to send messages to their common parent (see the section Creating Pipes
with the pipe() Function). The same technique is ideal in this application.
At startup time, the adaptive server creates a pipe using pipe():
pipe(CHILD_READ,CHILD_WRITE);
This creates two handles. CHILD_WRITE will be used by the children to write status messages, and
CHILD_READ will be used by the parent to receive them. Each time we fork a new child process,
the new child closes CHILD_READ and keeps a copy of CHILD_WRITE. The format of the status
messages is simple. They consist of the child's PID, whitespace, the current status, and a newline:
2209 busy
The status may be any of the strings "idle," "busy," and "done." The child issues the "idle" status
just before calling accept() and "busy" just after accepting a new connection. The child announces
that it is "done" when it has processed its maximum number of connections and is about to exit.
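A minimal sketch of how the parent might apply a batch of these messages to a status hash follows. The function name update_status() is hypothetical, and removing a child's entry when it reports "done" is an assumption made here to keep the table limited to live children.

```perl
use strict;
use warnings;

# Apply a chunk of newline-terminated "PID status" messages to a status
# hash. Returns the number of well-formed messages that were processed.
sub update_status {
    my ($status, $data) = @_;
    my $count = 0;
    for my $line (split /\n/, $data) {
        # Skip anything that is not "digits, whitespace, known status".
        my ($pid, $state) = $line =~ /^(\d+)\s+(idle|busy|done)$/ or next;
        if ($state eq 'done') { delete $status->{$pid} }     # child is exiting
        else                  { $status->{$pid} = $state }   # idle or busy
        $count++;
    }
    return $count;
}
```

Because each message carries the sender's PID, messages from different children can arrive in any order without confusing the table.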
The parent reads the messages in a loop, parsing them and keeping a global named %STATUS up
to date. Each time a child's status changes, the parent counts the busy and idle children and if
necessary launches new children or kills old ones to keep the number of idle processes in the desired
range. We want the parent's read loop to be interruptible by signals so that we can kill the server.
Before the server exits, it kills each remaining child so that everything exits cleanly. Similarly, we
arrange for the child processes' accept() loop to be interruptible so that the child exits immediately
when it receives a termination signal from its parent.
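The parent's decision step, counting idle children and working out how many to launch or retire, can be sketched as a pure function. The name plan_adjustment() and the parameters $lo and $hi (the minimum and maximum number of idle children) are assumptions for illustration, as is the choice to retire the lowest-numbered PIDs first.

```perl
use strict;
use warnings;

# Decide how to bring the number of idle children back into [$lo, $hi].
# Returns the number of children to launch, followed by the PIDs of any
# idle children that should be asked to quit.
sub plan_adjustment {
    my ($status, $lo, $hi) = @_;
    my @idle = sort { $a <=> $b }
               grep { $status->{$_} eq 'idle' } keys %$status;
    my $launch = @idle < $lo ? $lo - @idle : 0;
    my @retire = @idle > $hi ? @idle[0 .. @idle - $hi - 1] : ();
    return ($launch, @retire);
}
```

Keeping the decision separate from the fork() and kill() calls makes the policy easy to check on its own; the caller then launches or signals children according to the result.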
At any time, there is a single active CHILD_READ filehandle in the parent and multiple
CHILD_WRITE filehandles in the children. You might well wonder what prevents messages from
the children being garbled as they are intermingled. This design works because of a particular
characteristic of the pipe implementation. Provided that messages are below a certain size threshold,
write operations on pipes are atomic. A message written to a pipe by one process is guaranteed
not to be interleaved with a message written by another. This ensures that messages written into the
pipe come out intact at the other end and not garbled with data from writes performed by other
processes. The size limit on atomic writes is controlled by the operating system constant
PIPE_BUF, available in the header file limits.h. This varies from system to system, but POSIX
requires it to be at least 512 bytes, so 512 bytes is a safe value.
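On the child side, the atomicity guarantee only helps if each message goes out in a single write. The sketch below shows one way to do that; send_status() and the SAFE_PIPE_WRITE constant are hypothetical names, and the 512-byte limit is simply the conservative value mentioned above.

```perl
use strict;
use warnings;

use constant SAFE_PIPE_WRITE => 512;   # conservative atomic-write limit

# Write one "PID status\n" message with a single syswrite() call so that
# it enters the pipe as one unit. Returns the message that was sent.
sub send_status {
    my ($write_fh, $state) = @_;
    my $message = "$$ $state\n";
    die "Status message too long for an atomic write\n"
        if length($message) > SAFE_PIPE_WRITE;
    my $written = syswrite($write_fh, $message);
    die "Can't write status: $!" unless defined $written;
    die "Short write on status pipe\n" unless $written == length $message;
    return $message;
}
```

Using syswrite() rather than print() avoids stdio buffering, which could otherwise combine or split messages before they reach the pipe.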
Figure 15.6 shows the code for the adaptive server.
Lines 1–8: Load modules—We bring in the standard IO::* modules, Fcntl, and our own
Daemon and Web modules.
Lines 9–14: Define constants—We define several new constants. HI_WATER_MARK and
LO_WATER_MARK define the maximum and minimum number of idle servers, respectively. They
are set deliberately low in this example to make it easy to watch the program work. DEBUG is a
constant indicating whether to print debugging information.
Lines 15–16: Declare globals—The $DONE flag causes the server to exit when set to true. The
%STATUS hash contains child status information. As in the previous example, the child PIDs form
the keys of the hash, while the status information forms the values.
Line 17: Interrupt handlers—We install a handler for INT and TERM that sets the $DONE flag to
true, ultimately causing the server to exit. Recall also that the Daemon module automatically
handles the CHLD signal by reaping children and maintaining a list of child PIDs in the
%CHILDREN global.
Lines 18–21: Create socket—We create a listening socket in the usual way.
Lines 22–24: Create pipe—We create a unidirectional pipe with the pipe() call and add the
CHILD_READ end of the pipe to an IO::Select set for use in the main loop. We will discuss the
rationale for using IO::Select momentarily.
Lines 25–26: Initialize server—We call the Daemon module's init_server() routine to create
the PID file for the server, autobackground, and initialize logging.
Lines 27–28: Prefork children—We call our internal make_new_child() subroutine to fork the
specified number of child server processes.
Line 29: Main loop—The main loop of the server runs until $DONE is set to true in a signal handler.
Each time through the loop, the server waits for a status change message from a child or a
signal. To keep the number of idle children between the low and high water marks, it updates
the contents of %STATUS and runs the code that we have seen previously for launching or killing
children.
Lines 30–42: Process messages from the pipe—Looking at the main loop in more detail, we
want to read status lines from the CHILD_READ filehandle using sysread(). However, we can't
simply let the parent block in the I/O call, because we want to be able to terminate when we
receive a TERM signal or notification that one of the child processes has died; sysread(), like
the other slow I/O calls, is automatically restarted by Perl after interruption by a signal.
The easiest solution to this problem is again to use select() to wait for the pipe to become
readable because select() is not automatically restarted. We call the IO::Select object's
can_read() method to wait for the pipe to become ready, and then invoke sysread() to read
its current contents into a buffer. The data read may contain one message or several, depending
on how active the children are. We split the data into individual messages on the newline character
and parse the messages. If the child's status is "done," we delete its PID from the %STATUS
global. Otherwise, we update the global with the child's current status code.
Lines 43–52: Launch or kill children—After updating %STATUS, we collect the list of idle children
by using grep() to filter the %STATUS hash for those children whose status is set to "idle." If
the number of idle children is lower than LO_WATER_MARK, we call make_new_child() as
many times as required to bring the child count up to the desired level. If the number of idle
children exceeds HI_WATER_MARK, then we politely tell the excess children to quit by sending
them a HUP ("hangup") signal. As we will see later, each child has a HUP handler that causes it
to terminate after finishing its current connection. This is better than terminating the child immediately,
because it avoids breaking a Web session that is in progress.
When we tally the idle children, we sort them numerically by process ID, causing older excess
children to be killed preferentially. This is probably unnecessary, but it might be useful if the child
processes are leaking memory.
Lines 54–57: Termination—When the main loop is done, we log a warning and call the
kill_children() subroutine defined in Daemon. kill_children() sends each child a
TERM and then waits for each one to exit. When the subroutine returns, we log a second message
and exit.
Lines 58–67: make_new_child() subroutine— make_new_child() is invoked to create a
new child process. We invoke the Daemon module's launch_child() function to fork a new
child in a signal-safe manner. When we call launch_child(), we pass it a code reference to
a callback routine that will be invoked immediately after the child is reaped. The callback,
cleanup_child(), is responsible for keeping %STATUS up to date even if the child exits
abnormally.
launch_child() returns the PID of the child in the parent process and numeric 0 in the child
process. In the former case, we simply log a debugging message. In the latter, we close the
CHILD_READ filehandle, because we no longer need it, and run our Web server routines by
calling do_child(). When do_child() is finished, we exit.
Lines 68–91: do_child() subroutine—At its heart, this routine does exactly what the previous
version of do_child() did. It serializes on the lock file using flock(), calls the listening
socket's accept() method, and passes the connected socket to the Web module's
handle_connection() function.
The main differences from the previous version are (1) it handles HUP signals sent to it by the
parent by shutting down gracefully, and (2) it writes status messages to the CHILD_WRITE
filehandle.
Lines 70–73: Initialize subroutine and start accept() loop—When we enter the do_child()
routine, we open the lock file and initialize the $cycles variable as before. We then install a
handler for HUP which sets the local variable $done to true. Our accept loop exits when
$done becomes true or we have processed the maximum number of transactions. At the top of
the accept() loop, we write a status message containing our process ID (stored in $$) and
the "idle" status message.
Lines 76–83: Lock and call accept()—The rationale for the next bit of code is a bit subtle. We
call flock() and then accept() as before. However, what happens if the HUP signal from the
parent comes in while we're in one or the other of those calls? The HUP handler executes and
sets $done to true, but since Perl restarts slow system calls automatically, we will not notice the
change in $done until we have received an incoming connection, processed it, and returned to
the top of the accept loop.
We cannot handle this by interposing an interruptible select() between the calls to
flock() and accept(), because the HUP might just as easily come while we are blocked for
the flock() call, and flock() is also restartable. Instead, we wrap the calls to flock() and
accept() in an eval{} block. At the top of the block we install a new local HUP handler, which
bumps up $done and dies, forcing the entire eval{} block to terminate when the HUP signal is
received. We test the value returned by the block, and if it is undefined, we return to the top of
the loop, where the change in $done will be detected.
Lines 84–91: Handle connection—If the eval{} block runs to completion, then we have accepted
a new incoming connection. We send a "busy" message to the parent via
CHILD_WRITE and call the handle_connection() subroutine. After the loop terminates, we
write a "done" message to the parent, close all our open filehandles, and exit.
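The eval{} technique described for lines 76–83 can be isolated in a small helper. This is a sketch under stated assumptions rather than the figure's code: run_interruptibly() is a hypothetical name, and the code reference passed to it stands in for the flock() and accept() calls.

```perl
use strict;
use warnings;

# Run $blocking->() so that a HUP aborts it at once. Returns the value of
# the code reference on success, or undef if HUP arrived first, in which
# case the flag referenced by $done_ref has been incremented.
sub run_interruptibly {
    my ($done_ref, $blocking) = @_;
    my $result = eval {
        # The local handler is in force only inside this eval{} block.
        local $SIG{HUP} = sub { $$done_ref++; die "HUP received\n" };
        $blocking->();
    };
    return $result;
}
```

Because the handler dies, control leaves the restartable system call immediately instead of resuming it, and the caller sees an undefined result that sends it back to the top of its loop.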
As written, there is a potential bug in the parent code. The parent process reads from
CHILD_READ in maximum chunks of 4,096 bytes rather than in a line-oriented fashion. If the children
are very active and the parent very slow, more than 4,096 bytes of messages could
accumulate and the last message could be split between two reads. Although this is unlikely (4,096
bytes is sufficient for about 400 messages given an average size of 10 bytes per message), you might
consider buffering these reads in a string variable and explicitly checking for partial reads that don't
terminate in a newline.
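One way to implement that suggestion is to keep a persistent buffer and hand back only complete lines. The sketch below is one possible approach, not part of the original server; extract_messages() is a hypothetical name.

```perl
use strict;
use warnings;

# Append a freshly read chunk to the buffer and return every complete,
# newline-terminated message. A trailing partial message stays in the
# buffer until the rest of it arrives with a later read.
sub extract_messages {
    my ($buffer_ref, $chunk) = @_;
    $$buffer_ref .= $chunk;
    my @messages;
    while ($$buffer_ref =~ s/^([^\n]*)\n//) {
        push @messages, $1;
    }
    return @messages;
}
```

The parent would call this after each sysread(), passing a reference to a buffer variable that lives outside the read loop.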
The first argument gives the name of the variable to tie, in this case %H. The second is the name of
the IPC::Shareable module. The third argument is a "glue" ID that will identify this variable to the
processes that will share it. This can be an integer or any string of up to four letters. In this example
we use a glue ID of Test.
The last argument is a hash reference containing options to pass to IPC::Shareable. There are a
variety of options, but the most frequent are create, destroy, exclusive, and mode. The create option
causes the shared memory segment to be created if it doesn't exist already. It is often used in
conjunction with exclusive to cause the tie() to fail if the segment already exists, and with destroy
to arrange for the shared memory segment to be destroyed automatically when the process
exits. Finally, mode specifies an octal access mode for the shared memory segment. It functions
like file modes, where 0666 is the most liberal, and allows any process to read and write the memory
segment, and 0600 is the most conservative, making the shared variable accessible only to processes
that share the same user ID.
Multiple processes can tie hashes to the same memory segment, provided that they have sufficient
access privileges. In a typical case of a parent that must share data with multiple children, the parent
first creates the shared memory using the create, destroy, and exclusive options. Each child then
ties its own variable to the same glue ID. The children are not responsible for creating or destroying
the shared memory, so they don't pass options to tie():
tie %my_copy, 'IPC::Shareable', 'Test';
After a hash variable is tied, all changes made to the variable by one process are seen immediately
by all others. You can store scalar variables, objects, and references into the values of a shared
hash, but not filehandles or subroutine references. However, there are certain subtleties to storing
complex objects into shared hashes; see the IPC::Shareable documentation for all the caveats.
If multiple processes try to modify the same shared variable simultaneously, odd things can happen.
Even something as simple as $H{'key'}++ is a bit risky, because the ++ operation occurs internally
in several steps: The current value is fetched, incremented, and stored back into the hash. If
another process tries to modify the value before ++ has finished executing, its changes will be
overwritten. The simple solution is to lock the hash before performing a multistep update and unlock
it when you finish. Here's the idiom:
tied(%H)->shlock;
$H{'key'}++;
tied(%H)->shunlock;
The tied() function returns a reference to an object that is maintained internally by IPC::Shareable.
It has just two public methods: shlock() and shunlock(). The first method locks the variable so
that it can't be accessed by other processes, and the second reverses the lock. (These methods
have no direct relationship to the lock() function used in threading or the flock() function used
earlier in this chapter to serialize accept().)
Scalar variables can also be tied to shared memory using a similar interface. Tied arrays are currently
not supported.
A new version of the adaptive preforking Web server written to take advantage of IPC::Shareable
is shown in Figure 15.7.
Lines 1–8: Load modules—We load the same modules as before, plus the IPC::Shareable
module.
Lines 9–15: Define constants—We define a new constant, SHM_GLUE, which contains the key
that parent and children will use to identify the shared memory segment.
Lines 16–17: Declare globals—We declare $DONE and %STATUS, which have the same significance
as in the previous example. The major difference is that %STATUS is tied to shared memory
and updated directly by the children, rather than kept up to date by the parent.
Lines 18–19: Install signal handlers—We install TERM and INT handlers that set the $DONE flag
to true, causing the server to terminate. We also intercept the ALRM signal with a handler that
does absolutely nothing. As you will see, the parent spends most of its time in the sleep() call,
waiting for one of its children to send it an ALRM to tell it that the contents of %STATUS have
changed. We must install a handler for ALRM to override the default action of terminating the
program completely.
Lines 20–25: Create socket, initialize server—We create a listening socket and call the Daemon
module's init_server() routine in the usual way.
Lines 26–28: Tie %STATUS —We tie %STATUS to shared memory, using options that cause the
shared memory to be created with restrictive access modes and to be destroyed automatically
when the parent exits. If the memory segment already exists when tie() is called, the call will
fail. This may happen if another program chose the same ID value for a shared memory segment
or if the server crashed abnormally, leaving the memory allocated. In the latter case, you may
have to delete the shared memory manually using a tool provided by your operating system. On
Linux systems, the command to remove a shared memory segment is ipcrm.
The contents of %STATUS are identical to those in the last example. Its keys are the PIDs of
children, and its values are their status strings.
Lines 29–30: Prefork children—We prefork some children by calling make_new_child() the
required number of times.
Lines 31–43: Status loop—As the children process incoming connections, they will update
%STATUS and the changes will be visible to the parent process immediately. But it would be
woefully inefficient to do a busy loop over %STATUS looking for changes. Instead, we rely on the
children to tell us when %STATUS has changed, by waiting for a signal to arrive. The two signals
we expect to get are ALRM, sent by the child when it changes %STATUS, and CHLD, sent by the
operating system when a child dies for whatever reason.
We enter a loop that terminates when $DONE becomes true. At the top of the loop, we call
sleep(), which puts the process to sleep until some signal is received. When sleep() returns,
we process %STATUS exactly as before, launching new children and killing old ones to keep the
number of idle children between the low and high water marks.
Lines 44–47: Termination—When the main loop is done, we call Daemon's kill_children()
to terminate any running children, print out some diagnostic messages, and exit.
Lines 48–56: make_new_child() subroutine—This subroutine is the same as the one used
in the first version of the adaptive server, except that it no longer does pipe management. As in
the earlier version, we call the Daemon module's launch_child() subroutine with a callback
to cleanup_child().
Lines 57–83: do_child() subroutine— do_child() runs the accept() loop for each child,
accepting and processing incoming connections from clients. On entry to the subroutine, we tie
a local variable named %status to the shared memory segment identified by SHM_GLUE. Because
we expect that the segment has already been created by the parent, we do not use the
create or exclusive flags this time. If the variable cannot be tied, the child exits with an error
message.
We set up the lock file for serialization and enter an accept() loop. Each time the status of the
child changes, we write its new status directly into the %status variable and notify the parent
that the variable has changed by sending the parent an ALRM signal. The idiom looks like this:
$status{$$} = 'idle'; kill ALRM=>getppid();
In other respects do_child() is identical to the earlier version, including its use of an
eval{} block to intercept and handle HUP signals gracefully.
Lines 84–87: cleanup_child() subroutine— cleanup_child() is called by the Daemon
module's reap_child() subroutine to handle a child that has just been reaped. We delete the
child's PID from %STATUS. This ensures that %STATUS is kept up to date even if the child has
terminated prematurely.
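The ALRM-based notification used between the children and the parent can be tried in isolation, without shared memory. The following is a minimal sketch: wait_for_notification() and notify_parent() are hypothetical names, and the do-nothing handler mirrors the one the parent installs.

```perl
use strict;
use warnings;

# A do-nothing handler: without it, ALRM's default action would
# terminate the process.
$SIG{ALRM} = sub { };

# Parent side: sleep for at most $max seconds, returning early when a
# handled signal arrives. Returns the number of seconds actually slept.
sub wait_for_notification {
    my ($max) = @_;
    return sleep($max);
}

# Child side: announce a status change by signalling the parent.
sub notify_parent {
    return kill ALRM => getppid();
}
```

The signal carries no data of its own. It only wakes the parent, which must then inspect the shared status table to find out what changed.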
Some final notes on this server: I initially attempted to use the same tied %STATUS variable for both
the parent and children, allowing the children to inherit %STATUS through the fork. This turned out
to be a disaster because IPC::Shareable deallocated the shared memory segment whenever any
of the children exited. A little investigation revealed that the destroy flag was being inherited along
with the rest of the shared variable. One could probably fix this by hacking into IPC::Shareable's
internal structure and manually deactivating the destroy flag. However, there's no guarantee that
the internal structure won't change at some later date.
Some posters to the comp.lang.perl.modules newsgroup have warned that IPC::Shareable is not
entirely stable, and although I have not encountered problems with it, you might want to stick with
the simpler pipe implementation on production systems.
Prethreading
If you are working with a threading version of Perl, you can design your server to use a prethreading
architecture. Prethreading is similar to preforking, except that instead of launching multiple processes
to call accept(), the prethreading server creates multiple threads to deal with incoming
connections. As it is for the preforking server, the rationale is to avoid the overhead of creating a
new thread for every incoming connection.
In this section we develop a prethreading Web server that implements the same adaptive features
as the preforking server from the previous sections.
Lines 1–7: Load modules—In addition to the standard modules, we load the Thread module,
making Perl's threaded API available for our use.
Lines 8–10: Create constants, globals, and interrupt handlers—We select a path to use for the
server's PID file and install signal handlers that will gracefully terminate the server when it receives
either a TERM or INT.
Lines 11–17: Create listening socket and autobackground—We create the listening socket and
go into the background by calling the Daemon module's init_server() routine. We again
create an IO::Select object to use in the main loop to avoid being blocked in accept() when a
termination signal is received.
Lines 18–24: accept() loop—The accept() loop is similar to others we've seen in this chapter.
For each new incoming connection, we call Thread->new() to launch a new thread of
execution. The new thread will run the do_thread() subroutine.
Lines 26–31: do_thread() subroutine—In the do_thread() subroutine, we first detach our
thread so that it isn't necessary for the main thread to join() us after we are through. We then
call handle_connection(), and when this is done, close the socket.
Like the other servers in this chapter, web_thread1.pl autobackgrounds itself at startup time. Its
status messages are written to the syslog, and you can stop it by sending it a TERM or INT signal
in this way:
% kill -TERM `cat /tmp/web_thread.pid`
The main thread creates a listening socket and then launches PRETHREAD threads of execution,
each running the subroutine do_thread(). The main thread then goes to sleep. Meanwhile, each
thread enters an accept() loop in which it waits for an incoming connection, handles it, and then
goes back to waiting. Which thread handles which connection is nondeterministic.
Of course, things are not quite this simple. As is, this code won't work on all platforms: on
some systems, the call to accept() fails if more than one thread calls it simultaneously. We
therefore need a mechanism to ensure that only one thread calls accept() at a time.
Fortunately, because we are using threads, we can take advantage of the built-in lock() call and
don't have to resort to locking an external file. We simply declare a scalar global variable
$ACCEPT_LOCK and modify the do_thread() routine to look like this:
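sub do_thread {
    my $socket = shift;
    my $c;
    CONNECTION: while (1) {
        {
            lock $ACCEPT_LOCK;
            # a bare block is itself a one-pass loop, so a plain "next" would only
            # leave the block; the label restarts the outer while() loop instead
            next CONNECTION unless $c = $socket->accept;
        }
        handle_connection($c);
        close $c;
    }
}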
The while() loop now contains an inner block that defines the scope of the lock. Within that block
we attempt to get a lock on $ACCEPT_LOCK. Due to the nature of thread locking, only one thread
can obtain a lock at a time; the others are suspended until the lock becomes available. After ob-
taining the lock, we call accept(), blocking until there is an incoming connection. Immediately after
accepting a new connection, we release the lock by virtue of leaving the scope of the inner brace.
This allows another thread to obtain the lock and call accept(). We now handle the connection
as before.
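The effect of scoping the lock to an inner block can be seen in isolation. The following is a minimal sketch, not the server code. It assumes a Perl built with interpreter threads and uses the threads and threads::shared modules rather than the older Thread module loaded by the scripts in this chapter. The counters $in_critical and $max_seen are hypothetical names introduced only for the demonstration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $ACCEPT_LOCK :shared;        # the lock variable, as in the text
my $in_critical :shared = 0;    # hypothetical: threads currently inside the block
my $max_seen    :shared = 0;    # hypothetical: highest concurrency observed

sub worker {
    for (1 .. 50) {
        {
            lock $ACCEPT_LOCK;  # held until the closing brace of this block
            $in_critical++;
            $max_seen = $in_critical if $in_critical > $max_seen;
            threads->yield;     # invite a context switch while holding the lock
            $in_critical--;
        }                       # lock released here
        # work done here runs in parallel with the other threads
    }
}

my @workers = map { threads->create(\&worker) } 1 .. 4;
$_->join for @workers;
print "most threads ever inside the locked block: $max_seen\n";
```

Because only one thread can hold the lock at a time, the counter never rises above 1, no matter how the threads are scheduled.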
Adaptive Prethreading
Another deficiency in the basic prethreading server is that if all the threads launched at server startup
are busy serving connections, incoming connections have to wait. We would like the main thread
to launch new threads when needed to handle an increased load on the server and to delete excess
threads when the load diminishes.
We can accomplish this using a strategy similar to that of the preforking server by maintaining a
global %STATUS hash that the main server thread monitors and each thread updates. Unlike with
the preforking server, there's no need to use pipes or shared memory to keep this hash updated.
Since the threads are all running in the same process, they can modify %STATUS directly, provided
that they take appropriate steps to synchronize their access to the hash by locking it before modifying
it.
The keys of %STATUS are the thread identifiers (TIDs), and the values are one of the strings "busy,"
"idle," or "goner." The first two have the same meaning they did in the preforking servers. We'll
explain the third status code later. To simplify the management of %STATUS, we use a small
subroutine named status() that allows threads to examine and change the hash in a thread-safe
manner. Given a TID, status() returns the status of the indicated thread:
my $tid = Thread->self->tid;
my $status = status($tid);
Given a TID and a new status code, status() sets the thread's status to that value. If the
second argument is undef, status() deletes the indicated thread from %STATUS entirely.
Each worker thread's accept() loop invokes status() to change the status of the current thread
to "idle" before calling accept() and to "busy" after it accepts a connection.
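The behavior just described can be sketched as follows. This is an illustration written against the threads and threads::shared modules (an assumption, since the listing in this chapter uses the older Thread module), and it is not a copy of the code in Figure 15.9.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %STATUS :shared;   # TID => "idle", "busy", or "goner"
my $STATUS :shared;   # lock and condition variable that guards %STATUS

sub status {
    my $tid = shift;
    lock $STATUS;                       # serialize all access to %STATUS
    return $STATUS{$tid} unless @_;     # one argument: look up the status
    my $new = shift;
    if (defined $new) { $STATUS{$tid} = $new }      # set a new status
    else              { delete $STATUS{$tid} }      # undef: forget the thread
    cond_broadcast($STATUS);            # tell waiting threads something changed
    return $new;
}

my $tid = threads->tid;                 # TID of the current thread
status($tid => 'idle');
status($tid => 'busy');
print "thread $tid is ", status($tid), "\n";
status($tid => undef);                  # remove our entry again
```

Keeping every read and write of %STATUS inside this one subroutine means the locking rule only has to be right in one place.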
The main thread monitors changes to %STATUS and acts on them. To do its job efficiently, the thread
must have a way to know when a worker has changed %STATUS. The best way is to use a condition
variable. Each time through the main thread's loop, it calls cond_wait() on the condition variable,
putting itself to sleep until one of the worker threads indicates that the variable has changed. Code
in the status() subroutine calls cond_broadcast() whenever a thread updates %STATUS,
waking up the main thread and allowing it to manage the change.
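The handshake between cond_wait() and cond_broadcast() can be tried out in a few lines. This is a sketch under the assumption of a threaded Perl with the threads and threads::shared modules; the worker here does nothing except report its status once.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %STATUS :shared;
my $STATUS :shared;   # condition variable; also the lock for %STATUS

# Worker: record a status change and wake up whoever waits on $STATUS.
sub worker {
    my $tid = threads->tid;
    lock $STATUS;
    $STATUS{$tid} = 'busy';
    cond_broadcast($STATUS);
}

my $thr;
{
    lock $STATUS;                           # cond_wait() requires the lock
    $thr = threads->create(\&worker);
    # cond_wait() unlocks $STATUS while asleep and relocks it before returning.
    # The condition is rechecked in a loop because a wakeup does not guarantee
    # that the hash has actually changed.
    cond_wait($STATUS) until keys %STATUS;
    print join(' ', map { "$_=>$STATUS{$_}" } sort keys %STATUS), "\n";
}
$thr->join;
```

The worker cannot obtain the lock until the main thread gives it up inside cond_wait(), so the update and the wakeup cannot slip by unnoticed.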
The last detail is that the adaptive server needs a way to shut down gracefully. As before, the server
responds to the TERM and INT signals by shutting down, but how does the main thread tell its various
worker threads that shutdown time has arrived?
There is currently no way to deliver a signal specifically to a thread. The way we finesse this is to
have each worker periodically check its status code for a special value of "goner" and then exit. To
decommission a worker, the master simply calls status() to set the worker's status code appro-
priately.
Figure 15.9 lists the prethreaded Web server. Its increased size relative to the simple threaded
server indicates the substantial complexity of code that is required to coordinate the activities of the
multiple threads.
Lines 1–8: Load modules—We bring in the IO::Socket, IO::File, and IO::Select modules, along
with the Thread module. Thread doesn't import the cond_wait() and cond_broadcast()
functions by default, so we import those functions explicitly.
Lines 9–14: Define constants—We define the various constants used by the server, including
PRETHREAD, the number of threads to launch at startup time, the high and low water marks,
which have the same significance as in the preforked servers, and a DEBUG flag to turn on status
messages. We also define a MAX_REQUEST constant to control the number of transactions a
thread will accept before spontaneously exiting.
Lines 15–18: Declare global variables— $ACCEPT_LOCK, as discussed previously, is used for
protecting accept() so that only one thread can accept from the listening socket at a time.
%STATUS reports the state of each thread, indexed by its TID, and $STATUS is a condition
variable used both to lock %STATUS and to indicate when it has changed. $DONE flags the main
thread that the server is shutting down.
Line 19: Install signal handlers—We install a signal handler named terminate() for the INT
and TERM signals. This handler sets $DONE to true and returns.
Lines 20–25: Create listening server socket and go into background—We create a listening
socket and autobackground by calling init_server(). We also create an IO::Select object
containing the listening socket for use by each worker thread.
Line 26: Prelaunch some threads—We launch PRETHREAD threads by calling
launch_thread() the appropriate number of times before we enter the main loop.
Lines 27–40: Main thread: monitor worker threads for status changes —The main thread now
enters a loop that runs until $DONE is true, indicating that the user has requested server termi-
nation. Each time through the loop, we lock the $STATUS condition variable and immediately
call cond_wait(), unlocking the condition variable and putting the main thread to sleep until
another thread calls cond_broadcast() on the variable.
When cond_wait() returns, we know that a worker thread has signalled that its status has
changed and that $STATUS is again locked, protecting us against further changes to %STATUS.
We count the number of idle threads and either launch new ones or shut down existing
ones to keep the number of idle threads between the low and high water marks. The way we do
this is similar to the adaptive preforking servers, except that we cannot kill worker threads with
a signal. Instead, we set their status to "goner" and allow them to exit themselves.
Lines 41–47: Clean up—After the main loop has finished, we set each worker thread's status to
"goner" and call exit(). Although the main thread has now finished, the server process itself
won't exit until each thread has finished processing pending transactions, checked its status
code, and exited as well.
Lines 48–67: do_thread() routine—The do_thread() routine forms the body of each worker
thread. We begin by recovering the current thread's TID and initializing our status to "idle." We
now enter a loop that terminates when our status code becomes "goner" or we have serviced
the number of transactions specified by MAX_REQUEST.
We need to poll our status on a periodic basis to recognize when termination has been reques-
ted, so we don't want to get blocked in lock() or accept(). To do this, we take advantage of
the IO::Select object created by the main thread to call can_read() with a timeout of 1 second.
If an incoming connection arrives within that time, we service it. Otherwise, we return to the top
of the loop so that we can check that our status hasn't changed.
If can_read() returns true, the socket is ready for accept(). We serialize access to
accept() by locking the $ACCEPT_LOCK variable, and call accept(). If this is successful, we
set our status to "busy" and handle the connection. After the connection is done, we again set
our status to "idle." After the accept() loop is done, we set our status to undef, causing the
status() subroutine to remove our TID from the %STATUS hash.
Lines 71–83: status() subroutine—The status() subroutine is responsible for keeping
%STATUS up to date. We begin by locking $STATUS so that the hash doesn't change from un-
derneath us. If we were called with only the TID of a thread, we look up its status in %STATUS
and return it. Otherwise, if we were provided with a new status code for the TID, we change
%STATUS accordingly and call cond_broadcast() on the $STATUS variable in order to alert
any threads that are waiting on the variable that %STATUS has been updated.
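The polling pattern used by do_thread() does not depend on threads and can be sketched on its own. In the following hypothetical helper, worker_loop() and its $should_exit callback are names invented for this illustration; in the server, the callback would correspond to checking whether status() returns "goner".

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical helper: serve connections until $should_exit->() returns true.
# Returns the number of connections that were handled.
sub worker_loop {
    my ($listen, $should_exit, $timeout) = @_;
    my $in     = IO::Select->new($listen);
    my $served = 0;
    until ($should_exit->()) {
        # Wait at most $timeout seconds; on a timeout, go back and recheck
        # the exit condition instead of blocking forever in accept().
        next unless $in->can_read($timeout);
        my $c = $listen->accept or next;
        print $c "hello\n";             # stand-in for handle_connection()
        close $c;
        $served++;
    }
    return $served;
}
```

The timeout bounds how long a worker can take to notice a shutdown request, at the cost of one extra wakeup per idle interval.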
When we run the prethreaded Web server with DEBUG true, we can see messages appear in the
syslog that indicate the birth and death of each worker thread, interspersed with messages from the
master thread that indicate its tally of each worker's status:
Jun 25 14:03:36 pesto web_prethread1.pl: Thread 1: starting
Jun 25 14:03:36 pesto web_prethread1.pl: Thread 2: starting
Jun 25 14:03:36 pesto web_prethread1.pl: Thread 3: starting
Jun 25 14:03:36 pesto web_prethread1.pl: Thread 4: starting
Jun 25 14:03:36 pesto web_prethread1.pl: Thread 5: starting
Jun 25 14:03:36 pesto web_prethread1.pl:
1=>idle 2=>idle 3=>idle 4=>idle 5=>idle
Jun 25 14:03:40 pesto web_prethread1.pl:
1=>busy 2=>idle 3=>idle 4=>idle 5=>idle
Jun 25 14:03:40 pesto web_prethread1.pl: Thread 1: handling connection
Jun 25 14:03:44 pesto web_prethread1.pl:
1=>busy 2=>idle 3=>busy 4=>idle 5=>idle
Jun 25 14:03:44 pesto web_prethread1.pl: Thread 3: handling connection
Jun 25 14:03:47 pesto web_prethread1.pl: Thread 2: handling connection
Jun 25 14:03:47 pesto web_prethread1.pl:
1=>busy 2=>busy 3=>busy 4=>idle 5=>idle
Jun 25 14:03:52 pesto web_prethread1.pl: Thread 4: handling connection
Jun 25 14:03:52 pesto web_prethread1.pl:
1=>busy 2=>busy 3=>busy 4=>busy 5=>idle
Jun 25 14:03:52 pesto web_prethread1.pl: Thread 6: starting
Jun 25 14:03:52 pesto web_prethread1.pl:
1=>busy 2=>busy 3=>busy 4=>busy 5=>idle 6=>idle
NetServer::Generic does not provide customized logging functions, autobackgrounding, PID file
handling, or other more specialized functions frequently required by production servers. However,
you can always layer these onto your application. In any case, the module is perfect when you need
to get a server up and running fast and inetd provides insufficient performance for your needs.
Performance Measures
How much do preforking and prethreading improve performance? For preforking, the advantage is
clear. Because of the overhead of launching new processes, heavily loaded servers generally see
a marked performance boost when going from a conventional accept-and-fork design to preforking.
In fact, when I used the standard WebStone benchmark [http://www.mindcraft.com/webstone] to
compare the connection rate of the accept-and-fork server of Figure 15.3 and the preforking server
of Figure 15.5 on a Linux system, I saw an approximately fivefold increase in performance at heavy
load levels after adjusting for the overhead of the actual file transfer.
The situation is less clear-cut for threaded servers. The overhead for thread creation is not as large
as for process creation, and the prethreaded design itself introduces new overhead for thread lock-
ing and synchronization. With the WebStone benchmarks I was unable to document speedup in the
prethreaded server of Figure 15.9 compared to the conventional threaded server of Figure 15.8.
The performance of both the threaded and prethreaded designs was better than that of the accept-
and-fork server, but roughly equivalent to that of the preforking server.
However, such performance is very sensitive to the operating system, hardware, kernel parameters,
and other factors. It's worth subjecting a prototype of your particular application to timing tests before
committing to one design over another.
Surprisingly, all the Web servers developed in this chapter came in with better benchmarks than the
state-of-the-art Apache Web server (almost ninefold better at moderate load levels). Although this
isn't a fair comparison since Apache does many things that the simple Web servers developed in
this chapter do not, it does illustrate that Perl can deliver sufficient performance for serious network
applications.
On a less positive note, a side effect of the testing was to confirm that under heavy loads the threaded
implementations of Perl occasionally crash. Perl threading is still not ready for production systems,
at least through version 5.6. Ironically, the instability even affected scripts that don't use the thread-
ing features. For example, under high client loads the pure accept-and-fork server of Figure 15.3
would frequently hang when run under a threaded Perl interpreter. This problem disappeared when
I retested the server using a version of Perl compiled without thread support.
Summary
In this chapter we have examined at some length two specialized architectures for connection-
oriented servers: preforking and prethreading. In so doing, we have seen a number of strategies for
dealing effectively with interprocess communication, including signals, shared memory, named
pipes, and condition variables. When designing a server to use under heavy loads, it's worth giving
these architectures consideration and possibly benchmarking the alternative designs under typical
loads.
We've used select() and IO::Select extensively to multiplex among multiple I/O streams. How-
ever, the select() system call has some design limitations related to its use of a bit vector to
represent the filehandles to be monitored. On an ordinary host, such as a desktop machine, the
maximum number of files is usually a small number, such as 256, and the bit vectors will therefore
be no longer than 32 bytes. However, on a host that is tuned for network applications, such as a
Web server, this limit may be in the thousands. The bit vectors necessary to describe every possible
filehandle then become quite large, forcing the operating system to scan through a large, sparsely
populated bit vector each time select() is called. This may have an impact on performance.
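The size of these bit vectors is easy to check with Perl's vec() function, which is how select() vectors are built by hand. The helper name fd_vector() is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a select()-style bit vector with one bit set
# for each file descriptor number passed in.
sub fd_vector {
    my @fds  = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for @fds;     # the string grows to hold the highest bit
    return $bits;
}

printf "highest descriptor 255:  %d bytes\n", length fd_vector(255);
printf "highest descriptor 4095: %d bytes\n", length fd_vector(4095);
```

A vector that must reach descriptor 255 occupies 32 bytes, while one that must reach descriptor 4095 occupies 512 bytes, even though only a single bit is set in each.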
For this reason, the POSIX standard calls for an alternative API called poll(). It does much the
same thing as select() but uses arrays rather than bit vectors to represent sets of filehandles.
Because only the filehandles of interest are placed in the arrays, the poll() call doesn't waste time
scanning through a large data structure to determine which filehandles to watch. You might also
want to use poll() if you prefer its API, which is more elegant in some ways than select().
poll() is available to Perl programmers only via its object-oriented interface, IO::Poll. It was
introduced during the development of Perl version 5.6. Be sure to use IO::Poll version 0.04 or
higher, because earlier versions weren't completely functional. This version is included with
Perl versions 5.7 and higher.
Using IO::Poll
IO::Poll is a little like IO::Select turned inside out. With the IO::Select API, you create multiple
IO::Select sets—typically one each for reading and writing—and monitor them with a call to
IO::Select->select(). With IO::Poll, you create a single IO::Poll object and add filehandles to
it one at a time, each with a mask that indicates the conditions you are interested in monitoring. You
then call the IO::Poll object's poll() method, which blocks until one or more of the conditions is
met. After poll() returns, you interrogate the object to learn which handles were affected.
A typical program begins like this:
use IO::Poll qw(POLLIN POLLOUT POLLHUP POLLERR);
This loads the IO::Poll module and brings in the four constants POLLIN, POLLOUT, POLLHUP, and
POLLERR. These constants are used to form the masks that indicate which conditions you are
interested in monitoring on each filehandle.
The next step is to create an IO::Poll object, then add to it the handle(s) you wish to monitor:
my $poll = IO::Poll->new;
$poll->mask(\*STDIN => POLLIN);
$poll->mask(\*STDOUT => POLLOUT);
$poll->mask($socket => POLLIN|POLLOUT);
The mask() method is used both to add handles to the IO::Poll object and to remove them. It takes
two arguments: the handle to be watched and a bitmask designating the conditions to monitor. In
the example, STDIN is monitored for the POLLIN condition, STDOUT for the POLLOUT condition,
and the handle named $socket is monitored for both POLLIN and POLLOUT events, using a mask
formed by bitwise ORing the two constants. As described in more detail later, POLLIN and POLLOUT
conditions occur when the handle is ready for reading or writing, respectively.
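To experiment with mask() without tying up the terminal, a pipe can stand in for STDIN, STDOUT, and the socket. The following sketch registers both ends of a pipe and then removes one of them again; the variable names are chosen for this illustration only.

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN POLLOUT);

pipe(my $reader, my $writer) or die "pipe: $!";

my $poll = IO::Poll->new;
$poll->mask($reader => POLLIN);     # watch the read end for incoming data
$poll->mask($writer => POLLOUT);    # watch the write end for writability

my $count = () = $poll->handles;    # handles() with no arguments lists them all
print "monitoring $count handles\n";

$poll->mask($writer => 0);          # a mask of 0 stops monitoring the handle
$count = () = $poll->handles;
print "monitoring $count handle\n";
```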
Having set up the IO::Poll object, you usually enter an I/O loop. Each time through the loop, you call
the poll object's poll() method to wait for an event to occur and then call handles() to determine
which handles were affected:
while (1) {
    $poll->poll();
    my @readers = $poll->handles(POLLIN|POLLHUP|POLLERR);
    my @writers = $poll->handles(POLLOUT);
    foreach (@readers) {
        do_reader($_);
    }
    foreach (@writers) {
        do_writer($_);
    }
}
The poll() method waits until one of the requested conditions becomes true and returns the
number of handles that had events. As with select(), you can provide an optional timeout value
to return if no events occur within a designated period. The handles() method returns all the
handles that have conditions indicated by the passed bitmask. This example calls handles() twice
using different bitmasks. The first checks for handles that are ready to be read from (POLLIN), those
that were closed by the peer (POLLHUP), and those that have some other error (POLLERR). The
second call looks for handles that are ready for writing. The remainder of the example loop pro-
cesses these handles in an application-specific manner.
Like select(), poll() must be used with sysread() and syswrite() only. Mixing
poll() with routines that use standard I/O buffering (the <> operator or plain read() and
write()) does not work.
IO::Poll Events
IO::Poll allows you to monitor handles for a richer set of conditions than those made available by
IO::Select. In addition to watching a handle for incoming data and the ability to accept outgoing data
without blocking, IO::Poll allows you to watch handles for two levels of incoming "priority data," for
end-of-file conditions, and for several different types of error. Each condition is known as an "event."
Each event is designated by one of the constants summarized in Table 16.1. They are divided into
constants that can be added to bitmasks sent to poll() using the mask() method, and constants
that are returned from poll() via the handles() method.
The following list explains the significance of each event in more detail.
POLLIN—The handle has data for reading, and sysread() will not block. In the case of a
listening socket, POLLIN detects the presence of an incoming connection and accept() will
not block. What happens at an end of file varies somewhat among operating systems and is
discussed later.
POLLRDNORM —Like POLLIN, but applies only to normal (nonpriority) data.
POLLRDBAND —Priority data is available for reading. An attempt to read out-of-band data
(Chapter 17) will succeed.
POLLPRI — "High priority" data is available for reading. High priority data is a historical relic and
should not be used for TCP/IP programming.
POLLOUT—The handle can accept at least 1 byte of data for writing (as modified by the value
of the socket's send buffer low water mark, as described in Chapter 12). syswrite() does not
block as long as its length does not exceed this value. This event does not distinguish between
normal and priority data.
POLLWRNORM —The handle can accept at least 1 byte of normal (nonpriority) data.
POLLWRBAND —The handle can accept at least 1 byte of out-of-band data (Chapter 17).
POLLERR —An error occurred on the handle, such as a PIPE error. For sockets, you may be
able to recover the actual error number by calling sockopt() with the SO_ERROR option
(Chapter 13).
POLLNVAL —The handle is invalid. For example, it is closed.
POLLHUP —In the case of pipes and sockets, the remote process closed the connection. For
normal files, this event doesn't apply.
There are subtle differences in the behavior of POLLIN and POLLHUP among operating systems
and among different types of I/O handles. On many systems, poll() returns POLLIN on a readable
handle if an end of file occurs. As you recall, for regular filehandles, this occurs when the end of the
file is read. For sockets, this occurs when the peer closes its end of the connection.
Unfortunately, this behavior is not universal. On some, if not all, Linux systems, POLLIN is not set
when a socket is closed. Instead, you must check for a POLLHUP event. However, POLLHUP is
relevant only to sockets and pipes, and does not apply to ordinary filehandles; this makes program
logic a bit convoluted.
The most reasonable strategy is to recover the handles that may be readable by calling handles()
with the bitmask POLLIN|POLLHUP|POLLERR. Pass each handle to sysread(), and let the return
value tell you what state the handle was in.
Similarly, it is easiest to check for handles that are writable using the bitmask POLLOUT|POLLERR.
The subsequent call to syswrite() will indicate whether the handle is open or has an error.
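The read-side strategy can be sketched with a pipe whose writer sends one line and then closes. Whether the end of file shows up as POLLIN or as POLLHUP depends on the operating system, but the loop below does not need to know, because it lets sysread() decide. This is an illustration, not code from the book's scripts.

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN POLLHUP POLLERR);

pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "hello\n") or die "syswrite: $!";
close $writer;                          # reader sees the data, then end of file

my $poll = IO::Poll->new;
$poll->mask($reader => POLLIN);

my $received = '';
while ($poll->handles) {                # loop while anything is still monitored
    $poll->poll(5);                     # wait up to 5 seconds for an event
    for my $fh ($poll->handles(POLLIN | POLLHUP | POLLERR)) {
        my $bytes = sysread($fh, my $data, 1024);
        if ($bytes) {
            $received .= $data;         # normal data
        } else {
            $poll->remove($fh);         # 0 means end of file, undef an error
            close $fh;
        }
    }
}
print "received: $received";
```

Note that the handle is removed from the IO::Poll object before it is closed, since the object looks handles up by file descriptor number.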
IO::Poll Methods
We've seen most of the IO::Poll methods already. Here is the definitive list.
$poll = IO::Poll->new
Creates a new IO::Poll object. Unlike IO::Select, new() does not accept arguments.
$mask = $poll->mask($handle [, $mask])
Gets or sets the current event bitmask for the indicated handle. If no mask argument is specified, the current
one is returned. Otherwise, the argument is used to set the mask. A mask of 0 removes the handle from the
monitored set entirely. All handles are monitored for error conditions (POLLNVAL, POLLERR, POLLHUP)
whether you request it in the bitmask or not.
$poll->remove($handle)
Removes the handle from the polling list. This is exactly equivalent to calling mask() with a bitmask argument
of 0.
$events = $poll->poll([$timeout])
Waits until a monitored handle has an event or until $timeout seconds have elapsed, returning the number of
handles with events. $timeout is given in seconds and may be fractional. A timeout of 0 results in nonblocking behavior.
An absent timeout, or a timeout of -1, causes poll() to block indefinitely.
@handles = $poll->handles([$mask])
Called with no arguments, handles() returns a list of all handles known to the IO::Poll object. Called with a
bitmask of events, it returns all handles that had one of the specified events during the previous call to poll().
$mask = $poll->events($handle)
The events() method returns a bitmask containing all the events involving $handle that occurred during
the previous call to poll().
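A short sketch shows how the timeout argument to poll() and the events() method fit together. It again uses a pipe as a stand-in for a real connection, and the variables $before and $after are names chosen for this illustration.

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN);

pipe(my $reader, my $writer) or die "pipe: $!";

my $poll = IO::Poll->new;
$poll->mask($reader => POLLIN);

my $before = $poll->poll(0);            # timeout 0: return at once, nothing to read yet
print "handles with events before writing: $before\n";

syswrite($writer, "x") or die "syswrite: $!";
my $after = $poll->poll(0.5);           # fractional timeouts are allowed
print "handles with events after writing: $after\n";

# events() reports what happened to one handle during the last poll()
print "reader has data waiting\n" if $poll->events($reader) & POLLIN;
```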
Figure 16.1. The gab7.pl script uses IO::Poll to multiplex input and output
To make it more interesting, gab7.pl uses nonblocking I/O. Data read on STDIN is buffered to a
scalar variable named $to_socket. Likewise, data received from the socket is buffered in
$to_stdout. The data in the buffers is written to their appropriate destinations whenever
poll() indicates that the operation won't block. If either buffer grows too large, then further reading
from its associated input source is disabled until the buffer again has sufficient room.
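One way to express this flow control is to recompute each handle's event mask from the state of the buffers. The sketch below is an assumption about how that might look, not the code of gab7.pl: the helper socket_mask(), its $done flag, and the MAXBUF value of 8192 are all hypothetical.

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN POLLOUT);
use constant MAXBUF => 8192;    # assumed buffer limit, chosen for illustration

# Hypothetical helper: decide which events to watch for on the socket.
#   $to_socket  data waiting to be written to the socket
#   $to_stdout  data read from the socket, waiting to be written to STDOUT
#   $done       true once the socket has reported end of file
sub socket_mask {
    my ($to_socket, $to_stdout, $done) = @_;
    my $mask = 0;
    $mask |= POLLOUT if length $to_socket;                      # something to send
    $mask |= POLLIN  if !$done && length($to_stdout) < MAXBUF;  # room to read more
    return $mask;
}

# With an empty outgoing buffer and room to spare, we only want to read.
printf "mask is POLLIN only: %s\n",
    socket_mask('', '', 0) == POLLIN ? 'yes' : 'no';
```

Passing the result to mask() each time through the loop both pauses reading when a buffer is full and, because a mask of 0 removes the handle, retires the socket once there is nothing left to do with it.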
Lines 1–8: Load modules—We begin by bringing in the IO::Socket and IO::Poll modules. IO::Poll
doesn't import constants by default, so we must do this manually, asking for the POLLIN,
POLLOUT, and POLLERR constants. We also bring in the Errno module so as to have access to
the EWOULDBLOCK constant.
Lines 9–10: Declare constants and globals—We define the maximum size to which our internal
buffers can grow. Further reading from the socket or STDIN is inhibited until the associated data
buffer shrinks to a smaller size. We detect and handle PIPE errors, so we set the PIPE handler
to IGNORE.
We define our globals. In addition to two scalars to hold buffered data, there are a pair of flags
named $stdin_done and $sock_done. These flags are set to true when the corresponding
handle is closed and are used during the determination of each handle's event mask.
Lines 11–13: Open socket—We read the desired hostname and port from the command line
and connect in the usual way using IO::Socket.
Lines 14–16: Create IO::Poll object—We now create a new IO::Poll object and add the socket
and STDIN filehandles to its list of monitored handles using the POLLIN mask. These masks
will be adjusted when there is data to write as well as to read.
Lines 17–18: Make filehandles nonblocking—We now put the socket and STDOUT into nonblocking
mode. This allows the client to continue working even if the socket or standard output
is temporarily unable to accept new writes.
Lines 19–20: Main loop—We loop until there are no more handles to do I/O on. The loop condition
is simply to check that the IO::Poll object's handles() method returns a nonempty list.
At the very top of the loop we call poll() to block until IO::Poll indicates that one of the handles
is ready for I/O.
Lines 21–29: Handle readers—The next chunk of code recovers the handles that have data to
read or are signaling end of file by calling the IO::Poll object's handles() method with the mask
POLLIN|POLLERR.
If STDIN is ready for reading, we read from it and append the data to the variable
$to_socket. Likewise, data from the socket is appended to $to_stdout. If either read fails,
then we set one or both of the $stdin_done and $sock_done flags to true. We will check
these flags at the end of the loop.
Lines 30–48: Handle writers—Now it's time for the writable handles. We call the IO::Poll object's
handles() method with a flag that returns filehandles that are either writable or have errors.
If STDOUT is on the list, then we attempt to write the contents of $to_stdout to it. Likewise with
$to_socket for the socket. Because both handles are nonblocking, we have to deal with
EWOULDBLOCK errors and with partial writes. The logic here is similar to that used in Chapter
13. On EWOULDBLOCK, we skip the filehandle and wait until later to try a write. On a partial write,
we remove the portion of the buffer that was successfully written, leaving the unwritten portion
to try later.
In the case of a syswrite() error that is not EWOULDBLOCK, we simply terminate with an error
message.
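The write-side logic described above can be sketched as a small helper. This is a minimal illustration rather than the gab7.pl listing: the name flush_buffer() is hypothetical, and the buffer is passed by reference so that the helper can trim it in place.

```perl
use strict;
use warnings;
use Errno qw(EWOULDBLOCK);

# Hypothetical helper (not taken from gab7.pl): try to write a buffer to a
# nonblocking handle. Returns the number of bytes written, or 0 if the
# write would block. Dies on any other error.
sub flush_buffer {
    my ($handle, $buffer_ref) = @_;
    my $bytes = syswrite($handle, $$buffer_ref);
    unless (defined $bytes) {
        return 0 if $! == EWOULDBLOCK;       # not ready; try on a later pass
        die "syswrite() failed: $!";         # anything else is fatal
    }
    substr($$buffer_ref, 0, $bytes) = '';    # drop the written part, keep the rest
    return $bytes;
}
```

After a partial write the buffer still holds the unwritten tail, so the caller only has to keep asking poll() for writability until the buffer is empty.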
Lines 49–58: continue{} block—The core logic of the program is all contained in the
continue{} block, which is executed once at the end of each iteration of the loop. Its job is to
create event masks for the three handles that are appropriate for the next iteration of the loop.
We begin by setting the three masks to a default of 0, which, if unchanged, removes the handle
from the poll set. Next we examine the $to_stdout buffer. If it contains data, then we set the
mask for STDOUT to POLLOUT, indicating that poll() should tell us when the handle is writable.
Similarly, we set the mask for STDIN to POLLIN, asking to be alerted when there is data to read
from standard input. However, we suppress this if either of two circumstances apply: (1) the
length of the buffer that contains data bound for the socket is already at its maximum value, in
which case we don't want to make it larger; or (2) either the socket or standard input itself is
closed.
Now we need to set the mask for the socket. Unlike standard input or output, the socket is read/
write. If there is data to write to the socket ($to_socket has nonzero length) and the socket
was not previously closed, then we set its mask to POLLOUT. To this we add the POLLIN flag if
the length of the buffer going to standard output is not already at its maximum.
Having created the masks, we call $poll->mask() three times to set them for their respective
filehandles.
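The mask rules above can be expressed as a pure function, which makes them easy to check in isolation. This is a sketch under stated assumptions: next_masks() is a hypothetical name, the MAXBUF value of 8192 is a guess because the text does not give it, and the $sock_done test on the socket's read side is an added assumption rather than something the walkthrough states.

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN POLLOUT);
use constant MAXBUF => 8192;   # assumed limit; the text does not give the value

# Hypothetical helper: derive the masks for STDOUT, STDIN, and the socket
# from the two buffers and the two "done" flags.
sub next_masks {
    my ($to_stdout, $to_socket, $stdin_done, $sock_done) = @_;
    my ($outmask, $inmask, $sockmask) = (0, 0, 0);

    # STDOUT: ask about writability only when something is waiting.
    $outmask = POLLOUT if length($to_stdout);

    # STDIN: keep reading unless the outbound buffer is full or a side closed.
    $inmask = POLLIN
        unless length($to_socket) >= MAXBUF || $sock_done || $stdin_done;

    # Socket: write if data is pending, read if the stdout buffer has room.
    $sockmask  = POLLOUT if length($to_socket) && !$sock_done;
    $sockmask |= POLLIN
        unless length($to_stdout) >= MAXBUF || $sock_done;   # $sock_done: assumption

    return ($outmask, $inmask, $sockmask);
}
```

Each returned value would then be handed to $poll->mask() for its handle; a mask of 0 removes the handle from the poll set.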
Line 59: Shut down the socket at termination time—Our last step is to deal with the situation in
which we reach the end of STDIN. As in the various versions of the gab client, the most elegant
solution is to shut down our end of the socket for writing and then to wait for the peer to close
down its end. The only twist here is that we don't want to do this while there is unsent data in
the $to_socket buffer, so we wait for the length of the buffer to reach 0 before executing
shutdown(1).
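The termination rule can be sketched as follows. The helper name shutdown_when_drained() is hypothetical; it only illustrates the condition described above, not how gab7.pl arranges its loop.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: half-close the socket once standard input is
# exhausted and every buffered byte has been sent. Returns true if the
# shutdown was performed on this call.
sub shutdown_when_drained {
    my ($socket, $stdin_done, $to_socket) = @_;
    return 0 unless $stdin_done && length($to_socket) == 0;
    shutdown($socket, 1) or die "shutdown() failed: $!";   # 1 = no more writes
    return 1;
}
```

The peer sees end of file on its next read, while our side can still read whatever the peer sends before it closes.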
Summary
The IO::Poll module provides an interface to the system poll() call and can be used as an alternative
to select() for multiplexing across multiple I/O handles. Compared with select(),
poll() provides improved performance when multiplexing across a large number of handles and
should be considered for servers that will have heavy loads. However, IO::Poll should be used with
care when writing applications designed for portability, because it became a part of the standard
I/O library only as of Perl version 5.6.
This part covers a variety of more specialized networking topics, including the handling of TCP
urgent data, UNIX domain sockets, and the UDP protocol. In addition, we cover the specialized
topics of broadcasting and multicasting.
TCP is fundamentally stream based. Data placed by the sending process into the operating system's
TCP transmit buffer is received by the other end and read by the receiving process in exactly the
same order in which it was sent. But what if the sending process detects some exceptional condition
and it needs to alert the receiver immediately? Vanilla TCP can't handle this well because all data
has equal priority, and the urgent message will have to wait its turn behind all the data sent before
it. This is where TCP "urgent" data fits in. This facility, more commonly known as "out-of-band" data,
makes it possible, in a limited and highly qualified manner, to send and receive TCP messages that
are delivered ahead of the ordinary TCP stream.
To illustrate the use of such a facility, consider a terminal-based application that allows the user to
queue a stream of long-running commands to be executed on the server. After he issues several
commands, and while the server is still chewing through them, the user changes his mind and
decides to cancel by hitting the interrupt key. But the commands have already been sent to the
server and are in the TCP receive queue waiting for processing. Somehow the client must transmit
a cancel signal to the server immediately, without queuing a cancel command behind other normal-
priority data. One way to accomplish this is to use TCP urgent data to notify the server to clear its
list of pending commands and to ignore commands already received but not yet read. We develop
just such an application in the course of this chapter.
send ($socket,"a",MSG_OOB) or die "Can't send(): $!";
In this code fragment, we call send() to transmit the character "a" across the socket $socket.
The MSG_OOB flag specifies that the message is urgent and must be delivered right away. On the
other end, the recipient of the message can read the urgent data by calling recv() with the same
flag:
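defined recv ($socket,$data,1,MSG_OOB) or die "Can't recv(): $!";
Here we're asking recv() to fetch 1 byte of urgent data from the socket and store it in the scalar
$data. We test the result with defined because recv() returns undef only on failure; on success
it may return an empty string, which a plain truth test would mistake for an error.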
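The two calls can be wrapped in a pair of small subroutines. This is a minimal sketch: send_urgent() and recv_urgent() are hypothetical names, and the only library pieces used are Perl's built-in send() and recv() with the MSG_OOB constant from the Socket module.

```perl
use strict;
use warnings;
use Socket qw(MSG_OOB);

# Hypothetical wrapper: send one byte as TCP urgent data.
# Returns the number of bytes sent, or undef on error.
sub send_urgent {
    my ($socket, $byte) = @_;
    return send($socket, $byte, MSG_OOB);
}

# Hypothetical wrapper: fetch the pending urgent byte.
# Returns the byte, or undef if recv() failed (the reason is left in $!).
sub recv_urgent {
    my $socket = shift;
    my $data = '';
    # recv() may return an empty string on success, so test definedness.
    return undef unless defined recv($socket, $data, 1, MSG_OOB);
    return $data;
}
```

A caller would typically invoke recv_urgent() only after being told that urgent data is pending, for example from an URG signal handler or after select() reports an exceptional condition.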
This looks simple enough, but there is significant complexity lurking under the surface. Although the
term "out-of-band data" implies that it is transmitted outside the normal data stream, this is not the
case.
Urgent data works in the manner illustrated in Figure 17.1. During normal TCP operations, the
sending process queues data into the operating system's TCP transmit buffer. The contents of the
buffer are spooled across the network and eventually end up in the TCP receive buffer of the
destination host. A sending process now sends 1 byte of urgent data by calling send() with the
MSG_OOB flag. This causes three things to happen:
1. The TCP stream is put into URGENT mode, and the operating system alerts the receiving
process to this fact by sending it the URG signal.
2. The urgent data is appended to the transmit buffer, where it will be sent to the receiving process
using the normal TCP flow-control rules.
3. A mark, known as the "urgent pointer," is added to the TCP stream to mark the position of the
urgent data. There is only one urgent pointer per TCP stream.
When the receiving process calls recv() with the MSG_OOB flag, the operating system uses the
urgent pointer to extract the urgent data byte from the stream and return it separately from the rest.
Other calls to sysread() and recv() ordinarily skip over the urgent data, pretending to the caller
that it isn't there.
Because of the way TCP urgent data works, there are numerous caveats and restrictions on its use:
1. Only a single byte of urgent data can be sent at one time. If you use send() to send multiple
characters, only the last one will be considered urgent by the receiver.
2. Because there is only one urgent pointer per stream, if the sender calls send() to write urgent
data multiple times before the receiver calls recv(), only the last urgent event will be received.
All earlier urgent data marks will be erased, and the earlier urgent data bytes will
appear in the normal data stream.
3. Although the receiving process is sent the URG signal immediately, the urgent data itself is
subject to all the TCP flow-control rules. This means that the receiving process may be notified
that there is urgent data available before it has actually arrived. Furthermore, it may be necessary
to clear some room in the TCP receive buffer before the urgent data byte can be received.
Caveat 3 is the real kicker. If the receiver calls recv() with MSG_OOB before the urgent data has
arrived, the call will fail with an EWOULDBLOCK error. The alternatives are to just ignore the urgent
data or to perform one or more normal reads until the urgent data arrives. The work needed to
implement the latter option is eased slightly by the fact that sysread() stops automatically at the
urgent pointer boundary. We'll see examples of this later.
Lines 1–6: Set up the socket—We create a socket connected to the indicated host and port.
Lines 7–10: Install signal handlers—We install an INT handler that prints a warning message and
then sends a byte of urgent data across the socket using this idiom:
send($socket,"!",MSG_OOB);
We also want to be able to quit the program, so we trap the QUIT signal with a handler that calls
exit(). On UNIX systems, the QUIT signal is usually issued by pressing "^\" (control-backslash).
Lines 10–15: Main loop—The remainder of the program is just a loop that writes the string "normal
data XX...\n" to the server, where XX is incremented by one each time through the loop. After
each call to syswrite(), the loop pauses for 1 second.
The odd construction 1 until sleep 1 guarantees that the script sleeps for a minimum of 1
second each time through the loop. Otherwise, every time we press the interrupt key, sleep() is
terminated prematurely and we don't get writes that are spaced evenly.
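The pieces of the client described above can be sketched as three subroutines. This is not the book's listing: the subroutine names are hypothetical, and the payloads and messages are modeled on the sample output rather than on the original code.

```perl
use strict;
use warnings;
use Socket qw(MSG_OOB);

# Install the two handlers described above on a connected socket.
sub install_handlers {
    my $socket = shift;
    $SIG{INT} = sub {
        print "sending 1 byte of OOB data!\n";
        send($socket, "!", MSG_OOB);
    };
    $SIG{QUIT} = sub { exit 0 };
}

# sleep() returns the number of whole seconds actually slept, so a call cut
# short by a signal can return 0; in that case we simply sleep again.
sub pause { 1 until sleep 1 }

# Write each chunk as normal data, pausing between writes.
sub send_normal {
    my ($socket, @chunks) = @_;
    for my $chunk (@chunks) {
        print "sending ", length($chunk), " bytes of normal data: $chunk\n";
        syswrite($socket, $chunk);
        pause();
    }
}

# Typical wiring (left commented out because it needs a live server):
#   my $socket = IO::Socket::INET->new("$host:$port") or die $@;
#   install_handlers($socket);
#   send_normal($socket, 'aa' .. 'af');
```

Keeping the signal handlers separate from the write loop makes it clear that the urgent byte is sent asynchronously, at whatever point in the loop the interrupt key happens to be pressed.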
When we run the client, it runs for thirty iterations (about 30 s) and quits. If we hit the interrupt key
a couple of times during that period, we see the following messages:
% urg_send.pl
sending 2 bytes of normal data: aa
sending 2 bytes of normal data: ab
sending 1 byte of OOB data!
sending 2 bytes of normal data: ac
sending 2 bytes of normal data: ad
sending 2 bytes of normal data: ae
sending 1 byte of OOB data!
sending 2 bytes of normal data: af
Now we turn our attention to the server (Figure 17.3), which is only a bit more complicated than the
client. The server installs an URG handler that will be invoked whenever urgent data arrives on the
socket. However, in order for the operating system to know to deliver the URG signal, we must
associate our process ID (PID) with the socket by calling fcntl() with a command of F_SETOWN
and our PID as its argument.
Lines 1–6: Load modules—In addition to IO::Socket, we load the Fcntl module. This provides the
definition for the F_SETOWN constant.
Lines 7–11: Install URG handler—We install an anonymous subroutine, which tries to recv() up to
100 bytes of urgent data on the socket using the idiom we gave earlier. If recv() is successful, we
print an acknowledgment; otherwise, we get an error. Notice that even though we ask to receive 100 bytes
of data, the protocol restrictions allow only 1 byte of urgent data to be delivered. This server will
confirm that fact.
Lines 12–16: Create socket and accept() an incoming connection—We create a listen socket and
accept() a single incoming connection on it, storing the connected socket in $sock. This is not a
general-purpose server, so we don't bother with an accept() loop.
Lines 17–18: Set the owner of the socket—We pass the connected socket to fcntl(), with a
command of F_SETOWN and the current process ID stored in $$ as the argument. This sets the
owner of the socket so that we receive the URG signal.
Lines 19–22: Read data from the socket—We use sysread() to read conventional data from the
socket until we reach the end of file. Everything we read is echoed to standard output.
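The handler installation and the F_SETOWN call can be sketched together. This is an illustration under assumptions, not the listing in Figure 17.3: watch_urgent() and its callback argument are hypothetical, introduced so that the handler's result can be observed by the caller.

```perl
use strict;
use warnings;
use Fcntl  qw(F_SETOWN);
use Socket qw(MSG_OOB);

# Hypothetical helper: arrange for URG signals on $sock to invoke $callback
# with the urgent data, or with undef if recv() failed.
sub watch_urgent {
    my ($sock, $callback) = @_;
    $SIG{URG} = sub {
        my $data = '';
        # We ask for 100 bytes, but at most 1 urgent byte is delivered.
        my $ok = defined recv($sock, $data, 100, MSG_OOB);
        $callback->($ok ? $data : undef);
    };
    # Without an owner, the operating system has no process to signal.
    fcntl($sock, F_SETOWN, $$) or die "Can't set owner: $!";
}
```

The callback runs inside the signal handler, so it should do as little as possible, such as recording the byte for the main loop to act on.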
When we run the server and client together and interrupt the client twice, we see output like this:
% urg_recv.pl
Listening on port 2007...
got 2 bytes of normal data: aa
got 2 bytes of normal data: ab
got 1 byte of OOB data!
got 2 bytes of normal data: ac
got 2 bytes of normal data: ad
got 2 bytes of normal data: ae
got 1 byte of OOB data!
got 2 bytes of normal data: af
...
Notice that the urgent data never appears in the normal data stream read by sysread().
As written, there is a potential race condition in this server. It is possible for urgent data to come in
during or soon after the call to accept(), but before fcntl() has set the owner of the socket. In
this case, the server misses the urgent data signal. This may or may not be an issue for your application.
If it is, you could either
• engineer the client to introduce a brief delay after establishing the connection but before sending
out urgent data; or
• apply fcntl() to the listening socket, in which case the owner setting is inherited by all connected
sockets returned by accept().
Running the server and client together and generating a couple of interrupts now shows this pattern:
% urg_recv2.pl
Listening on port 2007...
got 2 bytes of normal data: aa
got 2 bytes of normal data: ab
recv() error: Invalid argument
got 1 bytes of normal data: !
got 2 bytes of normal data: ac
got 2 bytes of normal data: ad
got 2 bytes of normal data: ae
recv() error: Invalid argument
got 1 bytes of normal data: !
got 2 bytes of normal data: af
Each time an urgent data byte is received, the server's URG handler is called, just as before. However,
because the data is now inline, the recv() call fails with an error of EINVAL. The urgent data
(an exclamation mark character) instead appears in the data stream read by sysread().
Notice that the urgent data always appears at the beginning of the data returned by a sysread()
call. This is no coincidence. A feature of the urgent data API is that reads terminate at the
urgent data pointer even if the caller requested more data. In the case of inline data, the next byte
read by sysread() will be the urgent data itself. In the case of out-of-band data, the next byte read
will be the character that follows the urgent data.
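The inline behavior can be observed with a short sketch. It assumes the SO_OOBINLINE socket option is enabled with setsockopt() before any urgent data arrives; the helper names enable_oob_inline() and read_chunks() are hypothetical.

```perl
use strict;
use warnings;
use Socket qw(SOL_SOCKET SO_OOBINLINE);

# Hypothetical helper: ask the kernel to leave urgent bytes in the normal stream.
sub enable_oob_inline {
    my $sock = shift;
    setsockopt($sock, SOL_SOCKET, SO_OOBINLINE, 1)
        or die "Can't set SO_OOBINLINE: $!";
}

# Read until $want bytes have arrived, returning each chunk separately so
# that the boundaries between individual reads stay visible.
sub read_chunks {
    my ($sock, $want) = @_;
    my @chunks;
    my $total = 0;
    while ($total < $want) {
        my $bytes = sysread($sock, my $chunk, 1024);
        last unless $bytes;              # end of file or error
        push @chunks, $chunk;
        $total += $bytes;
    }
    return @chunks;
}
```

If the peer writes normal data, then one urgent byte, then more normal data, the chunk list shows a read ending just before the urgent byte and a later chunk beginning with it.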
Regrettably, this server needs to use a trick because of an idiosyncrasy of select(). Many implementations
of select() continue to indicate that a socket has urgent data to read even after
the program has called recv(), but calling recv() a second time fails with an EINVAL error because
the urgent data buffer has already been emptied. This condition persists until at least 1 byte
of normal data has been read from the socket and, unless handled properly, the program goes into
a tight loop after receiving urgent data.
To work around this problem, we manage a flag called $ok_to_read_oob. This flag is set every
time we read normal data and cleared every time we read urgent data. At the top of the
select() loop, we add the socket to the list to be monitored for urgent data if and only if the flag
is true.
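The flag technique can be sketched with IO::Select. This is an illustration rather than the urg_recv3.pl listing: read_with_select() and its two callbacks are hypothetical, and only the $ok_to_read_oob flag comes from the text.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(MSG_OOB);

# Hypothetical helper: read from $sock until end of file, passing normal
# data to $on_normal and each urgent byte to $on_urgent.
sub read_with_select {
    my ($sock, $on_normal, $on_urgent) = @_;
    my $watch = IO::Select->new($sock);
    my $ok_to_read_oob = 1;

    while (1) {
        # Monitor for urgent data only while the flag is true.
        my @ready = IO::Select->select($watch, undef,
                                       $ok_to_read_oob ? $watch : undef);
        unless (@ready) {
            next if $!{EINTR};               # interrupted by a signal; retry
            die "select() failed: $!";
        }
        my ($readable, undef, $urgent) = @ready;

        if (@$urgent) {
            my $data = '';
            $on_urgent->($data) if defined recv($sock, $data, 1, MSG_OOB);
            $ok_to_read_oob = 0;             # wait for normal data before asking again
        }
        if (@$readable) {
            my $bytes = sysread($sock, my $data, 1024);
            last unless $bytes;              # end of file or error
            $on_normal->($data);
            $ok_to_read_oob = 1;             # normal data read; urgent checks are safe again
        }
    }
}
```

Because the flag is cleared after every urgent read, a select() implementation that keeps reporting the exceptional condition cannot drive the loop into repeated failing recv() calls.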
From the user's perspective, urg_recv3.pl behaves identically to urg_recv.pl. When we run it in
one terminal and the urg_send.pl client in another, we see the following output when we press the
interrupt key repeatedly in the client:
% urg_recv3.pl
Listening on port 2007...
got 2 bytes of normal data: aa
got 2 bytes of normal data: ab
$flag = sockatmark($socket)
sockatmark() is used to determine the location of the urgent data pointer. In the normal out-of-band case,
sockatmark() returns true if the next sysread() will return the byte following the urgent data. In the case
of SO_OOBINLINE sockets, sockatmark() returns true if the next sysread() will return the urgent data
itself.
Recall that sysread() always pauses at the location of the urgent pointer. The reason for this
feature is to give the process a chance to call sockatmark(). This code fragment shows the idiom:
# read until we get to the mark
until (sockatmark($socket)) {
my $result = sysread($socket,$data,1024);
die "socket closed before reaching mark" unless $result;
}
Each time through, sysread() is called to read (and in this case discard) up to 1,024 bytes of data from
the socket. The loop terminates normally when the urgent data pointer is reached, or abnormally if
the socket is closed (or encounters another error) before the urgent pointer is found. After the loop
ends, the next read will return the urgent data byte if the SO_OOBINLINE option was set or, if the
option was unset, the normal data byte following it.
Implementing sockatmark()
Although the sockatmark() function is part of the POSIX standard, it hasn't yet made it into Perl
as a built-in function, or, indeed, into the standard libraries of many operating systems. To use it,
you must roll your own version using an ioctl() call.
To implement the sockatmark() function, we must call ioctl() with a command of SIOCATMARK,
the constant value for which can be found in a converted C header file, typically sys/ioctl.ph.
After calling ioctl(), the operand is filled with a packed integer argument containing 1 if
the socket is currently at the urgent data mark and 0 otherwise:
require "sys/ioctl.ph";
sub sockatmark {
my $s = shift;
my $d;
return unless ioctl($s,SIOCATMARK,$d);
return unpack("i",$d) != 0;
}
This looks simple, but there's a hitch. The particular header file needed is not standard across all
operating systems and is variously named sys/ioctl.ph, sys/socket.ph, sys/sockio.ph, or sys/sockios.ph.
This makes it difficult to write portable code. Furthermore, none of these converted header
files is part of the standard Perl distribution; they must be created manually using a finicky and
sometimes unreliable Perl script called h2ph. This tool is documented in the online POD documentation,
but the capsule usage is as follows:
% cd /usr/include
% h2ph -r -l .
This assumes that you are using a UNIX system that keeps its header files in /usr/include. Users of
other operating systems that have a C or C++ compiler installed must locate their compiler's header
directory and run h2ph from there. Even then, h2ph occasionally generates incorrect Perl code and
the resulting .ph files may need to be patched by hand.
Having generated the converted header files, we're still stuck with having to guess which one contains
the SIOCATMARK constant. One approach is to try several possibilities until one works. The
following code snippet first uses a hard-coded value for Win32 systems, and then tries a series of
possible .ph file paths. If none succeeds, it dies.
$^O eq 'MSWin32' && eval "sub SIOCATMARK { 0x40047307 }";
defined &SIOCATMARK || eval { require "sys/ioctl.ph" };
defined &SIOCATMARK || eval { require "sys/socket.ph" };
defined &SIOCATMARK || eval { require "sys/sockio.ph" };
defined &SIOCATMARK || eval { require "sys/sockios.ph" };
defined &SIOCATMARK or die "Can't determine value for SIOCATMARK";
Figure 17.5 lists a small module named Sockatmark.pm that implements the sockatmark() call.
When loaded, it adds an atmark() method to the IO::Socket class, allowing you to interrogate the
socket directly:
use Sockatmark;
warn "at the mark" if $sock->atmark;
Alternatively, you can explicitly import the sockatmark() function in the use line:
use Sockatmark 'sockatmark';
warn "at the mark" if sockatmark($sock);
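Later releases of IO::Socket document an atmark() method of their own, so on such a Perl the read-until-mark idiom can be tried without a converted header file. The sketch below assumes that your IO::Socket provides atmark(); skip_to_mark() is a hypothetical name.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Discard normal data until the socket reaches the urgent data mark and
# return the number of bytes thrown away. Relies on IO::Socket's own
# atmark() method, which is assumed to be available.
sub skip_to_mark {
    my $sock = shift;
    my $discarded = 0;
    until ($sock->atmark) {
        my $bytes = sysread($sock, my $data, 1024);
        die "socket closed before reaching mark" unless $bytes;
        $discarded += $bytes;
    }
    return $discarded;
}
```

The loop only makes sense once urgent data is known to be pending, for example after an URG signal; otherwise it will block in sysread() waiting for a mark that may never come.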
A Travesty Server
We now have all the ingredients necessary to write a client/server pair that does something useful
with urgent data. This server implements "travesty," a Markov chain algorithm that analyzes a text
document and generates a new document that preserves all the word-pair (tuple) frequencies of the
original. The result is a completely incomprehensible document that has an eerie similarity to the
writing style of the original. For example, here's an excerpt from the text generated after running the
previous chapter through the travesty algorithm:
It initiates an EWOULDBLOCK error. The urgent data signal. This may be several such messages from different
children. The parent will start a new document that explains the problem in %STATUS. Just before the urgent data
is because this version can handle up to the EWOULDBLOCK error constant. The last two versions of interrupts
now shows this pattern: Each time through, sysread() is called the "thundering herd" phenomenon. More seriously,
however, some operating systems may not already at its maximum.
The results of running Ernest Hemingway through the wringer are similarly amusing. Oddly, James
Joyce's later works seem to be entirely unaffected by this translation.
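The idea behind the algorithm can be sketched in a few lines. This is a simplified word-pair model written for illustration, not the Text::Travesty module used later; the names build_model() and generate_travesty() are hypothetical, and restarting from a random pair at a dead end is a design choice of this sketch.

```perl
use strict;
use warnings;

# Map each pair of consecutive words to the list of words that followed it.
# Repeated followers are kept, so frequent continuations are picked more often.
sub build_model {
    my @words = split ' ', shift;
    my %model;
    for my $i (0 .. $#words - 2) {
        push @{ $model{"$words[$i] $words[$i+1]"} }, $words[$i + 2];
    }
    return \%model;
}

# Generate $count words by repeatedly choosing a random follower of the
# last two words emitted.
sub generate_travesty {
    my ($model, $count) = @_;
    my @pairs = sort keys %$model;
    return '' unless @pairs;
    my @out = split ' ', $pairs[rand @pairs];
    while (@out < $count) {
        my $followers = $model->{"$out[-2] $out[-1]"};
        if ($followers) {
            push @out, $followers->[rand @$followers];
        }
        else {
            # Dead end: this pair was never followed by anything, so restart.
            push @out, split ' ', $pairs[rand @pairs];
        }
    }
    return join ' ', @out[0 .. $count - 1];
}
```

Every word in the output comes from the source text, and apart from restarts every run of three consecutive words also occurs there, which is what gives the result its uncanny resemblance to the original.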
The client/server pair in this example divides the work in the classical manner. The client runs the
user interface. It prompts the user for commands to load text files into the analyzer, generate the
travesty, and reset the word frequency tables. The server does the heavy lifting, constructing the
Markov model from uploaded files and generating travesties of arbitrary length.
TCP urgent data is useful in this application because it frequently takes longer for the server to
analyze the word tuple frequencies in an uploaded text file than for the client to upload it. The user
may wish to abort the upload midway, in which case the client must send the server an urgent signal
to stop processing the file and to ignore all data sent from the time the user interrupted the process.
Conversely, once the tuple frequency tables are created, the server has the ability to generate
travesty text far faster than the network can transfer it. We would like the user to be able to interrupt
the incoming text stream, again by issuing an urgent data signal.
The client/server pair requires three external modules in addition to the standard ones: Sockatmark,
which we have already seen; Text::Travesty, the travesty generator; and IO::Getline, the nonblocking
replacement for Perl's getline() function, which we developed in Chapter 13 (Figure 13.2).
In this case we won't be using IO::Getline for its nonblocking features, but for its ability to clear its
internal line buffer when the flush() method is called.
You then call add() one or more times to analyze the word tuple frequencies in a section of text:
$t->add($text);
Once the text is analyzed, you can generate a travesty with calls to generate() or
pretty_text():
$travesty = $t->generate(1000);
$wrapped = $t->pretty_text(2000);
Both methods take a numeric argument that indicates the length of the generated travesty, measured
in words. The difference between the two methods is that generate() creates unwrapped
raw text, while pretty_text() invokes Perl's Text::Wrap module to create nicely indented and
wrapped paragraphs.
The words() method returns the number of unique words in the frequency tables. reset() clears
the tables and readies the object to receive fresh data to analyze:
$word_count = $t->words;
$t->reset;
The initial three-digit result code is what the client pays attention to. The human-readable text
is designed for remote debugging.
3. As an additional aid to debugging, the server uses CRLF pairs for all incoming commands and
outgoing responses. This makes the server compatible with Telnet and other common network
clients.
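A response line of this kind can be built and parsed with two small helpers. The exact layout is an assumption here (a three-digit code, a space, the text, then CRLF), as are the helper names and the sample message text; only the three-digit code and the CRLF terminator come from the description above.

```perl
use strict;
use warnings;
use constant CRLF => "\015\012";

# Hypothetical helper: build one response line, such as "200 Welcome" plus CRLF.
sub format_response {
    my ($code, $text) = @_;
    return sprintf("%03d %s", $code, $text) . CRLF;
}

# Hypothetical helper: split a response line into its code and text.
# Returns nothing if the line does not start with three digits.
sub parse_response {
    my $line = shift;
    $line =~ s/\015?\012\z//;                    # strip the line terminator
    my ($code, $text) = $line =~ /^(\d{3})\s*(.*)$/s or return;
    return ($code, $text);
}
```

A client would act on the numeric code alone and treat the trailing text as something to show a human or write to a log.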
Figure 17.6 lists the server application.
Lines 1–12: Load modules and initialize signal handlers—The travesty server follows the familiar
accept-and-fork architecture. In addition to the usual networking packages, we load Fcntl in order
to get access to the F_SETOWN constant and the Text::Travesty, IO::Getline, and Sockatmark modules.
Recall that the latter adds the atmark() method to the IO::Socket class. We also define a
constant, DEBUG, which enables debugging messages, and a global to hold the IO::Getline object.
After loading the required modules, we set up two signal handlers. The CHLD handler is the usual
one used in accept-and-fork servers. We initially tell Perl to ignore URG signals. We'll reenable them
in the places where they have meaning, during the uploading and downloading of large data
streams.
Lines 13–26: Create listening socket and enter accept loop—The server creates a listening socket
and enters its accept() loop. Each incoming connection spawns a child that runs the
handle_connection() subroutine. After handle_connection() terminates, the child dies.
Lines 27–49: The handle_connection() subroutine— handle_connection() is responsible
for managing the Text::Travesty object, reading client commands from the socket, and handing the
command off to the appropriate subroutine. We begin by calling fcntl() to set the owner of the
socket so that the process can receive urgent signals. If this is successful, we set the line termination
character to the CRLF pair using local to dynamically scope the change in the $/ global variable
to the current block and all subroutines it invokes.
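The sketch below illustrates these two initialization steps in isolation. It assumes a socketpair() as a stand-in for the accepted client socket, and the helper current_terminator() is hypothetical; it exists only to show that the localized $/ is visible in subroutines called from the block.

```perl
use strict;
use warnings;
use Fcntl qw(F_SETOWN);
use Socket qw(:DEFAULT :crlf);

# Stand-in for the connected client socket (an assumption of this sketch).
socketpair(my $sock, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

sub handle_connection {
    my $sock = shift;

    # Make this process the socket's owner so that it receives URG signals.
    fcntl($sock, F_SETOWN, $$) or die "Can't set owner: $!";

    # Dynamically scoped: in effect here and in every subroutine we call.
    local $/ = CRLF;
    return current_terminator();
}

# Hypothetical helper that reports the line terminator it sees.
sub current_terminator { return $/ }

my $inside  = handle_connection($sock);   # CRLF while the block is active
my $outside = $/;                         # restored once the block exits
```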
We now create a new Text::Travesty object and an IO::Getline wrapper for the socket. Recall from
Chapter 13 that IO::Getline has nonblocking behavior by default. In this application, we don't use
its nonblocking features, so we turn blocking back on after creating the wrapper. The IO::Getline
wrapper is global to the package so as to allow the URG handler to find it; since this server uses a
different process to service each incoming connection, this use of a global won't cause problems.
Having finished our initialization, we write our welcome banner to the client, using result code 200.
Notice that the IO::Getline module accepts all the object methods of IO::Socket, including
syswrite(). This makes the code easier to read than would calling the getline object's handle()
method each time to recover the underlying socket.
The remainder of the handle_connection() code is the command-processing loop. Each time
through the loop, we read a line, parse it, and take the appropriate action. The BYE command is
handled directly in the loop, and the others are passed to an appropriate subroutine. If a command
isn't recognized, the server issues a 500 error.
Lines 50–65: The analyze_file() subroutine—The analyze_file() subroutine processes
uploaded data. It accepts a Text::Travesty object, reinitializes it by calling its reset() method, and
then transmits a 201 message, which prompts the remote host to upload some text data.
We're now going to accept uploaded data from the client by calling $gl->getline() repeatedly
until we encounter a line consisting of a dot, or until we are interrupted by an URG signal.
To terminate the loop cleanly, we wrap it inside an eval{} block and create an URG handler that
is local to the block. If an urgent signal comes in, the handler calls the subroutine do_urgent()
and then dies. Because die() is called within an eval{}, its effect is to terminate the eval{}
block and continue execution at the first statement after the eval{}.
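The control flow can be tried out without a network connection. In the sketch below an array stands in for the lines returned by $gl->getline(), and the script sends itself an URG signal to simulate the arrival of urgent data; both are assumptions made only for illustration.

```perl
use strict;
use warnings;

my @incoming    = ('alpha', 'beta', 'gamma', '.', 'never read');
my $processed   = 0;
my $interrupted = 0;

eval {
    # The handler exists only for the duration of this block.
    local $SIG{URG} = sub { $interrupted = 1; die "urgent\n" };

    for my $line (@incoming) {
        last if $line eq '.';                # normal end of upload
        $processed++;
        kill URG => $$ if $line eq 'beta';   # simulate an urgent signal
    }
};
# Anything other than our own "urgent" exception is a real error.
die $@ if $@ && $@ ne "urgent\n";

print "processed $processed lines",
      ($interrupted ? ' (interrupted)' : ''), "\n";
```

Execution resumes at the statement after the eval{} whether the loop ended at the dot or was cut short by the handler, which is why the same reporting code serves both cases.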
Before exiting, we transmit a code 202 message giving the number of unique words we processed,
regardless of whether the upload was interrupted. Notice that we treat interrupted file transfers just
as if the uploaded file ended early. We leave the travesty generator in whatever state it happened
to be in when the URG signal was received. Because the travesty generator is not affected by the
analysis of a partial file, this causes no harm and might be construed as a feature. Another
application might want to reset itself to a known state.
Again, notice that the send() method is passed by IO::Getline to the underlying IO::Socket object.
Lines 89–93: The reset_travesty() subroutine— reset_travesty() calls the travesty
object's reset() method and transmits a message acknowledging that the word frequency tables
have been cleared.
Lines 94–108: The do_urgent() signal handler— do_urgent() is the signal handler responsible
for emptying the internal read buffer when an urgent data byte is received. We recover the socket
from the global IO::Getline object and invoke sysread() in a tight loop until the socket's
atmark() method returns true. This discards any and all data up to the urgent byte.
We then invoke recv() to read the urgent data itself. The exact contents of the urgent data have
no particular meaning to this application, so we ignore it. When this is done, we clear out any of the
remaining data in the IO::Getline object's internal buffer by calling its flush() method. The end
result of these manipulations is that all unread data transmitted up to and including the urgent data
byte is discarded.
Lines 1–9: Load modules—We turn on strict code checking and load the required networking
modules, including the Sockatmark module developed in this chapter. We also make STDOUT
nonbuffered so that the user's command prompt appears immediately.
Lines 10–12: Set up globals—The $HOST and $PORT globals contain the remote hostname and port
number to use. If not provided on the command line, they default to reasonable values. Two other
globals are used by the script. $gl contains the IO::Getline object that wraps the connected socket,
and $quit_now contains a flag that indicates that the program should exit. Both are global so that
they can be accessed by signal handlers.
Lines 13–15: Set up default signal handlers—We set up some signal handlers. The QUIT signal,
ordinarily generated from the keyboard by ^\, is used to terminate the program. INT, however, is
a bit more interesting. Each time the handler executes, it increments the $quit_now global by one.
If the variable reaches 2 or higher, the program exits. Otherwise, the handler prints "Press ^C
again to exit." The result is that to terminate the program, the user must press the interrupt key
twice without intervening commands. This prevents the user from quitting the program when she
intended to interrupt output. The URG handler is set to run the do_urgent() subroutine, which we
will examine later.
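The press-twice behavior of the INT handler can be sketched as follows. The command names in the loop, the @notices array, and the self-sent INT signal are illustrative assumptions; the real client reads commands from standard input.

```perl
use strict;
use warnings;

my $quit_now = 0;   # interrupts seen since the last completed command
my @notices;        # kept only so that the behavior is easy to inspect

$SIG{INT} = sub {
    exit 0 if ++$quit_now >= 2;             # second ^C in a row: quit
    push @notices, 'Press ^C again to exit.';
    print STDERR "$notices[-1]\n";
};

for my $command ('analyze', 'reset') {
    kill INT => $$ if $command eq 'analyze';   # stand-in for the user's ^C
    select(undef, undef, undef, 0.1);          # brief pause so the handler runs
}
continue {
    $quit_now = 0;   # an intervening command resets the counter
}
```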
Lines 16–18: Create connected socket—We try to create an IO::Socket handle connected to the
remote host. If successful, we use fcntl() to set the socket's owner to the current process ID so
that we receive URG signals.
Lines 19–22: Create IO::Getline wrapper—We create a new IO::Getline wrapper on the socket, turn
blocking behavior back on, and immediately look for the welcome banner from the host by pattern
matching for the 200 result code. If no result code is present, we die with an appropriate error
message.
Lines 23–36: Command loop—We now enter the program's main command loop. Each time through
the loop, we print a command prompt (">") and read a line of user input from standard input. We
parse the command and call the appropriate subroutine. User commands are:
• analyze—Upload and analyze a text file
• generate NNNN—Generate NNNN words of travesty
• reset—Reset frequency tables
• bye—Quit the program
• goodbye—Quit the program
The command loop's continue{} block sets $quit_now to 0, resetting the global INT counter.
Lines 37–60: The do_analyze() subroutine —The do_analyze() subroutine is called to upload
a text file to the server for analysis. The subroutine receives a file path as its argument and tries to
open it using IO::File. If the file can't be opened, we issue a warning and return. Otherwise, we send
the server the DATA command and read the response line. If the response matches the expected 201
result code, we proceed. Otherwise, we echo the response to standard error and return.
We now begin to upload the text file to the server. As in the server code, the upload is done in an
eval{} block, but in this case it is the INT signal that we catch. Before entering the block, we set
a local variable $abort to false. Within the block we create a local INT handler that prints a warning,
sets $abort to true, and dies, causing the eval{} block to terminate. By declaring the handler
local, we temporarily replace the original INT handler, and restore it automatically when the
eval{} block is finished. Within the block itself we read from the text file one line at a time and send
it to the server. When the file is finished, we send the server a "." character.
After finishing the loop, we check the $abort variable. If it is true, then the transfer was interrupted
prematurely when the user hit the interrupt key. We need to alert the server to this fact so that it can
ignore any data that we've sent it that it hasn't processed yet. This is done by sending the server 1
byte of urgent data.
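Sending and receiving a single urgent byte can be sketched with a pair of sockets connected over the loopback interface. This is an illustration under stated assumptions and not the travesty client's code: both ends live in one script, and the receiver polls with select() for an exceptional condition rather than relying on the URG signal.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;
use Socket qw(MSG_OOB);

# A loopback TCP connection: $client plays the client, $server the server.
my $listener = IO::Socket::INET->new(
    Proto => 'tcp', LocalAddr => '127.0.0.1', LocalPort => 0, Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    Proto => 'tcp', PeerAddr => '127.0.0.1', PeerPort => $listener->sockport,
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

# Client side: transmit one byte of urgent data.
defined send($client, '!', MSG_OOB) or die "send: $!";

# Server side: pending urgent data shows up as an exceptional condition.
my $urgent = '';
my @ready  = IO::Select->new($server)->has_exception(5);
if (@ready) {
    defined recv($server, $urgent, 1, MSG_OOB) or die "recv: $!";
}
print "received urgent byte\n" if $urgent eq '!';
```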
The last step is to read the response line from the server and print the number of unique words
successfully processed.
Lines 61–67: Handle the reset and bye commands—The do_reset() subroutine sends a RESET
command to the server and checks the result code. do_bye() sends a BYE command to the server,
but in this case does not check the result code because the program is about to exit anyway.
Lines 68–90: The do_get() subroutine—The do_get() subroutine is called when the user
chooses to generate a travesty from a previously uploaded file. We receive an argument consisting of
the number of words of travesty to generate, which we pass on to the server in the form of a
GENERATE command. We then read the response from the server and proceed only if it is the expected
203 "travesty follows" code.
We are now ready to read the travesty from the server. The logic is similar to the do_analyze()
subroutine. We set the local variable $abort to a false value and enter a loop that is wrapped in
an eval{}. For the duration of the loop, the default INT handler is replaced with one that increments
$abort and dies, terminating the eval{} block. The loop accepts lines from the server, removes
the CRLF pairs with chomp(), and prints them to standard output with proper newlines. The loop
terminates normally when it encounters a line consisting of one dot.
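The line handling in this loop can be sketched on its own. Here an in-memory filehandle stands in for the IO::Getline-wrapped socket, and the sample text is invented for the example.

```perl
use strict;
use warnings;
use Socket qw(:crlf);

# Invented sample data in the server's wire format: CRLF-terminated
# lines, with a lone dot marking the end of the travesty.
my $wire = join '', map { "$_$CRLF" }
    'It was a dark', 'and stormy night', '.', 'not part of the reply';
open my $fh, '<', \$wire or die "open: $!";

my @text;
{
    local $/ = CRLF;                # read and chomp whole CRLF pairs
    while (my $line = <$fh>) {
        chomp $line;                # strips CRLF because $/ is CRLF
        last if $line eq '.';       # normal end of transmission
        push @text, $line;
    }
}
print "$_\n" for @text;             # re-emit with local newlines
```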
After the loop is done, we check the $abort variable for abnormal termination. If it is set to a true
value, then we send the server an urgent data byte, telling it to stop transmission. Recall that this
also results in the server sending back an urgent data byte to indicate the point at which transmission
was halted.
Lines 91–104: The do_urgent() subroutine—The do_urgent() subroutine handles URG signals
and is identical to the subroutine of the same name in the server. It discards everything in the socket
up to and including the urgent data byte and resets the contents of the IO::Getline object.
Lines 105–113: Print the program usage— print_usage() provides a terse command summary
that is displayed whenever the user types an unrecognized command.
The next step was to test that I could interrupt uploads. I ran the analyze command again, but this
time hit the interrupt key before the analysis was complete:
> analyze /home/lstein/docs/ch17.txt
analyzing...interrupted!...processed 879 words
The message indicates that only 879 of 2,658 unique words were processed this time, confirming
that the upload was aborted prematurely. Meanwhile, on the server's side of the connection, the
server's do_urgent() URG handler emitted the following debug messages as it discarded all data
through to the urgent pointer:
command = DATA
discarding 1024 bytes
discarding 1024 bytes
discarding 1024 bytes
discarding 1024 bytes
discarding 531 bytes
reading 1 byte of urgent data
The final test was to confirm that I could interrupt travesty generation. I issued the command
generate 20000 to generate a very long 20,000-word travesty, then hit the interrupt key as soon as text
started to appear.
> reset
reset successful
> analyze /home/lstein/docs/ch17.txt
analyzing...processed 2658 words
> generate 20000
As expected, the transmission was interrupted and the client's URG signal handler printed out a
series of debug messages as it discarded data leading up to the server's urgent data.
to
use IO::Sockatmark;
Summary
TCP urgent data provides a way for one process to signal another via a TCP stream that some time-
critical event has occurred. Although the urgent signal is transmitted "out of band," meaning that it
is delivered to the remote host in a priority fashion, the urgent data itself is not truly out of band but
is subject to the same sequencing and flow-control rules as ordinary TCP data. To read the contents
of the urgent data, it may be necessary to read (and possibly discard) normal data until the urgent
data becomes available. Hence, urgent data is easiest to work with when it is the existence of the
data that counts and not its actual contents.
In addition to reading the contents of the urgent data byte, you can discover its position in the data
stream using the sockatmark() function. This provides a way to mark a section of the TCP data
stream for special treatment. In the travesty example, we used urgent data to mark a section of the
data stream for disposal.
Because of the restrictions on TCP urgent mode, you might consider the alternative of using two
separate sockets, one for normal communication and the other for high-priority control data. Such
an arrangement allows you to transmit and receive multibyte high-priority messages and eliminates
the need for sockatmark() and other arcane issues. However, it will add to the complexity of your
software by doubling the number of sockets that need to be managed.
Up to now we have focused exclusively on applications that use the TCP protocol and have said
little about the User Datagram Protocol, or UDP. This is because TCP is generally easier to use,
more reliable, and more familiar to programmers who are used to dealing with files and pipes. On
the Internet, TCP-based application protocols outnumber those based on UDP by a factor of at
least 10 to 1.
Nevertheless, UDP is extremely useful for certain applications, and sometimes can do things that
would be difficult, if not impossible, for a TCP-based service to achieve. The next few chapters
introduce UDP, discuss the design of UDP-based servers, and show how to use UDP for
broadcasting and multicasting applications.
The client is different from the TCP-based programs we are more familiar with. Figure 18.1 shows
the complete code for this program.
Lines 1–5: Load modules—We begin by turning on strict code checking and then bring in the
standard Socket library and its line-end constants. We set $/ to CRLF, not because we'll be
performing line-oriented reads, but in order for the chomp() call at the end of the script to remove
the terminating CRLF properly.
Lines 6–8: Define constants—We define some constant values. DEFAULT_HOST is the name of
the host to contact if not specified on the command line; we use the loopback address,
"localhost." DEFAULT_PORT is the port to contact if not overridden on the command line; it can be
either the port number or a symbolic service name. We use "daytime" as the service name.
UDP data is transmitted and received as discrete messages. MAX_MSG_LEN specifies the
maximum size of a message. Since the daytime strings are only a few characters, it is safe to set
this constant to a relatively small value of 100 bytes.
Lines 9–10: Read command-line arguments—We read the command-line arguments into the
$host and $port global variables; if these arguments are not provided, we use the defaults.
Lines 11–13: Get protocol and port—We use getprotobyname() to get the protocol number
for UDP and call getservbyname() to look up the port number for the daytime service. If the
user provided the port number directly, we skip the last step. We declare an empty variable
named $data to receive the message transmitted by the remote host.
Line 14: Create the socket—We create the socket by calling Perl's built-in socket() function.
We use AF_INET for the domain, creating an Internet socket, SOCK_DGRAM for the type, creating
a datagram-style socket, and the previously derived protocol number for UDP.
If successful, socket() returns a true value and assigns a socket to the filehandle. Otherwise,
the call returns undef and we die with an error message.
Line 15: Create the destination address—The final preparatory step is to create the destination
address for outgoing messages. We call inet_aton() to turn the hostname into a packed
string and pack this with the port into a sockaddr_in structure, using the function of the same
name.
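The preparatory steps of lines 11 through 15 can be sketched as follows. This is not the listing of Figure 18.1: the numeric fallbacks (17 for the UDP protocol number and 13 for the daytime port) are assumptions added so that the sketch still runs on hosts that lack /etc/protocols or /etc/services.

```perl
use strict;
use warnings;
use Socket;

my ($host, $port) = ('127.0.0.1', 'daytime');

# Protocol number for UDP; 17 is its well-known value (fallback assumed).
my $protocol = getprotobyname('udp') // 17;

# Translate a symbolic service name into a port number if necessary.
$port = getservbyname($port, 'udp') // 13 unless $port =~ /^\d+$/;

# An Internet-domain, datagram-style socket.
socket(my $sock, AF_INET, SOCK_DGRAM, $protocol)
    or die "socket() failed: $!";

# Packed destination address: port plus packed IP address.
my $ip        = inet_aton($host) or die "unknown host: $host";
my $dest_addr = sockaddr_in($port, $ip);

# In list context sockaddr_in() unpacks the address again.
my ($packed_port, $packed_ip) = sockaddr_in($dest_addr);
printf "destination is %s:%d\n", inet_ntoa($packed_ip), $packed_port;
```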
Line 16: Send the request—We now have a socket and a destination address. The next step is
to send a message to the server to tell it that it has a customer waiting. With the daytime service,
one can send any message (even an empty one) and the server will respond with the time of day.
To send the message, we call the send() function. send() takes four arguments: the socket
name, the message to send, the message flags, and the destination to send it to. For the
message contents we use the string "What time is it?" but any string would do. We pass a 0 for the
message flags in order to accept the defaults. For the destination address, we use the packed
sockaddr_in address that we built earlier.
If the message is correctly queued for delivery, send() returns a true value. Otherwise, we die
with an error message.
Line 17: Receive response—The message has now been sent (or at least successfully queued),
so we wait for a response using the recv() function. Like send(), this call also takes several
arguments, including the socket, a variable in which to store the received data, and a numeric
value indicating the maximum length of the message that we will receive.
If a message is received, recv() copies up to MAX_MSG_LEN bytes of it into $data. In case of
an error, recv() returns undef, and we exit with an error message. Otherwise, recv() returns
the packed address of the sender. We don't do anything with the sender's address but will put
it to good use in the server examples given in later sections.
Lines 18–19: Print the response—We remove the CRLF at the end of the message with
chomp() and print its contents to standard output.
Using socket SOCK, send() sends the message data that is contained in $message to the
destination indicated by $dest_addr. The $flags argument, which in addition to controlling TCP out-
of-band data can be used to adjust esoteric routing parameters, should be set to 0. The destination
address must be a packed socket address created by sockaddr_in(). Like all other INET
addresses, the address includes the port number and IP address of the destination.
send() will return the number of bytes successfully queued for delivery. If for some reason it
couldn't queue the message, send() returns undef and sets $! to the relevant error message.
Note that a positive response from send() does not mean that the message was successfully
delivered, or even that it was placed on the network wire. All this means is that the operating system
has successfully copied the message into the local send buffer. UDP is unreliable and guarantees
nothing.
Having used a socket to send a message to one destination address, a program can turn right
around and use send() to send a second message to a different destination. Unlike TCP, in the
UDP protocol there is no long-term relationship between a socket and its peer.
To receive a UDP message, call recv(). This function also takes four arguments, and uses this
idiom:
$sender = recv (SOCK,$data,$max_size,$flags);
In this case $data is a scalar that receives the contents of the message, $max_size is the
maximum size of the datagram that you can accept, and $flags should once again be set to 0. The
recv() call will block until a datagram is received. On receipt of a message, recv() returns the
message contents in $data and the packed address of the sender in the function result. The sender
address is provided so that you can reply to the sender.
If the received datagram is larger than $max_size, it will be truncated. If some error occurs,
recv() returns undef and sets $! to the appropriate error code.
If you are familiar with the C-language socket API, you should know that the Perl recv() function
is actually implemented on top of the C language recvfrom() call, not the recv() call itself.
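A complete exchange using the four-argument send() and recv() can be tried on a single machine. In the sketch below both ends are ordinary UDP sockets in one script, talking over the loopback interface; the reply text is invented, and the fallback protocol number 17 is an assumption for hosts without /etc/protocols.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('udp') // 17;

# "Server" socket bound to a free port on the loopback interface.
socket(my $server, AF_INET, SOCK_DGRAM, $proto) or die "socket: $!";
my $local = sockaddr_in(0, INADDR_LOOPBACK);
bind($server, $local) or die "bind: $!";
my $server_addr = getsockname($server);

# "Client" socket; it needs no bind() of its own before sending.
socket(my $client, AF_INET, SOCK_DGRAM, $proto) or die "socket: $!";
defined send($client, 'What time is it?', 0, $server_addr)
    or die "send: $!";

# recv() fills $request and returns the packed address of the sender.
my $request;
my $sender = recv($server, $request, 100, 0);
defined $sender or die "recv: $!";
my ($sender_port, $sender_ip) = sockaddr_in($sender);

# The returned address is what lets us reply.
my $reply_text = scalar(localtime) . "\015\012";
defined send($server, $reply_text, 0, $sender) or die "send: $!";

my $reply;
defined recv($client, $reply, 100, 0) or die "recv: $!";
print $reply;
```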
Once a UDP socket is bound, many systems do not allow it to be rebound to a different address.
After the UDP socket is connected, send() will accept only the first three arguments. You should
not try to specify a destination address as the fourth argument, or you will get an "invalid argument"
error. This is convenient for clients that wish to communicate only with a single UDP server. After
connecting the socket, clients can send() to the same server multiple times without having to give
the destination address repeatedly.
Should you wish to change the destination address, you may do so by calling connect() again
with the new address. Although the C-language equivalent of this call allows you to dissolve the
association by connecting to a NULL address, Perl does not provide easy access to this functionality.
A nice side effect of connecting a datagram socket is that such a socket can receive messages only
from the designated peer. Messages sent to the socket from other hosts or from other ports on the
peer host are ignored. This can add a modicum of security to a client program. However, connecting
a datagram socket does not change its basic behavior: It remains message oriented and unreliable.
Servers, which typically must receive and send messages to multiple clients, should generally not
connect their sockets.
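The following sketch shows a connected datagram socket in use. As before, it assumes that both ends run in one script over the loopback interface, and the message strings are invented.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('udp') // 17;   # 17: well-known UDP number

# Receiving socket bound to a free loopback port.
socket(my $server, AF_INET, SOCK_DGRAM, $proto) or die "socket: $!";
my $local = sockaddr_in(0, INADDR_LOOPBACK);
bind($server, $local) or die "bind: $!";
my $server_addr = getsockname($server);

# Connect the client once; later sends omit the destination address.
socket(my $client, AF_INET, SOCK_DGRAM, $proto) or die "socket: $!";
connect($client, $server_addr) or die "connect: $!";

for my $msg ('first', 'second') {
    defined send($client, $msg, 0) or die "send: $!";   # three arguments
}

# The socket is still message oriented: each send() is one datagram.
my @received;
for (1 .. 2) {
    my $data;
    defined recv($server, $data, 100, 0) or die "recv: $!";
    push @received, $data;
}
print "got: @received\n";
```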
UDP Errors
UDP errors are a little unusual because they can occur asynchronously. Consider what happens
when you use send() to transmit a UDP datagram to a remote host that has no program listening
on the specified port. With TCP, you would get a "connection refused" (ECONNREFUSED) error on
the call to connect(). Similarly, a problem at the remote end, such as the server going down, will
be reported synchronously the next time you read or write to the socket.
UDP is different. The return value from send() tells you nothing about whether the message was
delivered at the remote end, because send() simply returns true if the message is successfully
queued by the operating system. In the event that no server is listening at the other end and you go
on to call recv(), the call blocks forever, because no reply from the host is forthcoming.[1]
Asynchronous Errors
There is, however, a way to recover some information on UDP communications errors. If a UDP
socket has been connected, it is possible to receive asynchronous errors. These are errors that
occur at some point after sending a datagram, and include ECONNREFUSED errors, host unreachable
messages from routers, and other problems.
Asynchronous errors are not detected by send(), because this always reports success if the
datagram was successfully queued. Instead, after an asynchronous error occurs, the next call to
recv() returns an undef value and sets $! to the appropriate error message. It is also possible
to recover and clear the asynchronous error by calling getsockopt() with the SO_ERROR
option.
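Reading the pending error with getsockopt() might look like the sketch below. The helper name pending_error() is hypothetical. The example queries a freshly created socket, on which no error is pending, because provoking a real asynchronous error depends on the platform and the network.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: fetch (and thereby clear) the socket's pending
# error. Returns the errno value, or 0 if no error is pending.
sub pending_error {
    my $sock   = shift;
    my $packed = getsockopt($sock, SOL_SOCKET, SO_ERROR);
    defined $packed or die "getsockopt: $!";
    return unpack('i', $packed);    # the option value is a packed integer
}

socket(my $sock, AF_INET, SOCK_DGRAM, getprotobyname('udp') // 17)
    or die "socket: $!";

my $errno = pending_error($sock);
if ($errno) {
    local $! = $errno;              # lets $! supply the message text
    print "pending error: $!\n";
}
else {
    print "no pending error\n";
}
```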
You may also use select() on a UDP socket to determine whether an asynchronous error is
available. The socket will appear to be readable, and recv() will not block.
The implementation of UDP on Linux systems differs somewhat from this description. On such
systems, asynchronous errors are always returned regardless of whether or not a socket is
connected. In addition, if the network is sufficiently fast, it is sometimes possible for send() to detect
and report datagram delivery errors as well.
[1] Of course, this could also happen if either your request or the server's response is lost in transit.
To create a socket bound to a known local port or interface address, provide one or more of the
LocalAddr and LocalPort arguments:
my $sock = IO::Socket::INET->new (Proto     => 'udp',
                                  LocalAddr => 'localhost',
                                  LocalPort => 12000
                                 ) or die $@;
You may also connect() the socket and set a default destination address for send() by providing
new() with the PeerAddr and optionally PeerPort arguments:
my $sock = IO::Socket::INET->new (Proto    => 'udp',
                                  PeerAddr => 'wuarchive.wustl.edu:daytime(13)'
                                 ) or die $@;
IO::Socket implements both send() and recv() methods. They are wrappers around the
eponymous built-in functions, with a few improvements. For one, the $flags argument is optional in
both send() and recv() methods. (It is required in the built-in version.) In addition, the recv()
call remembers the source address of the most recently received datagram. You may retrieve it
using the peername(), peeraddr(), peerport(), and peerhost() methods.
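These conveniences can be seen in a short loopback exchange. The sketch assumes that both ends run in one script; the server socket is left unconnected, so its reply goes to the address remembered from the last recv().

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Unconnected "server" socket bound to a free loopback port.
my $server = IO::Socket::INET->new(
    Proto => 'udp', LocalAddr => '127.0.0.1', LocalPort => 0,
) or die "server: $@";

# "Client" socket connected to the server's address.
my $client = IO::Socket::INET->new(
    Proto => 'udp', PeerAddr => '127.0.0.1', PeerPort => $server->sockport,
) or die "client: $@";

$client->send('What time is it?') or die "send: $!";    # $flags omitted

my $request;
defined $server->recv($request, 100) or die "recv: $!";
printf "request from %s:%d\n", $server->peerhost, $server->peerport;

# No destination given: the remembered source address is used.
$server->send(scalar(localtime) . "\015\012") or die "reply: $!";

my $reply;
defined $client->recv($reply, 100) or die "recv: $!";
print $reply;
```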
The send() method sends the contents of $data via the socket, returning the number of bytes successfully
enqueued. $flags and $dest_addr have the same meaning as in the built-in send() function. $flags is
optional and, if not specified, defaults to 0. For connected sockets, $dest_addr should not be used.
As a convenience, if an unconnected socket has previously been used to receive a packet and if
$dest_addr is not explicitly specified, the socket object uses this address as the default destination for
send().
Lines 1–7: Set up script—We load the IO::Socket module, bringing in the default socket
constants and the constants related to line endings.
We set the input record terminator global to CRLF and read the destination host and port from
the command line.
Lines 8–10: Create socket—We call IO::Socket::INET->new() to create a new socket. We
specify a Proto of "udp," overriding IO::Socket's defaults. In addition, we pass PeerHost and
PeerPort arguments, causing new() to connect() the socket after creating it.
Lines 11–12: Send request, receive response—We call the socket's send() method to send a
request. We then block in recv() until we get a response. If successful, the response is copied
into $data.
Lines 13–15: Print response—We remove the CRLF from the response with chomp() and print
it to standard output.
When we run the revised script, it works in the same way as the earlier version:
% udp_daytime_cli2.pl wuarchive.wustl.edu
Thu Aug 17 11:00:30 2000
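The listing that this walkthrough describes is not reproduced here. The following is a sketch consistent with the walkthrough, not the original figure, so its line numbers will not match. The sub name daytime_query, the value of MAX_MSG_LEN, and the request text are assumptions made for illustration.

```perl
use strict;
use warnings;
use IO::Socket qw(:DEFAULT :crlf);

use constant MAX_MSG_LEN => 100;   # assumed upper bound on the reply size

# Ask a daytime server for the time; returns the reply without its CRLF.
sub daytime_query {
    my ($host, $port) = @_;
    local $/ = CRLF;               # so that chomp() strips a trailing CRLF

    # PeerHost and PeerPort make new() connect() the UDP socket for us.
    my $sock = IO::Socket::INET->new(
        Proto    => 'udp',
        PeerHost => $host,
        PeerPort => $port,
    ) or die "can't create socket: $@";

    # Any datagram will do as a request; the server ignores its contents.
    defined $sock->send('What time is it?') or die "send(): $!";

    # Block until a response arrives; it is copied into $data.
    defined $sock->recv(my $data, MAX_MSG_LEN) or die "recv(): $!";

    chomp $data;
    return $data;
}

# Act as a script only when a host is supplied on the command line.
if (@ARGV) {
    my $host = shift;
    my $port = shift || 'daytime';
    print daytime_query($host, $port), "\n";
}
```

Because the socket is connected, neither send() nor recv() has to deal with addresses. Like the script described above, this sketch has no timeout and will block forever if the reply is lost.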
Lines 1–7: Initialize script—We bring in the IO::Socket module and its constants. We again
declare the MAX_MSG_LEN constant and define a timeout of 10 seconds for the receipt of all the
responses. As before, we set the input record separator to CRLF.
Line 8: Set up a signal handler—We will use alarm() to set the timeout on received responses,
so we install an ALRM signal handler, which simply dies with an appropriate message.
Aside from the time-zone differences, the three machines that responded reported the same time,
plus or minus a few seconds. It is likely that they are running XNTP servers, which implement NTP,
a UDP-based protocol for synchronizing clocks with an authoritative source.
UDP Servers
UDP servers are generally much simpler in design than their TCP brethren. A typical UDP server
is a simple loop that receives a message from an incoming client, processes it, and transmits a
response. A server may handle requests from different clients with each iteration of the loop.
Because there's no long-term relationship between client and server, there's no need to manage
connections, maintain concurrency, or retain state for an extended time. By the same token, a UDP
server must be careful to process each transaction quickly or it may delay the response to waiting
requests.
We will look at UDP servers in more detail in Chapter 19. In this chapter, we show a very simple
example of a UDP client/server pair.
Lines 1–7: Initialize module—We load the IO::Socket module and initialize our constants. The
MY_ECHO_PORT constant should be set to an unused port on your system. We allow our port
number to be changed at runtime using a command-line argument. If this argument is present,
we recover it and store it in $port.
Line 8: Install INT handler—We install an INT handler so that the server exits gracefully when
the interrupt key is pressed. Microsoft Windows users will want to comment this out to avoid Dr.
Watson errors.
Lines 9–10: Create the socket—We call IO::Socket::INET->new() to create a UDP socket
bound to the port specified on the command line. The LocalPort argument is required to bind to
the correct port, but as with TCP sockets there's no need to provide LocalAddr explicitly.
IO::Socket::INET assumes INADDR_ANY, allowing the socket to receive messages on any of
the host's network interfaces.
Lines 11–21: Main loop—We enter an infinite loop. Each time through the loop we call the
socket's recv() method, copying the message into $msg_in. If for some reason we encounter an
error, we just continue with the next iteration of the loop.
After accepting a message, we call the socket's peeraddr() method to recover the packed
address of the sender, and attempt to translate it into a DNS hostname as before. If this fails,
we retrieve the dotted-quad form of the peer's IP address. The call to peerport() returns the
sender's port number. We print a status message to standard error and generate a response
consisting of the client's message reversed end-to-end.
We now take advantage of another trick in the IO::Socket module. As mentioned earlier, if you
call the send() method immediately after recv(), IO::Socket uses the stored peer address as
its default destination. This means that we do not have to explicitly pass the destination address
to send(). This reduces the idiom to a succinct:
$sock->send($msg_out) or die "send(): $!\n"; # (line 21)
Line 22: Close the socket—Although this statement is never reached, we call the socket's
close() method at the end of the script.
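The server listing itself is not reproduced here. As a sketch of the loop body described in this walkthrough (not the original figure), the code below handles one datagram per call. The sub name serve_one and the value of MAX_MSG_LEN are assumptions, and the server loop starts only when a port is given on the command line.

```perl
use strict;
use warnings;
use IO::Socket;

use constant MAX_MSG_LEN => 5000;   # assumed maximum datagram size

# Handle one datagram: receive it, log the sender, send back the reversed
# text. Returns the reply, or nothing if recv() failed.
sub serve_one {
    my $sock = shift;
    defined $sock->recv(my $msg_in, MAX_MSG_LEN) or return;

    # Prefer the DNS name of the sender; fall back to the dotted-quad form.
    my $peerhost = gethostbyaddr($sock->peeraddr, AF_INET) || $sock->peerhost;
    my $peerport = $sock->peerport;
    warn "Received ", length($msg_in), " bytes from [$peerhost,$peerport]\n";

    # Scalar assignment puts reverse() in scalar context: the string is reversed.
    my $msg_out = reverse $msg_in;

    # No destination needed: send() reuses the address saved by recv().
    defined $sock->send($msg_out) or die "send(): $!\n";
    return $msg_out;
}

if (@ARGV) {
    my $port = shift;
    $SIG{INT} = sub { exit 0 };     # exit gracefully on the interrupt key
    my $sock = IO::Socket::INET->new(Proto => 'udp', LocalPort => $port)
        or die "can't create socket: $@";
    serve_one($sock) while 1;
}
```

Each call to serve_one() may serve a different client, which is all the "concurrency" this kind of server needs as long as every request is handled quickly.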
Lines 1–8: Initialization—We load the IO::Socket module and initialize our constants and global
variables. We use the standard "echo" service port as our default. This can be overridden on
the command line, for instance to talk to the reverse-echo server discussed in the previous
section.
Lines 9–10: Create socket—We create a new IO::Socket::INET object, requesting the UDP
protocol and specifying a PeerAddr that combines the selected hostname and port number.
Because we know in advance that the socket will be used to send messages to a single
host, we allow IO::Socket to call connect().
Lines 11–16: Main loop—We read a line of input from standard input, then remove the terminal
newline and send() it to the server. We don't need to specify a destination address, because
the default destination has been set with connect(). We then call recv() to receive a
response and print it to standard output.
Line 17: Close the socket—The loop exits when standard input is closed. We close the socket
by calling its close() method.
I launched the echo server from the previous section on the machine brie.cshl.org and ran the client
on another machine, being careful to specify port 2007 rather than the default echo port. The
transcript from the client session looked like this:
% udp_echo_cli1.pl brie.cshl.org 2007
hello there
ereht olleh
what's up?
?pu s'tahw
goodbye
eybdoog
^D
If other clients had sent requests during the same period of time, the server would have processed
them as well and printed an appropriate status message.
If the quality of your connection is excellent, you may see the entire contents of the file scroll by and
the command-line prompt reappear after the last line is echoed. More likely, though, you will see
the program get part way through the text file and then hang indefinitely. What happened?
Remember that UDP is an unreliable protocol. Any datagram sent to the remote server may fail to
reach its destination, and any datagram returned from the server to the local host may vanish into
the ether. If the remote server is very busy, it may not be able to keep up with the flow of incoming
packets, resulting in buffer overrun errors.
Our echo client doesn't take these possibilities into account. After we send() the message, we
blithely call recv(), assuming that a response will be forthcoming. If the response never arrives,
we block indefinitely, making the script hang.
This is yet another example of deadlock. We won't get a message from the server until we send it
one to echo back, but we can't do that because we're waiting for a message from the server!
As with TCP, we can avoid deadlock either by timing out the call to recv() or by using some form
of concurrency to decouple the input from the output.
We wrap recv() in an eval{} block and set a local ALRM handler that invokes die(). Just prior
to making the system call, we call the alarm() function with the desired timeout value. If the function
returns normally, we call alarm(0) to cancel the alarm. Otherwise, if the alarm clock goes off before
the function returns, the ALRM handler runs and we die. But since this fatal error is trapped within
an eval{} block, the effect is to abort the entire block and to leave the error message in the $@
variable. Our last step is to examine this variable and issue a warning if a timeout occurred or die if
the variable contains an unexpected error.
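The idiom just described can be sketched as a reusable wrapper. The name with_timeout and its return convention are our own, chosen for illustration; the steps inside follow the description above.

```perl
use strict;
use warnings;

# Run $code with a limit of $timeout seconds.
# Returns (1, @results) on success and (0) on timeout; other errors are rethrown.
sub with_timeout {
    my ($timeout, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };   # handler local to the eval
        alarm($timeout);
        @result = $code->();     # the call that might block, e.g. recv()
        alarm(0);                # returned in time: cancel the alarm
        1;
    };
    my $error = $@;
    alarm(0);                    # make sure no alarm is left pending
    return (1, @result) if $ok;

    # Anything other than our own timeout message is unexpected.
    die $error unless $error eq "timeout\n";
    warn "operation timed out\n";
    return (0);
}

my ($ok, $value) = with_timeout(2, sub { 'finished in time' });
print $ok ? "$value\n" : "gave up\n";
```

The trailing newline in "timeout\n" matters: it stops die() from appending the file name and line number, so the message can be compared exactly.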
Using a variant of this strategy, we can design a version of the echo client that transmits a message
and waits up to a predetermined length of time for a response. If the recv() call times out, we try
again by retransmitting the request. If a predetermined number of retransmissions fail, we give up.
Figure 18.6 shows a modified version of the echo client, udp_echo_cli2.pl.
Lines 1–15: Initialize module, create socket—The main changes are two new constants to
control the timeouts. TIMEOUT specifies the time, in seconds, that the client will allow recv() to
wait for a message. We set it to 2 seconds. MAX_RETRIES is the number of times the client will
try to retransmit a message before it assumes that the remote server is not answering.
Lines 16–30: Main loop—We now place a do{} loop around the calls to send() and recv().
The do{} loop retransmits the outgoing message every time a timeout occurs, up to
MAX_RETRIES times. Within the do{} loop, we call send() to transmit the message as before,
but recv() is wrapped in an eval{} block. The only difference between this code and the
generic idiom is that the local ALRM handler bumps up a variable named $retries each time
it is invoked. This allows us to track the number of timeouts. After the eval{} block completes,
we check whether the number of retries is greater than the maximum retry setting. If so, we issue
a short warning and die.
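Since Figure 18.6 is not reproduced here, the following sketch shows one way the retransmission loop described above might look. It is not the original listing: the sub name send_with_retries, its parameter order, and the values of MAX_MSG_LEN and MAX_RETRIES are assumptions.

```perl
use strict;
use warnings;
use IO::Socket;

use constant MAX_MSG_LEN => 5000;   # assumed maximum datagram size
use constant TIMEOUT     => 2;      # seconds to wait for each response
use constant MAX_RETRIES => 5;      # assumed number of attempts

# Send $msg on a connected UDP socket and wait for a reply, retransmitting
# after each timeout. Returns the reply, or undef after $max_retries timeouts.
sub send_with_retries {
    my ($sock, $msg, $timeout, $max_retries) = @_;
    my $retries = 0;
    my $reply;
    do {
        defined $sock->send($msg) or die "send(): $!\n";
        my $ok = eval {
            # The handler counts the timeout before aborting the eval block.
            local $SIG{ALRM} = sub { ++$retries; die "timeout\n" };
            alarm($timeout);
            defined $sock->recv($reply, MAX_MSG_LEN) or die "recv(): $!\n";
            alarm(0);
            1;
        };
        my $error = $@;
        alarm(0);
        unless ($ok) {
            die $error unless $error eq "timeout\n";
            undef $reply;                    # recv() may have emptied the buffer
            warn "Retrying...$retries\n";
        }
    } while (!defined $reply && $retries < $max_retries);
    return $reply;
}

if (@ARGV) {
    my $host = shift;
    my $port = shift || 'echo';
    my $sock = IO::Socket::INET->new(Proto => 'udp', PeerAddr => "$host:$port")
        or die "can't create socket: $@";
    while (my $line = <STDIN>) {
        chomp $line;
        my $reply = send_with_retries($sock, $line, TIMEOUT, MAX_RETRIES);
        die "timeout\n" unless defined $reply;
        print $reply, "\n";
    }
}
```

On some systems, Linux among them, a connected UDP socket reports an ICMP "port unreachable" reply as a recv() error such as "Connection refused". In that case this sketch dies with the recv() error instead of retrying.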
The easiest way to test the new and improved echo client is to point it at a port that isn't running the
echo service, for example, 2008 on the local host:
% udp_echo_cli2.pl localhost 2008
anyone home?
Retrying...1
Retrying...2
Retrying...3
Retrying...4
Retrying...5
timeout
The reverse-echo server generates a response that preserves this format. The server's response
to the sample request given earlier would be:
42: efil fo gninaem eht
The modifications to the reverse-echo server of Figure 18.4 are trivial. We simply replace line 19
with a few lines of code that detect messages having the sequence number/payload format and
generate an appropriately formatted response.
if ( $msg_in =~ /^(\d+): (.*)/ ) {
    $msg_out = "$1: " . reverse $2;
} else {
    $msg_out = reverse $msg_in;
}
For backward compatibility, messages that are not in the proper format are simply reversed as
before. Another choice would be to have the server discard unrecognized messages.
All the interesting changes are in the client, which we will call udp_echo_cli3.pl (Figure 18.7). Our
strategy is to maintain a hash named %PENDING to contain a record of every request that has been
sent. The hash is indexed by the sequence number of the outgoing request and contains both a
copy of the original request and a counter that keeps track of the number of times the request has
been sent.
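A minimal sketch of this bookkeeping follows. The REQUEST and TRIES constants are the ones used by the client; the helper name note_sent is hypothetical and exists only to make the structure easy to try out.

```perl
use strict;
use warnings;

use constant REQUEST => 0;   # index of the request text
use constant TRIES   => 1;   # index of the transmission counter

my %PENDING;                 # sequence number => [ request text, tries ]

# Record one transmission of request $seqno. The same sub serves for
# retransmissions, so it increments the counter rather than setting it.
sub note_sent {
    my ($seqno, $text) = @_;
    $PENDING{$seqno}[REQUEST] = $text;
    return ++$PENDING{$seqno}[TRIES];   # autovivifies to 1 on first use
}

note_sent(0, 'hello there');
note_sent(0, 'hello there');            # a retransmission
note_sent(1, 'goodbye');

for my $seqno (sort { $a <=> $b } keys %PENDING) {
    printf "%d: %-12s sent %d time(s)\n",
        $seqno, $PENDING{$seqno}[REQUEST], $PENDING{$seqno}[TRIES];
}
```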
Figure 18.7. The udp_echo_cli3.pl script detects duplicate and misordered messages
A global variable $seqout is incremented by 1 every time we generate a new request, and another
global, $seqin, keeps track of the sequence number of the last response received from the server
so that we can detect out-of-order responses.
We must abandon the send-and-wait paradigm of the earlier UDP clients and assume that responses
from the server can arrive at unpredictable times. To do this, we use select() with a timeout
to multiplex between STDIN and the socket. Whenever the user types a new request (i.e., a string
to be reversed), we bump up the $seqout variable and create a new request entry in the %PENDING
hash.
Whenever a response comes in from the server, we check its sequence number to see if it
corresponds to a request that we have made. If it does, we print the response and delete the request
from %PENDING. If a response comes in whose sequence number is not found in %PENDING, then
it is a duplicate response, which we discard. We store the most recent sequence number of an
incoming response in $seqin, and use it to detect out-of-order responses. In the case of this client,
we simply warn about out-of-order responses, but don't take any more substantial action.
If the call to select() times out before any new messages arrive, we check the %PENDING hash
to see whether any requests are still unsatisfied. If so, we retransmit them and bump
up each request's counter for the number of times it has been tried.
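The difference between "something is ready" and "the timeout expired" can be seen in isolation with IO::Select. In this sketch a pipe stands in for the socket and STDIN so that the code is self-contained; the helper name read_ready and the 1024-byte buffer are assumptions.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: wait up to $timeout seconds and return whatever data
# the ready handles have to offer. An empty list means the wait timed out.
sub read_ready {
    my ($select, $timeout) = @_;
    my @chunks;
    for my $fh ($select->can_read($timeout)) {
        # sysread() bypasses stdio buffering, so it is safe to mix with select().
        my $bytes = sysread($fh, my $buffer, 1024);
        push @chunks, $buffer if $bytes;
    }
    return @chunks;
}

# A pipe stands in for the real input sources.
pipe(my $reader, my $writer) or die "pipe(): $!";
my $select = IO::Select->new($reader);

# Nothing has been written yet, so can_read() returns an empty list.
print "timed out\n" unless read_ready($select, 0.2);

# Once data is available, the handle is reported as ready.
syswrite($writer, "hello there\n");
print "got: $_" for read_ready($select, 0.2);
```

In the real client the empty-list case is the cue to retransmit pending requests.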
In order to mix line-oriented reads from STDIN with multiplexing, we take advantage of the IO::Getline
module that we developed in Chapter 13 (Figure 13.2). Let's walk through the code now:
Lines 1–9: Load modules, define constants—We bring in the IO::Socket, IO::Select, and IO::Getline
modules.
Lines 10–12: Define the structure of the %PENDING hash—The %PENDING hash is indexed by
request sequence number. Its values are two-element array references containing the original
request and the number of times the request has been sent. We use symbolic constants for the
indexes of this array reference, such that $PENDING{$seqno}[REQUEST] is the text of the
request and $PENDING{$seqno}[TRIES] is the number of times the request has been sent
to the server.
Lines 13–18: Global variables— $seqout is the master counter that is used to assign unique
sequence numbers to each outgoing request. $seqin keeps track of the sequence number of
the last response we received. The server $host and $port are read from the command line
as before.
Lines 19–22: Create socket, IO::Select objects, and IO::Getline objects—We create a UDP
socket as before. If successful, we create an IO::Select set initialized to contain the socket and
STDIN, as well as an IO::Getline object wrapped around STDIN.
Lines 23–25: The select() loop—We now enter the main loop of the program. Each time
through the loop we call the select set's can_read() method with the desired timeout. This
returns a list of filehandles that are ready for reading, or if the timeout expired, an empty list. We
loop through each of the filehandles that are ready for reading. There are only two possibilities.
One is that the user has typed something and STDIN has some data for us to read. The other
is that a message has been received and we can call recv() on the socket without blocking.
Lines 26–32: Handle input on STDIN —If STDIN is ready to read, we fetch a line from its
IO::Getline wrapper by calling the getline() method. Recall that the syntax for
IO::Getline->getline() works like read(). It copies the line into a scalar variable (in this case,
$_) and returns a result code indicating the success of the operation.
If getline() returns false, we know we've encountered the end of file and we exit the loop.
Otherwise, we check whether we got a complete line by looking at the line length returned by
getline(), and if so, remove the terminating end-of-line sequence and call
send_message() with the message text and a new sequence number.
Lines 33–37: Handle a message on the socket—If the socket is ready to read, then we've
received a response from the server. We retrieve it by calling the socket's recv() method and
pass the message to our receive_message() subroutine.
Lines 39–41: Handle retries—If @ready is empty, then we have timed out. We call the
do_retries() subroutine to retransmit any requests that are pending.
Lines 42–49: The send_message() subroutine—This subroutine is responsible for transmitting
a request to the server given a unique sequence number and the text of the request. We
construct the message using the simple format discussed earlier and send() it to the server.
We then add the request to the %PENDING hash. This subroutine is also called on to retransmit
requests, so rather than setting the TRIES field to 1, we increment it and let Perl take care of
creating the field if it doesn't yet exist.
Lines 50–66: The receive_message() subroutine—This subroutine is responsible for
processing an incoming response. We begin by parsing the sequence number and the payload. If
it doesn't fit the format, we print a warning and return. Having recovered the response's sequence
number, we check to see whether it is known to the %PENDING hash. If not, this response is
presumably a duplicate. We print a warning and return. We check to see whether the sequence
number of this response is greater than the sequence number of the last one. If not, we print a
warning, but don't take any other action.
If all these checks pass, then we have a valid response. We print it out, remember its sequence
number, and delete the request from the %PENDING hash.
Lines 67–77: The do_retries() subroutine—This subroutine is responsible for retransmitting
pending requests whose responses are late. We loop through the keys of the %PENDING hash
and examine each one's TRIES field. If TRIES is greater than the MAX_RETRIES constant, then
we print a warning that we are giving up on the request and delete it from %PENDING. Otherwise,
we invoke send_message() on the request in order to retransmit it.
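The checks performed by receive_message() and do_retries() can be tried without a network. The sketch below is not the code of Figure 18.7: the sub names classify_response and requests_to_retry, the status strings, the value of MAX_RETRIES, and the initial $seqin of -1 are all assumptions made for illustration.

```perl
use strict;
use warnings;

use constant { REQUEST => 0, TRIES => 1 };
use constant MAX_RETRIES => 5;       # assumed value

our %PENDING;                        # seqno => [ request text, tries ]
our $seqin = -1;                     # assumed start: no response seen yet

# Classify a "seqno: payload" response and update the bookkeeping.
# Returns 'malformed', 'duplicate', 'out-of-order', or 'ok'.
sub classify_response {
    my $msg = shift;
    my ($seqno, $payload) = $msg =~ /^(\d+): (.*)/s
        or return 'malformed';
    return 'duplicate' unless exists $PENDING{$seqno};

    # A late response is still valid; it is only reported as out of order.
    my $status = $seqno > $seqin ? 'ok' : 'out-of-order';
    $seqin = $seqno;
    delete $PENDING{$seqno};
    return $status;
}

# One retry pass: drop requests that have used up their retries and
# return the sequence numbers that should be retransmitted.
sub requests_to_retry {
    my @retry;
    for my $seqno (sort { $a <=> $b } keys %PENDING) {
        if ($PENDING{$seqno}[TRIES] > MAX_RETRIES) {
            warn "$seqno: too many retries, giving up\n";
            delete $PENDING{$seqno};
        } else {
            push @retry, $seqno;
        }
    }
    return @retry;
}

%PENDING = (0 => ['hello there', 1], 1 => ['goodbye', 2], 2 => ['lost', 6]);
print classify_response('0: ereht olleh'), "\n";   # first copy of the response
print classify_response('0: ereht olleh'), "\n";   # second copy of the same response
print "retransmit: @{[ requests_to_retry() ]}\n";
```

Because a request is deleted from %PENDING as soon as its first response arrives, every later copy of that response is recognized as a duplicate without any extra state.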
To test udp_echo_cli3.pl, I modified the reverse-echo server to make it behave unreliably. The
modification occurs at line 20 of Figure 18.4 and consists of this:
for (1..3) {
    $sock->send($msg_out) or die "send(): $!\n" if rand() > 0.7;
}
Instead of sending a single response as before, we now send a variable number of responses using
Perl's rand() function to generate a random coin flip. Sometimes the server sends one response,
sometimes none, and sometimes several.
When we run udp_echo_cli3.pl against this unreliable server, we see output like the following. In
this transcript, the user input is bold, standard error is italic, and the output of the script is roman.
% udp_echo_cli3.pl localhost 2007
hello there
0: retrying...
hello there => ereht olleh
Discarding duplicate message seqno = 0
Discarding duplicate message seqno = 0
this is unreliable communications
1: retrying...
this is unreliable communications => snoitacinummoc elbailernu si siht
but it works anyway
2: retrying...
but it works anyway => yawyna skrow ti tub
Discarding duplicate message seqno = 2
Discarding duplicate message seqno = 2
Even though some responses were dropped and others were duplicated, the client still managed to
associate the correct response with each request.
A cute thing about this client is that it will work with unmodified UDP echo servers. This is because
we designed the message protocol in such a way that the protocol is correct even if the server just
returns the incoming message without modification.
As written in Figure 18.7, the client is slightly inefficient because we time out can_read(), even
when there's nothing in %PENDING to wait for. We can fix this problem by modifying line 23 of Figure
18.7 to read this way:
my @ready = $select->can_read ( %PENDING ? TIMEOUT : () );
If %PENDING is nonempty, we call can_read() with a timeout. Otherwise, we pass an empty list
for the arguments, causing can_read() to block indefinitely until either the socket or STDIN is
ready to read.
Summary
The UDP protocol is a connectionless, unreliable protocol most suitable for brief, stateless
interactions.
A UDP client program creates a UDP socket using socket(), sends messages to the remote host
using send(), and receives incoming messages with recv(). A UDP server creates a socket using
socket(), assigns it to a prearranged port using bind(), awaits incoming requests using
recv(), and sends out responses with send().
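The sequence of built-in calls summarized above can be sketched in a single process over the loopback interface. The helper name builtin_round_trip and the 1024-byte buffer size are assumptions, and a real client and server would of course run as separate programs.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_DGRAM IPPROTO_UDP INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in);

# Hypothetical helper: one request/response exchange over loopback UDP.
sub builtin_round_trip {
    my $text = shift;

    # Server side: socket(), then bind() to a port (0 lets the kernel pick one).
    socket(my $server, PF_INET, SOCK_DGRAM, IPPROTO_UDP) or die "socket(): $!";
    bind($server, pack_sockaddr_in(0, INADDR_LOOPBACK))  or die "bind(): $!";
    my ($port) = unpack_sockaddr_in(getsockname($server));

    # Client side: socket() only; send() names the destination explicitly.
    socket(my $client, PF_INET, SOCK_DGRAM, IPPROTO_UDP) or die "socket(): $!";
    defined send($client, $text, 0, pack_sockaddr_in($port, INADDR_LOOPBACK))
        or die "send(): $!";

    # Server side: recv() returns the sender's packed address for the reply.
    my $from = recv($server, my $request, 1024, 0);
    defined $from or die "recv(): $!";
    defined send($server, scalar reverse($request), 0, $from)
        or die "send(): $!";

    # Client side: collect the response.
    defined recv($client, my $reply, 1024, 0) or die "recv(): $!";
    return $reply;
}

print builtin_round_trip('hello there'), "\n";
```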
Because of UDP's unreliability, messages can be lost, and naively written clients, such as those
that call send() and recv() in a rigid loop, will hang while waiting for the reply to a message that
was never received. One way to handle this problem is with timeouts, but this introduces problems
with duplicate responses. The general solution is to use sequence numbers to track requests and
their responses. This works quite well but complicates the program.
Alternatively, one might not care about occasional dropped messages. The chat server developed
in the next chapters illustrates this principle.
TCP provides reliable connection-oriented network service, but at the cost of some overhead in
setting up and tearing down connections and maintaining the fidelity of the data stream. As we have
seen, there's also programmer overhead: TCP server applications have to go to some lengths to
handle multiple concurrent clients.
Sometimes 100 percent reliability isn't necessary. Perhaps the application can tolerate an
occasional dropped or out-of-order packet, or perhaps it can simply retransmit a message that hasn't
been acknowledged. In such cases, UDP offers a simple, lightweight solution.
A Sample Session
Figure 19.1 shows a sample session with the chat client. As always, keyboard input is in a bold font
and output from the program is in normal font.
We begin by invoking the client with the name of the server to connect to. The program prompts us
for a nickname, logs in, and prints a confirmation message. We then issue the /channels command
to fetch the list of available channels. This client, like certain other command-line chat clients,
expects all commands to begin with the "/" character. Anything else we type is assumed to be a
public message to be transmitted to the current channel. The system replies with the names of five
channels, a brief description, and the number of users that belong to each one (a single user may
be a member of multiple channels at once, so the sum of these numbers may not reflect the total
number of users on the system).
We join the Weather channel using the /join command, at which point we begin to see public
messages from other users, as well as join and departure notifications. We participate briefly in the
conversation and then issue the /users command to view the users who currently belong to the
channel. This command lists users' nicknames, the length of time that they have been on the system,
and the channels that they are subscribed to.
We send a private message to one of the users using the /private command, /join the Hobbies
channel briefly, and finally log out using /quit.
In addition to the commands shown in the example (Figure 19.1), there's also a /part command that
allows one to depart a channel. Otherwise, the list of subscribed channels just grows every time you
join one.
Event Codes
In all our previous examples, we have passed information between client and server in text form.
For example, in the travesty server, the server's welcome message was the text string "100."
However, some Internet protocols pass command codes and other numeric data in binary form. To
illustrate such systems, the chat server uses binary codes rather than human-readable ones.
In this system, all communication between client and server is via a series of binary messages. Each
message consists of an integer event code packed with a message string. For example, to create
a public message using the SEND_PUBLIC message constant, we call pack() with the format
"na*":
$message = pack("na*",SEND_PUBLIC,"hello, anyone here?");
To retrieve the code and the message string, we call unpack() with the same format:
($code,$data) = unpack("na*",$message);
We use the "n" format to pack the event code in platform-independent "network" byte order. This
ensures that clients and servers can communicate even if their hosts don't share the same byte
order.
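To see what the "na*" template produces, here is a small sketch. The sub names encode_event and decode_event are hypothetical (the real code lives in ChatObjects::Comm), and the numeric value given to SEND_PUBLIC is made up for the example.

```perl
use strict;
use warnings;

use constant SEND_PUBLIC => 70;   # hypothetical code value, for illustration only

# Pack an event code and its message string into one binary message.
sub encode_event {
    my ($code, $text) = @_;
    return pack("na*", $code, $text);    # 16-bit big-endian code, then raw bytes
}

# Recover the event code and the message string.
sub decode_event {
    my $message = shift;
    return unpack("na*", $message);
}

my $message = encode_event(SEND_PUBLIC, "hello, anyone here?");

# The first two bytes hold the code; the remainder is the text, unchanged.
printf "%d bytes, code bytes in hex: %s\n",
    length($message), unpack("H4", $message);

my ($code, $data) = decode_event($message);
print "code=$code data=$data\n";
```

The message is always two bytes longer than its text, and the two code bytes are the same on every platform, whatever the host's native byte order.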
The various event codes are defined as constants in a .pm file that is shared between the client and
server source trees. The code for packing and unpacking messages is encapsulated in a module
named ChatObjects::Comm. A brief description of each of the messages is given in Table 19.1.
Table 19.1. Event Codes
Code           Argument                    Description
ERROR          <error message>             Server reports an error
LOGIN_REQ      <nickname>                  Client requests a login
LOGIN_ACK      <nickname>                  Server acknowledges successful login
LOGOFF         <nickname>                  Client signals a signoff
JOIN_REQ       <title>                     Client requests to join channel <title>
JOIN_ACK       <title> <count>             Server acknowledges join of channel <title>, currently containing <count> users
PART_REQ       <title>                     Client requests to depart channel
PART_ACK       <title>                     Server acknowledges departure
SEND_PUBLIC    <text>                      Client sends public message
PUBLIC_MSG     <title> <user> <text>       User <user> has sent message <text> on channel <title>
SEND_PRIVATE   <user> <text>               Client sends private message <text> to user <user>
PRIVATE_MSG    <user> <text>               User <user> has sent private message <text>
USER_JOINS     <channel> <user>            User has joined indicated channel
USER_PARTS     <channel> <user>            User has departed indicated channel
LIST_CHANNELS                              Client requests a list of all channel titles
CHANNEL_ITEM   <channel> <count> <desc>    Sent in response to a LIST_CHANNELS request
User Information
The system must maintain a certain amount of state information about each active user: the
channels she has subscribed to, her nickname, her login time, and the address and port her client is
bound to. While this information could be maintained on either the client or the server side, it's
probably better that the server keep track of this information. It reduces the server's dependency on
the client's implementing the chat protocol correctly, and it allows for more server-side features to
be added later. For example, since the server is responsible for subscribing users to a channel, it
is easy to limit the number or type of channels that a user can join. This information is maintained
by objects of class ChatObjects::User.
Channel Information
One other item of information that the server tracks is the list of channels and associated information.
In addition to the title, channels maintain a human-readable description and a list of the users cur-
rently subscribed. This simplifies the task of sending a message to all members of the channel. This
information is maintained by objects of class ChatObjects::Channel.
Concurrency
We assume that each transaction that the server is called upon to handle—logging in a user, sending
a public message, listing channels—can be disposed of rapidly. Therefore, the server has a single-
threaded design that receives and processes messages on a first-come, first-served basis. Mes-
sages come in from users in any order, so the server must keep track of each user's address and
associate it with the proper ChatObjects::User object.
On the other end, the client will be communicating with only one server. However, it needs to process
input from both the server and the user, so it uses a simple select() loop to multiplex between the
two sources of input.
The object classes used by the server are designed for subclassing. This enables us to modify the
chat system to take advantage of multicasting in the next chapter.
The client uses two dispatch tables to handle user commands and server events. %COMMANDS dispatches
on commands typed by the user. Each key is the text of a command (e.g., "join"), and
each value is an anonymous subroutine that is invoked when the command is issued. In most cases,
the subroutine simply sends the appropriate event code to the server. Whenever the user types a
command, the client parses out the command and any optional arguments, and then passes the
command to the dispatch table.
The %MESSAGES global is the corresponding dispatch table for messages received from the server.
It has a similar structure to %COMMANDS, except that the keys are numeric event codes.
Lines 1–7: Import modules—The client turns on strict checking (use strict) and brings in the IO::Socket
and IO::Select modules. It then brings in two application-specific modules. ChatObjects::ChatCodes
contains the numeric constants for server messages, and ChatObjects::Comm defines a
wrapper that packs and unpacks the messages exchanged with the server.
Lines 8–9: Install signal handlers—We want the client to log out politely even if it is killed with
the interrupt key. For this reason we install INT and TERM handlers that call exit() to perform
a clean shutdown. An END{} clause defined at the bottom of the script logs out of the server
before the client shuts down.
We also define two globals. $nickname contains the user's nickname, and $server contains
the ChatObjects::Comm wrapper.
Lines 10–33: Define dispatch tables—These lines create the %COMMANDS and %MESSAGES dispatch
tables. When the main loop dispatches on a user command, it looks the command up in
the %COMMANDS table and calls the anonymous subroutine it finds there, passing it any text that
followed the command on the line. Here is a typical %COMMANDS entry:
join => sub { $server->send_event(JOIN_REQ,shift) },
This is saying that when the user issues the /join command, the client should call the
$server object's send_event() method with an event code of JOIN_REQ and whatever ar-
gument followed the command. In this case, the argument is expected to be the name of a
channel to join.
A typical entry in %MESSAGES is this one:
PUBLIC_MSG() => \&public_msg,
This entry tells the script to invoke the subroutine public_msg() when the event code PUBLIC_MSG
is received. The parentheses following the PUBLIC_MSG constant are necessary because
otherwise Perl would automatically quote the bare word to the left of the => symbol as a string.
When the script dispatches to one of these subroutines, it passes the event code as the first
argument and the message text as the second. Passing the event code allows the same sub-
routine to handle different messages. For example, handling of the USER_JOINS and
USER_PARTS messages, which are sent to notify the client that another user has joined or de-
parted a channel, respectively, is sufficiently similar that it is handled by the same subroutine,
join_part().
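The two kinds of dispatch table entry described above can be sketched as follows. This is a minimal illustration rather than the client's actual code: the numeric values of the constants, the send_event() stand-in (which only records its arguments instead of talking to a server), and the helper names run_command() and run_event() are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical event codes; the real values come from ChatObjects::ChatCodes.
use constant JOIN_REQ   => 30;
use constant PUBLIC_MSG => 70;

# Stand-in for $server->send_event(): just returns what would be sent.
sub send_event { return [@_] }

# Handler for PUBLIC_MSG; receives the event code and the message text.
sub public_msg {
    my ($code, $text) = @_;
    return "[$code] $text";
}

# Keys are command names typed by the user.
my %COMMANDS = (
    join => sub { send_event(JOIN_REQ, shift) },
);

# Keys are numeric event codes. Without the parentheses Perl would
# quote PUBLIC_MSG as the string "PUBLIC_MSG" instead of calling it.
my %MESSAGES = (
    PUBLIC_MSG() => \&public_msg,
);

# Look up a user command and call its handler with the argument text.
sub run_command {
    my ($command, $args) = @_;
    my $sub = $COMMANDS{$command} or return;
    return $sub->($args);
}

# Look up a server event and call its handler with (code, text).
sub run_event {
    my ($code, $text) = @_;
    my $sub = $MESSAGES{$code} or return;
    return $sub->($code, $text);
}
```

Calling run_command('join', 'Perl') in this sketch yields the pair (JOIN_REQ, 'Perl'), which is what the real entry would hand to send_event().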
Lines 34–37: Create the UDP socket and the server wrapper—We get the server name and port
number from the command line. If they are not given, we choose some defaults. This data is
passed to the ChatObjects::Comm->new() method. When we address this module, we will
see that its new() method is a thin wrapper that takes whatever parameters are passed to it,
adds Proto => 'udp', and passes the arguments to IO::Socket::INET->new().
Notice that we are passing the PeerAddr argument to IO::Socket::INET->new(), causing
IO::Socket to attempt a connect() with the indicated server host. This address will be used
as the destination whenever we call send(), ignoring any destination address that we provide
on the argument list. Recall from Chapter 18 that the other effect of connecting a UDP socket
is to filter out messages sent to the socket from arbitrary hosts. Since the client is going to
exchange messages with one server, both of these behaviors are desirable.
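The effect of PeerAddr on a UDP socket can be seen in a small loopback sketch. Everything here other than the IO::Socket::INET, IO::Select, and Socket calls is an assumption for illustration: the subroutine name udp_round_trip(), the use of 127.0.0.1, and the throwaway "server" socket bound to an ephemeral port in the same process.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;
use Socket qw(sockaddr_in);

# Send one datagram from a connected UDP socket to an unconnected one
# and report what arrived and whether it came from the client's port.
sub udp_round_trip {
    my $text = shift;

    # Unconnected "server" socket on an ephemeral loopback port.
    my $server = IO::Socket::INET->new(
        Proto     => 'udp',
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
    ) or die "can't create server socket: $@";

    # PeerAddr/PeerPort make IO::Socket call connect() on our behalf.
    my $client = IO::Socket::INET->new(
        Proto    => 'udp',
        PeerAddr => '127.0.0.1',
        PeerPort => $server->sockport,
    ) or die "can't create client socket: $@";

    # No destination argument: the datagram goes to the connected peer.
    defined $client->send($text) or die "send failed: $!";

    # Wait briefly so a lost datagram can't hang the sketch.
    IO::Select->new($server)->can_read(2) or die "no datagram arrived";

    # recv() returns the packed address of the sender.
    my $peer = $server->recv(my $data, 1024);
    defined $peer or die "recv failed: $!";
    my ($port) = sockaddr_in($peer);

    return ($data, $port == $client->sockport);
}
```

The server side deliberately omits PeerAddr, so it can receive from any host and learns each sender's address from recv().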
Lines 38–40: Log in—We invoke an internal subroutine named do_login() to prompt the user
to log in and send the appropriate login message to the server. If successful, this subroutine
returns the user's chosen nickname.
Lines 41–53: Dispatch Loop—We'll be reading user commands from standard input and receiv-
ing messages from the server socket. select() lets us watch both handles for incoming data.
We create a new IO::Select object initialized to a set containing both the server socket and
STDIN. The server socket is wrapped inside the ChatObjects::Comm object, so we must retrieve
the handle by calling the object's socket() method.
Each time through the loop we call $select->can_read() to recover those handles that have
data to read. If one of the handles is STDIN, then we invoke the subroutine do_user() to
process user commands. Otherwise, we invoke do_server() to process messages received
on the socket.
Notice that the can_read() method call will indicate that STDIN is ready for reading if the user
happened to close the stream by pressing the end-of-file key. do_user() specifically checks
for the EOF condition and returns false. When this happens, we exit the loop, terminating the
program.
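The shape of this loop can be sketched with IO::Select as below. It is a simplified stand-in for the client's main loop, under stated assumptions: event_loop() is a hypothetical name, the handlers are reduced to recording what was read, and any two readable handles (pipes, for instance) can play the roles of STDIN and the server socket.

```perl
use strict;
use warnings;
use IO::Select;

# Multiplex between a "user" handle and a "server" handle until the user
# handle reaches end of file. Returns [source, text] pairs as they were read.
sub event_loop {
    my ($user, $server) = @_;
    my $select = IO::Select->new($user, $server);
    my @events;
    my $done = 0;

    until ($done) {
        my @ready = $select->can_read or last;
        for my $handle (@ready) {
            # sysread() sidesteps stdio buffering, which select() can't see.
            my $bytes = sysread($handle, my $data, 1024);
            if ($handle == $user) {
                if ($bytes) { push @events, ['user', $data] }
                else        { $done = 1 }      # EOF from the user: stop
            }
            elsif ($bytes) {
                push @events, ['server', $data];
            }
        }
    }
    return @events;
}
```

In the real client the two branches would call do_user() and do_server() instead of pushing onto a list.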
Lines 54–66: Handle user commands—The do_user() subroutine reads commands from
standard input and dispatches on them. Its argument is the \*STDIN glob reference returned
by select(). Because of the bad interactions between select() and standard I/O buffering,
we don't use the angle-bracket operator to read from STDIN. Instead, we use sysread() to
fetch the longest plausible line from standard input and assume that it will correspond to a line
of input. This is a valid assumption provided that the user is typing at a terminal. If we wanted
to take commands from a file or pipe, we would use the IO::Getline wrapper from Chapter 13.
Each line is parsed into a command and its argument. Any line that doesn't begin
with a "/" is assumed to be a public message to send to the current channel. Internally we treat
this as a command named "public" and use the entire line as its argument.
We look up the command in the %COMMANDS dispatch table, and if it isn't found, we issue an
error message. Otherwise, we invoke the returned subroutine, passing it the command argu-
ments, if any. Most commands end up sending a message to the server by calling the global
$server object's send_event() method.
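The parsing rule just described might look like the following sketch. The names parse_command() and dispatch(), the regular expression, and the strings returned by the table entries are assumptions for illustration; the real entries call $server->send_event().

```perl
use strict;
use warnings;

# Split one line of user input into (command, argument).
# A line that does not start with "/" becomes the "public" command,
# with the whole line as its argument.
sub parse_command {
    my $line = shift;
    chomp $line;
    if ($line =~ m{^/(\S+)\s*(.*)}) {
        return ($1, $2);
    }
    return ('public', $line);
}

# Toy dispatch table: each entry returns a description of what it would send.
my %COMMANDS = (
    join   => sub { "JOIN_REQ $_[0]" },
    public => sub { "SEND_PUBLIC $_[0]" },
);

# Parse a line, look up the command, and invoke it or report an error.
sub dispatch {
    my ($command, $args) = parse_command(shift);
    my $sub = $COMMANDS{$command}
        or return "unknown command: $command";
    return $sub->($args);
}
```

With this table, dispatch("/join Perl\n") selects the join entry, while dispatch("hello all\n") falls through to public.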
Lines 67–75: Handle server messages—The do_server() subroutine is called to handle an
incoming message from the server. The argument it receives from the select() loop is the
socket handle. We don't want to work with the socket directly, so we call the static method
sock2server() in the ChatObjects::Comm module in order to retrieve the corresponding
ChatObjects::Comm object.
We call the ChatObjects::Comm object's recv_event() method to receive a message from
the server and parse it into an event code and data. We use the code to look up a handler in the
%MESSAGES dispatch table. If one is found, we invoke it. Otherwise, we print a warning. After
invoking the subroutine, do_server() returns the event code as its function result. [1]
[1] It would be simpler to use the global $server object directly here, but this indirect method pays dividends in the multicast version of the
chat system developed in Chapter 21.
Lines 76–88: Log in—The do_login() subroutine first sends a LOGOFF event if the $nickname
global is already defined. It then prompts the user for a login name by calling the
get_nickname() subroutine, and sends a LOGIN_REQ message to the server.
The subroutine now waits for a LOGIN_ACK from the server. It is possible for either the request
or the acknowledgment to get lost in transit, so do_login() repeats the login several times,
each time using select() with a 6-second timeout to wait for a response. If no LOGIN_ACK is
received after five tries, do_login() gives up.
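The retry pattern used for login can be sketched generically. This is not the do_login() code itself: request_with_retry() is a hypothetical helper, and the two code references stand in for "send LOGIN_REQ" and "wait with select() for a LOGIN_ACK".

```perl
use strict;
use warnings;

# Resend a request until a reply arrives or the attempts run out.
#   $send    - code ref that transmits the request
#   $wait    - code ref that waits up to $timeout seconds and returns
#              the reply, or undef if nothing arrived in time
#   $tries   - maximum number of transmissions
#   $timeout - seconds to wait after each transmission
sub request_with_retry {
    my ($send, $wait, $tries, $timeout) = @_;
    for my $attempt (1 .. $tries) {
        $send->();
        my $reply = $wait->($timeout);
        return $reply if defined $reply;
    }
    return;    # gave up without a reply
}
```

In the client, $wait might wrap a call to $select->can_read($timeout) followed by recv_event(), with $tries set to 5 and $timeout to 6 as described above.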
Lines 89–158: Handle server events—Most of the remainder of the client consists of subroutines
that handle server events. Each of them parses the server event data (when need be) and prints
a message for the user. A typical example is the list_channel() subroutine, which is called
when the client receives a CHANNEL_ITEM message carrying information about a chat channel
that the user can join. The event data in this case consists of the channel title, a count of the
users subscribed to it, and a brief description of the channel's topic. The subroutine converts
this information into a nicely formatted table entry and prints it to standard output.
Notice that the event code is provided as the first argument to list_channel() and similar
routines. This allows some subroutines to handle similar messages, such as the
join_part() subroutine, which handles both JOIN_ACK and PART_ACK messages.
Lines 159–164: Log out and clean up—Because there's no connection involved, the server can't
tell that a user has gone offline unless the client explicitly tells it so. The script ends with an
END{} block that is executed just before the program terminates. It sends a LOGOFF event to
the server and closes the socket.
Notice that with the exception of the login message, the client in Figure 19.2 doesn't retransmit
messages or explicitly wait for particular responses. Because this is an interactive application, we
rely on the user to notice that the occasional command didn't "take" and reissue it. Nor do we mind
if an occasional public message doesn't get through.
If necessary, we could add reliability to each outgoing message by retransmitting it until we receive
an acknowledgment from the server. The do_login() subroutine illustrates a simple way to do
this. Of course, this raises the risk of sending the server duplicate messages in the event that the
original message got through and it was the acknowledgment that was lost in transit. However,
duplicate messages don't matter to the server, because actions such as joining a channel have no
ill effect if repeated.
Lines 1–5: Bring in required modules—We turn on strict checking (use strict) and bring in the Carp and
IO::Socket modules. We also define a package global, %SERVERS, that will be used to do the
reverse association between an IO::Socket object and the ChatObjects::Comm object that wraps
it.
Lines 6–10: Object constructor—The new() method creates and initializes a new ChatObjects::Comm
object. We call another method, create_socket(), to create the appropriate
socket object, and wrap it in a blessed hash. Before returning the new object, we remember it
in the %SERVERS global.
Line 11: The create_socket() method—This method returns an appropriately initialized
IO::Socket::INET object. We call IO::Socket::INET->new() with a Proto argument of "udp"
and any other arguments that were passed to us.
Line 12: Look up a ChatObjects::Comm object based on its socket—The sock2server() class
method uses %SERVERS to look up a ChatObjects::Comm object based on its IO::Socket object.
Line 13: Look up a socket based on a ChatObjects::Comm object—The socket() method does
exactly the opposite, returning the IO::Socket object corresponding to a ChatObjects::Comm
object.
Lines 14–18: Close the socket—The close() method closes the socket and deletes the ChatObjects::Comm
object from %SERVERS.
Lines 19–29: Send an event—The client can use the send_event() method to send a com-
mand to the server, or the server can use it to send an event code to the client. It takes three
arguments containing the event code, the event data, and the destination address. The sub-
routine invokes pack() to pack the event code and data into the binary form used by the protocol
and sends it down the socket using send(). If a destination address is provided, we use the
four-argument form of send(). Otherwise, we assume that the socket has had a default desti-
nation assigned using connect(), and call the three-argument form of send(). Since
send() is the last call in the subroutine, its result code is implicitly returned by send_event().
Lines 30–36: Receive an event—The recv_event() function calls recv() to retrieve an event
from the server. The event is unpacked into the event code and data, and these values are
returned along with the peer address.
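The packing and unpacking steps can be sketched in isolation. The wire format used here, a 16-bit event code in network byte order followed by the raw text, is an assumption for this example, as are the names pack_event(), unpack_event(), and EVENT_FORMAT.

```perl
use strict;
use warnings;

# Assumed wire format for this sketch: 16-bit code in network byte
# order ("n") followed by the event text as raw bytes ("a*").
use constant EVENT_FORMAT => 'n a*';

# Turn an event code and optional text into a single datagram payload.
sub pack_event {
    my ($code, $text) = @_;
    $text = '' unless defined $text;
    return pack(EVENT_FORMAT, $code, $text);
}

# Split a received payload back into (code, text).
sub unpack_event {
    my $message = shift;
    return unpack(EVENT_FORMAT, $message);
}
```

A send_event() built on this would pass the packed string to send(), adding the destination address as a further argument only when one is supplied. A recv_event() would call recv() and then unpack_event() on the data.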
ChatObjects::Channel is a small class that keeps track of each channel. It maintains the channel's
name and description, as well as the list of subscribers. The subscriber list is used in broadcasting
public messages and notifying members when a user enters or leaves the channel.
Lines 1–8: Load modules—The program begins by loading various ChatObjects modules, in-
cluding ChatObjects::ChatCodes, ChatObjects::Comm, and ChatObjects::User. It also defines
a DEBUG constant that can be set to a true value to turn on debug messages.
Lines 9–14: Define channels—We now create five channels by invoking the ChatObjects::Channel->new()
method. The method takes two arguments corresponding to the
channel title and description.
Lines 15–24: Create the dispatch table—We define a dispatch table, named %DISPATCH, similar
to the ones used in the client application. Each key in the table is a numeric event code, and
each value is the name of a ChatObjects::User method. With the exception of the initial login, all
interaction with the remote user goes through a ChatObjects::User object, so it makes sense to
dispatch to method calls rather than to anonymous subroutines, as we did in the client.
Here is a typical entry in the dispatch table:
SEND_PUBLIC() => 'send_public',
This is interpreted to mean that whenever a client sends us a SEND_PUBLIC message, we will
call the corresponding ChatObjects::User object's send_public() method.
Lines 25–28: Create a new ChatObjects::Comm object—We get the port from the command
line and use it to initialize a new ChatObjects::Comm object with the arguments LocalPort =>
$port. Internally this creates a UDP protocol IO::Socket object bound to the desired port. Unlike
in the client code, in the server we do not specify a peer host or port to connect with, because
this would disable our ability to receive messages from multiple hosts.
Lines 29–32: Process incoming messages, handle login requests—The main server loop calls
the ChatObjects::Comm object's recv_event() method repeatedly. This method calls recv() on the
underlying socket, parses the message, and returns the event code, the event message, and
the packed address of the client that sent the message.
Login requests receive special treatment because there isn't yet a ChatObjects::User object
associated with the client's address. If the event code is LOGIN_REQ, then we pass the address,
the event text, and our ChatObjects::Comm object to a do_login() subroutine. It will create
a new ChatObjects::User object and send the client a LOGIN_ACK.
Lines 33–35: Look up the user—Any other event code must be from a user who has logged in
earlier. We call the class method ChatObjects::User->lookup_byaddr() to find a ChatObjects::User
object that is associated with the client's address. If there isn't one, it means that
the client hasn't logged in, and we issue an error message by sending an event of type ERROR.
Lines 36–39: Handle event—If we were successful in identifying the user corresponding to the
client address, we look up the event code in the dispatch table and treat it as a method call on
the user object. The event data, if any, is passed to the method to deal with as appropriate. If
the event code is unrecognized, we complain by issuing an ERROR event. In either case, we're
finished processing the transaction, so we loop back and wait for another incoming request.
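Dispatching an event code to a method name, as described above, relies on Perl's ability to call a method whose name is held in a string. The sketch below is a minimal illustration under stated assumptions: StubUser is a stand-in for ChatObjects::User, the SEND_PUBLIC value is made up, and handle_event() returns strings instead of sending ERROR events.

```perl
use strict;
use warnings;

use constant SEND_PUBLIC => 60;    # hypothetical value

# Event code => name of the method to call on the user object.
my %DISPATCH = (
    SEND_PUBLIC() => 'send_public',
);

package StubUser;    # stand-in for ChatObjects::User

sub new {
    my ($class, $nickname) = @_;
    return bless { nickname => $nickname }, $class;
}

sub send_public {
    my ($self, $text) = @_;
    return "$self->{nickname} says: $text";
}

package main;

# Look up the method name for an event code and invoke it on the user.
sub handle_event {
    my ($user, $code, $data) = @_;
    my $method = $DISPATCH{$code}
        or return "ERROR: unrecognized event code $code";
    return $user->$method($data);    # string used as a method name
}
```

Because the table stores names and not code references, a subclass of the user class can override send_public() and the dispatch table picks up the override without changes.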
Lines 40–45: Handle logins—The do_login() subroutine is called to handle new user regis-
tration. It receives the peer's packed address, the ChatObjects::Comm object, and the
LOGIN_REQ event data, which contains the nickname that the user desires to register under.
It is certainly possible for two users to request the same nickname. We check for this eventuality
by calling the ChatObjects::User class method lookup_byname(). If there is already a user
registered under this name, then we issue an error. Otherwise, we invoke ChatObjects::User->new()
to create a new user object.
The set of enrolled channels is implemented as an array. Although the user may belong to multiple
channels, one of those channels is special because it receives all public messages that the user
sends out. In this implementation, the current channel is the first element in the array; it is always
the channel that the user subscribed to most recently.
Lines 1–4: Bring in required modules—The module turns on strict checking (use strict) and brings in
the ChatObjects::ChatCodes and Socket modules.
Lines 5–6: Overload the quote operator—One of Perl's nicer features is the ability to overload
certain operators so that a method call is invoked automatically. In the case of the ChatObjects::User
class, it would be nice if the object were replaced with the user's nickname whenever
the object is used in a string context. This would allow the string "Your name is $user" to
interpolate automatically to "Your name is rufus" rather than to "Your name is
ChatObjects::User=HASH(0x82b81b0)."
We use the overload pragma to implement this feature, telling Perl to interpolate the object
into double-quoted strings by calling its nickname() method and to fall back to the default
behavior for all other operators.
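A minimal sketch of this use of the overload pragma follows. NamedUser is a hypothetical stand-in for ChatObjects::User with only the pieces needed to show string overloading.

```perl
use strict;
use warnings;

package NamedUser;    # stand-in for ChatObjects::User

# Stringify by calling nickname(); fall back to Perl's default
# behavior for every other operator.
use overload
    '""'     => 'nickname',
    fallback => 1;

sub new {
    my ($class, $nickname) = @_;
    return bless { nickname => $nickname }, $class;
}

sub nickname { $_[0]{nickname} }

package main;

my $user = NamedUser->new('rufus');
print "Your name is $user\n";    # prints "Your name is rufus"
```

The fallback => 1 setting also lets string operators such as eq and concatenation work on the object by way of the same conversion.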
Lines 7–9: Set up package globals—The module needs to look up registered users in two ways:
by their nicknames and by the addresses of their clients. Two in-memory globals keep track of
users. The %NICKNAMES hash indexes the user objects by the users' nicknames.
%ADDRESSES, in contrast, indexes the objects by the packed addresses of their clients. Initially
these hashes are empty.
Lines 10–22: The new() method—The new() method creates new ChatObjects::User objects.
It is passed three arguments: the packed address of the user's client, the user's nickname, and
a ChatObjects::Comm object to use in sending messages to the user. We store these attributes
into a blessed hash, along with a record of the user's login time and an empty anonymous array.
This array will eventually contain the list of channels that the user belongs to.
Having created the object, we invoke the server object's send_event() method to return a
LOGIN_ACK message to the user, being sure to use the three-argument form of
send_event() so that the message goes to the correct client. We then stash the new object
into the %NICKNAMES and %ADDRESSES hashes and return the object to the caller.
There turns out to be a slight trick required to make the %ADDRESSES hash work properly. Oc-
casionally Perl's recv() call returns a packed socket address that contains extraneous junk in
the unused fields of the underlying C data structure. This junk is ignored by the send() call and
is discarded when sockaddr_in() is used to unpack the address into its port and IP address
components.
The problem arises when comparing two addresses returned by recv() for equality, because
differences in the junk data may cause the addresses to appear to be different, when in fact they
share the same port numbers and IP addresses. To avoid this issue, we call a utility subroutine
named key(), which turns the packed address into a reliable key containing the port number
and IP address.
Lines 23–32: Look up objects by name and address—The lookup_byname() and
lookup_byaddr() methods are class methods that are called to retrieve ChatObjects::User
objects based on the nickname of the user and her client's address, respectively. These methods
work by indexing into %NICKNAMES and %ADDRESSES. For the reasons already explained, we
must pass the packed address to key() in order to turn it into a reliable value that can be used
for indexing. The users() method returns a list of all currently logged-in users.
Lines 33–38: Various accessors—The next block of code provides access to user data. The
address(), nickname(), timeon(), and channels() methods return the user's address,
nickname, login time, and channel set. current_channel() returns the channel that the user
subscribed to most recently.
Lines 39–43: Send an event to the user—The ChatObjects::User send() method is a convenience
method that accepts an event code and the event data and passes them to the ChatObjects::Comm
object's send_event() method. The third argument to send_event() is the user's
stored address to be used as the destination for the datagram that carries the event.
Lines 44–50: Handle user logout—When the user logs out, the logout() method is invoked.
This method removes the user from all subscribed channels and then deletes the object from
the %NICKNAMES and %ADDRESSES hashes. These actions remove all memory references to
the object and cause Perl to destroy the object and reclaim its space.
Lines 51–65: The join() method—The join() method is invoked when the user has re-
quested to join a channel. It is passed the title of the channel.
The join() method begins by looking up the selected channel object using the ChatObjects::Channel
lookup() method. If no channel with the indicated name is identified, we issue
an error event by calling our send() method. Otherwise, we call our channels() method to
retrieve the current list of channels that the user is enrolled in. If we are not already enrolled in
the channel, we call the channel object's add() method to notify other users that we are joining
the channel. If we already belong to the channel, we delete it from its current position in the
channels array so that it will be moved to the top of the list in the next part of the code. We make
the channel object current by making it the first element of the channels array, and send the
client a JOIN_ACK event.
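The array manipulation at the heart of join() can be sketched on its own. The helper name make_current() and the use of plain title strings in place of channel objects are assumptions for the example; notifying other users and sending JOIN_ACK are left out.

```perl
use strict;
use warnings;

# Make $title the current channel: remove any existing entry for it,
# then put it at the front of the list. Returns true if the user was
# not previously enrolled (the case in which other members are notified).
sub make_current {
    my ($channels, $title) = @_;    # $channels is an array reference
    my $was_enrolled = grep { $_ eq $title } @$channels;
    @$channels = grep { $_ ne $title } @$channels;
    unshift @$channels, $title;
    return !$was_enrolled;
}
```

Rejoining a channel the user already belongs to therefore only reorders the list, which is why a duplicated JOIN_REQ is harmless.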
Lines 66–80: The part() method—The part() method is called when a user is departing a
channel; it is similar to join() in structure and calling conventions.
If the user indeed belongs to the selected channel, we call the corresponding channel object's
remove() method to notify other users that the user is leaving. We then remove the channel
from the channels array and send the user a PART_ACK event. The removed channel may have
been the current channel, in which case we issue a JOIN_ACK for the new current channel, if
any.
Lines 81–89: Send a public message—The send_public() method handles the SEND_PUBLIC
event. It takes a line of text, looks up the current channel, and calls the channel's
message() method. If there is no current channel, indicating that the user is not enrolled in any
channel, then we return an error message.
Lines 90–101: Send a private message—The send_private() method handles a request to
send a private message to a user. We receive the data from a SEND_PRIVATE event and parse
it into the recipient's nickname and the message text. We then call our lookup_byname()
method to turn the nickname into a user object. If no one by that name is registered, we issue
an error message. Otherwise, we call the user object's send() method to transmit a PRIVATE_MSG
event directly to the user.
This method takes advantage of the fact that user objects call nickname() automatically when
interpolated into strings. This is the result of overloading the double-quote operator at the be-
ginning of the module.
Lines 102–111: List users enrolled in the current channel—The list_users() method gen-
erates and transmits a series of USER_ITEM events to the client. Each event contains informa-
tion about users enrolled in the current channel (including the present user).
We begin by recovering the current channel. If none is defined (because the user is enrolled in
no channels at all), we send an ERROR event. Otherwise, we retrieve all the users on the current
channel by calling its users() method and, for each one, transmit a USER_ITEM event containing the user's
nickname, the length of time the user has been registered with the system (measured in seconds),
and a space-delimited list of the channels the user is enrolled in.
Like the user class, ChatObjects::Channel overloads the double-quote operator so that its
title() method is called when the object is interpolated into double-quoted strings. This allows
us to use the object reference directly in the data passed to send().
Lines 112–115: List channels—list_channels() returns a list of the available channels by
sending the user a series of CHANNEL_ITEM events. It calls the ChatObjects::Channel class's
channels() method to retrieve the list of all channels, and incorporates each channel into a
CHANNEL_ITEM event. The event contains the information returned by the channel object's
info() method. In the current implementation, this consists of the channel title, the number of
enrolled users, and the human-readable description of the channel.
Lines 116–118: Turn a packed client address into a hash key—As previously explained, the
system recv() call can return random junk in the unused parts of the socket address structure,
complicating the comparison of client addresses. The key() method normalizes the address
into a string suitable for use as a hash key by unpacking the address with sockaddr_in() and
then rejoining the host address and port with a ":" character. Two packets sent from the same
host and socket will have identical keys.
Because we have a method named join(), we must qualify the built-in function of the same
name as CORE::join() in order to avoid the ambiguity.
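The normalization just described can be sketched as follows. This is an illustration written as a plain function rather than a method, and converting the host to dotted-quad form with inet_ntoa() is our assumption; the module's own key() may join the raw packed host instead.

```perl
use strict;
use warnings;
use Socket qw(sockaddr_in inet_ntoa);

# Sketch of a key()-style normalizer: reduce a packed socket address
# to "host:port" so that padding bytes cannot affect comparisons.
sub key {
    my $packed_addr = shift;
    my ($port, $host) = sockaddr_in($packed_addr);   # list context unpacks
    # CORE::join names the built-in even if the package defines its own join()
    return CORE::join(':', inet_ntoa($host), $port);
}
```

Only the port and host fields survive the round trip, so two addresses that differ only in their unused bytes produce the same key.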
Lines 1–3: Bring in modules—The module begins by loading the ChatObjects::User and
ChatObjects::ChatCodes modules.
Modifications to ChatObjects::ChatCodes
The modifications to ChatObjects::ChatCodes are minimal. We simply define a new STILL_HERE
constant and add it to the @EXPORT list:
@EXPORT = qw(
ERROR
LOGIN_REQ LOGIN_ACK
...
STILL_HERE
);
...
use constant USER_ITEM => 190;
use constant STILL_HERE => 200;
1;
Summary
The UDP protocol is ideal for lightweight message-oriented servers that do not require a high degree
of reliability. The Internet chat system described in this chapter is a good example of such an ap-
plication.
Although the chat system is fully functional, it lacks many features. For one thing, the system doesn't
provide a way to notify you when a specific user logs in to the system (called a "hot list" in some
systems). This feature would be straightforward to add. Another deficiency of the system is that it
doesn't provide anything in the way of long-term user registration and authentication. Anyone can
log in using any nickname, and as soon as the system is killed and restarted, all information on
registered users is lost. The only consistency check performed by the system is to prevent two
concurrent users from choosing the same nickname.
To support user authentication and persistent registration, you would have to add some sort of
database backend to the system. Implementations could range in complexity from simple DBM files
to sophisticated relational databases.
Last, several real-world chat systems provide Internet "relay" functionality. Instead of burdening a
single chat server with the responsibility of managing all registered users, relay systems distribute
the load among multiple servers. Messages and other events posted to one server are relayed to
the other servers so that they can broadcast the event to their users. You could add this feature to
the current implementation by having each server log in to the other servers as if it were a client.
When a server receives an event from another server, it simply relays it to all its users, which might
include a mixture of users and other servers. However, you'd have to write code to prevent events
being ping-ponged in a never-ending loop.
Another way to reduce the burden on the chat server is to replace the current user-at-a-time method
of sending events to a channel's enrollees with a system that sends the event to all enrollees with
a single system call. This is the topic of Chapter 21.
Chapter 20. Broadcasting
In this chapter we look at one of the advanced features of the UDP protocol—its ability to address
messages to more than one recipient via broadcasting. This chapter introduces this technology and
develops a tool that makes it easier to work with from Perl. We end by enhancing the Internet chat
client from Chapter 19 to allow it to locate a server at runtime using broadcasts.
Consider an application in which information must be sent to many clients simultaneously. An In-
ternet teleconferencing system is one example. Another is a server that sends out periodic time
synchronization signals. You could implement such a system using conventional network protocols
in a couple of ways:
1. Using TCP —Accept incoming connections from clients that wish to subscribe to the service,
and create a connected socket for each one. Call syswrite() on each socket every time
you need to send information.
2. Using UDP —Accept incoming messages from clients and add each client's IP address and
port number to a list of subscribers. Each time we want to send information, we iterate over
each client's destination address and call send() on the socket, just like we did in the chat
server in Chapter 19 (Figure 19.5).
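The UDP variant of this approach can be sketched as below. The %subscribers table and the subscribe() and send_to_all() helpers are hypothetical names chosen for illustration; they are not taken from the Chapter 19 server.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(sockaddr_in inet_aton);

# Hypothetical subscriber table: "host:port" => packed socket address.
my %subscribers;

sub subscribe {
    my ($host, $port) = @_;
    # In scalar context sockaddr_in() packs a destination address.
    $subscribers{"$host:$port"} = sockaddr_in($port, inet_aton($host));
}

# Unicast fan-out: one send() call per subscriber.
sub send_to_all {
    my ($sock, $msg) = @_;
    my $sent = 0;
    for my $dest (values %subscribers) {
        $sent++ if defined $sock->send($msg, 0, $dest);
    }
    return $sent;
}
```

The cost of each message grows with the number of subscribers, which is the inefficiency that broadcasting is meant to avoid.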
Both these solutions are known as "unicasting" because each transmitted message is addressed
to a single destination. To send identical messages to more than one destination, we have to call
syswrite() or send() multiple times. Although unicast approaches are effective in many cases,
they have a number of disadvantages:
1. Unicast is inefficient for large networks. —In unicast applications, it may be necessary to
transmit multiple copies of the same information across the local area network and its routers.
In a video-streaming application, for example, the same frame of video may have to be re-
transmitted thousands of times.
2. The destination must be known in advance. —By definition, to send a unicast message the
sender must know the address of the recipient. However, there are a handful of cases in which
it is impossible to know the recipient's address in advance. For example, in the Dynamic Host
Configuration Protocol (DHCP), a newly booted computer must contact a server to obtain its
name and IP address. However, in a classic chicken-and-egg problem, the client doesn't know
the server's IP address in advance, and the server can't send a unicast message back to it
unless it has an IP address.
3. Unicast doesn't allow anonymity. —A corollary of (2) is that a host receiving unicast messages
can't be anonymous. The peer needs its socket and IP address in order to get messages to
it. However, there are many applications, including the video-streaming application that we
have been discussing, in which it is neither necessary nor desirable for the server to know
which clients are receiving the video stream.
Broadcasting and multicasting (the next chapter's topic) break out of the unicast paradigm by al-
lowing a single message transmitted by a host to be delivered to multiple addresses. The server
does not need to maintain multiple sockets or to call send() or syswrite() several times. Each
message is placed on the local area network only once, and distributed to the other machines on
the LAN in a way designed to minimize the burden on the network.
Furthermore, broadcasting and multicasting allow "resource discovery," a process that allows one
host to contact another without knowing its address in advance. This same feature enables anon-
ymous listeners to receive messages without making their presence known to the sender.
Broadcasting Explained
Broadcasting is an old technology that dates back to the earliest versions of TCP/IP. It is a nonse-
lective form of the UDP protocol in which messages placed on the local subnet are received and
processed by each host on the network. Because broadcasting is gregarious, it is strictly limited to
the local subnet. Unless deliberately configured otherwise, routers refuse to forward broadcast
packets across subnet boundaries.
Broadcasting is implemented using a special IP address known as the "broadcast address." As
explained in Chapter 3, the broadcast address is an IP address whose host part is replaced by all
ones. For example, for the host 192.168.3.124 on a class C network, the host part of the address
is the last byte, making its broadcast address 192.168.3.255. Strictly speaking, this is known as
the "subnet directed broadcast address," because the address is specific to the subnetwork. There
are several other types of broadcast addresses, only one of which is still in regular use: the
"all-ones" broadcast address, 255.255.255.255. We will discuss this address later.
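The "host part replaced by all ones" rule can be worked through in a few lines of Perl. This is a sketch that assumes you already know the netmask; broadcast_for() is a hypothetical helper name.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Subnet-directed broadcast address = address OR (inverted netmask).
sub broadcast_for {
    my ($ip, $netmask) = @_;
    my $addr = unpack('N', inet_aton($ip));
    my $mask = unpack('N', inet_aton($netmask));
    my $host_bits = ~$mask & 0xFFFFFFFF;      # ones wherever the host part lives
    return inet_ntoa(pack('N', $addr | $host_bits));
}

print broadcast_for('192.168.3.124', '255.255.255.0'), "\n";   # prints 192.168.3.255
```

The same arithmetic applies to netmasks that do not fall on a byte boundary, where the host part is no longer simply the last byte.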
To broadcast a message, an application sends out a UDP datagram directed toward a network port
and the broadcast address for the network. The message will be distributed to all hosts on the local
network and picked up by any broadcast-capable network cards (Figure 20.1). The message is then
passed up to the operating system, which checks whether some process has bound to the port that
the message is addressed to. If there is such a socket, the message is handed off to the program
that owns it. Otherwise, the message is discarded.
Figure 20.1. Broadcast packets are received by all hosts on the local subnet and either passed to a listening application or discarded.
Broadcasting is indiscriminate. All broadcast-capable interface cards attached to the network
receive broadcast packets and pass them up to the operating system for processing. This is in
contrast to unicast packets addressed to other hosts, which are ordinarily filtered by the card and
never reach the operating system. Thus, excessive use of broadcasting can have a performance
impact on all locally connected hosts because it forces the operating system to examine and
dispose of each irrelevant packet.
Broadcast Applications
Despite its limitations, broadcasting is extremely useful. Network broadcasts are used in the fol-
lowing categories of application:
• Resource Discovery—Broadcasts are frequently used when you know that there is a server out
there somewhere but you don't know its IP address in advance. For example, DHCP uses
broadcasts to locate a DHCP server and to retrieve network configuration information for a client
that is booting. Similarly, Network Information System (NIS) clients use broadcasts to locate
an appropriate NIS server on the local network.
• Route Information—Routers must exchange information in order to maintain their internal routing
tables in a consistent state. Some routing protocols use periodic broadcasts to advertise routes
and to advise other routers of changes in the network topology.
• Time Information—The Network Time Protocol (NTP) can be configured so that a central time
server periodically broadcasts the time across the LAN. This allows interested hosts to syn-
chronize their internal clocks to the millisecond.
Broadcasting is a core part of the IPv4 protocol and is available on any operating system that sup-
ports TCP/IP networking.
A total of 26 hosts responded to the single ping packet, including the machine I was pinging from.
The machines that replied to the ping include Windows 98 laptops, a laser printer, some Linux
workstations, and servers from Sun and Compaq. Every machine on the subnet received the ping
packet, and each responded as if it had been pinged individually. Although one of the machines that
responded was a router (143.48.31.254), it did not forward the broadcast. We did not see replies
from machines outside the subnet.
Interestingly, the host I ran the ping on did not respond via its network interface of 143.48.31.42,
but on its loopback interface, 127.0.0.1. This illustrates the fact that the operating system is free to
choose the most efficient route to a destination and is not limited to responding to messages on the
same interface that it received them on.
Sending Broadcasts
There are four simple steps to sending broadcast packets:
1. Create a UDP socket. —Create a UDP socket in the normal way, either by using Perl's built-
in socket() function or with the IO::Socket module.
2. Set the socket's SO_BROADCAST option. —The designers of the socket API wanted to add
some protection against programs inadvertently transmitting to the broadcast address, so they
required that the SO_BROADCAST socket option be set to true before a socket can be used for
broadcasting. Use either the built-in setsockopt() call or the IO::Socket unified
sockopt() method.
3. Discover the broadcast address for your subnet (optional). —The broadcast address is dif-
ferent from location to location. You could just hard code the appropriate address for your
subnet (or ask the user to enter it at runtime). For portability, however, you might want to
discover the appropriate broadcast address programmatically. We discuss how to do this later.
4. Call send() to send data to the broadcast address. —Use sockaddr_in() to create a
packed destination address with the broadcast address and the port of your choosing. Pass
the packed address to send() to broadcast the message throughout the subnet.
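The steps above can be sketched with IO::Socket as follows. The helper names open_broadcast_socket() and broadcast_message() are hypothetical, and step 3 is skipped on the assumption that the caller supplies the broadcast address.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(SO_BROADCAST sockaddr_in inet_aton);

# Steps 1 and 2: create a UDP socket and enable broadcasting on it.
sub open_broadcast_socket {
    my $sock = IO::Socket::INET->new(Proto => 'udp')
        or die "can't create UDP socket: $@";
    $sock->sockopt(SO_BROADCAST, 1)
        or die "can't set SO_BROADCAST: $!";
    return $sock;
}

# Step 4: pack the destination and send one datagram to it.
sub broadcast_message {
    my ($sock, $bcast_addr, $port, $msg) = @_;
    my $dest = sockaddr_in($port, inet_aton($bcast_addr));  # scalar context packs
    return $sock->send($msg, 0, $dest);
}

# Example call (substitute the broadcast address of your own subnet):
#   my $sock = open_broadcast_socket();
#   broadcast_message($sock, '143.48.31.255', 7, "hi there\n")
#       or warn "send failed: $!";
```

Without the SO_BROADCAST step, many systems reject the send() to a broadcast address with a permission error, which is the protection the API designers intended.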
Figure 20.2 shows a simple echo client based on the multiplexing client from Chapter 18. It reads
user input from STDIN and broadcasts the data to a hard-coded broadcast address. As responses
come in, it prints the IP address and port number of each respondent and the length of the data
received back.
% broadcast_echo_cli.pl 143.48.31.255
hi there
received 9 bytes from 143.48.31.42:7
received 9 bytes from 143.48.31.36:7
received 9 bytes from 143.48.31.34:7
received 9 bytes from 143.48.31.32:7
received 9 bytes from 143.48.31.40:7
received 9 bytes from 143.48.31.60:7
received 9 bytes from 143.48.31.33:7
received 9 bytes from 143.48.31.31:7
received 9 bytes from 143.48.31.39:7
received 9 bytes from 143.48.31.35:7
received 9 bytes from 143.48.31.38:7
received 9 bytes from 143.48.31.37:7
this works
received 11 bytes from 143.48.31.42:7
received 11 bytes from 143.48.31.34:7
received 11 bytes from 143.48.31.32:7
received 11 bytes from 143.48.31.36:7
received 11 bytes from 143.48.31.35:7
received 11 bytes from 143.48.31.33:7
received 11 bytes from 143.48.31.31:7
received 11 bytes from 143.48.31.38:7
received 11 bytes from 143.48.31.37:7
received 11 bytes from 143.48.31.39:7
received 11 bytes from 143.48.31.40:7
received 11 bytes from 143.48.31.60:7
If you run this example program, replace the address on the command line with the broadcast
address suitable for your network. Each time the client broadcasts a message, it receives a dozen
responses, each corresponding to an echo server running on a machine in the local subnet. As it
happens, the machine that I ran the client program on (143.48.31.42) also runs an echo server, so
it is also one of the machines to respond. Broadcast packets always loop back in this way.
The echo service is commonly active on UNIX systems, and in fact all the responses seen here
correspond to various UNIX and Linux hosts on my office network. The Windows machines and the
laser printer that responded to the ping test do not run the echo server, so they didn't respond.
Receiving Broadcasts
In contrast to sending broadcast messages, you do not need to do anything special to receive them.
Any of the UDP servers used as examples in this book, including the earliest ones from Chapter
18, respond to messages directed to the broadcast address. In fact, without resorting to very-low-
level tricks, it is impossible to distinguish between UDP messages directed to your program via the
broadcast address and those directed to its unicast address.
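To make the point concrete, here is a minimal sketch of a UDP echo routine with no broadcast-specific code at all. The names echo_once() and run_echo_server() are hypothetical; this is not one of the servers from Chapter 18.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Receive one datagram and echo it back to whoever sent it.
# The same code path serves unicast and broadcast senders.
sub echo_once {
    my $sock = shift;
    my $data;
    my $peer = $sock->recv($data, 65535);   # returns the sender's packed address
    return unless defined $peer;
    defined $sock->send($data, 0, $peer) or return;
    return length $data;
}

# Hypothetical server loop: bind to a port on all local addresses
# and echo until an error occurs.
sub run_echo_server {
    my $port = shift;
    my $sock = IO::Socket::INET->new(Proto => 'udp', LocalPort => $port)
        or die "can't bind to port $port: $@";
    1 while defined echo_once($sock);
}
```

Because the reply goes to the address returned by recv(), the broadcaster receives an ordinary unicast response from each host that answers.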
A brief example will make this clearer. Say we already know that we have an Ethernet interface
named tu0 (this corresponds to a "Tulip" 100bT Ethernet interface on a Digital Tru64 UNIX box).
We can fetch its broadcast address with the following fragment of code:
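my $ifreq = pack('Z16 x16','tu0');          # interface name plus room for the result
ioctl($sock,SIOCGIFBRDADDR,$ifreq) or die "ioctl: $!";
# fields: interface name, address family, (2 bytes skipped), IPv4 address
my ($name,$family,$addr) = unpack('Z16 s x2 a4',$ifreq);
print "broadcast = ",inet_ntoa($addr),"\n";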
We pack the name of the interface card into an ifreq structure, and pass it to ioctl() using the
SIOCGIFBRDADDR function code. ioctl() returns its result in $ifreq, which we unpack and
display. In order for this to work, we need to know the value of the SIOCGIFBRDADDR constant and
the magic formats to use for packing and unpacking the ifreq structure. We discuss the source of
this information in the next section.
Table 20.2. ioctl() Function Codes for Fetching Interface Information

Code             Argument   Description
SIOCGIFCONF      ifconf     Fetch list of interfaces.
SIOCGIFADDR      ifreq      Get IP address of interface.
SIOCGIFBRDADDR   ifreq      Get broadcast address of interface.
SIOCGIFNETMASK   ifreq      Get netmask of interface.
SIOCGIFDSTADDR   ifreq      Get destination address of a point-to-point interface.
SIOCGIFHWADDR    ifreq      Get hardware address of interface.
SIOCGIFFLAGS     ifreq      Get attributes of interface.
You can pass any open socket to the interface-related ioctl calls, even one that you created for
another purpose. In a typical broadcast application, you would create an unconnected UDP socket,
query it for the broadcast addresses, and then call send() on the socket to initiate the broadcast.
@interface_names = $socket->if_list
Returns the list of interface names as an array of strings. Only interfaces whose drivers are loaded will be
returned.
$flags = $socket->if_flags($interface_name)
Returns the flags for the interface named by $interface_name. The flags are a bitmask of various attributes
that indicate, among other things, whether the interface is running and whether it is broadcast capable. Table
20.1 lists the most common flags.
$addr = $socket->if_addr($interface_name)
Returns the (unicast) IP address for the specified interface in dotted-quad form.
$broadcast_addr = $socket->if_broadcast($interface_name)
Returns the broadcast address for the specified interface in dotted-quad form. For interfaces that can't
broadcast, returns undef.
$netmask = $socket->if_netmask($interface_name)
Returns the netmask for the specified interface.
$dstaddr = $socket->if_dstaddr($interface_name)
For point-to-point interfaces, such as those using SLIP or PPP, returns the IP address of the remote end of
the connection. For interfaces that are not point-to-point, returns undef.
$hwaddr = $socket->if_hwaddr($interface_name)
Returns the interface's 6-byte Ethernet hardware address in the following form: aa:bb:cc:dd:ee:ff. Many
operating systems do not support the underlying ioctl() function code, in which case if_hwaddr() returns
undef. Also returns undef for non-Ethernet interfaces.
Loading IO::Interface with the import tag :flags imports a set of constants to use with the bitmask
returned by if_flags(). You can AND these constants with the flags in order to discover whether
an interface supports a particular attribute.
This fully functional example shows how to discover whether the Ethernet interface tu0 is up and
running:
#!/usr/bin/perl
use IO::Socket;
use IO::Interface ':flags';
my $socket = IO::Socket::INET->new(Proto=>'udp') or die;
my $flags = $socket->if_flags('tu0') or die;
print $flags & IFF_UP ? "Interface is up\n" : "Interface is down\n";
And here is our desired function to determine the host's subnet-directed broadcast address(es) at
runtime. It takes a socket as argument, calls if_list() to get the list of all interfaces, queries each
one in turn to find those that are broadcast capable, and then calls if_broadcast() to get the
address itself. The function returns a list of all valid broadcast addresses in dotted-quad form.
sub get_broadcast_addr {
my $sock = shift;
my @baddr;
for my $if ($sock->if_list) {
next unless $sock->if_flags($if) & IFF_BROADCAST;
push @baddr,$sock->if_broadcast($if);
}
return @baddr;
}
Another feature of IO::Interface allows you to use it in a function-oriented fashion. If you load
IO::Interface with the import tag :functions, it imports the methods just described into the caller
in the form of function calls. This allows you to use the calls with ordinary socket handles if you prefer:
use Socket;
use IO::Interface ':functions';
socket(SOCK,AF_INET,SOCK_DGRAM,scalar getprotobyname('udp')) or die "socket: $!";
@interfaces = if_list(\*SOCK);
IO::Interface Walkthrough
Before we walk through IO::Interface, there is a major caveat. The ioctl() function codes vary
tremendously from operating system to operating system and are variously defined in the system
header files net/if.h, sys/socket.h, and sys/sockio.h. Before you can use IO::Interface, you must
convert the system header files into Perl .ph files using the h2ph tool described in Chapter 17
(Implementing sockatmark()). However, as you recall, h2ph is far from perfect and the generated
files usually need hand tweaking before they will compile and load correctly.1
As a practical alternative to this implementation of IO::Interface, I strongly recommend using a C-
language extension by the same name that I developed during the course of researching this chap-
ter. Provided that your operating system has a C or C++ compiler, you can download this module
from CPAN and install it with little trouble. In addition to providing all the functionality of the pure-
Perl implementation, the C extension has the ability to change interface settings. For example, you
can use the module to change the IP address assigned to an Ethernet card. You will find this module
on CPAN.2
Nevertheless, it is educational to walk through the pure-Perl version of IO::Interface to get a feel for
how to write an interface to a fairly low-level part of the operating system. Figure 20.3 shows the
code for the module.
1 In one case, I had to comment out a subroutine inexplicably named __foo_bar() in order to get the .ph file to load; in another, I deleted
several functions that appeared to be defined in terms of themselves!
2 Another CPAN module, Net::Interface, also provides this functionality, but does not seem to be maintained and won't compile under recent
versions of Perl.
Lines 1–21: Set up the module—The first third of the module is Perl paperwork. We bring in the
Exporter module, declare exported variables, and create the module's export tags. The only non-
boilerplate part of the module is this line:
use Config;
which brings in Perl's Config module. This module exports a hash named %Config, which contains
information on a variety of architecture-specific data types, including the size of pointers
and integers. We need this information to work out the formats used to pack and unpack the data
passed to the interface ioctls.
Lines 22–28: Bring in socket and interface libraries—We load the Socket library and import its
inet_ntoa() function. The reason for using require rather than use to load the module is
to avoid a number of irritating warnings that result from prototype conflicts between constants
defined in the .ph files and the same constants loaded from Socket.
require Socket;
Socket->import('inet_ntoa');
We now load the .ph files that contain constants for dealing with network interfaces. The net/
if.ph file contains definitions for the data structures used by the calls to ioctl(). We need it
chiefly to get at constants that determine the size of these structures. Next we load the sys/
ioctl.ph and sys/sockio.ph files. On some systems, the interface function codes are all defined
in the first file, but on others, you must load both files. We load the first file, check to see whether
we've gotten the SIOCGIFCONF function code; if not, we proceed to load the second.
require "net/if.ph";
require "sys/ioctl.ph";
require "sys/sockio.ph" unless defined &SIOCGIFCONF;
We now take advantage of a little-known feature of Perl's .ph system. Many ioctl() function
codes have the size of the data structure they operate on embedded in them. When the C compiler
evaluates the include files, it is able to determine the size of these structures at compile time
and generate the correct constants; but Perl knows nothing about C data structures and needs
some help from the programmer to tell it their sizes.
This is what the %sizeof hash is for. Whenever a .ph file needs the size of a data structure, it
indexes into this hash. For example, it looks up $sizeof{'int'} when it needs the size of an
integer, and $sizeof{'struct ifreq'} to fetch the size of the ifreq structure. To get the
right values for SIOCGIFCONF and friends, we must set up %sizeof before calling any of the
ioctl() function codes. This is done here:
%sizeof = ('struct ifconf' => 2 * $Config{ptrsize},
'struct ifreq' => 2 * IFNAMSIZ);
As it happens, there are only two C data structures that we need to worry about: the ifreq
structure, which contains information about a particular interface, and the ifconf structure,
which is used to fetch the list of all running interfaces. The ifconf structure is the simpler of
the two. It consists of an integer and a pointer. The pointer designates a region of memory to
receive the list of interface names, and the integer indicates the size of the region.
The sizes of integers and pointers vary from architecture to architecture, but we can determine
them at runtime using Perl's %Config hash. Naively, you might guess the size of struct
ifconf to be the size of an integer plus the size of a pointer—but you'd be wrong on some
occasions. Most architectures have alignment constraints that force pointers to begin at memory
locations that are even multiples of the pointer size. If the integer and pointer sizes are not the
same (as is the case on some 64-bit systems), the C compiler will place padding after the integer
in order to align the pointer at its natural boundary. This means that the size of the ifconf
structure ends up being two pointers' worth, or 2 * $Config{ptrsize}.
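The alignment argument can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the module: it assumes the only alignment rule in play is the one just described (a pointer starts on a multiple of its own size) and uses the standard intsize and ptrsize entries of %Config.

```perl
use strict;
use warnings;
use Config;

my ($intsize, $ptrsize) = @Config{qw(intsize ptrsize)};

# Padding needed after the int so that the pointer starts on a
# multiple of its own size (the alignment rule described above).
my $padding = ($ptrsize - $intsize % $ptrsize) % $ptrsize;
my $naive   = $intsize + $ptrsize;
my $actual  = $intsize + $padding + $ptrsize;

print "naive guess: $naive bytes, with padding: $actual bytes\n";
```

On a machine where integers and pointers are the same size the two figures agree; where pointers are wider, only the padded figure equals 2 * $Config{ptrsize}.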
The ifreq structure is a C "union," meaning that the same region of memory is used in several
ways depending on the context. The first half of the data structure holds the name of the interface
and is defined to be IFNAMSIZ bytes long (16 bytes in most implementations). The second half
variously contains
• a socket address, consisting of a 2-byte address family and the 4-byte IP address separated
by 2 bytes of padding.
• an Ethernet hardware address, consisting of a 2-byte address family and up to 6 bytes of
hardware address.
• interface flags, consisting of 2 bytes of flag data.
• the name of a "slave" interface, used in various load-balancing schemes (that we won't get
into here). The slave name is again IFNAMSIZ bytes long.
A C union is as large as necessary to hold all its variants. In this case, this is IFNAMSIZ repeated
twice, or 2 * IFNAMSIZ. With the %sizeof hash initialized correctly, the ioctl() function
codes evaluate to their proper values.
As an aside, it took me several iterations and several small test C programs to figure all this out.
Although I was happy to get it to work, the fact that it required an intimate knowledge of the
internal workings of the C compiler is disappointing.
Lines 29–34: Define pack() and unpack() formats for the ioctls—We will be moving data in
and out of the ifreq structure using the pack() and unpack() functions. We now define
formats for each of the ifreq variants we will use. We turn the IFNAMSIZ and IFHWADDRLEN
constants into variables that we can use in double-quoted strings. Not all operating systems
define IFHWADDRLEN, in which case we default to the size of the Ethernet hardware address.
IFREQ_NAME is used for packing the interface name into the structure. It consists of a string
IFNAMSIZ bytes long. If the string doesn't fill the available space, it is null-padded using the Z
format. The bottom half of the data structure is initialized with IFNAMSIZ bytes of nulls using
the x format.
IFREQ_ADDR is used to retrieve interface IP addresses of various types. It consists of the
interface name, a 2-byte integer containing the address family, 2 bytes of padding, and a 4-byte
character string corresponding to the IP address.
IFREQ_ETHER is used to unpack the Ethernet address. In this variant, ifreq contains the
interface name, a 2-byte integer containing the address family (which is usually AF_UNSPECIFIED),
and 6 unsigned bytes of address information.
IFREQ_FLAG is the simplest. It consists of the interface name followed by a short integer con-
taining the interface flags.
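To make the four layouts concrete, here is a sketch of format strings that match the descriptions above. They are reconstructed from the prose rather than copied from Figure 20.3, and the values 16 for IFNAMSIZ and 6 for IFHWADDRLEN are assumptions; the real module reads them from net/if.ph.

```perl
use strict;
use warnings;

# Assumed values; the real module takes them from net/if.ph.
my $IFNAMSIZ    = 16;
my $IFHWADDRLEN = 6;

my %format = (
    IFREQ_NAME  => "Z$IFNAMSIZ x$IFNAMSIZ",       # name, then nulls
    IFREQ_ADDR  => "Z$IFNAMSIZ s x2 a4",          # name, family, pad, IP
    IFREQ_ETHER => "Z$IFNAMSIZ s C$IFHWADDRLEN",  # name, family, MAC bytes
    IFREQ_FLAG  => "Z$IFNAMSIZ s",                # name, flags
);

# Pack an interface name the way a request would be prepared.
my $ifreq = pack($format{IFREQ_NAME}, 'eth0');
printf "request is %d bytes long\n", length $ifreq;

# The name comes back out with its null padding stripped.
my ($name) = unpack($format{IFREQ_NAME}, $ifreq);
print "interface name: $name\n";
```

The packed request is always 2 * IFNAMSIZ bytes long, whatever the length of the interface name, which is what the ioctl() calls expect under the sizing used in this chapter.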
Lines 35–38: Attach the IO::Interface methods to IO::Socket—The next bit of code exports the
various subroutines defined in IO::Interface to the IO::Socket namespace, turning them into
methods that can be used with IO::Socket objects. For each of the functions defined in the
@functions global, we do an assignment like this one:
*{"IO\:\:Socket\:\:if_addr"} = \&if_addr;
This idiom is an obscure way to create an alias in another module's namespace corresponding
to a function defined in the current one. It is documented, albeit scantily, in the perlsub and
perlref manual pages and is the basic operation performed by the Exporter module. We need to
do this for a whole list of functions, so we have a little loop that creates the namespace aliases
for each function in turn:
{
no strict 'refs';
*{"IO\:\:Socket\:\:$_"} = \&$_ foreach @functions;
}
no strict 'refs' temporarily turns off the checks that prevent this type of namespace ma-
nipulation.
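The aliasing idiom is easier to follow in isolation. The sketch below uses two hypothetical packages, Demo::Source and Demo::Target, in place of IO::Interface and IO::Socket, and a made-up greet() function; only the typeglob assignment itself is taken from the text.

```perl
use strict;
use warnings;

package Demo::Source;

# An ordinary function that expects an object as its first argument.
sub greet {
    my $self = shift;
    return "hello from " . ref($self);
}

our @functions = qw(greet);
{
    no strict 'refs';    # allow symbolic typeglob and code references
    # Install each function under the same name in Demo::Target.
    *{"Demo::Target::$_"} = \&$_ foreach @functions;
}

package main;

# Demo::Target never defined greet(), yet its objects can call it.
my $obj = bless {}, 'Demo::Target';
print $obj->greet, "\n";
```

Because the alias points at the same code reference, any later call through Demo::Target runs the function exactly as if it had been defined there.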
Lines 39–45: Get the interface address—At last we're ready to do real work. The if_addr()
function takes a socket and the name of an interface and returns the interface's IP address as
a dotted-quad string. We begin by creating a packed ifreq structure containing the name of
the requested interface. The IFREQ_NAME format packs the name into the first IFNAMSIZ bytes
of the structure, and zeroes out the rest of it. We now call ioctl() using the SIOCGIFADDR
command, passing the socket and the newly created ifreq structure. If the ioctl() fails for
some reason, we return undef.
Otherwise, ioctl() has returned the requested information in the ifreq structure. We unpack
$ifreq using the IFREQ_ADDR format. This returns the name of the interface, its address fam-
ily, and the address itself. We ignore the other values, but turn the address into dotted-quad
form using inet_ntoa() and return it to the caller.
Lines 46–77: Fetch broadcast address, destination address, hardware address, and netmask
—The next few functions are very similar, but instead of asking the operating system for the
address of the interface, they use different ioctl() function codes to fetch the broadcast ad-
dress, point-to-point destination address, hardware address, and netmask.
We perform a little bit of extra work in several of these routines in order to prevent the function
from returning the address 0.0.0.0, a behavior I discovered on some Linux systems when query-
ing interfaces that don't support a particular type of addressing (for example, asking the loopback
interface for its broadcast address). It's better to return undef when there is no address than to
return a nonsensical one.
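The unpacking step and the guard against 0.0.0.0 can be sketched without touching the operating system. In the example below the ioctl() results are simulated with pack(); decode_addr() is a hypothetical helper, and the format string assumes IFNAMSIZ is 16.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

my $IFREQ_ADDR = 'Z16 s x2 a4';    # assumes IFNAMSIZ == 16

# Hypothetical helper: turn a filled-in ifreq into a dotted quad,
# or undef when the kernel reported the all-zeros address.
sub decode_addr {
    my ($ifreq) = @_;
    my ($name, $family, $addr) = unpack($IFREQ_ADDR, $ifreq);
    return undef if $addr eq "\0\0\0\0";
    return inet_ntoa($addr);
}

# Simulated ioctl() results, built by hand for the demonstration.
my $eth0 = pack($IFREQ_ADDR, 'eth0', AF_INET, inet_aton('192.168.3.255'));
my $lo   = pack($IFREQ_ADDR, 'lo',   AF_INET, inet_aton('0.0.0.0'));

my $eth0_addr = decode_addr($eth0);
my $lo_addr   = decode_addr($lo);
print "eth0: $eth0_addr\n";
print "lo:   ", (defined $lo_addr ? $lo_addr : 'undef'), "\n";
```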
Lines 78–84: Return interface flags—The if_flags() function initializes an ifreq structure
with the name of the desired interface, and passes it to ioctl() with the SIOCGIFFLAGS
command. If successful, we unpack the result using the IFREQ_FLAG format and return the
flags to the caller.
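The value returned by if_flags() is a bit mask, so callers test it with the bitwise AND operator. The sketch below simulates a reply rather than calling ioctl(); the three flag values are the ones commonly found on Linux and the BSDs and are an assumption here, since the real module imports them from net/if.ph.

```perl
use strict;
use warnings;

# Assumed flag values (common on Linux and the BSDs).
use constant {
    IFF_UP        => 0x1,
    IFF_BROADCAST => 0x2,
    IFF_LOOPBACK  => 0x8,
};

# Simulated reply: interface name followed by a short of flag bits.
my $reply = pack('Z16 s', 'eth0', IFF_UP | IFF_BROADCAST);
my ($name, $flags) = unpack('Z16 s', $reply);

my @set;
push @set, 'UP'        if $flags & IFF_UP;
push @set, 'BROADCAST' if $flags & IFF_BROADCAST;
push @set, 'LOOPBACK'  if $flags & IFF_LOOPBACK;
print "$name: @set\n";
```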
Lines 85–100: Fetch list of all interfaces—The if_list() function, which returns all active
network interfaces, is the most complex of the bunch. We will create a packed ifconf structure
consisting of a pointer to a buffer and the buffer length. The buffer is initially empty (filled with
zeros) but will be populated after the ioctl() call with an array of ifreq structures, each
containing the name of a different interface.
We'll need to make a data structure large enough to hold as many interfaces as we're likely to see.
We create a local variable filled with zeroes that is large enough to hold information on 20 interfaces.
We need a format to pack the ifconf structure. Because of alignment constraints, this format will
be different on machines whose pointers are 32 bits and those with 64-bit architectures. In the first
case, the format is simply "ip", for an integer followed by a pointer. In the second case, the format
is "ix4p", for an integer, 4 bytes of padding, and a pointer.
We create the ifconf structure by invoking pack() with the buffer length and the buffer itself. The
"p" format takes care of incorporating the memory address of the buffer into the packed structure.
We now call ioctl() with the SIOCGIFCONF function code to populate the buffer with the interface
names. In case of failure, we return undef.
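The choice of format and the packing of ifconf can be tried on their own. This sketch stops short of the ioctl() call; the 32-byte entry size follows the 2 * IFNAMSIZ sizing used in this chapter with IFNAMSIZ assumed to be 16, and it assumes a 4-byte C int.

```perl
use strict;
use warnings;
use Config;

# Zero-filled buffer with room for 20 ifreq entries of 32 bytes each.
my $buffer = "\0" x (20 * 32);

# An int, optional padding, then a pointer to the buffer.
my $format = $Config{ptrsize} == 8 ? 'i x4 p' : 'i p';
my $ifconf = pack($format, length $buffer, $buffer);

printf "format '%s' gives a %d-byte ifconf\n", $format, length $ifconf;
```

Either way the packed structure comes out at two pointers' worth of bytes, matching the value stored in %sizeof for struct ifconf.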
Otherwise, we unpack $ifclist to recover the buffer size. This tells us how much of the buffer
the operating system used to store its results. We now step through the buffer, calling substr()
to extract one ifreq segment after another. For each segment, we unpack the interface name and
stuff it into a hash, using the IFREQ_NAME format defined earlier.
After the loop is finished, we return the sorted keys of the hash. I added this step to the process
after I discovered that some operating systems return the same interface multiple times in response
to the SIOCGIFCONF request. Stuffing the interface names into a hash forces the list to contain only
unique names.
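The buffer-walking loop and the use of a hash to remove duplicates can be sketched as follows. The buffer is built by hand to stand in for what SIOCGIFCONF would return, with one interface deliberately repeated; IFNAMSIZ is assumed to be 16 and each entry is taken to be 2 * IFNAMSIZ bytes, as elsewhere in this chapter.

```perl
use strict;
use warnings;

my $IFNAMSIZ   = 16;                          # assumed
my $IFREQ_NAME = "Z$IFNAMSIZ x$IFNAMSIZ";
my $IFREQ_SIZE = 2 * $IFNAMSIZ;

# Stand-in for the buffer SIOCGIFCONF would fill: one ifreq per
# interface, with "eth0" deliberately listed twice.
my $buffer = join '', map { pack($IFREQ_NAME, $_) } qw(lo eth0 eth0 eth1);
my $used   = length $buffer;    # the kernel reports this via ifconf

my %seen;
for (my $offset = 0; $offset < $used; $offset += $IFREQ_SIZE) {
    my $ifreq  = substr($buffer, $offset, $IFREQ_SIZE);
    my ($name) = unpack($IFREQ_NAME, $ifreq);
    $seen{$name}++;
}

my @interfaces = sort keys %seen;
print "@interfaces\n";
```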
Line 8: Load the IO::Interface module—We will use IO::Interface to derive the client's subnet-
directed broadcast address(es), so we load the module, importing the interface flag constants
at the same time.
Line 37: No default server address—In previous incarnations of this client, we defaulted to
localhost if the chat host was not specified on the command line. In this version, we assume no
default, using an empty string for the server name if none was specified on the command line.
Lines 40–41: Call find_server() to search for a server—If no server address was specified
on the command line, we call a new internal subroutine find_server() to locate one. If
find_server() returns undef, we die.
Lines 64–85: Find a server via broadcasts—All the interesting work is in the find_server()
subroutine. It begins by creating a new UDP socket. This socket happens to be distinct from the
one that will ultimately be used to communicate with the server, but there's no reason they can't
be the same. After creating the socket, we set its SO_BROADCAST option to a true value so that
we can broadcast over it.
We now look for network interfaces to broadcast on. We get the list of interfaces by calling the
socket's if_list() method and loop over them, looking for those that have the IFF_BROADCAST
option set in their interface flags. For each broadcast-capable interface, we fetch its broadcast
IP address, create a packed target address using the specified chat server port number, and send
a message to it.
It doesn't matter what message we send to the server, because we care only whether a server
responds at all. In this case, we send a message containing binary 0 in network order. Since this
corresponds to none of the chat messages defined in the ChatCodes package, we expect the server
to respond with a message code of ERROR. A more formal way to do this would be to define explicit
messages that client and server could exchange for this purpose, but that would have required
changes at the server end as well.
The client has broadcast the request to all its attached broadcast-capable interfaces, and now it
must wait for responses. We use IO::Select to wait for up to 3 seconds for incoming messages. If
no response is received before the timeout, we return undef. Otherwise, we read the first message,
unpack it, and see if it contains the expected ERROR code from the chat server (if not, it may indicate
that some other type of server is listening on the port). We now return the address of the sender by
calling sockaddr_in() to unpack the peer name returned from recv(), and inet_ntoa() to
turn the address into human-readable dotted-quad form.
If two or more chat servers received the broadcast, the client uses the first one to respond. The
responses sent by other servers are discarded along with the socket when it goes out of scope at
the end of the subroutine.
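The probe-and-wait logic can be exercised on a single machine. The sketch below is not the find_server() routine itself: it replaces the broadcast with a unicast datagram to a toy server on the loopback interface so that it runs anywhere, and the ERROR value of 255 is a hypothetical stand-in for the code defined in ChatCodes.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;
use Socket qw(sockaddr_in inet_aton inet_ntoa);

use constant ERROR => 255;    # hypothetical stand-in for the ChatCodes value

# A toy "server" on the loopback interface, on a port chosen by the OS.
my $server = IO::Socket::INET->new(Proto => 'udp', LocalAddr => '127.0.0.1')
    or die "server: $!";
my $client = IO::Socket::INET->new(Proto => 'udp') or die "client: $!";

# Client probe: a 16-bit zero in network order, as described above.
my $dest = sockaddr_in($server->sockport, inet_aton('127.0.0.1'));
send($client, pack('n', 0), 0, $dest) or die "send: $!";

# Server side: answer the unrecognized code with ERROR.
my $request = '';
my $peer = recv($server, $request, 1024, 0);
send($server, pack('n', ERROR), 0, $peer);

# Client side: wait up to 3 seconds, then decode the sender's address.
my $found;
if (IO::Select->new($client)->can_read(3)) {
    my $reply = '';
    my $from  = recv($client, $reply, 1024, 0);
    my ($code)      = unpack('n', $reply);
    my ($port, $ip) = sockaddr_in($from);
    $found = inet_ntoa($ip) if $code == ERROR;
}
print defined $found ? "Found a server at $found\n" : "No server found\n";
```

In the real client the send() is repeated once per broadcast-capable interface, and whichever server answers first is the one whose address is returned.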
When we run the modified chat client on a host that is attached to two networks, we see the client
send broadcast packets to both networks. After a short interval, the client receives a response from
a server on one of the networks and selects it. The remainder of the chat session proceeds as usual.
% broadcast_chat_client.pl
Broadcasting for a server on 192.168.3.255
Broadcasting for a server on 192.168.8.255
Found a server at 192.168.3.2
Your nickname: lincoln
trying to log in (1)...
Log in successful. Welcome lincoln.
Summary
Broadcasting is a powerful technique for discovering resources on the local area network. Sending
broadcasts is simple, provided that you know the correct subnet-directed broadcast IP address to
use. If not, you can determine it at runtime using the IO::Interface module (either the pure-Perl
version developed here or the C extension module available from CPAN).
Receiving broadcasts is even easier. Any datagram-based server will receive broadcasts without
any overt action on the programmer's part.
Broadcasting has some important limitations. It is useful only in the local area subnetwork because
routers will not forward broadcast packets. Broadcasting is not selective. A host machine cannot
opt out of receiving broadcasts any more than a TV antenna can opt out of receiving television
broadcasts. The operating system receives and processes every broadcast sent to it, even those
that no user-level application is interested in reading. For this reason, avoid overuse of broadcasts.
The way around these limitations is to use multicasting, to which we turn in the next chapter.
In the previous chapter, we discussed using broadcasting to transmit a UDP message to all hosts
on the local area network. The examples in that chapter revealed two of broadcasting's greatest
limitations: the fact that it cannot be routed beyond the local subnet and its inability to be targeted
to selected hosts. Broadcasting is strictly an all-or-nothing affair and works only across the local
subnetwork.
This chapter discusses multicasting, a newer technology designed specifically for streaming video,
audio, and conferencing applications. Unlike broadcasts, multicast messages are routable; that
is, they can be transmitted across subnet boundaries or even across the Internet. Furthermore,
multicasting gives you great flexibility in selecting which hosts will receive particular messages. A
single multicast message created by a host will be cleverly replicated by routers as needed, and
delivered to a single recipient, or a dozen, or thousands.
This chapter describes multicasting, how it works, and how to use it in your applications. As a prac-
tical example, we use multicasting to reimplement the chat server from Chapter 19.
Multicast Basics
Multicasting relies on a series of reserved IP addresses in the upper end of the IP address space
between addresses 224.0.0.0 and 239.255.255.255. When a packet is sent to one of these ad-
dresses, it is not routed in the normal way to a single machine, but instead is distributed through the
network to all machines that have registered their interest in receiving transmissions on that address.
These IP addresses are known in the multicasting world as "groups" because each address refers
to a group of machines.
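Every address in the range 224.0.0.0 to 239.255.255.255 begins with the bit pattern 1110, which makes a group address easy to recognize. The following sketch shows one way to test for it; is_multicast() is a hypothetical helper, not a function from any module used in this book.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical helper: true if the dotted-quad address lies in
# 224.0.0.0 - 239.255.255.255, i.e. its top four bits are 1110.
sub is_multicast {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return 0;
    return (unpack('N', $packed) & 0xF0000000) == 0xE0000000 ? 1 : 0;
}

for my $ip (qw(224.0.0.1 239.255.255.255 192.168.3.255 240.0.0.1)) {
    printf "%-16s %s\n", $ip, is_multicast($ip) ? 'multicast' : 'not multicast';
}
```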
In effect, multicast groups act much like mailing lists. A process joins one or more groups, and the
multicasting system makes sure that copies of the messages directed to the group are routed to
each member of the group. Later, the process can drop its membership, and the incoming messages
will cease.
Like all other TCP/IP applications, multicasting uses the combination of port number and address
to find the correct program to deliver a packet to. Before a socket can receive a multicast message,
it must bind to a port just as a socket in a conventional unicast server application must do. This
means that the same multicast group can be used for different applications (or different components
of the same application) so long as everyone agrees in advance on which ports to use. For example,
multicast address 226.0.1.8 can be used to receive a video stream on port 1908 and simultaneously
to run an interactive whiteboard application on port 2455.
There are more than 268 million multicast addresses in the reserved range and 65,536 port numbers,
which gives the Internet about 17 trillion channels to use in multicasting. However, the number of
multicast groups that a single socket can join simultaneously is usually limited by the operating
system to about 20.
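The channel count is straightforward to verify. The range 224.0.0.0 to 239.255.255.255 covers 16 values of the first byte and all values of the remaining three, and each address can be paired with any of the 65,536 port numbers:

```perl
use strict;
use warnings;

my $groups   = 2**28;    # 224.0.0.0 through 239.255.255.255 (16 * 2**24)
my $ports    = 2**16;    # 65,536 port numbers
my $channels = $groups * $ports;

printf "%d groups x %d ports = %.0f channels\n", $groups, $ports, $channels;
```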
A variety of applications that require one-to-many connectivity use multicasting. Examples of mul-
ticast applications for which source code is available include VIC, a videoconferencing system from
Lawrence Berkeley National Laboratory; RAT, an audio streaming system from University College
London; and WB, a networked whiteboard system also from Lawrence Berkeley. In addition, the
network time protocol daemon, xntpd, can be configured to multicast the current time throughout
the LAN. Used in conjunction with LAN-wide multicast routing, this allows one to synchronize all the
machines in an organization to a single network time signal. You can find these and a large number
of other Open Source multicast-related tools at http://www-mice.cs.ucl.ac.uk/multimedia/software/.
Like broadcasting, the current implementation of TCP/IP multicasting is compatible only with the UDP
protocol. A number of active research projects are addressing the need for a reliable connection-
oriented multicasting facility. Multicasting is discussed in RFCs 1112, 2236, 1458, and others listed
among the references of Appendix D.
224.0.0.1 is the "all-hosts" group. A message sent to this address is transmitted to all the hosts on
the local area network, but is not forwarded by any multicast routers. Thus the all-hosts group is the
multicast equivalent of the broadcast address.
224.0.0.2 is the "all-routers" group. All multicast-capable routers are required to join this group at
startup time.
Other addresses in this range are reserved for the use of specific router types. For example,
224.0.0.4 is the "all DVMRP routers" group, joined by routers using the DVMRP protocol. 224.0.0.5
is reserved for OSPF routers, 224.0.0.9 for RIP2 routers, and so on.
Other addresses in multicast space are reserved for well-known applications, and although some
of them are not much used, you're advised to avoid them. You'll find a more comprehensive list of
well-known addresses at http://www.isi.edu/in-notes/iana/assignments/multicast-addresses. Three
large blocks of multicast addresses are unassigned and are safe for you to use for development:
• 224.3.0.0–224.251.255.255 (16,318,464 addresses)
• 225.0.0.0–231.255.255.255 (117,440,512 addresses)
• 233.0.0.0–238.255.255.255 (100,663,296 addresses)
packets to the operating system, which then delivers them to the correct application (Figure 21.1).
Figure 21.1. Multicast packets are filtered by the interface card and passed through routers
The filtering performed by the interface card is "imperfect" because it uses a hashing scheme to
choose which packets to accept. This scheme occasionally allows some irrelevant packets (those
bound for groups the host has not joined) through as well. However, any irrelevant packets are
discarded by the operating system in a second "perfect software filtering" step. Hence, multicasting
is not as efficient as unicasting, in which the network card perfectly filters out packets bound for
irrelevant IP addresses; but it is much more efficient than broadcasting, in which the card exercises
no discrimination.
From the application programming standpoint, you do not have to worry about multicast hardware
filtering, except to know that heavy use of multicasting would not have the same impact on your
network that a similar level of broadcasting would.
Multicast TTLs
Since multicast messages can be routed, you need a way to control how far they can go. You
wouldn't want a whiteboard application intended for interdepartmental conferences in your organi-
zation to be multicast across the Internet.
Multicasting uses a simple but effective technique to control the scope of messages. Each packet
contains a time-to-live (TTL) field that is set to an arbitrary positive integer between 1 and 255. Every
time the packet crosses a router, its TTL is decremented by 1. When the TTL reaches 0, the packet
is discarded.
By default, multicast packets have a TTL of 1, meaning that they won't be routed across subnets.
As soon as they hit the first router, their TTL reaches 0 and they expire. To arrange for a packet to
be forwarded, set its TTL to a higher value. In general, a packet can cross TTL-1 routers.
To provide finer control over routing of multicast packets, an organization can assign "threshold"
values to each outgoing interface of a multicast router. The router will forward the packet only if its
TTL matches or exceeds the threshold. To illustrate this, consider the hypothetical company in
Figure 21.2. It has three departments, each of which is large enough to contain several subnets.
Each department's subnets are connected with a departmental router (labeled A, B, and C), and
the departments are interconnected via the central interdepartmental router "D." Router D also acts
as the gateway to the Internet. Each departmental router uses the default threshold of 1 on the
subnet interfaces, but a threshold of 3 on the interface that connects it to the central router. Similarly,
the central router has a threshold of 31 on the interface that connects it to the Internet. This setup
allows the scope of a packet to be precisely controlled by its TTL. Packets with TTLs between 1 and
3 are forwarded within a department's subnets, but can't travel to other departments because, once
the departmental router has decremented their TTL, they no longer meet the threshold of 3 required
to be forwarded beyond it.
Packets with TTLs between 4 and 32 can travel among the departments, but won't be forwarded to
the Internet. The router threshold values control the scope of multicast applications, preventing
applications intended for use only within a subnet, department, or organization from spilling over
into places they're not wanted.
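The interplay of TTL and thresholds can be simulated in a few lines. The model used here is an assumption inferred from the ranges quoted above: each router first decrements the TTL and then forwards the packet only if the result still meets the threshold of the outgoing interface. routers_crossed() is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Count how many routers a packet gets past, given its starting TTL
# and the thresholds of the outgoing interfaces along its path.
sub routers_crossed {
    my ($ttl, @thresholds) = @_;
    my $crossed = 0;
    for my $threshold (@thresholds) {
        $ttl--;                          # the router decrements first
        last if $ttl < $threshold;       # then applies the threshold
        $crossed++;
    }
    return $crossed;
}

# Path from a departmental subnet to the Internet in Figure 21.2:
# departmental router (threshold 3), then central router D (threshold 31).
my @path = (3, 31);
for my $ttl (1, 3, 4, 32, 33) {
    printf "TTL %2d crosses %d router(s)\n", $ttl, routers_crossed($ttl, @path);
}
```

Under this model a TTL of 3 or less never leaves the department, 4 through 32 reaches the central router but not the Internet, and 33 or more gets past both.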
Figure 21.2. The thresholds on routers' outgoing interfaces control how far multicast messages can propagate.
Table 21.2 lists common TTL thresholds and their associated scopes. These values are conven-
tions, and the exact definitions of "site," "organization," and "department" are up to you to determine.
Table 21.2. Conventional TTL Thresholds
TTL     Scope
0       Restricted to the same host
1       Restricted to the same subnet
<32     Restricted to the same site, organization, or department
<64     Restricted to the same region
<128    Restricted to the same continent
<255    Unrestricted in scope; global
Using Multicast
The remainder of this chapter shows you how to use multicasting in Perl applications.
% ping 224.0.0.1
PING 224.0.0.1 (224.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=1.0 ms
64 bytes from 143.48.31.47: icmp_seq=0 ttl=255 time=1.2 ms (DUP!)
64 bytes from 143.48.31.46: icmp_seq=0 ttl=255 time=1.3 ms (DUP!)
64 bytes from 143.48.31.55: icmp_seq=0 ttl=255 time=1.5 ms (DUP!)
64 bytes from 143.48.31.43: icmp_seq=0 ttl=255 time=1.7 ms (DUP!)
64 bytes from 143.48.31.61: icmp_seq=0 ttl=255 time=1.9 ms (DUP!)
64 bytes from 143.48.31.33: icmp_seq=0 ttl=255 time=2.1 ms (DUP!)
64 bytes from 143.48.31.36: icmp_seq=0 ttl=255 time=2.2 ms (DUP!)
64 bytes from 143.48.31.45: icmp_seq=0 ttl=255 time=2.8 ms (DUP!)
64 bytes from 143.48.31.32: icmp_seq=0 ttl=255 time=3.0 ms (DUP!)
64 bytes from 143.48.31.48: icmp_seq=0 ttl=255 time=3.1 ms (DUP!)
64 bytes from 143.48.31.58: icmp_seq=0 ttl=255 time=3.3 ms (DUP!)
64 bytes from 143.48.31.57: icmp_seq=0 ttl=255 time=3.5 ms (DUP!)
64 bytes from 143.48.31.40: icmp_seq=0 ttl=255 time=3.7 ms (DUP!)
64 bytes from 143.48.31.39: icmp_seq=0 ttl=255 time=3.9 ms (DUP!)
64 bytes from 143.48.31.31: icmp_seq=0 ttl=255 time=4.0 ms (DUP!)
64 bytes from 143.48.31.34: icmp_seq=0 ttl=255 time=4.5 ms (DUP!)
64 bytes from 143.48.31.37: icmp_seq=0 ttl=255 time=4.7 ms (DUP!)
64 bytes from 143.48.31.38: icmp_seq=0 ttl=255 time=4.9 ms (DUP!)
64 bytes from 143.48.31.41: icmp_seq=0 ttl=64 time=5.1 ms (DUP!)
64 bytes from 143.48.31.35: icmp_seq=0 ttl=255 time=5.2 ms (DUP!)
As in the earlier broadcast example, a variety of machines responded to the ping, including the
loopback device (127.0.0.1) and a mixture of UNIX and Windows machines. Unlike the broadcast
example, two laser printers on the subnetwork did not respond to the multicast call, presumably
because they are not multicast capable. Similarly, we could ping 224.0.0.2, the all-routers group, to
discover all multicast-capable routers on the LAN, 224.0.0.4 to discover all DVMRP routers, and so
forth.
For a Perl script to send a multicast message, it has only to create a UDP socket and send to the
desired group address. To illustrate this, we can use the broadcast echo client from the previous
chapter (Figure 20.2) to discover all multicast-capable hosts on the local subnet that are running an
echo server. The program doesn't need modification; instead of giving the broadcast address as
the command-line argument, we just use the address for the all-hosts group:
broadcast_echo_cli.pl 224.0.0.1
hi there
received 9 bytes from 143.48.31.42:7
received 9 bytes from 143.48.31.30:7
received 9 bytes from 143.48.31.40:7
Interestingly, the list of servers that respond to the echo client is much smaller than it was for either
the multicast ping test or the broadcast ping test of the previous chapter. After some investigation,
the difference turned out to be nine Solaris machines whose kernels were not configured for
multicasting. Apparently there was sufficient low-level multicasting code built into the kernel of these
machines to allow them to respond to ICMP ping messages, but not to higher-level multicasts.
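As a rough illustration of the point that sending requires nothing beyond an ordinary UDP socket, the following sketch sends one datagram to a group address. It is not the broadcast echo client from Figure 20.2; the function names multicast_dest() and send_to_group() are invented for this example.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(inet_aton inet_ntoa pack_sockaddr_in unpack_sockaddr_in);

# Build the packed destination address for a multicast group and port.
sub multicast_dest {
    my ($group, $port) = @_;
    my $addr = inet_aton($group) or die "Bad group address: $group\n";
    return pack_sockaddr_in($port, $addr);
}

# Send one datagram to the group over an ordinary UDP socket.
# Nothing multicast-specific is needed on the sending side.
sub send_to_group {
    my ($message, $group, $port) = @_;
    my $sock = IO::Socket::INET->new(Proto => 'udp')
        or die "Can't create socket: $@\n";
    return $sock->send($message, 0, multicast_dest($group, $port));
}

# Example call (not run here): send_to_group("hi there\n", '224.0.0.1', 7);
```

Receiving is where the extra work lies, since a socket sees group traffic only after it has joined the group.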
Because the IO::Socket->sockopt() method assumes the SOL_SOCKET level, you cannot use
it for multicast options. However, you can use IO::Socket's setsockopt() and getsockopt()
methods, which are just thin wrappers around the underlying Perl function calls.
The multicast option constants are defined in the system header file netinet/in.h. To get access to
the proper values for your operating system, you must use the h2ph tool to convert the system
header files.
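A minimal sketch of the setsockopt() and getsockopt() wrappers applied at the IP level follows. It assumes a Socket module recent enough to export IP_MULTICAST_TTL on request, which sidesteps h2ph; on installations where that is not the case, the converted header remains the way to obtain the constant. The helper names are invented for this example.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IP_MULTICAST_TTL);   # assumption: exported by this Socket version

# Option level for IP-layer options; fall back to 0 if the lookup fails.
my $IP_LEVEL = getprotobyname('ip');
$IP_LEVEL = 0 unless defined $IP_LEVEL;

# Set the multicast TTL through the IP-level option interface.
sub set_multicast_ttl {
    my ($sock, $ttl) = @_;
    return $sock->setsockopt($IP_LEVEL, IP_MULTICAST_TTL, pack('I', $ttl));
}

# Read the TTL back. The result may be a single byte or a full integer
# depending on the system, so pick the unpack format from its length.
sub get_multicast_ttl {
    my $sock   = shift;
    my $packed = $sock->getsockopt($IP_LEVEL, IP_MULTICAST_TTL);
    return unless defined $packed;
    return length($packed) == 1 ? unpack('C', $packed) : unpack('I', $packed);
}

# Example calls (not run here):
#   my $sock = IO::Socket::INET->new(Proto => 'udp') or die $@;
#   set_multicast_ttl($sock, 31) or die "Can't set TTL: $!";
#   print get_multicast_ttl($sock), "\n";
```

Calling sockopt() instead would apply the option at the SOL_SOCKET level, which is why the explicit level argument matters here.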
The outgoing multicast interface (set by IP_MULTICAST_IF) is not tied in any way to the interface
used to receive multicast packets. You can send multicast packets from one network interface and
receive them on another.
IP_DROP_MEMBERSHIP—Leave a multicast group, terminating membership in the group. The
argument is identical to the one used by IP_ADD_MEMBERSHIP.
As with IP_MULTICAST_IF and the other options discussed earlier, the IP_ADD_MEMBERSHIP
and IP_DROP_MEMBERSHIP options apply to the IP layer, so you must pass setsockopt() an
option level equal to the IP protocol number returned by getprotobyname().
These two options may sound more complicated than they are. The only tricky part is creating the
ip_mreq argument to pass to setsockopt(). You can do this by passing the group address to
inet_aton() and then concatenating the result with the INADDR_ANY constant. This code snippet
shows how to join a multicast group, in this case the one with address 225.1.1.3:
use Socket;                                # inet_aton() and INADDR_ANY
# IP_ADD_MEMBERSHIP comes from the h2ph-converted netinet/in.ph
my $mcast_addr = inet_aton('225.1.1.3');   # packed group address
my $local_addr = INADDR_ANY;               # no specific local interface
my $ip_mreq    = $mcast_addr . $local_addr;
# The IP protocol number is commonly 0, so test for definedness, not truth
my $ip_level   = getprotobyname('IP');
defined $ip_level or die "Can't get protocol: $!";
setsockopt($sock,$ip_level,IP_ADD_MEMBERSHIP,$ip_mreq)
  or die "Can't join group: $!";
You drop membership in a group in the same way, using the IP_DROP_MEMBERSHIP constant. You
do not have to drop membership in all groups before exiting the program. The operating system will
take care of this for you when the socket is destroyed.
Oddly, there is no way to ask the operating system what multicast groups a socket is a member of.
You have to keep track of this yourself.
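One way to keep track is a plain hash updated whenever a join or drop succeeds. The sketch below pairs that with a helper that builds the ip_mreq argument; make_ip_mreq(), note_join(), note_drop(), and joined_groups() are all names invented for this example.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa INADDR_ANY);

# Build the 8-byte ip_mreq argument: packed group address followed by
# the packed local interface address (INADDR_ANY when none is given).
sub make_ip_mreq {
    my ($group, $interface) = @_;
    my $mcast_addr = inet_aton($group) or die "Bad group address: $group\n";
    my $local_addr = defined $interface ? inet_aton($interface) : INADDR_ANY;
    return $mcast_addr . $local_addr;
}

# The OS will not list a socket's groups, so record them by hand.
my %joined;                                    # group address => 1
sub note_join     { $joined{ $_[0] } = 1 }     # call after a successful join
sub note_drop     { delete $joined{ $_[0] } }  # call after a successful drop
sub joined_groups { return sort keys %joined }

note_join('225.1.1.3');
print "member of: @{[ joined_groups() ]}\n";
```

A program that manages several sockets would keep one such hash per socket.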
$socket->mcast_ttl([$ttl])
Get or set the socket's multicast time to live. If you provide an integer argument, it will be used to set the TTL
and the method returns true if the attempt was successful. Without an argument, mcast_ttl() returns the
current value of the TTL.
$socket->mcast_loopback([$boolean])
Get or set the loopback property on outgoing multicast packets. Provide a true value to enable loopbacking,
false to inhibit it. The method returns true if it was successful. Without an argument, the method returns the
current loopback setting.
$socket->mcast_if([$if])
Get or set the interface for outgoing multicast packets. For your convenience, you can use either the interface
device name, such as eth0, or its dotted-quad interface address. The method returns true if the attempt to set
the interface was successful. Without any argument, it returns the current interface, or if no interface is set, it
returns undef (in which case the operating system chooses an appropriate interface automatically).
$socket->mcast_add($multicast_group [,$interface])
Join a multicast group, allowing the socket to receive messages multicast to that group.
Specify the group address using dotted-quad form (e.g., "225.0.0.3"). The optional second argument allows
you to tell the operating system which network interface to use to receive the group. If not specified, the OS
listens on all multicast-capable interfaces. For your convenience, you can use either the interface device name
or the interface address.
This method returns true if the group was successfully added; otherwise, it returns false. In case of failure,
$! contains additional information.
You may call mcast_add() more than once in order to join multiple groups.
$socket->mcast_drop($multicast_group [,$interface])
Drop membership in a multicast group, disabling the socket's reception of messages to that group. Specify the
group using its dotted-quad address. If you specified the interface in mcast_add(), you must again specify
that interface when leaving the group.[1] You may use either a device name or an IP address to specify the
interface.
This method returns true if the group was dropped successfully, and false in case of an error, such as dropping
a group to which the socket does not already belong.
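The methods described above might be combined as in the following sketch. It assumes that IO::Socket::Multicast is installed; the wrapper names open_group_receiver() and close_group_receiver() and the group and port in the comments are invented for this example.

```perl
use strict;
use warnings;

# Open a UDP socket bound to $port and join $group on it.
# IO::Socket::Multicast is loaded on demand so that the file compiles
# even where the module has not been installed.
sub open_group_receiver {
    my ($group, $port, $interface) = @_;
    require IO::Socket::Multicast;
    my $sock = IO::Socket::Multicast->new(LocalPort => $port)
        or die "Can't create socket: $@\n";
    my @args = defined $interface ? ($group, $interface) : ($group);
    $sock->mcast_add(@args) or die "Can't join $group: $!\n";
    return $sock;
}

# Leave the group again, naming the interface if one was used to join.
sub close_group_receiver {
    my ($sock, $group, $interface) = @_;
    my @args = defined $interface ? ($group, $interface) : ($group);
    $sock->mcast_drop(@args) or warn "Can't drop $group: $!\n";
    return close $sock;
}

# Example calls (not run here; the group and port are arbitrary):
#   my $sock = open_group_receiver('225.0.0.3', 2070);
#   $sock->mcast_ttl(1);        # keep outgoing traffic on the subnet
#   $sock->mcast_loopback(0);   # do not loop our own packets back
#   close_group_receiver($sock, '225.0.0.3');
```

Passing the same interface argument to both calls follows the advice given for mcast_drop().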
Figure 21.3 contains the complete code for the IO::Socket::Multicast module. We'll walk through
the relevant bits.
[1] This behavior varies somewhat among operating systems. With some, if you omit the interface, the operating system drops the first
matching multicast group it finds. With others, the interface argument to mcast_add() must exactly match the one passed to mcast_drop().
Lines 1–7: Module setup—The first part of the module consists of boilerplate module declarations
and bookkeeping. Among other things, we bring in the IO::Interface module developed in
the previous chapter and declare this module a subclass of IO::Socket::INET.
Lines 8–12: Bring in Socket and .ph definitions—We next load functions from the Socket module
and from netinet/in.ph. As in the IO::Interface module, to avoid clashing prototype warnings from
duplicate functions defined in the .ph file, we call the Socket module's import() method manually.
netinet/in.ph contains definitions for the various IP_MULTICAST socket options.
We call getprotobyname() to retrieve the IP protocol number for use with setsockopt()
and getsockopt(). If the protocol number isn't available for some reason, we default to 0,
which is a common value for this constant.
Lines 13–22: The new() and configure() methods—We override the IO::Socket::INET
new() and configure() methods so as to make UDP the default protocol if no Proto argument
is given explicitly.
Lines 23–29: mcast_add()—The mcast_add() method receives the socket, the multicast
group address, and the optional local interface to receive on. If an interface is specified, the
method calls the internal function get_if_addr() to deal appropriately with the alternative
ways that the interface can be specified. If no interface is specified, then get_if_addr() returns
"0.0.0.0", the dotted-quad form of the INADDR_ANY wildcard address.
We then build an ip_mreq structure by concatenating the binary forms of the group and local
IP address, and pass this to setsockopt() with a socket level of $IP_LEVEL and a command
of IP_ADD_MEMBERSHIP.
Lines 30–36: mcast_drop()—This method contains the same code as mcast_add(), except
that at the very end it calls setsockopt() with a command of IP_DROP_MEMBERSHIP.
Lines 37–47: mcast_if()—This method assigns or retrieves the interface for outgoing multicast
messages. If the caller has specified an interface, we turn it into an address by calling
get_if_addr(), translate it into its packed binary version using inet_aton(), and call
setsockopt() with the IP_MULTICAST_IF command.
For retrieving the interface, things are slightly more complicated because of buggy behavior
under the Linux operating system, where getsockopt() returns a 12-byte ip_mreqn structure
rather than the expected 4-byte packed IP address of the interface (I found this out by examining
the kernel source code). The desired information resides in the second field of this structure,
beginning at byte number 4. We test the length of the getsockopt() result, and if it is larger
than 4, we extract the address using substr(). We then call an internal routine named
find_interface() to turn this IP address into an interface device name.
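The length test can be sketched in isolation, without a socket. This is an illustration rather than the module's own code: the function name mcast_if_address() is invented, and the 12-byte value is simulated by concatenating two packed addresses and an integer.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Turn the raw getsockopt() result for IP_MULTICAST_IF into a
# dotted-quad address. A result longer than 4 bytes is treated as the
# larger structure described above, whose address starts at byte 4.
sub mcast_if_address {
    my $result = shift;
    my $len = defined $result ? length $result : 0;
    return inet_ntoa($result)               if $len == 4;
    return inet_ntoa(substr($result, 4, 4)) if $len >= 8;
    return;
}

# Simulated results: a plain 4-byte address and a 12-byte structure.
my $plain = inet_aton('143.48.31.47');
my $mreqn = inet_aton('225.1.1.3') . inet_aton('143.48.31.47') . pack('i', 2);

print mcast_if_address($plain), "\n";
print mcast_if_address($mreqn), "\n";
```

Both calls yield the same interface address, which is the value that would then be handed to find_interface().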
Lines 48–56: The mcast_loopback() method—The mcast_loopback() method is more
straightforward. If a second argument is supplied, it calls setsockopt() with a command of
IP_MULTICAST_LOOP and an argument of 1 to turn loopback on and 0 to turn loopback off. If
no argument is supplied, then the method calls getsockopt() to retrieve the loopback setting.
getsockopt() returns the setting as a packed binary string, so we convert it into a
human-readable number by unpacking it using the "I" (unsigned integer) format.
Lines 57–65: mcast_ttl()—The mcast_ttl() method gets or sets the TTL on outgoing
multicast messages. If a TTL value is specified, we pack it into a binary integer with the "I"
format and pass it to setsockopt() with the IP_MULTICAST_TTL command. If no value is
passed, we reverse the process.
Lines 66–75: get_if_addr() function—The last two functions are used internally.
get_if_addr() allows the caller to specify network interfaces using either a dotted IP address
or the device name. The function takes two arguments consisting of the socket and the interface.
If the interface argument is empty, then the function returns "0.0.0.0," which is the dotted-quad
equivalent of the INADDR_ANY wildcard. If the interface looks like a dotted-quad address by
pattern match, then the function returns it unmodified.
Otherwise, we assume that the argument is a device name. We call the socket's if_addr()
method (created by IO::Interface) to retrieve the corresponding interface address. If this is
unsuccessful, we die with an error message. As a consistency check, we call the if_flags()
method to confirm that the interface is multicast-capable; if it is not, we die. Otherwise, we return
the interface address.
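The decision logic just described can be sketched without IO::Interface by passing the lookup in as a callback. Everything named here is hypothetical: resolve_interface() stands in for get_if_addr(), the $lookup callback stands in for the if_addr() and if_flags() calls, and the interface table is made up.

```perl
use strict;
use warnings;

# Resolve an interface given as nothing, a dotted quad, or a device name.
# $lookup is a hypothetical callback standing in for the if_addr() and
# if_flags() calls; it returns (address, is_multicast) for a device name.
sub resolve_interface {
    my ($interface, $lookup) = @_;
    return '0.0.0.0' unless defined $interface && length $interface;
    return $interface if $interface =~ /^\d+\.\d+\.\d+\.\d+$/;
    my ($addr, $is_multicast) = $lookup->($interface);
    die "unknown interface $interface\n" unless defined $addr;
    die "interface $interface is not multicast capable\n" unless $is_multicast;
    return $addr;
}

# A stand-in interface table for demonstration only.
my %fake = (eth0 => ['143.48.31.47', 1], lo => ['127.0.0.1', 0]);
my $lookup = sub { my $if = shift; return $fake{$if} ? @{ $fake{$if} } : () };

print resolve_interface('eth0', $lookup), "\n";
```

The pattern match only checks the shape of the string; a stricter version could also validate each octet.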
Lines 76–82: find_interface() function—The last function performs the reverse of
get_if_addr(), returning the interface device name corresponding to an IP address. It retrieves
the list of device names by calling the socket's if_list() method (defined in IO::Interface)
and loops over them until it finds the one with the desired IP address.
We'll look at two example multicast applications. One is a simple time-of-day server, which
intermittently multicasts the current time to whoever is interested. The other is a reworking of
Chapter 19's chat system.
Time-of-Day Multicasting Server
The first example application is a server that intermittently transmits its hostname and the time of
day to a predetermined port and multicast address. Client applications that wish to receive these
time-of-day messages join the group and echo what they receive to standard output. You might use
something like this to monitor the status of your organization's servers; if a server stops sending
status messages, it might be an early warning that it had gone offline.
Thanks to the IO::Socket::Multicast module, both client and server applications are less than 25
lines of code. We'll look at the server first (Figure 21.4).
Lines 1–4: Load modules—We load the IO::Socket and IO::Socket::Multicast modules. We also
bring in the Sys::Hostname module, a standard part of the Perl distribution that allows you to
determine the hostname in an OS-independent way.
Lines 5–8: Get arguments—We choose an interval of 15 seconds between transmissions. We
then read the port, multicast group address, and the TTL for transmissions from the command
line; if they're not defined, we assume reasonable defaults. For the port, we arbitrarily choose
2070. For the multicast group, we choose 224.225.226.227, one of the many unassigned groups.
For TTL, we choose 31, which, by convention, is an organization-wide scope (messages will
stay within the organization but will not be forwarded to the outside world).
Lines 9–12: Set up socket—We create a new multicasting UDP socket by calling
IO::Socket::Multicast->new() and set the multicast TTL for outgoing messages by calling
the socket's mcast_ttl() method.
Lines 13–16: Prepare to transmit messages—We create a packed destination address using
inet_aton() and sockaddr_in(), using the multicast address and port specified on the
command line. We also retrieve the name of the host and store it in a variable for later use.
Lines 17–24: Main loop—The server now enters its main loop. We want to transmit on even
multiples of PERIOD seconds, so we use the % operator to compute the modulus of time() over
PERIOD. If we are at an even multiple of PERIOD, then we create a status message consisting
of the local time followed by a slash and the hostname, producing this type of format:
Mon May 29 19:05:15 2000/pesto.cshl.org
We send a copy of the message to the socket using send() with the multicast destination set
up previously. After transmitting the message, we sleep for 1 second and loop again.
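The walkthrough above can be condensed into the following sketch. It is a reconstruction from the description, not the listing in Figure 21.4: the helper names should_transmit(), status_message(), and run_server() are invented, and IO::Socket::Multicast is assumed to be installed when run_server() is called.

```perl
use strict;
use warnings;
use Socket qw(inet_aton pack_sockaddr_in);
use Sys::Hostname;

use constant PERIOD => 15;   # seconds between transmissions

# True when $time falls on an even multiple of PERIOD.
sub should_transmit { my $time = shift; return $time % PERIOD == 0 }

# Local time, a slash, then the hostname.
sub status_message {
    my ($time, $host) = @_;
    return localtime($time) . '/' . $host;
}

# Transmit forever. The multicast module is loaded on demand.
sub run_server {
    my ($group, $port, $ttl) = @_;
    require IO::Socket::Multicast;
    my $sock = IO::Socket::Multicast->new() or die "Can't create socket: $@\n";
    $sock->mcast_ttl($ttl) or die "Can't set TTL: $!\n";
    my $dest = pack_sockaddr_in($port, inet_aton($group));
    my $host = hostname();
    while (1) {
        my $now = time;
        if (should_transmit($now)) {
            $sock->send(status_message($now, $host), 0, $dest)
                or warn "send failed: $!\n";
        }
        sleep 1;
    }
}

# Example call (not run here): run_server('224.225.226.227', 2070, 31);
```

Polling once per second and testing the modulus keeps transmissions from different hosts aligned on the same clock ticks, at the cost of an occasional missed interval.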
Lines 1–3: Load modules—We bring in IO::Socket and IO::Socket::Multicast modules as before.
Lines 4–5: Retrieve command-line arguments—We fetch the port and multicast address from
the command line. If these arguments are not provided, we default to the values used by the
server.
Lines 7–10: Set up socket—Next we set up the socket we'll use for receiving multicast messages.
We create a UDP socket using IO::Socket::Multicast->new, passing the LocalPort argument
to bind() the socket to the desired port. The newly created socket is now ready to
receive unicast messages directed to that port, but not multicasts. To enable reception of group
messages, we call mcast_add() with the specified multicast group address.
Lines 11–16: Client main loop—The remainder of the client is a simple loop that calls recv()
to receive messages on the socket. We unpack the sender's address using sockaddr_in()
and print the address and the message body to standard output.
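A sketch of the client along the lines just described follows. Again this is a reconstruction rather than the book's listing: format_report() and run_client() are invented names, the 1024-byte read size is an arbitrary choice, and IO::Socket::Multicast is assumed to be installed when run_client() is called.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa pack_sockaddr_in unpack_sockaddr_in);

# Format one received datagram as "sender-address: message".
sub format_report {
    my ($peer, $message) = @_;
    my ($port, $addr) = unpack_sockaddr_in($peer);
    return inet_ntoa($addr) . ": $message";
}

# Bind to the port, join the group, and print whatever arrives.
sub run_client {
    my ($group, $port) = @_;
    require IO::Socket::Multicast;
    my $sock = IO::Socket::Multicast->new(LocalPort => $port)
        or die "Can't create socket: $@\n";
    $sock->mcast_add($group) or die "Can't join $group: $!\n";
    while (1) {
        my $message;
        my $peer = $sock->recv($message, 1024);   # returns the sender's address
        next unless $peer;
        print format_report($peer, $message), "\n";
    }
}

# Example call (not run here): run_client('224.225.226.227', 2070);
```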
To test the client, I ran the server on several machines on my LAN, and the client on my desktop
system. The client's output over a period of 45 seconds was this (blank lines have been inserted
between intervals to aid readability):
% time_of_day_cli.pl
143.48.31.66: Wed Aug 23 13:31:00 2000/swiss
143.48.31.45: Wed Aug 23 13:31:00 2000/feta.cshl.org
143.48.31.54: Wed Aug 23 10:31:00 2000/pesto
143.48.31.47: Wed Aug 23 13:31:00 2000/turunmaa.cshl.org
143.48.31.43: Wed Aug 23 13:31:00 2000/romano.cshl.org
143.48.31.69: Wed Aug 23 13:31:00 2000/munster.cshl.org
143.48.31.63: Wed Aug 23 13:31:00 2000/whey.cshl.org
All the machines on my office network are supposed to have their internal clocks synchronized by
the Network Time Protocol. The fact that "pesto" is off by several hours relative to the others
suggests that something is wrong with this machine's time-zone setting. The example client was
unexpectedly useful in identifying a problem.
Another thing to notice is that we don't see a transmission from edam.cshl.org in the first group but
transmissions from it appear later. It may have missed a time interval (the sleep() function is only
accurate to plus or minus 1 second), or the multicast message from that machine may have been
lost. Multicast messages, like other UDP messages, are unreliable.
Given a message code and message body, send_to_all() looks up each registered user and
sends it a copy of the message. The socket transmission is done by a ChatObjects::User object,
which maintains a copy of the client's address and port number.
The weakness of this system is that if there are a great many registered users, the server sends out
an equally large number of UDP packets, loading its local network and routers. This system can
probably scale to support thousands of registered users, but not tens of thousands (depending on
how "chatty" they are).
In the reimplemented version, we'll replace the server's send_to_all() method with a version
that looks like this:
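sub send_to_all {
  my $self = shift;
  my ($code,$text) = @_;
  my $dest = $self->mcast_dest;   # packed multicast address for this channel
  my $comm = $self->comm;         # communication object that owns the socket
  # one multicast send replaces the per-user unicast loop
  $comm->send_event($code,$text,$dest) || warn $!;
}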
Instead of looking up each client and sending it a unicast message, we make one call to the
communication object's send_event() method, using as the destination a multicast group address.
We'll go over the details of this method when we walk through the code.
Let's look at the revised chat protocol from the client's point of view. In the original version of this
system, the client did all its communication via a single UDP socket permanently assigned to the
server. In the new version, we alter this paradigm:
1. The client creates a socket for communicating with the server. This is the same as the original
application. One socket will be used for all messages sent by the client to the server; we'll call
this the control socket.
2. The client creates a second socket for receiving multicasts. When the client logs in, the server
responds with two messages, one acknowledging successful login and the other providing the
port number on which to listen for multicasts. The client responds by creating a second socket
and binding it to the indicated port. The client now select()s over the multicast socket as
well as over standard input and the control socket.
3. The client adds multicast groups to subscribe to channels. There is a one-to-one correspondence
between chat channels and multicast groups. When the client subscribes to a new chat
channel, the server responds with an acknowledgment that contains the multicast group address
on which public messages to that group will be transmitted. The client adds the group
to the socket using mcast_add().
4. The client drops multicast groups to depart channels. The client calls mcast_drop() when
it wants to depart a channel.
5. The client sends public messages as before. To send a public message, the client sends it to
the server and the server retransmits it as a multicast. Therefore, the client code for sending
a public message is unchanged from the original version.
From the server's point of view, the following changes are needed:
1. The server has both a port and a multicast port. In addition to the port used to receive control
messages from clients, the server is configured with a port used for its multicast messages.
This could have been the same as the control port, but it was cleaner to keep the two distinct.
2. The multicast port is sent to the client at login time. We need a new message to send to the
client at login time to tell it what port to use for receiving multicasts.
3. Each chat channel has a multicast group address. Each chat group has a distinct multicast
address. To send a message to all members of a channel, the server looks up its corresponding
group address and sends a single message to that address.
A feature of this design is that the client sends public messages to the server using conventional
unicasting, and the server retransmits the message to members of the channel via multicast. A
reasonable alternative would be to make the client responsible for sending public messages directly
to the relevant multicast address. Either architecture would work, and both would achieve the main
goal of avoiding congestion on the server's side of the connection.
I chose the first architecture for two reasons. First, I wanted to avoid too radical a rewriting of the
client, which would have been necessary if the burden of keeping track of which channels the user
belonged to had been shifted to the client side. Second, I wanted to leave the way open for the
server to exercise editorial control over the clients' content. Many chat systems have a "muzzling"
function that allows the server administrator to silence a user who is becoming abusive. Because
all public messages are forced to pass through the server, it would be possible to add this feature
later. A final consideration is the TTL on outgoing multicasts, which could have different meanings
on different clients' networks. Having the server issue all the multicasts enforces uniformity on the
scope of public messages.
We'll walk through the server first, and then the client. The first change is very minor (Figure 21.6).
We add a new event code constant named SET_MCAST_PORT to ChatObjects::ChatCodes. This is
the message sent by the server to the client to tell it what port to bind to in order to receive multicast
transmissions.
Next we look at the server script (Figure 21.7). It is very similar to the original version, so we'll just
go over the parts that are different.
Lines 1–6: Load modules—We tell Perl that ChatObjects::MComm is a subclass of
ChatObjects::Comm and load ChatObjects::Comm and IO::Socket. We also load IO::Socket::Multicast
so as to have access to the various mcast_ methods.
Lines 7–15: Override new() method—We replace ChatObjects::Comm->new() with a new
version. We begin this version by invoking the parent class's new() method to construct the
control socket. When this is done, we remember the multicast port argument in the object hash
and set the TTL on outgoing messages by calling mcast_ttl() on the control socket.
Line 16: The create_socket() method—We override our parent's create_socket()
method with one that creates a suitable IO::Socket::Multicast object, rather than
IO::Socket::INET.
Line 17: The mport() method—This new method looks up the multicast port in the object hash
and returns it.
Lines 18–23: The mcast_event() method—This new method is responsible for sending an
event message, given the event code, the event text, and the multicast destination address. We
use sockaddr_in() to create a suitable packed destination address using our multicast port
and multicast IP address, and pass the event code, text, and address to our inherited
send_event() method.
We turn now to the ChatObjects::MChannel module (Figure 21.9). This module, which is responsible
for transmitting public messages to all currently enrolled members of a channel, requires the most
extensive changes.
Lines 16–20: info() method—We override the channel's info() method, which sends descriptive
information about the channel to the client. Previously this method returned the name
of the channel, the number of users enrolled, and the description. We modify this slightly so that
the dotted-quad multicast IP address for the channel occupies a position between the user count
and the description.
Lines 21–26: mcast_dest() method—The mcast_dest() method returns the packed binary
destination address for the multicast group. It retrieves the multicast port from the server object
and uses sockaddr_in() to combine it with the dotted-quad address returned by
mcast_addr(). We explicitly put sockaddr_in() into a scalar context so that it packs the
port and IP address together, rather than attempting to unpack its argument.
Lines 27–33: send_to_all() method—The send_to_all() method is called whenever it's
necessary to send a message to all members of a channel. Such messages are sent when a
user joins or departs a channel, as well as when a user sends a public message to the channel.
We call mcast_dest() to get the packed binary address for multicasts directed to the channel,
and then pass this destination, along with the event code and content, to the comm object's
send_event() method.
Note that the ChatObjects::MComm class doesn't itself define the send_event() method. This is
inherited from the parent class and is used to send both unicast messages to individual clients and
multicast messages to all channel subscribers.
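The context-dependent behavior of sockaddr_in() that mcast_dest() relies on can be seen in isolation. The following is a minimal sketch, not the module's actual code; the multicast address and port are hypothetical values chosen only for illustration.

```perl
use strict;
use warnings;
use Socket qw(sockaddr_in inet_aton inet_ntoa);

# Hypothetical channel address and port, chosen only for illustration.
my $mcast_addr = '225.1.0.1';
my $mcast_port = 2028;

# scalar() forces sockaddr_in() to pack even where a list is expected,
# such as inside a method's argument list.
my $dest = scalar sockaddr_in($mcast_port, inet_aton($mcast_addr));

# In list context the same function unpacks the address again.
my ($port, $packed_ip) = sockaddr_in($dest);
my $round_trip = inet_ntoa($packed_ip) . ":$port";
print "$round_trip\n";
```

Without the explicit scalar(), a call placed directly in an argument list would be evaluated in list context and sockaddr_in() would try to unpack its arguments instead.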
Only a few parts of the client application need to be modified to support multicasting, so we list only
the relevant portions of the source code (Figure 21.10). The full source code for the modified client
is in Appendix A.
Lines 1–9: Load modules—In addition to the IO::Socket and IO::Select modules, we now load
ChatObjects::MComm and IO::Socket::Multicast in order to gain access to mcast_add() and
friends.
Lines 23–36: Define handlers for server events—The %MESSAGES hash maps server events to
subroutines that are invoked to handle the events. We add SET_MCAST_PORT to the list of
handled events, making its handler the new create_msocket() subroutine.
Lines 37–42: Initialize the control and multicast sockets—We read the command-line arguments
to get the default server address and control port. We then create a standard
ChatObjects::Comm object, which holds the server unicast address and port. We store this in $comm.
This will be used to exchange chat messages with the server. For multicast messages we will
later create a ChatObjects::MComm object.
Lines 41–54: Log in and enter select loop—We now attempt to log into the server. If successful,
we create an IO::Select object on the control socket and STDIN and enter the main loop of the
client, handling user commands and server messages. This part of the program hasn't changed
from the original but is repeated here in order to provide context.
Lines 59–67: Handle the SET_MCAST_PORT message—The create_msocket() subroutine
is responsible for handling SET_MCAST_PORT messages sent from the server. It must do two
things: create a new ChatObjects::MComm object bound to the indicated port and add the new
comm object's socket to the list of filehandles monitored by the client's main select() loop.
The function first examines the port number sent by the server in the message body and refuses
to handle the message unless it is numeric. If the $msocket global variable is already defined,
the function removes it from the list of handles monitored by the global IO::Select object (cur-
rently, this never happens, but a future iteration of this server might change the multicast port
dynamically).
The next step is to create a new comm object to handle incoming multicasts. We call
ChatObjects::MComm->new() to create a new communications object wrapped around a
multicasting UDP socket.
The last step is to add the newly created socket to the list that the global IO::Select object
monitors.
Lines 124–136: Join and part channels—The join_part() subroutine is called to handle the
server's JOIN_ACK and PART_ACK message codes. The subroutine parses the message from
the server, which contains the affected channel's multicast address. In the case of a
JOIN_ACK message, we tell the multicast socket to join the group by calling its
mcast_add() method. Otherwise, we call mcast_drop().
Lines 137–142: List a channel—A last, trivial change is to the list_channel() method, which
lists information about a channel in response to a CHANNEL_ITEM message. The format of this
message was changed to include the channel's multicast address, so the regular expression
that parses it must change accordingly.
The new multicast-enabled version of the chat server works well on a local area network and be-
tween subnets separated by multicast routers. It will not work across the Internet unless the ISPs
at both ends route multicast packets or you set up a multicast tunnel with mrouted or equivalent.
One limitation of this client is that only one user can run it on the same machine at the same time.
This is because only one socket can be bound to the multicast port at a time. We could work around
this limitation by setting the Reuse option during creation of the multicast socket. This would allow
multiple sockets to bind to the same port but would create a situation in which, whenever one user
subscribed to a channel, all other users on the machine would start to receive messages on that
channel as well. To prevent this, the client would have to keep track of the channels it subscribed
to and filter out messages coming from irrelevant ones.
Perhaps a better solution would be to allocate a range of ports for use by the chat system and have
each client run through the allowed ports until it finds a free one that it can bind to. Alternatively, the
server could keep track of the ports and IP addresses used by each client and use the
SET_MCAST_PORT message to direct the client toward an unclaimed port.
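The first of these alternatives, scanning a reserved range for a free port, might look something like the sketch below. It is an illustration under stated assumptions rather than part of the chat client: the port range is hypothetical, the bind_free_port() helper is invented for this example, and plain UDP sockets from IO::Socket::INET stand in for the multicast socket.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical port range reserved for chat clients on this host.
use constant FIRST_PORT => 42_000;
use constant LAST_PORT  => 42_020;

# Return (socket, port) for the first port in the range that can be bound,
# or an empty list if every port in the range is already taken.
sub bind_free_port {
    for my $port (FIRST_PORT .. LAST_PORT) {
        my $sock = IO::Socket::INET->new(Proto => 'udp', LocalPort => $port);
        return ($sock, $port) if $sock;
    }
    return;
}

# Two "clients" on the same machine end up on different ports.
my ($sock_a, $port_a) = bind_free_port() or die "No free port in range";
my ($sock_b, $port_b) = bind_free_port() or die "No free port in range";
print "first client bound to $port_a, second to $port_b\n";
```

With this approach each client would also have to tell the server which port it chose, since the server can no longer assume a single well-known multicast port.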
Summary
Multicasting is an attractive alternative to unicasting or broadcasting for sending one-to-many mes-
sages across subnet boundaries. Despite the fact that multicasting is more complex than unicasting,
it requires surprisingly few additions to the socket API, making multicasting applications easy to
write.
The main "gotcha" with multicasting is the uneven support for multicast routing on the Internet, which
limits its use to in-house applications and experimental networks like the MBONE.
In previous chapters we focused on TCP/IP sockets, which were designed to allow processes on
different hosts to communicate. Sometimes, however, you'd like two or more processes on the same
host to exchange messages. Although TCP/IP sockets can be used for this purpose (and often are),
an alternative is to use UNIX-domain sockets, which were designed to support local communica-
tions.
The advantage of UNIX-domain sockets over TCP/IP for local interprocess communication is that
they are more efficient and are guaranteed to be private to the machine. A TCP/IP-based service
intended for local communications would have to check the source address of each incoming client
to accept only those originating from the local host.
Once set up, UNIX-domain sockets look and act much like TCP/IP sockets. The process of reading
and writing to them is the same, and the same concurrency-managing techniques that work with
TCP/IP sockets apply equally well to UNIX-domain sockets. In fact, you can write an application for
UNIX-domain sockets and then reengineer it for use on the network just by changing the way it sets
up its sockets.
The socket files are not automatically removed after the socket is closed, and must be unlinked
manually.
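A short sketch makes the point about leftover socket files concrete. The path is a hypothetical one created under a temporary directory so that the example cleans up after itself.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;
use File::Temp qw(tempdir);

# Hypothetical socket path inside a scratch directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.sock";

# Binding a listening socket creates the socket file.
my $listener = IO::Socket::UNIX->new(Local => $path, Listen => 1)
    or die "Can't create socket: $!";
close $listener;

# Closing the socket does not remove the file; -S tests for a socket file.
my $left_behind = -S $path ? 1 : 0;
print "socket file still present after close\n" if $left_behind;

# The file has to be removed explicitly.
unlink $path or die "Can't unlink $path: $!";
my $gone = -e $path ? 0 : 1;
```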
The Perl documentation occasionally refers to these files as "fifo's" because they follow first-in-first-
out rules: The first byte of data written by a sending application is the first byte of data read by the
receiver. UNIX-domain sockets are similar in many ways to UNIX pipes (Chapter 2), and in fact the
two are frequently implemented on top of a common code base.
The "UNIX" in UNIX-domain sockets is apt. Although a few platforms, such as OS/2, have facilities
similar to UNIX-domain sockets, most operating systems, including Windows and Macintosh, do not
support them. However, Windows users can get UNIX-domain sockets by installing the free
Cygwin32 compatibility library. This library is available from http://www.cygnus.com/cygwin/.
UNIX-domain sockets are used by the standard UNIX syslog daemon, the Berkeley
lpd printer service, and a number of newer applications such as the XMMS MP3 player
(http://www.xmms.org). In the syslog system, client applications write log messages to a UNIX-domain
socket, such as /dev/log. As described in Chapter 14, the syslog daemon reads these messages,
filters them according to their severity, and writes them to one or several log files. The lpd printer
daemon uses a similar strategy to receive print jobs from clients.
XMMS has a more interesting use for UNIX-domain sockets. By creating and monitoring a UNIX-
domain socket, XMMS can exchange information with clients. Among other things, clients can send
XMMS commands to play a song or change its volume, or retrieve information from XMMS about
what it's currently doing. Doug MacEachern's Xmms module, available from CPAN, provides a Perl
interface to XMMS sockets.
Perl provides both a function-oriented and an object-oriented interface to UNIX-domain sockets.
We'll look at each in turn.
Having created the socket, we can make an outgoing connection to a waiting server by calling
connect(). The chief difference is that we must create the rendezvous address using a pathname
and the utility function sockaddr_un(). This code fragment tries to connect to a server listening
at the address /tmp/daytime:
my $dest = sockaddr_un('/tmp/daytime');
connect(S,$dest) or die "Can't connect: $!";
A UNIX-domain address is simply a pathname that has been padded to a fixed length with nulls and
can be created with sockaddr_un(). The members of the sockaddr_un() family of functions
are similar to their IP counterparts:
$packed_addr = sockaddr_un($path)
($path) = sockaddr_un($packed_addr)
In a scalar context, sockaddr_un() takes a file pathname and turns it into a UNIX-domain destination address
suitable for bind() and connect(). In an array context, sockaddr_un() reverses this operation, which
is handy for interpreting the return value of recv() and getsockname().
If this context-specific behavior makes you nervous, you can use the pack_sockaddr_un() and
unpack_sockaddr_un() functions instead:
$packed_addr = pack_sockaddr_un($path)
pack_sockaddr_un() packs a file path into a UNIX domain address regardless of array or scalar context.
$path = unpack_sockaddr_un($packed_addr)
unpack_sockaddr_un() transforms a packed UNIX-domain socket into a file path, regardless of array or
scalar context.
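The two styles can be compared side by side. This is a minimal sketch that only packs and unpacks an address; it does not create a socket, and /tmp/daytime is simply the example path used in the text.

```perl
use strict;
use warnings;
use Socket qw(sockaddr_un pack_sockaddr_un unpack_sockaddr_un);

# Context-sensitive form: scalar assignment packs, list assignment unpacks.
my $packed = sockaddr_un('/tmp/daytime');
my ($path) = sockaddr_un($packed);

# Explicit forms behave the same way in any context.
my $packed2 = pack_sockaddr_un('/tmp/daytime');
my $path2   = unpack_sockaddr_un($packed2);

print "$path\n$path2\n";
```

Note the parentheses in `my ($path)`: they are what puts sockaddr_un() into list context and selects the unpacking behavior.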
Servers must bind to a UNIX-domain address by calling bind() with the desired rendezvous ad-
dress. This example binds to the socket named /tmp/daytime:
bind(S,sockaddr_un('/tmp/daytime')) or die "Can't bind: $!";
If successful, bind() returns a true value. Common reasons for failure include:
"address already in use" (EADDRINUSE)—The rendezvous point already exists, as a regular file,
a regular directory, or a socket created by a previous invocation of your script. UNIX-domain servers
must unlink the socket file before they exit.
"permission denied" (EACCES)—Permissions deny the current process the ability to create the
socket file at the selected location. The same rules that apply to creating a file for writing apply to
UNIX-domain sockets. On UNIX systems the /tmp directory is often chosen by unprivileged scripts
as the location for sockets.
"not a directory" (ENOTDIR)—The selected path included a component that was not a valid direc-
tory. Additional errors are possible if the selected path is not local. For example, socket addresses
on read-only filesystems or network-mounted filesystems are disallowed.
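One way a server might cope with a stale socket file is to inspect the error from bind() and retry. The sketch below is one possible approach, not a recipe from the text: it uses the %! hash to test for EADDRINUSE, and the rendezvous path is a hypothetical one in a scratch directory.

```perl
use strict;
use warnings;
use Socket qw(PF_UNIX SOCK_STREAM sockaddr_un);
use File::Temp qw(tempdir);

# Hypothetical rendezvous path inside a scratch directory.
my $path = tempdir(CLEANUP => 1) . '/daytime';

# Simulate a previous invocation that exited without unlinking its socket.
socket(my $first, PF_UNIX, SOCK_STREAM, 0) or die "socket: $!";
bind($first, sockaddr_un($path))           or die "Can't bind: $!";
close $first;

socket(my $second, PF_UNIX, SOCK_STREAM, 0) or die "socket: $!";
my $saw_in_use = 0;
unless (bind($second, sockaddr_un($path))) {
    # %! is true for the symbolic name of the current errno value.
    die "Can't bind: $!" unless $!{EADDRINUSE};
    $saw_in_use = 1;

    # Remove the path only if it really is a socket file, then retry.
    -S $path     or die "$path exists and is not a socket";
    unlink $path or die "Can't unlink $path: $!";
    bind($second, sockaddr_un($path)) or die "Can't rebind: $!";
}
print "bound to $path\n";
```

Unlinking on EADDRINUSE is only safe when you know no other live server owns the path; a running server's socket file would be removed just as readily as a stale one.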
Once a UNIX-domain socket is created and initialized, it can be used like a TCP/IP socket. Programs
can call read(), sysread(), print(), or syswrite() to communicate in a stream-oriented
fashion, or send() and recv() to use a message-oriented API. Servers may accept new incoming
connections with listen() and accept().
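Putting these calls together, the sketch below runs both ends of a stream connection inside a single process, which works because the kernel queues the connection until accept() is called. It is an illustration only: the socket path is hypothetical, and lexical filehandles are used in place of the bareword handle S.

```perl
use strict;
use warnings;
use Socket qw(PF_UNIX SOCK_STREAM SOMAXCONN sockaddr_un);
use File::Temp qw(tempdir);

# Hypothetical rendezvous path inside a scratch directory.
my $path = tempdir(CLEANUP => 1) . '/daytime';

# Server side: create, bind, and listen.
socket(my $server, PF_UNIX, SOCK_STREAM, 0) or die "socket: $!";
bind($server, sockaddr_un($path))           or die "Can't bind: $!";
listen($server, SOMAXCONN)                  or die "Can't listen: $!";

# Client side: connect to the rendezvous path.
socket(my $client, PF_UNIX, SOCK_STREAM, 0) or die "socket: $!";
connect($client, sockaddr_un($path))        or die "Can't connect: $!";

# Server accepts the queued connection and writes the time.
accept(my $session, $server) or die "Can't accept: $!";
my $message = scalar(localtime) . "\n";
defined syswrite($session, $message) or die "syswrite: $!";
close $session;

# Client reads the reply with the same calls it would use on a TCP socket.
my $reply = '';
defined sysread($client, $reply, 1024) or die "sysread: $!";
print $reply;

close $client;
close $server;
unlink $path;
```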
The functions that return socket addresses, such as getpeername(), getsockname(), and
recv(), return packed UNIX-domain addresses when used with UNIX-domain sockets. These
must be unpacked with sockaddr_un() or unpack_sockaddr_un() to retrieve a human-read-
able file path.
You should be aware that some versions of Perl have a bug in the routines that return socket names.
On such versions, the array forms of sockaddr_un() and unpack_sockaddr_un() will fail. This
is not as bad as it sounds because UNIX-domain applications don't need to recover this information
as frequently as TCP/IP applications do. However, if you do need to recover the pathname of the
local or remote socket, you can work around the Perl bug by applying unpack() with a format of
"x2 Z*" to the value returned by getpeername() or getsockname():
$path = unpack "x2 Z*",getpeername(S);
Another thing to be aware of is that a UNIX-domain socket created by a client can connect()
without calling bind(), just as one can with a TCP/IP socket. In this case, the system creates an
invisible endpoint for communication, and getsockname() returns a path of length 0. This is
roughly equivalent to the operating system's method of using ephemeral ports for outgoing TCP/IP
connections.
$socket = IO::Socket::UNIX->new('/path/to/socket')
The single-argument form of IO::Socket::UNIX->new() attempts to connect to the indicated UNIX-domain
socket, assuming a socket type of SOCK_STREAM. If successful, it returns an IO::Socket::UNIX object.
$socket = IO::Socket::UNIX->new(arg1 => val1, arg2 => val2,...)
The named-argument form of new() takes a set of name => value pairs and creates a new IO::Socket::UNIX
object. The recognized arguments are listed in Table 22.1.
$path = $socket->hostpath()
The hostpath() method returns the path to the UNIX socket at the local end. The method returns undef for
unbound sockets.
$path = $socket->peerpath()
peerpath() returns the path to the UNIX socket at the remote end. The method returns undef for
unconnected sockets.
Table 22.1 lists the arguments recognized by IO::Socket::UNIX->new(). Typical scenarios
include:
• Create a socket and connect() it to the process listening on /dev/log.
$socket = IO::Socket::UNIX->new(Type=>SOCK_STREAM,
Peer=>'/dev/log');
• Create a UNIX-domain socket bound to /tmp/mysock for use with incoming datagram transmis-
sions.
$socket = IO::Socket::UNIX->new(Type => SOCK_DGRAM,
Local=> '/tmp/mysock');
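The constructor forms and the two path methods can be exercised together in one process. This is a hedged sketch rather than an excerpt from a real application; the socket path is hypothetical and lives in a scratch directory.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;
use Socket qw(SOMAXCONN);
use File::Temp qw(tempdir);

# Hypothetical socket path inside a scratch directory.
my $path = tempdir(CLEANUP => 1) . '/mysock';

# Named-argument form: bind to $path and listen for stream connections.
my $listener = IO::Socket::UNIX->new(Local => $path, Listen => SOMAXCONN)
    or die "Can't listen on $path: $!";

# Single-argument form: connect to the listener as a SOCK_STREAM client.
my $client = IO::Socket::UNIX->new($path)
    or die "Can't connect to $path: $!";
my $session = $listener->accept or die "Can't accept: $!";

# The listener's local path and the client's remote path are the same file.
my $local  = $listener->hostpath;
my $remote = $client->peerpath;
print "listener bound to $local; client connected to $remote\n";

close $_ for $session, $client, $listener;
unlink $path;
```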
A "Wrap" Server
As a sample application we'll use the standard Text::Wrap module to create a simple text-formatting
server. The server accepts a chunk of text input, reformats it into narrow 30-column paragraphs,
and returns the reformatted text to the client. The server, named wrap_serv.pl, uses the standard
forking architecture and the IO::Socket::UNIX library. The client, wrap_cli.pl, uses a simple design
that sends the entire input file to the server, shuts down the socket for writing, and then reads back
the reformatted data. The following is an example of the output from the client after feeding it an
excerpt from the beginning of this chapter:
% wrap_cli.pl ../ch22.txt
Connected to /tmp/wrapserv...
In previous chapters we have focused on TCP/IP sockets, which were designed to allow processes on different
hosts to communicate. Sometimes, however, you'd like two or more processes on the same host to exchange
messages. Although TCP/IP sockets can be used for this purpose (and often are), an alternative is to use UNIX-
domain sockets, which were designed to support local communications.
The advantage of UNIX-domain sockets over TCP/IP for local interprocess communication...
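The reformatting step at the heart of the server can be tried without any sockets. The sketch below assumes a COLUMNS constant of 30, matching the column width mentioned above, and uses a two-line sample of text in place of client input.

```perl
use strict;
use warnings;
use Text::Wrap qw(fill);

# Column width assumed from the description of the server.
use constant COLUMNS => 30;
$Text::Wrap::columns = COLUMNS;

# Sample input standing in for the lines read from a client.
my @lines = (
    "In previous chapters we focused on TCP/IP sockets, which were\n",
    "designed to allow processes on different hosts to communicate.\n",
);
chomp @lines;

# fill() takes the first-line indent, the indent for later lines, and the text.
my $wrapped = fill('', '', @lines);
print $wrapped, "\n";
```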
Lines 1–4: Import modules—We load the IO::Socket module and import the fill() subroutine
from Text::Wrap. Since this is a forking server, we import the WNOHANG constant from POSIX
for use in the CHLD handler. We also bring in the POSIX :signal_h set to block and unblock
signals. This facility will be used in the call to fork().
Lines 5–8: Define constants—We define a SOCK_PATH constant containing the UNIX-domain
socket path and various format settings to be passed to Text::Wrap.
Lines 9–12: Set up variables—We retrieve the socket path from the command line or default to
the one in SOCK_PATH. We set the Text::Wrap $columns variable to the column width defined
in COLUMNS.
Lines 13–16: Install signal handlers—The CHLD signal reaps all child processes using a variant
of the waitpid() loop that we saw earlier. This server must also unlink the UNIX-domain socket
file before it terminates, and for this reason we intercept the INT and TERM signals with a handler
that unlinks the file and then terminates normally.
Lines 17–18: Set umask—We explicitly set the umask to octal 0111 so that the listening socket
will be created world readable and writable. This allows any process on the local host to com-
municate with the server. (The leading 0 is crucial for making 0111 interpreted as an octal
constant. If omitted, Perl interprets this as decimal 111, which is something else entirely.)
Lines 19–21: Create listening socket—We call IO::Socket::UNIX->new() to create a UNIX-
domain listening socket on the selected socket address path. The Listen argument is set to the
SOMAXCONN constant exported by the Socket and IO::Socket modules.
Lines 22–32: accept() loop—The accept() loop is identical to similar loops used in TCP/IP
servers. We do, however, call fork() through a launch_child() wrapper for reasons that
we will discuss next. The interact() function is responsible for communication with the client
and is run in the child process.
Lines 33–42: launch_child() subroutine— launch_child() is a wrapper around
fork(). Because the parent server process has INT and TERM handlers that unlink the socket
file, we must be careful to remove these handlers from the children; otherwise, the file might be
unlinked prematurely. Using the same strategy we developed in the Daemon module of Chapter
14, we create a POSIX::SigSet containing the INT, CHLD, and TERM signals and invoke
sigprocmask() to block the signals temporarily. With the signals now safely blocked, we
fork(), and reset each of the handlers to the default behavior in the child. We now unblock
signals by calling sigprocmask() again and return the child's PID.
Lines 43–48: interact() subroutine—The routine that does all the real work is only six lines
long. It retrieves the connected socket from its argument list, reads the list of text lines to format
from the socket, and calls chomp() to remove the newlines, if any. It then passes the lines to
the Text::Wrap fill() function, sends the result across the socket, and closes the socket.
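The signal-blocking wrapper around fork() described for launch_child() can be sketched as follows. This is a reconstruction from the description, not the listing itself, and the short demonstration after the subroutine (a child that exits with status 7) is a hypothetical stand-in for the call to interact().

```perl
use strict;
use warnings;
use POSIX qw(:signal_h);

# Fork with INT, CHLD, and TERM blocked so that the child can never run
# the parent's handlers between fork() and the handler reset.
sub launch_child {
    my $signals = POSIX::SigSet->new(SIGINT, SIGCHLD, SIGTERM);
    sigprocmask(SIG_BLOCK, $signals);       # block signals temporarily

    my $child = fork();
    die "Can't fork: $!" unless defined $child;

    unless ($child) {                       # in the child only
        $SIG{$_} = 'DEFAULT' foreach qw(INT TERM CHLD);
    }

    sigprocmask(SIG_UNBLOCK, $signals);     # unblock in parent and child
    return $child;
}

# Demonstration: the child exits at once with a recognizable status.
my $pid = launch_child();
exit 7 if $pid == 0;

waitpid($pid, 0);
my $status = $? >> 8;
print "child $pid exited with status $status\n";
```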
Lines 1–3: Import modules—We bring in the IO::Socket and Getopt::Long modules. The latter
is used for processing command-line switches.
Line 4: Define SOCK_PATH constant—We define a constant containing the default path to the
UNIX-domain socket.
Lines 5–7: Process command-line arguments—The client allows the user to manually set the
path to the socket by providing a $path argument. We call GetOptions() to parse the com-
mand-line looking for this argument. If not provided, we default to the value of SOCK_PATH.
Lines 8–9: Open socket—We call the one-argument form of IO::Socket::UNIX->new() to
create a new UNIX-domain socket and attempt to connect to the address at $path. We don't
need to set our umask before calling new(), because we will not be binding to a local address.
Lines 10–12: Read text lines and send them to server—We use <> to read all the lines from
STDIN and/or the command-line argument list into an array named @lines, and send them
over the socket to the server. We then invoke shutdown(1) to close the write-half of the socket
and indicate to the server that we have no more data to submit.
Line 13: Print the results—We read the reformatted lines from the socket and print them to
STDOUT.
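The half-close technique the client depends on can be demonstrated without a running server. In this sketch a socketpair() of UNIX-domain sockets plays both roles in one process, and uppercasing the text is a hypothetical stand-in for the server's reformatting.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# A connected pair of UNIX-domain stream sockets stands in for client and server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Client: send everything, then close only the write half.
my @lines = ("first line\n", "second line\n");
defined syswrite($client, join('', @lines)) or die "syswrite: $!";
shutdown($client, 1);

# Server: reading to end-of-file succeeds because the client shut down writing.
my @received = <$server>;

# Server: reply (uppercasing stands in for the real formatting) and close.
defined syswrite($server, uc join('', @received)) or die "syswrite: $!";
close $server;

# Client: the read half is still open, so the reply can be collected.
my @reply = <$client>;
print @reply;
close $client;
```

If the client called close() instead of shutdown(1), it would signal end-of-file just as well but would have no way to read the reply.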
To illustrate using datagrams across UNIX-domain sockets, we'll develop a simple variation on the
daytime server. This server acts much like the standard daytime server by returning a string con-
taining the current local date and time in response to incoming requests. However, in a nod to
globalization, it also looks at the incoming message for a string indicating the time zone, and if the
string is present, it returns the date and time relative to that zone.
The server is called localtime_serv.pl and the client localtime_cli.pl. The client takes an optional
time-zone argument on the command line. The following excerpt shows the client being used to
fetch the time in the current time zone, in Eastern Europe, and in Anchorage, Alaska:
% ./localtime_cli.pl
Sat Jun 17 18:06:14 2000
% ./localtime_cli.pl Europe/Warsaw
Sat Jun 17 22:06:24 2000
% ./localtime_cli.pl America/Anchorage
Sat Jun 17 14:06:57 2000
Lines 1–6: Server setup—We load the IO::Socket module and choose a default path for the
socket. We then read the command line for an alternative socket path, should the user desire
to change it.
Line 7: Install TERM and INT handlers—As in the connection-oriented example, we need to
delete the socket file before exiting. In the previous case we did this by unlinking the file in the
TERM and INT signal handlers.
For variety, in this example we will accomplish the same thing by defining an END{} block that
unlinks the path when the script terminates. However, to prevent the script from terminating
prematurely, we must still install an interrupt handler that intercepts the TERM and INT signals
and calls exit() so that the process terminates in an orderly fashion.
Lines 8–12: Create socket—We set our umask to 0111 so that the socket will be world writable
and call IO::Socket::UNIX->new() to create the socket and bind it to the designated path.
Unlike the previous example, where we allowed IO::Socket::UNIX to default to a connection-
oriented socket, we pass new() a Type argument of SOCK_DGRAM. Because this is a message-
oriented socket, we do not provide a Listen argument.
Lines 13–22: Transaction loop—We enter an infinite loop. Each time through the loop we call
recv() to return a message of up to 128 bytes (which is as long as a time zone specifier is
likely to get). The value returned from recv() is the packed address of the peer's socket.
We examine the contents of the message, and if its format is compatible with a time-zone speci-
fier, we use it to set the TZ environment variable, which contains the current time zone. Other-
wise, we delete this variable, which causes Perl to default to the local time zone.
Using the peer's path, we now call send() to return to the peer a datagram containing the output
of localtime(). If for some reason send() returns a false value, we issue a warning.
Line 23: END{} block—The script's END{} block unlinks the socket file if $path is not empty.
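The time-zone handling in the transaction loop can be isolated into a small function. The sketch below is an approximation of what the walk-through describes: the time_in_zone() name, the validation pattern, and the explicit tzset() call are assumptions made for this example.

```perl
use strict;
use warnings;
use POSIX qw(tzset);

# Return the formatted local time for $when in $zone. An absent or
# implausible zone name falls back to the host's default time zone.
sub time_in_zone {
    my ($zone, $when) = @_;
    $when = time unless defined $when;

    # Assumed validation: letters, digits, and a few punctuation characters.
    if (defined $zone && $zone =~ m{^[A-Za-z0-9_/+-]+$}) {
        $ENV{TZ} = $zone;
    } else {
        delete $ENV{TZ};
    }
    tzset();    # make sure the C library notices the change to TZ
    return scalar localtime($when);
}

print time_in_zone('Europe/Warsaw'),     "\n";
print time_in_zone('America/Anchorage'), "\n";
print time_in_zone(' '),                 "\n";   # single space: default zone
```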
Lines 1–4: Load modules—We load the IO::Socket and Getopt::Long modules. We also bring
in the tmpnam() function from the POSIX module. This handy routine chooses unique names
for temporary files; we'll use it to generate a file path for our local socket.
Lines 5–6: Constants—We define a constant containing the default path to use for the server's
socket, and a TIMEOUT value containing the maximum time we will wait for a response from the
server.
Lines 7–10: Select pathnames for local and remote sockets—We process command-line options
looking for a --path argument. If none is defined, we default to the same path for the server
socket that the server uses.
We also need a pathname for the local socket so that the server can talk back to us, but we don't
want to hard code the path because another user might want to run the client at the same time.
Instead, we call POSIX::tmpnam() to return a unique temporary filename for the local socket.
Line 11: Signal handlers—We will unlink the local socket in an END{} block as in the server. For
this reason, we intercept the INT and TERM signals.
Lines 12–16: Create socket—We set our umask as before and call
IO::Socket::UNIX->new() to create the socket, providing both Local and Type arguments to create a
SOCK_DGRAM socket bound to the temporary pathname returned by tmpnam().
Lines 17–18: Prepare to transmit request—We recover the requested time zone from the com-
mand line. If none is provided, we create a message consisting of a single space (we must send
at least 1 byte of data to the server in order for it to respond). We use sockaddr_un() to create
a valid destination address for use with send().
Lines 19–27: Send request and receive response—We call send() to send the message con-
taining the requested time zone to the server.
We now want to call recv() to read the response from the server, but we don't know for sure
that the server is listening. So instead of calling recv() and waiting indefinitely for a response,
we wrap the call in an eval{} block using the technique shown in Chapter 5. On entry into the
eval{}, we set a handler for the ALRM signal, which calls die(). We then set an alarm clock
for TIMEOUT seconds using alarm() and call recv(). If recv() returns before the alarm
expires, we print the returned data. Otherwise, we die with an error message.
Line 28: END{} block—As in the server, we unlink the local socket after we are done.
If you wish to watch the client's timeout mechanism work, start the server and immediately suspend
it using the suspend key (^Z on UNIX systems). When the client sends a request to the server, it
will not get a response and will issue a timeout error.
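The send-then-wait pattern described in the walkthrough above can be tried in a single process. The following is a minimal sketch rather than the book's client: the recv_with_timeout() helper, the socket file names, and the one-second TIMEOUT are assumptions made for illustration, and File::Temp stands in for POSIX::tmpnam(), which was removed from Perl as of version 5.26.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;
use Socket qw(SOCK_DGRAM pack_sockaddr_un);
use File::Temp qw(tempdir);

use constant TIMEOUT => 1;    # seconds to wait for a reply (assumed value)

# Hypothetical helper: wait up to $timeout seconds for one datagram.
# Returns the data, or undef if the alarm went off first.
sub recv_with_timeout {
    my ($sock, $timeout) = @_;
    my $data;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        defined $sock->recv($data, 1024) or die "recv: $!\n";
        alarm(0);
        1;
    };
    alarm(0);                 # make sure no alarm is left pending
    return $ok ? $data : undef;
}

# Both ends live in a private temporary directory, removed at exit.
my $dir         = tempdir(CLEANUP => 1);
my $server_path = "$dir/server.sock";
my $client_path = "$dir/client.sock";

my $server = IO::Socket::UNIX->new(Local => $server_path, Type => SOCK_DGRAM)
    or die "Can't create server socket: $!";
my $client = IO::Socket::UNIX->new(Local => $client_path, Type => SOCK_DGRAM)
    or die "Can't create client socket: $!";

# Case 1: the "server" stays silent, so the client gives up after TIMEOUT.
$client->send(' ', 0, pack_sockaddr_un($server_path)) or die "send: $!";
my $no_reply = recv_with_timeout($client, TIMEOUT);

# Case 2: the server reads the request and answers the sender's address.
my $peer = $server->recv(my $request, 1024);
$server->send("reply to '$request'", 0, $peer) or die "send: $!";
my $reply = recv_with_timeout($client, TIMEOUT);

print defined $no_reply ? "unexpected data\n" : "timed out\n";
print "$reply\n";
```

Because the client socket is bound to its own pathname, recv() on the server side returns an address that the server can pass straight back to send(). An unbound datagram client would give the server nowhere to reply.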
Summary
UNIX-domain sockets can be used for communication between two or more processes on the same
host. Instead of using IP addresses and port numbers as the rendezvous points, UNIX-domain
sockets use physical file names on the local filesystem. This allows file permissions to be used for
access control, but also complicates server code by requiring servers to unlink the file after the
socket is closed.
Compared to INET-domain (TCP/IP) sockets, UNIX-domain sockets provide greater efficiency in
interprocess communication and security against network-based attacks. However, an important
disadvantage is that UNIX-domain sockets are not implemented as widely as TCP/IP sockets.
Net::NetmaskLite (Chapter 3)
This module contains utilities for working with odd-sized netmasks. With it you can easily determine
the appropriate broadcast and network addresses for any combination of netmask and IP address.
Examine the hostpart(), netpart(), network(), and broadcast() methods to learn the
numeric relationships among these parts of the IP address and its netmask.
David Sharnoff's Net::Netmask module, available on CPAN, provides more functionality and is rec-
ommended for production work.
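Before reading the module, it may help to see the arithmetic on its own. The sketch below is not part of Net::NetmaskLite. It assumes network byte order (pack "N") and uses helper names of our own (to_num, to_addr, prefix_to_mask), and it works through the same address and netmask that the module's documentation uses as its example.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helpers: dotted quad <-> 32-bit integer in network order.
sub to_num  { unpack 'N', inet_aton(shift) }
sub to_addr { inet_ntoa(pack 'N', shift) }

# Turn a prefix length such as 29 into a 32-bit mask.
sub prefix_to_mask {
    my $bits = shift;
    return $bits ? (0xffffffff << (32 - $bits)) & 0xffffffff : 0;
}

my $mask = to_num('255.255.255.248');
my $addr = to_num('64.7.3.42');

my %part = (
    netmask   => to_addr(prefix_to_mask(29)),            # 255.255.255.248
    network   => to_addr($addr & $mask),                 # 64.7.3.40
    broadcast => to_addr($addr | (~$mask & 0xffffffff)), # 64.7.3.47
    hostpart  => to_addr($addr & ~$mask & 0xffffffff),   # 0.0.0.2
);

printf "%-10s %s\n", "$_:", $part{$_} for sort keys %part;
```

The network address keeps the bits selected by the mask, the host part keeps the remaining bits, and the broadcast address sets all of the host bits to 1.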
package Net::NetmaskLite;
# file: Net/NetmaskLite.pm
use strict;
use Carp 'croak';
use overload '""' => 'netmask';   # stringify by calling the netmask() method

sub new {
  my $pack = shift;
  my $mask = shift or croak "Usage: Netmask->new(\$dotted_IP_addr)\n";
  # a bare number up to 32 is a prefix length; anything else is a dotted quad
  my $num = ($mask =~ /^\d+$/ && $mask <= 32)
              ? _tomask($mask)
              : _tonum($mask);
  bless \$num,$pack;
}

# netmask() is described in the POD but does not appear in this listing;
# the one-line body here is an assumption based on that description.
sub netmask { _toaddr(${$_[0]}) }

sub hostpart {
  my $mask = shift;
  my $addr = _tonum(shift)
    or croak "Usage: \$netmask->hostpart(\$dotted_IP_addr)\n";
  _toaddr($addr & ~$$mask);
}

sub netpart {
  my $mask = shift;
  my $addr = _tonum(shift)
    or croak "Usage: \$netmask->netpart(\$dotted_IP_addr)\n";
  _toaddr($addr & $$mask);
}

sub broadcast {
  my $mask = shift;
  my $addr = _tonum(shift)
    or croak "Usage: \$netmask->broadcast(\$dotted_IP_addr)\n";
  _toaddr($addr | ($$mask ^ 0xffffffff));
}

sub network {
  my $mask = shift;
  my $addr = _tonum(shift)
    or croak "Usage: \$netmask->network(\$dotted_IP_addr)\n";
  _toaddr($addr & ($$mask & 0xffffffff));
}

# utilities
sub _tomask {
  my $ones = shift;
  # "B32" packs the most significant bit of each byte first, so the 1 bits
  # fill the mask from the network end ("b*" would yield 255.255.255.31 for /29)
  unpack "L",pack "B32",('1' x $ones) . ('0' x (32-$ones));
}
sub _tonum  { unpack "L",pack("C4",split /\./,shift) }
sub _toaddr { join '.',unpack("C4",pack("L",shift)) }
1;
__END__
=head1 NAME
=head1 SYNOPSIS
  use Net::NetmaskLite;
  $mask = Net::NetmaskLite->new('255.255.255.248');
  $broadcast = $mask->broadcast('64.7.3.42');
  $network = $mask->network('64.7.3.42');
  $hostpart = $mask->hostpart('64.7.3.42');
  $netpart = $mask->netpart('64.7.3.42');
=head1 DESCRIPTION
This package provides an object that can be used for deriving the
broadcast and network addresses given an Internet netmask.
=head1 CONSTRUCTOR
=over 4
=back
=head1 METHODS
=over 4
This just returns the original netmask in dotted decimal form. The
quote operator is overloaded to call netmask() when the object is used
in a string context.
=back
=head2 Example:
  netmask: 255.255.255.248
  broadcast: 64.7.3.47
  network: 64.7.3.40
  hostpart: 0.0.0.2
  netpart: 64.7.3.40
L<Socket>
L<perl>
=head1 AUTHOR
=head1 COPYRIGHT
Copyright (c) 2000 Lincoln Stein. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.
=cut
use strict;
require Exporter;
eval "use Term::ReadKey";
# ...
sub get_passwd {
  my ($user,$host) = @_;
  print STDERR "$user\@$host "
    if $user && $host;
  print STDERR "password: ";
  echo ('off');
  chomp(my $pass = <>);
  echo ('on');
  print STDERR "\n";
  $pass;
}
# print a prompt
sub prompt {
  local($|) = 1;
  my $prompt = shift;
  my $default = shift;
  print "$prompt ('q' to quit) [$default]: ";
  chomp(my $response = <>);
  exit 0 if $response eq 'q';
  return $response || $default;
}
sub echo {
  my $mode = shift;
  if (defined &ReadMode) {
    ReadMode( $mode eq 'off' ? 'noecho' : 'restore' );
  } else {
    if ($mode eq 'off') {
      chomp($stty_settings = `/usr/bin/stty -g`);
      system "/usr/bin/stty -echo </dev/tty";
    } else {
      $stty_settings =~ /^([:\da-fA-F]+)$/;
      system "/usr/bin/stty $1 </dev/tty";
    }
  }
}
1;
=head1 NAME
=head1 SYNOPSIS
  use PromptUtil;
=head1 DESCRIPTION
This package exports two utilities that are handy for prompting for
user input.
=over 4
the input line (minus the newline). If the user hits return without
typing anything, it returns the default specified by C<$default>.
Turns off terminal echo and prompts the user to enter password.
If C<$user> and C<$host> are provided, the prompt is in the format
  [email protected] password:
  password:
The function returns the password, or undef if the user typed return
without entering a password.
=back
attempts to use that. Otherwise, it calls the UNIX stty
program, which is not available on non-UNIX systems.
L<Term::ReadKey>, L<perl>
=head1 AUTHOR
Lincoln Stein <[email protected]>
=head1 COPYRIGHT
Copyright (c) 2000 Lincoln Stein. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.
=cut
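The default-handling logic of prompt() is easy to exercise without a terminal if the input and output handles are passed in explicitly. The sketch below is a hypothetical variant, not part of PromptUtil: prompt_from() and its arguments are our own names, the 'q' to quit shortcut is left out, and the test input comes from an in-memory filehandle. It also tests the length of the response rather than its truth value, so that an answer of "0" is not mistaken for an empty line.

```perl
use strict;
use warnings;

# Hypothetical variant of prompt() that reads from $in and writes to $out.
sub prompt_from {
    my ($in, $out, $prompt, $default) = @_;
    print {$out} "$prompt [$default]: ";
    my $response = <$in>;
    return $default unless defined $response;    # end of input
    chomp $response;
    return length $response ? $response : $default;
}

# Simulate a user who presses return, then types "0", then a host name.
my $typed = "\n0\nftp.example.com\n";
open(my $in,  '<', \$typed)    or die "Can't open input: $!";
open(my $out, '>', \my $shown) or die "Can't open output: $!";

my @answers = map { prompt_from($in, $out, 'Host', 'localhost') } 1 .. 3;
close $out;

print join(', ', @answers), "\n";    # localhost, 0, ftp.example.com
```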
use strict;
use Carp;
use IO::SessionSet;
use IO::LineBufferedSessionData;
use vars '@ISA','$VERSION';
@ISA = 'IO::SessionSet';
$VERSION = '1.00';
# ...
1;
=head1 NAME
=head1 SYNOPSIS
  use IO::LineBufferedSet;
  my $set = IO::LineBufferedSet->new();
  $set->add($_) foreach ($handle1,$handle2,$handle3);
  my $line;
  while ($set->sessions) {
    my @ready = $set->wait;
    for my $h (@ready) {
      unless (my $bytes = $h->getline($line)) {  # fetch a line
        $h->close;                                # EOF or an error
        next;
      }
      next unless $bytes > 0;                     # skip zero-length line
      my $result = process_data($line);           # do some processing on the line
      $h->write($result);                         # write result to handle
    }
  }
=head1 DESCRIPTION
This package provides support for sets of nonblocking handles for use
in multiplexed applications.
=head1 CONSTRUCTOR
=over 4
=back
=over 4
This method deletes the indicated handle from the monitored set. You
may use either the handle itself, or the corresponding
IO::LineBufferedSessionData.
Sessions are always ready for writing, since they are nonblocking.
=back
=head1 AUTHOR
=head1 COPYRIGHT
Copyright (c) 2000 Lincoln Stein. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.
=cut
use strict;
use Carp;
use IO::SessionData;
use Errno 'EWOULDBLOCK';
use IO::LineBufferedSet;
use vars '@ISA','$VERSION';
@ISA = 'IO::SessionData';
$VERSION = 1.00;
# ...
# line_mode is set to true if the package detects that you are doing
# Add three new methods to tell us when there's buffered data available.
sub buffered { return length shift->{inbuffer} }
sub lines_pending {
  my $self = shift;
  return index($self->{inbuffer},$/,$self->{index}) >= 0;
}
sub has_buffered_data {
  my $self = shift;
  return $self->line_mode ? $self->lines_pending : $self->buffered;
}
# ...
# $bytes = $reader->getline($data);
# returns bytes read on success
# returns undef on error
# returns 0 on EOF
# returns 0E0 if would block
sub getline {
  my $self = shift;
  croak "usage: getline(\$scalar)\n" unless @_ == 1;
  # ...
  # If the line end character is not there and the buffer is below the
  # read length, then fetch more data.
  # ...
  # remove the line from the input buffer and reset the search
  # index.
  $_[0] = substr($self->{inbuffer},0,$i+1);   # save the line
  substr($self->{inbuffer},0,$i+1) = '';      # and chop off the rest
  $self->{index} = 0;
  return length $_[0];
}
1;
  my $line;
  while ($set->sessions) {
    my @ready = $set->wait;
    for my $h (@ready) {
      unless (my $bytes = $h->getline($line)) {  # fetch a line
        $h->close;                                # EOF or an error
        next;
      }
      next unless $bytes > 0;                     # skip zero-length line
      my $result = process_data($line);           # do some processing on the line
      $h->write($result);                         # write result to handle
    }
  }
This package provides support for sets of nonblocking handles for use
in multiplexed applications. It is used in conjunction with
IO::LineBufferedSet, and inherits from IO::SessionData.
The new() constructor is not normally called by user applications, but
by IO::LineBufferedSet.
=over 4
This method has the same semantics as read() except that it returns
whole lines, observing the current value of C<$/>. Be very alert for
the 0E0 result code (indicating that the operation would block)
because these occur whenever a partial line is read.
This method closes the session, and removes itself from the list of
sessions monitored by the IO::LineBufferedSet object that owns it.
The handle may not actually be closed until later, when
pending writes are finished.
Do B<not> call the handle's close() method yourself, or pending writes
may be lost.
The return code indicates whether the session was successfully closed.
Note that this returns true on delayed closes, and thus is not of
much use in detecting whether the close was actually successful.
Called with a single argument, the method sets the write limit.
Called with no arguments, returns the current value. Call with 0 to
disable the limit.
The set_choke() method gets or sets the I<choke function>, which is
invoked when the size of the write buffer exceeds the size set by
write_limit(). Called with a coderef argument, set_choke() sets the
function; otherwise it returns its current value.
When the choke function is invoked, it will be called with two
arguments consisting of the session object and a flag indicating
whether writes should be choked or unchoked. The function should take
whatever action is necessary, and return. The default choke action is
to disallow further reads on the session (by calling readable() with a
false value) until the write buffer has returned to acceptable size.
Note that choking a session has no effect on the write() method, which
can continue to append data to the buffer.
This method flags the session set that this filehandle should be
monitored for reading. C<$flag> is true to allow reads, and false
to disallow them.
This method flags the session set that this filehandle should be
monitored for writing. C<$flag> is true to allow writes, and false
to disallow them.
=back
Copyright (c) 2000 Lincoln Stein. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.
=cut
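The return codes that getline() documents (a positive byte count, or 0E0 when only a partial line is available) can be illustrated with the buffer handling alone. The following sketch is an assumption-laden stand-in, not the module's code: take_line() is a hypothetical function that works on a plain string buffer and does no I/O, so the undef and 0 (EOF) cases do not arise.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the buffering half of getline(): move one
# complete line from the buffer into $$lineref.
# Returns the line length, or '0E0' ("zero but true") if the buffer
# holds only a partial line.
sub take_line {
    my ($bufref, $lineref) = @_;
    my $i = index($$bufref, $/);
    return '0E0' if $i < 0;
    # four-argument substr() returns the line and deletes it from the buffer
    $$lineref = substr($$bufref, 0, $i + length($/), '');
    return length $$lineref;
}

my $buffer = "first\nsec";       # one whole line plus a fragment
my ($line, @results);

push @results, take_line(\$buffer, \$line);    # 6; $line is "first\n"
my $first = $line;
push @results, take_line(\$buffer, \$line);    # '0E0': "sec" is incomplete
$buffer .= "ond\n";                            # the rest of the line arrives
push @results, take_line(\$buffer, \$line);    # 7; $line is "second\n"

print join(' ', @results), "\n";
```

The 0E0 value is true in a boolean test but compares equal to zero numerically, which is why the synopsis checks unless ($bytes) for EOF or error first and $bytes > 0 afterward.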
package DaemonDebug;
use strict;
use vars qw(@EXPORT @ISA @EXPORT_OK $VERSION);
# ...
sub init_server {
  $pidfile = shift;
  $pidfile ||= getpidfilename();
  my $fh = open_pid_file($pidfile);
  print $fh $$;
  close $fh;
  $SIG{CHLD} = \&reap_child;
  return $pid = $$;
}
sub launch_child {
  my $callback = shift;
  my $signals = POSIX::SigSet->new(SIGINT,SIGCHLD,SIGTERM,SIGHUP);
  sigprocmask(SIG_BLOCK,$signals);    # block inconvenient signals
  log_die("Can't fork: $!") unless defined (my $child = fork());
  if ($child) {
    $CHILDREN{$child} = $callback || 1;
  } else {
    $SIG{HUP} = $SIG{INT} = $SIG{CHLD} = $SIG{TERM} = 'DEFAULT';
  }
  sigprocmask(SIG_UNBLOCK,$signals);  # unblock signals
  return $child;
}
sub reap_child {
  while ( (my $child = waitpid(-1,WNOHANG)) > 0) {
    $CHILDREN{$child}->($child) if ref $CHILDREN{$child} eq 'CODE';
    delete $CHILDREN{$child};
  }
}
sub kill_children {
  kill TERM => $_ foreach keys %CHILDREN;
  # wait until all the children die
  sleep while %CHILDREN;
}
# ...
sub getpidfilename {
  my $basename = basename($0,'.pl');
  return PIDPATH . "/$basename.pid";
}
sub open_pid_file {
  my $file = shift;
  if (-e $file) {  # oops. pid file already exists
    my $fh = IO::File->new($file) || return;
    my $pid = <$fh>;
    croak "Invalid PID file" unless $pid =~ /^(\d+)$/;
    croak "Server already running with PID $1" if kill 0 => $1;
    cluck "Removing PID file for defunct server process $pid.\n";
    croak "Can't unlink PID file $file" unless -w $file && unlink $file;
  }
  return IO::File->new($file,O_WRONLY|O_CREAT|O_EXCL,0644)
    || die "Can't create $file: $!\n";
}
1;
__END__
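The interplay between launch_child() and reap_child() can be seen in a small stand-alone program. The sketch below follows the same outline but is not the module itself: it passes the child's work in as a second code reference and has the child exit on its own, both of which are assumptions made so that the example is self-contained.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h WNOHANG);

my %CHILDREN;    # pid => callback to run when that child is reaped
my @reaped;      # pids reported by the callback

# Fork with signals blocked so that a CHLD arriving before %CHILDREN
# is updated cannot be lost. $body is the work the child performs.
sub launch_child {
    my ($callback, $body) = @_;
    my $signals = POSIX::SigSet->new(SIGINT, SIGCHLD, SIGTERM, SIGHUP);
    sigprocmask(SIG_BLOCK, $signals) or die "Can't block signals: $!";
    my $child = fork();
    die "Can't fork: $!" unless defined $child;
    if ($child) {
        $CHILDREN{$child} = $callback || 1;
    } else {
        $SIG{$_} = 'DEFAULT' for qw(HUP INT CHLD TERM);
        sigprocmask(SIG_UNBLOCK, $signals);
        $body->();
        POSIX::_exit(0);    # leave without running the parent's END blocks
    }
    sigprocmask(SIG_UNBLOCK, $signals);
    return $child;
}

# CHLD handler: collect every child that has exited.
sub reap_child {
    while ((my $child = waitpid(-1, WNOHANG)) > 0) {
        $CHILDREN{$child}->($child) if ref $CHILDREN{$child} eq 'CODE';
        delete $CHILDREN{$child};
    }
}

$SIG{CHLD} = \&reap_child;
my $pid = launch_child(sub { push @reaped, shift }, sub { });
sleep 1 while %CHILDREN;    # wait for the handler to empty the table
print "reaped child $reaped[0]\n";
```

Blocking the signals around fork() closes the window in which a fast-exiting child could be reaped before the parent has recorded its PID and callback.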
use strict;
use Carp 'croak';            # analyze_file() calls croak()
use Text::Wrap qw(fill);
use IO::File;
sub new {
  my $pack = shift;
  return bless {
    words => [],
    lookup => {},
    num => {},
    a => ' ', p=> ' ', n=>' ',
  },$pack;
}
sub add {
  my $self = shift;
  my $string = shift;
  my ($words,$lookup,$num,$a,$p,$n) =
    @{$self}{qw(words lookup num a p n)};
  for my $w (split /\s+/,$string) {
    ($a,$p) = ($p,$n);
    unless (defined($n = $num->{$w})) {
      push @{$words},$w;
      $n = pack 'S',$#$words;
      $num->{$w} = $n;
    }
    $lookup->{"$a$p"} .= $n;
  }
  @{$self}{'a','p','n'} = ($a,$p,$n);
}
sub analyze_file {
  my $self = shift;
  my $file = shift;
  unless (defined (fileno $file)) {
    $file = IO::File->new($file) || croak("Couldn't open $file: $!\n");
  }
  $self->add($_) while defined ($_ = <$file>);
}
sub generate {
  my $self = shift;
  my $word_count = shift || 1000;
  # ...
}
sub words {
  return @{shift->{words}};
}
sub pretty_text {
  my $self = shift;
  my $text = $self->generate(@_);
  return fill("\t",' ',$text) . "\n";
}
sub reset {
  my $self= shift;
  @{$self}{qw(lookup num)} = ({},{});
  $self->{words} = [];
  delete $self->{a};
  delete $self->{p};
}
1;
=head1 NAME
=head1 SYNOPSIS
  use Text::Travesty;
  my $travesty = Text::Travesty->new;
  $travesty->analyze_file('for_whom_the_bell_tolls.txt');
  print $travesty->generate(1000);
=head1 DESCRIPTION
=head1 CONSTRUCTOR
=over 4
=back
=over 4
=item $travesty->add($text);
This method splits the provided text into words and adds them to the
internal frequency tables. You will typically call add() multiple
times during the analysis of a longer text.
=item $travesty->analyze_file($file)
This method adds the entire contents of the indicated file to the
frequency tables. C<$file> may be an opened filehandle, in which case
analyze_file() reads its contents through to EOF, or a file path, in
which case the method opens it for reading.
=item $text = $travesty->generate([$count])
The generate() method spews back a travesty of the input text
based on a Markov model built from the word-frequency tables.
C<$count>, if provided, gives the length of the text to generate in
words. If not provided, the count defaults to 1000.
=item $text = $travesty->pretty_text([$count])
=item @words = $travesty->words
This method returns a list of all the unique words in the frequency
tables. Punctuation and capitalization count for uniqueness.
Reset the travesty object, clearing out its frequency tables and
readying it to accept a new text to analyze.
=back
Copyright (c) 2000 Lincoln Stein. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.
=cut
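The idea behind generate(), choosing each new word according to which words followed the previous two in the source text, can be sketched in a few lines. What follows is a simplified illustration and not the Text::Travesty implementation: it stores plain word lists rather than packed indexes, and the names add_text() and generate(), along with the restart-on-dead-end rule, are assumptions.

```perl
use strict;
use warnings;

my %next;    # "word1 word2" => [ every word seen after that pair ]

# Record, for each pair of consecutive words, the word that followed it.
sub add_text {
    my ($text) = @_;
    my ($a, $p) = ('', '');
    for my $w (split ' ', $text) {
        push @{ $next{"$a $p"} }, $w;
        ($a, $p) = ($p, $w);
    }
}

# Walk the table, picking a random successor of the last two words.
sub generate {
    my ($count) = @_;
    return '' unless %next;
    my ($a, $p) = ('', '');
    my @out;
    while (@out < $count) {
        my $choices = $next{"$a $p"};
        unless ($choices) {          # dead end: start over from the top
            ($a, $p) = ('', '');
            next;
        }
        my $w = $choices->[ rand @$choices ];
        push @out, $w;
        ($a, $p) = ($p, $w);
    }
    return join ' ', @out;
}

add_text('the quick brown fox jumps over the lazy dog and the quick brown cat');
my $text = generate(12);
print "$text\n";
```

Because a word that follows a given pair several times appears several times in that pair's list, a uniform random pick from the list reproduces the observed frequencies.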
#!/usr/bin/perl -w
# file: chat_client.pl
# chat client using UDP
use strict;
use IO::Socket;
use IO::Select;
use ChatObjects::ChatCodes;
use ChatObjects::MComm;
use IO::Socket::Multicast;
use Sys::Hostname;
# ...
# Try to log in
$nickname = do_login();
die "Can't log in.\n" unless $nickname;
# Read commands from the user and messages from the server
my $select = IO::Select->new($comm->socket,\*STDIN);
LOOP:
while (1) {
  my @ready = $select->can_read;
  foreach (@ready) {
    if ($_ eq \*STDIN) {
      do_user(\*STDIN) || last LOOP;
    } else {
      do_server($_);
    }
  }
}
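The main loop above waits on two input sources at once and dispatches on whichever is ready. The same structure can be tried without a chat server by substituting a pair of pipes for STDIN and the socket. This is a sketch under that assumption: the handle names and the @events list are ours, and sysread() is used so that no buffered data hides from select().

```perl
use strict;
use warnings;
use IO::Select;
use IO::Handle;

# Two pipes stand in for the user's keyboard and the server's socket.
pipe(my $user_r,   my $user_w)   or die "pipe: $!";
pipe(my $server_r, my $server_w) or die "pipe: $!";
$_->autoflush(1) for $user_w, $server_w;

print $user_w   "hello\n";
print $server_w "message from server\n";
close $user_w;      # closing the write ends produces EOF on the read ends
close $server_w;

my $select = IO::Select->new($user_r, $server_r);
my @events;
while ($select->count) {
    for my $fh ($select->can_read) {
        my $bytes = sysread($fh, my $data, 1024);
        unless ($bytes) {                 # EOF or error: stop watching
            $select->remove($fh);
            next;
        }
        my $source = $fh == $user_r ? 'user' : 'server';
        push @events, "$source: $data";
    }
}
print @events;
```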
# ...
}
# ...
END {
  if (defined $comm) {
    $comm->send_event(LOGOFF,$nickname);
    $comm->close;
  }
}
The tables in this appendix list some of Perl's special variables and constants.
When these globals are used as IO::Handle methods, some act as class methods and are global
to all filehandles and IO::Handle objects. The output_field_separator() method is an
example of a class method. Other methods are specific to individual filehandle objects and should be
called as object methods. The input_line_number() method, which gives the number of the
last line read from the filehandle, is an example of this:
$lineno = $fh->input_line_number();
In Table B.2, the first column is the punctuation variable, the second column is its English equivalent,
and the third indicates whether it is also available as an IO::Handle method() call. The value of
the third column is "class" if the method should be invoked as a class method global to all IO::Handle
objects, "object" if it is available on a per-filehandle basis, or "no" if the global is not available as a
method.
See the perlformat POD documentation for an explanation of how to use Perl's built-in formatted
report generator.
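The difference between the two calling styles can be checked with a short script. This is a sketch only: the temporary file and the in-memory output handle are conveniences we have assumed so that the example runs without external files.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::File;

# Object method: the line counter belongs to one particular filehandle.
my $fh = IO::File->new_tmpfile or die "Can't create temporary file: $!";
print $fh "one\ntwo\nthree\n";
seek($fh, 0, 0) or die "Can't rewind: $!";
my $first  = <$fh>;
my $second = <$fh>;
my $lineno = $fh->input_line_number();    # 2: two lines read so far

# Class method: the separator applies to every filehandle, like $, itself.
open(my $out, '>', \my $captured) or die "Can't open in-memory file: $!";
my $old = IO::Handle->output_field_separator(':');
print $out 'a', 'b';                      # writes "a:b"
IO::Handle->output_field_separator($old); # restore the previous value
close $out;

print "line $lineno, output '$captured'\n";
```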
Table B.2. Global I/O Variables
Variable  English                   Method  Description
$_        $ARG                      no      Default destination for <> operator and other I/O functions
$,        $OUTPUT_FIELD_SEPARATOR   class   Character to print between members of a list (default: none)
$|        $OUTPUT_AUTOFLUSH         object  If set to nonzero, causes a flush on the currently selected
                                            filehandle with each output operation. Use the autoflush()
                                            method with IO::Handle objects
This appendix lists values for assigned ports, IP addresses, and other Internet reference information.
They are adapted from RFC 1700, Assigned Numbers.
dsp 33/tcp Display Support Protocol
dsp 33/udp Display Support Protocol
35/tcp Any private printer server
35/udp Any private printer server
time 37/tcp Time
time 37/udp Time
rap 38/tcp Route Access Protocol
rap 38/udp Route Access Protocol
rlp 39/tcp Resource Location Protocol
rlp 39/udp Resource Location Protocol
graphics 41/tcp Graphics
graphics 41/udp Graphics
nameserver 42/tcp Host Name Server
nameserver 42/udp Host Name Server
nicname 43/tcp Who Is
nicname 43/udp Who Is
mpm-flags 44/tcp MPM FLAGS Protocol
mpm-flags 44/udp MPM FLAGS Protocol
mpm 45/tcp Message Processing Module [recv]
mpm 45/udp Message Processing Module [recv]
mpm-snd 46/tcp MPM [default send]
mpm-snd 46/udp MPM [default send]
ni-ftp 47/tcp NI FTP
ni-ftp 47/udp NI FTP
auditd 48/tcp Digital Audit Daemon
auditd 48/udp Digital Audit Daemon
login 49/tcp Login Host Protocol
login 49/udp Login Host Protocol
re-mail-ck 50/tcp Remote Mail Checking Protocol
re-mail-ck 50/udp Remote Mail Checking Protocol
la-maint 51/tcp IMP Logical Address Maintenance
la-maint 51/udp IMP Logical Address Maintenance
xns-time 52/tcp XNS Time Protocol
xns-time 52/udp XNS Time Protocol
at-rtmp 201/tcp AppleTalk Routing Maintenance
at-rtmp 201/udp AppleTalk Routing Maintenance
at-nbp 202/tcp AppleTalk Name Binding
at-nbp 202/udp AppleTalk Name Binding
at-3 203/tcp AppleTalk Unused
at-3 203/udp AppleTalk Unused
at-echo 204/tcp AppleTalk Echo
at-echo 204/udp AppleTalk Echo
at-5 205/tcp AppleTalk Unused
at-5 205/udp AppleTalk Unused
at-zis 206/tcp AppleTalk Zone Information
at-zis 206/udp AppleTalk Zone Information
at-7 207/tcp AppleTalk Unused
at-7 207/udp AppleTalk Unused
at-8 208/tcp AppleTalk Unused
at-8 208/udp AppleTalk Unused
tam 209/tcp Trivial Authenticated Mail Protocol
tam 209/udp Trivial Authenticated Mail Protocol
z39.50 210/tcp ANSI Z39.50
z39.50 210/udp ANSI Z39.50
914c/g 211/tcp Texas Instruments 914C/G Terminal
meter 571/udp udemon
ipcserver 600/tcp Sun IPC server
ipcserver 600/udp Sun IPC server
urm 606/tcp Cray Unified Resource Manager
urm 606/udp Cray Unified Resource Manager
nqs 607/tcp nqs
nqs 607/udp nqs
sift-uft 608/tcp Sender-Initiated/Unsolicited File Transfer
sift-uft 608/udp Sender-Initiated/Unsolicited File Transfer
npmp-trap 609/tcp npmp-trap
npmp-trap 609/udp npmp-trap
npmp-local 610/tcp npmp-local
npmp-local 610/udp npmp-local
npmp-gui 611/tcp npmp-gui
npmp-gui 611/udp npmp-gui
ginad 634/tcp ginad
ginad 634/udp ginad
mdqs 666/tcp
mdqs 666/udp
doom 666/tcp doom Id Software
doom 666/udp doom Id Software
oceansoft-lm 1466/tcp Ocean Software License Manager
oceansoft-lm 1466/udp Ocean Software License Manager
csdmbase 1467/tcp CSDMBASE
csdmbase 1467/udp CSDMBASE
csdm 1468/tcp CSDM
csdm 1468/udp CSDM
aal-lm 1469/tcp Active Analysis Limited License Manager
aal-lm 1469/udp Active Analysis Limited License Manager
uaiact 1470/tcp Universal Analytics
uaiact 1470/udp Universal Analytics
csdmbase 1471/tcp csdmbase
csdmbase 1471/udp csdmbase
csdm 1472/tcp csdm
csdm 1472/udp csdm
openmath 1473/tcp OpenMath
openmath 1473/udp OpenMath
telefinder 1474/tcp Telefinder
telefinder 1474/udp Telefinder
taligent-lm 1475/tcp Taligent License Manager
taligent-lm 1475/udp Taligent License Manager
clvm-cfg 1476/tcp clvm-cfg
clvm-cfg 1476/udp clvm-cfg
ms-sna-server 1477/tcp ms-sna-server
ms-sna-server 1477/udp ms-sna-server
ms-sna-base 1478/tcp ms-sna-base
ms-sna-base 1478/udp ms-sna-base
dberegister 1479/tcp dberegister
dberegister 1479/udp dberegister
pacerforum 1480/tcp PacerForum
pacerforum 1480/udp PacerForum
airs 1481/tcp AIRS
airs 1481/udp AIRS
miteksys-lm 1482/tcp Miteksys License Manager
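
Rows in this table follow a regular "name port/protocol description" layout, so they are easy to load into a lookup structure. The sketch below is one possible way to do that in Perl. The helper name parse_services and the hash layout are assumptions made for illustration and are not taken from the book; rows that have no service name (such as port 35) are simply skipped.

```perl
use strict;
use warnings;

# parse_services is a hypothetical helper, not part of any module.
# It reads rows in "name port/proto description" form and returns a
# hash reference keyed by "name/proto".
sub parse_services {
    my ($text) = @_;
    my %services;
    for my $line (split /\n/, $text) {
        # Rows without a service name do not match this pattern and are skipped.
        next unless $line =~ m{^(\S+)\s+(\d+)/(tcp|udp)\b\s*(.*)$};
        my ($name, $port, $proto, $desc) = ($1, $2, $3, $4);
        $services{"$name/$proto"} = { port => $port, description => $desc };
    }
    return \%services;
}

# A few rows copied from the table above.
my $rows = <<'END';
time 37/tcp Time
time 37/udp Time
35/tcp Any private printer server
nicname 43/tcp Who Is
openmath 1473/tcp OpenMath
END

my $services = parse_services($rows);
for my $key (sort keys %$services) {
    printf "%-14s port %-5d %s\n",
        $key, $services->{$key}{port}, $services->{$key}{description};
}
```

On a host with a populated services database, Perl's built-in getservbyname and getservbyport functions perform a similar lookup without any parsing, although the entries they return depend on the local system rather than on this table.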
Address Description
224.0.0.0 Base Address (Reserved)
224.0.0.1 All Systems on this Subnet
224.0.0.2 All Routers on this Subnet
224.0.0.3 Unassigned
224.0.0.4 DVMRP Routers
224.0.0.5 OSPFIGP All Routers
224.0.0.6 OSPFIGP Designated Routers
224.0.0.7 ST Routers
224.0.0.8 ST Hosts
224.0.0.9 RIP2 Routers
224.0.0.10 IGRP Routers
224.0.0.11 Mobile-Agents
224.0.0.12–224.0.0.255 Unassigned
224.0.1.0 VMTP Managers Group
224.0.1.1 NTP Network Time Protocol
224.0.1.2 SGI-Dogfight
224.0.1.3 Rwhod
224.0.1.4 VNP
224.0.1.5 Artificial Horizons—Aviator
224.0.1.6 NSS—Name Service Server
224.0.1.7 AUDIONEWS—Audio News Multicast
224.0.1.8 SUN NIS+ Information Service
224.0.1.9 MTP Multicast Transport Protocol
224.0.1.10 IETF-1-LOW-AUDIO
224.0.1.11 IETF-1-AUDIO
224.0.1.12 IETF-1-VIDEO
224.0.1.13 IETF-2-LOW-AUDIO
224.0.1.14 IETF-2-AUDIO
224.0.1.15 IETF-2-VIDEO
224.0.1.16 MUSIC-SERVICE
224.0.1.17 SEANET-TELEMETRY
224.0.1.18 SEANET-IMAGE
224.0.1.19 MLOADD
224.0.1.20 any private experiment
224.0.1.21 DVMRP on MOSPF
224.0.1.22 SVRLOC
224.0.1.23 XINGTV
224.0.1.24 Microsoft-ds
224.0.1.25 NBC-pro
224.0.1.26 NBC-pfn
224.0.1.27–224.0.1.255 Unassigned
224.0.2.1 "rwho" Group (BSD) (unofficial)
224.0.2.2 SUN RPC PMAPPROC_CALLIT
224.0.3.000–224.0.3.255 RFE Generic Service
224.0.4.000–224.0.4.255 RFE Individual Conferences
224.0.5.000–224.0.5.127 CDPD Groups
224.0.5.128–224.0.5.255 Unassigned
224.0.6.000–224.0.6.127 Cornell ISIS Project
224.0.6.128–224.0.6.255 Unassigned
224.1.0.0–224.1.255.255 ST Multicast Groups
224.2.0.0–224.2.255.255 Multimedia Conference Calls
224.252.0.0–224.255.255.255 DIS transient groups
232.0.0.0–232.255.255.255 VMTP transient groups
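
Every address in this table falls inside the IPv4 multicast range, 224.0.0.0 through 239.255.255.255, whose first four bits are 1110. As a minimal sketch, a program could test for that range as follows. The function name is_multicast is an assumption made for illustration; inet_aton comes from the standard Socket module, and the function should be given dotted-quad strings, because inet_aton would otherwise try to resolve its argument as a hostname.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# is_multicast is a hypothetical helper, not part of the Socket module.
# It returns 1 when the dotted-quad address lies in 224.0.0.0/4, else 0.
sub is_multicast {
    my ($addr) = @_;
    my $packed = inet_aton($addr);        # 4-byte packed address, or undef
    return 0 unless defined $packed;
    my $n = unpack('N', $packed);         # 32-bit integer in network byte order
    return (($n >> 28) == 0xE) ? 1 : 0;   # top four bits must be 1110
}

for my $addr (qw(224.0.0.1 224.0.1.1 239.255.255.255 192.168.1.1)) {
    printf "%-16s %s\n", $addr,
        is_multicast($addr) ? 'multicast' : 'not multicast';
}
```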
Appendix D. Bibliography
Perl Programming
The books and online sources listed here are recommended as guides to Perl.
Books
Bibliography
T Christiansen, N Torkington, L Wall (1998). Perl Cookbook. O'Reilly & Associates (ISBN 1565922433).
D Conway (1999). Object Oriented Perl. Manning Publications (ISBN 1884777791).
J Hall (1998). Effective Perl Programming: Writing Better Programs with Perl. Addison-Wesley (ISBN 0201419750).
R Schwartz, T Christiansen, L Wall (1997). Learning Perl, 2nd ed. O'Reilly & Associates (ISBN 1565922840).
L Wall, T Christiansen, J Orwant (2000). Programming Perl, 3rd ed. O'Reilly & Associates (ISBN 0596000278).
Online Resources
Bibliography
Books
Bibliography
D Comer (2000). Internetworking with TCP/IP, Vol. 1: Principles, Protocols, and Architecture. Prentice-Hall (ISBN 0130183806).
D Comer (1998). Internetworking with TCP/IP, Vol. 2: ANSI C Version: Design, Implementation, and Internals. Prentice-Hall (ISBN 0139738436).
D Comer (1996). Internetworking with TCP/IP, Vol. 3: Client-Server Programming and Applications, BSD Socket Version. Prentice-Hall (ISBN 013260969X).
C Hunt (1998). TCP/IP Network Administration. O'Reilly & Associates (ISBN 1565923227).
WR Stevens (1994). TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley (ISBN 0201633469).
WR Stevens (1996). TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols. Addison-Wesley (ISBN 0201634953).
WR Stevens (1997). UNIX Network Programming, Volume 1: Networking APIs: Sockets and XTI. Prentice-Hall (ISBN 013490012X).
GR Wright, WR Stevens (1995). TCP/IP Illustrated, Volume 2: The Implementation. Addison-Wesley (ISBN 020163354X).
Online Resources
Bibliography
S Deering, R Hinden (1998). Internet Protocol, Version 6 (IPv6) Specification. RFC 2460: http://www.faqs.org/rfcs/rfc2460.html
G Kessler, S Shepard (1997). A Primer on Internet and TCP/IP Tools and Utilities. RFC 2151: http://www.faqs.org/rfcs/rfc2151.html
J Postel (1980). User Datagram Protocol. RFC 768: http://www.faqs.org/rfcs/rfc0768.html
J Postel (1981). Transmission Control Protocol. RFC 793: http://www.faqs.org/rfcs/rfc0793.html
J Postel (1983). Echo Protocol. RFC 862: http://www.faqs.org/rfcs/rfc0862.html
J Postel (1983). Daytime Protocol. RFC 867: http://www.faqs.org/rfcs/rfc0867.html
J Reynolds, J Postel (1994). Assigned Numbers. RFC 1700: http://www.faqs.org/rfcs/rfc1700.html
TJ Socolofsky, CJ Kale (1991). TCP/IP Tutorial. RFC 1180: http://www.faqs.org/rfcs/rfc1180.html
Bibliography
D Comer (1996). Internetworking with TCP/IP, Vol. 3: Client-Server Programming and Applications, BSD Socket Version. Prentice-Hall (ISBN 013260969X).
WR Stevens (1992). Advanced Programming in the UNIX Environment. Addison-Wesley (ISBN
0201563177).
WR Stevens (1997). UNIX Network Programming, Volume 1: Networking APIs: Sockets and XTI.
Prentice-Hall (ISBN 013490012X).
Multicasting
Books
Bibliography
Online Resources
Bibliography
Application-Level Protocols
FTP
Online Resources
Bibliography
Telnet
Online Resources
Bibliography
Secure Shell
Books
Bibliography
DJ Barrett, R Silverman (2000). SSH, The Secure Shell: The Definitive Guide. O'Reilly & Associates (ISBN 0596000111).
Online Resources
Bibliography
FreeSSH Project: http://www.freessh.org
OpenSSH Project: http://www.openssh.com
Secure Shell, Inc.: http://www.ssh.com
SMTP
Books
Bibliography
Online Resources
Bibliography
J Klensin, N Freed, M Rose, E Stefferud, D Crocker (1995). SMTP Service Extensions. RFC 1869: http://www.faqs.org/rfcs/rfc1869.html
J Postel (1982). Simple Mail Transfer Protocol. RFC 821: http://www.faqs.org/rfcs/rfc0821.html
procmail: ftp://ftp.informatik.rwth-aachen.de/pub/packages/procmail/procmail.tar.gz
MIME
Online Resources
Bibliography
N Freed, N Borenstein (1996). Multipurpose Internet Mail Extensions (MIME), Part 1: Format of Internet Message Bodies. RFC 2045: http://www.faqs.org/rfcs/rfc2045.html
N Freed, N Borenstein (1996). Multipurpose Internet Mail Extensions (MIME), Part 2: Media Types. RFC 2046: http://www.faqs.org/rfcs/rfc2046.html
K Moore (1996). MIME (Multipurpose Internet Mail Extensions), Part 3: Message Header Extensions for Non-ASCII Text. RFC 2047: http://www.faqs.org/rfcs/rfc2047.html
N Freed, J Klensin, J Postel (1996). Multipurpose Internet Mail Extensions (MIME), Part 4: Registration Procedures. RFC 2048: http://www.faqs.org/rfcs/rfc2048.html
N Freed, N Borenstein (1996). Multipurpose Internet Mail Extensions (MIME), Part 5: Conformance Criteria and Examples. RFC 2049: http://www.faqs.org/rfcs/rfc2049.html
POP
Online Resources
Bibliography
IMAP
Online Resources
Bibliography
M Crispin (1996). Internet Message Access Protocol, Version 4rev1. RFC 2060: http://www.faqs.org/rfcs/rfc2060.html
J Klensin, R Catoe, P Krumviede (1997). IMAP/POP AUTHorize Extension for Simple Challenge/Response. RFC 2195: http://www.faqs.org/rfcs/rfc2195.html
NNTP
Book
Online Resource
Books
Online Resources
T. Bray, J. Paoli, C. M. Sperberg-McQueen (1998). Extensible Markup Language (XML) 1.0. http://www.w3.org/TR/REC-xml
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee (1999). Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616. http://www.faqs.org/rfcs/rfc2616.html
J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, L. Stewart (1999). HTTP Authentication: Basic and Digest Access Authentication. RFC 2617. http://www.faqs.org/rfcs/rfc2617.html
Network Security
Books
Anonymous (1998). Maximum Security: A Hacker's Guide to Protecting Your Internet Site and Network. Sams (ISBN 0672313413).
S. Garfinkel, G. Spafford (1996). Practical UNIX and Internet Security. O'Reilly & Associates (ISBN 1565921488).
D. Russell, G. T. Gangemi (1991). Computer Security Basics. O'Reilly & Associates (ISBN 0937175714).