2013 DeepIntoPharo EN PDF
http://deepintopharo.com
Copyright © 2013 by Alexandre Bergel, Damien Cassou, Stéphane Ducasse and Jannik Laval.
The contents of this book are protected under Creative Commons Attribution-ShareAlike 3.0
Unported license.
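You are free to share (copy, distribute and transmit the work) and to remix
(adapt the work), under the following conditions: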
Attribution. You must attribute the work in the manner specified by the author or licensor (but
not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting
work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this
work. The best way to do this is with a link to this web page:
creativecommons.org/licenses/by-sa/3.0/
• Any of the above conditions can be waived if you get permission from the copyright
holder.
• Nothing in this license impairs or restricts the author’s moral rights.
Your fair dealing and other rights are in no way affected by the above. This
is a human-readable summary of the Legal Code (the full license):
creativecommons.org/licenses/by-sa/3.0/legalcode
Contents
1 Preface
I Libraries
4 Sockets
4.1 Basic Concepts
4.2 TCP Client
4.3 TCP Server
4.4 SocketStream
4.5 Tips for Networking Experiments
4.6 Chapter summary
II Source Management
III Frameworks
10 Glamour
10.1 Installation and first browser
10.2 Presentation, Transmission and Ports
10.3 Composing and Interaction
10.4 Chapter summary
IV Language
V Tools
19 Biographies
Chapter 1
Preface
and object technology. Thomas is perhaps best known as the founder and past CEO of Object
Technology International, Inc., now IBM OTI Labs. OTI was responsible for initial development
of the Eclipse open source IDE and the Visual Age Java development environment.
2 freely available from http://pharobyexample.org
nies the reader for a fantastic journey into exciting parts of Pharo. It covers
new libraries such as FileSystem, frameworks such as Roassal and Glamour,
and complex aspects of the system such as exceptions and blocks.
The book is divided into 5 parts and 17 chapters. The first part deals
with truly object-oriented libraries. The second part is about source code
management. The third part is about advanced frameworks. The fourth
part covers advanced topics of the language, in particular exceptions, blocks
and numbers. The fifth and last part is about tooling, including profiling and
parsing.
Pharo is supported by a strong community that grows daily. Pharo’s
community is active and innovative, and is always pushing the limits of
software engineering. The Pharo community consists of software engineers,
casual programmers, but also high-level consultants, researchers, and
teachers. This book exists because of the Pharo community and we naturally
dedicate this book to this group of people that many of us consider as our
second family.
Acknowledgments
We would like to thank various people who have contributed to this book.
In particular, we would like to thank:
• Alain Plantec for his effort in the Setting Framework chapter and his
effort to integrate it into Pharo.
• Dale Henrichs and Mariano Martinez Peck for their participation in the
Metacello chapter.
• Tudor Doru Girba for the Glamour chapter and the first documentation.
• Nicolas Cellier for his participation in the Fun with Floats chapter.
• Lukas Renggli for PetitParser and his work on the refactoring engine
and smallLint rules.
• Jan Kurs and Guillaume Larcheveque for their participation in the
PetitParser chapter.
• Colin Putney for the initial version of FileSystem and Camillo Bruni
for his review of FileSystem and his rewrite of the Pharo Core.
• Vanessa Peña for her participation in the Roassal and Mondrian chapters.
We would also like to thank Hernan Wilkinson and Carlos Ferro for their
reviews, Nicolas Cellier for the feedback on the number chapter, and Vassili
Bykov for permission to adapt his Regex documentation.
We thank Inria Lille Nord Europe for supporting this open-source project
and for hosting the web site of this book. We also thank Object Profile for
sponsoring the cover.
And last but not least, we also thank the Pharo community for its
enthusiastic support of this project, and for informing us of the errors found
in the first edition of this book.
We are also grateful to our respective institutions and national research
agencies for their support and offered facilities. In particular, we thank
Program U-INICIA 11/06 VID 2011, University of Chile, and FONDECYT
project 1120094. We also thank the Plomo Équipe Associée.
Part I
Libraries
Chapter 2
Zero Configuration Scripts and Command-Line Handlers
Weren’t you fed up with not being able to install Pharo from a single command
line or to pass it arguments? Using a nice debugger and an interactive
development environment does not mean that Pharo developers do not value
automatic scripts and love the command line. Yes we do, and we want the best
of both worlds! We really wanted to free our minds from retaining arbitrary
information. A zero configuration (zeroconf) script is a script that
automatically downloads everything you need to get started. Since version 2.0,
Pharo also supports a way to define and handle command line arguments.
This chapter shows how to get the zeroconf scripts for Pharo as well as
how you can pass arguments to the environment from the command line.
wget get.pharo.org/20+vm
If you do not have wget installed you can use curl -L instead.
8 Zero Configuration Scripts and Command-Line Handlers
To execute the script that we just downloaded, you should change its
permissions using chmod a+x or invoke it via bash as follows.
Looking at the help. Now let’s have a look at the script help.
bash 20+vm --help
The help says that the 20+vm command downloads the current virtual
machine and puts it into the pharo-vm folder. In addition, it creates launch
scripts: pharo to launch the system and pharo-ui to launch the image in
UI mode. Finally, it also downloads the latest image and changes files.
This script downloads the latest Pharo 20 Image.
This script downloads the latest Pharo VM.
Grabbing and executing it. If you just want to directly execute the script
you can also do the following
wget -O - get.pharo.org/20+vm | bash
The option -O - will output the downloaded bash file to standard out, so
we can pipe it to bash. If you do not like the log output of wget, use --quiet.
wget --quiet -O - get.pharo.org/20+vm | bash
Note for the believers in automated tasks. The scripts are fetched
automatically from our Jenkins server (https://ci.inria.fr/pharo/job/Scripts-download/)
from the gitorious server https://gitorious.org/pharo-build/pharo-build. Yes, we
believe in automated tasks that free our energy.
Getting the VM only 9
Figure 2.1 shows the list of scripts available that you can get at
http://get.pharo.org.
Documentation:
A DefaultCommandLineHandler handles default command line arguments and options.
The DefaultCommandLineHandler is activated before all other handlers.
It first checks if another handler is available. If so it will activate the found handler.
The --version argument gives the version of the virtual machine. If you
wish to obtain the version of the image, then you need to open the image,
use the World menu, and select About.
List of available handlers. The command line option --list lists the
currently available handlers. This list depends on the handlers that are
currently loaded in the system. In particular, it means that you can simply
add a handler for your specific situation and wishes.
The following list shows the available handlers.
./pharo Pharo.image --list
Note that this help is that of the associated handler, not that of the
generic command-line system.
Usage: config [--help] <repository url> [<configuration>] [--install[=<version>]] [--group=<group>] [--username=<username>] [--password=<password>]
--help show this help message
<repository url> A Monticello repository name
<configuration> A valid Metacello Configuration name
<version> A valid version for the given configuration
<group> A valid Metacello group name
<username> An optional username to access the configuration's repository
<password> An optional password to access the configuration's repository
Examples:
# display this help message
pharo Pharo.image config
Evaluating Pharo Expressions. You can use the command line to evaluate
expressions as follows: ./pharo Pharo.image eval '1+2'
./pharo Pharo.image eval --help
Usage: eval [--help] <smalltalk expression>
--help list this help message
<smallltalk expression> a valid Smalltalk expression which is evaluated and
the result is printed on stdout
Documentation:
A CommandLineHandler that reads a string from the command line, outputs the
evaluated result and quits the image.
We then define the commandName on the class side as well as the method
isResponsibleFor:.
EvaluateCommandLineHandler class>>commandName
^ 'eval'
EvaluateCommandLineHandler class>>description
^ 'Directly evaluates passed in one line scripts'
Then we define the method activate, which is executed when the option
matches.
EvaluateCommandLineHandler>>activate
self activateHelp.
self arguments ifEmpty: [ ^ self evaluateStdIn ].
self evaluateArguments.
self quit.
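Following the same pattern, a handler of your own could be sketched as below. This is only an illustration under stated assumptions: the class GreetCommandLineHandler, its category and the greet command name are hypothetical and not part of Pharo. The sketch also assumes that FileStream stdout is available for writing to standard output. The method headers use the same Class>>selector notation as above; the methods themselves are meant to be entered in the browser.

```smalltalk
"Hypothetical handler (not part of Pharo): greets each name given on the command line."
CommandLineHandler subclass: #GreetCommandLineHandler
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'CommandLineSketches'

GreetCommandLineHandler class>>commandName
	"The word that selects this handler on the command line"
	^ 'greet'

GreetCommandLineHandler class>>description
	^ 'Prints a greeting for each name passed on the command line'

GreetCommandLineHandler>>activate
	"Same structure as the eval handler: handle --help, do the work, then quit"
	self activateHelp.
	self arguments do: [ :each |
		"Assumption: FileStream stdout answers a stream on standard output"
		FileStream stdout nextPutAll: 'Hello ', each; lf ].
	self quit.
```

Once such a class is loaded in the image, it should be invoked like the other handlers, for example with ./pharo Pharo.image greet Alice Bob.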
For example here is the command that we use in Jenkins for the project
XMLWriter (which is hosted on PharoExtras).
# Jenkins puts all the params after a / in the job name as well :(
export JOB_NAME=`dirname $JOB_NAME`
REPO=http://smalltalkhub.com/mc/PharoExtras/$JOB_NAME/main
./pharo $JOB_NAME.image config $REPO ConfigurationOf$JOB_NAME --install=$VERSION --group='Tests'
./pharo $JOB_NAME.image test --junit-xml-output "XML-Writer-.*"
The library for dealing with files in Pharo is called FileSystem. It offers
an expressive and elegant object-oriented design. This chapter presents the
key aspects of the API to cover most of the needs one may have.
FileSystem is the result of long and hard work from many people. FileSystem
was originally developed by Colin Putney and the library is distributed
under the MIT license, as are most components of Pharo. Camillo Bruni
made some changes to the original design and integrated it into
Pharo with the help of Esteban Lorenzano and Guillermo Polito. This chapter
would not exist without the previous work of all the contributors of
FileSystem. We are grateful to all of them.
Notice that children returns the direct files and folders. To recursively ac-
cess all the children of the current directory you should use the message
allChildren as follows:
working allChildren.
'/Users/ducasse/Workspace/FirstCircle/Pharo/20' asFileReference
Note that no error is raised if the string does not point to an existing file.
You can however check whether the file exists or not:
'foobarzork' asFileReference exists
−→ false
All ’.st’ files. Filtering is done using standard pattern matching on the file
name. To find all st files in the working directory, simply execute:
working allChildren select: [ :each | each basename endsWith: '.st' ]
The basename message returns the name of the file from a full name (e.g.,
/foo/gloops.taz basename is 'gloops.taz').
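Matching on a bare suffix such as 'st' would also select a file named 'checklist'. Comparing the result of the extension message is one way to avoid this. The following is a minimal sketch, assuming a throwaway memory filesystem so that nothing on disk is touched. The file and folder names are purely illustrative.

```smalltalk
| fs root stFiles |
"Build a small tree in memory: src/Foo.st, src/checklist and notes.txt"
fs := FileSystem memory.
root := fs workingDirectory.
(root / 'src') ensureCreateDirectory.
(root / 'src' / 'Foo.st') ensureCreateFile.
(root / 'src' / 'checklist') ensureCreateFile.
(root / 'notes.txt') ensureCreateFile.

"Keep only the references whose extension is exactly 'st'"
stFiles := root allChildren select: [ :each | each extension = 'st' ].
stFiles collect: [ :each | each basename ]
```

Here 'checklist' ends with the characters st but has no extension, so it is not selected.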
Accessing a given file or directory. Use the slash operator to obtain a ref-
erence to a specific file or directory within your working directory:
| working cache |
working := FileSystem disk workingDirectory.
cache := working / 'package-cache'.
Getting to the parent folder. Navigating back to the parent is easy using
the parent message:
| working cache parent |
working := FileSystem disk workingDirectory.
cache := working / 'package-cache'.
parent := cache parent.
parent = working
−→ true
−→ '/Users/ducasse/Workspace/FirstCircle/Pharo/20/package-cache'
cache parent fullName
−→ '/Users/ducasse/Workspace/FirstCircle/Pharo/20/'
The methods exists, isFile, isDirectory, and basename are defined on the
FileReference class. Notice that there is no message to get the path without
the basename and that the idiom is to use parent fullName to obtain it. The
message path returns a Path object which is internally used by FileSystem
and is not meant to be publicly used.
Note that FileSystem does not really distinguish between files and folders
which often leads to cleaner code and can be seen as an application of the
Composite design pattern.
FileLocator desktop.
FileLocator home.
FileLocator imageDirectory.
FileLocator vmDirectory.
If you save a location with your image and move the image to a differ-
ent machine or operating system, a location will still resolve to the expected
directory or file. Note that some file locations are specific to the virtual ma-
chine.
Opening read and write Streams 19
Please note that writeStream overwrites any existing file and readStream
throws an exception if the file does not exist. Forgetting to close a stream
is a common mistake that even advanced programmers regularly make.
Closing a stream frees low-level resources, which is a good thing to do.
The messages readStreamDo: and writeStreamDo: free the programmer from
explicitly closing the stream. Consider:
| working |
working := FileSystem disk workingDirectory.
working / 'foo.txt' writeStreamDo: [ :stream | stream nextPutAll: 'Hello World' ].
working / 'foo.txt' readStreamDo: [ :stream | stream contents ].
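To see what readStreamDo: saves us, the sketch below reads the same contents with an explicitly managed stream. It assumes a scratch file named stream-sketch.txt (a hypothetical name) in the working directory, which is removed at the end.

```smalltalk
| file stream contents |
file := FileSystem disk workingDirectory / 'stream-sketch.txt'.

"writeStreamDo: opens the stream, runs the block and closes the stream for us"
file writeStreamDo: [ :out | out nextPutAll: 'Hello World' ].

"The manual equivalent for reading: we must remember to close the stream,
and ensure: guarantees the close even if reading fails"
stream := file readStream.
contents := [ stream contents ] ensure: [ stream close ].

"Clean up the scratch file"
file ensureDelete.
contents
```

The ensure: block plays the role that readStreamDo: takes over: the stream is closed whether or not the reading block completes normally.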
Keep in mind that a file may easily be overwritten without any warning.
Consider the following situation:
| working |
working := FileSystem disk workingDirectory.
working / 'authors.txt' readStreamDo: [ :stream | stream contents ].
−→ 'stephane alexandre damien jannik'
We can also use the message openFileStream: aString writable: aBoolean to get
a stream with the corresponding write status.
20 Files with FileSystem
| stream |
stream := FileSystem disk openFileStream: 'authors.txt' writable: true.
stream nextPutAll: 'stephane alexandre damien jannik'.
| working |
working := FileSystem disk workingDirectory.
working / 'bar.txt' readStreamDo: [ :stream | stream contents ].
−→ 'Hello World'
| working |
working := FileSystem disk workingDirectory.
working / 'foo.txt' renameTo: 'skweek.txt'.
| working |
working := FileSystem disk workingDirectory.
working / 'skweek.txt' readStreamDo: [ :stream | stream contents ].
−→ 'Hello World'
Copy everything. You can copy the contents of a directory using the message
copyAllTo:. Here we copy the complete package-cache to the backup
directory:
cache copyAllTo: backup.
Note that before copying the target directory is created if it does not exist.
| pf |
pf := (FileSystem disk workingDirectory / 'package-cache' ) children second.
−→ /Users/ducasse/Pharo/PharoHarvestingFixes/20/package-cache/AsmJit-IgorStasenko.66.mcz
pf fullName
−→ '/Users/ducasse/Pharo/PharoHarvestingFixes/20/package-cache/AsmJit-IgorStasenko.66.mcz'
pf basename
−→ 'AsmJit-IgorStasenko.66.mcz'
pf basenameWithoutExtension
−→ 'AsmJit-IgorStasenko.66'
pf base
−→ 'AsmJit-IgorStasenko'
pf extension
−→ 'mcz'
pf extensions
−→ an OrderedCollection('66' 'mcz')
pf pathSegments
−→ #('Users' 'ducasse' 'Pharo' 'PharoHarvestingFixes' '20' 'package-cache' 'AsmJit-IgorStasenko.66.mcz')
pf path
−→ Path / 'Users' / 'ducasse' / 'Pharo' / 'PharoHarvestingFixes' / '20' / 'package-cache' / 'AsmJit-IgorStasenko.66.mcz'
Sizes. FileReference also provides ways to access the size of a file.
pf humanReadableSize
−→ '182.78 kB'
pf size
−→ 182778
The main entry point: FileReference 23
File Information. You can get limited information about the file entry itself
using creationTime and permissions. To get the full information you should
access the entry itself using the message entry.
| pf |
pf := (FileSystem disk workingDirectory / 'package-cache' ) children second.
pf creationTime.
−→ 2012-06-10T10:43:19+02:00
pf modificationTime.
−→ 2012-06-10T10:43:19+02:00
pf permissions
−→ rw-r--r--
Entries are objects that represent all the metadata of a single file.
| pf |
pf := (FileSystem disk workingDirectory / 'package-cache' ) children second.
pf entry
pf parent entries
"returns all the entries of the children of the receiver"
Operating on files
There are several operations on files.
Deleting. The messages delete, deleteAll, and deleteAllChildren raise an
error if the receiver does not exist. delete deletes the file, deleteAll
deletes the directory and its contents, and deleteAllChildren deletes only
the children of a directory.
In addition, deleteIfAbsent: executes a block when the file does not exist.
Finally, ensureDelete deletes the file but does not raise an error if the file
does not exist. Similarly, ensureDeleteAllChildren and ensureDeleteAll do not
raise an exception when the receiver does not exist.
(FileSystem disk workingDirectory / 'paf') delete.
−→ error
(FileSystem disk workingDirectory / 'fooFolder') deleteAll.
−→ error
(FileSystem disk workingDirectory / 'fooFolder') ensureCreateDirectory.
(FileSystem disk workingDirectory / 'fooFolder') deleteAll.
(FileSystem disk workingDirectory / 'paf') deleteIfAbsent: [Warning signal: 'File did not exist'].
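The difference between the strict and the forgiving variants can be sketched on a memory filesystem, so that no real file is involved. This sketch assumes that the memory filesystem signals the same kind of error as the disk one. It traps the generic Error class rather than guessing the exact exception class.

```smalltalk
| fs missing outcome |
fs := FileSystem memory.
missing := fs workingDirectory / 'paf'.

"ensureDelete is silent when there is nothing to delete"
missing ensureDelete.

"delete is expected to signal an error for a missing file;
we trap it and record what happened"
outcome := [ missing delete. #deleted ]
	on: Error
	do: [ :error | #failed ].
outcome
```

According to the description above, outcome should be #failed, since 'paf' was never created.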
Moving/Copying files around. We can move files around using the mes-
sage moveTo: which expects a file reference.
(FileSystem disk workingDirectory / 'targetFolder') exists
−→ false
(FileSystem disk workingDirectory / 'paf') exists
−→ false
(FileSystem disk workingDirectory / 'paf' ) moveTo: (FileSystem disk workingDirectory / 'targetFolder')
−→ Error
Besides moving files, we can copy them using copyAllTo:. Here, we copy the
files contained in the source folder to the target one.
The message copyAllTo: performs a deep copy of the receiver, to a location
specified by the argument. If the receiver is a file, the file is copied. If the re-
ceiver is a directory, the directory and its contents will be copied recursively.
The argument must be a reference that does not exist; it will be created by
the copy.
(FileSystem disk workingDirectory / 'sourceFolder') createDirectory.
(FileSystem disk workingDirectory / 'sourceFolder' / 'pif') ensureCreateFile.
(FileSystem disk workingDirectory / 'sourceFolder' / 'paf') ensureCreateFile.
(FileSystem disk workingDirectory / 'targetFolder') createDirectory.
(FileSystem disk workingDirectory / 'sourceFolder') copyAllTo: (FileSystem disk
workingDirectory / 'targetFolder').
(FileSystem disk workingDirectory / 'targetFolder' / 'pif') exists.
−→ true
(FileSystem disk workingDirectory / 'targetFolder' / 'paf') exists.
−→ true
Locator
Locators are late-bound references. They are left deliberately fuzzy, and are
only resolved to a concrete reference when some file operation is performed.
Instead of a filesystem and path, locators are made up of an origin and a
path. An origin is an abstract filesystem location, such as the user’s home
directory, the image file, or the VM executable. When it receives a message
like isFile, a locator will first resolve its origin, then resolve its path against
the origin.
Locators make it possible to specify things like "an item named ’package-
cache’ in the same directory as the image file" and have that specification
remain valid even if the image is saved and moved to another directory, pos-
sibly on a different computer.
locator := FileLocator imageDirectory / 'package-cache'.
locator printString. −→ ' {imageDirectory}/package-cache'
locator resolve. −→ /Users/ducasse/Pharo/PharoHarvestingFixes/20/package-cache
locator isFile. −→ false
• desktop - the directory that holds the contents of the user’s desktop
• documents - the directory where the user’s documents are stored (e.g.
’/Users/colin/Documents’)
Applications may also define their own origins, but the system will not
be able to resolve them automatically. Instead, the user will be asked to man-
ually choose a directory. This choice is then cached so that future resolution
requests will not require user interaction.
absolutePath vs. path. The message absolutePath returns the absolute path
of the receiver. When the file reference is not virtual the messages path and
absolutePath provide similar results. When the file is a late bound reference
(instance of FileLocator), absolutePath resolves the file and returns the absolute
path, while path returns an unresolved file reference as shown below.
(FileLocator image parent / 'package-cache') path
−→ {image}/../package-cache
References and Locators also provide simple methods for dealing with
whole directory trees.
Looking at FileSystem internals 27
FileSystem
A filesystem is an interface to access hierarchies of directories and files.
"The filesystem," provided by the host operating system, is represented by
DiskStore and its platform-specific subclasses. However, the user should not
access them directly but instead use FileSystem as we showed previously.
Other kinds of filesystems are also possible. The memory filesystem pro-
vides a RAM disk filesystem where all files are stored as ByteArrays in the
image. The zip filesystem represents the contents of a zip file.
Each filesystem has its own working directory, which is used to resolve
any relative paths that are passed to it. Some examples:
fs := FileSystem memory.
fs workingDirectoryPath: (Path / 'plonk').
griffle := Path / 'plonk' / 'griffle'.
nurp := Path * 'nurp'.
fs resolve: nurp.
−→ Path/plonk/nurp
Path
Paths are the most fundamental element of the FileSystem API. They
represent filesystem paths in a very abstract sense, and provide a high-level
protocol for working with paths without having to manipulate strings.
Normally you do not need to use Path directly, but here are some examples
showing how to define absolute paths (/), relative paths (*), file
extensions (,), and parent navigation (parent).
| fs griffle nurp |
fs := FileSystem memory.
griffle := fs referenceTo: (Path / 'plonk' / 'griffle').
nurp := fs referenceTo: (Path * 'nurp').
griffle isFile.
−→ false
griffle isDirectory.
−→ false
griffle parent ensureCreateDirectory.
griffle ensureCreateFile.
griffle exists & griffle isFile.
−→ true
griffle copyTo: nurp.
nurp exists.
−→ true
griffle delete
"absolute path"
Path / 'plonk' / 'feep' −→ /plonk/feep
"relative path"
Path * 'plonk' / 'feep' −→ plonk/feep
"parent directory"
(Path / 'plonk' / 'griffle') parent −→ /plonk
"resolving a string"
(Path * 'griffle') resolve: 'plonk' −→ griffle/plonk
"comparing"
(Path / 'plonk') contains: (Path / 'griffle' / 'nurp')
−→ false
Note that some of the path protocol (messages like /, parent and resolve:)
are also available on references.
Visitors
The above methods are sufficient for many common tasks, but application
developers may find that they need to perform more sophisticated opera-
tions on directory trees.
The visitor protocol is very simple. A visitor needs to implement visitFile:
and visitDirectory:. The actual traversal of the filesystem is handled by a guide.
A guide works with a visitor, crawling the filesystem and notifying the
visitor of the files and directories it discovers. There are three Guide classes,
PreorderGuide, PostorderGuide and BreadthFirstGuide, which traverse the
filesystem in different orders. Arranging for a guide to traverse the
filesystem with a particular visitor is simple. Here’s an example:
BreadthFirstGuide show: aReference to: aVisitor
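As an illustration, a visitor that merely counts what the guide shows it could look as follows. The class FileCountingVisitor and its category are hypothetical. The sketch also assumes that the guide passes one argument per file or directory (named anEntry here) and that implementing the two messages named above is sufficient. Method headers use the Class>>selector notation; the methods are meant to be entered in the browser.

```smalltalk
"Hypothetical visitor (not part of FileSystem): counts files and directories."
Object subclass: #FileCountingVisitor
	instanceVariableNames: 'fileCount directoryCount'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FileSystemSketches'

FileCountingVisitor>>initialize
	super initialize.
	fileCount := 0.
	directoryCount := 0

FileCountingVisitor>>visitFile: anEntry
	"Called by the guide for each file it discovers"
	fileCount := fileCount + 1

FileCountingVisitor>>visitDirectory: anEntry
	"Called by the guide for each directory it discovers"
	directoryCount := directoryCount + 1

FileCountingVisitor>>fileCount
	^ fileCount

FileCountingVisitor>>directoryCount
	^ directoryCount
```

In a workspace, the visitor would then be handed to a guide, for instance on a small memory filesystem:

```smalltalk
| fs visitor |
fs := FileSystem memory.
(fs workingDirectory / 'docs') ensureCreateDirectory.
(fs workingDirectory / 'docs' / 'readme.txt') ensureCreateFile.
visitor := FileCountingVisitor new.
BreadthFirstGuide show: fs workingDirectory to: visitor.
visitor fileCount
```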
Sockets
written by:
Noury Bouraqadi
Luc Fabresse
Socket
A remote communication involves at least two system processes exchang-
ing some data bytes through a network. Each process accesses the network
through at least one socket (see Figure 4.1). A socket can then be defined as
a plug on a communication network.
showing the use of client sockets to interact with a web server. Next, Sec-
tion 4.3 presents server sockets. We describe their life-cycle and how to use
them to implement a server that can handle concurrent connections. Last,
we introduce socket streams in Section 4.4. We give an overview of their
benefits by describing their use on both client and server side.
which is the generic address to refer to the machine that runs your software
(Pharo here).
Now we can connect our TCP socket to the server as shown in Script 4.2.
Message connectTo:port: attempts to connect the socket to the server using the
server address and port provided as parameters. The server address refers
to the address of the network interface (e.g. ethernet, wifi) used by the server.
The port refers to the communication endpoint on the network interface.
Each network interface has for each IP transport protocol (e.g. TCP, UDP)
a collection of ports that are numbered from 0 to 65535. For a given protocol,
a port number on an interface can only be used by a single process.
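Creating and connecting a client socket can be sketched as follows. This is a minimal illustration rather than the chapter's own script. It assumes that the host www.pharo-project.org (used only as an example) can be resolved and accepts connections on port 80, and that the image has network access.

```smalltalk
| clientSocket serverAddress connected |
"Resolve the host name to the address of one of the server's network interfaces"
serverAddress := NetNameResolver addressForName: 'www.pharo-project.org'.

"Create a TCP socket and try to connect it to port 80 of the server"
clientSocket := Socket newTCP.
connected := [ clientSocket connectTo: serverAddress port: 80.
	clientSocket isConnected ]
		ensure: [ "Always release the socket, even if the connection failed"
			clientSocket closeAndDestroy ].
connected
```

The ensure: block releases the socket whether or not the connection attempt succeeds, so a failed experiment does not leave system resources allocated.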
Script 4.3: Exchanging Data with some Server through a TCP Socket.
| clientSocket data |
... "create and connect the TCP clientSocket"
clientSocket sendData: 'Hello server'.
data := clientSocket receiveData.
... "Process data"
Script 4.3 shows the protocol to send and receive data through a client
socket. Here, we send the string 'Hello server' to the server using the sendData:
message. Next, we send the receiveData message to our client socket to read
the answer. Note that reading the answer is blocking, meaning that receiveData
returns only once a response has been read. Then, the contents of the variable
data are processed.
Script 4.4: Bounding the Maximum Time for Data Reception.
|clientSocket data|
... "create and connect the TCP clientSocket"
[data := clientSocket receiveDataTimeout: 5.
... "Process data"
] on: ConnectionTimedOut
do: [ :timeOutException |
self
crLog: 'No data received!';
crLog: 'Network connection is too slow or server is down.']
Note that by using receiveData, the client waits until the server either
sends no more data, or closes the connection. This means that the
client may wait indefinitely. An alternative is to have the client signal a
ConnectionTimedOut exception if it has waited for too long, as shown in
Script 4.4. We use the message receiveDataTimeout: to ask the client socket to
wait for 5 seconds. If data is received during this period of time, it is
processed silently. But if no data is received during the 5 seconds, a
ConnectionTimedOut is signaled. In the example, we log a description of what
happened.
Close a Socket
A TCP socket remains alive while devices at both ends are connected. A
socket is closed by sending the message close to it. The socket remains
connected until the other side closes it. This may last indefinitely when
there is a network failure or when the other side is down. This is why sockets
also accept the destroy message, which frees system resources required by
the socket.
In practice we use closeAndDestroy. It first attempts to close the socket by
sending the close message. Then, if the socket is still connected after a
duration of 20 seconds, the socket is destroyed. Note that there exists a variant
5. Close interactionSocket.
6. Close connectionSocket when we decide to kill the server and stop ac-
cepting client connections.
while exchanging data with possibly multiple clients through multiple in-
teractionSockets (one per client). In the following, we first illustrate the socket
serving machinery. Then, we describe a complete server class and explain
the server life-cycle and related concurrency issues.
"Build a new socket for interaction with a client which connection request is accepted"
interactionSocket := connectionSocket waitForAcceptFor: 60.
"Get rid of the connection socket since it is useless for the rest of this example"
connectionSocket closeAndDestroy.
First, we create the socket that we will use for handling incoming con-
nections. We configure it to listen on port 9999. The backlogSize is set to 10,
meaning that we ask the Operating System to allocate a buffer for 10 connec-
tion requests. This backlog will not be actually used in our example. But, a
more realistic server will have to handle multiple connections and then store
pending connection requests into the backlog.
Once the connection socket (referenced by variable connectionSocket) is set
up, it starts listening for client connections. The waitForAcceptFor: 60 message
makes the socket wait for connection requests for 60 seconds. If no client
attempts to connect during these 60 seconds, the message answers nil.
Otherwise, we get a new socket interactionSocket connected to the client’s
socket. At this point, we do not need the connection socket anymore, so we can
close it (connectionSocket closeAndDestroy message).
Since the interaction socket is already connected to the client, we can use
it to exchange data. Messages receiveData and sendData: presented above (see
Section 4.2) can be used to achieve this goal. In our example, we wait for
data from the client and next display it on the Transcript. Lastly, we send it
back to the client prefixed with the 'ECHO: ' string, finishing the interaction
with the client by closing the interaction socket.
There are different options to test the server of Script 4.6. The first simple
one is to use the nc (netcat) utility discussed in Section 4.5. First run the
server script in a workspace. Then, in a terminal, evaluate the following
command line:
As a result, on the Transcript of the Pharo image, the following line should
be displayed:
Hello Pharo
A pure Pharo alternative relies on using two different images: one that
runs the server code and the other for client code. Indeed, since our
examples run within the user interaction process, the Pharo UI will be frozen
at some points, such as during the waitForAcceptFor:. Script 4.7 provides the
code to run on the client image. Note that you have to run the server code
first. Otherwise, the client will fail. Note also that after the interaction,
both the client and the server terminate. So, if you want to run the example a
second time, you need to run both sides again.
As we can see in the definition labelled class 4.8, the EchoServer declares
three instance variables. The first one (connectionSocket) refers to the socket
used for listening to client connections. The two last instance variables
(isRunning holding a boolean and isRunningLock holding a Mutex) are used to
manage the server process life-cycle while dealing with synchronization is-
sues.
The isRunning instance variable is a flag that is set to true while the server
is running. As we will see below, it can be accessed by different processes.
Therefore, we need to ensure that the value can be read safely in the presence
of concurrent write accesses. This is achieved using a lock (the isRunningLock
instance variable) that guarantees that isRunning is accessed by only one
process at a time.
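The locking idea can be tried in a workspace with a plain Mutex. The sketch below is only an illustration using temporary variables; it is not the EchoServer code itself. Every read and every write of the flag goes through critical:, so two processes never touch it at the same time.

```smalltalk
| lock flag |
lock := Mutex new.
flag := false.

"Write access: only one process at a time can execute the block"
lock critical: [ flag := true ].

"Read access goes through the same lock and answers the value of the block"
lock critical: [ flag ]
```

In a class such as EchoServer, the same two expressions would naturally live in a setter and a getter for isRunning, so that no other method accesses the variable directly.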
4.4 SocketStream
Script 4.17: Getting the first line of a web page using SocketStream.
| stream httpQuery result |
stream := SocketStream
openConnectionToHostNamed: 'www.pharo-project.org'
port: 80.
httpQuery := 'GET / HTTP/1.1', String crlf,
'Host: www.pharo-project.org:80', String crlf,
'Accept: text/html', String crlf.
[ stream sendCommand: httpQuery.
stream nextLine crLog ] ensure: [ stream close ]
The first line creates a stream that encapsulates a newly created socket
connected to the provided server. This is the responsibility of the message
openConnectionToHostNamed:port:, which suspends the execution until the
connection with the server is established. If the server does not respond, the
socket stream signals a ConnectionTimedOut exception. This exception is
actually signaled by the underlying socket. The default timeout delay is 45
seconds (defined in method Socket class»standardTimeout). One can choose a
different value using the SocketStream»timeout: method.
Once our socket stream is connected to the server, we forge and send an
HTTP GET query. Notice that compared to script 4.5 (page 36), we skipped
one final String crlf (Script 4.17). This is because the SocketStream»sendCommand:
method automatically inserts CR and LF characters after sending data to
mark line ending.
Reception of the requested web page is triggered by sending the nextLine
message to our socket stream. It will wait for a few seconds until data is
received. Data is then displayed on the transcript. We safely ensure that the
connection is closed.
In this example, we only display the first line of the response sent by the
server. We can easily display the full response, including the HTML code, by
sending the upToEnd message to our socket stream. Note, however, that you
will have to wait a bit longer compared to displaying a single line.
A server relying on socket streams still uses a socket for handling incom-
ing connection requests. Socket streams come into action once a socket is
created for interaction with a client. The socket is wrapped into a socket
stream that eases data exchange using messages such as sendCommand: or
nextLine. Once we are done, we close and destroy the socket handling con-
nections and we close the interaction socket stream. The latter will take care
of closing and destroying the underlying interaction socket.
interactionStream := SocketStream
openConnectionToHostNamed: 'localhost'
port: 9999.
interactionStream binary.
interactionStream nextPutAllFlush: #[65 66 67].
interactionStream upToEnd.
Note that whether the client manages strings (ascii mode) or byte arrays
(binary mode) has no impact on the server. Indeed, in ascii mode, the socket
stream handles instances of ByteString, so each character maps to a single byte.
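The one-to-one mapping between characters and bytes can be checked in a workspace without any network connection. This small sketch uses only the standard conversion messages asByteArray and asString; the sample string is chosen so that it matches the bytes sent by the client above.

```smalltalk
| bytes text |
bytes := 'ABC' asByteArray.     "each character becomes one byte"
text := #[65 66 67] asString.   "each byte becomes one character"
```

Here bytes is equal to #[65 66 67] and text is equal to 'ABC', which is why the server cannot tell which mode the client uses.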
Delimiting Data
SocketStream acts simply as a gateway to some network. It sends or reads
bytes without giving them any semantics. The semantics, that is, the
organization and meaning of exchanged data, should be handled by other objects.
Developers should decide on a protocol and enforce it on both interacting
sides to obtain a correct interaction.
A good practice is to reify a protocol, that is, to materialize it as an
object that wraps a socket stream. The protocol object analyzes exchanged
data and decides accordingly which messages to send to the socket stream.
Entities involved in any conversation need a protocol that defines how to
organize data into a sequence of bytes or characters. Senders should conform
to this organization to allow receivers to extract valid data from the received
sequence of bytes.
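As an illustration of such a reified protocol, here is a minimal sketch of a line-oriented request/reply protocol. The class LineProtocol, its category and its selectors are hypothetical names invented for this example; only sendCommand: and nextLine come from SocketStream.

```smalltalk
Object subclass: #LineProtocol
	instanceVariableNames: 'stream'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Networking-Sketches'

LineProtocol class>>on: aSocketStream
	"Wrap an already connected socket stream"
	^ self new setStream: aSocketStream

LineProtocol>>setStream: aSocketStream
	stream := aSocketStream

LineProtocol>>request: aString
	"Send one line terminated by CR+LF, then answer the one-line reply"
	stream sendCommand: aString.
	^ stream nextLine
```

With this design, the rest of the application only sends request: to the protocol object and never manipulates the socket stream directly.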
One possible solution is to have a set of delimiters inserted between bytes
or characters corresponding to each data. An example of delimiter is the se-
quence of ASCII characters CR and LF. This sequence is considered so use-
ful that the developers of the SocketStream class introduced the sendCommand:
message. This method (illustrated in script 4.17) appends CR and LF after sent
data. When reading CR followed by LF the receiver knows that the received
sequence of characters is complete and can be safely converted into valid
data. A facility method nextLine (illustrated in script 4.17) is implemented by
SocketStream to perform reading until the reception of CR+LF sequence. One
can however use any character or byte as a delimiter. Indeed, we can ask a
socket stream to read all characters/bytes up to some specific one using the
upTo: message.
The advantage of using delimiters is that it handles data of arbitrary size.
The drawback is that we need to analyze received bytes or characters to find
out the limits, which is resource consuming. An alternative approach is to
exchange bytes or characters organized in chunks of a fixed size. A typical use
of this approach is for streaming audio or video contents.
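Both ways of delimiting data can be tried without a network connection. The sketch below uses an in-memory ReadStream as a stand-in for a socket stream (an assumption made only to keep the example self-contained), and the sample data is invented for illustration.

```smalltalk
| delimited fixed first second chunk1 chunk2 |
"Delimiter-based organization: each datum ends with a semicolon"
delimited := 'alice;bob;' readStream.
first := delimited upTo: $;.    "reads up to the delimiter and consumes it"
second := delimited upTo: $;.

"Fixed-size organization: each datum is exactly 4 characters long"
fixed := 'ABCDEFGH' readStream.
chunk1 := fixed next: 4.
chunk2 := fixed next: 4.
```

After evaluation, first and second hold 'alice' and 'bob', while chunk1 and chunk2 hold 'ABCD' and 'EFGH'. The delimiter version has to inspect every character, whereas the fixed-size version only counts them.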
nc (netcat)
nc allows one to set up either a client or a server for both TCP (default pro-
tocol) and UDP. It redirects the content of its stdin to the other side. The fol-
lowing snippet shows how to send 'Hello from a client' to a server on the local
machine listening on port 9090.
echo Hello from a client | nc 127.0.0.1 9090
The command line below starts a server listening on port 9090 that sends
'Hi from server' to the first client to connect. It terminates after the interaction.
echo Hi from server | nc -l 9090
You can keep the server running by means of option -k. But, the string
produced by the preceding echo is sent only to the first client to connect. An
alternative solution is to make the nc server send text while you type. Simply
evaluate the following command line:
nc -lk 9090
Type in some text in the same terminal where you started the server.
Then, run a client in another terminal. Your text will be displayed on the
client side. You can repeat these two last actions (type text at the server side,
then start client) as many times as needed.
4 http://zn.stfx.eu/zn/index.html
You can even go more interactive by making the connection between a
client and a server more persistent. By evaluating the following command
line, the client sends every line (ended with "Enter"). It will terminate when
sending the EOF signal (ctl-D).
cat | nc 127.0.0.1 9090
netstat
lsof
The lsof command lists all files open in your system. This of course includes
sockets, since everything is a file in Unix. Why is lsof useful, you would ask,
if we already have netstat? The answer is that lsof shows the link between
processes and sockets. So you can find sockets related to your program.
The example provided by the following command line lists TCP sockets. The
n and P options force lsof to display host addresses and ports as numbers.
• Messages sendData: and receiveData are the socket primitives to send and
receive data.
50 Sockets
5 http://smalltalkhub.com/#!/~CAR/rST/
Chapter 5
The control flow of a subsystem does not involve Settings. This is the major
point of difference between Settings and the preference system available
in Pharo 1.0.
Vocabulary
Figure 5.1 shows important points of the architecture put in place by Set-
tings: The Settings package can be unloaded and a package defining pref-
erences does not depend on the Settings package. This architecture is sup-
ported by the following points:
SettingBrowser open
The settings are presented in several trees in the middle panel. Setting search-
ing and filtering is available from the top tool-bar whereas the bottom panels
show currently selected setting descriptions (left bottom panel) and current
package set (right bottom panel).
54 The Settings Framework
Setting declarations are organized in trees which can be browsed in the mid-
dle panel. To get a description for a setting, just click on it: the setting is
selected and the left bottom panel is updated with information about the
selected setting.
Changing a preference value is simply done through the browser: each
line holds a widget on the right with which you can update the value. The
kind of widget depends on the actual type of the preference value. Whereas
a preference value can be of any kind, the setting browser is currently able
to present a specific input widget for the following types: Boolean, Color, File-
Name, DirectoryName, Font, Number, Point and String. A drop-list, a pass-
word field or a range input widget using a slider can also be used. Of course,
the list of possible widgets is not closed as it is possible to make the setting
browser support new kinds of preference values or use different input wid-
gets. This point is explained in Section 5.8.
If the actual type of a setting is either String, FileName, DirectoryName,
Number or Point, to change a value, the user has to enter some text
in an editable drop-list widget. In such a case, the input must be confirmed
by hitting the return key (or with cmd-s). If such a setting value is changed
often, the drop-list widget comes in handy because you can retrieve and use
previously entered values in one click! Moreover, in case of a FileName or
a DirectoryName, a button is added to open a file name or a directory name
chooser dialog.
Other possible actions are all accessible from the contextual menu. De-
pending on the selected setting, they may be different. The two possible
versions are shown in Figure 5.3.
• Expand all (a): expand all the setting tree nodes recursively. It is also
accessible via the keyboard shortcut cmd-a.
• Collapse all (a): collapse all the setting tree nodes recursively. It is also
accessible via the keyboard shortcut cmd-A.
• Expand all from here: Expand the currently selected setting tree node
recursively.
• Browse (b): open a system browser on the method that declares the
setting. It is also accessible via the keyboard shortcut cmd-b or if you
double-click on a setting. It is very handy if you want to change the
setting implementation or simply see how it is implemented to under-
stand the framework by investigating some examples (how to declare
a setting is explained in Section 5.3).
• Display export action string: a setting can be exported as a start-up
action; this menu option allows you to display how the start-up action is
coded (start-up action management is explained in Section 5.7).
• Set to default (d): set the selected setting value to the default one. It is
useful if, as an example, you have played with a setting to observe its
effect and finally decide to come back to its default.
• Empty list (e): If the input widget is an editable drop-list, this menu
item allows one to forget previously entered values by emptying the
recorded list.
on Settings and that you will be able to remove Setting if you want to define
extremely small footprint applications.
To define a setting for this preference (i.e., for the CaseSensitiveFinds class
variable) and be able to see it and change it from the Settings Browser, the
method below is implemented. The result is shown in the screenshot of
Figure 5.4.
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
<systemsettings>
(aBuilder setting: #caseSensitiveFinds)
target: TextEditor;
label: 'Case sensitive search' translated;
description: 'If true, then the "find" command in text will always make its searches in
a case-sensitive fashion' translated;
parent: #codeEditing.
The header
The pragma
A setting declaration is tagged with the <systemsettings> pragma.
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
<systemsettings>
...
In fact, when the Settings Browser is opened, it first collects all setting
declarations by searching all methods with the <systemsettings> pragma. In
addition, if you compile a setting declaration method while a Settings Browser
is open, the browser is automatically updated with the new setting.
argument is considered as the selector used by the Settings Browser to get the
preference value. The selector for changing the preference value is by default
built by adding a colon to the getter selector (i.e., it is caseSensitiveFinds: here).
These selectors are sent to a target which is by default the class in which the
method is implemented (i.e., CodeHolderSystemSettings). Thus, this one line
setting declaration is sufficient if caseSensitiveFinds and caseSensitiveFinds: ac-
cessors are implemented in CodeHolderSystemSettings.
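As a sketch of this default behavior, consider a hypothetical class MyAppSettings that implements both accessors itself. The class, its category and the verbose preference are invented for illustration and are not part of Pharo. Since the target defaults to the class holding the declaration, the declaration body fits in one line.

```smalltalk
Object subclass: #MyAppSettings
	instanceVariableNames: ''
	classVariableNames: 'Verbose'
	poolDictionaries: ''
	category: 'MyApp-Settings'

MyAppSettings class>>verbose
	"Getter: the selector given to setting:"
	^ Verbose ifNil: [ Verbose := false ]

MyAppSettings class>>verbose: aBoolean
	"Setter: the getter selector with a colon appended"
	Verbose := aBoolean

MyAppSettings class>>verboseSettingOn: aBuilder
	<systemsettings>
	(aBuilder setting: #verbose)
```

No target: message is needed here because verbose and verbose: are both understood by MyAppSettings itself.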
In fact, very often, these default initializations will not fit your needs. Of
course you can adapt the setting node configuration to take into account
your specific situation. For example, the corresponding getter and setter
accessors for the caseSensitiveFinds setting are implemented in the class
TextEditor. Then, we should explicitly state that the target is TextEditor. This
is done by sending the message target: to the setting node with the target class
TextEditor passed as argument, as shown by the updated definition:
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
<systemsettings>
(aBuilder setting: #caseSensitiveFinds)
target: TextEditor
This very short version is fully functional and enough to be compiled and
taken into account by the Settings Browser as shown by Figure 5.5.
• the label shown in the settings browser is the identifier (the symbol
used to build accessors to access it),
• the new setting is simply added at the root of the setting tree.
Don’t forget to send translated to the label and the description strings; it
will greatly facilitate the translation into other languages.
Concerning the classification and the settings tree organization, there are
several ways to improve it. This point is fully detailed in the next section.
One can use this expression to configure the target of a corresponding setting.
As an example, the #glyphContrast preference could be declared as follows:
(aBuilder setting: #glyphContrast)
target: FreeTypeSettings current;
label: 'Glyph contrast' translated;
...
This is simple, but unfortunately, declaring such a singleton target like this
is not a good idea. This declaration is not compatible with the Setting style
functionalities (see Section ??). In such a case, one would have to separately
indicate the target class and the message selector to send to the target class
to get the singleton. Thus, as shown in the example below, you should use
the targetSelector: message:
(aBuilder setting: #glyphContrast)
target: FreeTypeSettings;
targetSelector: #current;
label: 'Glyph contrast' translated;
...
The way the Settings Browser builds a setting input widget depends on the
actual value type of a preference. Having nil as a value for a preference is a
problem for the Settings Browser because it can’t figure out which input
widget to use. So basically, to be properly shown with the right input widget,
a preference must always be set with a non-nil value. You can set a default
value to a preference by initializing it as usual, with an #initialize method or
with a lazy initialization programmed in the accessor method of the
preference.
Regarding the Settings Browser, the best way is the lazy initialization (see
the example of the #caseSensitiveFinds preference given in Section 5.3).
Indeed, as explained in Section 5.2, from the Settings Browser contextual menu,
you can reset a preference value to its default or globally reset all preference
values. In fact, this is done by setting the preference value to nil. As a
consequence, the preference is automatically set to its default value as soon
as it is read through its dedicated accessor.
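A lazily initialized accessor for the caseSensitiveFinds preference can be sketched as follows. The default value false is an assumption made for illustration; the point is only that the accessor never answers nil.

```smalltalk
TextEditor class>>caseSensitiveFinds
	"Answer the preference value. If it has been reset to nil,
	restore the default (assumed to be false) before answering"
	^ CaseSensitiveFinds ifNil: [ CaseSensitiveFinds := false ]
```

With such an accessor, resetting the preference amounts to storing nil in the class variable: the next read restores the default.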
It is not always possible to change the way an accessor is implemented.
A reason for that could be that the preference accessor is maintained within
another package which you are not allowed to change. As shown in the
example below, as a workaround, you can indicate a default value from the
declaration of the setting by sending the message default: to the setting node:
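A declaration using default: could look like the following sketch. It reuses the caseSensitiveFinds declaration shown earlier in this chapter; the default value false is an assumption made for illustration.

```smalltalk
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
	<systemsettings>
	(aBuilder setting: #caseSensitiveFinds)
		target: TextEditor;
		default: false;	"used when the accessor itself cannot supply a default"
		label: 'Case sensitive search' translated;
		parent: #codeEditing
```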
Within the Settings Browser, settings are organized in trees where related set-
tings are shown as children of the same parent.
Declaring a parent
The simplest way to declare your setting as a child of another setting is to
use the parent: message with the identifier of the parent setting passed as ar-
gument. In the example below, the parent node is an existing node declared
with the #codeEditing identifier.
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
<systemsettings>
(aBuilder setting: #caseSensitiveFinds)
target: TextEditor;
label: 'Case sensitive search' translated;
description: 'If true, then the "find" command in text will always make its searches in
a case-sensitive fashion' translated;
parent: #codeEditing.
The #codeEditing node is also declared somewhere in the system. For example,
it could be defined as a group as we will see now.
Declaring a group
A group is a simple node without any value and which is only used for chil-
dren grouping. The node identified by #codeEditing is created by sending the
group: message to the builder with its identifier passed as argument. Notice
also that, as shown in Figure 5.4, the #codeEditing node is not at root because
it has declared itself as a child of the #codeBrowsing node.
CodeHolderSystemSettings class>>codeEditingSettingsOn: aBuilder
<systemsettings>
(aBuilder group: #codeEditing)
label: 'Editing' translated;
parent: #codeBrowsing.
Declaring a sub-tree
Being able to declare its own settings as a child of a pre-existing node is very
useful when a package wants to enrich existing standard settings. But it can
also be very tedious for settings which are very application specific.
Thus, directly declaring a sub-tree of settings in one method is also pos-
sible. Typically, a root group is declared for the application settings and the
children settings themselves are also declared within the same method. This
is simply done by sending the with: message to the root group.
The with: message takes a block as argument. In this block, all new settings
are implicitly declared as children of the root group (the receiver of the with:
message).
Figure 5.6: Declaring a subtree in one method: the Configurable formatter set-
ting example.
As an example, take a look at Figure 5.6, which shows the settings for the
refactoring browser configurable formatter. This sub-tree of settings is fully
declared in the method RBConfigurableFormatter class>>settingsOn: given below.
You can see that it declares the new root group #configurableFormatter with two
children, #formatCommentWithStatements and #indentString:
RBConfigurableFormatter class>>settingsOn: aBuilder
<systemsettings>
(aBuilder group: #configurableFormatter)
target: self;
parent: #refactoring;
label: 'Configurable Formatter' translated;
description: 'Settings related to the formatter' translated;
with: [
(aBuilder setting: #formatCommentWithStatements)
label: 'Format comment with statements' translated.
(aBuilder setting: #indentString)
label: 'Indent string' translated]
Optional sub-tree
• on the left, the Gradient widget is unchecked, meaning that its actual
value is false; in this case, it has no children,
• on the right, the Gradient widget is checked, then the setting value is
set to true and as a consequence, the settings useful to set a gradient
background are shown.
appearanceSettingsOn: aBuilder
<systemsettings>
(aBuilder group: #appearance)
label: 'Appearance' translated;
description: 'All settings concerned with the look''n feel of your system' translated;
noOrdering;
with: [... ]
You can indicate the order of a setting node among its siblings by sending
the message order: to it with a number passed as argument. The number can
be an Integer or a Float. Nodes with an order number are always placed before
others and are sorted according to their respective order number. If an order
is given to an item, then no ordering is applied for other siblings.
As an example, take a look at how the #standardFonts group is declared:
(aBuilder group: #standardFonts)
label: 'Standard fonts' translated;
target: StandardFonts;
parent: #appearance;
with: [
(aBuilder launcher: #updateFromSystem)
order: 1;
targetSelector: #current;
script: #updateFromSystem;
label: 'Update fonts from system' translated.
(aBuilder setting: #defaultFont)
label: 'Default' translated.
(aBuilder setting: #codeFont)
label: 'Code' translated.
(aBuilder setting: #listFont)
...
allowed. In these cases, it is much more comfortable if the widget can only
accept particular values. To address this issue, the domain value set can be
constrained either with a range or with a list of values.
Its value is an integer, but it makes no sense to set it to -100 or 5000.
Instead, a minimum of -5 and a maximum of 100 constitute a good range of
values. One can use this range to constrain the setting widget. As shown by
the example below, comparing it to a simple setting, the only two differences
are that:
• the new setting node is created with the range: message instead of the
setting: message and
• the valid range is given by sending the range: message to the setting
node, an Interval is given as argument;
screenMarginSettingOn: aBuilder
<systemsettings>
(aBuilder range: #fullScreenMargin)
target: SystemWindow;
parent: #windows;
label: 'Full screen margin' translated;
description: 'Specify the amount of space that is left around a windows when it''s
opened fullscreen' translated;
range: (-5 to: 100).
The example below shows a simplified declaration for the window position
strategy setting.
windowPositionStrategySettingsOn: aBuilder
<systemsettings>
(aBuilder pickOne: #usedStrategy)
label: 'Window position strategy' translated;
target: RealEstateAgent;
domainValues: #(#'Reverse Stagger' #Cascade #Standard)
• the new setting node is created with the pickOne: message instead of the
#setting: message and
Concerning this window strategy example, the value set to the preference
would be either #'Reverse Stagger' or #Cascade or #Standard.
Unfortunately, these values are not very handy. A programmer may ex-
pect another value. For example, some kind of strategy object or a Symbol
which could directly serve as a selector. In fact, this second solution has
been chosen by the RealEstateAgent class maintainers. If you inspect the value
returned by RealEstateAgent usedStrategy you will realize that the result is not
a Symbol among #'Reverse Stagger', #Cascade, or #Standard but another symbol.
Then, if you look at the way the window position strategy setting is really
implemented you will see that the declaration differs from the basic solu-
tion given previously: the domainValues: argument is not a simple array of
Symbols but an array of Associations as you can see in the declaration below:
windowPositionStrategySettingsOn: aBuilder
<systemsettings>
(aBuilder pickOne: #usedStrategy)
...
domainValues: {'Reverse Stagger' translated -> #staggerFor:initialExtent:world:.
'Cascade' translated -> #cascadeFor:initialExtent:world:.
'Standard' translated -> #standardFor:initialExtent:world:};
From the Settings Browser point of view, the content of the list is exactly
the same and the user cannot notice any difference because, if an array of
Associations is given as argument to domainValues:, then the keys of the
Associations are used for the user interface.
Concerning the value of the preference itself, if you inspect
RealEstateAgent usedStrategy, you should notice that the result is a value
among #staggerFor:initialExtent:world:, #cascadeFor:initialExtent:world: and
#standardFor:initialExtent:world:. In fact, the values of the Associations are
used to compute all possible real values for the setting.
The list of possible values can be of any kind. As another example, let’s
take a look at the way the user interface theme setting is declared in the
PolymorphSystemSettings class:
It is possible to run this script from the Settings Browser. The corresponding
launcher is shown in Figure 5.10. The integration of such a launcher is quite
simple. You simply have to declare a setting for it! For example, look at how
the launcher for the TT fonts is declared:
GraphicFontSettings class>>standardFontsSettingsOn: aBuilder
<systemsettings>
(aBuilder group: #standardFonts)
...
(aBuilder launcher: #updateFromSystem) ...
target: FreeTypeFontProvider;
targetSelector: #current;
script: #updateFromSystem;
label: 'Update fonts from system' translated.
• the new setting node is created by sending the launcher: message to the
builder and
• the message script: is sent to the setting node with the selector of the
script passed as argument.
Scripting settings
Because preference variables are all accessible with accessor methods, it is
naturally possible to initialize a set of preferences in a simple script. For the
sake of simplicity, let’s implement it in a Setting style.
As an example, a script can be implemented to change the background
color and to set all fonts to a bigger one than the default. Let’s create a Setting
style class for that. We can call it MyPreferredStyle. The script is defined by a
method of MyPreferredStyle. We call this method loadStyle because this selector
is the standard hook for evaluating settings-related scripts.
MyPreferredStyle>>loadStyle
| f n |
"Desktop color"
PolymorphSystemSettings desktopColor: Color white.
"Bigger font"
n := StandardFonts defaultFont. "get the current default font"
f := LogicalFont familyName: n familyName pointSize: 12. "font for my preferred size"
StandardFonts setAllStandardFontsTo: f "reset all fonts"
implement a method named styleName on the class side of your style class.
Concerning the example of previous section, it should be implemented as
follows:
MyPreferredStyle class>>styleName
"The style name used by the SettingBrowser"
<settingstyle>
^ 'My preferred style'
Figure 5.11: The dialog for loading style with your own style
Figure 5.12 shows these setting declarations in the Settings Browser. The
look and feel is clean but in fact two observations can be made:
1. it takes three lines for each selection kind. This is a little bit uncomfort-
able because the view for one selection takes a lot of vertical space,
2. the underlying model is not explicitly designed. The settings for one
selection kind are grouped together in the Settings Browser, but cor-
responding preference values are declared as separated instance vari-
ables of ThemeSettings. In the next section we see how to improve this
first solution with a better design.
Figure 5.12: The secondary selection settings declared with basic setting val-
ues
Here, you can notice that the preference is declared as optional and with no
text color.
For these preferences to be changeable from the Settings Browser, we have
to declare two methods. The first one is for the setting declaration and the
second is to implement the view.
The setting declaration is implemented as follows:
TextSelectionPreference class>>selectionPreferenceOn: aBuilder
<systemsettings>
(aBuilder group: #selectionColors)
label: 'Text selection colors' translated;
parent: #appearance;
target: self;
with: [(aBuilder setting: #primarySelection) order: 1;
label: 'Primary'.
(aBuilder setting: #secondarySelection)
label: 'Secondary'.
(aBuilder setting: #findReplaceSelection)
label: 'Find/replace'.
(aBuilder setting: #selectionBar)
label: 'Selection bar']
As you can see, there is absolutely nothing new in this declaration. The only
thing that changes is that the values of the preferences are instances of a
user-defined class. In fact, in the case of a user-defined or application-specific
preference class, the only particular thing to do is to implement one
supplementary method for the view. This method must be named
settingInputWidgetForNode: and must be implemented as a class method.
The responsibility of the settingInputWidgetForNode: method is to build the
input widget for the Settings Browser. This method takes a SettingDeclaration as
argument. SettingDeclaration is basically a model and its instances are managed by
the Settings Browser.
Each SettingDeclaration instance serves as a preference value holder. In-
deed, each setting that you can view in the Settings Browser is internally rep-
resented by a SettingDeclaration instance.
For each of our text selection preferences, we want to be able to change
its colors and, if the selection is optional, to have the possibility to enable
or disable it. Regarding the colors, depending on the selection preference
value, only the background color is always shown. Indeed, if the text color
of the preference value is nil, this means that having a text color does not
make sense and then the corresponding color chooser is not built.
The settingInputWidgetForNode: method can be implemented as below:
TextSelectionPreference class>>settingInputWidgetForNode: aSettingDeclaration
| preferenceValue backColorUI usedUI uiElements |
preferenceValue := aSettingDeclaration preferenceValue.
usedUI := self usedCheckboxForPreference: preferenceValue.
backColorUI := self backgroundColorChooserForPreference: preferenceValue.
uiElements := {usedUI. backColorUI},
(preferenceValue textColor
ifNotNil: [ { self textColorChooserForPreference: preferenceValue } ]
ifNil: [{}]).
^ (self theme newRowIn: self world for: uiElements)
cellInset: 20;
yourself
This method simply adds some basic elements in a row and returns the
row. First, you can notice that the actual preference value, an instance of
TextSelectionPreference, is obtained from the SettingDeclaration instance by send-
ing #preferenceValue to it. Then, the user interface elements can be built based
on the actual TextSelectionPreference instance.
The first element is a checkbox or an empty space returned by the
#usedCheckboxForPreference: invocation. This method is implemented as follows:
TextSelectionPreference class>>usedCheckboxForPreference: aSelectionPreference
^ aSelectionPreference optional
ifTrue: [self theme
newCheckboxIn: self world
for: aSelectionPreference
getSelected: #used
setSelected: #used:
getEnabled: #optional
label: ''
help: 'Enable or disable the selection']
ifFalse: [Morph new height: 1;
width: 30;
color: Color transparent]
The next elements are two color choosers. As an example, the background
color chooser is built as follows:
TextSelectionPreference class>>backgroundColorChooserForPreference:
aSelectionPreference
^ self theme
newColorChooserIn: self world
for: aSelectionPreference
getColor: #backgroundColor
setColor: #backgroundColor:
getEnabled: #used
help: 'Background color' translated
Now, in the Settings Browser, the user interface looks as shown in Figure 5.13,
with only one line for each selection kind instead of three as in our previous
version.
Figure 5.13: The text selection settings implemented with a specific prefer-
ence class
If you do not have a web site on your machine, copy a few HTML files to a local
directory to serve as a test bed.
We will develop two classes, WebDir and WebPage, to represent directories
and web pages. The idea is to create an instance of WebDir which will point
to the root directory containing our web site. When we send the message
makeToc, it will walk through the files and directories inside it to build up the
site map. It will then create a new file, called toc.html, containing links to all
the pages in the web site.
One thing we will have to watch out for: each WebDir and WebPage must
remember the path to the root of the web site, so it can properly generate
links relative to the root.
Define the class WebDir with instance variables webDir and homePath, and de-
fine the appropriate initialization method. Also define class-side methods to prompt
the user for the location of the web site on your computer, as follows:
WebDir class>>selectHome
^ self onDir: FileList modalFolderSelector
The last method opens a browser to select the directory to open. Now, if
you inspect the result of WebDir selectHome, you will be prompted for giving
the directory containing your web pages, and you will be able to verify that
webDir and homePath are properly initialized to the directory holding your
web site and the full path name of this directory.
2 The original documentation can be found on the class side of RxParser.
It would be nice to be able to programmatically instantiate a WebDir, so
let’s add another creation method.
Add the following methods and try it out by inspecting the result of
WebDir onPath: 'path to your web site'.
The * (known as the “Kleene star”, after Stephen Kleene, who invented it)
is a regex operator that will match the preceding regex any number of times
(including zero).
'' matchesRegex: 'x*' −→ true
'x' matchesRegex: 'x*' −→ true
'xx' matchesRegex: 'x*' −→ true
'y' matchesRegex: 'x*' −→ false
Now let’s check that our regex for finding HTML files works as expected.
80 Regular Expressions in Pharo
Add the following method to WebDir and try it out on your test web site.
WebDir>>htmlFiles
^ webDir fileNames select: [ :each | each matchesRegex: '.*\.html' ]
If you send htmlFiles to a WebDir instance and print it, you should see something
like this:
(WebDir onPath: '...') htmlFiles −→ #('index.html' ...)
WebDir>>htmlFiles
^ webDir fileNames select: [ :each | htmlRegex matches: each ]
Now listing the HTML files should work just as it did before, except that
we reuse the same regex object many times.
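To see the reuse in isolation, here is a minimal sketch that does not touch the file system. The list of file names is made up for illustration; only asRegex and matches: come from the Regex package.

```smalltalk
| htmlRegex fileNames |
"Parse the pattern once and keep the resulting matcher."
htmlRegex := '.*\.html' asRegex.
"Hypothetical directory listing, used only for this illustration."
fileNames := #('index.html' 'logo.png' 'about.html' 'notes.txt').
"The same matcher is reused for every element."
fileNames select: [ :each | htmlRegex matches: each ]
"answers #('index.html' 'about.html')"
```

With matchesRegex: the pattern string would be parsed again for each of the four names.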
Define a class WebPage with instance variables path, to identify the HTML file,
and homePath, to identify the root directory of the web site. (We will need this to
correctly generate links from the root of the web site to the files it contains.) Define
an initialization method on the instance side and a creation method on the class side.
A WebDir instance should be able to return a list of all the web pages it
contains.
Add the following method to WebDir, and inspect the return value to verify that
it works correctly.
WebDir>>webPages
^ self htmlFiles collect:
[ :each | WebPage
on: webDir fullName, '/', each
forHome: homePath ]
String substitutions
That’s not very informative, so let’s use a regex to get the actual file name
for each web page. To do this, we want to strip away all the characters from
the path name up to the last directory. On a Unix file system directories end
with a slash (/), so we need to delete everything up to the last slash in the file
path.
The String extension method copyWithRegex:matchesReplacedWith: does what
we want:
'hello' copyWithRegex: '[elo]+' matchesReplacedWith: 'i' −→ 'hi'
In this example the regex [elo] matches any of the characters e, l or o. The
operator + is like the Kleene star, but it matches one or more instances
of the regex preceding it. Here it will match the entire substring 'ello' and
replace it in a fresh string with the letter i.
WebPage>>fileName
^ path copyWithRegex: '.*/' matchesReplacedWith: ''
Now you should see something like this on your test web site:
(WebDir onPath: '...') webPages collect: [:each | each fileName ]
−→ #('index.html' ...)
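The fileName method relies on the star being greedy: '.*/' consumes as much as it can, so the match runs up to the last slash. A small sketch on a literal string shows the effect (the path is a made-up example):

```smalltalk
| path |
"Hypothetical absolute path, used only for illustration."
path := '/var/www/site/docs/index.html'.
"The greedy '.*/' matches everything up to and including the last slash."
path copyWithRegex: '.*/' matchesReplacedWith: ''.
"answers 'index.html'"

"A name without any slash has no match and is copied unchanged."
'index.html' copyWithRegex: '.*/' matchesReplacedWith: ''
"answers 'index.html'"
```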
Actually, you might have problems if your web pages contain non-ASCII
characters, in which case you might be better off with the following code:
WebPage>>contents
^ (FileStream oldFileOrNoneNamed: path)
converter: Latin1TextConverter new;
contents
Now let’s extract the title. In this case we are looking for the text that
occurs between the HTML tags <title> and </title>.
What we need is a way to extract part of the match of a regular expression.
Subexpressions of regexes are delimited by parentheses. Consider the regex
([ˆaeiou]+)([aeiou]+). It consists of two subexpressions, the first of which will
match a sequence of one or more non-vowels, and the second of which will
match one or more vowels. (The operator ˆ at the start of a bracketed set of
characters negates the set. 3 )
3 NB: In Pharo the caret is also the return keyword, which we write as ^. To avoid confu-
sion, we will write ˆ when we are using the caret within regular expressions to negate sets of
characters, but you should not forget, they are actually the same thing.
Now we will try to match a prefix of the string 'pharo' and extract the sub-
matches:
re := '([ˆaeiou]+)([aeiou]+)' asRegex.
re matchesPrefix: 'pharo' −→ true
re subexpression: 1 −→ 'pha'
re subexpression: 2 −→ 'ph'
re subexpression: 3 −→ 'a'
After successfully matching a regex against a string, you can always send
it the message subexpression: 1 to extract the entire match. You can also send
subexpression: n for any n from 2 up to one plus the number of parenthesized
subexpressions in the regex. The regex above has two subexpressions, numbered 2 and 3.
We will use the same trick to extract the title from an HTML file.
HTML does not care whether tags are upper or lower case, so we must
make our regex case insensitive by instantiating it with asRegexIgnoringCase.
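As a rough sketch of the idea, we can try the subexpression trick on a literal string rather than on a file. The sample HTML and the '(untitled)' fallback are assumptions made for this illustration. The sketch uses search: so that the pattern does not have to start at the beginning of the text, which is one possible choice and not necessarily how a title method for WebPage would be written.

```smalltalk
| html re |
"Sample page content, invented for this illustration."
html := '<html><head><TITLE>Home page</TITLE></head><body></body></html>'.
"Subexpression 1 is the whole match; subexpression 2 is the text between the tags."
re := '<title>(.*)</title>' asRegexIgnoringCase.
(re search: html)
	ifTrue: [ re subexpression: 2 ]
	ifFalse: [ '(untitled)' ]
"answers 'Home page' for this sample, even though the tags are upper case"
```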
Now we can test our title extractor, and we should see something like
this:
(WebDir onPath: '...') webPages first title −→ 'Home page'
The first result would give us an absolute path, which is probably not
what we want.
WebPage>>link
^ '<a href="', self relativePath, '">', self title, '</a>'
If you want to see the site map generation, just add the following methods.
If our web site has subdirectories, we need a way to access them:
WebDir>>webDirs
^ webDir directoryNames
collect: [ :each | WebDir onPath: webDir pathName , '/' , each home: homePath ]
We need to generate HTML bullet lists containing links for each web page
of a web directory. Subdirectories should be indented in their own bullet
list.
WebDir>>printTocOn: aStream
self htmlFiles
ifNotEmpty: [
aStream nextPutAll: '<ul>'; cr.
self webPages
do: [:each | aStream nextPutAll: '<li>';
nextPutAll: each link;
nextPutAll: '</li>'; cr].
self webDirs
do: [:each | each printTocOn: aStream].
aStream nextPutAll: '</ul>'; cr]
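To get a feeling for the text that such a method writes, the following sketch builds one bullet list in memory. It is independent of WebDir and WebPage: the page names are hypothetical, and each link simply uses the file name as its label.

```smalltalk
| stream pages |
"Hypothetical page names standing in for the result of webPages."
pages := #('index.html' 'about.html').
stream := WriteStream on: String new.
stream nextPutAll: '<ul>'; cr.
pages do: [ :each |
	"One list item per page, with a link to the page."
	stream
		nextPutAll: '<li><a href="';
		nextPutAll: each;
		nextPutAll: '">';
		nextPutAll: each;
		nextPutAll: '</a></li>';
		cr ].
stream nextPutAll: '</ul>'; cr.
"The accumulated HTML fragment, one element per line."
stream contents
```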
We create a file called “toc.html” in the root web directory and dump the
site map there.
Regex syntax 85
WebDir>>tocFileName
^ 'toc.html'
WebDir>>makeToc
| tocStream |
tocStream := (webDir / self tocFileName) writeStream.
self printTocOn: tocStream.
tocStream close.
We have already seen the Kleene star (*) and the + operator. A regular
expression followed by an asterisk matches any number (including 0) of
matches of the original expression. For example:
'ab' matchesRegex: 'a*b' −→ true
'aaaaab' matchesRegex: 'a*b' −→ true
'b' matchesRegex: 'a*b' −→ true
'aac' matchesRegex: 'a*b' −→ false "b does not match"
The Kleene star has higher precedence than sequencing. A star applies to
the shortest possible subexpression that precedes it. For example, ab* means
a followed by zero or more occurrences of b, not “zero or more occurrences
of ab”:
'abbb' matchesRegex: 'ab*' −→ true
'abab' matchesRegex: 'ab*' −→ false
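If we do want “zero or more occurrences of ab”, we can use parentheses to make the star apply to the whole group. A short sketch:

```smalltalk
"Parentheses turn ab into a single subexpression, so the star repeats the pair."
'abab' matchesRegex: '(ab)*'.   "true"
'abbb' matchesRegex: '(ab)*'.   "false: bb is not a repetition of ab"
'' matchesRegex: '(ab)*'.       "true: zero repetitions"
```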
A slightly more complex example is the expression c(a|d)+r, which matches the
name of any of the Lisp-style car, cdr, caar, cadr, ... functions:
'car' matchesRegex: 'c(a|d)+r' −→ true
'cdr' matchesRegex: 'c(a|d)+r' −→ true
'cadr' matchesRegex: 'c(a|d)+r' −→ true
Using the plus operator, we can build the following binary number recognizer:
'10010100' matchesRegex: '[01]+' −→ true
'10001210' matchesRegex: '[01]+' −→ false
If the first character after the opening bracket is ˆ, the set is inverted: it
matches any single character not appearing between the brackets:
'0' matchesRegex: '[ˆ01]' −→ false
'3' matchesRegex: '[ˆ01]' −→ true
Character classes
Regular expressions can also include the following backslash escapes to refer
to popular classes of characters: \w to match alphanumeric characters, \d
to match digits, and \s to match whitespace. Their upper-case variants, \W, \D
and \S, match the complementary characters (non-alphanumerics, non-digits
and non-whitespace). Table 6.1 gives a summary of the syntax seen so far.
As mentioned in the introduction, regular expressions are especially use-
ful for validating user input, and character classes turn out to be especially
useful for defining such regexes. For example, non-negative numbers can be
matched with the regex \d+:
'42' matchesRegex: '\d+' −→ true
'-1' matchesRegex: '\d+' −→ false
Better yet, we might want to specify that non-zero numbers should not
start with the digit 0:
'0' matchesRegex: '0|([1-9]\d*)' −→ true
'1' matchesRegex: '0|([1-9]\d*)' −→ true
'42' matchesRegex: '0|([1-9]\d*)' −→ true
'099' matchesRegex: '0|([1-9]\d*)' −→ false "leading 0"
Floating point numbers should require at least one digit after the dot:
'0' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' −→ true
'0.9' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' −→ true
'3.14' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' −→ true
'-42' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' −→ true
'2.' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' −→ false "need digits after ."
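When validating several inputs, it can be convenient to build the matcher once and filter a collection with it. The sketch below uses the floating point regex shown above; the input strings are invented for illustration.

```smalltalk
| numberRegex inputs |
numberRegex := '(0|((\+|-)?[1-9]\d*))(\.\d+)?' asRegex.
"Candidate user inputs, made up for this illustration."
inputs := #('0' '3.14' '-42' '2.' '007' '-0.5').
"Keep only the strings that match the regex as a whole."
inputs select: [ :each | numberRegex matches: each ]
"answers #('0' '3.14' '-42')"
```

Note that '-0.5' is rejected: in this regex the sign is only allowed in front of a non-zero leading digit, so a stricter or looser rule would need a different pattern.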
Note that these elements are components of the character classes, i.e., they
have to be enclosed in an extra set of square brackets to form a valid regular
expression. For example, a non-empty string of digits would be represented
as [[:digit:]]+. The above primitive expressions and operators are common to
many implementations of regular expressions.
'42' matchesRegex: '[[:digit:]]+' −→ true
Matching boundaries
The last group of special primitive expressions is shown in Table 6.3, and is
used to match boundaries of strings.
regexes.
Enumeration interface
Some applications need to access all matches of a certain regular expression
within a string. The matches are accessible using a protocol modeled after
the familiar Collection-like enumeration protocol.
regex:matchesDo: evaluates a one-argument aBlock for every match of the
regular expression within the receiver string.
list := OrderedCollection new.
'Jack meet Jill' regex: '\w+' matchesDo: [:word | list add: word].
list −→ an OrderedCollection('Jack' 'meet' 'Jill')
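The String extensions of the Regex package also include allRegexMatches: and regex:matchesCollect:, which save us from building the collection by hand. A small sketch:

```smalltalk
"Collect every match of the regex within the receiver."
'Jack and Jill' allRegexMatches: '\w+'.
"answers an OrderedCollection('Jack' 'and' 'Jill')"

"Evaluate the block for every match and collect the results."
'Jack and Jill' regex: '\w+' matchesCollect: [ :word | word size ]
"answers an OrderedCollection(4 3 4)"
```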
Lower-level interface
When you send the message matchesRegex: to a string, the following happens:
The Matcher
If you repeatedly match a number of strings against the same regular expres-
sion using one of the messages defined in String, the regular expression string
is parsed and a new matcher is created for every match. You can avoid this
overhead by building a matcher for the regular expression, and then reusing
the matcher over and over again. You can, for example, create a matcher at a
class or instance initialization stage, and store it in a variable for future use.
You can create a matcher using one of the following methods:
• You can directly instantiate a RxMatcher using one of its class methods:
forString: or forString:ignoreCase: (which is what the convenience methods
above will do).
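The following sketch creates matchers for the same pattern in three ways, using asRegex and the two RxMatcher class methods named above. The variable names are ours.

```smalltalk
| viaString viaClass caseless |
"Convenience method on String."
viaString := 'a+' asRegex.
"Direct instantiation, case sensitive."
viaClass := RxMatcher forString: 'a+'.
"Direct instantiation, ignoring case."
caseless := RxMatcher forString: 'a+' ignoreCase: true.

viaString matches: 'aaa'.   "true"
viaClass matches: 'AAA'.    "false"
caseless matches: 'AAA'.    "true"
```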
Matching
A matcher understands these messages (all of them return true to indicate
successful match or search, and false otherwise):
matches: aString – true if the whole argument string (aString) matches.
matchesPrefix: aString – true if some prefix of the argument string (not nec-
essarily the whole string) matches.
'\w+' asRegex matchesPrefix: 'Ignatz hates Krazy' −→ true
search: aString – Search the string for the first occurrence of a matching
substring. (Note that the first two methods only try matching from the very
beginning of the string). Using the above example with a matcher for a+, this
method would answer success given a string 'baaa', while the previous two
would fail.
'\b[a-z]+\b' asRegex search: 'Ignatz hates Krazy' −→ true "finds 'hates'"
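The difference between the three messages is easiest to see with one matcher and a few inputs. This sketch uses the a+ and 'baaa' case mentioned above:

```smalltalk
| re |
re := 'a+' asRegex.
re matches: 'baaa'.        "false: the whole string does not match"
re matchesPrefix: 'baaa'.  "false: the string does not start with a"
re search: 'baaa'.         "true: a match is found further on"
re matchesPrefix: 'aab'.   "true: the prefix aa matches"
re matches: 'aab'.         "false: the trailing b is left over"
```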
The matcher also stores the outcome of the last match attempt and can
report it: lastResult answers a Boolean: the outcome of the most recent match
attempt. If no matches were attempted, the answer is unspecified.
number := '\d+' asRegex.
number search: 'Ignatz throws 5 bricks'.
number lastResult −→ true
Subexpression matches
After a successful match attempt, you can query which part of the original
string has matched which part of the regex. A subexpression is a parenthe-
sized part of a regular expression, or the whole expression. When a regular
expression is compiled, its subexpressions are assigned indices starting from
1, depth-first, left-to-right.
For example, the regex ((\d+)\s*(\w+)) has four subexpressions, including
itself.
1: ((\d+)\s*(\w+)) "the complete expression"
2: (\d+)\s*(\w+) "top parenthesized subexpression"
3: \d+ "first leaf subexpression"
4: \w+ "second leaf subexpression"
The highest valid index is equal to 1 plus the number of matching paren-
theses. (So, 1 is always a valid index, even if there are no parenthesized
subexpressions.)
After a successful match, the matcher can report what part of the original
string matched what subexpression. It understands these messages:
subexpressionCount answers the total number of subexpressions: the high-
est value that can be used as a subexpression index with this matcher. This
value is available immediately after initialization and never changes.
subexpression: takes a valid index as its argument, and may be sent only
after a successful match attempt. The method answers a substring of the
original string the corresponding subexpression has matched to.
subBeginning: and subEnd: answer the positions within the argument string
or stream where the given subexpression match has started and ended, re-
spectively.
items := '((\d+)\s*(\w+))' asRegex.
items search: 'Ignatz throws 1 brick at Krazy'.
items subexpressionCount −→ 4
items subexpression: 1 −→ '1 brick' "complete expression"
items subexpression: 2 −→ '1 brick' "top subexpression"
items subexpression: 3 −→ '1' "first leaf subexpression"
items subexpression: 4 −→ 'brick' "second leaf subexpression"
items subBeginning: 3 −→ an OrderedCollection(14)
items subEnd: 3 −→ an OrderedCollection(15)
Regex API 95
As a more elaborate example, the following piece of code uses a MMM DD,
YYYY date format recognizer to convert a date to a three-element array with
year, month, and day strings:
date := '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d\d?)\s*,\s*19(\d\d)'
asRegex.
result := (date matches: 'Aug 6, 1996')
ifTrue: [{ (date subexpression: 4) .
(date subexpression: 2) .
(date subexpression: 3) } ]
ifFalse: ['no match'].
result −→ #('96' 'Aug' '6')
There are also the following methods for iterating over matches within
streams: matchesOnStream:, matchesOnStream:do:, matchesOnStream:collect:,
copyStream:to:replacingMatchesWith: and copyStream:to:translatingMatchesUsing:.
in := ReadStream on: '12 drummers, 11 pipers, 10 lords, 9 ladies, etc.'.
out := WriteStream on: ''.
numMatch := '\<\d+\>' asRegex.
numMatch
copyStream: in
to: out
translatingMatchesUsing: [:each | each asNumber asFloat asString ].
out close; contents −→ '12.0 drummers, 11.0 pipers, 10.0 lords, 9.0 ladies, etc.'
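When the text is already a string, the String extension copyWithRegex:matchesTranslatedUsing: does a similar job without explicit streams. A minimal sketch, with a translation block chosen only for illustration:

```smalltalk
"Double every number found in the receiver."
'12 drummers, 11 pipers'
	copyWithRegex: '\d+'
	matchesTranslatedUsing: [ :each | (each asNumber * 2) printString ]
"answers '24 drummers, 22 pipers'"
```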
Error Handling
Several exceptions may be raised by RxParser when building regexes. The
exceptions have the common parent RegexError. You may use the usual
Smalltalk exception handling mechanism to catch and handle them.
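As a hedged sketch, the following code guards the construction of a matcher from a pattern that may be malformed, for example one typed in by a user. The pattern with the missing closing parenthesis is our own example, and answering nil from the handler is just one possible policy.

```smalltalk
| pattern matcher |
"A pattern with an unbalanced parenthesis, chosen for illustration."
pattern := '(unclosed'.
matcher := [ pattern asRegex ]
	on: RegexError
	do: [ :error |
		"The handler's value becomes the value of the whole expression."
		nil ].
"matcher is nil when the pattern could not be parsed."
matcher isNil
```

Catching RegexError rather than Error keeps unrelated problems visible.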
Acknowledgments. Since the first release of the matcher, thanks to the in-
put from several fellow Smalltalkers, I became convinced a native Smalltalk
regular expression matcher was worth the effort to keep it alive. For the
advice and encouragement that made this release possible, I want to thank:
Felix Hack, Eliot Miranda, Robb Shecter, David N. Smith, Francis Wolinski
and anyone whom I haven’t yet met or heard from, but who agrees this has
not been a complete waste of time.
Source Management
Chapter 7
Co-written with
Oscar Nierstrasz ([email protected])
A versioning system helps you to store and log multiple versions of your
code. In addition, it may help you manage concurrent accesses to a common
source code repository. It keeps track of all changes to a set of documents
and enables several developers to collaborate. As soon as the size of your
software increases beyond a few classes, you probably need a versioning
system.
Many different versioning systems are available. CVS1, Subversion2, and
Git3 are probably the most popular. In principle you could use them to manage
the development of Pharo software projects, but such a practice would
disconnect the versioning system from the Pharo environment. In addi-
tion, CVS-like tools only version plain text files and not individual packages,
classes or methods. We would therefore lack the ability to track changes at
the appropriate level of granularity. If the versioning tools know that you
store classes and methods instead of plain text, they can do a better job of
supporting the development process.
There are multiple repositories to store your projects. SmalltalkHub4 and
Squeaksource 35 are the two main and free-to-use repositories. They are ver-
sioning systems for Pharo in which classes and methods, rather than lines of
1 http://www.nongnu.org/cvs
2 http://subversion.tigris.org
3 http://git-scm.com/
4 http://smalltalkhub.com/
5 http://ss3.gemstone.com/
102 Versioning Your Code with Monticello
text, are the units of change. In this chapter we will use SmalltalkHub, but
Squeaksource 3 can be used similarly. SmalltalkHub is a central online repository
in which you can store versions of your applications using Monticello.
SmalltalkHub is the equivalent of SourceForge, and Monticello the equiva-
lent of CVS.
In this chapter, you will learn how to use Monticello and
SmalltalkHub to manage your software. We have already been acquainted
with Monticello briefly in earlier chapters6 . This chapter delves into the de-
tails of Monticello and describes some additional features that are useful for
versioning large applications.
Define a subclass of TestCase called PerfectTest in the package Perfect, and de-
fine the following test methods in the protocol running:
PerfectTest»testPerfect
self assert: 6 isPerfect.
self assert: 7 isPerfect not.
self assert: 28 isPerfect.
Of course these tests will fail as we have not yet implemented the isPerfect
method for integers. We would like to put this code under the control of
Monticello as we revise and extend it.
Launching Monticello
lists installed packages and the right pane shows known repositories. Various
operations may be performed via the button pane and the menus of the
two list panes.
Creating a package
Monticello manages versions of packages. A package is essentially a named
set of classes and methods. In fact, a package is an object — an instance of
PackageInfo — that knows how to identify the classes and methods that be-
long to it.
We would like to version our PerfectTest class. The right way to do this
is to define a package — called Perfect — containing PerfectTest and all the re-
lated classes and methods we will introduce later. For the moment, no such
package exists. We only have a category called (not coincidentally) Perfect.
This is perfect, since Monticello will map categories to packages for us.
names start with * (i.e., those belonging to other packages). This in-
cludes our testPerfect method, since it belongs to the protocol running.
Committing changes
Note in Figure 7.2 that the Save button is disabled (greyed out).
Before we save our Perfect package, we need to specify where we want
to save it. A repository is a package container, which may either be local to
your machine or remote (accessed over the network). Various protocols may
be used to establish a connection between your Pharo image and a reposi-
tory. As we will see later (Section 7.5), Monticello supports a large choice of
repositories, though the most commonly used is HTTP, since this is the one
used by SmalltalkHub.
At least one repository, called package-cache, is set up by default, and
is shown as the first entry in the list of repositories on the right-hand side
of your Monticello browser (see Figure 7.1). The package-cache is created
automatically in the local directory where your Pharo image is located. It
will contain a copy of all the packages you download from remote reposito-
ries. By default, copies of your packages are also saved in the package-cache
when you save them to a remote server.
Each package knows which repositories it can be saved to. To add a new
repository to the selected package, press the +Repository button. This will
offer a number of choices of different kinds of repository, including HTTP.
For the rest of the chapter we will work with the package-cache repository, as
this is all we need to explore the features of Monticello.
Select the directory repository named package-cache, press Save , enter an appropriate
log message, and Accept to save the changes.
Basic usage 105
Figure 7.3: You may set a new version name and a commit message when
you save a version of a package.
Use your favorite file browser (e.g., Windows Explorer, Finder or XTerm) to
confirm that a file Perfect-XX.1.mcz was created in your package cache. XX corre-
sponds to your name or initials.8
A version is an immutable snapshot of a package that has been written
to a repository. Each version has a unique version number to identify it in a
repository. Be aware, however, that this number is not globally unique — in
another repository you might have the same file identifier for a different snap-
shot. For example, Perfect-onierstrasz.1.mcz in another repository might be the
final, deployed version of our project! When saving a version into a reposi-
tory, the next available number is automatically assigned to the version, but
you can change this number if you wish. Note that version branches do
not interfere with the numbering scheme (as with CVS or Subversion). As
we shall see later, versions are by default ordered by their version number
when viewing a repository.
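The file name of a version follows the shape package-author.number.mcz seen above. Purely as an illustration, the sketch below takes such a name apart with a regex. The pattern is our own assumption, not the way Monticello itself parses version names, and it does not try to handle every naming variant.

```smalltalk
| fileName re |
fileName := 'Perfect-onierstrasz.1.mcz'.
"Subexpressions: 2 = package name, 3 = author, 4 = version number."
re := '(.+)-(\w+)\.(\d+)\.mcz' asRegex.
(re matches: fileName)
	ifTrue: [ { (re subexpression: 2) .
	            (re subexpression: 3) .
	            (re subexpression: 4) asNumber } ]
	ifFalse: [ nil ]
"answers an array holding 'Perfect', 'onierstrasz' and the number 1"
```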
Class extensions
Let’s implement the methods that will make our tests green.
Define the following two methods in the class Integer, and put each method in
a protocol called *perfect. Also add the new boundary tests. Check that the tests are
now green.
8 In the past, the convention was for developers to log their changes using only their initials.
Now, with many developers sharing identical initials, the convention is to use an identifier
based on the full name, such as “apblack” or “AndrewBlack”.
Integer»isPerfect
^ self > 1 and: [self divisors sum = self]
Integer»divisors
^ (1 to: self - 1 ) select: [ :each | (self rem: each) = 0 ]
PerfectTest»testPerfectBoundary
self assert: 0 isPerfect not.
self assert: 1 isPerfect not.
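Before running the tests, you can try the new methods in a workspace. The expressions below assume that isPerfect and divisors have been defined on Integer exactly as shown above.

```smalltalk
"Proper divisors of 28, as computed by the extension method."
28 divisors.        "#(1 2 4 7 14)"
"Their sum equals the receiver, so 28 is perfect."
28 divisors sum.    "28"
"All perfect numbers up to 30."
(1 to: 30) select: [ :each | each isPerfect ]
"answers #(6 28)"
```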
Try the Browse and Changes buttons to see what they do. Save the changes
to the Perfect package. Confirm that the package is now “clean” again.
Select the package-cache repository and open it. You should see something like
Figure 7.5.
All the packages in the repository are listed on the left-hand side of the
inspector:
• a bold underlined name means that the package is installed, but that
there is a more recent version in the repository;
Once a package is selected, the right-hand pane lists the versions of the se-
lected package:
• a bold version name means that this version is not an ancestor of the
installed version. This may mean that it is a newer version, or that it
belongs to a different branch from the installed version;
Select the Perfect package and its repository in the Monticello browser. Action-
click on the package name and select unload package .
You should now be able to confirm that the Perfect package has vanished
from your image!
You should now be able to verify that only the original (red) tests are
loaded.
Select the second version of the Perfect package in the repository inspector and
Load it. You have now updated the package to the latest version.
Branching
A branch is a line of development versions that exists independently of an-
other line, yet still shares a common ancestor version if you look far enough
back in time.
You may create a new version branch when saving your package. Branch-
ing is useful when you want to have a new parallel development. For exam-
ple, suppose your job is doing software maintenance in your company. One
day a different division asks you for the same software, but with a few parts
tweaked for them, since they do things slightly differently. The way to deal
with this situation is to create a second branch of your program that incorporates
the tweaks, while leaving the first branch unmodified.
From the repository inspector, select version 1 of the Perfect package and Load
it. Version 2 should again be displayed in bold, indicating that it is no longer loaded
(since it is not an ancestor of version 1). Now implement the following two Integer
methods and place them in the *perfect protocol, and also modify the existing
PerfectTest test method as follows:
Integer»isPerfect
self < 2 ifTrue: [ ^ false ].
^ self divisors sum = self
Integer»divisors
^ (1 to: self - 1 ) select: [ :each | (self \\ each) = 0]
PerfectTest»testPerfect
self assert: 2 isPerfect not.
self assert: 6 isPerfect.
self assert: 7 isPerfect not.
self assert: 28 isPerfect.
Once again the tests should be green, though our implementation of per-
fect numbers is slightly different.
Select Cancel to avoid overwriting your new methods. Now Save your
changes. Enter your log message, and Accept the new version.
Congratulations! You have now created a new branch of the Perfect pack-
age.
If you still have the repository inspector open, Refresh it to see the new version
(Figure 7.9).
Merging
You can merge one version of a package with another using the Merge but-
ton in the Monticello browser. Typically, you will want to do this when (i)
you discover that you have been working on an out-of-date version, or (ii)
branches that were previously independent have to be re-integrated. Both
scenarios are common when multiple developers are working on the same
package.
Consider the current situation with our Perfect package, as illustrated at
the left of Figure 7.10. We have published a new version 3 that is based
on version 1. Since version 2 is also based on version 1, versions 2 and 3
constitute independent branches.
At this point we realize that there are changes in version 2 that we would
like to merge with our changes from version 3. Since we have version 3
currently loaded, we would like to merge in changes from version 2, and
publish a new, merged version 4, as illustrated at the right of Figure 7.10.
Select version 2 in the repository browser, as shown in Figure 7.11, and click
the Merge button.
The merge tool allows for fine-grained package version
merging. Elements contained in the package to-be-merged are listed in the
upper text pane. The lower text pane shows the definition of a selected ele-
ment.
Figure 7.12: Version 2 of the Perfect package being merged with the current
version 3.
Figure 7.13: All older versions are now ancestors of merged version 4.
Browse
Figure 7.14: The snapshot browser reveals that the Perfect package extends
the class Integer with 2 methods.
For example, Figure 7.14 shows the class extensions defined in the Perfect
package. Note that code cannot be edited here, though by action-clicking (if
your environment has been set up accordingly) on a class or a method name
you can open a regular browser.
Advanced topics 115
Changes
The Changes button computes the difference between the code in the image
and the most recent version of the package in the repository.
Make the following changes to PerfectTest, and then click the Changes button
in the Monticello browser.
PerfectTest»testPerfect
self assert: 2 isPerfect not.
self assert: 6 isPerfect.
self assert: 7 isPerfect not.
self assert: 496 isPerfect.
PerfectTest»testPerfectTo1000
self assert: ((1 to: 1000) select: [:each | each isPerfect]) = #(6 28 496)
Figure 7.15: The patch browser shows the difference between the code in the
image and the most recently committed version.
Figure 7.15 shows that the Perfect package has been locally modified with
one changed method and one new method. As usual, action-clicking on a
change offers you a choice of contextual operations.
History
Select the Perfect package, right click and select the History item.
Figure 7.16: The version history viewer provides information about the vari-
ous versions of a package.
Dependencies
Most applications cannot live on their own and typically require the pres-
ence of other packages in order to work properly. For example, let us have a
look at Pier9 , a meta-described content management system. Pier is a large
piece of software with many facets (tools, documentation, blog, caching strategies,
security, etc.). Each facet is implemented by a separate package. Most
Pier packages cannot be used in isolation since they refer to methods and
classes defined in other packages. Monticello provides a dependency mech-
anism for declaring the required packages of a given package to ensure that it
will be correctly loaded.
Essentially, the dependency mechanism ensures that all required pack-
ages of a package are loaded before the package is loaded itself. Since re-
quired packages may themselves require other packages, the process is ap-
plied recursively to a tree of dependencies, ensuring that the leaves of the
tree are loaded before any branches that depend on them. Whenever new
9 http://source.lukas-renggli.ch/pier
versions of required packages are checked in, then new versions of the pack-
ages that depend on them will automatically depend on the new versions.
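The “leaves first” order described above is a depth-first traversal of the dependency tree. The sketch below illustrates the idea with a plain Dictionary. It is not Monticello's implementation, and the dependency on a package named Pier-Model is a hypothetical example.

```smalltalk
| requires loadOrder load |
"Map each package name to the names of the packages it requires (hypothetical data)."
requires := Dictionary new.
requires at: 'Pier-All' put: #('Pier-Blog' 'Pier-Caching').
requires at: 'Pier-Blog' put: #('Pier-Model').
requires at: 'Pier-Caching' put: #('Pier-Model').
requires at: 'Pier-Model' put: #().

loadOrder := OrderedCollection new.
load := nil.
load := [ :name |
	"Skip packages that have already been scheduled."
	(loadOrder includes: name) ifFalse: [
		"First schedule everything this package requires, recursively."
		(requires at: name ifAbsent: [ #() ]) do: [ :each | load value: each ].
		"Only then schedule the package itself."
		loadOrder add: name ] ].

load value: 'Pier-All'.
loadOrder
"answers an OrderedCollection('Pier-Model' 'Pier-Blog' 'Pier-Caching' 'Pier-All')"
```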
Figure 7.17 illustrates how this works in Pier. Package Pier-All is an empty
package that acts as a kind of umbrella. It requires Pier-Blog, Pier-Caching and
all the other Pier packages.
Because of these dependencies, installing Pier-All causes all the other Pier
packages to be installed. Furthermore, when developing, the only package
that needs to be saved is Pier-All; all dependent dirty packages are saved
automatically.
Let us see how this works in practice. Our Perfect package currently bundles
the tests together with the implementation. Suppose we would like
instead to split these into separate packages, so that the implementation
can be loaded without the tests. By default, however, we would like to load
everything.
• If you modify the PerfectTest class, this will cause the NewPerfect-Tests
and NewPerfect-All packages to both become dirty (but not NewPerfect-
Extensions).
• To commit the change, you should save NewPerfect-All. This will com-
mit a new version of NewPerfect-All which then requires the new version
of NewPerfect-Tests. (It will also depend on the existing, unmodified ver-
sion of NewPerfect-Extensions.) Loading the latest version of NewPerfect-
All will also load the latest version of the required packages.
Class initialization
When Monticello loads a package into the image, any class that defines an
initialize method on the class side will be sent the initialize message. The mes-
sage is sent only to classes that define this method on the class side. A class
that does not define this method will not be initialized, even if initialize is de-
fined by one of its superclasses. NB: the initialize method is not invoked
when you merely reload a package!
Class initialization can be used to perform any number of checks or spe-
cial actions. A particularly useful application is to add new instance vari-
ables to a class.
Class extensions are strictly limited to adding new methods to a class.
Sometimes, however, extension methods may need new instance variables
to exist.
Suppose, for example, that we want to extend the TestCase class of SUnit
with methods to keep track of the history of the last time the test was red.
We would need to store that information somewhere, but unfortunately we
cannot define instance variables as part of our extension.
TestCaseExtension class>>initialize
(TestCase instVarNames includes: 'lastRedRun')
ifFalse: [TestCase addInstVarName: 'lastRedRun']
When our package is loaded, this code will be evaluated and the instance
variable will be added, if it does not already exist. Note that if you change
a class that is not in your package, the other package will become dirty. In
the previous example, the package SUnit contains TestCase. After installing
TestCaseExtension, the package SUnit will become dirty.
2. Open a change sorter and create a new change set. Let’s name it DiffPerfect.
3. Load version 2
4. In the change sorter, you should now see the difference between ver-
sion 1 and 2. The change set may be saved on the filesystem by action-
clicking on it and selecting file out . A DiffPerfect.X.cs file is now located
next to your Pharo image.
Kinds of repositories
HTTP. HTTP repositories are probably the most popular kind of repository
since this is the kind supported by SmalltalkHub.
The nice thing about HTTP repositories is that it is easy to link directly
to specific versions from web sites. With a little configuration work on the
HTTP server, HTTP repositories can be made browsable by ordinary web
browsers, WebDAV clients, and so on.
HTTP repositories may be used with an HTTP server other than
SmalltalkHub. For example, a simple configuration (see footnote 10) turns
Apache into a Monticello repository with restricted access rights:
"My apache2 install worked as a Monticello repository right out of the box on my
RedHat 7.2 server. For posterity's sake, here's all I had to add to my apache2 config:"
Alias /monticello/ /var/monticello/
<Directory /var/monticello>
DAV on
Options indexes
Order allow,deny
Allow from all
AllowOverride None
# Limit write permission to list of valid users.
<LimitExcept GET PROPFIND OPTIONS REPORT>
AuthName "Authorization Realm"
AuthUserFile /etc/monticello-auth
AuthType Basic
Require valid-user
</LimitExcept>
</Directory>
"This gives a world-readable, authorized-user-writable Monticello repository in
/var/monticello. I created /etc/monticello-auth with htpasswd and off I went.
I love Monticello and look forward to future improvements."
FTP. This is similar to an HTTP repository, except that it uses an FTP server
instead. An FTP server may also offer restricted access rights, and different
FTP clients may be used to browse such a Monticello repository.
10 http://www.visoracle.com/squeak/faq/monticello-1.html
SMTP. SMTP repositories are useful for sending versions by mail. When
creating an SMTP repository, you specify a destination email address. This
could be the address of another developer — the package’s maintainer, for
example — or a mailing list such as pharo-project. Any versions saved in
such a repository will be emailed to that address. SMTP repositories are
write-only.
(path asFileReference).
MCRepositoryGroup default addRepository: repo ].
Using SmalltalkHub
SmalltalkHub is an online repository that you can use to store your Monticello
packages. An instance is running and accessible from http://smalltalkhub.com/.
Use a web browser to visit this main page. When you select a project, you
should see a repository expression of this kind:
MCHttpRepository
    location: 'http://smalltalkhub.com/mc/PharoExtras/Phexample/main'
    user: ''
    password: ''
Add this repository to Monticello by clicking +Repository, and then selecting HTTP.
Fill out the template with the URL corresponding to the project (you can copy the
above repository expression from the web page and paste it into the template). Since
you are not going to commit new versions of this package, you do not need to fill in
the user and password. Open the repository, select the latest version of Phexample
and click Load.
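The same repository can also be registered from a workspace instead of through the user interface. This is a minimal sketch that reuses the repository expression shown on the project page and assumes that adding the repository to the default repository group is enough to make it appear in the Monticello browser:

```smalltalk
| repo |
"Build the repository object from the expression shown on SmalltalkHub."
repo := MCHttpRepository
    location: 'http://smalltalkhub.com/mc/PharoExtras/Phexample/main'
    user: ''
    password: ''.

"Register it so that Monticello knows about it."
MCRepositoryGroup default addRepository: repo
```

The empty user and password are sufficient here because we only read from the repository.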
Pressing the Join link on the SmalltalkHub home page will probably be
your first step if you do not have a SmalltalkHub account. Once you are a
member, + New Project allows you to create a new project.
zip” since an mcz file is simply a zipped file containing the source code and
other meta-data.
You may try to unzip such a file, for example to view the source code
directly, but normally, end users should not need to unzip these files them-
selves. If you unzip it, you will find the following members of the mcz file.
File contents. Mcz files are actually ZIP archives that follow certain
conventions. Conceptually a version contains four things:
Metadata encoding. The other members of the zip archive are encoded using
S-expressions. Conceptually, the expressions represent nestable dictionaries.
Each pair of elements in a list represents a key and a value. For example,
the following is an excerpt of a “version” file of a package named AA:
(name 'AA-ab.3' message 'empty log message' date '10 January 2008'
time '10:31:06 am' author 'ab' ancestors ((name 'AA-ab.2' message...)))
It basically says that the version AA-ab.3 has an empty log message, was
created on January 10, 2008, by ab, and has an ancestor named AA-ab.2, ...
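Since an mcz file is a plain ZIP archive, its members can also be inspected from within the image. The following sketch assumes a file named AA-ab.3.mcz located next to the image (a hypothetical file name) and uses the ZipArchive class; evaluate each expression with print it or inspect it to see the result:

```smalltalk
| archive |
archive := ZipArchive new.
"Hypothetical file name; adapt it to an mcz file you actually have."
archive readFrom: 'AA-ab.3.mcz'.

"The names of the members stored in the archive."
archive memberNames.

"The S-expression stored in the member named 'version'."
(archive memberNamed: 'version') contents.

archive close
```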
• There are many kinds of repositories, the most popular being HTTP
repositories, such as those hosted by SmalltalkHub.
• You can drag and drop an mcz file onto your image as a quick way to
load it.
Chapter 8
Gofer: Scripting Package Loading
Figure 8.1: The browser shows that the class String gets the methods asUrl and
asUrlRelativeTo: from the package network-url
classVariableNames: ''
poolDictionaries: ''
category: 'Zork'
Figure 8.2: The change browser shows that the method String>>asUrl has
changed.
Figure 8.3: (left) Typical setup with clean and dirty packages loaded and
cached — (right) Package published.
Here is a typical Gofer script: it says that we want to load the package
PBE2GoferExample from the repository GoferExample that is available on
http://www.smalltalkhub.com in the account of PharoBooks.
Gofer new
    url: 'http://smalltalkhub.com/mc/PharoBooks/GoferExample/main';
    package: 'PBE2GoferExample';
    load
Since the same public servers are often used, Gofer's API offers a number
of shortcuts to shorten the scripts. Often, we want to write a script and give
it to other people to load our code. In such a case, having to specify a
password is not really adequate. Here is an example for SmalltalkHub (which has
some verbose URLs such as 'http://smalltalkhub.com/mc/PharoBooks/GoferExample/main'
for the project GoferExample). We use the smalltalkhubUser:project: message
and just specify the minimal information. In this chapter, we also use
squeaksource3: as a shortcut for http://ss3.gemtalksystems.com/ss.
Package Identification
Once a URL and the options are specified, we should define the packages
we want to load. Using the message version: defines the exact version to load,
while the message package: should be used to load the latest version available
in all the repositories.
The following example loads a specific version of the package.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
version: 'PBE2GoferExample-janniklaval.1';
load
We can also specify some constraints to identify packages using the mes-
sage package: aString constraint: aBlock to pass a block.
For example the following code will load the latest version of the package
saved by the developer named janniklaval.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample'
constraint: [ :version | version author = 'janniklaval' ];
load
Gofer new
    url: 'http://smalltalkhub.com/mc/JLaval/Phratch/main';
    package: 'Collections-Arithmetic';
    package: 'Sound';
    package: 'Settings-Sound';
    package: 'SoundScores';
    package: 'SoundMorphicUserInterface';
    package: 'Phratch';
    load
Note that such scripts load the latest versions of the packages and are
therefore fragile: if a new package version is published, you will
load it even if it is unstable. In general it is good practice to control
the versions of the external components we rely on and to use the latest
version only for our own current development. Such problems can be solved
with Metacello, the tool to express configurations and load them.
Other protocols
Gofer also supports FTP as well as loading from a local directory. We
basically use the same messages as before, with some changes.
For FTP, we should specify the URL using 'ftp' as the scheme.
Gofer new
url: 'ftp://wtf-is-ftp.com/code';
...
Finally it is possible to look for packages in a repository and all its
subfolders using the Kleene star.
Gofer new
directory: '/home/pharoer/hacking/MCPackages/*';
...
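To make the elided scripts concrete, here is a minimal sketch of a complete load from a local directory. It assumes that the directory shown in the text exists and contains at least one version of PBE2GoferExample; both are placeholders to adapt to your own setup:

```smalltalk
Gofer new
    "Look for mcz files in this directory only (no subfolders)."
    directory: '/home/pharoer/hacking/MCPackages';
    package: 'PBE2GoferExample';
    load
```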
Changes present in the working copy are merged with the code of the remote
copy. It is often the case that after a merge, the working copy gets dirty and
should be republished. The new version will contain the current changes
and the changes of the remote version. If there are conflicts, the user is
warned; otherwise the operation happens silently.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
merge
The message update loads the remote version in the image. The modifica-
tions of the working copy are lost.
The message revert resets the local version, i.e., it loads the current version
again. The changes of the working copy are then lost.
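As a minimal sketch, assuming the same PharoBooks/GoferExample repository used in the other scripts of this chapter, update and revert are sent like any other Gofer action. Both discard local modifications, so they are best tried first on a throwaway image.

```smalltalk
"Replace the working copy with the latest remote version."
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    update.

"Load the current version again, dropping the local changes."
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    revert
```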
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
"we add the latest version of PBE2GoferExample"
package: 'PBE2GoferExample';
"we browse the latest version published on the server"
browseRemoteChanges
The unload operation. The message unload unloads the packages from the
image. Note that using the Monticello browser you can delete a package, but
such an operation does not remove the code of the classes associated with
the package; it just destroys the package. Unloading a package destroys the
package and the classes it contains.
The following code unloads the package and its classes from the current
image.
Gofer new
package: 'PBE2GoferExample';
unload
Note that you cannot unload Gofer itself that way. Gofer gofer unload does
not work.
Now, if you want to load your packages locally remember to set up the
lookup so that it takes into account the local cache and disables errors as
presented in the beginning of this chapter (messages disableRepositoryErrors
and enablePackageCache).
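A script that takes the local cache into account could look like the following sketch. It assumes the PharoBooks/GoferExample repository used elsewhere in this chapter; the two configuration messages are the ones named in the paragraph:

```smalltalk
Gofer new
    "Also look for versions in the local package cache."
    enablePackageCache;
    "Do not fail when the remote repository cannot be reached."
    disableRepositoryErrors;
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    load
```

With this setup, a version that was fetched earlier can still be loaded when the network is down.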
The message push performs the inverse operation. It publishes locally
available packages to the remote server. All the packages that you published
locally are then pushed to the server.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
push
As a pattern, we always keep copies of all the versions of our projects and
of the projects we use in our local cache. This way we are independent of
any network failure and the packages are backed up in our regular backup.
With these two messages, it is easy to write a script sync that synchronizes
local and remote repositories.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
push.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
fetch
Automating Answers
Sometimes package installation asks for information such as passwords.
With the systematic use of a build server, packages will probably stop
doing that, but it is important to know how to supply answers to these
questions from within a script. The message valueSupplyingAnswers: supports
such a task.
[ Gofer new
    squeaksource: 'Seaside30';
    package: 'LoadOrderTests';
    load ]
        valueSupplyingAnswers: {
            {'Load Seaside'. true}.
            {'SqueakSource User Name'. 'pharoUser'}.
            {'SqueakSource Password'. 'pharoPwd'}.
            {'Run tests'. false} }
This message should be sent to a block, giving a list of questions and their
answers, as shown in the previous example.
Configuration Loading
Gofer also supports Metacello configuration loading. It provides the
following messages to handle configurations: configuration, configurationOf:,
loadVersion:, loadDevelopment, and loadStable.
The following example loads the development version of NativeBoost. You
only need to specify the NativeBoost project: Gofer loads
ConfigurationOfNativeBoost and then loads the development version.
Gofer new
smalltalkhubUser: 'Pharo' project: 'NativeBoost';
configuration;
loadDevelopment
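The other loading messages follow the same pattern. As a sketch, and assuming that ConfigurationOfNativeBoost defines a stable version for your platform, the stable version would be loaded like this:

```smalltalk
Gofer new
    smalltalkhubUser: 'Pharo' project: 'NativeBoost';
    "Add a reference to the configuration named after the project."
    configuration;
    loadStable
```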
When the repository name does not match the name of the configuration
you should use configurationOf: and provide the name of the configuration
class.
The following script gathers the package versions by packages and re-
turns a dictionary.
allResolved)
groupedBy: [ :each | each packageName])
Script 8.3: Getting the package list for the Kozen project hosted on SS3.
((Gofer new
squeaksource3: 'Kozen';
allResolved)
groupedBy: [ :each | each packageName]) keys
Fetching packages
Here is a script to fetch all the packages of a given repository. It is useful for
grabbing your files and having a version locally.
Script 8.5: Fetching all the refactoring packages from the Pharo2.0 repository
| go |
go := Gofer new.
go squeaksource3: 'Pharo20'.
(go allResolved select: [ :each | 'Refactoring*' match: each packageName])
do: [ :pack |
self crLog: pack packageName.
go package: pack packageName; fetch]
Script 8.6: How to publish package files to a new repository using Pharo 1.4
| go |
go := Gofer new.
go repository: (MCHttpRepository
location: 'http://ss3.gemtalksystems.com/ss/Pharo14'
user: 'pharoUser'
password: 'pharoPwd').
The following script uses the new filesystem library. We also show how
we can get the package names and not the versions. The script also takes
care to publish only mcz files. It can be extended to publish specific
packages selectively.
Script 8.7: How to publish package files to a new repository using Pharo 2.0
| go |
go := Gofer new.
go repository: (MCHttpRepository
    location: 'http://ss3.gemtalksystems.com/ss/rb-pharo'
    user: 'pharoUser'
    password: 'pharoPwd').
"Group the matching mcz files of the package cache by package name, then push each package."
(((FileSystem disk workingDirectory / 'package-cache')
    allFiles select: [:each | '*Fame*.mcz' match: each basename])
    groupedBy: [:each | (each base copyUpToLast: $-) ]) keys
    do: [:name | go package: name; push]
source := MCHttpRepository
location: 'http://www.squeaksource.com/MyProject'.
destination := MCSmalltalkhubRepository
owner: 'TheOwner'
project: 'YourPackage'
user: 'YourName'
password: ''.
• The method load allows us to load packages from sources given with
the methods url: and package:.
Have you ever had this problem when trying to load a project: you get
an error because a package that you were not even aware of is missing? Or
worse — it is present, but you have the wrong version? This situation can
easily occur, even though the project loads fine for its developers, when the
developers are working in a context that is different from yours.
The solution for the project developers is to use a package dependency
management system to explicitly manage the dependencies between the pack-
ages that make up a project. This chapter shows you how to use Metacello,
Pharo’s package management system, and demonstrates the benefits of us-
ing it.
9.1 Introduction
Gofer: Monticello’s scripting interface. Gofer is a small tool that sits on top
of Monticello: it is used to load, update, merge, difference, revert, com-
mit, recompile and unload groups of Monticello packages. Gofer also
makes sure that these operations are performed as cleanly as possible.
For more information, see Chapter 8.
Versions. A version identifies the exact version of each package and project
that should be loaded. A version is based upon a baseline version. For
each package in the baseline version, the Monticello file name (e.g.,
Metacello-Base-dkh.152) is specified. For each project in the baseline
version, the Metacello version number is specified.
sion 0.5 is based on baseline 0.5 and specifies the versions of the packages
(PackageA-version.6, PackageB-version.4 and PackageC-version.1) and ver-
sion of the dependent project (ProjectB-version3).
ConfigurationOfCoolBrowser>>version01: spec
<version: '0.1'>
The method version01: spec builds a description of version 0.1 of the project
in the object spec. The common code for version 0.1 (specified using the
message for:do:) consists of particular versions of the packages named
CoolBrowser-Core and CoolBrowser-Tests. These are specified with the message
package: packageName with: versionName. These versions are available in the
Monticello repository http://www.example.com/CoolBrowser, which is specified
using the message repository:. The blessing: method is used to denote that
this is a released version and that the specification will not be changed in
the future. The blessing #development should be used when the version has not
stabilized.
Now let us look at more details.
• Immediately after the method selector you see the pragma definition:
<version: '0.1'>. The pragma version: indicates that the version created in
this method should be associated with version 0.1 of the CoolBrowser
project. That is why we said that the name of the method is not that im-
portant. Metacello uses the pragma, not the method name, to identify
the version being defined.
• The argument of the method, spec, is the only variable in the method
and it is used as the receiver of four different messages: for:do:, blessing:,
package:with:, and repository:.
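Putting this description together, the body of version01: might look like the following sketch. The two version file names (CoolBrowser-Core-MichaelJones.10 and CoolBrowser-Tests-JohnLewis.3) are hypothetical placeholders; the repository URL and the four messages are the ones discussed in the text.

```smalltalk
ConfigurationOfCoolBrowser>>version01: spec
    <version: '0.1'>
    spec for: #common do: [
        "A released version whose specification will no longer change."
        spec blessing: #release.
        spec repository: 'http://www.example.com/CoolBrowser'.
        spec
            package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-MichaelJones.10';
            package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.3' ]
```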
How to manage multiple repositories. You can also add multiple repositories
to a spec. You just have to send the repository: message several times.
ConfigurationOfCoolBrowser>>version02: spec
...
that you have a coherent set of package versions. To load versions, you send
the message load to a version. Here are some examples for loading versions
of the CoolBrowser:
(ConfigurationOfCoolBrowser project version: '0.1') load.
(ConfigurationOfCoolBrowser project version: '0.2') load.
Note that in addition, if you print the result of each expression, you get
a list of packages in load order: Metacello manages not only which packages
are loaded, but also the order. This can be handy when debugging configurations.
Selective Loading. By default, the load message loads all the packages asso-
ciated with the version (as we will see later, we can change that by defining
a particular group called default). If you want to load a subset of the packages
in a project, you should list the names of the packages that you are interested
in as an argument to the load: method:
(ConfigurationOfCoolBrowser project version: '0.2') load:
{ 'CoolBrowser-Core' .
'CoolBrowser-Addons' }.
Apart from load and record, there is also another useful method which is
fetch (and fetch:). As explained, record simply records which Monticello files
should be downloaded and in which order. fetch accesses and downloads all
the needed Monticello files. Just for the record, in the implementation load
first does a fetch and then a doLoad.
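As a sketch, assuming the ConfigurationOfCoolBrowser class of this chapter is present in the image, the two messages are sent to a version in the same way as load:

```smalltalk
"Only compute which Monticello files are needed, and in which order."
(ConfigurationOfCoolBrowser project version: '0.2') record.

"Download the needed Monticello files without loading them into the image."
(ConfigurationOfCoolBrowser project version: '0.2') fetch.
```

Printing the result of record is a cheap way to check a configuration before anything is loaded.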
Let us focus on internal dependencies for now: imagine that the packages
CoolBrowser-Tests and CoolBrowser-Addons both depend on CoolBrowser-Core
as described in Figure 9.4. The specifications for versions 0.1 and 0.2 did not
capture this dependency. Here is a new configuration that does:
ConfigurationOfCoolBrowser>>version03: spec
<version: '0.3'>
9.7 Baselines
A baseline represents the skeleton or architecture of a project in terms of
the structural dependencies between packages or projects. A baseline de-
fines the structure of a project using just package names. When the struc-
ture changes, the baseline should be updated. In the absence of structural
changes, the changes are limited to picking specific versions of the packages
in the baseline.
Now, let’s continue with our example. First we modify it to use baselines:
we create one method for our baseline. Note that the method name and the
version pragma can take any form. Still, for readability purposes, we add
’baseline’ to both of them. It is the argument of the blessing: message that is
mandatory and defines a baseline.
ConfigurationOfCoolBrowser>>baseline04: spec "convention"
<version: '0.4-baseline'> "convention"
Figure 9.5: Version 0.4 now imports a baseline that expresses the dependen-
cies between packages.
it, as shown in Figure 9.5. The baseline specifies a repository, the packages,
and the dependencies between those packages, but it does not specify the
specific versions of the packages.
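A baseline along these lines could be written as in the following sketch. It is a reconstruction from the description, not the exact method of the book: the repository URL is the example URL used earlier for CoolBrowser, and the dependencies are those discussed for version 0.3.

```smalltalk
ConfigurationOfCoolBrowser>>baseline04: spec
    <version: '0.4-baseline'>
    spec for: #common do: [
        "This blessing is what makes the version a baseline."
        spec blessing: #baseline.
        spec repository: 'http://www.example.com/CoolBrowser'.
        "Package names and dependencies only, no version file names."
        spec
            package: 'CoolBrowser-Core';
            package: 'CoolBrowser-Tests' with: [ spec requires: 'CoolBrowser-Core' ];
            package: 'CoolBrowser-Addons' with: [ spec requires: 'CoolBrowser-Core' ] ]
```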
To define a version in terms of a baseline, we use the pragma
<version:imports:>, as follows:
ConfigurationOfCoolBrowser>>version04: spec
<version: '0.4' imports: #('0.4-baseline')>
Figure 9.6: A second version (0.5) imports the same baseline as version 0.4.
Loading Baselines
Even though version 0.4-baseline does not contain explicit package version
information, you can still load it!
(ConfigurationOfCoolBrowser project version: '0.4-baseline') load.
Declaring a new version. Now suppose that we want to create a new ver-
sion of our project, version 0.5, that has the same structure as version 0.4, but
contains different versions of the packages. We can capture this content by
importing the same baseline; this relationship is depicted in Figure 9.6.
ConfigurationOfCoolBrowser>>version05: spec
<version: '0.5' imports: #('0.4-baseline')>
Creating a baseline for a big project will often require some time and
effort, since it must capture all the dependencies of all the packages, as well
as some other things that we will look at later. However, once the baseline is
defined, creating new versions of the project is greatly simplified and takes
very little time.
9.8 Groups
Suppose now that the CoolBrowser project grows: A developer writes
some tests for CoolBrowser-Addons. These constitute a new package named
CoolBrowser-AddonsTests, which naturally depends on CoolBrowser-Addons and
CoolBrowser-Tests, as shown in Figure 9.7.
We may want to load projects with or without tests. In addition, it is
convenient to be able to load all of the tests with a simple expression like:
(ConfigurationOfCoolBrowser project version: '0.6') load: 'Tests'.
instead of having to explicitly list all of the test packages, like this:
(ConfigurationOfCoolBrowser project version: '0.6')
load: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests').
Figure 9.7: A baseline with six groups: default, Core, Extras, Tests, Complete-
WithoutTests and CompleteWithTests.
Groups are defined in baselines. We define the groups in the baseline
version, since a group is a structural component. Note that the default
group will be used in the subsequent sections. Here the default group
specifies that the two packages 'CoolBrowser-Core' and 'CoolBrowser-Addons'
will be loaded when the method load is used.
Using this baseline, we can now define version 0.6 to be the same as ver-
sion 0.5, except for the addition of the new package CoolBrowser-AddonsTests.
ConfigurationOfCoolBrowser>>version06: spec
<version: '0.6' imports: #('0.6-baseline')>
JohnLewis.1' ].
Examples. Once you have defined a group, you can use its name anywhere
you would use the name of a project or package. The load: method takes as
parameter the name of a package, a project, a group, or a collection of those
items. All of the following statements are possible:
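For instance, assuming that version 0.6 imports a baseline defining the groups of Figure 9.7, expressions of the following kind can be used (a sketch, not an exhaustive list):

```smalltalk
"A single package."
(ConfigurationOfCoolBrowser project version: '0.6') load: 'CoolBrowser-Core'.

"A single group."
(ConfigurationOfCoolBrowser project version: '0.6') load: 'Tests'.

"A collection mixing package names and group names."
(ConfigurationOfCoolBrowser project version: '0.6')
    load: #('CoolBrowser-Core' 'Tests').
```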
The groups default and ’ALL’. The default group is a special one. The load
message loads the members of the default group while loading the ALL group
will load all the packages. Moreover, by default, default loads ALL!
We have named the project reference CoolBrowser ALL. The name of the
project reference is arbitrary. You can select the name you want, although
it is recommended that you choose a name that makes sense for that project
reference. In the specification for the CoolToolSet-Core package, we have
specified that CoolBrowser ALL is required. As will be explained later, the
message project:with: allows one to specify the exact version of the project
you want to load.
The message loads: specifies which packages or groups to load. The
parameter of loads: can be the same as the one of load:, i.e., the name of a
package, the name of a group, or a collection of these things. Notice that
calling loads: is optional: you only need it if you want to load something
different from the default.
Now we can load CoolToolSet like this:
(ConfigurationOfCoolToolSet project version: '0.1') load.
• The message className: specifies the name of the class that contains the
project metadata; in this case ConfigurationOfCoolBrowser.
• The messages file: and repository: give Metacello the information that
it might need to search for and load class ConfigurationOfCoolBrowser, if
it is not present in the image. The argument of file: is the name of the
Monticello package that contains the metadata class, and the argument
of repository: is the URL of the Monticello repository that contains that
package. If the Monticello repository is protected, then you should use
the message: repository:username:password: instead.
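The messages described in this list can be combined in a project reference such as the following sketch. It is a fragment meant to appear inside a configuration method, where spec is the method argument; the file name CoolBrowser-Metacello and the repository URL are the example values used elsewhere in this chapter.

```smalltalk
spec for: #common do: [
    spec project: 'CoolBrowser ALL' with: [
        spec
            "The class holding the project metadata."
            className: 'ConfigurationOfCoolBrowser';
            "What to load from the referenced project."
            loads: #('ALL');
            "Where to find the configuration class if it is not in the image."
            file: 'CoolBrowser-Metacello';
            repository: 'http://www.example.com/CoolBrowser' ] ]
```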
spec
package: 'Soup-Core' with: 'Soup-Core-sd.11';
package: 'Soup-Tests-Core' with: 'Soup-Tests-Core-sd.3';
package: 'Soup-Help' with: 'Soup-Help-StephaneDucasse.2' ].
What you can also do is to use the loads: message in the project reference
to specify which packages of the project you want to load. Such a solution is
nice because you factor the information into the project reference and you do
not have to duplicate it in all the versions.
ConfigurationOfSoup>>version10: spec
<version: '1.0' imports: #('1.0-baseline')>
spec
package: 'Soup-Core' with: 'Soup-Core-sd.11';
package: 'Soup-Tests-Core' with: 'Soup-Tests-Core-sd.3';
package: 'Soup-Help' with: 'Soup-Help-StephaneDucasse.2' ].
<version: '0.2-baseline'>
spec for: #common do: [
    spec blessing: #baseline.
    spec repository: 'http://www.example.com/CoolToolSet'.
    spec
        project: 'CoolBrowser default' with: [
            spec
                loads: #('default');
                repository: 'http://www.example.com/CoolBrowser';
                file: 'CoolBrowser-Metacello' ];
        project: 'CoolBrowser Tests'
            copyFrom: 'CoolBrowser default'
            with: [ spec loads: #('Tests') ].
    spec
        package: 'CoolToolSet-Core' with: [ spec requires: 'CoolBrowser default' ];
        package: 'CoolToolSet-Tests' with: [
            spec requires: #('CoolToolSet-Core' 'CoolBrowser Tests') ] ].
project: 'PhexampleCore'
    with: [ spec
        versionString: #stable;
        loads: #('Core');
        repository: 'http://www.smalltalkhub.com/mc/Phexample/main' ].
....
'Fame-Tests-Core' with: [spec requires: #('Fame-Core' 'Fame-Example' 'PhexampleCore' ) ].
ConfigurationOfCoolBrowser>>preloadForCore
Transcript show: 'This is the preload script. Sorry I had no better idea'.
ConfigurationOfCoolBrowser>>version08: spec
<version: '0.8' imports: #('0.7-baseline')>
spec
package: 'CoolBrowser-Core' with: [
spec
file: 'CoolBrowser-Core-BobJones.20';
preLoadDoIt: #preloadForCore;
postLoadDoIt: #postloadForCore:package: ];
package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.8';
package: 'CoolBrowser-Addons' with: 'CoolBrowser-Addons-JohnLewis.6';
package: 'CoolBrowser-AddonsTests' with: 'CoolBrowser-AddonsTests-JohnLewis.1' ].
In this example, we added pre and post load scripts at project level.
Again, the selectors can receive 0, 1 or 2 arguments.
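A post-load script taking two arguments, such as the postloadForCore:package: selector referenced in version 0.8, could look like the following sketch. The argument names and the message written to the Transcript are illustrative assumptions:

```smalltalk
ConfigurationOfCoolBrowser>>postloadForCore: loader package: packageSpec
    "Two-argument form: Metacello passes the loader and the spec of the loaded package."
    Transcript cr; show: 'postloadForCore executed for ', packageSpec name
```

The zero-argument form, like preloadForCore, is enough when the script does not need to know what was loaded.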
Metacello automatically loads the packages for the platform in use. To do
that, we need to specify platform-specific information using the method
for:do:, as shown in the following example. Here we define that a different
package version will be loaded depending on the platform. The
platform-specific packages will be loaded in addition to the common ones,
depending on the platform on which you are executing the script.
ConfigurationOfCoolBrowser>>version09: spec
<version: '0.9' imports: #('0.9-baseline')>
Specifying versions is only one aspect: you should also specify
baseline-specific information.
ConfigurationOfCoolBrowser>>baseline09: spec
<version: '0.9-baseline'>
spec for: #common do: [
    spec blessing: #baseline.
    spec
        package: 'CoolBrowser-Core';
        package: 'CoolBrowser-Tests' with: [ spec requires: 'CoolBrowser-Core' ];
        package: 'CoolBrowser-Addons' with: [ spec requires: 'CoolBrowser-Core' ];
        package: 'CoolBrowser-AddonsTests' with: [
            spec requires: #('CoolBrowser-Addons' 'CoolBrowser-Tests' ) ].
    spec
        group: 'default' with: #('CoolBrowser-Core' 'CoolBrowser-Addons' );
        group: 'Core' with: #('CoolBrowser-Core' 'CoolBrowser-Platform' );
        group: 'Extras' with: #('CoolBrowser-Addons');
        group: 'Tests' with: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests' );
        group: 'CompleteWithoutTests' with: #('Core' 'Extras' );
        group: 'CompleteWithTests' with: #('CompleteWithoutTests' 'Tests' )].
Loading order. Notice that if you are in a system where the platform
attributes are (#common #squeakCommon #pharo #'pharo2.x' #'pharo2.0.x') (you can
obtain this information by evaluating ConfigurationOf project attributes) and
you have specified three sections such as #common, #pharo and #pharo2.0.x,
these sections will be loaded one after the other.
Finally, note that the method for:do: is not only used to specify a
platform-specific package, but also for anything that has to do with
different dialects. You can put whatever you want from the configuration
inside that block. For example, you can define, change and customize groups,
packages, repositories, etc., for each dialect, and do this:
ConfigurationOfCoolBrowser>>baseline010: spec
<version: '0.10-baseline'>
spec
...
spec
    group: 'default' with: #('CoolBrowser-Core' 'CoolBrowser-Addons' );
    group: 'Core' with: #('CoolBrowser-Core' 'CoolBrowser-Platform' );
    group: 'Extras' with: #('CoolBrowser-Addons');
    group: 'Tests' with: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests' );
    group: 'CompleteWithoutTests' with: #('Core' 'Extras' );
    group: 'CompleteWithTests' with: #('CompleteWithoutTests' 'Tests' )].
spec
package: 'CoolBrowser-Core';
package: 'CoolBrowser-Tests' with: [ spec requires: 'CoolBrowser-Core' ];
spec
group: 'default' with: #('CoolBrowser-Core' 'CoolBrowser-Addons' );
group: 'Core' with: #('CoolBrowser-Core' 'CoolBrowser-Platform' )].
In this example, for Pharo we use a different repository than for Gem-
stone. However, this is not mandatory, since both can have the same repos-
itory and differ in other things, like versions, post and pre code executions,
dependencies, etc.
In addition, the addons and tests are not available for Gemstone, and
thus, those packages and groups are not included. As you can see, everything
that we have been doing inside the for: #common do: block can be done inside
another for:do: block for a specific dialect.
Note that the #stable here overrides the bleeding edge loading behavior
that you would get if you were (foolish enough) to load a baseline (remember,
loading a baseline loads bleeding edge versions). Here we make sure that
the stable version of OmniBrowser for your platform will be loaded (and
not the latest one). The next section is about the different symbolic versions.
stable. A symbolic version that specifies the stable literal version for a par-
ticular platform and version of such a platform. The stable version is
the version that should be used for loading. With the exception of the
bleedingEdge version (which has a pre-defined default), you will need
to edit your configuration to add the stable or development version in-
formation: I want a certified version for the platform. Now pay attention
because if you rely on a stable version of a package it does not mean
that the package cannot change. Indeed the package implementor may
produce a new version that may be incompatible with your system.
Here it indicates that there is no version for the common tag. Us-
ing a symbolic version that resolves to notDefined will result in a
MetacelloSymbolicVersionNotDefinedError being signaled.
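As a hedged illustration of notDefined, a symbolic version method could look like the following sketch. The literal version '1.1' and the choice of the #pharo attribute are assumptions made for the example.

```smalltalk
ConfigurationOfCoolBrowser>>development: spec
    <symbolicVersion: #'development'>
    "No development version for platforms in general..."
    spec for: #'common' version: #notDefined.
    "...but one for Pharo (the literal version is illustrative)."
    spec for: #'pharo' version: '1.1'
```

With such a definition, resolving #development on a platform other than Pharo would signal the error mentioned above.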
For the development symbolic version you can use any version that you
would like (including another symbolic version). As the following code
shows, we can specify a literal version, a baseline (which will load the
latest versions specified by the baseline) or a stable version.
development: spec
<symbolicVersion: #'development'>
spec for: #'common' version: '1.1'
development: spec
<symbolicVersion: #'development'>
spec for: #'common' version: '1.1-baseline'
development: spec
<symbolicVersion: #'development'>
spec for: #'common' version: #stable
Warning. The term stable is misleading. It does not mean that you will
always load exactly the same version because the developer of the system
you rely on may change the meaning of stable to point to another stable
version. But such a stable version may introduce incompatibility with your
own code. So when you release your code you should use a specific version
to be sure that you will not get impacted by other changes.
is not always the last version. This is because latestVersion answers the latest
version whose blessing is not #development, #broken, or #baseline. To find the
latest #development version, for example, you should execute this expression:
ConfigurationOfCoolBrowser project latestVersion: #development.
Nevertheless, you can get the very last version independently of blessing
using the lastVersion method, as illustrated below:
ConfigurationOfCoolBrowser project lastVersion.
In general, the #development blessing should be used for any version that
is unstable. Once a version has stabilized, a different blessing should be
applied.
The following expression will load the latest version of all of the packages
for the latest #baseline version:
(ConfigurationOfCoolBrowser project latestVersion: #baseline) load.
Since the latest #baseline version should reflect the most up-to-date project
structure, executing the previous expression loads the absolute bleeding
edge version of the project.
Hints.
Some patterns emerge when working with Metacello. Here is one: Create
a baseline version and use the #stable version for all of the projects in the
baseline. In the literal version, use the explicit version, so that you get an
explicit repeatable specification for a set of projects that were known to work
together.
For example, the Pharo 1.2.2-baseline would include specs that
look like this:
spec
project: 'OB Dev' with: [
spec
className: 'ConfigurationOfOmniBrowser';
versionString: #stable;
...];
project: 'ScriptManager' with: [
spec
className: 'ConfigurationOfScriptManager';
versionString: #stable;
...];
project: 'Shout' with: [
spec
className: 'ConfigurationOfShout';
versionString: #stable;
...];
....].
Loading Pharo 1.2.2-baseline would cause the #stable version for each of
those projects to be loaded ... but remember over time the #stable version
will change and incompatibilities between packages can creep in. By using
#stable versions you will be in better shape than using #bleedingEdge, because
the #stable version is known to work.
Pharo 1.2.2 (literal version) will have corresponding specs that look like
this:
spec
project: 'OB Dev' with: '1.2.4';
project: 'ScriptManager' with: '1.2';
project: 'Shout' with: '1.2.2';
....].
This way, you have driven a stake into the ground stating that these versions
are known to work together (have passed tests as a unit). Five years in the
future, you will be able to load Pharo 1.2.2 and get exactly the same packages
every time, whereas the #stable versions may have drifted over time.
If you are just bringing up a PharoCore1.2 image and would like to load
the Pharo dev code, you should load the #stable version of Pharo (which may
be 1.2.2 today and 1.2.3 tomorrow). If you want to duplicate the environment
that someone is working in, ask them for their version of Pharo and load
that explicit version, for example to reproduce a bug they reported.
If you use the stable version in your baseline there is no need to do any-
thing special in your version specification.
If you use a linear load, then each package is loaded in order. Class side
initialize methods and pre/post code execution are performed just before or
after loading that specific package.
It is important to notice that managing dependencies does not determine the
order in which packages are loaded. That a package A depends on package B
doesn’t mean that B will be loaded before A. It just guarantees that if you
want to load A, then B will be loaded too.
A similar problem arises with method overrides. If a package
overrides a method from another package, and the order is not preserved,
we cannot be sure of the order in which they will load, and thus we cannot
be sure which version of the method will finally be installed.
When using atomic loading the package order is lost and we have the
problems mentioned above. However, if we use the linear mode, then each package
is loaded in order. Moreover, method overrides are preserved too.
A possible problem with linear mode is the following: suppose project A
depends on two other projects, B and C. B depends on project D version
1.1 and C depends on project D version 1.2 (the same project but another
version). First question: which version of D does A end up with? By default
(you can change this using the method operator: in the project method),
Metacello will finally load version 1.2, i.e., the latest one.
However, in atomic loading only 1.2 is loaded. In linear loading, both
versions may (depending on the dependency order) be loaded, although 1.2
will be finally loaded. But this means that 1.1 may be loaded first and then
1.2. Sometimes this can be a problem because an older version of a package
or project may not even load in the Pharo image we are using.
For all the mentioned reasons, the default mode is linear. Users should
use atomic loading in particular cases and when they are completely sure.
Finally, if you want to explicitly set a load type, you have to do it in the
project method. Example:
ConfigurationOfCoolToolSet >>project
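A minimal sketch of such a project method, following the usual Metacello configuration template. It assumes that the configuration class defines an instance variable project and a class-side ensureMetacello bootstrap method, as generated configurations typically do.

```smalltalk
ConfigurationOfCoolToolSet>>project
    ^ project ifNil: [ | constructor |
        "Bootstrap Metacello if it is not already loaded (assumed class-side helper)."
        self class ensureMetacello.
        "Construct the Metacello project from this configuration."
        constructor := (Smalltalk at: #MetacelloVersionConstructor) on: self.
        project := constructor project.
        "Explicitly choose the load type: #linear or #atomic."
        project loadType: #linear.
        project ]
```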
As you can see in the code, we check whether the class CBNode (a class from
CoolBrowser) is present and, depending on that, we set a specific project attribute.
This is flexible enough to let you define your own conditions and set as
many project attributes as you wish (you can define an array of attributes).
Now the question is how to use these project attributes. In the following
baseline we see an example:
Notice that project attributes are used through the existing
method for:do:. Inside that method you can do whatever you want: define
groups, dependencies, etc. In our case, if CoolBrowser is present, then
we just add 'CoolToolSet-CB' to the default group. If it is not present, then
'CoolBrowser default' is added as a dependency of 'CoolToolSet-CB'. In this
case, we deliberately do not add it to the default group. If
desired, the user should explicitly load that package.
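The two halves of this mechanism can be sketched as follows. The attribute names #CBPresent and #CBNotPresent are assumptions chosen for the example, and 'CoolBrowser default' is assumed to be a project reference declared in the #common section of the same baseline.

```smalltalk
"In ConfigurationOfCoolToolSet>>project, once the project has been constructed:
 choose an attribute depending on whether CBNode is in the image."
project projectAttributes:
    ((Smalltalk globals includesKey: #CBNode)
        ifTrue: [ #(#CBPresent) ]
        ifFalse: [ #(#CBNotPresent) ]).

"In a baseline method of ConfigurationOfCoolToolSet:"
spec for: #CBPresent do: [
    "CoolBrowser is already there: make the package part of the default group."
    spec group: 'default' with: #('CoolToolSet-CB') ].
spec for: #CBNotPresent do: [
    "CoolBrowser is missing: the package needs it as a dependency."
    spec package: 'CoolToolSet-CB' with: [ spec requires: 'CoolBrowser default' ] ].
```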
Again, notice that inside the for:do: you are free to do whatever you want.
Project version attributes 185
ConfigurationOfCoolBrowser>>version07: spec
<version: '0.7' imports: #('0.7-baseline')>
Author: the name of the author who created the version. When using the
OB-Metacello tools or MetacelloToolbox, the author field is automati-
cally updated to reflect the current author as defined in the image.
Timestamp: the date and time when the version was completed. When us-
ing the OB-Metacello tools or MetacelloToolbox, the timestamp field is
automatically updated to reflect the current date and time. Note that
the timestamp must be a String.
To end this chapter, we show how you can query this information. This illustrates
that most of the information that you define in a Metacello version can
then be queried. For example, you can evaluate the following expressions:
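A minimal sketch of such queries, assuming that version '0.7' of ConfigurationOfCoolBrowser defines the corresponding attributes. Evaluate each expression with "print it" to see its value.

```smalltalk
| version |
version := ConfigurationOfCoolBrowser project version: '0.7'.
version blessing.      "the blessing symbol set with spec blessing:"
version description.   "the text set with spec description:"
version author.        "the author set with spec author:"
version timestamp.     "the timestamp set with spec timestamp:"
```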
Metacello Memento
ConfigurationOfCoolToolSet>>baseline06: spec "could be called differently just a convention"
<version: '0.6-baseline'> "Convention. Used in the version: method"
spec for: #common do: [ "#common/#pharo/#gemstone/#pharo'1.4'"
spec blessing: #baseline. "Important: identifies a baseline"
spec repository: 'http://www.example.com/CoolToolSet'.
spec
group: 'default' with: #('CoolBrowser-Core' 'CoolBrowser-Addons');
group: 'Core' with: #('CoolBrowser-Core');
group: 'Extras' with: #('CoolBrowser-Addon');
group: 'Tests' with: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests');
group: 'CompleteWithoutTests' with: #('Core' 'Extras');
group: 'CompleteWithTests' with: #('CompleteWithoutTests' 'Tests')
].
ConfigurationOfGemToolsExample>>baseline10: spec
<version: '1.0-baseline'>
spec for: #common do: [
spec blessing: #'baseline'. "required see above"
spec repository: 'http://seaside.gemstone.com/ss/GLASSClient'.
spec
project: 'FFI' with: [
spec
className: 'ConfigurationOfFFI';
versionString: #bleedingEdge; "Optional. #stable/#development/#bleedingEdge/specific
version"
repository: 'http://www.squeaksource.com/MetacelloRepository' ];
project: 'OmniBrowser' with: [
spec
className: 'ConfigurationOfOmniBrowser';
versionString: #stable; "Optional. #stable/#development/#bleedingEdge/specific
version"
repository: 'http://www.squeaksource.com/MetacelloRepository' ];
project: 'Shout' with: [
spec
className: 'ConfigurationOfShout';
versionString: #stable;
repository: 'http://www.squeaksource.com/MetacelloRepository' ];
project: 'HelpSystem' with: [
spec
className: 'ConfigurationOfHelpSystem';
versionString: #stable;
repository: 'http://www.squeaksource.com/MetacelloRepository'].
spec
package: 'OB-SUnitGUI' with: [spec requires: #('OmniBrowser')];
package: 'GemTools-Client' with: [ spec requires: #('OmniBrowser' 'FFI' 'Shout' 'OB-SUnitGUI' ).];
package: 'GemTools-Platform' with: [ spec requires: #('GemTools-Client' ). ];
package: 'GemTools-Help' with: [
spec requires: #('HelpSystem' 'GemTools-Client' ). ].
spec group: 'default' with: #('OB-SUnitGUI' 'GemTools-Client' 'GemTools-Platform' 'GemTools-Help')].
ConfigurationOfGemToolsExample>>version10: spec
<version: '1.0' imports: #('1.0-baseline' )>
spec for: #common do: [
spec blessing: #development.
spec description: 'initial development version'.
spec author: 'dkh'.
spec timestamp: '1/12/2011 12:29'.
spec
project: 'FFI' with: '1.2';
project: 'OmniBrowser' with: #stable;
project: 'Shout' with: #stable;
project: 'HelpSystem' with: #stable.
spec
package: 'OB-SUnitGUI' with: 'OB-SUnitGUI-dkh.52';
package: 'GemTools-Client' with: 'GemTools-Client-NorbertHartl.544';
package: 'GemTools-Platform' with: 'GemTools-Platform.pharo10beta-dkh.5';
package: 'GemTools-Help' with: 'GemTools-Help-DaleHenrichs.24'. ].
Loading. load, load: The load method loads the default group and if there
is no default group defined, then all packages are loaded. The load: method
takes as parameter the name of a package, a project, a group, or a collection
of those items.
(ConfigurationOfCoolBrowser project version: '0.1') load.
(ConfigurationOfCoolBrowser project version: '0.2') load: {'CBrowser-Core' . 'CBrowser-Addons'}.
Debugging. record, record: loadDirectives The message record records what would
be loaded for the default group; if you want a specific group of items, you can use
record:, just as with load:.
| pkgs loader |
loader := ((Smalltalk globals at: #ConfigurationOfMoose) project version: 'default')
ignoreImage: true;
record.
Frameworks
Chapter 10
Glamour
Now that Glamour is installed, we are ready to build our first browser
by using Glamour’s declarative language. What about building a file browser
like Apple’s Finder? This browser is built using the Miller Columns
browsing technique, displaying hierarchical elements in a series of columns.
The principle of this browser is that a column always reflects the content of
the element selected in the previous column, the first column-content being
chosen on opening.
In our case of navigating through the file system, the browser displays a
list of a particular directory’s entries (each file and directory) in the first column
and then, depending on the user selection, appends another column
(see Figure 10.1):
• if the user selects a directory, the next column will display the entries
of that particular directory;
• if the user selects a file, the next column will display the content of the
file.
This may look complex at first because of the recursion. However, Glamour
provides an intuitive way of describing Miller Columns-based browsers.
In Glamour’s terminology this particular browser is called a
finder, referring to Apple’s Finder found on Mac OS X. Glamour offers
this behavior with the class GLMFinder. This class has to be instantiated and
initialized to properly list our domain of interest, the files:
| browser |
browser := GLMFinder new.
browser show: [:a |
a list
display: #children ].
browser openOn: FileSystem disk root.
Note that at that stage selecting a plain file raises an error. We will under-
stand why and how to fix that situation soon.
From this small piece of code you get a list of all entries (either files or
directories) found at the root of your file system, each line representing either
a file or a directory. If you click on a directory, you can see the entries of
this directory in the next column. The filesystem navigation facilities are
provided by the Filesystem framework, thoroughly discussed in Chapter 3.
This code has some problems however. Each line displays the full print
string of the entry and this is probably not what you want. A typical user
would expect only names of each entry. This can easily be done by customiz-
ing the list:
browser show: [:a |
a list
display: #children;
format: #basename ].
This way, the message basename will be sent to each entry to get its name.
This makes the files and directories much easier to read by showing the file
name instead of its full name.
Another problem is that the code does not distinguish between files and
directories. If you click on a file, you will get an error because the browser
will send it the message children that it does not understand. To fix that, we
just have to avoid displaying a list of contained entries if the selected element
is a file:
browser show: [:a |
a list
when: #isDirectory;
display: #children;
format: #basename ].
This works well but the user cannot distinguish between a line representing
a file and one representing a directory. This can be fixed by, for example, adding a slash at
the end of the file name if it is a directory:
browser show: [:a |
a list
when: #isDirectory;
display: #children;
format: #basenameWithIndicator ].
The last thing we might want to do is to display the contents of the entry
if it is a file. The following gives the final version of the file browser:
| browser |
browser := GLMFinder new
variableSizePanes;
title: 'Find your file';
yourself.
This code extends the previous one with variable-sized panes, a title as
well as directory entry, access permission handling and file content reading.
The resulting browser is presented in Figure 10.1.
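A sketch of what such a complete finder could look like is given below. It combines the pieces introduced so far; the use of Error handlers to cope with unreadable directories and files, and the fallback message, are assumptions made for the example.

```smalltalk
| browser |
browser := GLMFinder new
    variableSizePanes;
    title: 'Find your file';
    yourself.
browser show: [ :a |
    "Directories: list their entries, or nothing if they cannot be read."
    a list
        when: #isDirectory;
        display: [ :each | [ each children ] on: Error do: [ Array new ] ];
        format: #basenameWithIndicator.
    "Plain files: show their contents, or a message if they cannot be read."
    a text
        when: #isFile;
        display: [ :entry |
            [ entry readStream contents ]
                on: Error
                do: [ 'Cannot display the content of this file' ] ] ].
browser openOn: FileSystem disk root.
```

Two presentations are declared in the same block; the when: conditions decide which one applies to the selected entry.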
This short introduction has just presented how to install Glamour and
how to use it to create a simple file browser.
Running example
In the following tutorial we will be creating a simple Smalltalk class nav-
igator. Such navigators are used in many Smalltalk browsers and usually
consist of four panes, which are abstractly depicted in Figure 10.2.
The class navigator functions as follows: Pane 1 shows a list or a tree of
packages, each package containing classes, which make up the organizational
structure of the environment. When a package is selected, pane 2 shows a
list of all classes in the selected package. When a class is selected, pane 3
shows all protocols (a construct to group methods also known as method cat-
egories) and all methods of the class are shown on pane 4. When a protocol
is selected in pane 3, only the subset of methods that belong to that protocol
Presentation, Transmission and Ports 195
PBE2CodeNavigator class>>open
^ self new open
PBE2CodeNavigator>>open
self buildBrowser.
browser openOn: self organizer.
PBE2CodeNavigator>>organizer
^ RPackageOrganizer default
PBE2CodeNavigator>>buildBrowser
browser := GLMTabulator new.
PBE2CodeNavigator>>packagesIn: constructor
constructor list
display: [:organizer | organizer packageNames sorted];
format: #asString
Glamour browsers are composed in terms of panes and the flow of data
between them. In our browser we currently have only one pane displaying
packages. The flow of data is specified by means of transmissions. These
are triggered when certain changes in the browser graphical user interface
occur, such as an item selection in a list. We make our browser more useful
by displaying classes contained in the selected package (see Figure 10.3).
PBE2CodeNavigator>>buildBrowser
browser := GLMTabulator new.
browser
column: #packages;
column: #classes.
PBE2CodeNavigator>>classesIn: constructor
constructor list
display: [:packageName | (self organizer packageNamed: packageName)
definedClasses]
The listing above shows almost all of the core language constructs of
Glamour. Since we want to be able to reference the panes later, we give them
the distinct names “packages” and “classes” and arrange them in columns
using the column: keyword. Similarly, a row: keyword exists with which panes
can be organized in rows.
The transmit, to: and from: messages create a transmission, a directed connection
that defines the flow of information from one pane to another. In
this case, we create a link from the packages pane to the classes pane. The
from: keyword signifies the origin of the transmission and to: the destination.
If nothing more specific is stated, Glamour assumes that the origin refers to
the selection of the specified pane. We show how to specify other aspects of
the origin pane and how to use multiple origins below.
Figure 10.3: Two-pane browser. When a package is selected in the left pane,
the contained classes are shown on the right pane.
Finally, the andShow: specifies what to display on the destination pane
when the connection is activated or transmitted. In our example, we want to
show a list of the classes that are contained in the selected package.
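The transmissions described above can be sketched as follows, assuming the packagesIn: and classesIn: methods shown earlier. The first transmission has no origin: it passes the object given to openOn: to the packages pane.

```smalltalk
PBE2CodeNavigator>>buildBrowser
    browser := GLMTabulator new.
    browser
        column: #packages;
        column: #classes.
    "Feed the object passed to openOn: into the packages pane."
    browser transmit to: #packages; andShow: [ :a | self packagesIn: a ].
    "When a package is selected, show its classes in the classes pane."
    browser transmit from: #packages; to: #classes; andShow: [ :a | self classesIn: a ]
```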
The display: keyword simply stores the supplied block within the presen-
tation. The blocks will only be evaluated later, when the presentation should
be displayed on-screen. If no explicit display block is specified, Glamour at-
tempts to display the object in some generic way. In the case of list presenta-
tions, this means that the displayString message is sent to the object to retrieve
a standard string representation. As we have previously seen, format: is used
to change this default behavior.
Along with display:, it is possible to specify a when: condition to limit the
applicability of the connection. By default, the only condition is that an item
is in fact selected, i.e., that the argument of the display block is not nil.
Another Presentation
So far, packages are visually represented as a flat list. However, packages
are naturally structured with the corresponding class category. To exploit
this structure, we replace the list with a tree presentation for packages:
PBE2CodeNavigator>>packagesIn: constructor
constructor tree
display: [ :organizer | (self rootPackagesOn: organizer) asSet sorted ];
children: [ :rootPackage :organizer | (self childrenOf: rootPackage on: organizer)
sorted ];
format: #asString
PBE2CodeNavigator>>classesIn: constructor
constructor list
PBE2CodeNavigator>>rootPackagesOn: organizer
^ organizer packageNames collect: [ :string | string readStream upTo: $- ]
PBE2CodeNavigator>>categoriesIn: constructor
constructor list
display: [:class | class organization categories]
The browser resulting from the above changes is shown in Figure 10.4.
Multiple Origins
Adding the list of methods as Pane 4 involves slightly more machinery.
When a method category is selected we want to show only the methods that
belong to that category. If no category is selected, all methods that belong to
the current class are shown.
This leads to our methods pane depending on the selection of two other
panes, the class pane and the category pane. Multiple origins can be defined
using multiple from: keywords as shown below.
Figure 10.4: Improved class navigator including a tree to display the pack-
ages and a list of method categories for the selected class.
PBE2CodeNavigator>>buildBrowser
browser := GLMTabulator new.
browser
column: #packages;
column: #classes;
column: #categories;
column: #methods.
PBE2CodeNavigator>>methodsIn: constructor
constructor list
display: [:class :category |
(class organization listAtCategoryNamed: category) sorted].
constructor list
when: [:class :category | class notNil and: [category isNil]];
display: [:class | class selectors sorted];
allowNil
The listing shows a couple of new properties. First, the multiple ori-
gins are reflected in the number of arguments of the blocks that are used
in the display: and when: clauses. Secondly, we are using more than one
presentation—Glamour shows all presentations whose conditions match in
the order that they were defined when the corresponding transmission is
fired.
In the first presentation, the condition matches when all arguments are
defined (not nil); this is the default for all presentations. The second condition
matches only when the category is undefined and the class defined.
When a presentation must be displayed even in the presence of an undefined
origin, it is necessary to use allowNil as shown. We can therefore omit
the category from the display block.
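A transmission with two origins can be sketched as follows, assuming it is added to buildBrowser after the column declarations and that the categories pane is fed by an analogous transmission from the classes pane.

```smalltalk
"Inside PBE2CodeNavigator>>buildBrowser:"
browser transmit
    from: #classes;       "first origin: bound to the first block argument"
    from: #categories;    "second origin: bound to the second block argument"
    to: #methods;
    andShow: [ :a | self methodsIn: a ].
```

The order of the from: messages determines the order of the block arguments (class first, then category) in methodsIn:.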
The completed class navigator is displayed in Figure 10.5.
Ports
When we stated that transmissions connect panes this was not entirely cor-
rect. More precisely, transmissions are connected to properties of panes
called ports. Such ports consist of a name and a value which accommodates
a particular aspect of state of the pane or its contained presentations. If the
port is not explicitly specified by the user, Glamour uses the selection port by
default. As a result, the following two statements are equivalent:
browser transmit from: #packages; to: #classes; andShow: [:a | ...].
browser transmit from: #packages port: #selection; to: #classes; andShow: [:a | ...].
Reusing Browsers
One of Glamour’s strengths is that browsers can be used in place of primitive presentations
such as lists and trees. This opens up powerful possibilities for composing
and nesting browsers.
Composing and Interaction 201
A new class PBE2CodeEditor will implement this editor. An editor will del-
egate the presentation of panes 1 through 4 to the previously implemented
PBE2CodeNavigator. To achieve this, we first have to make the existing naviga-
tor return the constructed browser.
PBE2CodeNavigator>>buildBrowser
...
"new line"
^ browser
We can then reuse the navigator in the new editor browser as shown
below.
Object subclass: #PBE2CodeEditor
instanceVariableNames: 'browser'
classVariableNames: ''
poolDictionaries: ''
category: 'PBE2-CodeBrowser'.
PBE2CodeEditor class>>open
PBE2CodeEditor>>open
self buildBrowser.
browser openOn: self organizer
PBE2CodeEditor>>organizer
^ RPackageOrganizer default
PBE2CodeEditor>>buildBrowser
browser := GLMTabulator new.
browser
row: #navigator;
row: #source.
PBE2CodeEditor>>navigatorIn: constructor
constructor custom: (PBE2CodeNavigator new buildBrowser)
The listing shows how the browser is used exactly like we would use a
list or other type of presentation. In fact, browsers are a type of presentation.
Evaluating PBE2CodeEditor open opens a browser that embeds the navi-
gator in the upper part and has an empty pane at the lower part. Source
code is not displayed yet because no connection has been made between
the panes so far. The source code is obtained by wiring the navigator with
the text pane: we need both the name of the selected method and the
class in which it is defined. Since this information is defined only within
the navigator browser, we must first export it to the outside world by using
toOutsidePort:. For this we append the following lines to PBE2CodeNavigator>>buildBrowser:
PBE2CodeNavigator>>buildBrowser
...
browser transmit from: #classes; toOutsidePort: #selectedClass.
browser transmit from: #methods; toOutsidePort: #selectedMethod.
^ browser
This will send the selection within classes and methods to the selectedClass
and selectedMethod ports of the containing pane. Alternatively, we
could have added these lines to the navigatorIn: method in the code editor, as
follows; it makes no difference to Glamour:
PBE2CodeEditor>>navigatorIn: constructor
"Alternative way of adding outside ports. There is no need to use this
code and the previous one simultaneously."
| navigator |
PBE2CodeEditor>>sourceIn: constructor
constructor text
display: [:class :method | class sourceCodeAt: method]
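The wiring between the navigator and the source pane can be sketched as follows, assuming the selectedClass and selectedMethod outside ports exported above. The two origins are bound, in order, to the class and method arguments of the display block in sourceIn:.

```smalltalk
PBE2CodeEditor>>buildBrowser
    browser := GLMTabulator new.
    browser
        row: #navigator;
        row: #source.
    "Embed the navigator in the upper pane."
    browser transmit to: #navigator; andShow: [ :a | self navigatorIn: a ].
    "Forward the exported ports of the navigator to the source pane."
    browser transmit
        from: #navigator port: #selectedClass;
        from: #navigator port: #selectedMethod;
        to: #source;
        andShow: [ :a | self sourceIn: a ]
```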
We can now view the source code of any selected method and have created
a modular browser by reusing the class navigator that we wrote
earlier. The composed browser described by the listing is shown in
Figure 10.7.
Actions
Navigating through the domain is essential to finding useful elements. How-
ever, having a proper set of available actions is essential to letting one inter-
act with the domain. Actions may be defined and associated with a presen-
tation. An action is a block that is evaluated when a keyboard shortcut is
pressed or when an entry in a context menu is clicked. An action is defined
via act:on: sent to a presentation:
PBE2CodeEditor>>sourceIn: constructor
constructor text
display: [:class :method | class sourceCodeAt: method ];
act: [:presentation :class :method | class compile: presentation text] on: $s.
Figure 10.7: Composed browser that reuses the previously described class
navigator to show the source of a selected method.
...
act: [:presentation :class :method | class compile: presentation text]
on: $s
entitled: 'Save'
Multiple Presentations
Frequently, developers wish to provide more than one presentation of a spe-
cific object. In our code browser for example, we may wish to show the
classes not only as a list but as a graphical representation as well. Glamour
includes support to display and interact with visualizations created using
the Mondrian visualization engine (presented in Chapter 12). To add a second
presentation, we simply define it in the andShow: block as well:
PBE2CodeNavigator>>classesIn: constructor
constructor list
when: [:packageName | self organizer includesPackageNamed: packageName ];
display: [:packageName | (self organizer packageNamed: packageName)
definedClasses];
title: 'Class list'.
constructor mondrian
when: [:packageName | self organizer includesPackageNamed: packageName];
painting: [ :view :packageName |
view nodes: (self organizer packageNamed: packageName)
definedClasses.
view edgesFrom: #superclass.
view treeLayout];
title: 'Hierarchy'
Other Browsers
We have essentially used the GLMTabulator which is named after its ability
to generate custom layouts using the aforementioned row: and column: key-
words. Additional browsers are provided or can be written by the user.
Browser implementations can be subdivided into two categories: browsers
that have explicit panes, i.e., panes that are declared explicitly by the user, and
browsers that have implicit panes.
The GLMTabulator is an example of a browser that uses explicit panes. With
implicit browsers, we do not declare the panes directly but the browser cre-
ates them and the connections between them internally. An example of such
a browser is the Finder, which has been discussed in Section 10.1. Since the
panes are created for us, we need not use the from:to: keywords but can sim-
ply specify our presentations:
browser := GLMFinder new.
browser list
display: [:class | class subclasses].
The listing above creates a browser (shown in Figure 10.9) and opens it to
show a list of subclasses of Collection. Upon selecting an item from the list,
the browser expands to the right to show the subclasses of the selected item.
This can continue indefinitely as long as something to select remains.
• Data flows along transmissions set with transmit from: #source; to: #target.
• A transmission may have several sources.
• List and text panes are obtained by sending list and text to a browser.
Content is set with display: and items are formatted with format:.
1 http://www.themoosebook.org/book
Chapter 11
Roassal is known to work with the versions 1.4, 2.0, 3.0, and 4.0 of Pharo.
A first visualization.
The first visualization we will show represents the Collection class hierarchy.
It defines each class as a box connected with its subclasses. Each box displays
the number of methods and number of instance variables of the represented
class.
classElements
do: [ :c |
c width: c model instVarNames size.
c height: c model methods size.
c + ROBorder.
c @ RODraggable ].
view addAll: classElements.
associations := classElements
collect: [:c |
(c model superclass = Object)
ifFalse: [ (view elementFromModel: c
model superclass) -> c]]
thenSelect: [ :assoc | assoc isNil not ].
edges := ROEdge linesFor: associations.
view addAll: edges.
Roassal Easel
2 Note that a Glamour-based easel is also provided, under the Moose section of the World
menu. The Glamour-based Roassal easel is similar to the easel presented here. A dedicated
presentation of this version may be found in the Moose book, http://themoosebook.org.
212 Agile Visualization with Roassal
here employed.
The code above opens a window with two square elements, with the origin
at the top left corner. We first create two elements of size 50 and 100,
respectively, and add them to the view using the addAll: message. We give
both elements a border and make them draggable. Note that in our example
the shape and the interaction are added before opening the view. This
can also be done afterwards: even once added and rendered, graphical components
are free to be modified.
An element may be translated by sending translateBy: or translateTo: with
a point as parameter. The parameter represents the step or the position
in pixels. The axes are defined as shown in Figure 11.2: the x-axis increases
from left to right and the y-axis from top to bottom.
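A small sketch of both messages, assuming a Roassal-enabled image; the sizes and coordinates are arbitrary values chosen for the example.

```smalltalk
| view element |
view := ROView new.
element := ROElement new size: 50.
element + ROBorder.
view add: element.
"Relative move: 20 pixels to the right and 10 pixels down."
element translateBy: 20 @ 10.
"Absolute move: place the element at position 100@100."
element translateTo: 100 @ 100.
view open.
```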
Nesting Elements. A ROElement object can also contain other ROElement ob-
jects. We refer to this containment relationship as nesting. Nesting enables
elements to be structured as a tree. In addition, as shown by the following
example, the location of children is relative to that of the parent. This means
that when we translate the parent, the children will be translated as well.
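A minimal sketch of nesting, assuming that add: sent to an element nests the argument inside it; sizes and coordinates are arbitrary.

```smalltalk
| view parent child |
view := ROView new.
parent := ROElement new size: 100.
parent + ROBorder.
child := ROElement new size: 50.
child + ROBorder.
"Nest the child inside the parent, then add only the parent to the view."
parent add: child.
view add: parent.
"The child follows, since its location is relative to that of the parent."
parent translateTo: 50 @ 100.
view open.
```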
Translating the view’s camera. A view also responds to the translateBy: and
translateTo: messages. Although it may look as if the view moves, it is not
the view that changes its position but its camera. The camera component of a
view, represented by an instance of ROCamera, is the point of view from which
a visualization is actually viewed. More about the camera can be found in
Section 11.8.
1. Add all data with no particular shape. In this case data is the Collection
class with all its subclasses;
In this section, we start with the first step: adding an element for each
class of the hierarchy.
We can do this easily by sending the forCollection: message to the ROElement
class, a helper that builds ROElements from a collection. Each ROElement in
the returned collection represents one element of the argument. We add a
border shape to each of them and make them draggable for easier manipulation.
Finally, we apply a default layout to see all the elements in the view.
Layouts are explained in more detail later.
ROElement new
model: 'foo';
size: 100;
+ ROLabel.
ROElement new
size: 100;
+ ROBorder.
ROElement new
size: 200;
+ (ROBox new
color: Color green;
borderColor: Color red;
borderWidth: 4 ).
We will now add some shapes to the classes in the Collection hierarchy
example. Each class representation will have a width representing the number
of instance variables of the class and a height representing the number of
its methods.
view := ROView new.
classElements := ROElement forCollection: Collection withAllSubclasses.
classElements do: [ :c |
c width: c model instVarNames size.
c height: c model methods size.
c + ROBorder.
c @ RODraggable ].
view addAll: classElements.
ROHorizontalLineLayout new on: view elements.
view open.
Adding shape to an edge. There are several kinds of line shapes to use
besides the standard one, like ROOrthoHorizontalLineShape. All of them are
subclasses of the ROAbstractLine class, including ROLine. Some examples are
shown in Figure 11.12 and Figure 11.13.
edge + ROLine.
edge + ROOrthoHorizontalLineShape.
Adding an arrow to a line. A line can also contain one or more arrows.
An arrow is an instance of a subclass of ROAbstractArrow, like ROArrow or
ROHorizontalArrow. To add an arrow to a line shape we use the add: message,
as in Figure 11.14 and Figure 11.15.
By default the arrow is located at the end of the edge, but we can
customize this position using the add:offset: message. The offset parameter
must be a number between 0 and 1. It indicates the fraction of the line length
at which the arrow is placed. For example, if the offset is 0.5, the arrow is
placed at the middle of the line, as shown in Figure 11.16.
When a line contains more than one arrow we can set up a different offset
for each arrow:
Now we know how to make links between elements. With the following
code we can create an edge between each class and its superclass. To do so, we
first need to create a collection of associations from which to build the
edges. Each association has the starting point as its key and the ending
point as its value. For this example each association goes from the
ROElement that represents the superclass to the ROElement representing the
class.
Once we have the associations, we create the instances of ROEdge by using
the linesFor: message. This message takes a collection of associations as
parameter and returns a collection of edges.
Figure 11.18: Adding links between each class and its superclass
Now we have each class in the Collection hierarchy with the shape we
want and connected to its superclass. However, we do not see a real
hierarchy, because we need an appropriate layout to arrange all the
elements of the view. The next section covers how to apply layouts to
elements.
11.5 Layouts
A layout defines how a collection of elements is automatically arranged. To
apply a layout, use the on: message with a collection of ROElements as
parameter. In the example shown in Figure 11.19 we use the spritesOn:
convenience message to create a collection of ROElements, each one with size
50, shaped with a red border and draggable. We then apply a layout to arrange
the elements on a grid.
of elements are arranged with two layouts. The first one aligns elements
along a vertical line and the second along a horizontal line. We first cre-
ate elements for the vertical line, apply the ROVerticalLineLayout and shape
them with a label. We then do the same for the second group, using the
ROHorizontalLineLayout and spacing them to avoid overlapping.
classVariableNames: ''
poolDictionaries: ''
category: 'Roassal-Layout'
The instance variable initialPosition defines where the virtual line starts,
that is, where the first element of the line will be located. This variable
is set in an initialize method:
RODiagonalLineLayout >> initialize
super initialize.
initialPosition := 0@0.
| view elements |
view := ROView new.
elements := ROElement spritesOn: (1 to: 3).
view addAll: elements.
RODiagonalLineLayout on: view elements.
view open.
A key point of layouts in Roassal is that they take into account the size
of the elements being laid out. When defining a new layout, remember to make
your algorithm use the element size.
Figure 11.24: Collection class hierarchy with width representing the number
of instance variables and height the number of methods.
Some interactions are more complex to set up, like popup elements which
are displayed when the mouse is over an element.
Only a few of the interactions available in Roassal are presented here.
ROAbstractPopup
Figure 11.26: ROPopupView that creates a view with the same number of ele-
ments as the model of the element the mouse is over.
RODynamicEdge
A recurrent need when visualizing data elements and their relations is show-
ing outgoing edges when the mouse points to an element. Instead of trying
to get the right mixture of callbacks when entering or leaving the element,
the interaction RODynamicEdge considerably eases the task.
The following example makes some lines appear when the mouse hovers
over some elements:
| rawView el1 el2 el3 |
rawView := ROView new.
rawView add: (el1 := ROBox element size: 20).
rawView add: (el2 := ROBox element size: 20).
rawView add: (el3 := ROBox element size: 20).
ROCircleLayout on: (Array with: el1 with: el2 with: el3).
el1 @ RODraggable.
el2 @ RODraggable.
el3 @ RODraggable.
el1 @ (RODynamicEdge toAll: (Array with: el2 with: el3) using: (ROLine arrowed color:
Color red)).
rawView open
ROAnimation
A camera has an extent, which is what we are seeing, and a real extent,
which represents the far extent. The extent of the view’s camera affects the
way a view is drawn in a canvas. When rendering a view, each point, rectangle
or other shape that needs to be drawn is plotted according to the camera’s
extent. This is done by transforming each absolute position into virtual
points relative to the camera’s vision. For example, when zooming in on a
view, the content of the extent is “stretched” to fill the real extent, which
makes objects bigger. The extent and the real extent of the camera are
modified using the extent: and realExtent: accessors, respectively. The camera
also stores the window size of the visualization.
The camera has an altitude from the view, which is computed using the
extent. The smaller the extent is, the lower the camera is located, and vice
versa. The altitude of the camera can be set by sending the altitude: message
with a number as parameter. A camera cannot be rotated, only translated.
This also means that the camera always looks perpendicularly at the view.
Figure 11.28 illustrates what we have just mentioned. It shows the camera
together with the information about the view with which it is associated. We
also see that the visible part of the visualization is given by the camera’s
extent.
The ROZoomMove interaction affects the extent of the camera. This inter-
action modifies the camera’s position and extends it to fit a desired rectan-
gle. For example, when zooming in to focus on a particular element of the
view, the ROZoomMove translates and extends the camera to fit that element’s
bounds. This movement is simulated by changing the camera’s altitude.
Using the camera to build a minimap for navigation. The interaction and
animation model offered by Roassal supports complex behavior. Consider
the following code:
| view eltos |
view := ROView new.
view @ RODraggable .
view on: ROMouseRightClick do: [ :event |
ROZoomInMove new on: view ].
view on: ROMouseLeftClick do: [ :event |
ROZoomOutMove new on: view ].
It opens a view with 400 labelled elements, arranged using a grid layout.
As the handlers in the code show, pressing the right mouse button zooms in
on the view and the left mouse button zooms out. Pressing the m key opens a
minimap. This feature is enabled using the ROMiniMap interaction.
The ROMiniMap opens a new window that gives a complete overview of a
visualization. It also eases navigation by using the original view’s camera.
The minimap is composed of a smaller version of the visualization and a
lupa (magnifying glass), which represents the currently visible part of the
main view’s window.
Coming back to our main example, the interaction is simply added by
sending @ ROMiniMap to a view; pressing “m” then opens the minimap
(Figure 11.29).
which has a different extent than the view’s camera. This allows one to see
the same view with different sizes.
The magnifier size represents the visible part of the window and its po-
sition is related to the view’s camera position. When the view is translated
to a point, the magnifier follows it by changing its position: the point repre-
senting the camera position is translated to a point on the ROMiniMapDisplayer
camera extent. And when the view is zoomed in or zoomed out the extent
of the camera is changed, increasing or decreasing the magnifier’s size.
• The Roassal Core, a set of packages that define all the main classes, like
ROView, ROElement, ROShape and ROCamera. It also contains all the tests.
Mondrian is based on Roassal. Check the Roassal chapter for installation
procedures. If you are using a Moose distribution of Pharo (see footnote 2),
then you already have Roassal.
1 http://themoosebook.org/book/internals/mondrian
2 http://www.moosetechnology.org/
A First Visualization
You can get a first visualization by entering and executing the following code
in a workspace. You should see the Collection class hierarchy.
| view |
view := ROMondrianViewBuilder new.
view shape rectangle
width: [ :cls | cls numberOfVariables * 5 ];
height: #numberOfMethods;
linearFillColor: #numberOfLinesOfCode within: Collection withAllSubclasses.
view interaction action: #browse.
view nodes: Collection withAllSubclasses.
view edgesFrom: #superclass.
view treeLayout.
view open
• the class shading indicates the number of lines of code the class contains.
The class colored in black contains the most lines of code. The white
class contains the fewest lines of code.
We will detail and review all the mechanisms involved in the visualiza-
tion later on.
To define shapes, use the shape message followed by the desired shape
with its characteristics, before the node or nodes definition. This will locally
define the shape for the nodes.
By using the nodes: message with a collection of objects you can create
several nodes.
If the node or nodes have nested nodes, use the node:forIt: or nodes:forEach:
message to add them. The second parameter is a block which will add the
nested nodes, as the following code shows:
view treeLayout.
view open.
There are essentially two ways to work with Mondrian, either using the
easel or a view renderer. The easel is a tool in which users may interactively
and incrementally build a visualization by means of a script. The easel is par-
ticularly useful when prototyping. MOViewRenderer enables a visualization to
be programmatically built, in a non-interactive fashion. You probably want
to use this class when embedding your visualization in your application.
We will first use Mondrian in the easiest way, by using the easel. To open
an easel, you can either use the World menu (it should contain the entry
“Mondrian Easel”) or execute the expression:
ROEaselMorphic open.
In the easel you have just opened, you can see two panels: the one on
top is the visualization panel, the second one is the script panel. In the script
panel, enter the following code and press the generate button:
view nodes: (1 to: 20).
You should see in the top pane 20 small boxes lined up in the top left
corner. You have just rendered the numerical set between 1 and 20. Each
box represents a number. The amount of interaction you can do is quite
limited for now. You can only drag and drop a number and get a tooltip that
indicates its value. We will soon see how to define interactions. For now, let
us explore the basic drawing capabilities of Mondrian.
We can add edges between the nodes that we have already drawn. Add a second
line:
view nodes: (1 to: 20).
view edgesFrom: [ :v | v * 2 ].
Each number is linked with its double. Not all the doubles are visible.
For example, the double of 20 is 40, which is not part of the visualization. In
that case, no edge is drawn.
The message edgesFrom: defines one edge per node, when possible. For
each node that has been added to the visualization, an edge is defined
between this node and the node looked up by evaluating the provided block.
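The lookup can be mimicked in plain Smalltalk, without Mondrian. The sketch below is not how Mondrian is implemented; it only reproduces the rule for the numbers 1 to 20 and the block [ :v | v * 2 ], representing each edge as an association from the looked-up node to the node itself.

```smalltalk
| nodes edges |
nodes := 1 to: 20.
edges := OrderedCollection new.
nodes do: [ :v |
    | source |
    "What the block [ :v | v * 2 ] answers for this node."
    source := v * 2.
    "An edge is only defined when the looked-up object is itself a node."
    (nodes includes: source)
        ifTrue: [ edges add: source -> v ] ].
edges size.    "10: only the numbers 1 to 10 have their double among the nodes"
```

Nodes 11 to 20 get no edge of their own, which matches the observation that the double of 20 is not part of the visualization.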
Mondrian contains a number of layouts to order nodes. Here, we use the
circle layout:
view nodes: (1 to: 20).
view edgesFrom: [ :v | v * 2 ].
view circleLayout.
Figure 12.1 shows the result. Each class is represented as a box. The
Collection class (the root of the hierarchy) is the topmost box. The width
of a class conveys the number of instance variables it has. We multiply it
by 3 for more contrasting results. The height conveys the number of methods.
We can immediately spot classes with many more methods than others:
Figure 12.1: The system complexity for the collection class hierarchy.
String and CompiledMethod clearly show up. These two classes contain
many references to other classes. We also see that text: makes a shape contain
a text.
Mondrian provides a set of utility methods to easily create elements. Con-
sider the expression:
itself equivalent to
view
edges: Collection withAllSubclasses
from: [ :each | each superclass ]
to: [ :each | each yourself ].
Figure 12.3: Abstract classes are in gray and classes with the word “Abstract”
in their name are in blue.
As with height: and width:, messages that define a color take either a
symbol, a block or a constant value as argument. The argument is evaluated
against the domain object represented by the graphical element (a double
dispatch sends the message moValue: to the argument). Using ifTrue:ifFalse:
inside such a block quickly becomes impractical. Utility methods are
therefore provided to easily pick a color based on a particular condition.
The definition of the shape can simply be:
view shape rectangle
size: 10;
if: [ :cls | ('*Array*' match: cls name) ] borderColor: Color blue;
if: [ :cls | cls hasAbstractMethods ] fillColor: Color lightGray;
...
Figure 12.4: The system complexity visualization: nodes are classes; height
is the number of lines of methods; width the number of variables; color
conveys the number of lines of code.
Figure 12.5: Boxes are classes and links are inheritance relationships. The
number of abstract methods is indicated by the size of the class. A red class
defines abstract methods and a pink class solely inherits from an abstract
class.
So far, we have seen that an element has a shape to describe its graphical
representation. It also contains an interaction that contains event handlers.
The message popupText: takes a block as argument. This block is evaluated
with the domain object as argument. The block has to return the popup text
content. In our case, it is simply a list of the methods.
In addition to a textual content, Mondrian allows a view to be popped
up. We will enhance the previous example to illustrate this point. When the
mouse enters a node, a new view is defined and displayed next to the node.
view interaction popupView: [ :element :secondView |
12.9 Subviews
A node is a view in itself. This allows for a graph to be embedded in any
node. The embedded view is physically bounded by the encapsulating node.
The embedding is realized via the keywords nodes:forEach: and node:forIt:.
The following example approximates the dependencies between meth-
ods by linking methods that may call each other. A method m1 is connected
to a method m2 if m1 contains a reference to the selector #m2. This is a simple
but effective way to see the dependencies between methods. Consider:
view nodes: ROShape withAllSubclasses forEach: [:cls |
view nodes: cls methods.
view edges: cls methods from: #yourself toAll: [ :cm | cls methods select: [ :rcm | cm
messages anySatisfy: [:s | rcm selector == s ] ] ].
view treeLayout
].
view interaction action: #browse.
view edgesFrom: #superclass.
view treeLayout.
Figure 12.6: Large boxes are classes. Inner boxes are methods. Edges show a
possible invocation between the two.
12.11 Events
Each mouse movement, click and keyboard keystroke corresponds to a par-
ticular event. Mondrian offers a rich hierarchy of events. The root of the
hierarchy is MOEvent. To associate a particular action to an event, a handler
has to be defined on the object interaction. In the following example, clicking
on a class opens a code browser:
view shape rectangle
width: [ :each | each instVarNames size * 5 ];
height: [ :each | each methods size ];
if: #hasAbstractMethods fillColor: Color lightRed;
if: [:cls | cls methods anySatisfy: #isAbstract ] fillColor: Color red.
The block handler accepts one argument: the event generated. The ob-
ject that triggered the event is obtained by sending modelElement to the event
object.
12.12 Interaction
Mondrian offers a number of contextual interaction mechanisms. The inter-
action object contains a number of keywords for that purpose. The message
highlightWhenOver: takes a block as argument. This block returns a list of the
nodes to highlight when the mouse enters a node. Consider the example:
view interaction
highlightWhenOver: [:v | {v - 1 . v + 1. v + 4 . v - 4}].
view shape rectangle
width: 40;
height: 30;
withText.
view nodes: (1 to: 16).
view gridLayout gapSize: 2.
hand side a hierarchy of unit tests is displayed. Locating the mouse pointer
above a unit test highlights the classes that are referenced by one of the unit
test methods. Consider the (rather long) script:
The script contains two parts. The first part is the ubiquitous system
complexity of the collection framework. The second part renders the tests
contained in CollectionsTests. The width of a class is the number of
literals contained in it. The height is the number of lines of code. Since the
collection tests make heavy use of traits to reuse code, these metrics have
to be scaled down. When the mouse is placed over a unit test, all the
classes of the collection framework referenced in this class are highlighted.
• The most common way to define nodes is with nodes: and edges with
edgesFrom:, edges:from:to: and edges:from:toAll:.
• A whole range of layouts is offered. The most common ones are
accessible by sending circleLayout, treeLayout or gridLayout to a view.
This chapter was about the Mondrian domain-specific language. Mondrian
is an older visualization framework developed by Tudor Girba and
Michael Meyer in 2005. Mondrian was maintained from 2008 until 2009
by Alexandre Bergel.
Language
Chapter 13
Handling Exceptions
13.1 Introduction
Modern programming languages, including Smalltalk, offer a dedicated
exception-handling mechanism that greatly simplifies the way in which
exceptional situations are signaled and handled. Before the development of the
ANSI Smalltalk standard in 1996, several exception-handling mechanisms
existed, mostly incompatible with each other. Pharo’s exception handling
follows the ANSI standard, with some embellishments; we present it in this
chapter from a user perspective.
The basic idea behind exception handling is that client code does not clut-
ter the main logic flow with checks for error codes, but specifies instead an
exception handler to “catch” exceptions. When something goes wrong, instead
of returning an error code, the method that detects the exceptional situation
interrupts the main flow of execution by signaling an exception. This does
two things: it captures essential information about the context in which the
exception occurred, and transfers control to the exception handler, written
by the client, which decides what to do about it. The “essential information
about the context” is saved in an Exception object; various classes of Exception
are specified to cover the varied exceptional situations that may arise.
Pharo’s exception-handling mechanism is particularly expressive and
flexible, covering a wide range of possibilities. Exception handlers can be
used to ensure that certain actions take place even if something goes wrong,
or to take action only if something goes wrong. Like everything in Smalltalk,
exceptions are objects, and respond to a variety of messages. When an excep-
tion is caught by a handler, there are many possible responses: the handler
can specify an alternative action to perform; it can ask the exception object
to resume the interrupted operation; it can retry the operation; it can pass the
exception to another handler; or it can reraise a completely different excep-
tion.
This code ensures that the writer file handle will be closed, even if an error
occurs in Form fromUser or while writing to the file.
Here is how it works in more detail. The nextPutImage: method of the class
GIFReadWriter converts a form (i.e., an instance of the class Form, representing
a bitmap image) into a GIF image. This method writes into a stream which
has been opened on a file. The nextPutImage: method does not close the stream
it is writing to, therefore we should be sure to close the stream even if a prob-
lem arises while writing. This is achieved by sending the message ensure: to
the block that does the writing. In case nextPutImage: fails, control will flow
into the block passed to ensure:. If it does not fail, the ensured block will still
be executed. So, in either case, we can be sure that writer is closed.
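The behavior described above can be checked without any file, using a log in place of the stream. This is only a sketch: the log collection and the 'disk full' error are stand-ins chosen for illustration, and the surrounding on:do: (presented later in this chapter) merely keeps the debugger from opening.

```smalltalk
| log |
log := OrderedCollection new.

"Normal completion: the ensured block runs after the receiver."
[ log add: 'write' ]
    ensure: [ log add: 'close' ].

"Failure: the handler runs first, then the ensured block runs
 while the stack is unwound."
[ [ log add: 'write'.
    Error signal: 'disk full' ]
        ensure: [ log add: 'close' ] ]
    on: Error
    do: [ :ex | log add: 'handled: ' , ex messageText ].

log    "an OrderedCollection('write' 'close' 'write' 'handled: disk full' 'close')"
```

In both runs 'close' is logged, which is exactly the guarantee we rely on to release the file handle.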
Here is another use of ensure:, in class Cursor:
Cursor»showWhile: aBlock
"While evaluating the argument, aBlock,
make the receiver be the cursor shape."
| oldcursor |
oldcursor := Sensor currentCursor.
self show.
^aBlock ensure: [ oldcursor show ]
Open a transcript and evaluate the code above in a workspace. When the
pre-debugger window opens, first try selecting Proceed and then Abandon. Note
that the argument to ifCurtailed: is evaluated only when the receiver
terminates abnormally. What happens when you select Debug?
Here are some examples of ifCurtailed: usage. The text of the Transcript show:
describes the situation:
[^ 10] ifCurtailed: [Transcript show: 'This is displayed'; cr]
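The difference with ensure: can also be observed with a log instead of the Transcript. The following sketch uses an Error and a handler of our own choosing to terminate the receiver abnormally; the log entries are illustrative names only.

```smalltalk
| log |
log := OrderedCollection new.

"Normal completion: the argument of ifCurtailed: is not evaluated."
[ log add: 'normal' ]
    ifCurtailed: [ log add: 'not reached' ].

"Abnormal termination: the handler returns, the receiver is abandoned,
 and the argument of ifCurtailed: is evaluated during unwinding."
[ [ Error signal: 'abort' ]
        ifCurtailed: [ log add: 'curtailed' ] ]
    on: Error
    do: [ :ex | ex return: nil ].

log    "an OrderedCollection('normal' 'curtailed')"
```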
Both ensure: and ifCurtailed: are very useful for making sure that important
“cleanup” code is executed, but are not by themselves sufficient for handling
all exceptional situations. Now let’s look at a more general mechanism for
handling exceptions.
aBlock is the code that detects an abnormal situation and signals an
exception; it is called the protected block. handlerAction is the block that
is evaluated if an exception is signaled; it is called the exception handler.
exceptionClass defines the class of exceptions that handlerAction will be
asked to handle.
The message on:do: returns the value of the receiver (the protected block)
when no exception occurs. When an error occurs, it returns the value of the
handlerAction block instead, as illustrated by the following expressions:
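A minimal sketch of both cases, using ZeroDivide as the exception class (our choice for illustration):

```smalltalk
| normal failed |
"No exception: on:do: answers the value of the protected block."
normal := [ 10 / 2 ] on: ZeroDivide do: [ :ex | 0 ].

"ZeroDivide is signaled: on:do: answers the value of the handler block."
failed := [ 10 / 0 ] on: ZeroDivide do: [ :ex | 0 ].
```

After evaluation, normal holds 5 and failed holds 0.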
The beauty of this mechanism lies in the fact that the protected block can
be written in a straightforward way, without regard to any possible errors. A
single exception handler is responsible for taking care of anything that may
go wrong.
Consider the following example where we want to copy the contents of
one file to another. Although several file-related things could go wrong, with
exception handling, we simply write a straight-line method, and define a
single exception handler for the whole transaction:
| source destination fromStream toStream |
source := 'log.txt'.
destination := 'log-backup.txt'.
[ fromStream := FileStream oldFileNamed: (FileSystem workingDirectory / source).
[ toStream := FileStream newFileNamed: (FileSystem workingDirectory / destination).
[ toStream nextPutAll: fromStream contents ]
ensure: [ toStream close ] ]
ensure: [ fromStream close ] ]
on: FileStreamException
do: [ :ex | UIManager default inform: 'Copy failed -- ', ex description ].
A Buggy Solution. Study the following code and see why it is wrong.
If any exception other than FileStreamException happens, the files are not
properly closed.
^ failure ].
contents := fromStream contents.
contents ifNil: [
fromStream close.
toStream close.
UIManager default inform: 'Copy failed -- source file has no contents'.
^ failure ].
result := toStream nextPutAll: contents.
result ifFalse: [
fromStream close.
toStream close.
UIManager default inform: 'Copy failed -- could not write to ', destination.
^ failure ].
fromStream close.
toStream close.
^ success.
What a mess! Without exception handling, we must explicitly check the
result of each operation before proceeding to the next. Not only must we check
error codes at each point where something might go wrong, but we must also
be prepared to clean up any operations performed up to that point and abort
the rest of the code.
If you are wondering how this works, have a look at the implementation
of Exception class»,
Exception class», anotherException
"Create an exception set."
^ExceptionSet new add: self; add: anotherException; yourself
The rest of the magic occurs in the class ExceptionSet, which has a surpris-
ingly simple implementation.
Object subclass: #ExceptionSet
instanceVariableNames: 'exceptions'
classVariableNames: ''
poolDictionaries: ''
category: 'Exceptions-Kernel'
ExceptionSet»initialize
super initialize.
exceptions := OrderedCollection new
ExceptionSet», anException
self add: anException.
^self
ExceptionSet»add: anException
exceptions add: anException
ExceptionSet»handles: anException
exceptions do: [:ex | (ex handles: anException) ifTrue: [^true]].
^false
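To see these methods at work, here is a short workspace sketch. The choice of ZeroDivide and MessageNotUnderstood is ours, and nil factorial is just a convenient way to trigger a MessageNotUnderstood.

```smalltalk
| set handlesZeroDivide handlesWarning result |
"Exception class>>, answers an ExceptionSet holding both classes."
set := ZeroDivide , MessageNotUnderstood.

"handles: answers true as soon as one member handles the exception."
handlesZeroDivide := set handles: ZeroDivide new.
handlesWarning := set handles: Warning new.

"The same set used directly as the first argument of on:do:."
result := [ nil factorial ]
    on: ZeroDivide , MessageNotUnderstood
    do: [ :ex | ex class name ].
```

Here handlesZeroDivide is true, handlesWarning is false, and result is #MessageNotUnderstood.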
with the exception as its sole argument. We will see shortly some of the ways
in which the handler can use the exception object.
When signaling an exception, it is possible to provide information spe-
cific to the situation just encountered, as illustrated in the code below. For
example, if the file to be opened does not exist, the name of the non-existent
file can be recorded in the exception object:
StandardFileStream class»oldFileNamed: fileName
"Open an existing file with the given name for reading and writing. If the name has no
directory part, then default directory will be assumed. If the file does not exist, an
exception will be signaled. If the file exists, its prior contents may be modified or
replaced, but the file will not be truncated on close."
| fullName |
fullName := self fullName: fileName.
^(self isAFileNamed: fullName)
ifTrue: [self new open: fullName forWrite: true]
ifFalse: ["File does not exist..."
(FileDoesNotExistException new fileName: fullName) signal]
The exception handler may make use of this information to recover from
the abnormal situation. The argument ex in an exception handler [:ex | ...] will
be an instance of FileDoesNotExistException or of one of its subclasses. Here the
exception is queried for the filename of the missing file by sending it the
message fileName.
| result |
result := [(StandardFileStream oldFileNamed: 'error42.log') contentsOfEntireFile]
on: FileDoesNotExistException
do: [:ex | ex fileName , ' not available'].
Transcript show: result; cr
derstood, can be stored in the exception, and thus made available to the
debugger.
Object»doesNotUnderstand: aMessage
"Handle the fact that there was an attempt to send the given message to the receiver
but the receiver does not understand this message (typically sent from the machine
when a message is sent to the receiver and no method is defined for that selector).
"
MessageNotUnderstood new
message: aMessage;
receiver: self;
signal.
^ aMessage sentTo: self.
That completes our description of how exceptions are used. The remain-
der of this chapter discusses how exceptions are implemented and adds
some details that are relevant only if you define your own exceptions.
1. Look in the current activation context for a handler, and test if that
handler canHandleSignal: E.
2. If no handler is found and the stack is not empty, go down the stack
and return to step 1.
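A small sketch of this lookup, assuming two nested handlers of our own choosing: the innermost handler is tested first, but it only handles MessageNotUnderstood, so the search continues down the stack to the outer handler.

```smalltalk
| result |
result := [ [ 1 / 0 ]
        on: MessageNotUnderstood
        do: [ :ex | #inner ] ]
    on: ZeroDivide
    do: [ :ex | #outer ].
result    "#outer"
```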
Without the second handler, the nested exception will not be caught, and
the debugger will be invoked.
An alternative would be to specify the second handler within the first
one:
result := [ Error signal: 'error 1' ]
on: Exception
do: [[ Error signal: 'error 2' ]
on: Exception
do: [:ex | ex description ]].
result −→ 'Error: error 2'
(ii) return an alternative result for the protected block by sending return:
aValue to the exception object;
(iii) retry the protected block, by sending retry, or try a different block by
sending retryUsing:;
(iv) resume the protected block at the failure point by sending resume or
resume:;
(v) pass the caught exception to the enclosing handler by sending pass; or
We will briefly look at the first three possibilities, and then take a closer
look at the remaining ones.
The handler takes over from the point where the error is signaled, and
any code following in the original block is not evaluated.
The ANSI standard is not clear regarding the difference between using
do: [:ex | 100 ] and do: [:ex | ex return: 100] to return a value. We suggest that you
use return: since it is more intention-revealing, even if these two expressions
are equivalent in Pharo.
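The two forms can be compared side by side in a workspace. In this sketch the protected block and the value 100 are arbitrary; note that the trailing 1 is never reached because the error is signaled first.

```smalltalk
| implicit explicit |
"The value of the handler block becomes the value of on:do:."
implicit := [ Error signal: 'failed'. 1 ]
    on: Error
    do: [ :ex | 100 ].

"The same result, stated explicitly with return:."
explicit := [ Error signal: 'failed'. 1 ]
    on: Error
    do: [ :ex | ex return: 100 ].
```

Both variables hold 100 after evaluation.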
As another example, keep in mind the file handling code we saw earlier
in which we printed a message to the Transcript when a file is not found.
Instead, we could prompt for the file as follows:
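The retry mechanism itself can be sketched without any file or user interface. In the following hypothetical example a counter stands in for the failing operation: the protected block fails twice and succeeds on the third evaluation, and the handler simply asks for the block to be evaluated again.

```smalltalk
| attempts result |
attempts := 0.
result := [ attempts := attempts + 1.
        "Fail on the first two attempts only."
        attempts < 3 ifTrue: [ Error signal: 'not yet' ].
        attempts ]
    on: Error
    do: [ :ex | ex retry ].
result    "3"
```

In real code the handler should change something before sending retry (for example ask for another file name); otherwise a block that always fails would be retried forever.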
Resuming execution
A method that signals an exception that isResumable can be resumed at the
place immediately following the signal. An exception handler may therefore
perform some action, and then resume the execution flow. This behavior is
achieved by sending resume: to the exception in the handler. The argument
is the value to be used in place of the expression that signaled the exception.
In the following example we signal and catch MyResumableTestError, which is
defined in the Tests-Exceptions category:
result := [ | log |
log := OrderedCollection new.
log addLast: 1.
log addLast: MyResumableTestError signal.
log addLast: 2.
log addLast: MyResumableTestError signal.
log addLast: 3.
log ]
on: MyResumableTestError
do: [ :ex | ex resume: 0 ].
result −→ an OrderedCollection(1 0 2 0 3)
Here we can clearly see that the value of MyResumableTestError signal is the
value of the argument to the resume: message.
The message resume is equivalent to resume: nil.
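A minimal sketch of the difference, using Warning (which is resumable) and an arbitrary message text of our own:

```smalltalk
"resume makes the signal expression answer nil."
[ Warning signal: 'low disk space' ]
	on: Warning
	do: [ :ex | ex resume ].          "nil"

"resume: makes the signal expression answer the given value."
[ Warning signal: 'low disk space' ]
	on: Warning
	do: [ :ex | ex resume: #ok ].     "#ok"
```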
The usefulness of resuming an exception is illustrated by the following
functionality which loads a package. When installing packages, warnings
may be signaled and should not be considered fatal errors, so we should
simply ignore the warning and continue installing.
The class PackageInstaller does not exist, though here is a sketch of a possi-
ble implementation.
PackageInstaller»installQuietly: packageNameCollection
....
[ self install ] on: Warning do: [ :ex | ex resume ].
ResumableLoader»readOptionsFrom: aStream
| option |
[aStream atEnd]
whileFalse: [option := self parseOption: aStream.
"nil if invalid"
option isNil
ifTrue: [InvalidOption signal]
ifFalse: [self addOption: option]].
Note that, to be sure the stream is closed, the stream close should be
guarded by an ensure: invocation.
Depending on user input, the handler in readConfiguration might return
nil, or it might resume the exception, causing the signal message send in
readOptionsFrom: to return and the parsing of the options stream to continue.
Note that InvalidOption must be resumable; it suffices to define it as a sub-
class of Exception.
You can have a look at the senders of resume: to see how it can be used.
Passing exceptions on
To illustrate the remaining possibilities for handling exceptions such as pass-
ing an exception, we will look at how to implement a generalization of the
perform: method. If we send perform: aSymbol to an object, this will cause the
message named aSymbol to be sent to that object:
5 perform: #factorial −→ 120 "same as: 5 factorial"
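The same idea extends to messages with arguments. The following workspace expressions are a short sketch of the perform: family; the receivers and arguments are arbitrary examples:

```smalltalk
3 perform: #+ with: 4.                              "7"
5 perform: #between:and: with: 1 with: 10.          "true"
5 perform: #between:and: withArguments: #(1 10).    "true"

"Guarding a dynamic send with respondsTo: avoids a MessageNotUnderstood."
(5 respondsTo: #factorial)
	ifTrue: [ 5 perform: #factorial ].              "120"
```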
These perform:-like methods are very useful for accessing an interface dynam-
ically, since the messages to be sent can be determined at run-time. One
The goal of this section has been to demonstrate the power of exceptions.
It should be clear that while you can do almost anything with exceptions, the
code that results is not always easy to understand. There is often a simpler
way to get the same effect without exceptions; see method 13.2 on page 289
for a better way to implement performAll:.
Resending exceptions
Suppose that in our performAll: example we no longer want to ignore selectors
not understood by the receiver, but instead we want to consider an occur-
rence of such a selector as an error. However, we want it to be signaled as an
application-specific exception, let’s say InvalidAction, rather than the generic
MessageNotUnderstood. In other words, we want the ability to “resignal” a
signaled exception as a different one.
It might seem that the solution would simply be to signal the new ex-
ception in the handler block. The handler block in our implementation of
performAll: would be:
exception, then control will be returned to the point where outer was sent,
not the original point where the exception was signaled:
passResume := [[ Warning signal . 1 ] "resume to here"
on: Warning
do: [ :ex | ex pass . 2 ]]
on: Warning
do: [ :ex | ex resume ].
passResume −→ 1 "resumes to original signal point"
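For comparison, here is a sketch of the same expression with outer in place of pass (the variable name outerResume is ours). When the enclosing handler resumes, control comes back to the point where outer was sent, so the inner handler block runs to its end and its last value, 2, becomes the result:

```smalltalk
outerResume := [[ Warning signal . 1 ]
	on: Warning
	do: [ :ex | ex outer . 2 ]]    "resume to here"
	on: Warning
	do: [ :ex | ex resume ].
outerResume    "2"
```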
[[ 1/0 ]
ifCurtailed: [ Transcript show: 'then should show curtailed'; cr. 6 ]]
on: Error do: [ :e |
Transcript show: 'should show first error'; cr.
e return: 4 ].
First, [1/0] raises a division-by-zero error. This error is handled by the
exception handler, which prints the first message and then returns the value 4
by sending return:. Because the receiver of ifCurtailed: was terminated by an
error, the argument of the ifCurtailed: message is evaluated while the stack
is unwound: it prints the second message. Note that ifCurtailed: does not
change the value returned by the error handler: the value of its argument
block (here 6) is discarded.
The following expression shows that when the stack is not unwound the
expression value is simply returned and none of the handlers are executed.
1 is returned.
[[ 1 ]
The following expression shows that when an error occurs the handler
associated with the error is executed before the ensure: argument. Here the
expression prints 'should show error first', then 'then should show ensure',
and it returns 4.
[[ 1/0 ]
ensure: [ Transcript show: 'then should show ensure'; cr. 6 ]]
on: Error do: [ :e |
Transcript show: 'should show error first'; cr.
e return: 4 ].
Finally, the last expression shows that error handlers are executed one by
one, from the closest to the farthest from the error, and then the ensure:
argument. Here 'error 1', then 'error 2', and then 'then should show ensure'
are displayed.
[[[ 1/0 ] ensure: [ Transcript show: 'then should show ensure'; cr. 6 ]]
on: Error do: [ :e|
Transcript show: 'error 1'; cr.
e pass ]] on: Error do: [ :e |
Transcript show: 'error 2'; cr. e return: 4 ].
Object»halt
"This is the typical message to use for inserting breakpoints during
debugging. It behaves like halt:, but does not call on halt: in order to
avoid putting this message on the stack. Halt is especially useful when
the breakpoint message is an arbitrary one."
Halt signal
This code signals a new exception, UnhandledError, that conveys the idea
that no handler is present. The defaultAction of UnhandledError is to open a de-
bugger:
UnhandledError»defaultAction
"The current computation is terminated. The cause of the error should be logged or
reported to the user. If the program is operating in an interactive debugging
environment the computation should be suspended and the debugger activated."
^ UIManager default unhandledErrorDefaultAction: self exception
MorphicUIManager»unhandledErrorDefaultAction: anException
^ Smalltalk tools debugError: anException.
is a bit of a mess; you can expect to see some of the details change as Pharo
is improved.
The second thing that we notice is that there are two large sub-hierarchies:
Error and Notification. Errors tell us that the program has fallen into some kind
of abnormal situation. In contrast, Notifications tell us that an event has
occurred, but without the assumption that it is abnormal. So, if a Notification
is not handled, the program will continue to execute. An important subclass
of Notification is Warning; warnings are used to notify other parts of the system,
or the user, of abnormal but non-lethal behavior.
The property of being resumable is largely orthogonal to the location of
an exception in the hierarchy. In general, Errors are not resumable, but 10
subclasses of Error are. For example, MessageNotUnderstood is a subclass
of Error, but it is resumable. TestFailures are not resumable, but, as you would
expect, ResumableTestFailures are.
Resumability is controlled by the private Exception method isResumable.
For example:
Exception new isResumable −→ true
Error new isResumable −→ false
Notification new isResumable −→ true
Halt new isResumable −→ true
MessageNotUnderstood new isResumable −→ true
If you declare a new subclass of exceptions, you should look in its protocol
for the isResumable method, and override it as appropriate to the semantics
of your exception.
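As an illustration, here is a sketch of an Error subclass that opts into resumability. The class name RecoverableParseError and the category 'Exceptions-Sketch' are hypothetical names chosen for this example; they do not exist in the Pharo image.

```smalltalk
"Hypothetical class: an Error that a handler is allowed to resume."
Error subclass: #RecoverableParseError
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Exceptions-Sketch'.

RecoverableParseError>>isResumable
	"Errors answer false by default; this subclass can be resumed."
	^ true
```

With this definition, a handler may resume the error, and the value given to resume: becomes the value of the signal expression:

```smalltalk
[ RecoverableParseError signal: 'bad token' ]
	on: RecoverableParseError
	do: [ :ex | ex resume: #skipped ].    "#skipped"
```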
In some situations, it will never make sense to resume an exception. In
such a case you should signal a non-resumable subclass — either an existing
one or one of your own creation. In other situations, it will always be OK
to resume an exception, without the handler having to do anything. In fact,
this gives us another way of characterizing a notification: a Notification is a
resumable Exception that can be safely resumed without first modifying the
state of the system. More often, it will be safe to resume an exception only
if the state of the system is first modified in some way. So, if you signal a
resumable exception, you should be very clear about what you expect an
exception handler to do before it resumes the exception.
There is no instance variable here to store the exception class or the han-
dler, nor is there any place in the superclass to store them. However, note
that MethodContext is defined as a variableSubclass. This means that in addition
to the named instance variables, instances of this class have some indexed
slots. Every MethodContext has indexed slots, that are used to store, among
In the protected block, we query the context that represents the protected
block execution using thisContext sender. This execution was triggered by the
on:do: message execution. The last line explores a 2-element array that con-
tains the exception class and the exception handler.
If you get some strange results using halt and inspect inside the protected
block, note that as the method is being executed, the state of the context ob-
ject changes, and when the method returns, the context is terminated, setting
to nil several of its fields. Opening an explorer on thisContext will show you
that the context sender is effectively the execution of the method on:do:.
Note that you can also execute the following code:
[thisContext sender explore] on: Error do: [:ex|].
You obtain an explorer and you can see that the exception class and the
handler are stored in the first and second indexed instance variables of the
method context object (a method context represents an execution stack
element).
We see that on:do: execution stores the exception class and its handler on
the method context. Note that this is not specific to on:do: but any message
execution stores arguments on its corresponding context.
Figure 13.5: Explore a method context to find the exception class and the
handler.
primitiveMarkHandlerMethod
"Primitive. Mark the method for exception handling. The primitive must fail after
marking the context so that the regular code is run."
Now we know that the context corresponding to the method on:do: is marked
and a context has a direct reference through an instance variable to the
method it has activated. Therefore, we can know if the context is an excep-
tion handler by checking if the method it has activated holds primitive 199.
That is what the method isHandlerContext does (code below).
MethodContext»isHandlerContext
"is this context for method that is marked?"
^method primitive = 199
ContextPart»nextHandlerContext
| value |
((self exceptionClass handles: exception)
and: [self exceptionHandlerIsActive])
ifFalse: [ ^ self nextHandlerContext handleSignal: exception ].
ContextPart»exceptionClass
	"handlercontext only. access temporaries from BlockClosure>>#on:do:"
	^self tempAt: 1
ContextPart»exceptionHandlerBlock
	"handlercontext only. access temporaries from BlockClosure>>#on:do:"
	^self tempAt: 2
ContextPart»exceptionHandlerIsActive
	"handlercontext only. access temporaries from BlockClosure>>#on:do:"
	^self tempAt: 3
ContextPart»exceptionHandlerIsActive: aBoolean
	"handlercontext only. access temporaries from BlockClosure>>#on:do:"
	self tempAt: 3 put: aBoolean
Notice how this method uses tempAt: 1 to access the exception class, and
asks it whether it handles the exception. What about tempAt: 3? That is the
temporary variable handlerActive of the on:do: method. Checking that
handlerActive is true and then setting it to false ensures that a handler will
not be asked to handle an exception that it signals itself. The return: message
sent as the final action of handleSignal: is responsible for unwinding the
stack, i.e., removing all the contexts between the exception signaler context
and its exception handler, as well as executing unwind blocks (blocks created
with ensure:).
To summarize, the signal method, with optional assistance from the virtual
machine for performance, finds the context that corresponds to an on:do:
message with an appropriate exception class. Because the execution stack
is made up of a linked list of Context objects that may be manipulated just
like any other object, the stack can be shortened at any time. This is a superb
example of the flexibility of Pharo.
BlockClosure>>ensure: aBlock
| complete returnValue |
<primitive: 198>
returnValue := self valueNoContextSwitch.
complete ifNil: [
complete := true.
aBlock value ].
^ returnValue
The <primitive: 198> works the same way as the <primitive: 199> we saw in
the previous section. It always fails; however, its presence marks the method
in a way that can easily be detected from the context activating this method.
Moreover, the unwind block is stored the same way as the exception class
and its associated handler. More explicitly, it is stored in the context of the
ensure: method execution, which can be accessed from the block through
thisContext sender tempAt: 1.
In the case where the block does not fail and does not have a non-local re-
turn, the ensure: message implementation executes the block, stores the result
in the returnValue variable, executes the argument block and lastly returns the
result of the block previously stored. The complete variable is here to prevent
the argument block from being executed twice.
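The normal case described above can be observed with a small workspace sketch (the log collection and the values 10 and 20 are our own illustrative choices). The argument block runs exactly once, after the receiver, and its value is ignored:

```smalltalk
| log result |
log := OrderedCollection new.
result := [ log add: #body. 10 ]
	ensure: [ log add: #cleanup. 20 ].
result.    "10: the value of the receiver block, not of the ensure: argument"
log.       "an OrderedCollection(#body #cleanup)"
```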
Ensuring a failing block. The ensure: message will execute the argument
block even if the block fails. In the following example, the ensureWithOnDo
message returns 2 and executes 1. In the subsequent section we will carefully
look at where and what the block is actually returning and in which order
the blocks are executed.
Bexp>>ensureWithOnDo
^[ [ Error signal ] ensure: [ 1 ].
^3 ] on: Error do: [ 2 ]
Bexp>>mainBlock
^[ self traceCr: 'mainBlock start'.
self failingBlock ensure: self ensureBlock.
self traceCr: 'mainBlock end' ]
Bexp>>failingBlock
^[ self traceCr: 'failingBlock start'.
Error signal.
self traceCr: 'failingBlock end' ]
Bexp>>ensureBlock
^[ self traceCr: 'ensureBlock value'.
#EnsureBlockValue ]
Bexp>>exceptionHandlerBlock
^[ self traceCr: 'exceptionHandlerBlock value'.
#ExceptionHandlerBlockValue ]
Bexp>>start
| res |
self traceCr: 'start start'.
res := self mainBlock on: Error do: self exceptionHandlerBlock.
self traceCr: 'start end'.
self traceCr: 'The result is: ', res, '.'.
^ res
Executing Bexp new start prints the following (we added indentation to
stress the calling flow).
start start
   mainBlock start
      failingBlock start
         exceptionHandlerBlock value
      ensureBlock value
start end
The result is: ExceptionHandlerBlockValue.
There are three important things to see. First, the failing block and the
main block are not fully executed because of the signal message. Secondly, the
exception block is executed before the ensure block. Lastly, the start method
will return the result of the exception handler block.
To understand how this works, we have to look at the end of the exception
implementation. We finish the previous explanation of the handleSignal:
method.
ContextPart»handleSignal: exception
"Sent to handler (on:do:) contexts only. If my exception class (first arg) handles
exception then execute my handle block (second arg), otherwise forward this
message to the next handler context. If none left, execute exception's defaultAction
(see nil>>handleSignal:)."
| value |
((self exceptionClass handles: exception)
and: [self exceptionHandlerIsActive])
ifFalse: [ ^ self nextHandlerContext handleSignal: exception ].
In our example, Pharo will execute the failing block, then will look for
the next handler context, marked with <primitive: 199>. As with a regular
exception, Pharo finds the exception handler context, and runs the
exceptionHandlerBlock. The method handleSignal: finishes by sending the
return: message. Let’s have a look at it.
ContextPart>>return: value
"Unwind thisContext to self and return value to self's sender. Execute any unwind
blocks while unwinding. ASSUMES self is a sender of thisContext"
The return: method checks whether the context has a sender and, if not,
signals a CannotReturn exception. It then sends the resume: message to the
sender of this context.
resume: value
"Unwind thisContext to self and resume with value as result of last send. Execute
unwind blocks when unwinding. ASSUMES self is a sender of thisContext"
| context unwindBlock |
self isDead
ifTrue: [ self cannotReturn: value to: self ].
context := firstUnwindContext.
This is the method where the argument block of ensure: is executed. This
method looks for all the unwind contexts between the context of the method
resume: and self, which is the sender of the on:do: context (in our case the
context of start). When the method finds an unwind context, its unwind
block is executed. Lastly, it sends the terminateTo: message.
ContextPart>>terminateTo: previousContext
"Terminate all the Contexts between me and previousContext, if previousContext is on
my Context stack. Make previousContext my sender."
| currentContext sendingContext |
<primitive: 196>
(self hasSender: previousContext) ifTrue: [
currentContext := sender.
[currentContext == previousContext] whileFalse: [
sendingContext := currentContext sender.
currentContext terminate.
currentContext := sendingContext]].
sender := previousContext
Basically, this method terminates all the contexts between thisContext and
self, which is the sender of the on:do: context (in our case the context of
start). Moreover, the sender of thisContext becomes self. The method is
implemented as a primitive for performance only, so the primitive is optional
and the fallback code has the same behavior.
Let’s summarize what happens with Figure 13.6 which represents the
execution of the method ensureWithOnDo defined previously.
Ensuring a non-local return. The method resume:through: is also called when
performing a non-local return. In the case of a non-local return, the stack is
unwound in a similar way as for an exception. The virtual machine, while
performing a non-local return, sends the message aboutToReturn:through: to
the active context. Therefore, if one has changed the implementation of
exception in the language,
[Figure 13.6: execution of the method ensureWithOnDo, one context per method activation]
context 1: Bexp>>ensureWithOnDo (receiver: Bexp new)
	^[[Error signal] ensure: [1].
	^3] on: Error do: [2]
context 2: BlockClosure>>on: exception do: handlerAction (receiver: [[Error signal] ensure: [1]. ^3])
	| handlerActive |
	<primitive: 199>
	handlerActive := true.
	^self value
context 3: BlockClosure>>ensure: aBlock (receiver: [Error signal])
	| complete returnValue |
	<primitive: 198>
	returnValue := self valueNoContextSwitch.
	complete ifNil: [
		complete := true.
		aBlock value.].
	^ returnValue
context 4: Exception class>>signal (receiver: Error)
	signalContext := thisContext contextTag.
	signaler ifNil: [ signaler := self receiver ].
	^ signalContext nextHandlerContext handleSignal: self
context 5: ContextPart>>handleSignal: exception (receiver: context 2)
	| val |
	((self exceptionClass handles: exception)
		and: [self exceptionHandlerIsActive]) ifFalse: [
			^ self nextHandlerContext handleSignal: exception].
	exception privHandlerContext: self contextTag.
	self exceptionHandlerIsActive: false.
	val := [self exceptionHandlerBlock cull: exception]
		ensure: [self exceptionHandlerIsActive: true].
	self return: val.
context 6: ContextPart>>return: value (receiver: context 2)
	sender ifNil: [self cannotReturn: value to: sender].
	sender resume: value
context 7: ContextPart>>resume: value (receiver: context 1)
	self resume: value through: (thisContext
		findNextUnwindContextUpTo: self)
context 8: ContextPart>>resume: value through: firstUnwindContext (receiver: context 1)
	| context unwindBlock |
	self isDead
		ifTrue: [ self cannotReturn: value to: self ].
	context := firstUnwindContext.
	[ context isNil ] whileFalse: [
		context unwindComplete ifNil:[
			context unwindComplete: true.
			unwindBlock := context unwindBlock.
			thisContext terminateTo: context.
			unwindBlock value].
		context := context findNextUnwindContextUpTo: self].
	thisContext terminateTo: self.
	^value
14.1 Basics
What is a block? A block is a lambda expression that captures (or closes over)
its environment at creation-time. We will see later what it means exactly. For
now, imagine a block as an anonymous function or method. A block is a
piece of code whose execution is frozen and can be kicked in using messages.
Blocks are defined by square brackets.
If you execute and print the result of the following code, you will not get
3, but a block. Indeed, you did not ask for the block value, but just for the
block itself, and you got it.
[1+2]
−→ [1+2]
[ :x | x + 2 ] value: 5
−→ 7
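The following workspace expressions sketch the most common ways of evaluating a block; the blocks themselves are arbitrary examples. The number of arguments passed must match the number of parameters the block declares.

```smalltalk
[ 1 + 2 ] value.                                 "3"
[ :x | x + 2 ] value: 5.                         "7"
[ :x :y | x * y ] value: 3 value: 4.             "12"
[ :x :y | x * y ] valueWithArguments: #(3 4).    "12"

"A block can be asked how many arguments it expects."
[ :x :y | x * y ] numArgs.                       "2"
```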
Other messages. Some messages are useful to profile evaluation (more
information in Chapter 17):
bench. Return how many times the receiver block can be evaluated in 5
seconds.
Some messages are related to error handling (as explained in
Chapter 13).
ifCurtailed: onErrorBlock. Evaluate the receiver, and, if the evaluation does not
complete, evaluate the error block. If evaluation of the receiver finishes
normally, the error block is not evaluated.
Some messages are related to process scheduling. We list the most important
ones. Since this chapter is not about concurrent programming in Pharo,
we will not go deep into them.
forkAt: aPriority. Create and schedule a Process evaluating the receiver at the
given priority. Answer the newly created process.
classVariableNames: ''
poolDictionaries: ''
category: 'BlockExperiment'
Bexp>>evaluateBlock: aBlock
|t|
t := nil.
aBlock value
• The method evaluateBlock: defines its own local variable t with the same
name as the one in the block. However, this is not the variable that is
used when the block is evaluated. While executing the method
evaluateBlock: the block is evaluated (Step 2). During the execution of
the expression t traceCr, the non-local variable t is looked up in the home
context of the block, i.e., the method context that created the block, and
not in the context of the currently executed method.
Figure 14.1: Non-local variables are looked up in the method activation
context where the block was created and not where it is evaluated.
Bexp>>setVariableAndDefineBlock2
|t|
t := 42.
self evaluateBlock: [ t := 33. t traceCr ]
Bexp>>setVariableAndDefineBlock3
|t|
t := 42.
self evaluateBlock: [ t traceCr. t := 33. t traceCr ].
self evaluateBlock: [ t traceCr. t := 66. t traceCr ].
self evaluateBlock: [ t traceCr ]
Bexp new setVariableAndDefineBlock3 will print 42, 33, 33, 66 and 66. Here the
two blocks [ t traceCr. t := 33. t traceCr ] and [ t traceCr. t := 66. t traceCr ]
access the same variable t and can modify it. During the first execution of the
method evaluateBlock: its current value 42 is printed, then the value is changed
and printed. A similar situation occurs with the second call. This example
shows that blocks share the location where variables are stored and also that a
block does not copy the value of a captured variable. It just refers to the
location of the variable, and several blocks can refer to the same location.
Here the initial value of the variable t is 42. The block is created and
stored into the instance variable block, but the value of t is changed to 69
before the block is evaluated. It is this last value (69) that is actually
printed, because the variable is looked up at execution time. Executing Bexp new
setVariableAndDefineBlock4 prints 69.
Bexp>>setVariableAndDefineBlock4
|t|
t := 42.
block := [ t traceCr: t ].
t := 69.
self evaluateBlock: block
Bexp>>testArg: arg
block := [arg traceCr].
self evaluateBlockAndIgnoreArgument: 'zork'.
Bexp>>evaluateBlockAndIgnoreArgument: arg
block value.
Now executing Bexp new testArg: 'foo' prints 'foo' even if in the method
evaluateBlockAndIgnoreArgument: the temporary arg is redefined. In fact each
method invocation has its own values for the arguments.
Bexp>>initialize
super initialize.
x := 123.
Bexp2>>initialize
super initialize.
x := 69.
Bexp2>>evaluateBlock: aBlock
aBlock value
Then define the methods that will invoke methods defined in Bexp2.
Bexp>>evaluateBlock: aBlock
Bexp2 new evaluateBlock: aBlock
Bexp>>evaluateBlock
self evaluateBlock: [self crTrace ; traceCr: x]
Conclusion. We have shown that blocks capture variables that are reached from
the context in which the block was defined and not from where they are
executed. Blocks keep references to variable locations that can be shared
between multiple blocks.
Block-local variables
As we saw previously, a block is a lexical closure that is connected to the
place where it is defined. In the following, we will illustrate this connection
by showing that block-local variables are allocated in the execution context
linked to their creation. We will show the difference between a variable that
is local to a block and one that is local to a method (see Figure 14.2).
Let’s walk through the code: we create a loop that stores the current index
(a block argument) in a temporary variable temp created in the loop. We
then store a block that accesses this variable in a collection. After the loop,
we execute each accessing block and return the collection of values. If we
execute this method, we get a collection with 1, 2 and 3. This result shows
that each block in the collection refers to a different temp variable. This is due
to the fact that an execution context is created for each block creation (at each
loop step) and that the block [ temp ] is stored in this context.
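The method discussed here can be reconstructed from this description and from blockOutsideTemp. The following is our sketch of blockLocalTemp, not necessarily the exact original listing:

```smalltalk
Bexp>>blockLocalTemp
	| collection |
	collection := OrderedCollection new.
	#(1 2 3) do: [ :index |
		| temp |    "temp is local to the block: a fresh variable at each loop step"
		temp := index.
		collection add: [ temp ] ].
	^ collection collect: [ :each | each value ]
```

Evaluating Bexp new blockLocalTemp answers a collection containing 1, 2 and 3, as stated above.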
Method allocation. Now let us create a new method that is the same as
blockLocalTemp except that the variable temp is a method variable instead of a
block variable.
Bexp>>blockOutsideTemp
| collection temp |
collection := OrderedCollection new.
#(1 2 3) do: [ :index |
temp := index.
collection add: [ temp ] ].
^ collection collect: [ :each | each value ]
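Here temp is declared once, at the method level, so every stored block refers to the same variable. When the blocks are finally evaluated, they all see the last value assigned in the loop:

```smalltalk
Bexp new blockOutsideTemp.    "an OrderedCollection(3 3 3)"
```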
When we execute Bexp new foo, we get 0 and not nil. What you see here is
that the value is shared between the method body and the block. Inside the
method body we can access the variable whose value was set by the block
evaluation. Both the method and block bodies access the same temporary
variable a.
Let’s make it slightly more complicated. Define the method twoBlockArray
as follows:
Bexp>>twoBlockArray
|a|
a := 0.
^ {[ a := 2] . [a]}
You can also define the code as follows and open a transcript to see the
results.
| res |
res := Bexp new twoBlockArray.
res second value traceCr.
res first value.
res second value traceCr.
Let us step back and look at an important point. In the previous code
snippet when the expressions res second value and res first value are executed,
the method twoBlockArray has already finished its execution - as such it is not
on the execution stack anymore. Still the temporary variable a can be ac-
cessed and set to a new value. This experiment shows that the variables
referred to by a block may live longer than the method which created the
block that refers to them. We say that the variables outlive the execution of
their defining method.
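The same property is what makes the classic counter idiom work. The workspace sketch below uses names of our own (makeCounter, c1, c2): each evaluation of makeCounter creates a fresh count variable, which keeps living after makeCounter has returned because the inner block still refers to it.

```smalltalk
| makeCounter c1 c2 |
makeCounter := [ | count |
	count := 0.
	[ count := count + 1 ] ].
c1 := makeCounter value.
c2 := makeCounter value.
c1 value.    "1"
c1 value.    "2"
c2 value.    "1: c2 has its own count variable"
```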
You can see from this example that while temporary variables are some-
how stored in an activation context, the implementation is a bit more subtle
than that. The block implementation needs to keep referenced variables in a
structure that is not in the execution stack but lives on the heap. The compiler
performs some analysis and when it detects that a variable may outlive its
creation context, it allocates the variables in a structure that is not allocated
on the execution stack.
Basics on return
By default the returned value of a method is the receiver of the message
i.e., self. A return expression (the expression starting with the character ^)
allows one to return a different value than the receiver of the message. In
addition, the execution of a return statement exits the currently executed
method and returns to its caller. This ignores the expressions following the
return statement.
not printed, since the method testExplicitReturn will have returned before.
Bexp>>testExplicitReturn
self traceCr: 'one'.
0 isZero ifTrue: [ self traceCr: 'two'. ^ self].
self traceCr: 'not printed'
Note that the return expression should be the last statement of a block
body.
For example, the following expression Bexp new jumpingOut will return 3
and not 42. ^ 42 will never be reached. The expression [ ^3 ] could be deeply
nested: its execution jumps out of all the levels and returns to the method
caller.
Some old code (predating introduction of exceptions) passes non-local re-
turning blocks around leading to complex flows and difficult to maintain
code. We strongly suggest not using this style because it leads to complex
code and bugs. In subsequent sections we will carefully look at where a
return is actually returning.
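A definition of jumpingOut consistent with this description might look as follows. This is a sketch under our own assumptions about the loop and the traced values:

```smalltalk
Bexp>>jumpingOut
	#(1 2 3 4) do: [ :each |
		self traceCr: each printString.
		"The return below exits jumpingOut itself, not just the do: block."
		each = 3 ifTrue: [ ^ 3 ] ].
	^ 42
```

With this definition, Bexp new jumpingOut traces 1, 2 and 3, and answers 3; the element 4 is never visited.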
Understanding return
Now to see that a return is really escaping the current execution, let us build
a slightly more complex call flow. We define four methods, among which
one (defineBlock) creates an escaping block, one (arg:) passes this block on, and
one (evaluateBlock:) evaluates the block. Note that to stress the escaping
behavior of a return we defined evaluateBlock: so that it endlessly loops after
evaluating its argument.
Bexp>>start
| res |
Bexp>>defineBlock
| res |
self traceCr: 'defineBlock start'.
res := self arg: [ self traceCr: 'block start'.
1 isZero ifFalse: [ ^ 33 ].
self traceCr: 'block end'. ].
self traceCr: 'defineBlock end'.
^ res
Bexp>>arg: aBlock
| res |
self traceCr: 'arg start'.
res := self evaluateBlock: aBlock.
self traceCr: 'arg end'.
^ res
Bexp>>evaluateBlock: aBlock
| res |
self traceCr: 'evaluateBlock start'.
res := self evaluateBlock: aBlock value.
self traceCr: 'evaluateBlock loops so should never print that one'.
^ res
Executing Bexp new start prints the following (indentation added to stress
the calling flow).
start start
   defineBlock start
      arg start
         evaluateBlock start
            block start
start end
What we see is that the calling method start is fully executed. The method
defineBlock is not completely executed. Indeed, its escaping block [^33] is ex-
ecuted two calls away in the method evaluateBlock:. The evaluation of the
block returns to the block home context sender (i.e., the context that invoked
the method creating the block).
When the return statement of the block is executed in the method
evaluateBlock:, the execution discards the pending computation and returns
to the method execution point that created the home context of the block. The
block is defined in the method defineBlock. The home context of the block is
the activation context that represents the execution of the method defineBlock.
Therefore the return expression returns to the start method execution just
after the defineBlock execution. This is why the pending executions of arg: and
evaluateBlock: are discarded and why we see the trace start end.
Figure 14.3: A block with non-local return execution returns to the method
execution that activated the block home context. Frames represent contexts
and dashed frames represent the same block at different execution points.
As shown by Figure 14.3, [^33] will return to the sender of its home context.
The home context of [^33] is the context that represents the execution of the
method defineBlock, therefore it will return its result to the method start.
To verify where the execution will end, you can use the expression
thisContext home sender copy inspect, which returns a method context pointing
to the assignment in the method start.
Then we define a simple assert: method that raises an error if its argument
is false.
Bexp>>assert: aBoolean
aBoolean ifFalse: [Error signal]
Bexp>>returnBlock
^ [ ^ self ]
When we execute returnBlock, the method returns the block to its caller
(here the top level execution). When evaluating the block, because the
method defining it has already terminated and because the block contains
a return expression that should normally return to the sender of the block
home context, an error is signaled.
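The error can be caught like any other. The sketch below assumes that the error signaled in this situation is an instance of BlockCannotReturn, as in the Pharo images we are aware of; the symbol #homeContextIsDead is an arbitrary value of our own.

```smalltalk
| block result |
block := Bexp new returnBlock.    "returnBlock has already terminated here"
result := [ block value ]
	on: BlockCannotReturn           "assumption: the class signaled by the dead-home return"
	do: [ :ex | ex return: #homeContextIsDead ].
result
```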
Conclusion. Blocks with non-local return expressions ([^ ...]) return to the
sender of the block home context (the context representing the method
execution that led to the block creation).
We saw that blocks refer to the home context when looking for variables. So
now we will look at contexts. Contexts represent program execution. The
Pharo execution engine represents its current execution state with the fol-
lowing information:
5. a call stack.
Figure 14.4: A method context where we can access the value of the tempo-
rary variable temp at that given point of execution.
Bexp>>first: arg
| temp |
temp := arg * 2.
thisContext copy inspect.
^ temp
You will get the inspector shown in Figure 14.4. Note that we copy the
current context obtained using thisContext because the Virtual Machine limits
memory consumption by reusing contexts.
MethodContext represents the activation contexts not only of method
executions but also of blocks. Let us have a look at some values of the
current context:
• sender points to the previous context that led to the creation of the cur-
rent one. Here when you executed the expression, a context was cre-
ated and this context is the sender of the current one.
• stackp defines the depth of the stack of variables in the context. In most
cases, its value is the number of stored temporary variables (including
arguments). But in certain cases, for example during a message send,
the depth of the stack is increased: the receiver is pushed, then the
arguments, lastly the message send is executed and the depth of the
stack goes back to its previous value.
The class MethodContext and its superclasses define many methods to get
information about a particular context. For example, you can get the val-
ues of the arguments by sending the arguments message and the value of a
particular temporary variable by sending tempNamed:.
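As a small illustration of these accessors, the hypothetical method Bexp>>firstInfo: below (a variant of first: that is not part of the original example) answers the arguments and the value of temp as seen by its own context.

```smalltalk
Bexp>>firstInfo: arg
    "Hypothetical variant of first: that queries its own context"
    | temp |
    temp := arg * 2.
    ^ { thisContext arguments. thisContext tempNamed: 'temp' }
```

Evaluating Bexp new firstInfo: 21 should answer an array holding the argument array #(21) and the value 42.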
Figure 14.5: The pc variable holds 27 because the last (bytecode) instruction
executed was the message send inspect.
Let’s look at the following example. When you execute it, just press "ok" in
the dialogs that pop up.
| homeContext b1 |
homeContext := thisContext.
b1 := [| b2 |
self assert: thisContext closure == b1.
self assert: b1 outerContext == homeContext.
self assert: b1 home = homeContext.
b2 := [self assert: thisContext closure == b2.
self assert: b2 outerContext closure outerContext == homeContext].
self assert: b2 home = homeContext.
b2 value].
b1 value
Sending a message
To send a message to a receiver, the VM has to:
1. Find the class of the receiver using the receiver object’s header.
(a) check for a primitive associated with the method by reading the
method header;
(b) if there is a primitive, execute it;
(c) if the primitive completes successfully, return the result object to
the message sender;
(d) when there is no primitive or the primitive fails, continue to the
next step.
4. Create a new context. Set up the program counter, stack pointer, home
contexts, then copy the arguments and receiver from the message send-
ing context’s stack to the new stack.
5. Activate that new context and start executing the instructions in the
new method.
The execution state before the message send must be remembered be-
cause the instructions after the message send must be executed when the
message returns. State is saved using contexts. There will be many contexts
in the system at any time. The context that represents the current state of
execution is called the active context.
When a message send happens in the active context, the active context
is suspended and a new context is created and activated. The suspended
context retains the state associated with the original compiled method until
that context becomes active again. A context must remember the context
that it suspended so that the suspended context can be resumed when a
result is returned. The suspended context is called the new context’s sender.
Figure 14.6 represents the relations between compiled methods and context.
The method points to the currently executed method. The program counter
points to the last instruction of the compiled method. Sender points to the
context that was previously active.
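The sender links can be followed from the language side. The following sketch (the variable names are illustrative) counts how many contexts are currently suspended below the active one; the answer depends on where the expression is evaluated, so no particular value is claimed.

```smalltalk
| context depth |
context := thisContext.
depth := 0.
"Follow the sender links until the bottom of the stack is reached"
[ context isNil ] whileFalse: [
    depth := depth + 1.
    context := context sender ].
depth
```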
Sketch of implementation
Temporaries and arguments for blocks are handled the same way as in meth-
ods. Arguments are passed on the stack and temporaries are held in the
corresponding context. Nevertheless, a block can access more variables than
a method: a block can refer to arguments and temporaries from the enclos-
ing method. As we have seen before, blocks can be passed around freely
and activated at any time. In all cases, the block can access and modify the
variables from the method it was defined in.
Let us consider the example shown in Figure 14.7. The temp variable used
in the block of the exampleReadInBlock method is a non-local or remote variable.
temp is initialized and changed in the method body and later on read in the
block. The actual value of the variable is not stored in the block context but
in the defining method context, also known as home context. In a typical
implementation the home context of a block is accessed through its closure.
This approach works well if all objects are first-class objects, including the
method and block context. Blocks can be evaluated outside their home con-
text and still refer to remote variables. Hence all home contexts might outlive
the method activation.
point. Combined with the typical coding practice of using small methods
that call many other objects, Pharo can generate a lot of contexts.
The most efficient way to deal with method contexts is to not create them
at all. At the VM level, this is done by using real stack frames. Method con-
texts can be easily mapped to stack frames: whenever we call a method we
create a new frame, whenever we return from a method we delete the current
frame. In that respect Pharo is not very different from C. This means that
whenever we return from a method, the method context (stack frame) is
immediately removed. Hence no high-level garbage collection is needed.
Nevertheless, using the stack gets much more complicated when we have to support
blocks.
As mentioned before, method contexts that are used as home contexts
might outlive their activation. If method contexts worked as we explained up
to now, we would have to check for home contexts each time a stack frame
is removed. This comes with a big performance penalty. Hence the next step
in using a stack for contexts is to make sure method contexts can be safely
removed when we return from a method.
Figure 14.8 shows how non-local variables are no longer directly
stored in the home context, but in a separate remote array which is heap
allocated.
Figure 14.8: How the VM stores remote variables so that they continue to
live when a method returns.
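The effect is visible from the language side: a variable that is written inside a block keeps living after the activation that declared it has returned. A minimal sketch, in which the names makeCounter, counter and count are illustrative and an outer block plays the role of the defining method:

```smalltalk
| makeCounter counter |
makeCounter := [ | count |
    count := 0.
    [ count := count + 1 ] ].
"The activation that declared count returns here; count survives"
counter := makeCounter value.
counter value.
counter value    "→ 2"
```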
method defining it has returned. A block can access its own variables and
also non-local variables: instance variables, temporaries and arguments of
the defining method. We also saw how blocks can terminate a method and
return a value to the sender. We say that these blocks are non-local
returning blocks and that some care has to be taken to avoid errors: a block
cannot terminate a method that has already returned. Finally, we showed what
contexts are and how they play an important role in block creation and
execution. We showed what the thisContext pseudo-variable is and how to use it
to get information about the executing context and potentially change it.
We thank Eliot Miranda for the clarifications.
Chapter 15
Exploring Little Numbers
We manipulate numbers all the time and in this chapter we propose
a little journey into the way integers are mapped to their binary
representations. We will open the box, take a language implementor
perspective, and explore how small integers are represented.
We will start with some simple reminders of the math that is the
basis of our digital world. Then we will have a look at how small integers
are encoded. This is commonly forgotten over time and our goal is to aid in
refreshing this knowledge.
Figure 15.2: $13 = 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$.
Binary notation
Pharo has a syntax for representing numbers in different bases. We write
2r1101 where 2 indicates the base or radix, here 2, and the rest the number
expressed in this base. Note that we could also write 2r01101 or 2r0001101
since this notation follows the convention that the least significant bit is the
rightmost one.
2r1101
−→ 13
13 printStringBase: 2
−→ '1101'
Integer readFrom: '1101' base: 2
−→ 13
Note that the last two messages printStringBase: and readFrom:base: do not
handle the internal encoding of negative numbers well as we will see later.
-2 printStringBase: 2 returns -10 but this is not the internal number represen-
tation (known as two’s complement). These messages just print/read the
number in a given base.
The radix notation can be used to specify numbers in different bases. Ob-
viously 15 written in decimal base (10r15) returns 15, while 15 in base 16
returns 16 + 5 = 21 as illustrated by the following expressions.
10r15
−→ 15
16r15
−→ 21
The message bitShift: is equivalent to >> and <<, but it uses negative and
positive integers to indicate the shift direction. A positive argument offers
the same behavior as <<, multiplying the receiver by a power of 2. A negative
argument behaves like >>, dividing the receiver by a power of 2.
2r000001000
−→ 8
2r000001000 bitShift: -1
−→ 4
2r000001000 bitShift: 1
−→ 16
2r000001000
−→ 8
2r000001000 >> 2 "we divide by four"
−→ 2
(2r000001000 >> 2) printStringBase: 2
−→ '10'
2r000001000 << 2 "we multiply by four"
−→ 32
The previous examples only show bit shifting numbers with one or two
bits, but there is no constraint at this level. The complete sequence of bits
can be shifted as shown with 2r000001100 below and Figure 15.4.
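A sketch of such shifts on 2r000001100, where both set bits move together (the particular shift amounts are chosen for illustration):

```smalltalk
2r000001100.    "→ 12"
(2r000001100 >> 1) printStringBase: 2.    "→ '110'"
(2r000001100 << 2) printStringBase: 2.    "→ '110000'"
(2r000001100 bitShift: -2) printStringBase: 2.    "→ '11'"
```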
So far, there is nothing really special. Though you should have learned
this in a basic math lecture, it is always good to walk on a hill before
climbing a mountain.
bitAnd: can then be used to select part of a number. For example,
bitAnd: 2r111 selects the three lowest bits.
2r000001101 bitAnd: 2r111
−→ 5
2r000001101 bitAnd: 2r0
−→ 0
2r0001001101 bitAnd: 2r1111
−→ 13 "1101"
2r000001101 bitAnd: 2r111000
−→ 8 "1000"
2r000101101 bitAnd: 2r111000
−→ 40 "101000"
Bit Access. Smalltalk lets you access bit information. The message bitAt: re-
turns the value of the bit at a given position. It follows the Pharo convention
that collection indexes start at one.
2r000001101 bitAt: 1
−→ 1
2r000001101 bitAt: 2
−→ 0
2r000001101 bitAt: 3
−→ 1
2r000001101 bitAt: 4
−→ 1
2r000001101 bitAt: 5
−→ 0
With Pharo you can access the full environment and learn from the sys-
tem itself. Here is the implementation of the method bitAt: on the Integer class.
Integer>>bitAt: anInteger
    "Answer 1 if the bit at position anInteger is set to 1, 0 otherwise.
    self is considered an infinite sequence of bits, so anInteger can be any strictly positive
    integer.
    Bit at position 1 is the least significant bit.
    Negative numbers are in two-complements."
    ^ (self bitShift: 1 - anInteger) bitAnd: 1
We shift the receiver to the right by anInteger minus one positions (hence
the argument 1 - anInteger), and a bitAnd: 1 then tells us whether there is a
one or a zero at that location. For example, 2r000001101 bitAt: 5 shifts the
receiver right by 4 and applies bitAnd: 1 to select that bit (the result is 1
if the bit was set and 0 otherwise, i.e., its value). Doing a bitAnd: 1
amounts to asking whether there is a 1 in the least significant bit.
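The two steps can be replayed by hand on the receiver used above:

```smalltalk
"bitAt: 3 on 2r000001101: shift right by 2, then keep the lowest bit"
2r000001101 bitShift: 1 - 3.    "→ 3, i.e., 2r11"
(2r000001101 bitShift: 1 - 3) bitAnd: 1.    "→ 1"
"bitAt: 2: shift right by 1, then keep the lowest bit"
(2r000001101 bitShift: 1 - 2) bitAnd: 1.    "→ 0"
```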
Again, nothing really special here, but this was to refresh our memories.
Now we will see how numbers are internally encoded in Pharo using 2’s
complement. We will start by understanding the 10’s complement and look
at 2’s complement.
Subtraction at work
The key point of complement techniques is to convert subtractions into ad-
ditions. Let us check that.
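A worked sketch with three decimal digits (the numbers 653 and 342 are illustrative and not taken from the text): the ten's complement of 342 on three digits is 1000 - 342 = 658. Adding it and dropping the carry gives the difference.

```smalltalk
| complement |
complement := 1000 - 342.    "→ 658, ten's complement of 342 on 3 digits"
653 + complement.    "→ 1311"
(653 + complement) \\ 1000.    "→ 311, i.e., 653 - 342 once the carry is dropped"
```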
Before getting into 2’s complement we will look at negative number rep-
resentation.
To get the value out of the bit representation, we simply add:
$-2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 0 \times 2^0$, i.e.,
$-128 + 64 + 32 + 16 + 8 + 4 + 2$, and we get $-2$.
−69 is represented on 8 bit encoding as: 1011 1011. To get the value out
of the bit representation is simple. We add:
$-2^7 + 0 \times 2^6 + 2^5 + 2^4 + 2^3 + 0 \times 2^2 + 2^1 + 2^0$, i.e.,
$-128 + 32 + 16 + 8 + 2 + 1$, and we get $-69$.
Following the same principle, check that the value of -1 is the one de-
scribed in Figure 15.5.
Let us count a bit: on an 8 bit representation we can encode either the
unsigned integers 0 to 255, or the signed integers from -128 to
64 + 32 + 16 + 8 + 4 + 2 + 1 = 127. In fact we can encode from $-2^7$ to
$2^7 - 1$. More generally on N bits we can encode the integer values from
$-2^{N-1}$ to $2^{N-1} - 1$.
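The weighting rule above can be sketched as a block that reads an unsigned bit pattern as a signed value on n bits. The name signedValue is illustrative; this is not a method of the system.

```smalltalk
| signedValue |
"If the highest of the n bits is set, it weighs -2^(n-1) instead of +2^(n-1),
 which amounts to subtracting 2^n from the unsigned value"
signedValue := [ :bits :n |
    (bits bitAt: n) = 1
        ifTrue: [ bits - (1 bitShift: n) ]
        ifFalse: [ bits ] ].
signedValue value: 2r11111110 value: 8.    "→ -2"
signedValue value: 2r10111011 value: 8.    "→ -69"
signedValue value: 2r01111111 value: 8.    "→ 127"
```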
Now we have all the pieces of the puzzle: we know how we can encode
positive and negative numbers, we know how to use the complement to
turn a subtraction into an addition. Let us see how the 2’s complement is
used to negate numbers and perform subtraction.
The 2’s complement is a common method to represent signed integers.
The advantages are that addition and subtraction are implemented without
having to check the sign of the operands and 2’s complement has only one
representation for zero (avoiding negative zero). Adding numbers of differ-
ent sign encoded using 2’s complement does not require any special process-
ing: the sign of the result is determined automatically. The 2’s complement
of a positive number represents the negative form of that number.
-2 bitInvert bitString
−→ '0000000000000000000000000000001'
2 bitString
−→ '0000000000000000000000000000010'
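Negating a number in 2's complement amounts to inverting all its bits and adding one. A short check of this rule (the value 5 is illustrative):

```smalltalk
5 bitInvert.    "→ -6"
5 bitInvert + 1.    "→ -5"
(5 bitInvert + 1) = 5 negated.    "→ true"
-2 bitInvert.    "→ 1"
```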
Subtracting. To subtract a number from another one, we just add the sec-
ond number’s 2’s complement to the first one.
When we want to compute 110110−101, we compute the 2’s complement
of 101 and add it. We add 110110 and 111011, and get 110001. This is correct:
54 − 5 = 49.
110110 - 101
110110
+ 111011
----------
110001
−→ '0000000000000000000000000110001'
(2r110110 bitString)
−→ '0000000000000000000000000110110'
2r101 bitString
−→ '0000000000000000000000000000101'
2r101 negated bitString
−→ '1111111111111111111111111111011'
The case where the result is a negative number is also well handled. For
example, if we want to compute 15 − 35, we should get -20 and this is what
we get. Let us see that: 15 is encoded as 0000 1111 and 35 as 0010 0011. Now
the two’s complement of 35 is 1101 1101.
0011111 (carry)
0000 1111
1101 1101
-----------------
1110 1100
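Both subtractions can be replayed on a fixed number of bits. In this sketch the block name subtract and its temporaries are illustrative; it adds the 2's complement of the second operand and drops the carry with a mask.

```smalltalk
| subtract |
"a - b on n bits: add the 2's complement of b, then drop the carry"
subtract := [ :a :b :n | | mask complement |
    mask := (1 bitShift: n) - 1.
    complement := ((b bitXor: mask) + 1) bitAnd: mask.
    (a + complement) bitAnd: mask ].
(subtract value: 2r110110 value: 2r101 value: 6) printStringBase: 2.    "→ '110001'"
(subtract value: 15 value: 35 value: 8) printStringBase: 2.    "→ '11101100'"
```

Read as a signed 8 bit value, 1110 1100 is -128 + 64 + 32 + 8 + 4 = -20.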
SmallInteger maxVal highBit tells the highest bit which can be used to repre-
sent a positive SmallInteger, and + 1 accounts for the sign bit of the Small-
Integer (0 for positive, 1 for negative).
Let us explore a bit.
2 raisedTo: 29
−→ 536870912
536870912 class
−→ SmallInteger
2 raisedTo: 30
−→ 1073741824
1073741824 class
−→ LargePositiveInteger
(1073741824 - 1) class
−→ SmallInteger
-1073741824 class
−→ SmallInteger
2 class maxVal
returns 1073741823
-1 * (2 raisedTo: (31-1))
−→ -1073741824
(2 raisedTo: 30) - 1
−→ 1073741823
−→ true
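The bounds can also be derived from the number of bits rather than hard-coded. The values shown in this chapter correspond to a 32-bit image, where the sketch below computes 31 bits; on other images the bit count differs, but the relations are expected to hold.

```smalltalk
| bits |
bits := SmallInteger maxVal highBit + 1.
SmallInteger maxVal = ((2 raisedTo: bits - 1) - 1).    "→ true"
SmallInteger minVal = (2 raisedTo: bits - 1) negated.    "→ true"
(SmallInteger maxVal + 1) class.    "→ LargePositiveInteger"
(SmallInteger minVal - 1) class.    "→ LargeNegativeInteger"
```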
15.8 Hexadecimal
We cannot finish this chapter without talking about hexadecimal. Pharo uses
the same syntax for hexadecimal as for binary. 16rF indicates that F is
encoded in base 16.
We can get the hexadecimal equivalent of a number using the message
hex. Using the message printStringHex we get the number printed in hexadeci-
mal without the radix notation.
15 hex
−→ '16rF'
15 printStringHex
−→ 'F'
16rF printIt
−→ 15
The following snippet lists some equivalence between a number and its
hexadecimal representation.
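A few such equivalences can be produced with printStringHex (the sample values are illustrative):

```smalltalk
#(10 15 16 255 256) collect: [ :each | each printStringHex ].
"→ #('A' 'F' '10' 'FF' '100')"
16rFF.    "→ 255"
16r100.    "→ 256"
```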
Note that Pharo supports large numbers whose limit in size is mainly the
memory you have at your disposal.
Chapter 16
Fun with Floats
Floats are inexact by nature and this can confuse programmers. This
chapter introduces this problem and presents some practical solutions to it.
The basic message is that Floats are what they are: inexact but fast numbers.
Note that most of the situations described in this chapter are
consequences of how Floats are structured by the hardware and are not tied to
Pharo. The very same problems occur in other programming languages.
Hey, this is unexpected, you did not learn that in school, did you? This
behavior is surprising indeed, but it’s normal since floats are inexact num-
bers. What is important to understand is that the way floats are printed is
also influencing our understanding. Some approaches print a simpler
representation of reality than others. In early versions of Pharo, 0.1 + 0.2
was printed as 0.3; now it prints 0.30000000000000004. This change was guided
by the idea that it is better not to lie to the user. Showing the inexactness of
a float is better than hiding it because one day or another we can be deeply
bitten by them.
0.3 printString
−→ '0.3'
The method storeString also conveys that we are in the presence of two
different numbers.
(0.1 + 0.2) storeString
−→ '0.30000000000000004'
0.3 storeString
−→ '0.3'
About closeTo:. One way to know if two floats are probably close enough
to look like the same number is to use the message closeTo:
(0.1 + 0.2) closeTo: 0.3
−→ true
The method closeTo: verifies that the two compared numbers differ by less than
0.0001, relative to the larger of their magnitudes. Here is its source code.
closeTo: num
"are these two numbers close?"
num isNumber ifFalse: [^[self = num] ifError: [false]].
self = 0.0 ifTrue: [^num abs < 0.0001].
num = 0 ifTrue: [^self abs < 0.0001].
^self = num asFloat
or: [(self - num) abs / (self abs max: num abs) < 0.0001]
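The last line of this method is a relative comparison. The following sketch isolates it in a block (the name relativeDifference is illustrative) to show why 0.1 + 0.2 and 0.3 pass the test while 1.0 and 1.001 do not.

```smalltalk
| relativeDifference |
relativeDifference := [ :a :b | (a - b) abs / (a abs max: b abs) ].
(relativeDifference value: 0.1 + 0.2 value: 0.3) < 0.0001.    "→ true"
(relativeDifference value: 1.0 value: 1.001) < 0.0001.    "→ false"
```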
Now, if you execute the following line, you will see that the expressions
are not equal.
(0.1 asScaledDecimal: 2) + (0.2 asScaledDecimal: 2) = (0.3 asScaledDecimal: 2)
−→ false
11.125 significand.
−→ 1.390625
11.125 exponent.
−→ 3
'10110010000000000000000000000000000000000000000000000' size
−→ 53
Float precision.
−→ 53
You can also retrieve the exact fraction corresponding to the internal rep-
resentation of the Float:
11.125 asTrueFraction.
−→ (89/8)
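As a quick check, the pieces shown above can be put back together: the significand multiplied by two raised to the exponent gives back the original number, and the exact fraction converts back to the same Float.

```smalltalk
11.125 significand * (2 raisedTo: 11.125 exponent).    "→ 11.125"
11.125 asTrueFraction = (89/8).    "→ true"
(89/8) asFloat = 11.125.    "→ true"
```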
Until now we have retrieved the exact input we injected into the Float.
Are Float operations exact after all? Well, no: we only played with fractions
having a power of 2 as denominator and a few bits in the numerator. If one of
these conditions is not met, we won’t find any exact Float representation of
our numbers. For example, it is not possible to represent 1/5 with a finite
number of binary digits. Consequently, a decimal fraction like 0.1 cannot be
represented exactly with the above representation.
(1/5) asFloat = (1/5).
−→ false
(1/5) = 0.2
−→ false
Let us see in detail how we could get the fractional bits of 1/5 i.e., 2r1/2r101.
For that, we must lay out the division:
1        | 101
10       | 0.00110011
100
1000
-101
  11
  110
 -101
    1
    10
    100
    1000
    -101
      11
      110
     -101
        1
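The same long division can be sketched in a few lines: at each step the remainder is doubled (one more binary digit is brought down) and compared with the divisor 5. The variable names are illustrative.

```smalltalk
| remainder bits |
remainder := 1.
bits := WriteStream on: String new.
"Produce the first 8 binary digits of 1/5 after the point"
8 timesRepeat: [
    remainder := remainder * 2.
    remainder >= 5
        ifTrue: [ bits nextPut: $1. remainder := remainder - 5 ]
        ifFalse: [ bits nextPut: $0 ] ].
bits contents    "→ '00110011'"
```

After every four digits the remainder is 1 again, which is why the pattern 0011 repeats forever.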
That’s the bit pattern we expected, except the last bits 001 have been
rounded up to 010. This is the default rounding mode of Float, round
to nearest even. We now understand why 0.2 is represented inexactly in the
machine. 0.1 has the same mantissa, and its exponent is -4.
0.2 significand
−→ 1.6
0.1 significand
−→ 1.6
0.2 exponent
−→ -3
0.1 exponent
−→ -4
So, when we entered 0.1 + 0.2, we didn’t get exactly (1/10) + (1/5). In-
stead of that we got:
0.1 asTrueFraction + 0.2 asTrueFraction.
−→ (10808639105689191/36028797018963968)
But that’s not the whole story... Let us inspect the bit pattern of the above
fraction, and check the span of this bit pattern, that is the position of the
highest bit set to 1 (leftmost) and the position of the lowest bit set to 1
(rightmost):
10808639105689191 printStringBase: 2.
−→ '100110011001100110011001100110011001100110011001100111'
10808639105689191 highBit.
−→ 54
10808639105689191 lowBit.
−→ 1
36028797018963968 printStringBase: 2.
−→ '10000000000000000000000000000000000000000000000000000000'
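The consequence of these figures can be checked directly: the exact sum spans 54 bits, one more than a Float can hold, so the result has to be rounded. The variable names in this sketch are illustrative.

```smalltalk
| sum span |
sum := 0.1 asTrueFraction + 0.2 asTrueFraction.
span := sum numerator highBit - sum numerator lowBit + 1.    "→ 54"
span > Float precision.    "→ true"
sum denominator = (2 raisedTo: 55).    "→ true"
```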
This means that the fraction denominator is $2^{55}$ and that you need 55
decimal digits after the decimal point to really print the internal
representation of 0.1 exactly.
It is surprising but not false that 2.8 truncateTo: 0.01 does not return 2.8 but
2.8000000000000003. This is because truncateTo: and roundTo: perform several
operations on floats: inexact operations on inexact numbers can lead to cu-
mulative rounding errors as you saw above, and that’s just what happens
again.
Even if you perform the operations exactly and then round to nearest
Float, the result is inexact because of the initial inexact representation of 2.8
and 0.01.
(2.8 asTrueFraction roundTo: 0.01 asTrueFraction) asFloat
−→ 2.8000000000000003
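The inexactness of the inputs can be made visible by comparing each Float with the fraction it is meant to denote. The last expression is a sketch of what happens when exact fractions are used from the start.

```smalltalk
2.8 asTrueFraction = (28/10).    "→ false"
0.01 asTrueFraction = (1/100).    "→ false"
"With exact fractions as inputs, the rounding itself is exact"
(28/10) roundTo: (1/100).    "→ (14/5)"
```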
Using 0.01s2 rather than 0.01 lets this example appear to work:
2.80 truncateTo: 0.01s2
−→ 2.80s2
But it’s just a case of luck, the fact that 2.8 is inexact is enough to cause
other surprises as illustrated below:
To add a nail to the coffin, let’s play a bit more with inexact representations.
Let us try to see the difference between different numbers:
{
((2.8 asTrueFraction roundTo: 0.01 asTrueFraction) - (2.8 predecessor)) abs -> -1.
((2.8 asTrueFraction roundTo: 0.01 asTrueFraction) - (2.8)) abs -> 0.
((2.8 asTrueFraction roundTo: 0.01 asTrueFraction) - (2.8 successor)) abs -> 1.
} detectMin: [:e | e key ]
−→ 0.0->1
• Never use = to compare floats (e.g., (0.1 + 0.2) = 0.3 returns false)
• Use closeTo: instead (e.g., (0.1 + 0.2) closeTo: 0.3 returns true)
• A float number is represented in base 2 as
$\text{sign} \times \text{mantissa} \times 2^{\text{exponent}}$ (in the same
way that, in base 10, $1.2345 = 12345 \times 10^{-4}$)
There is much more to know about floats, and if you are advanced enough,
it would be a good idea to read the paper linked from the Wikipedia
page, "What Every Computer Scientist Should Know About Floating-Point
Arithmetic" (http://www.validlab.com/goldberg/paper.pdf).
Part V
Tools
Chapter 17
Profiling Applications
universal 80-20 rule: only a small fraction of the methods (let’s say 20%)
consumes the largest part of the available resources (80% of memory
and CPU consumption). Optimizing an application is therefore essentially a
matter of tradeoffs. In this chapter we will see how to use the available tools
to quickly identify these 20% of methods and how to measure the progress
brought by the enhancements we make to the program.
Experience shows that having unit tests is essential to ensure that we do
not break the program semantics when optimizing it. When replacing an
algorithm by another, we ought to make sure that the program still does what
it is supposed to do.
| coll |
coll := #(1 2 3 4 5 6 7 8) asOrderedCollection.
[ 100000 timesRepeat: [ (coll select: [:each | each > 5]) collect: [:i |i * i]]] timeToRun
"Calling select:, then collect: - −→ ∼ 570 - 600 ms"
| coll |
coll := #(1 2 3 4 5 6 7 8) asOrderedCollection.
[ 100000 timesRepeat: [ coll select: [:each | each > 5] thenCollect:[:i |i * i]]] timeToRun
"Calling select:thenCollect: - −→ ∼ 397 - 415 ms"
Although the difference between these two executions is only a couple of
hundred milliseconds, opting for one method instead of the other could
significantly slow your application!
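Before swapping one form for the other, it is worth checking that both answer the same result, in the spirit of the unit tests mentioned above. A minimal sketch (the variable names slow and fast are illustrative):

```smalltalk
| coll slow fast |
coll := #(1 2 3 4 5 6 7 8) asOrderedCollection.
slow := (coll select: [ :each | each > 5 ]) collect: [ :i | i * i ].
fast := coll select: [ :each | each > 5 ] thenCollect: [ :i | i * i ].
slow = fast.    "→ true"
fast asArray    "→ #(36 49 64)"
```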
| newCollection |
newCollection := self copyEmpty.
firstIndex to: lastIndex do: [:index |
| element |
element := array at: index.
(selectBlock value: element)
ifTrue: [ newCollection addLast: (collectBlock value: element) ]].
^ newCollection
As you have probably guessed already, other collections such as Set and
Dictionary do not benefit from an optimized version. We leave as an exer-
cise an efficient implementation for other abstract data types. As part of the
community effort, do not forget to submit your contribution to Pharo if you
come up with an optimized and better version of select:thenCollect: or other
methods. The Pharo team really values such effort.
The method bench. When sent to a block, the bench message estimates how
many times this block is evaluated per second. For example, the expression [
1000 factorial ] bench says that 1000 factorial may be executed approximately 350
times per second.
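Both measurement styles can be tried side by side. The figures depend on the machine and on the image, so none are given here; the repetition count 100 is an arbitrary choice.

```smalltalk
"Estimates how many evaluations per second are possible"
[ 1000 factorial ] bench.
"For comparison, the time taken by a fixed number of evaluations"
[ 100 timesRepeat: [ 1000 factorial ] ] timeToRun
```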
MessageTally
MessageTally is implemented as a unique class having the same name.
Using it is quite simple. A message spyOn: needs to be sent to MessageTally
with a block expression as argument to obtain a detailed execution
analysis. Evaluating MessageTally spyOn: ["your expression here"] opens a window that
contains the following information:
2. leaf methods of the execution. A leaf method is a method that does not
invoke other methods (e.g., primitive, accessors).
Figure 17.1 shows the result of the expression MessageTally spyOn: [20
timesRepeat: [Transcript show: 1000 factorial printString]]. The message spyOn:
executes the provided block in a new process. The analysis focuses on only
one process: the one that executes the block to profile. The message spyAllOn:
profiles all the processes that are active during the execution. This is useful
to analyze the distribution of the computation over several processes.
A tool a bit less crude than MessageTally is TimeProfileBrowser. It
additionally shows the implementation of the executed method (Figure 17.2).
TimeProfileBrowser understands the message spyOn:. This means that in the
source code below, MessageTally can be replaced with TimeProfileBrowser to
obtain a better user interface.
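For instance, the expression profiled in Figure 17.1 becomes the following. This sketch assumes an image in which the class TimeProfileBrowser is present; it may be absent from more recent images.

```smalltalk
TimeProfileBrowser spyOn: [
    20 timesRepeat: [ Transcript show: 1000 factorial printString ] ]
```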
Via the World menu. The World menu (obtained by clicking outside any
Pharo window) offers some profiling facilities under the System submenu
(Figure 17.3). Start profiling all Processes creates a block from a text selection
and invokes spyAllOn:. The entry Start profiling UI profiles the user interface
process. This is quite handy when debugging a user interface!
Via the Test Runner. As the size of an application grows, unit tests
usually become good candidates for code profiling. Running tests often
is rather tedious when the time to run them is getting too long. The Test
Runner in Pharo offers a button Run Profiled (Figure 17.4).
Pressing this button runs the selected unit tests and generates a message
tally report.
For illustration purposes, let us consider the following scenario: the string
'A' is repeatedly appended 9 000 times to an initially empty string.
MessageTally spyOn:
[ 500 timesRepeat: [
| str |
str := ''.
9000 timesRepeat: [ str := str, 'A' ]]].
**Tree**
--------------------------------
Process: (40s) 535298048: nil
--------------------------------
29.7% {7152ms} primitives
11.5% {2759ms} ByteString(SequenceableCollection)>>copyReplaceFrom:to:with:
5.9% {1410ms} primitives
5.6% {1349ms} ByteString class(String class)>>new:
**Leaves**
29.7% {7152ms} ByteString(SequenceableCollection)>>,
9.2% {2226ms} SmallInteger(Integer)>>timesRepeat:
5.9% {1410ms} ByteString(SequenceableCollection)>>copyReplaceFrom:to:with:
5.6% {1349ms} ByteString class(String class)>>new:
4.4% {1052ms} UndefinedObject>>DoIt
**Memory**
old +0 bytes
young +9,536 bytes
used +9,536 bytes
free -9,536 bytes
**GCs**
full 0 totalling 0ms (0.0% uptime)
incr 9707 totalling 7,985ms (16.0% uptime), avg 1.0ms
tenures 0
root table 0 overflows
The first line gives the overall execution time and the number of
samplings (also called tallies; we will come back to sampling at the end of the
chapter).
**Tree**
--------------------------------
Process: (40s) 535298048: nil
--------------------------------
29.7% {7152ms} primitives
11.5% {2759ms} ByteString(SequenceableCollection)>>copyReplaceFrom:to:with:
5.9% {1410ms} primitives
5.6% {1349ms} ByteString class(String class)>>new:
This tree shows that the interpreter spent 29.7% of its time executing
primitives. 11.5% of the total execution time is spent in the method
SequenceableCollection>>copyReplaceFrom:to:with:. This method is called when
concatenating character strings using the message comma (,), itself indirectly
invoking new: and some virtual machine primitives.
The execution takes 11.5% of the execution time; this means that the
interpreter effort is shared with other processes. The invocation chain from the
code to the primitives is relatively short. Reaching hundreds of nested calls
is no exception for most applications. We will optimize this example later
on.
**Memory**
The statistical part on memory consumption tells the observed changes on
the quantity of memory allocated and the garbage collector usage. To fully
understand this information, one needs to keep in mind that Pharo’s garbage
collector (GC) is a scavenging GC, relying on the principle that an old object
has a greater chance to live even longer. It is designed following the fact that
an old object will probably be kept referenced in the future. On the contrary,
a young object has a greater chance to be quickly dereferenced.
Several memory zones are considered and the migration of a young ob-
ject to the space dedicated for old object is qualified as tenured. (Following
the metaphor of American academic scientists, when a permanent position
is obtained.)
An example of the memory analysis produced by MessageTally:
**Memory**
old +0 bytes
young +9,536 bytes
used +9,536 bytes
free -9,536 bytes
1. the old value is about the growth of the memory space dedicated to old
objects. An object is qualified as “old” when its physical memory
location is in the “old memory space”. This happens when a full garbage
collection is triggered, or when there are too many object survivors
(according to some threshold specified in the virtual machine). This
memory space is cleaned by a full garbage collection only. (An incremental
GC therefore does not reduce its size.)
An increase of the old memory space is likely to be due to a memory leak:
the virtual machine is unable to release memory, promoting young
objects as old.
2. the young value tells about the increase of the memory space dedicated
to young objects. When an object is created, it is physically located in
this memory space. The size of this memory space changes frequently.
In our example, none of the objects created during the execution have
been promoted as old. 9 536 bytes are used by the current process, located
in the young memory space. The amount of available memory has been
reduced accordingly.
**GCs**
The **GCs** provides statistics about the garbage collector. An example of a
garbage collector report is:
**GCs**
full 0 totalling 0ms (0.0% uptime)
incr 9707 totalling 7,985ms (16.0% uptime), avg 1.0ms
tenures 1 (avg 9707 GCs/tenure)
root table 0 overflows
1. The full value totals the number of full garbage collections and the
amount of time they took. Full garbage collections are not that frequent.
They result from often allocating large memory chunks.
3. The number of tenures tells the amount of objects that migrated to the
old memory space. This migration happens when the size of the young
memory space is above a given threshold. This typically happens
4. The root table overflows is the amount of root objects used by the garbage
collector to navigate the image. This navigation happens when the
system is running short on memory and needs to collect all the objects
relevant for the future program execution. The overflow value identifies
the rare situation when the number of roots used by the incremental
GC is greater than the size of its internal table. This situation forces the
GC to promote some objects as tenured.
Using a Stream for string concatenation. At first glance, one could
think that creating a stream is costly since it is frequently used with relatively
slow inputs and outputs (e.g., network socket, disk accesses, Transcript). But
replacing the string concatenation employed in the previous example by a
stream operation is almost 10 times faster! This is easily understandable
since concatenating a character string 9000 times creates 8999 intermediate
objects, each being filled with the content of another. Using a stream,
we simply have to append a character at each iteration.
MessageTally spyOn:
[ 500 timesRepeat: [
| str |
str := WriteStream on: (String new).
9000 timesRepeat: [ str nextPut: $A ]]].
**Tree**
--------------------------------
Process: (40s) 535298048: nil
--------------------------------
**Leaves**
33.0% {266ms} SmallInteger(Integer)>>timesRepeat:
21.2% {171ms} UndefinedObject>>DoIt
**Memory**
old +0 bytes
young -18,272 bytes
used -18,272 bytes
free +18,272 bytes
**GCs**
full 0 totalling 0ms (0.0% uptime)
incr 5 totalling 7ms (3.0% uptime), avg 1.0ms
tenures 0
root table 0 overflows
MessageTally spyOn:
[ 500 timesRepeat: [
| str |
str := WriteStream on: (String new: 9000).
9000 timesRepeat: [ str nextPut: $A ]]].
For this example, it is possible to improve the script by using the method
atAllPut:. The script below takes only a couple of milliseconds.
MessageTally spyOn:
[ 500 timesRepeat: [
| str |
str := String new: 9000.
str atAllPut: $A ]].
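The three approaches discussed here (concatenation with the , method, a
WriteStream, and atAllPut:) can be compared side by side in a workspace.
The following is only a minimal sketch: the variable names and the use of
Time millisecondsToRun: for rough timing are illustrative choices rather
than part of the original scripts, and the measured times depend on the
machine and the image.

```smalltalk
| concatenated streamed filled stream concatTime streamTime |
"Concatenation: every , creates a new intermediate string."
concatTime := Time millisecondsToRun: [
	concatenated := ''.
	9000 timesRepeat: [ concatenated := concatenated , 'A' ] ].

"Stream: one character is appended at each iteration."
streamTime := Time millisecondsToRun: [
	stream := WriteStream on: (String new: 9000).
	9000 timesRepeat: [ stream nextPut: $A ].
	streamed := stream contents ].

"atAllPut: fills a preallocated string with a single message send."
filled := String new: 9000.
filled atAllPut: $A.

"All three approaches build the same 9000-character string."
{ concatenated = streamed. streamed = filled. concatenated size }
```

Inspecting concatTime and streamTime afterwards gives a first impression
of the gap before turning to MessageTally for a detailed report.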
valuable. The execution with 9000 iterations is 2.7 times slower than with
500. Using string concatenation (i.e., using the , method) instead of a
stream widens the gap by a factor of 10. This experiment clearly illustrates
the importance of using appropriate tools to concatenate strings.
The time of the profiled execution is also an important quality factor for
the result. MessageTally employs a sampling technique to profile code. By
default, MessageTally samples the currently executing thread every
millisecond. It is therefore necessary that all the methods involved in the
computation are executed for a “fair” amount of time to appear in the result
report. If the application to profile is very short (a few milliseconds
only), then executing it a number of times helps improve the accuracy of the
report.
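As a small illustration of this advice, a computation that is far too
short to be sampled on its own can be wrapped in a loop. The choice of
100 factorial and of 10000 repetitions is arbitrary and only serves the
example.

```smalltalk
"A single 100 factorial finishes well within one sampling period,
so it is repeated to give the sampler enough ticks to observe."
MessageTally spyOn: [ 10000 timesRepeat: [ 100 factorial ] ]
```

The percentages in the resulting report then describe the repeated
computation as a whole, which is usually what matters when comparing
alternatives.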
The downside of tallySends: is the time taken to execute the provided block.
The block to profile is executed by an interpreter written in Pharo, which
is slower than the one of the virtual machine. A piece of code profiled by
tallySends: is about 200 times slower. The interpreter is available from the
method ContextPart»runSimulated: aBlock contextAtEachStep: block2.
Integer»fibSlow
self assert: self >= 0.
(self <= 1) ifTrue: [ ^ self].
^ (self - 1) fibSlow + (self - 2) fibSlow
Integer»fibLookup: cache
| res |
res := cache at: (self + 1).
^ res ifNil: [ cache at: (self + 1) put: (self fibWithCache: cache ) ]
Integer»fibWithCache: cache
(self <= 1) ifTrue: [ ^ self].
^ ((self - 1) fibLookup: cache) + ((self - 2) fibLookup: cache)
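To try both versions, the cache has to be created by the caller. The
sketch below assumes that the three methods above have been compiled in
class Integer. The cache is an Array of n slots, because fibLookup: stores
the value for k at index k + 1 and is only asked for values from 0 to
n - 1. The variable names and the timing with Time millisecondsToRun: are
illustrative.

```smalltalk
| n slowTime fastTime |
n := 30.

"Exponential number of sends: every value is recomputed many times."
slowTime := Time millisecondsToRun: [ n fibSlow ].

"Linear number of sends: each value is computed once, then looked up."
fastTime := Time millisecondsToRun: [ n fibWithCache: (Array new: n) ].

"Both versions answer the same number, 832040 for n = 30."
{ n fibSlow. n fibWithCache: (Array new: n). slowTime. fastTime }
```

A fresh cache is passed on each call so that the second measurement does
not benefit from values stored by an earlier run.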
Each line represents the memory analysis of a Pharo class. Classes are
ordered by the space they occupy. The class ByteString describes strings.
Strings frequently consume one third of the memory. Code space gives the
number of bytes used by the class and its metaclass. It does not
include the space used by class variables. The value is given by the method
Behavior>>spaceUsed.
The essence of the profiling activity is given by the following code
excerpt:
observedProcess := Processor activeProcess.
Timer := [
[ true ] whileTrue: [
| startTime |
startTime := Time millisecondClockValue.
myDelay wait.
self
tally: Processor preemptedProcess suspendedContext
in: (observedProcess == Processor preemptedProcess
ifTrue: [ observedProcess ] ifFalse: [ nil ])
by: (Time millisecondClockValue - startTime) // millisecs ].
nil] newProcess.
Timer priority: Processor timingPriority-1.
PetitParser: Building
Modular Parsers
Packrat Parsers give linear parse-time guarantees and avoid common
problems with left-recursion in PEGs.
Loading PetitParser
Enough talking, let’s get started. PetitParser is developed in Pharo, and there
are also versions for Java and Dart available. A ready-made image can be
downloaded. To load PetitParser into an existing image evaluate the
following Gofer expression:
Figure 18.1: Syntax diagram representation for the identifier parser defined
in script 18.2
A graphical notation
Figure 18.1 presents a syntax diagram of the identifier parser. Each box rep-
resents a parser. The arrows between the boxes represent the flow in which
input is consumed. The rounded boxes are elementary parsers (terminals).
The squared boxes (not shown on this figure) are parsers composed of other
parsers (non terminals).
If you inspect the object identifier of the previous script, you’ll notice that
it is an instance of a PPSequenceParser. If you dive further into the object you
will notice the following tree of different parser objects:
The root parser is a sequence parser because the , (comma) operator
creates a sequence of (1) a letter parser and (2) zero or more word character
parsers. The root parser’s first child is a predicate object parser created
by the #letter asParser expression. This parser is capable of parsing a
single letter as defined by the Character»isLetter method. The second child
is a repeating parser created by the star call. This parser uses its child
parser (another predicate object parser) as much as possible on the input
(i.e., it is a greedy parser). Its child parser is a predicate object parser
created by the #word asParser expression. This parser is capable of parsing
a single digit or letter as defined
by the Character»isDigit and Character»isLetter methods.
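The description above can be turned back into code. The expression below
is a reconstruction from the prose (a letter parser, the comma operator,
and a starred word parser), not a copy of script 18.2, so treat it as a
sketch.

```smalltalk
| identifier |
"A letter followed by zero or more word characters (letters or digits)."
identifier := #letter asParser , #word asParser star.

identifier class.            "PPSequenceParser, as noted in the text"
identifier parse: 'yeah'.    "#($y #($e $a $h))"
```

The nested array mirrors the parser tree: the first element comes from
the letter parser, the second from the repeating parser.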
Script 18.4: Parsing some input strings with the identifier parser
identifier parse: 'yeah'. −→ #($y #($e $a $h))
identifier parse: 'f123'. −→ #($f #($1 $2 $3))
While it seems odd to get these nested arrays with characters as a return
value, this is the default decomposition of the input into a parse tree. We’ll
see in a while how that can be customized.
If we try to parse something invalid we get an instance of PPFailure as an
answer:
This parsing results in a failure because the first character (1) is not a
letter. Instances of PPFailure are the only objects in the system that answer
with true when you send the message #isPetitFailure. Alternatively you can
also use PPParser»parse:onError: to throw an exception in case of an error:
identifier
parse: '123'
onError: [ :msg :pos | self error: msg ].
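When an exception is not wanted, the result can be tested with
isPetitFailure instead. A minimal sketch, assuming the identifier parser
is defined as a letter followed by zero or more word characters; the
fallback string is a made-up placeholder.

```smalltalk
| identifier result |
identifier := #letter asParser , #word asParser star.

result := identifier parse: '123'.
result isPetitFailure.                        "true: 1 is not a letter"
(identifier parse: 'yeah') isPetitFailure.    "false: an ordinary parse result"

"Branch on the outcome without raising an exception."
result isPetitFailure
	ifTrue: [ 'no identifier found' ]
	ifFalse: [ result ]
```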
If you are only interested in whether a given string (or stream) matches,
you can use the following constructs:
Script 18.7: Ensuring that the whole input is matched using PPParser»end
identifier end matches: 'foo()'. −→ false
The PPParser»end message creates a new parser that matches the end of
input. To be able to compose parsers easily, it is important that parsers do not
match the end of input by default. Because of this, you might be interested
in finding all the places that a parser can match using the messages
PPParser»matchesSkipIn: and PPParser»matchesIn:.
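The difference between these messages is easiest to see on a small input.
The sketch below again assumes an identifier parser made of a letter
followed by zero or more word characters; the sample input is arbitrary.

```smalltalk
| identifier |
identifier := #letter asParser , #word asParser star.

identifier matches: 'foo()'.        "true: matching the prefix foo is enough"
identifier end matches: 'foo()'.    "false: the input is not fully consumed"

"Collect every non-overlapping identifier found in a longer input.
flatten turns each nested-array result into a plain string."
identifier flatten matchesSkipIn: 'foo 123 bar12'.
	"an OrderedCollection('foo' 'bar12')"
```

matchesIn: works the same way but also reports overlapping matches, so it
would additionally find the identifiers starting at the second and third
letters of each word.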
Figure 18.2: Syntax diagram representation for the identifier2 parser defined
in script 18.10
Parser action
To define an action or transformation on a parser we can use one of the
messages PPParser»==>, PPParser»flatten, PPParser»token and PPParser»trim
listed in Table 18.3.
Figure 18.3: Syntax diagram representation for the number parser defined in
script 18.14
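A few of these actions can be tried directly in a workspace. The sketch
below assumes the identifier parser is a letter followed by zero or more
word characters, and it defines number in the same way as
ExpressionGrammar>>number later in this chapter. The sample inputs are
arbitrary.

```smalltalk
| identifier number |
identifier := #letter asParser , #word asParser star.

identifier flatten parse: 'yeah'.             "'yeah' instead of nested arrays"
identifier flatten trim parse: '  yeah  '.    "'yeah': surrounding whitespace is skipped"

"==> applies a block to the result of the parser it decorates."
(identifier flatten ==> [ :str | str asUppercase ]) parse: 'yeah'.    "'YEAH'"

"Digits are flattened into a string, trimmed, then converted to a number."
number := #digit asParser plus flatten trim ==> [ :str | str asNumber ].
number parse: ' 123 '.                         "123"
```

The order matters: flatten is applied before ==> so that the block
receives a string rather than an array of characters.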
Table 18.3 shows the basic elements to build parsers. There are a
few more well-documented and tested factory methods in the operators
protocols of PPParser. If you want to know more about these factory methods,
browse these protocols. An interesting one is separatedBy: which answers a
new parser that parses the input one or more times, with separations
specified by another parser.
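As an illustration of separatedBy:, the following sketch parses a
comma-separated list of integers. The names integer and list are made up
for this example.

```smalltalk
| integer list |
integer := #digit asParser plus flatten trim ==> [ :str | str asNumber ].

"One or more integers separated by commas. The separators are part of the result."
list := integer separatedBy: $, asParser trim.
list parse: '1, 2, 3'.     "#(1 $, 2 $, 3)"

"Keep only the elements at odd positions to drop the separators."
(list ==> [ :nodes | (1 to: nodes size by: 2) collect: [ :i | nodes at: i ] ])
	parse: '1, 2, 3'.      "#(1 2 3)"
```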
above, the next step is to define the productions for addition and
multiplication in order of precedence. Note that we instantiate the
productions as PPDelegateParser upfront, because they recursively refer to
each other. The method #setParser: then resolves this recursion. The
following script defines three parsers for the addition, multiplication and
parenthesis (see Figure 18.4 for the related syntax diagram):
term setParser: (prod , $+ asParser trim , term ==> [ :nodes | nodes first + nodes last ])
/ prod.
prod setParser: (prim , $* asParser trim , prod ==> [ :nodes | nodes first * nodes last ])
/ prim.
prim setParser: ($( asParser trim , term , $) asParser trim ==> [ :nodes | nodes second ])
/ number.
The term parser is defined as being either (1) a prod followed by ‘+’,
followed by another term or (2) a prod. In case (1), an action block asks the
parser to compute the arithmetic addition of the value of the first node (a
prod) and the last node (a term). The prod parser is similar to the term
parser. The prim parser is interesting in that it accepts left and right
parentheses before and after a term and has an action block that simply
ignores them.
To understand the precedence of productions, see Figure 18.5. The root
of the tree in this figure (term) is the production that is tried first. A
term is either a sum (+) or a prod. The term production comes first because
+ has the lowest priority in mathematics.
To make sure that our parser consumes all input we wrap it with the end
parser into the start production:
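Putting the pieces of this section together gives a complete workspace
script. This is a sketch assembled from the definitions above. It assumes
that number is defined as in ExpressionGrammar>>number later in this
chapter and that the productions are created with PPDelegateParser new.

```smalltalk
| number term prod prim start |
number := #digit asParser plus flatten trim ==> [ :str | str asNumber ].

"Placeholders first, because the productions refer to each other."
term := PPDelegateParser new.
prod := PPDelegateParser new.
prim := PPDelegateParser new.

term setParser: (prod , $+ asParser trim , term ==> [ :nodes | nodes first + nodes last ])
	/ prod.
prod setParser: (prim , $* asParser trim , prod ==> [ :nodes | nodes first * nodes last ])
	/ prim.
prim setParser: ($( asParser trim , term , $) asParser trim ==> [ :nodes | nodes second ])
	/ number.

"Require that the whole input is consumed."
start := term end.

start parse: '1 + 2 * 3'.      "7"
start parse: '(1 + 2) * 3'.    "9"
```

The two results differ only because of the parentheses, which shows that
multiplication binds more tightly than addition in this grammar.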
Figure 18.4: Syntax diagram representation for the term, prod, and prim
parsers defined in script 18.15
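[Figure 18.5: tree of productions with term at the root, + and prod below
it, ∗ and prim below prod, and parens and number below prim]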
gle script makes it unnecessarily hard to reuse specific parts of that
grammar. Luckily there is PPCompositeParser to the rescue.
Again we start with the grammar for an integer number. Define the
method number as follows:
Script 18.20: Defining more expression grammar parsers, this time with no
associated action
ExpressionGrammar>>term
^ add / prod
ExpressionGrammar>>add
^ prod , $+ asParser trim , term
ExpressionGrammar>>prod
^ mul / prim
ExpressionGrammar>>mul
^ prim , $* asParser trim , prod
ExpressionGrammar>>prim
^ parens / number
ExpressionGrammar>>parens
^ $( asParser trim , term , $) asParser trim
Script 18.21: Defining the starting point of our expression grammar parser
ExpressionGrammar>>start
^ term end
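The class definition and the number method are not reproduced in this
section. The sketch below fills them in under two assumptions: the class
declares one instance variable per production, and number is the same
parser as ExpressionGrammar>>number shown later in this chapter. With
these in place the grammar can be tried in a workspace.

```smalltalk
"Assumed class definition: one instance variable per production."
PPCompositeParser subclass: #ExpressionGrammar
	instanceVariableNames: 'term add prod mul prim parens number'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'PetitTutorial'

ExpressionGrammar>>number
	^ #digit asParser plus flatten trim ==> [ :str | str asNumber ]

"Workspace: add and mul have no actions, so the result is a nested parse tree."
ExpressionGrammar new parse: '1 + 2 * 3'.    "#(1 $+ #(2 $* 3))"
```

The nesting of the result already reflects operator precedence, even
though nothing is evaluated yet.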
Script 18.23: Reusing the number parser from the ExpressionGrammar grammar
PPCompositeParser subclass: #MyNewGrammar
instanceVariableNames: 'number'
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
MyNewGrammar class>>dependencies
"Answer a collection of PPCompositeParser classes that this parser directly
depends on."
^ {ExpressionGrammar}
MyNewGrammar>>number
"Answer the same parser as ExpressionGrammar>>number."
^ (self dependencyAt: ExpressionGrammar) number
Defining an evaluator
Now that we have defined a grammar we can reuse this definition to
implement an evaluator. To do this we create a subclass of ExpressionGrammar
called ExpressionEvaluator.
Script 18.24: Separating the grammar from the evaluator by creating a subclass
ExpressionGrammar subclass: #ExpressionEvaluator
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
We then redefine the implementation of add, mul and parens with our
evaluation semantics. This is accomplished by calling the super
implementation and adapting the returned parser as shown in the following
methods.
Script 18.25: Refining the definition of some parsers to evaluate arithmetic
expressions
ExpressionEvaluator>>add
^ super add ==> [ :nodes | nodes first + nodes last ]
ExpressionEvaluator>>mul
^ super mul ==> [ :nodes | nodes first * nodes last ]
ExpressionEvaluator>>parens
^ super parens ==> [ :nodes | nodes second ]
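The evaluator can then be tried in a workspace. This sketch assumes that
the inherited number production converts its digits with asNumber, as in
ExpressionGrammar>>number shown later in this chapter.

```smalltalk
ExpressionEvaluator new parse: '1 + 2 * 3'.      "7"
ExpressionEvaluator new parse: '(1 + 2) * 3'.    "9"
```

Only add, mul and parens were overridden. The choice productions term,
prod and prim are inherited unchanged and simply pass on the value of
whichever alternative succeeded.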
Defining a Pretty-Printer
We can reuse the grammar for example to define a simple pretty printer. This
is as easy as subclassing ExpressionGrammar again!
Script 18.27: Separating the grammar from the pretty printer by creating a subclass
ExpressionGrammar subclass: #ExpressionPrinter
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
ExpressionPrinter>>add
^ super add ==> [:nodes | nodes first , ' + ' , nodes third]
ExpressionPrinter>>mul
^ super mul ==> [:nodes | nodes first , ' * ' , nodes third]
ExpressionPrinter>>number
^ super number ==> [:num | num printString]
ExpressionPrinter>>parens
^ super parens ==> [:node | '(' , node second , ')']
This pretty printer can be tried out as shown by the following
expressions.
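A possible way to exercise the printer, with deliberately irregular
spacing in the input. The sample strings are illustrative, and the sketch
assumes the inherited number production answers a number.

```smalltalk
ExpressionPrinter new parse: '1+2 *3'.        "'1 + 2 * 3'"
ExpressionPrinter new parse: '(1+ 2 )* 3'.    "'(1 + 2) * 3'"
```

The whitespace of the input is discarded by the trim parsers of the
grammar, and the action blocks reinsert exactly one space around each
operator.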
expression
group: [ :g |
g left: $* asParser token trim do: [ :a :op :b | a * b ].
g left: $/ asParser token trim do: [ :a :op :b | a / b ] ];
group: [ :g |
g left: $+ asParser token trim do: [ :a :op :b | a + b ].
g left: $- asParser token trim do: [ :a :op :b | a - b ] ].
Script 18.30: Now our parser is also able to manage subtraction and division
expression parse: '1-2/3'. −→ (1/3)
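For reference, a self-contained version of this expression parser could
look as follows. The definitions of parens and number and the call to
term: are assumptions based on the parsers used earlier in the chapter.
Only the two group: blocks come from the script above.

```smalltalk
| expression parens number |
expression := PPExpressionParser new.
parens := $( asParser token trim , expression , $) asParser token trim
	==> [ :nodes | nodes second ].
number := #digit asParser plus flatten trim ==> [ :str | str asNumber ].

"Operands are either parenthesized expressions or numbers."
expression term: parens / number.

"Groups are listed from highest to lowest priority."
expression
	group: [ :g |
		g left: $* asParser token trim do: [ :a :op :b | a * b ].
		g left: $/ asParser token trim do: [ :a :op :b | a / b ] ];
	group: [ :g |
		g left: $+ asParser token trim do: [ :a :op :b | a + b ].
		g left: $- asParser token trim do: [ :a :op :b | a - b ] ].

expression parse: '1-2/3'.          "(1/3)"
expression parse: '8 - 2 - 3'.      "3: left: makes - left-associative"
expression parse: '2 * (3 + 4)'.    "14"
```

The second example is the one that the right-recursive term and prod
productions of the earlier scripts would get wrong for subtraction, which
is a good reason to prefer PPExpressionParser for operator grammars.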
Script 18.31: Creating a class to hold the tests for our arithmetic expression
grammar
PPCompositeParserTest subclass: #ExpressionGrammarTest
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
It is then important that the test case class references the parser class:
this is done by overriding the PPCompositeParserTest»parserClass method in
ExpressionGrammarTest:
ExpressionGrammarTest>>testAdd
self parse: '123+77' rule: #add.
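The remaining methods of the test class follow the same pattern. The
sketch below shows the parserClass override described above and a
testNumber method, which the evaluator and printer tests call through
super. The testMul method and the sample inputs are hypothetical
additions for illustration.

```smalltalk
ExpressionGrammarTest>>parserClass
	"Tell the test harness which grammar to instantiate."
	^ ExpressionGrammar

ExpressionGrammarTest>>testNumber
	self parse: '123 ' rule: #number

ExpressionGrammarTest>>testMul
	"Hypothetical extra test, not part of the original test class."
	self parse: '2 * 3' rule: #mul
```

Each test stores the parse result in the result instance variable, which
is what lets subclasses call super and then assert on result.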
These tests ensure that the ExpressionGrammar parser can parse some
expressions using a specified production rule. Testing the evaluator and
pretty printer is similarly easy:
ExpressionEvaluatorTest>>parserClass
^ ExpressionEvaluator
ExpressionEvaluatorTest>>testAdd
super testAdd.
self assert: result equals: 200
ExpressionEvaluatorTest>>testNumber
super testNumber.
self assert: result equals: 123
ExpressionPrinterTest>>parserClass
^ ExpressionPrinter
ExpressionPrinterTest>>testAdd
super testAdd.
ExpressionPrinterTest>>testNumber
super testNumber.
self assert: result equals: '123'
JSON consists of object definitions (between curly braces “{}”) and arrays
(between square brackets “[]”). An object definition is a set of key/value
pairs whereas an array is a list of values. The previous JSON example then
represents an object (a person) with several key/value pairs (e.g., for the
person’s first name, last name, and age). The address of the person is
represented by another object while the phone number is represented by an
array of objects.
First we define a grammar as a subclass of PPCompositeParser. Let us call it
PPJsonGrammar.
Figure 18.6: Syntax diagram representation for the JSON object parser
defined in script 18.37
classVariableNames: 'CharacterTable'
poolDictionaries: ''
category: 'PetitJson-Core'
Script 18.37: Defining the JSON parser for object as represented in Figure 18.6
PPJsonGrammar>>object
^ ${ asParser token trim , members optional , $} asParser token trim
PPJsonGrammar>>members
^ pair separatedBy: $, asParser token trim
PPJsonGrammar>>pair
^ stringToken , $: asParser token trim , value
The only new thing here is the call to the PPParser»separatedBy:
convenience method which answers a new parser that parses the receiver (a
pair here) one or more times, separated by its parameter parser (a comma
here). Arrays are much simpler to parse as depicted in script 18.38.
Script 18.38: Defining the JSON parser for array as represented in Figure 18.7
PPJsonGrammar>>array
^ $[ asParser token trim ,
elements optional ,
$] asParser token trim
PPJsonGrammar>>elements
^ value separatedBy: $, asParser token trim
Figure 18.7: Syntax diagram representation for the JSON array parser
defined in script 18.38
Parsing values
In JSON, a value is either a string, a number, an object, an array, a Boolean
(true or false), or null. The value parser is defined as below and represented
in Figure 18.8:
Script 18.39: Defining the JSON parser for value as represented in Figure 18.8
PPJsonGrammar>>value
^ stringToken / numberToken / object / array /
trueToken / falseToken / nullToken
A string requires quite some work to parse. A string starts and ends with
double-quotes. What is inside these double-quotes is a sequence of
characters. Any character can either be an escape character, an octal
character, or a normal character. An escape character is composed of a
backslash immediately followed by a special character (e.g., '\n' to get a
new line in the string). An octal character (in effect a Unicode escape,
despite the name of the rule) is composed of a backslash, immediately
followed by the letter 'u', immediately followed by 4 hexadecimal digits.
Finally, a normal character is any character except a double quote (used to
end the string) and a backslash (used to introduce an escape character).
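The three kinds of characters can be prototyped in a workspace before
moving them into the grammar class. This is a simplified sketch and not
the code of PPJsonGrammar: the variable names are invented, the escape
rule accepts any character after the backslash instead of only those in
CharacterTable, and no action converts the escapes.

```smalltalk
| quote backslash charEscape charUnicode charNormal char string |
quote := $" asParser.
backslash := $\ asParser.

"Backslash followed by any single character (looser than real JSON)."
charEscape := backslash , #any asParser.
"Backslash, the letter u, then exactly four hexadecimal digits."
charUnicode := backslash , $u asParser , (#hex asParser min: 4 max: 4).
"Anything except a double quote or a backslash."
charNormal := (quote / backslash) negate.

"Tried first because the loose charEscape above would also accept \u."
char := charUnicode / charEscape / charNormal.
string := quote , char star , quote.

string flatten parse: '"a\nb"'.         "answers the input itself, quotes included"
string end matches: '"\u00e9"'.         "true"
string end matches: '"unterminated'.    "false: the closing quote is missing"
```

Restricting charEscape to the keys of CharacterTable, as the grammar
does, removes the need to order the alternatives this carefully.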
Script 18.40: Defining the JSON parser for string as represented in Figure 18.9
PPJsonGrammar>>stringToken
^ string token trim
PPJsonGrammar>>string
^ $" asParser , char star , $" asParser
PPJsonGrammar>>char
Figure 18.8: Syntax diagram representation for the JSON value parser
defined in script 18.39
Special characters allowed after a backslash and their meanings are
defined in the CharacterTable dictionary that we initialize in the initialize
class method. Please note that the initialize method on the class side is
called when the class is loaded into the system. If you have just created
the initialize method, the class was loaded without it. To execute it, you
should evaluate PPJsonGrammar initialize in your workspace.
Script 18.41: Defining the JSON special characters and their meaning
PPJsonGrammar class>>initialize
CharacterTable := Dictionary new.
CharacterTable
at: $\ put: $\;
at: $/ put: $/;
at: $" put: $";
at: $b put: Character backspace;
at: $f put: Character newPage;
at: $n put: Character lf;
at: $r put: Character cr;
at: $t put: Character tab.
Figure 18.9: Syntax diagram representation for the JSON string parser
defined in script 18.40
Script 18.42: Defining the JSON parser for number as represented in Figure 18.10
PPJsonGrammar>>numberToken
Figure 18.10: Syntax diagram representation for the JSON number parser
defined in script 18.42
The attentive reader will have noticed a small difference between the
syntax diagram in Figure 18.10 and the code in script 18.42. Numbers in JSON
cannot contain leading zeros: i.e., strings such as "01" do not represent
valid numbers. The syntax diagram makes that particularly explicit by
allowing either a 0 or a digit between 1 and 9. In the above code, the rule
is made implicit by relying on the fact that the parser combinator / is
ordered: the parser on the right of / is only tried if the parser on the
left fails. Thus, ($0 asParser / #digit asParser plus) defines numbers as
being just a 0 or a sequence of digits not starting with 0.
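The effect of the ordered choice can be checked in isolation. The name
integerPart is made up for this sketch, which looks only at the
sub-expression quoted above and not at the full number rule.

```smalltalk
| integerPart |
"Ordered choice: a lone 0 is tried first, digit sequences second."
integerPart := $0 asParser / #digit asParser plus.

integerPart end matches: '0'.      "true"
integerPart end matches: '42'.     "true"
integerPart end matches: '01'.     "false: the first alternative consumes 0, then 1 is left over"

"Without end, only the leading 0 is consumed."
integerPart flatten parse: '01'.   "'0'"
```

Once the first alternative has succeeded the choice is not revisited, so
the second alternative never gets a chance to read 01 as a digit
sequence.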
The other parsers are fairly trivial:
Script 18.43: Defining missing JSON parsers
PPJsonGrammar>>falseToken
^ 'false' asParser token trim
PPJsonGrammar>>nullToken
^ 'null' asParser token trim
PPJsonGrammar>>trueToken
^ 'true' asParser token trim
Source shows the source code of the rule. The code can be updated and
saved in this window. Moreover, you can add a new rule simply by
defining the new method name and body.
Figure 18.14: Another automatically generated example of the prim rule, after
having clicked the reload button. In this case, the prim example is a
parenthesized expression.
First shows the set of terminal parsers that can be activated directly after
the rule starts. As you can see in Figure 18.15, the first set of prim is
either digit or opening parenthesis '('. This means that once you start
parsing prim the input should continue with either a digit or '('.
One can use the first set to double-check that the grammar is specified
correctly. For example, if you see '+' in the first set of prim, there is
something wrong with the definitions, because the prim rule was never meant
to start with a binary operator.
A terminal parser is a parser that does not delegate to any other parser.
Therefore you don’t see parens in the first set of prim, because parens
delegates to other parsers: trimming and sequence parsers (see script 18.46).
You can see '(', which is the first set of parens. The same holds for the
number rule, which creates an action parser delegating to a trimming parser
delegating to a flattening parser delegating to a repeating parser
delegating to the #digit parser (see script 18.46). The #digit parser is a
terminal parser and therefore you can see ’digit expected’ in the first set.
In general, the computation of the first set can be complex and therefore
PPBrowser computes this information for us.
ExpressionGrammar>>parens
^ $( asParser trim, term, $) asParser trim
ExpressionGrammar>>number
^ #digit asParser plus flatten trim ==> [:str | str asNumber ]
Follow shows the set of terminal parsers that can be activated directly after
the rule finishes. As you can see in Figure 18.16, the follow set of prim is
the closing bracket character parser ')', the star character parser '*', the
plus character parser '+' or the epsilon parser (which stands for the empty
string). In other words, once you have finished parsing the prim rule the
input should continue with one of the ')', '*', '+' characters or the input
should be completely consumed.
One can use the follow set to double-check that the grammar is specified
correctly. For example if you see '(' in the follow set of prim, something
is wrong in the definition of your grammar. The prim rule should be
followed by a binary operator or a closing bracket, not by an opening
bracket.
In general, the computation of the follow set can be even more complex than
the computation of the first set and therefore PPBrowser computes this
information for us.
tab. One may parse the input sample by clicking the play button or by
pressing Cmd-s or Ctrl-s. You can then gain some insight into the parse
result by inspecting the tabs on the bottom-right pane:
Result shows the result of parsing the input sample that can be inspected by
clicking either the Inspect or Explore buttons. Figure 18.17 shows
the result of parsing (1+2).
Debugger shows a tree view of the steps that were performed during parsing.
This is very useful if you don’t know what exactly is happening
during parsing. When you select a step, the corresponding part of the input
is highlighted, so you can see which part of the input was parsed by that
particular step.
For example, you can inspect how the ExpressionGrammar works, which
rules are called and in which order. This is depicted in Figure 18.18.
The grey rules are rules that failed. This usually happens for choice
parsers and you can see an example for the prod rule (the definition is
in script 18.47). When the parser was parsing the term 12 + 3 ∗ 4, it
tried to parse the mul rule as the first option in prod. But mul required a
star character '*' at position 2, which is not present, so mul failed and
the prim with value 12 was parsed instead.
Script 18.47: prod rule in ExpressionGrammar
ExpressionGrammar>>prod
^ mul / prim
ExpressionGrammar>>mul
^ prim, $* asParser trim, prod
Tally shows how many times a particular parser got called during parsing.
The percentage shows the ratio of the number of calls to the total number
of calls. This might be useful while optimizing the performance of your
parser (see Figure 18.20).
Profile shows how much time was spent in a particular parser during parsing
of the input. The percentage shows the ratio of time to total time.
This might be useful while optimizing the performance of your parser (see
Figure 18.21).
Progress visually shows how a parser consumes input. The x-axis represents
how many characters were read in the input sample, ranging
from 0 (left margin) to the number of characters in the input (right
margin). The y-axis represents time, ranging from the beginning of the
parsing process (top margin) to its end (bottom margin). A line going
from top-left to bottom-right (such as the one in Figure 18.22) shows
that the parser completed its task by only reading each character of
the input sample once. This is the best case scenario: parsing is linear
in the length of the input. In other words, an input of n characters is
parsed in n steps.
When multiple lines are visible, it means that the parser had to go back
to a previously read character in the input sample to try a different
rule. This can be seen in Figure 18.23. In this example, the parser had
to go back several times to correctly parse the whole input sample: all
input was parsed in n! steps which is very bad. If you see many backward
jumps for a grammar, you should reconsider the order of choice
parsers, restructure your grammar or use a memoized parser. We will
have a detailed look at a backtracking issue in the following section.
Figure 18.22: Progress of a PetitParser parser that parses input in a linear
number of steps.
Debugging example
As an exercise, we will try to improve the BacktrackingParser from script
18.48. The BacktrackingParser was designed to accept input corresponding to
the regular expressions 'a*b' and 'a*c'. The parser gives us correct
results, but there is a problem with performance. The BacktrackingParser
does too much backtracking.
Script 18.48: A parser accepting 'a*b' and 'a*c' with too much backtracking.
PPCompositeParser subclass: #BacktrackingParser
instanceVariableNames: 'ab ap c p'
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
BacktrackingParser>>ab
^ 'b' asParser /
('a' asParser, ab)
BacktrackingParser>>c
^ 'c' asParser
BacktrackingParser>>p
^ ab / ap / c
BacktrackingParser>>start
^p
BacktrackingParser>>ap
^ 'a' asParser, p
parsed in a similar way as the 'a*b' strings. You can see such a modification
in script 18.49.
BacktrackingParser>>ab
^ 'b' asParser /
('a' asParser, ab)
BacktrackingParser>>ac
^ 'c' asParser /
('a' asParser, ac)
BacktrackingParser>>start
^ ab / ac
We can check the new metrics for inputc in both Figure 18.30 and
Figure 18.31. There is a significant improvement. For inputc, the tally shows
only 20 invocations of the parser b and 9 invocations of the parser a. This
is a very good improvement compared to the 110 invocations of the parser b
and 55
Figure 18.30: Progress of BacktrackingParser for inputc after the first update.
Figure 18.31: Tally of BacktrackingParser for inputc after the first update.
Figure 18.32: Progress of the BacktrackingParser after the second update for
inputc.
Figure 18.33: Tally of the BacktrackingParser after the second update for inputc.
                      # of invocations
Version               inputb    inputc
Original                  28       233
First improvement         28        70
Second improvement        46        48
Table 18.4: Number of parser invocations for inputb and inputc depending
on the version of BacktrackingParser.
BacktrackingParser>>abc
^ ('b' asParser / 'c' asParser) /
('a' asParser, abc)
BacktrackingParser>>start
^ abc
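The final rule can also be checked outside the browser. The sketch below
is a workspace equivalent of the abc production: it uses a
PPDelegateParser to resolve the recursion instead of a PPCompositeParser
subclass, and the sample inputs are arbitrary.

```smalltalk
| abc |
"Workspace equivalent of the abc rule; the delegate resolves the recursion."
abc := PPDelegateParser new.
abc setParser: ('b' asParser / 'c' asParser) / ('a' asParser , abc).

abc end matches: 'aaab'.    "true"
abc end matches: 'aaac'.    "true"
abc end matches: 'c'.       "true: zero occurrences of a"
abc end matches: 'aaa'.     "false: neither b nor c terminates the input"
```

Each character of the input is examined by a single rule, so the parser
never has to return to an earlier position to try a different
alternative.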
• The method ==> applies the transformation given by the block passed as
its argument.
• Compose parsers (and create a grammar) by subclassing
PPCompositeParser.
4 http://www.themoosebook.org/book/internals/petit-parser
5 http://scg.unibe.ch/archive/phd/renggli-phd.pdf
Chapter 19
Biographies
1 http://bergel.eu
2 http://damiencassou.seasidehosting.st
3 http://stephane.ducasse.free.fr
4 http://www.jannik-laval.eu