FP UNIT 5 Clojure
_______________________________________________________________
Getting Clojure: Why Lisp Still Matters
Clojure is a dynamically typed, practical programming language that targets the
JVM and other modern runtimes. It is a language in the Lisp tradition. Lisp is the
second-oldest high-level programming language. It was originally created in 1958
by John McCarthy and has gone through more than fifty years of evolution. One
of the most recent branches of this evolution is Clojure. Like other Lisps, it has
one of the most powerful metaprogramming systems yet created. It’s powerful
enough that the majority of the language is implemented using it.
The REPL
The Read Eval Print Loop, or REPL, is Clojure’s interactive programming
environment.
How REPL works:
A user of the REPL types some code.
The REPL reads it in, turning it from a string into another data structure.
That data structure is then evaluated to produce a value, which is printed.
Finally, the REPL loops back to the beginning, waiting for new input.
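One pass through these steps can be sketched with Clojure's own read-string, eval, and println functions. This is only an illustration of the cycle described above, not how the real REPL is implemented, and the name repl-step is hypothetical.

```clojure
;; One hypothetical REPL iteration: read a string, evaluate the
;; resulting data structure, and print the value.
(defn repl-step [input]
  (let [form  (read-string input)   ; Read: string -> data structure
        value (eval form)]          ; Eval: data structure -> value
    (println value)                 ; Print
    value))

(repl-step "(+ 21 21)") ; prints 42 and returns 42
```

A real REPL would call something like this in a loop, waiting for new input each time.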
Example 1:
Running “hello, world” in the REPL:
=> (println "hello, world")
hello, world
nil
Example 2:
To add two numbers together, for example 21 and 21:
=> (+ 21 21)
42
Example 3:
Create a function, say-hello, which just prints "hello, pragmatic
programmers":
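A minimal sketch of such a definition, assuming the function takes no arguments, might look like this. Entering the defn form at the REPL prints the var that was defined, which shows up as #'matters/say-hello when the current namespace is matters.

```clojure
;; Define a function that takes no arguments and prints a greeting.
(defn say-hello []
  (println "hello, pragmatic programmers"))

(say-hello) ; prints: hello, pragmatic programmers
```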
In the first example, "hello, world" is printed, followed by nil. The Eval in REPL
takes a piece of code and executes it. Evaluating a bit of code always produces a value.
Since println is called only to print something, it has no interesting value to
return, so nil is returned.
In the second and third examples, no nil appears; only the results 42 and
#'matters/say-hello are printed. These are the values of the evaluated code: the
sum of the two numbers, and the var naming the function that was defined.
Some things evaluate to themselves. For instance, evaluate the integer 1 and the string "foo" in the REPL:
=> 1
1
=> "foo"
"foo"
Some things don’t evaluate to themselves, like the calls to println and +.
With those, the arguments were first evaluated and then passed into the println
or + function.
Some calls can be nested, for instance, first (* 10 2) is evaluated to get 20, then
(+ 22 20) is evaluated to get the final value of 42:
=> (+ 22 (* 10 2))
42
These calls can be nested arbitrarily deep. The following snippet adds one
more layer:
=> (+ 22 (* 10 (/ 4 2)))
42
To turn off evaluation, prepend a snippet of code with a single quote:
=> '(+ 1 1)
(+ 1 1)
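As a small sketch of what quoting does, the quoted form below is kept as a list instead of being evaluated, and it can still be evaluated later on request. The name quoted is illustrative.

```clojure
;; A quoted form is returned as data instead of being evaluated.
(def quoted '(+ 1 1))

(list? quoted)  ; true: the value is a list, not the number 2
(first quoted)  ; the symbol +
(eval quoted)   ; evaluating the list explicitly yields 2
```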
When we type something into the REPL, we’re typing in a series of characters, a
string. This isn’t what ultimately gets evaluated by Clojure. Instead, these
characters are first passed into the R in REPL, the reader.
Vectors give fast positional access to their elements. They can be created by
placing the elements of the vector inside of square brackets.
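For example, a vector of keywords could be written as follows. The name some-keywords is an assumption chosen to match the naming used elsewhere in these notes.

```clojure
;; A vector literal holding three keywords.
(def some-keywords [:foo :bar :baz])

(first some-keywords)  ; :foo
(nth some-keywords 2)  ; :baz, positional access by index
(some-keywords 1)      ; :bar, a vector can be called as a function of its index
```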
The read-string function takes a string directly and reads it. Here it reads in a
new vector, which is named some-more-keywords.
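A sketch of that step, assuming the vector holds the same three keywords used in the other examples:

```clojure
;; read-string turns the characters of the string into a Clojure vector.
(def some-more-keywords (read-string "[:foo :bar :baz]"))

(vector? some-more-keywords) ; true
(first some-more-keywords)   ; :foo
```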
The Read in REPL reads in data that can then be manipulated in code, much as a
JSON or YAML parser would.
Consider another Clojure data structure, the list. In Clojure, as in other Lisps, a list
is a singly linked list.
Method 1:
Create the list by calling the list function on its elements.
Method 2:
Another way is to enclose the list elements in parentheses and pass the
resulting string to read-string:
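The two approaches might look like the sketch below. The names a-list and another-list match the REPL session in these notes; the use of the list function for the first one and the exact elements are assumptions.

```clojure
;; Method 1 (assumed): build the list with the list function.
(def a-list (list :foo :bar :baz))

;; Method 2: read the list from a string.
(def another-list (read-string "(:foo :bar :baz)"))
```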
These two lists are equivalent:
=> (first a-list)
:foo
=> (first another-list)
:foo
=> (= a-list another-list)
true
Method 3:
Here, create a list with three elements:
the symbol + and the integers 21 and 21.
=> (def funky-looking-list (read-string "(+ 21 21)"))
#'matters/funky-looking-list
Use the first function to get the first element.
=> (first funky-looking-list)
+
The first two lists contain only keywords; the third contains code.
Clojure code is just Clojure data, a property known as homoiconicity.
The evaluation rule that was hinted at earlier for function calls is actually the
evaluation rule for lists.
Evaluate funky-looking-list as follows:
=> (eval funky-looking-list)
42
Because Clojure code is just Clojure data, it can be manipulated like any other data.
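As an illustration of treating code as data, the sketch below takes the list read from "(+ 21 21)", swaps its first element for a different function symbol, and evaluates the result. The name multiplied is hypothetical.

```clojure
(def funky-looking-list (read-string "(+ 21 21)"))

;; Keep the arguments, but replace the symbol + with the symbol *.
(def multiplied (cons '* (rest funky-looking-list)))

multiplied        ; (* 21 21)
(eval multiplied) ; 441
```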
Macros
A macro is a special kind of function. It’s intended to take a piece of data that
represents code, also known as a form. A macro transforms one form into another
before Clojure’s compiler compiles it. Finally, the evaluation rule for a macro is
special in that a macro does not evaluate its arguments.
Consider a simple macro that takes two arguments, a name and a string to
print. It then creates a function that prints the passed-in string:
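One way such a macro could be written is to build the defn form by hand as a list. This is a sketch under that assumption; the names make-printer and say-foo are hypothetical.

```clojure
;; The macro returns a list that is itself a defn form.
;; Its arguments arrive unevaluated, as a symbol and a string.
(defmacro make-printer [name to-print]
  (list 'defn name [] (list 'println to-print)))

(make-printer say-foo "foo") ; defines a function named say-foo
(say-foo)                    ; prints: foo
```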
Syntax quote, written as a backtick, is useful because it helps in writing macros
that look like templates for the code that they’ll generate.
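A sketch of a template-style macro using syntax quote follows. The backtick quotes the whole form, and ~ (unquote) inserts the macro's arguments into it. The names make-printer-2 and say-bar are hypothetical.

```clojure
;; The body reads like the defn it generates, with holes filled by ~.
(defmacro make-printer-2 [name to-print]
  `(defn ~name [] (println ~to-print)))

(make-printer-2 say-bar "bar") ; defines a function named say-bar
(say-bar)                      ; prints: bar
```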
Much of Clojure’s core functionality is built using macros. For instance, defn
expands to def and fn, as shown below:
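The expansion can be inspected with macroexpand-1. The sketch below only examines the shape of the result; the function being defined is an arbitrary example.

```clojure
;; Expand a defn form one step without evaluating it.
(def expansion
  (macroexpand-1 '(defn say-hello [] (println "hello"))))

(first expansion)         ; the symbol def
(second expansion)        ; the symbol say-hello
(first (nth expansion 2)) ; the symbol clojure.core/fn
```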
In summary, Clojure code is just Clojure data. The macro system and syntax
quote make it possible to write code templates that look like the code they
generate.
The model of Eval that is built up here is incomplete in several ways. Most
notably, Clojure supports a form of lazy evaluation, which defers the evaluation
of a form until it’s needed. Second, it’s tempting to think that the Eval in other
languages, such as JavaScript, is the same as the Eval in a Lisp like Clojure. In
JavaScript and most other languages, Eval operates on strings. This means that
any significant program built on it must be written using string manipulation, an
extremely error-prone and difficult proposition for large problems.
_______________________________________________________________
To change a value, the bit at a particular location is flipped. For instance, in the
following dramatic re-enactment, change the value at memory location 8 from 0 to
1.
This simplified model is similar to the model presented by modern computer
memory. Modern memory holds many more bits, and the addressing schemes it
uses are more complex, but the core model of memory as a series of cells with an
address still holds.
One core abstraction that languages build on this model is the mutable object
reference to a mutable object. This makes memory friendlier to work with in many
ways.
Traditional object-oriented models encourage programming as if objects were
places used to store data, just like the cells in a simple memory model.
Many object references may refer to the same place; however, if an object is
modified through any of those references, all of the references may eventually see
the modification.
For instance, the following diagram is a simple data model of a person. Here the
person is modeled as an object with three references pointing at it, which stand in
for Java-style object references. Suppose that the person has committed a crime
involving three yaks, a razor, and an industrial-sized can of shaving cream. On the
run from the law, he has to change his name and address. Doing so using standard
object-oriented tools would mutate the object that represents the person. This
means that his old name and address would be lost, as the following diagram
demonstrates!
This is not how human memory works. Just because a name has changed doesn’t
mean that the past name is wiped away. And if the object is modified through the
reference labeled person, what do the other references see?
In the simple case, the program is single-threaded and it is expected that once the
person is modified, all subsequent attempts to view it through any of the references
will see the modified value. Even this simple case can be fairly complex, especially
for large software systems. Two pieces of code far apart in the codebase may
modify the same data without realizing it, which can cause errors that are difficult
to debug. Worst of all is when the references reside in separate threads. In this
case, two threads may attempt to modify the person concurrently. This causes all
sorts of problems. The traditional solution to these problems involves locks.
Programming with locks is difficult.
To change the name and address, a new immutable person is created that starts
with the old one as a base, and a reference is assigned to the new person:
This makes it possible for some references to refer to the new person and some to
the old person, so if one thread is using the data while another thread attempts to
modify it, the first thread can simply keep using the old value.
A persistent data structure is a data structure that preserves its previous version
when it’s modified, making it appear to be immutable.
(def mike
  {:name "Michael"
   :address "123 Fake St."
   :age 30})
The assoc function takes a map-like data structure and a variable number of
key-value pairs, and associates them with the map. If the keys have values
existing in the map, they are overwritten.
Use assoc to create a new map based on the one referred to by mike, and create a
new reference to it, new-mike. The existing mike is not affected.
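A sketch of this step is shown below. The new name and address values are made up for illustration; only the names mike and new-mike come from these notes.

```clojure
(def mike
  {:name "Michael"
   :address "123 Fake St."
   :age 30})

;; assoc returns a new map; the map referred to by mike is untouched.
(def new-mike
  (assoc mike :name "Mikey" :address "456 Real Ave."))

(:name new-mike) ; "Mikey"
(:name mike)     ; "Michael"
(:age new-mike)  ; 30, carried over from the original map
```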
In addition to the persistent maps, Clojure has several other persistent data
structures, most notably vectors, which give fast lookup by index and fast
appends, all while remaining immutable, and sets, which are just immutable
versions of normal sets.
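The same behavior can be sketched for vectors and sets using conj. All names and values here are illustrative.

```clojure
(def v [1 2 3])
(def v2 (conj v 4)) ; appending returns a new vector
(nth v2 3)          ; 4
v                   ; still [1 2 3]

(def s #{:a :b})
(def s2 (conj s :c)) ; adding returns a new set
(contains? s2 :c)    ; true
(contains? s :c)     ; false
```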
This ensures that all of the references being managed point at the latest value for a
given identity. To see a value, the reference that points to it must be de-referenced.
De-referencing returns the state that the reference refers to, which captures it
at the point in time it was de-referenced.
The following diagram shows one last model of a person, where the person is
accessed using a managed reference. Both the personOne and aGuy references
go through the managed references, while the original person’s reference does not:
Example:
Here, a function named add-to-int-atom is created, which adds an integer to the
atom that was just defined.
Now use add-to-int-atom to add the number 42 to the number that the atom refers
to.
The result of the addition is visible when both int-atom and
another-int-atom-reference are dereferenced.
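The definitions for this example are not reproduced here. A possible reconstruction is sketched below, assuming the atom starts at 0 and that int-atom-snapshot holds a value dereferenced right after 42 is added.

```clojure
(def int-atom (atom 0)) ; initial value 0 is an assumption
;; A second reference to the same atom.
(def another-int-atom-reference int-atom)

(defn add-to-int-atom [value]
  (swap! int-atom + value))

(add-to-int-atom 42)
@int-atom                   ; 42
@another-int-atom-reference ; 42

;; A dereferenced value is an immutable snapshot of the state.
(def int-atom-snapshot @int-atom)
(add-to-int-atom 1)
int-atom-snapshot ; still 42
@int-atom         ; 43
```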
=> int-atom-snapshot
42
Let’s start off with a look at a simple concurrent programming problem in Java:
incrementing counters from multiple threads running concurrently. Two counters
are used: the first is a plain old int, and the second is an AtomicInteger.
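import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Assumed setup: the class name, field declarations, and pool size of 5.
public class ConcurrencyExamples {
    private static int counter = 0;
    private static final AtomicInteger atomicCounter = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executors = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 10000; i++) {
            executors.execute(new Runnable() {
                public void run() {
                    counter++;
                    atomicCounter.incrementAndGet();
                }
            });
        }
        // Shut down the pool and block until all tasks have executed.
        executors.shutdown();
        executors.awaitTermination(60, TimeUnit.SECONDS);
        System.out.println(String.format("Normal int counter: %s", counter));
        System.out.println(String.format("AtomicInteger counter: %s", atomicCounter));
    }
}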
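Running this shows that incrementing an integer isn’t an atomic operation: the
plain int counter can finish below 10000 because some increments are lost, while
the AtomicInteger counter does not lose any. The increment is actually broken up
into multiple steps when compiled into bytecode, so the thread doing an
increment can see the int it’s incrementing in an inconsistent state.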
To make this a bit more clear, let’s take a look at a simpler example that just
increments an int a single time.
public class JustIncAnInt {
    private static int i;

    public static void main(String[] args) {
        i++;
    }
}
Once the preceding code is compiled, use javap, a command-line tool that comes
with JDKs, to get a readable representation of the bytecode it compiles to:
> javap -c JustIncAnInt
Compiled from "JustIncAnInt.java"
public class JustIncAnInt extends java.lang.Object{
public JustIncAnInt();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: getstatic #2; //Field i:I
3: iconst_1
4: iadd
5: putstatic #2; //Field i:I
8: return
}
The single increment is actually compiled to multiple bytecode operations. Without
going too far into the weeds, the 0: getstatic #2; //Field i:I is responsible for
loading the current value of the counter. It’s not until 4: iadd that the increment is
performed, and 5: putstatic #2; //Field i:I that it’s written back into the variable.
This, combined with subtleties involving when a write that one thread makes
becomes visible to other threads, leaves plenty of opportunity for two threads to
read and increment the variable at the same time.
One solution would be to use one of Java’s locking mechanisms to protect the
counter variable to ensure that only one thread can access the counter at a time, and
that any modifications made to it are visible when the lock is relinquished.
Clojure’s Solution
Clojure’s immutable data structures, managed reference types, and software
transactional memory system combine to provide a model that’s both high level
and much easier to use than lock-based concurrency.
Atomic Power
Atoms are good for managing state that’s independent and to which synchronous
changes need to be made. Here, independent means the changes can be made
without coordinating a change to any other piece of state. Synchronous means
that the thread that triggered the change blocks until it’s done. An atom can be
created out of any immutable data structure using atom. To change the atom’s
state, pass the atom and a function to the swap! function. The swap! function
applies the passed-in function to the value wrapped by the atom, swaps the old
value for the new one, and returns the new value.
Create a simple atom counter, and a function to increment it:
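A minimal sketch of those two definitions, using the names that appear in the REPL session in these notes and assuming the counter starts at 0:

```clojure
;; An atom wrapping the integer 0.
(def atom-counter (atom 0))

;; swap! applies inc to the current value and returns the new value.
(defn increment-atom-counter []
  (swap! atom-counter inc))
```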
=> @atom-counter
0
=> (increment-atom-counter)
1
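(import '(java.util.concurrent Executors TimeUnit))

(defn concurrent-atom-modification []
  (let [executors (Executors/newFixedThreadPool 5)
        counter (atom 0)]
    (dotimes [_ 10000]
      (.execute executors (fn [] (swap! counter inc))))
    (.shutdown executors)
    ;; Block until all tasks have run, then return the final count.
    (.awaitTermination executors 60 TimeUnit/SECONDS)
    @counter))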
=> (concurrent-atom-modification)
10000
This gives a high-level approach to concurrency. One consequence of the way that
atoms work is that the function passed into swap! cannot have any side effects, or
at least no side effects that cannot be repeated. This is because the function may be
retried if the compare-and-set! fails the first time around.
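To see why a retry can happen, swap! can be thought of as a loop around compare-and-set!. The sketch below is a simplified, hypothetical re-implementation for illustration only (my-swap! is not part of Clojure, and the real swap! also accepts extra arguments).

```clojure
;; Read the current value, compute a new one, and install it only if the
;; atom still holds the value that was read. Otherwise try again.
(defn my-swap! [a f]
  (loop []
    (let [old-value @a
          new-value (f old-value)]
      (if (compare-and-set! a old-value new-value)
        new-value
        (recur))))) ; another thread changed the atom, so f is called again

(def counter (atom 0))
(my-swap! counter inc) ; 1
```

Because f may run more than once for a single update, any side effect inside it may also happen more than once.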
Get an Agent
Like atoms, agents manage changes to independent state, but they’re designed to
do so in an asynchronous way. To modify the value referred to by an agent, use the
send function.
Like swap!, send takes the reference to modify (here an agent) and a function
that modifies its value. Unlike swap!, it returns immediately. Operations are
queued if necessary and applied to the agent serially.
Create an agent version of the counter, and a function to increment it:
(def agent-counter (agent 0))
(defn increment-agent-counter []
  (send agent-counter inc))
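A short usage sketch follows. Since send returns before the change is applied, await is used here to block until the queued action has run; that step is added for illustration and is not part of the original example.

```clojure
(def agent-counter (agent 0))
(defn increment-agent-counter []
  (send agent-counter inc))

(increment-agent-counter) ; returns the agent immediately
(await agent-counter)     ; block until the queued action has run
@agent-counter            ; 1

;; Only needed in a standalone script, so the agent thread pool
;; does not keep the JVM alive after the script finishes.
(shutdown-agents)
```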
Working the Ref
The final managed reference type is the ref. These can be used to make coordinated
changes across more than one data structure. Consider a tiny system that keeps
track of television series and episodes.
An episode of a series is represented as a map with an id, a name, and a nested map
that represents the series it’s attached to.
The nested series has a series id and name, as shown below:
{:id 42
:name "Fragged"
:series {:id 10 :name "Battlestar Galactica"}}
As episodes are added to the system, populate two maps, one of episodes keyed off
of episode id and one of series keyed off of series id. To add the series to the series
map, pick the embedded series out of an episode when it’s added, and add it to the
series map if it’s not already there. The data must never be in an inconsistent
state, where an episode is added but not its corresponding series; this is ensured
using Clojure’s refs and software transactional memory system.
Episodes are added using assoc. Much like the swap! function does for atoms, and
the send function does for agents, alter takes a reference, function, and
arguments.
Unlike swap! and send, alter must be called inside of a transaction. To create a
transaction, wrap calls to alter inside of dosync. This acts like a database
transaction, and it turns multiple calls to alter into one atomic unit. If one thread
wants to modify either map while another is in the process of doing so, one thread
would win and its transaction would commit. The other would roll its transaction
back and try it again. This means that, just like with the atoms, transactions need to
avoid side effects.
#(assoc % (:id current-episode) current-episode))
(alter all-series
#(assoc % (:id current-series) current-series)))))
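The lines above are the tail of a function whose opening is not reproduced here. A complete version might look like the following sketch. The ref names all-episodes and all-series come from these notes, while the function name add-new-episode and the initial empty maps are assumptions.

```clojure
(def all-episodes (ref {}))
(def all-series (ref {}))

;; Hypothetical name. Adds an episode and its series in one transaction,
;; so no other thread can observe one map updated without the other.
(defn add-new-episode [current-episode]
  (let [current-series (:series current-episode)]
    (dosync
      (alter all-episodes
             #(assoc % (:id current-episode) current-episode))
      (alter all-series
             #(assoc % (:id current-series) current-series)))))

(add-new-episode {:id 42
                  :name "Fragged"
                  :series {:id 10 :name "Battlestar Galactica"}})

(get-in @all-episodes [42 :name]) ; "Fragged"
(get @all-series 10)              ; {:id 10, :name "Battlestar Galactica"}
```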
Now create a few test episodes, and add them to the system:
(def e1 {:id 1
         :name "33"
         :series {:id 10 :name "Battlestar Galactica"}})
(def e2 {:id 2
         :name "Water"
         :series {:id 10 :name "Battlestar Galactica"}})
(def e3 {:id 3
         :name "The Target"
         :series {:id 11 :name "The Wire"}})
Both of the maps contain the correct value, and they got there in a threadsafe
way!
=> @all-episodes
{3 {:name "The Target",
:series {:name "The Wire", :id 11},
:id 3},
2 {:name "Water",
:series {:name "Battlestar Galactica", :id 10},
:id 2},
1 {:name "33",
:series {:name "Battlestar Galactica", :id 10},
:id 1}}
=> @all-series
{11 {:name "The Wire", :id 11},
10 {:name "Battlestar Galactica", :id 10}}
Clojure and Scala both rely on the JVM and have some similarities and some
differences in syntax and in their implementation of functional principles.