0% found this document useful (0 votes)
8 views

Clojure Guides_ Data Structures

Uploaded by

eowug
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Clojure Guides_ Data Structures

Uploaded by

eowug
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

Data Structures

Intro
Vectors
Intro
Constructing Vectors
Accessing Elements
(Non-)emptiness
Linear-time Operations
Searching
Maps
Intro
Building Maps
Accessing Entries
Lists
Intro
Accessing Elements
Sets
Intro
Constructing sets
Sets as predicates
Relations
Sequences
Intro
Exercise: search for clojure.core functions with a coll argument
Exercise: Convoluted Encoding
Contributors

This work is licensed under a Creative Commons Attribution 3.0 Unported License
(https://creativecommons.org/licenses/by/3.0/) (including images & stylesheets). The source is available
on Github (https://github.com/clojure-doc/clojure-doc.github.io).

Intro
This cookbook covers some common tasks with core Clojure data structures. It assumes that the reader is
familiar with foundational read and write operations (get, conj, assoc, update, etc). For more coverage of
getting started with Clojure data structures, some recommended resources are:

Clojure.org's reference on collections (https://clojure.org/reference/data_structures#Collections)


Intro to Clojure: Data Structures (/articles/tutorials/introduction/#data-structures)
Intro to Clojure: Functions for Creating Data Structures (/articles/tutorials/introduction/#functions-for-
creating-data-structures) - also, the section right after this one, which covers manipulating them
Vectors
Intro
Vectors are probably the most commonly used data structure for sequential data that isn't code. The
random access capability and the fact that vectors aren't treated as function calls make them often a
better choice than lists.

Constructing Vectors
mapv and filterv are eager versions of map and filter that return vectors

user=> (mapv second [[1 1] [2 4] [3 9]])


[1 4 9]

user=> (filterv even? (range 10))


[0 2 4 6 8]

Accessing Elements
Vectors implement the stack protocol, meaning peek and pop return the last element and the vector
without the last element, respectively.

user=> (peek [7 8 9])


9
user=> (pop [7 8 9])
[7 8]

(Non-)emptiness
Sometimes it's desirable to treat an empty vector as a falsey value, especially in conditionals. The not-
empty function will return its argument if it's not empty, otherwise it will return nil.
user=> (defn notate-range
"takes a vector of [start end] (both optional) and returns a range notat
[coords]
(if-not (empty? coords)
(let [[start end] coords]
(str start ".." end))
"empty range"))
#'user/notate-range
user=> (notate-range [1 5])
"1..5"
user=> (notate-range [7])
"7.."
user=> (notate-range [])
"empty range"

; we can also write this as


user=> (defn notate-range
"takes a vector of [start end] (both optional) and returns a range notat
[coords]
(if-let [[start end] (not-empty coords)]
(str start ".." end)
"empty range"))

Here, not-empty is used to make the conditional false (the empty vector is truthy, and not-empty
makes it nil ).

Linear-time Operations
Operations that need to search through elements one-by-one or operate on a large number of elements
are less efficient, but still valuable in some contexts.

The replace function has a special case for vectors that returns a vector. It takes a map where the keys
are the values to replace and the values are the replacements.

; change :begin to :start and :finish to :end


user=> (replace {:begin :start, :finish :end}
[:begin :middle :finish :begin :finish])
[:start :middle :end :start :end]

"Deleting" an item from anywhere but the last position is not a 'fast' operation, because conceptually all the
indexes after the deleted item need to be decremented by one. We can use into to combine two sub-
vectors, but the trade-off is that this operation adds the items from the second sub-vector one at a time, so
this function is O(n) in the size of the second sub-vector.

user=> (defn delete-at [v pos]


(into (subvec v 0 pos) (subvec v (inc pos))))
#'user/delete-at
user=> (delete-at [:a :b :c] 1)
[:a :c]
Inserting an item at an arbitrary position has the same issue as deleting - it potentially changes a lot of
indexes. An implementation is also similar to delete-at , with the same caveat about efficiency.

user=> (defn insert-at [v pos value]


(into (conj (subvec v 0 pos) value) (subvec v pos)))
#'user/insert-at
user=> (insert-at [:a :b :c] 2 "interloper!")
[:a :b "interloper!" :c]

Searching
Checking for the existence of a value (as opposed to an index) requires examining the elements one-by-
one. A point of confusion that sometimes arises is the contains? function, which searches an
associative collection for the existence of a key/index, and is very seldom desirable for vectors.

; `contains?` looks for an index!!!


user=> (contains? [:a :b :c] :b)
false
user=> (contains? [:a :b :c] 2)
true ; because the indexes are 0, 1, and 2

A commonly-used way to check for an occurrence of a certain value in a collection is the some function
with a set as a predicate. some stops at the first truthy return, so there's no 'wasted' computation.

user=> (some #{:b} [:a :b :c])


:b

Note that searching for false or nil in this way won't work. In those cases use the nil? or false?
functions.

Java interop can be used to get the first or last index of a specific value.

user=> (.indexOf [:a :b :c :b :a] :b)


1
user=> (.lastIndexOf [:a :b :c :b :a] :b)
3
; indexOf returns -1 when the value isn't found
user=> (.indexOf [:a :b] :c)
-1

keep-indexed can be used to find all the indexes of a value (or indexes that match some predicate).
Note that this is actually a function that works on (and returns) a sequence.

user=> (defn indexes-of [search-value coll]


(keep-indexed (fn [i v] (when (= search-value v) i)) coll))
#'user/indexes-of
user=> (indexes-of 15 [5 10 15 30 15 5])
(2 4)
Maps
Intro
Maps can be hashed or sorted (array maps are also available, but are mostly used for maps with < 10
entries). Sorted maps aren't as fast to lookup by key, but the sorting by key makes them useful for iteration
in some circumstances.

Building Maps
zipmap associates corresponding entries from two seqs into a map. This is used here to assign people to
teams.

; assign participants to teams


user=> (let [participants ["Mike" "Tina" "Alice" "Fred"]
team-nums (cycle [1 2])]
(zipmap participants team-nums))
{"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2}

group-by could be used to build the team rosters, but the result still has the team numbers in the roster.

user=> (group-by val {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})


{1 [["Mike" 1] ["Alice" 1]], 2 [["Tina" 2] ["Fred" 2]]}

update-vals (added in Clojure 1.11) could be tacked on to clean up the output

user=> (-> (group-by val {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})


(update-vals #(mapv first %)))
{1 ["Mike" "Alice"], 2 ["Tina" "Fred"]}

reduce-kv allows more control over the values, meaning it's possible to go directly to "clean" output (and
maybe even different data structures).

user=> (reduce-kv
(fn [acc k v] (update acc v (fnil conj #{}) k))
{} {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})
{1 #{"Alice" "Mike"}, 2 #{"Tina" "Fred"}}

frequencies takes a collection and returns a map of elements in that collection to how many times it
appears. This is used here for a rudimentary word counter.

user=> (-> "the cat in the hat fell in the vat"


(str/split #" ")
frequencies)
{"the" 3, "cat" 1, "in" 2, "hat" 1, "fell" 1, "vat" 1}

clojure.set has a map-invert function that will swap the keys and values. The "unique key" facet of
maps means that duplicate values in the input map will get one of the keys.
user=> (require '[clojure.set :as set])
user=> (let [squares {1 1, 2 4, 3 9, 4 16}
sqrts (set/map-invert squares)]
(sqrts 9))
3

Accessing Entries
While select-keys can be used to create a submap, it is sometimes desirable to pull some keys into a
sequence. In this example, juxt is used to make a vector of the :x and :y values of maps for passing
to another function.

user=> (defn manhattan-distance [[x1 y1] [x2 y2]]


(let [distance (comp abs -)]
(+ (distance x1 x2) (distance y1 y2))))
#'user/manhattan-distance
user=> (let [point1 {:x 2 :y 0}
point2 {:x 0 :y 2}
xy-coords (juxt :x :y)]
(manhattan-distance (xy-coords point1)
(xy-coords point2)))
4

Lists
Intro
Lists are primarily used for code in Clojure. While there aren't a ton of functions aimed specifically at lists,
lists do implement the sequence interface, so all sequence functions work on lists without any conversion.

Accessing Elements
Like vectors, lists implement the stack protocol, but unlike vectors, the 'top' of the stack is the front of the
list. So, 'first' on a list returns the same element as peek (the top of the stack), but on a vector 'first' returns
the bottom of the stack (this can be useful for queueing in LIFO vs FIFO order, for example).

user=> (peek '(1 2 3))


1
user=> (pop '(1 2 3))
(2 3)

Sets
Intro
In addition to the distinctness of sets, they're fast to check membership, so if values are being collected for
the purposes of checking whether they've been seen or not, a set is often a good choice.
Constructing sets
While keys on a map returns a sequence, the Java interop call to keySet can be used to get a set of
keys on the map. This set is technically an anonymous instance of Java's AbstractSet .

user=> (.keySet {:a 1 :b 2})


#{:a :b}

Sets as predicates
Because sets can be used as functions, they can be used as predicates with various higher-order
functions.

; are there any fives in the sequence?


user=> (some #{5} [1 2 3 5 8 13])
5

; how many `a`s and `c`s are in the DNA sequence?


user=> (count (filter #{\a \c} "agctgcgcatagcgt"))
7

;which word(s) can be typed using only the top row of a qwerty keyboard?
user=> (let [top-row (set "qwertyuiop")
candidates ["poet" "computer" "typewriter" "desk"]]
(filter #(every? top-row %) candidates))
("poet" "typewriter")

Relations
In addition to the more primitive set functions, the clojure.set namespace contains the fundamental
relational algebra (the underpinnings of SQL) operations. Sets of maps can be treated as relations,
providing the ability to do joins, projections, etc on in-memory data structures.

user=> (require '[clojure.set :as set])


user=> (let [owners #{{:name "Jane" :pet "Fido"}
{:name "Tim" :pet "Scaly"}}
pets #{{:name "Fido" :species "dog"}
{:name "Scaly" :species "snake"}}]
(-> (set/rename pets {:name :pet-name}) ; rename the :name key to disambi
(set/join owners {:pet-name :pet}) ; join with owners (on pet-name =
(set/project [:name :species]))) ; project (select) only the owner's
#{{:name "Jane", :species "dog"}
{:name "Tim", :species "snake"}}
Sequences
Intro
Lots of clojure.core functions operate on sequences. In general, a function that operates on
sequences will return a sequence, even when the coll (a common name for a collection argument) is
something like a vector or map.

Exercise: search for clojure.core functions with a coll


argument
While not perfectly accurate or comprehensive, a possible starting point for exploring sequence functions
is to search for functions in the clojure.core namespace that have an argument named coll .

; since the arglist is nested, helper function to check for a `coll` arg
user=> (defn has-coll-arg?
"Arglists are stored as a list of arg vectors, e.g. ([x] [x y]).
Returns the first `coll` in an arg vector, else nil."
[arglist]
(some ; some over the arg-vecs
(fn [arg-vec] (some ; some over the members of each arg-vec
#{'coll} arg-vec)) arglist))
user=> (->> (ns-publics (the-ns 'clojure.core))
vals
(map meta)
(filter (comp has-coll-arg? :arglists))
(map :name)
sort)
(->Eduction
assoc!
associative?
...)

This example, in addition to illustrating some of the runtime introspection capabilities of Clojure, illustrates
a few sequence functions. some is used on both lists and vectors (and uses the 'set as predicate' idiom
from above to search through the vectors). Depending on exactly which version of Clojure is in use, this
code returns a sequence of around 80 functions.

Exercise: Convoluted Encoding


Here's a contrived encoding algorithm to demonstrate a few sequence functions. The decoding has the
following rules:

the encoded string will be a sequence/string of numbers


for every three digits, multiply the first two and add the third until a strictly descending three digits is
reached (or until all digits are exhausted)
each resulting number corresponds to its ordinal position in the alphabet (e.g. 1 = a, 2 = b, etc)

So, for example, "45011429135594375" becomes...

450 - 4 * 5 + 0 = 20 = t
114 - 1 * 1 + 4 = 4 = e
291 - 2 * 9 + 1 = 19 = s
355 - 3 * 5 + 5 = 20 = t
943 - these digits are strictly descending, so from it onwards is thrown out, and the decoded text is
'test'

; digit char -> int, e.g. \9 -> 9


user=> (defn int-char->int [c] (- (int c) (int \0)))

; {1 \a, 2 \b, ... 26 \z}


user=> (def number->letter
(let [numbers (range 1 27)
letters (map char (iterate inc (int \a)))]
(zipmap numbers letters)))

user=> (defn decode [s]


(->> s
(map int-char->int) ; string of digits -> seq of ints
(partition 3) ; split into triples
(take-while (complement #(apply > %))) ; stop if there's a descending tr
(map (fn [[a b c]] (+ c (* a b)))) ; do the decoding math
(map number->letter) ; get the character for each number
(apply str))) ; put the chars into a string

user=> (decode "45011429135594375")


"test"

An exercise for the reader is to use for to generate the 3 digit sequences that 'compute' to (<= 1 n
26) and aren't in descending order, as part of the encoding procedure.

Contributors
@bobisgeek (https://github.com/bobisageek) - original author

« Building Projects: tools.build and the Clojure CLI (/articles/cookbooks/cli_build_projects/) || Strings »


(/articles/cookbooks/strings/)

Links
About (/articles/about/)
Table of Contents (/articles/content/)
Getting Started (/articles/tutorials/getting_started/)
Introduction to Clojure (/articles/tutorials/introduction/)
Clojure Editors (/articles/tutorials/editors/)
Clojure Community (/articles/ecosystem/community/)
Basic Web Development (/articles/tutorials/basic_web_development/)
Language: Functions (/articles/language/functions/)
Language: clojure.core (/articles/language/core_overview/)
Language: Collections and Sequences (/articles/language/collections_and_sequences/)
Language: Namespaces (/articles/language/namespaces/)
Language: Java Interop (/articles/language/interop/)
Language: Polymorphism (/articles/language/polymorphism/)
Language: Concurrency and Parallelism (/articles/language/concurrency_and_parallelism/)
Language: Macros (/articles/language/macros/)
Language: Laziness (/articles/language/laziness/)
Language: Glossary (/articles/language/glossary/)
Ecosystem: Library Development and Distribution (/articles/ecosystem/libraries_authoring/)
Ecosystem: Web Development (/articles/ecosystem/web_development/)
Ecosystem: Generating Documentation (/articles/ecosystem/generating_documentation/)
Building Projects: tools.build and the Clojure CLI (/articles/cookbooks/cli_build_projects/)
Data Structures
Strings (/articles/cookbooks/strings/)
Mathematics with Clojure (/articles/cookbooks/math/)
Date and Time (/articles/cookbooks/date_and_time/)
Working with Files and Directories in Clojure (/articles/cookbooks/files_and_directories/)
Middleware in Clojure (/articles/cookbooks/middleware/)
Parsing XML in Clojure (/articles/cookbooks/parsing_xml_with_zippers/)
Growing a DSL with Clojure (/articles/cookbooks/growing_a_dsl_with_clojure/)

Copyright © 2024 Multiple Authors


Powered by Cryogen (https://cryogenweb.org)

You might also like