L21 Sets and Maps
Paul Fodor
CSE260, Computer Science B: Honors
Stony Brook University
http://www.cs.stonybrook.edu/~cse260
Objectives
To store unordered, nonduplicate elements using sets
To explore how and when to use HashSet,
LinkedHashSet or TreeSet to store elements
To compare performance of sets and lists
To use sets to develop a program that counts the distinct
keywords in a Java source file
To tell the differences between Collection and Map and
describe when and how to use HashMap,
LinkedHashMap, and TreeMap to store values associated
with keys
To use maps to develop a program that counts the occurrence of
the words in a text
(c) Paul Fodor (CS Stony Brook) & Pearson
Motivation
Suppose we need to write a program that checks whether a
student is in a class
You can use a list to store the names of the students and search
for the student with linear search
Or sort the list of students and search with binary search
However, a more efficient data structure for this application is a
set with efficient methods to search for elements
Moreover, suppose your program also needs to store detailed
information about the students in the class (e.g., grades for labs,
homework submissions, submission times, GPA) and all can be
retrieved using the name of the student as the key
A map is an efficient data structure for such a task
Review of Java Collection
Framework hierarchy
Set is a sub-interface of Collection
You can create a set using one of its three concrete classes:
HashSet, LinkedHashSet, or TreeSet
Reminder Collection
The Collection interface is the root interface
for manipulating a collection of objects.
The Set Interface
The Set interface extends the Collection
interface. It does not introduce new methods or
constants; it only stipulates that an instance of Set
contains no duplicate elements
That is, no two elements e1 and e2 can be in the
set such that e1.equals(e2) is true
The concrete classes that implement Set must ensure that
no duplicate elements can be added to the set
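A minimal sketch of the no-duplicates rule, assuming a HashSet as the concrete set (the class name DuplicateCheckDemo and the sample value are illustrative only). The add method reports whether the set actually changed, so a second add of an equal element returns false:

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateCheckDemo {
    // Adds the same value twice and reports what the second add returned.
    static boolean secondAddSucceeds(Set<String> set, String value) {
        set.add(value);          // first insertion changes the set
        return set.add(value);   // returns false: an equal element is already present
    }

    public static void main(String[] args) {
        Set<String> students = new HashSet<>();
        System.out.println(secondAddSucceeds(students, "Ana")); // false
        System.out.println(students.size());                    // 1
    }
}
```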
AbstractSet
The AbstractSet class extends
AbstractCollection and partially implements
Set
AbstractSet
The AbstractSet class provides concrete
implementations for the equals method and the
hashCode method
The hash code of a set is the sum of the hash
codes of all the elements in the set
Since the size method and iterator method are
not implemented in the AbstractSet class,
AbstractSet is an abstract class
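The hash code rule above can be checked with a short sketch (the helper name sumOfElementHashCodes is an assumption made for this illustration). It sums the element hash codes by hand and compares the result with the value returned by the set itself:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetHashCodeDemo {
    // Sums the hash codes of the elements; a null element counts as 0.
    static int sumOfElementHashCodes(Set<?> set) {
        int sum = 0;
        for (Object e : set) {
            sum += (e == null) ? 0 : e.hashCode(); // int overflow simply wraps around
        }
        return sum;
    }

    public static void main(String[] args) {
        Set<String> cities = new HashSet<>(Arrays.asList("London", "Paris", "Beijing"));
        System.out.println(cities.hashCode() == sumOfElementHashCodes(cities)); // true
    }
}
```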
Hash codes
The hashCode method is defined in the Object class
The hash codes of two objects must be the same if the two objects are
equal
Two unequal objects may have the same hash code, but you should
implement the hashCode method to avoid too many such cases
Examples of hashCode in the Java API:
hashCode in the Integer class returns its int value
hashCode in the Character class returns the Unicode value of the character
hashCode in the String class returns
$s_0 \cdot 31^{n-1} + s_1 \cdot 31^{n-2} + \dots + s_{n-1}$
where $s_i$ is s.charAt(i) and $n$ is the length of the string.
31 is an odd prime with a nice property that the multiplication can be replaced
by a shift and a subtraction for better performance: 31*i == (i<<5)-i.
Modern VMs do this sort of optimization automatically.
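As a sketch, the String formula can be evaluated with Horner's rule and compared against String.hashCode (the class and method names here are illustrative, not part of the Java API). The same program checks the shift-and-subtract identity for 31:

```java
class StringHashDemo {
    // Evaluates s0*31^(n-1) + s1*31^(n-2) + ... + s(n-1) using Horner's rule.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps around, as in String.hashCode
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "London";
        System.out.println(polynomialHash(s) == s.hashCode()); // true
        int i = 12345;
        System.out.println(31 * i == (i << 5) - i);            // true
    }
}
```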
The Set Interface
Hierarchy
The HashSet Class
The HashSet class is a concrete class that implements Set
You can create an empty hash set using its no-arg constructor
or create a hash set from an existing collection
A hash set does not keep its elements in the order in which they
were inserted; the iteration order is unspecified
To retrieve the elements in insertion order, use the
LinkedHashSet class
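A small sketch of both ways to build a set from an existing collection (the class name, method names, and sample city list are assumptions for illustration). Copying into a HashSet drops duplicates without promising any order, while copying into a LinkedHashSet drops duplicates and keeps first-insertion order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetCreationDemo {
    // Copies a collection into a HashSet: duplicates are dropped, order is unspecified.
    static Set<String> distinct(List<String> names) {
        return new HashSet<>(names);
    }

    // Same idea with LinkedHashSet: duplicates are dropped, first-insertion order is kept.
    static List<String> distinctInInsertionOrder(List<String> names) {
        return new ArrayList<>(new LinkedHashSet<>(names));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("London", "Paris", "London", "Beijing");
        System.out.println(distinct(names).size());           // 3
        System.out.println(distinctInInsertionOrder(names));  // [London, Paris, Beijing]
    }
}
```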
The HashSet Class
By default, the initial capacity is 16 and the load factor is 0.75
The load factor measures how full the set is allowed to be before its
capacity is increased
It is compared with the number of elements in the set divided by the
capacity, and is typically a value between 0.0 and 1.0
When the number of elements exceeds the product of the
capacity and the load factor, the capacity is automatically doubled
For example, if the capacity is 16 and the load factor is 0.75, the capacity
is doubled to 32 once the size exceeds 12 (16*0.75 = 12)
A higher load factor decreases the space costs but increases the
search time
The default load factor 0.75 is a good tradeoff between time and
space costs (we will see how search works when we implement
hashing)
The bucket in which an element is stored is determined, roughly, by the
remainder of dividing its hashCode by the capacity
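The arithmetic above can be sketched as follows. This is a simplified model, not the library's actual code: the names threshold and bucketIndex are hypothetical, and a real implementation may mix the hash bits and use bit masking instead of a plain remainder.

```java
class LoadFactorDemo {
    // Number of elements the table may hold before it grows: capacity * loadFactor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Simplified bucket index: remainder of the hash code divided by the capacity.
    // Math.floorMod keeps the result non-negative for negative hash codes.
    static int bucketIndex(Object element, int capacity) {
        return Math.floorMod(element.hashCode(), capacity);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));  // 12
        System.out.println(threshold(32, 0.75f));  // 24
        System.out.println(bucketIndex(37, 16));   // 5, since 37 % 16 == 5
    }
}
```

A different initial capacity and load factor can be requested with the two-argument constructor, for example new HashSet<>(32, 0.5f).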
Example: Using HashSet and Iterator
This example creates a hash set filled with strings, and uses a for-each
loop and the forEach method to traverse the elements in the set.
import java.util.*;

public class TestHashSet {
  public static void main(String[] args) {
    // Create a hash set
    Set<String> set = new HashSet<>();

    // Add strings to the set
    set.add("London");
    set.add("Paris");
    set.add("New York");
    set.add("San Francisco");
    set.add("Beijing");
    set.add("New York"); // duplicate: the set is unchanged
    System.out.println(set);

    // Display the elements in the hash set
    for (String s: set) {
      System.out.print(s.toUpperCase() + " ");
    }
    System.out.println();

    // Process the elements using the forEach method
    set.forEach(e -> System.out.print(e.toLowerCase() + " "));
  }
}
The HashSet Class
Since a set is an instance of Collection, all methods
defined in Collection can be used for sets
For-each loops can also be used to traverse all
the elements in the set: the
Collection interface extends the Iterable
interface, so the elements in a set are iterable
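Because a set is iterable, it can also be traversed with an explicit Iterator, which is useful when elements must be removed during the traversal. A minimal sketch (the method removeShort and the sample data are assumptions for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SetIteratorDemo {
    // Removes every string shorter than minLength; returns how many were removed.
    static int removeShort(Set<String> set, int minLength) {
        int removed = 0;
        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            if (it.next().length() < minLength) {
                it.remove(); // removes the element last returned by next()
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<String> cities = new HashSet<>(Arrays.asList("Paris", "London", "San Francisco"));
        System.out.println(removeShort(cities, 6)); // 1 ("Paris" has 5 characters)
        System.out.println(cities.size());          // 2
    }
}
```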
Collection methods
public class TestMethodsInCollection {
  public static void main(String[] args) {
    // Create set1
    java.util.Set<String> set1 = new java.util.HashSet<>();

    // Create set2
    java.util.Set<String> set2 = new java.util.HashSet<>();

    // Add strings to set2
    set2.add("London");
    set2.add("Shanghai");
    set2.add("Paris");
    System.out.println("\nset2 is " + set2);
    System.out.println(set2.size() + " elements in set2");

    // addAll: set union
    set1.addAll(set2);
    System.out.println("\nAfter adding set2 to set1, set1 is "
      + set1);

    // removeAll: set difference
    set1.removeAll(set2);
    System.out.println("After removing set2 from set1, set1 is "
      + set1);

    // retainAll: set intersection
    set1.retainAll(set2);
    System.out.println("After retaining only the elements also in set2, "
      + "set1 is " + set1);
  }
}
The LinkedHashSet Class
LinkedHashSet extends HashSet with a linked-list
implementation that supports an ordering of the
elements in the set
The elements in a LinkedHashSet can be retrieved in
the order in which they were inserted into the set
Example: Using LinkedHashSet
This example creates a linked hash set filled with strings and prints
it; the elements appear in the order in which they were inserted.
import java.util.*;

public class TestLinkedHashSet {
  public static void main(String[] args) {
    // Create a linked hash set
    Set<String> set = new LinkedHashSet<>();

    // Add strings to the set
    set.add("London");
    set.add("Paris");
    set.add("New York");
    set.add("San Francisco");
    set.add("Beijing");
    set.add("New York"); // duplicate: the set is unchanged
    System.out.println(set);
  }
}
The SortedSet Interface and
the TreeSet Class
SortedSet is a sub-interface of Set, which guarantees that the
elements in the set are sorted
NavigableSet extends SortedSet to provide navigation methods
lower(e), floor(e), ceiling(e), and higher(e) that
return elements respectively less than, less than or equal, greater than or
equal, and greater than a given element and return null if there is no such
element
The pollFirst() and pollLast() methods remove and return the
first and last element in the tree set, respectively
headSet(toElement) and tailSet(fromElement) return
a portion of the set whose elements are less than toElement and greater
than or equal to fromElement, respectively
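The navigation methods listed above can be tried on a small tree set of integers. This is an illustrative sketch; the class name and the sample values 10, 20, 30, 40 are assumptions:

```java
import java.util.Arrays;
import java.util.TreeSet;

class NavigableSetDemo {
    // Builds the sample set {10, 20, 30, 40}.
    static TreeSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(10, 20, 30, 40));
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = sample();
        System.out.println(set.lower(20));    // 10   (strictly less than 20)
        System.out.println(set.floor(20));    // 20   (less than or equal to 20)
        System.out.println(set.ceiling(25));  // 30   (greater than or equal to 25)
        System.out.println(set.higher(40));   // null (nothing greater than 40)
        System.out.println(set.headSet(30));  // [10, 20]
        System.out.println(set.tailSet(30));  // [30, 40]
        System.out.println(set.pollFirst());  // 10, which is also removed from the set
        System.out.println(set);              // [20, 30, 40]
    }
}
```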
The SortedSet Interface and
the TreeSet Class
You can use an iterator to traverse the elements in the
sorted order
The elements can be sorted in two ways
One way is to use the Comparable interface
The other way (order by comparator) is to specify a
comparator for the elements in the set, if the class of
the elements does not implement the Comparable
interface or you don’t want to use the compareTo
method of a class that implements the
Comparable interface
The SortedSet Interface and
the TreeSet Class
TreeSet is a concrete class that implements the SortedSet and
NavigableSet interfaces
It provides the methods first() and last() for returning the first
and last elements in the set
You can add objects into a tree set as long as they can be compared with
each other
The following example creates a hash set filled with strings, and
then creates a tree set for the same strings
The strings are sorted in the tree set using the compareTo
method in the Comparable interface
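A minimal sketch of such a program, assuming the same five city names used in the earlier set examples (the class name and the helper sortedCopy are illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class TreeSetStringsDemo {
    // Copies any set of strings into a tree set; strings are ordered by String.compareTo.
    static TreeSet<String> sortedCopy(Set<String> source) {
        return new TreeSet<>(source);
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList(
            "London", "Paris", "New York", "San Francisco", "Beijing"));
        TreeSet<String> treeSet = sortedCopy(set);
        System.out.println("Sorted tree set: " + treeSet);
        System.out.println("first(): " + treeSet.first());
        System.out.println("last(): " + treeSet.last());
        System.out.println("pollFirst(): " + treeSet.pollFirst()); // removes Beijing
        System.out.println("pollLast(): " + treeSet.pollLast());   // removes San Francisco
        System.out.println("New tree set: " + treeSet);
    }
}
```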
import java.util.*;
pollFirst(): Beijing
pollLast(): San Francisco
New tree set: [London, New York, Paris]
Example: Using Comparator to
Sort Elements in a Set
The following example creates a tree set of geometric
objects
The geometric objects are sorted using the compare
method in the Comparator interface
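A sketch of the idea, using a hypothetical Shape class as a stand-in for the geometric-object classes (its fields and the comparison by area are assumptions made for this illustration). Shape does not implement Comparable, so the tree set is given a Comparator:

```java
import java.util.Comparator;
import java.util.TreeSet;

class ComparatorSetDemo {
    // Hypothetical stand-in for a geometric object; it does not implement Comparable.
    static class Shape {
        final String name;
        final double area;

        Shape(String name, double area) {
            this.name = name;
            this.area = area;
        }

        @Override
        public String toString() {
            return name + "(" + area + ")";
        }
    }

    // Creates an empty tree set that orders shapes by area.
    static TreeSet<Shape> byArea() {
        return new TreeSet<>(Comparator.comparingDouble((Shape s) -> s.area));
    }

    public static void main(String[] args) {
        TreeSet<Shape> shapes = byArea();
        shapes.add(new Shape("rectangle", 12.0));
        shapes.add(new Shape("circle", 3.14));
        shapes.add(new Shape("square", 4.0));
        // Same area as the square: the comparator returns 0, so it counts as a duplicate
        shapes.add(new Shape("tile", 4.0));
        for (Shape s : shapes) {
            System.out.println(s); // smallest area first
        }
    }
}
```

Note that a tree set decides equality through the comparator, so two shapes with the same area cannot both be stored in this set.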
import java.util.*;
import java.util.Comparator;
import java.util.*;
public static void main(String[] args) {
// Create a tree set, and test its performance
Collection<Integer> set3 = new TreeSet<>(list);
System.out.println("Member test time for tree set is " +
getTestTime(set3) + " milliseconds");
System.out.println("Remove element time for tree set is " +
getRemoveTime(set3) + " milliseconds\n");
Member test time for hash set is 20 milliseconds
Remove element time for hash set is 27 milliseconds
Case Study: Counting Keywords
This application counts the number of
keywords in a Java source file
For each word in a Java source file, we need to
determine whether the word is a keyword
To handle this efficiently, store all the keywords
in a HashSet and use the contains
method to test if a word is in the keyword set
import java.util.*;
import java.io.*;
    Set<String> keywordSet =
      new HashSet<>(Arrays.asList(keywordString));
    int count = 0;
    while (input.hasNext()) {
      // Read the next whitespace-delimited token and split it into words
      String token = input.next();
      String[] words = token.split("\\W");
      for (String word : words)
        if (keywordSet.contains(word))
          count++;
    }
    return count;
  }
}
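A self-contained sketch of the same technique, reading from a string instead of a file so it can be run directly. The keyword list is a small, deliberately incomplete subset chosen for illustration, and the class and method names are assumptions:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class KeywordCountDemo {
    // A small, incomplete subset of the Java keywords, for illustration only.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "class", "public", "static", "void", "int", "return", "if", "else", "for", "while"));

    // Counts keyword occurrences in the given source text.
    static int countKeywords(String source) {
        int count = 0;
        try (Scanner input = new Scanner(source)) {
            while (input.hasNext()) {
                // Each token is split on non-word characters, e.g. "f()" gives "f"
                for (String word : input.next().split("\\W")) {
                    if (KEYWORDS.contains(word)) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "public class A { static int f() { return 1; } }";
        System.out.println(countKeywords(source)); // 5
    }
}
```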
The Map Interface
The Map interface maps keys to values
The keys are like indexes, but can be anything (not restricted to
integers)
In a List, the indexes are integers
In a Map, the keys can be any objects
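A minimal sketch with strings as keys (the student names and GPA values are made up for illustration). Putting a value under an existing key replaces the old value, and looking up a missing key returns null:

```java
import java.util.HashMap;
import java.util.Map;

class MapBasicsDemo {
    // Builds a small map from student name to GPA (sample data only).
    static Map<String, Double> sampleGpas() {
        Map<String, Double> gpa = new HashMap<>();
        gpa.put("Smith", 3.7);
        gpa.put("Lewis", 3.9);
        gpa.put("Smith", 3.8); // same key: the old value 3.7 is replaced
        return gpa;
    }

    public static void main(String[] args) {
        Map<String, Double> gpa = sampleGpas();
        System.out.println(gpa.get("Smith"));         // 3.8
        System.out.println(gpa.get("Cook"));          // null: no such key
        System.out.println(gpa.containsKey("Lewis")); // true
        System.out.println(gpa.size());               // 2
    }
}
```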
Map Interface and Class Hierarchy
There are three types of maps: HashMap, LinkedHashMap, and
TreeMap
The common features of these maps are defined in the Map interface
The Map Interface UML Diagram
The Map interface provides the methods for querying,
updating, and obtaining a collection of values and a set of keys
Concrete Map Classes
The Map Interface UML Diagram
You can obtain a set of the keys in the map using the
keySet() method
The entrySet() method returns a set of entries
The entries are instances of the Map.Entry interface,
where Entry is a nested interface of the Map interface
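A short sketch of keySet(), values(), and entrySet() (the class name, the helper describe, and the sample grades are assumptions). A TreeMap is used here only so that the printed order is predictable:

```java
import java.util.Map;
import java.util.TreeMap;

class EntrySetDemo {
    // Walks the entry set and formats each entry as key=value;
    static String describe(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> grades = new TreeMap<>();
        grades.put("Smith", 100);
        grades.put("Anderson", 91);
        System.out.println(grades.keySet());  // [Anderson, Smith]
        System.out.println(grades.values());  // [91, 100]
        System.out.println(describe(grades)); // Anderson=91;Smith=100;
    }
}
```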
HashMap and TreeMap
The HashMap, LinkedHashMap and TreeMap classes are three
concrete implementations of the Map interface
The HashMap class is efficient for locating a value, inserting a
mapping, and deleting a mapping
LinkedHashMap extends HashMap with a linked-list
implementation that supports an ordering of the entries in the map
The entries in a LinkedHashMap can be retrieved either in the
order in which they were inserted into the map (insertion
order) or in the order in which they were last accessed, from least
recently to most recently accessed (access order)
The TreeMap class, implementing SortedMap, is efficient for
traversing the keys in a sorted order using the Comparable
interface or the Comparator interface
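The two orderings can be compared in a small sketch (the class and helper names are illustrative; the names and grades are sample data). The TreeMap iterates by sorted key, while a LinkedHashMap created with access order moves an entry to the end each time it is read:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderingDemo {
    // Adds the same four sample entries to any map.
    static void fill(Map<String, Integer> map) {
        map.put("Smith", 100);
        map.put("Anderson", 91);
        map.put("Lewis", 99);
        map.put("Cook", 89);
    }

    // Key order of an access-ordered LinkedHashMap after reading "Lewis".
    static List<String> keysAfterAccess() {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true); // true = access order
        fill(map);
        map.get("Lewis"); // Lewis becomes the most recently accessed entry
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> treeMap = new TreeMap<>();
        fill(treeMap);
        System.out.println(treeMap);           // {Anderson=91, Cook=89, Lewis=99, Smith=100}
        System.out.println(keysAfterAccess()); // [Smith, Anderson, Cook, Lewis]
    }
}
```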
Example: Using HashMap and
TreeMap
This example creates a hash map with the student’s name as its key and the
grade as its value
The program then creates a tree map from the hash map
and displays the entries in ascending order of the keys
Finally, the program creates a linked hash map with
access order, adds the same entries to the map, and
displays the entries
E.g., the entry with the key Lewis is last accessed, so it
is displayed last
import java.util.*;
// Create a LinkedHashMap
Map<String, Integer> linkedHashMap =
new LinkedHashMap<>(16, 0.75f, true);
linkedHashMap.put("Smith", 100);
linkedHashMap.put("Anderson", 91);
linkedHashMap.put("Lewis", 99);
linkedHashMap.put("Cook", 89);
// Display the grade for Lewis
System.out.println("\nThe grade for " + "Lewis is " +
linkedHashMap.get("Lewis"));
Output:
Display entries in HashMap
{Cook=89, Smith=100, Lewis=99, Anderson=91}
if (key.length() > 0) {
if (!map.containsKey(key)) {
map.put(key, 1);
}
else {
int value = map.get(key);
value++;
map.put(key, value);
}
}
}
// Display key and value for each entry
map.forEach((k, v) -> System.out.println(k + "\t" + v));
}
}
Output:
a 2
class 1
fun 1
good 3
have 3
morning 1
visit 1
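A compact, self-contained variant of the counting step, sketched with Map.merge instead of the containsKey/get/put sequence. The sample sentence is an assumption, chosen because it yields the same counts as the output above; a TreeMap keeps the words in alphabetical order:

```java
import java.util.Map;
import java.util.TreeMap;

class WordCountDemo {
    // Counts case-insensitive word occurrences; words are separated by
    // whitespace or punctuation.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> map = new TreeMap<>();
        for (String word : text.split("[\\s\\p{Punct}]+")) {
            String key = word.toLowerCase();
            if (key.length() > 0) {
                map.merge(key, 1, Integer::sum); // insert 1, or add 1 to the old count
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "Good morning. Have a good class. Have a good visit. Have fun!";
        // Display key and value for each entry
        countWords(text).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```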