https://www.linkedin.com/in/artimumbarkar/
Arti Mumbarkar
Internal Implementation of Java collections
Data structures used:
• ArrayList: Object[]
• LinkedList: doubly-linked-list
• HashMap: (Java 7 and earlier) array of singly-linked lists; (Java 8+) array of singly-linked lists that convert to red-black trees in oversized buckets
• LinkedHashMap: HashMap's hash table + a doubly-linked list running through all entries
• TreeMap: Red-Black Tree
• HashSet: HashMap
• LinkedHashSet: LinkedHashMap
• TreeSet: TreeMap
• PriorityQueue: array-based binary heap (priority heap)
The internal working of Java Collections varies depending on the specific implementation class,
as different classes utilize different underlying data structures to achieve their respective
functionalities.
1. ArrayList:
• Internally uses a dynamic array (Object[]) to store elements.
• When the array reaches its capacity, a new, larger array is allocated (typically 1.5 times
the original size), and elements are copied to the new array.
• Provides O(1) average time complexity for element access and O(n) for insertions or
deletions in the middle, as it requires shifting elements.
Underlying Array:
An ArrayList internally uses an Object[] array to store its elements. This array is the core data
structure that holds the actual data.
Initial Capacity and Growth:
• When an ArrayList is created without specifying an initial capacity, it
defaults to a capacity of 10 (since Java 8, the backing array is allocated
lazily, on the first add()).
• As elements are added to the ArrayList, if the number of elements exceeds the
current capacity of the underlying array, the ArrayList automatically resizes.
• The resizing process involves creating a new, larger array (roughly 1.5 times
the original capacity: oldCapacity + (oldCapacity >> 1)) and then copying all
the elements from the old array to the new, larger array. The old array is then
discarded and garbage collected.
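The resize-and-copy mechanism can be illustrated with a minimal sketch. This is not the real ArrayList code; the class name GrowableArray and its methods are invented for illustration, but the growth formula mirrors the one the JDK uses:

```java
import java.util.Arrays;

// Simplified sketch of ArrayList's growth strategy (illustrative only,
// not the real java.util.ArrayList implementation).
class GrowableArray {
    private Object[] elements = new Object[10]; // default initial capacity
    private int size = 0;

    void add(Object e) {
        if (size == elements.length) {
            // Grow by ~1.5x: oldCapacity + (oldCapacity >> 1)
            int newCapacity = elements.length + (elements.length >> 1);
            elements = Arrays.copyOf(elements, newCapacity); // copy to new array
        }
        elements[size++] = e;
    }

    int size()     { return size; }
    int capacity() { return elements.length; }
}
```

Adding an 11th element to a capacity-10 array triggers one resize, leaving a capacity of 15 while the size is 11 — which is exactly the size/capacity distinction described below.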
Adding Elements:
• When an element is added using the add() method, the ArrayList first checks if
there is sufficient space in the underlying array.
• If space is available, the element is simply placed at the next available index.
• If the array is full, the resizing mechanism described above is triggered before
the element is added.
Removing Elements:
• When an element is removed using the remove() method (either by index or by
object), the space created by the removal must be filled.
• This is achieved by shifting all subsequent elements to the left to fill the gap,
typically using System.arraycopy().
Accessing Elements:
• Accessing elements by index using get(int index) is highly efficient (O(1) time
complexity) because it directly accesses the element at the specified index in
the underlying array.
Capacity vs. Size:
• Size: Refers to the actual number of elements currently stored in the ArrayList.
• Capacity: Refers to the total size of the underlying internal array, representing
the maximum number of elements the ArrayList can hold without resizing.
2. LinkedList:
• Internally uses a doubly-linked list.
• Each element (node) stores the data and references to the next and previous nodes.
• Offers O(1) time complexity for insertions and deletions at the beginning or end, and
O(n) for access or operations in the middle, as it requires traversing the list.
Internal Structure:
• Nodes: Each element stored in a LinkedList is encapsulated within a Node object.
• Node Components: Each Node contains three primary components:
o item: The actual data element stored in the node.
o next: A reference (pointer) to the subsequent Node in the sequence.
o prev: A reference (pointer) to the preceding Node in the sequence.
• Head and Tail References: The LinkedList itself maintains references to the first (head)
and last (tail) Node objects in the list. The prev reference of the first node and
the next reference of the last node are null.
How it Works:
Adding Elements:
• When an element is added, a new Node is created to hold the element.
• This new Node is then linked into the existing sequence by updating
the next and prev references of the surrounding nodes and the new node itself.
• Adding at the beginning (addFirst) or end (addLast) is efficient (O(1)) as it only
involves updating the head/tail references and a few node pointers. Adding at a
specific index involves traversing to that index first.
Removing Elements:
• When an element is removed, the next and prev references of the surrounding
nodes are adjusted to bypass the removed node, effectively removing it from the
sequence.
• Similar to adding, removing from the beginning (removeFirst) or end
(removeLast) is efficient (O(1)). Removing from a specific index requires
traversal.
Accessing Elements:
• Unlike ArrayList which provides direct access via index
(O(1)), LinkedList requires sequential traversal to access elements by index
(O(n)). To find the element at a given index, the list must be traversed from either
the head or the tail, whichever is closer to the target index.
Key Characteristics:
Dynamic Size:
LinkedList can grow or shrink dynamically as elements are added or removed, without the need
for resizing like arrays.
Efficient Insertions/Deletions at Ends:
Operations like addFirst(), addLast(), removeFirst(), and removeLast() are very efficient (O(1)).
Inefficient Random Access:
Accessing elements by index (get(index)) is less efficient compared to ArrayList due to the
sequential traversal requirement.
Higher Memory Overhead:
Each element requires additional memory to store the next and prev references within its Node,
leading to higher memory consumption compared to ArrayList for the same number of
elements.
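These characteristics show up directly in the API. A short example of the O(1) end operations and the O(n) index access discussed above:

```java
import java.util.LinkedList;

public class LinkedListEndsDemo {
    public static void main(String[] args) {
        LinkedList<Integer> list = new LinkedList<>();

        // O(1): only the head/tail references and a few node pointers change
        list.addFirst(2);   // [2]
        list.addFirst(1);   // [1, 2]
        list.addLast(3);    // [1, 2, 3]

        int first = list.removeFirst(); // 1, O(1)
        int last  = list.removeLast();  // 3, O(1)

        // O(n): get(index) traverses node-by-node from head or tail
        int middle = list.get(0);       // 2

        System.out.println(first + " " + middle + " " + last);
    }
}
```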
3. HashSet:
• Internally uses a HashMap to store elements.
• Each element added to a HashSet is stored as a key in the underlying HashMap, with a
dummy Object as the value.
• Relies on hashing for efficient storage and retrieval, ensuring uniqueness of elements.
Internal HashMap:
When a HashSet object is created, it internally instantiates a HashMap. This HashMap is the
actual data structure used to store the elements added to the HashSet.
Element Storage as Keys:
When an element is added to the HashSet using the add() method, it is stored as a key in the
internal HashMap.
Dummy Value:
Since HashMap requires both a key and a value, a constant, dummy Object (often
named PRESENT in the source code) is used as the value associated with each key in the
internal HashMap. This value serves no functional purpose other than fulfilling the HashMap's
requirement for a value.
Uniqueness Enforcement:
The HashMap's fundamental property of having unique keys is what guarantees the uniqueness
of elements in a HashSet. When add(element) is called:
• The hashCode() method of the element is called to determine its hash code.
• This hash code is used to find the appropriate "bucket" or index within
the HashMap's internal array.
• If a collision occurs (multiple elements hash to the same bucket),
the equals() method of the element is called to compare it with existing
elements in that bucket.
• If both the hash code and the equals() method indicate that the element is
already present, the add() method returns false, and the duplicate element is not
added. Otherwise, it is added as a new key-value pair.
Performance:
The use of HashMap and its hashing mechanism provides HashSet with average O(1) time
complexity for add(), remove(), and contains() operations, assuming a
good hashCode() implementation for the stored objects.
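The uniqueness check described above is visible in the return value of add(), which reports whether the element was actually inserted as a new key:

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetUniquenessDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();

        // First add: "apple" becomes a key in the internal HashMap
        boolean added = set.add("apple");      // true

        // Duplicate: hashCode() locates the bucket, equals() finds the match,
        // so the element is rejected and add() returns false
        boolean duplicate = set.add("apple");  // false

        System.out.println(added + " " + duplicate + " size=" + set.size());
    }
}
```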
4. HashMap:
• Internally uses a hash table data structure, which is an array of linked lists (or balanced
trees in Java 8+ for larger buckets).
• Uses a hash function to calculate an index (bucket) for a given key.
• Handles collisions by chaining (adding entries to the linked list/tree within the same
bucket).
• Dynamically resizes the underlying array when the load factor exceeds a threshold.
• Provides O(1) average time complexity for put, get, and remove operations.
Key Components:
Buckets (Table Array):
Internally, a HashMap uses an array of "buckets" to store its entries. This array is often referred
to as the table.
Nodes (Entries):
Each entry in the HashMap is stored as a Node object (or Entry in older Java
versions). A Node typically contains the key, value, hash of the key, and a reference to the next
node in case of collisions.
Internal Working:
Hashing:
When a key-value pair is added to the HashMap, the hashCode() method of the key object is
invoked to generate an integer hash code. This hash code is then used to determine the index of
the bucket where the entry should be stored within the table array.
Collision Handling:
It is possible for different keys to produce the same hash code or map to the same bucket
index. This is known as a "collision." To handle collisions, HashMap uses chaining:
• Linked Lists: If multiple entries map to the same bucket, they are stored in a
linked list at that bucket's index. New entries are typically added to the end of
the list.
• Treeification (Java 8+): To improve performance in scenarios with frequent
collisions, Java 8 introduced a mechanism where if the number of entries in a
single bucket's linked list exceeds a certain TREEIFY_THRESHOLD (default 8),
the linked list is converted into a balanced binary tree (Red-Black Tree). This
improves search time from O(n) in a linked list to O(log n) in a tree.
equals() Method:
When retrieving a value or handling collisions, the equals() method of the key is used to ensure
that the correct value associated with a specific key is retrieved or to differentiate between keys
that have the same hash code but are not equal.
Resizing (Rehashing):
HashMap has a load factor (default 0.75), which determines when the map should increase its
capacity. When the number of entries exceeds the current capacity multiplied by the load
factor, the HashMap resizes itself by creating a new, larger table array (typically double the size)
and rehashes all existing entries into the new array. This process ensures efficient distribution of
entries and reduces the likelihood of excessive collisions.
Operations:
• put(key, value):
Calculates the hash of the key, determines the bucket index, and inserts the entry. Handles
collisions by chaining or treeification.
• get(key):
Calculates the hash of the key, determines the bucket index, and then traverses the linked list or
tree at that bucket to find the matching key using equals().
• remove(key):
Similar to get, locates the entry and removes it from the linked list or tree.
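The interplay of hashCode() and equals() in collision handling can be demonstrated with a deliberately bad key class. The class Key below is invented for illustration: every instance hashes to the same bucket, so equals() alone must distinguish entries within the chain:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Illustrative key: a constant hashCode() forces every instance into
    // the same bucket, so lookups must walk the chain and call equals().
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; } // force collisions
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(this.name);
        }
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1); // both land in the same bucket...
        map.put(new Key("b"), 2); // ...and are chained together

        // get() walks the chain, using equals() to pick the right entry
        System.out.println(map.get(new Key("a")) + " " + map.get(new Key("b")));
    }
}
```

Despite the collisions, both values are retrievable — though with a real key type this pattern would degrade lookups toward O(n) (or O(log n) after treeification).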
5. PriorityQueue:
• Internally uses a min-heap (or max-heap depending on the comparator) data structure,
typically implemented using an array.
• Elements are arranged based on their natural ordering or a custom Comparator.
• Operations like add and poll maintain the heap property, ensuring that the smallest (or
largest) element is always at the root.
Here's a breakdown of its internal working:
Min-Heap Representation:
The PriorityQueue maintains the heap property where the value of each parent node is less than
or equal to the value of its child nodes. This ensures that the root of the heap (the first element
in the underlying array) always holds the element with the highest priority (lowest value).
Array Storage:
The heap is stored in a simple array. The relationship between parent and child nodes is
determined by their indices in the array. For example, for a node at index i, its left child is at
2*i + 1 and its right child is at 2*i + 2. Its parent is at (i-1) / 2.
Insertion (add/offer):
• When an element is added, it is initially placed at the end of the array.
• To maintain the min-heap property, a "heapify-up" operation is performed. The
newly added element is repeatedly compared with its parent. If the element is
smaller than its parent, they are swapped. This process continues until the
element is in its correct position relative to its parent, or it becomes the root.
Extraction (poll/remove):
• To remove the highest priority element, the root of the heap (the first element in
the array) is retrieved.
• The last element in the array is then moved to the root position.
• A "heapify-down" operation is performed. The new root is compared with its
children. If it's larger than either child, it's swapped with the smaller child. This
process continues down the heap until the element is in its correct position, or it
has no children.
Dynamic Resizing:
The underlying array automatically grows in size as needed to accommodate new elements,
preventing overflow. (Note that java.util.PriorityQueue does not shrink its array when
elements are removed; freed slots are simply cleared.)
Ordering:
By default, PriorityQueue orders elements based on their natural ordering (using
the Comparable interface). However, a custom Comparator can be provided during
construction to define a different ordering, such as descending order or a specific custom
priority logic.
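Both orderings can be seen side by side: the default min-heap via natural ordering, and a max-heap obtained by passing a reversed Comparator at construction:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityQueueDemo {
    public static void main(String[] args) {
        // Min-heap by natural ordering: poll() always returns the smallest
        PriorityQueue<Integer> min = new PriorityQueue<>();
        min.add(5);
        min.add(1);
        min.add(3);
        System.out.println(min.poll()); // 1 (the root of the heap)

        // Max-heap via a Comparator supplied at construction
        PriorityQueue<Integer> max = new PriorityQueue<>(Comparator.reverseOrder());
        max.add(5);
        max.add(1);
        max.add(3);
        System.out.println(max.poll()); // 5
    }
}
```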
6. TreeMap:
Java's TreeMap internally utilizes a Red-Black Tree, which is a self-balancing binary search
tree. This underlying data structure ensures that TreeMap provides efficient operations
(insertion, deletion, retrieval) with a time complexity of O(log n), even in the worst-case
scenarios.
Here's a breakdown of its internal working:
Red-Black Tree Implementation:
TreeMap does not use hashing or an array-based structure like HashMap. Instead, it builds a
Red-Black Tree, where each key-value pair is stored as a node.
Self-Balancing Mechanism:
The "Red-Black" designation refers to the color property assigned to each node (either red or
black). These colors, along with specific rules governing their arrangement, allow the tree to
automatically rebalance itself after insertions or deletions. This balancing prevents the tree
from becoming skewed, which would degrade performance to O(n) in the worst case for a
regular binary search tree.
• Sorted Ordering:
Keys in a TreeMap are always stored in sorted order. This ordering can be:
o Natural Ordering: If no custom Comparator is provided, keys are sorted based
on their natural ordering (e.g., numerically for integers, alphabetically for
strings). This requires keys to implement the Comparable interface.
o Custom Ordering: A Comparator can be supplied during TreeMap creation to
define a specific sorting logic for the keys.
• Node Structure:
Each node in the Red-Black Tree within TreeMap typically contains:
o The key of the entry.
o The value associated with the key.
o A color attribute (red or black).
o References to its parent, left child, and right child nodes.
• Operation Efficiency:
The self-balancing nature of the Red-Black Tree ensures that operations like put(), get(),
and remove() maintain their O(log n) time complexity because the maximum height of the tree
remains logarithmic to the number of nodes.
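The sorted ordering is easy to observe: no matter the insertion order, the Red-Black Tree keeps keys sorted, so iteration and the first/last accessors reflect key order:

```java
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> map = new TreeMap<>(); // natural key ordering
        map.put("banana", 2);
        map.put("apple", 1);
        map.put("cherry", 3);

        // Keys come out sorted regardless of insertion order
        System.out.println(map.keySet());   // [apple, banana, cherry]
        System.out.println(map.firstKey()); // apple
        System.out.println(map.lastKey());  // cherry
    }
}
```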
7. LinkedHashSet:
LinkedHashSet in Java provides a set implementation that maintains insertion order while
ensuring uniqueness of elements. Its internal working combines the features of a hash table for
fast lookups and a doubly linked list for order preservation.
Internal Structure:
Underlying LinkedHashMap:
LinkedHashSet internally uses a LinkedHashMap. When a LinkedHashSet object is created, it
instantiates a LinkedHashMap. The elements added to the LinkedHashSet become keys in this
internal LinkedHashMap, and a dummy Object (a constant PRESENT object) serves as the value
for all entries.
Hash Table:
The LinkedHashMap provides the hash table functionality, allowing for efficient storage and
retrieval of elements based on their hash codes. This ensures that only unique elements are
stored, as duplicate keys in a HashMap are not allowed.
Doubly Linked List:
The LinkedHashMap also maintains a doubly linked list that links all its entries in the order they
were inserted. This linked list is distinct from the hash table structure and is responsible for
preserving the insertion order. Each entry in the linked list holds references to its previous and
next elements, forming a chain that reflects the order of insertion.
How Operations Work:
Adding Elements:
When an element is added to a LinkedHashSet, it is internally added as a key to the
underlying LinkedHashMap with a dummy value. The LinkedHashMap then handles the hashing
and storage in the hash table, and simultaneously, a new entry is created in its internal doubly
linked list, maintaining the insertion order.
Retrieving Elements:
Iteration over a LinkedHashSet traverses the elements in the order they were inserted by
following the links in the internal doubly linked list of the LinkedHashMap.
Uniqueness:
The hash table component ensures that only unique elements are stored: an attempt to add a
duplicate element finds the existing key via hashCode() and equals(), leaves its position in
the internal linked list unchanged, and causes add() to return false.
Key Differences from HashSet:
The primary difference between LinkedHashSet and HashSet lies in their internal map
implementation. HashSet uses a HashMap internally, which does not guarantee insertion order,
while LinkedHashSet uses a LinkedHashMap to maintain the insertion order. This order
preservation comes at a slight performance overhead compared to HashSet for basic
operations, but benefits iteration performance when order is important.
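A small example makes the order guarantee concrete: iteration follows the internal doubly linked list, so elements come out in insertion order, and a re-added duplicate keeps its original position:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;

public class LinkedHashSetOrderDemo {
    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("charlie");
        set.add("alpha");
        set.add("bravo");
        set.add("charlie"); // duplicate: rejected, original position kept

        // Iteration follows the doubly linked list: insertion order
        System.out.println(new ArrayList<>(set)); // [charlie, alpha, bravo]
    }
}
```

With a plain HashSet, the same adds could iterate in any order determined by the hash table layout.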
8. TreeSet:
The Java TreeSet internally utilizes a TreeMap to store and manage its elements. This
underlying TreeMap is crucial to TreeSet's functionality, particularly its ability to maintain
elements in a sorted order and prevent duplicates.
Key aspects of TreeSet's internal working:
Underlying TreeMap:
When a TreeSet is instantiated, it internally creates a TreeMap instance. All operations
performed on the TreeSet (e.g., add, remove, contains) are delegated to this internal TreeMap.
Self-Balancing Binary Search Tree (Red-Black Tree):
The TreeMap itself is implemented using a self-balancing binary search tree, specifically a Red-
Black Tree. This data structure ensures that the tree remains balanced during insertions and
deletions, resulting in efficient logarithmic time complexity (O(log N)) for operations like adding,
removing, and searching elements.
Element Storage:
When an element is added to a TreeSet, it is stored as a key in the internal TreeMap, with a
dummy Object as its value. This leverages TreeMap's key-based sorting and uniqueness
properties.
Sorting:
TreeSet maintains elements in a sorted order. By default, it uses the natural ordering of the
elements (if they implement the Comparable interface). Alternatively, a
custom Comparator can be provided during TreeSet creation to define a specific sorting order.
No Duplicates:
Since TreeSet uses a TreeMap, which does not allow duplicate keys, TreeSet inherently prevents
the storage of duplicate elements. If an attempt is made to add an element that already exists,
the operation will not modify the set.
NavigableSet and SortedSet Implementation:
TreeSet implements the NavigableSet interface, which extends SortedSet. This provides
methods for navigating the sorted set (e.g., floor(), ceiling(), lower(), higher()) and retrieving
subsets based on ranges (e.g., subSet(), headSet(), tailSet()).
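The navigation methods listed above, all backed by the sorted Red-Black Tree, can be exercised directly:

```java
import java.util.TreeSet;

public class TreeSetNavigationDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(10);
        set.add(20);
        set.add(30);

        // NavigableSet queries answered via the sorted tree
        System.out.println(set.floor(25));   // 20 (greatest element <= 25)
        System.out.println(set.ceiling(25)); // 30 (smallest element >= 25)
        System.out.println(set.headSet(30)); // [10, 20] (elements < 30)
        System.out.println(set.add(20));     // false (duplicate rejected)
    }
}
```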
9. LinkedHashMap:
LinkedHashMap in Java extends HashMap and provides the additional functionality of
maintaining the order of elements. Its internal working combines the principles of a hash table
with a doubly linked list.
1. Hash Table Foundation:
• Like HashMap, LinkedHashMap uses a hash table for efficient storage and retrieval of
key-value pairs.
• When a key-value pair is inserted, the hashCode() of the key is used to determine the
bucket (array index) where the entry will be stored.
• Collisions (multiple keys hashing to the same bucket) are handled by storing entries in a
linked list or, in Java 8 and later, a balanced tree structure (TreeNode) within that bucket
if the number of entries exceeds a certain threshold.
2. Doubly Linked List for Ordering:
• The key distinction of LinkedHashMap lies in its use of a doubly linked list that connects
all entries in the map.
• Each Entry object within the LinkedHashMap (which extends HashMap.Node) contains
two additional references: before and after.
• These before and after pointers link entries to their preceding and succeeding elements
in the linked list, respectively.
• This linked list can maintain either:
o Insertion Order (default): Entries are linked in the order they were initially
inserted into the map.
o Access Order: If configured in the constructor, entries are reordered based on
their access (get or put operations), moving recently accessed elements to the
end of the list.
3. Maintaining Order during Operations:
• put():
When a new entry is added, it is appended to the end of the doubly linked list, and
its before and after pointers are updated to link it correctly. If an existing key is updated, the
associated entry might be moved in the access-ordered list.
• get():
In an access-ordered LinkedHashMap, accessing an entry (using get()) causes it to be moved to
the end of the linked list, marking it as recently accessed.
• remove():
When an entry is removed, its before and after pointers are adjusted to bypass the removed
entry, effectively removing it from the linked list while also removing it from the hash table.
4. removeEldestEntry() for LRU Cache:
• LinkedHashMap provides a protected method removeEldestEntry(Map.Entry eldest) that
can be overridden.
• This method is invoked after an entry is added, allowing custom logic to determine
whether the "eldest" (oldest in insertion or access order) entry should be removed,
making it suitable for implementing Least Recently Used (LRU) caches.
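A minimal LRU cache sketch along these lines (the class name LruCache and the capacity choice are illustrative): an access-ordered LinkedHashMap plus an overridden removeEldestEntry() that evicts once a size cap is exceeded:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch built on LinkedHashMap's access order.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves entries to the end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }
}
```

With a capacity of 2, putting "a" and "b", touching "a" via get(), and then putting "c" evicts "b" — the least recently used entry — while "a" survives.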
General Principles:
• Hashing:
Classes like HashSet and HashMap rely on the hashCode() and equals() methods of objects for
efficient storage and retrieval.
• Dynamic Resizing:
Many collection implementations, such as ArrayList and HashMap, dynamically adjust their
underlying data structures' capacity to accommodate new elements.
• Iterators:
Collections provide Iterators (and Spliterators in Java 8+) for traversing and accessing elements.
• Synchronization:
Some legacy collections like Vector and Hashtable are synchronized, providing thread safety,
while newer collections often require external synchronization or use concurrent alternatives
like ConcurrentHashMap.
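The two thread-safety approaches can be contrasted in a short sketch: wrapping a non-thread-safe collection with a synchronized view, versus using a concurrent implementation designed for contention:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafetyDemo {
    public static void main(String[] args) {
        // External synchronization: every method of the wrapper locks the list
        List<Integer> syncList = Collections.synchronizedList(new ArrayList<>());
        syncList.add(1);

        // Concurrent alternative: fine-grained internal synchronization,
        // no single global lock
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
        counts.merge("hits", 1, Integer::sum);

        System.out.println(counts.get("hits")); // 2
    }
}
```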