PYTHON DATA STRUCTURES AND COLLECTIONS
This comprehensive guide covers built-in Python data structures and additional collection types,
including implementation details and time complexities.
TABLE OF CONTENTS:
1. Lists
2. Tuples
3. Dictionaries
4. Sets
5. Strings (as collections)
6. Collections module
Counter
defaultdict
OrderedDict
deque
namedtuple
7. Heapq module (Priority Queues)
8. Custom Data Structures
Linked Lists
Trees
1. LISTS
Lists are ordered, mutable collections that can contain elements of different types.
They are implemented as dynamic arrays.
Time Complexities:
Access by index: O(1)
Insertion/deletion at the end: Amortized O(1)
Insertion/deletion at beginning or middle: O(n)
Search: O(n)
Length: O(1)
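The amortized O(1) append follows from the dynamic-array design: the buffer is over-allocated, so most appends need no reallocation. The sketch below illustrates this under the assumption that it runs on CPython, where sys.getsizeof reports the size of the list's own buffer; the exact sizes are an implementation detail and may differ between versions.

```python
import sys

# Record the list's reported size after each append.
# Assumption: CPython, where sys.getsizeof reflects the allocated buffer.
items = []
sizes = []
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# The buffer grows in occasional jumps rather than on every append,
# which is why the cost of append averages out to O(1).
distinct_sizes = len(set(sizes))
print(f"{len(items)} appends produced {distinct_sizes} distinct buffer sizes")
```

Only a handful of the 64 appends change the reported size; the rest reuse capacity that was already reserved.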
Creating lists
python
empty_list = []
numbers = [1, 2, 3, 4, 5]
mixed_list = [1, "hello", 3.14, True]
Common operations
python
numbers.append(6) # Add to end: O(1) amortized
numbers.insert(0, 0) # Insert at specific position: O(n)
popped_item = numbers.pop() # Remove and return last item: O(1)
numbers.pop(0) # Remove from beginning: O(n)
numbers.remove(3) # Remove first occurrence of value: O(n)
numbers.sort() # In-place sort: O(n log n)
sorted_numbers = sorted(numbers) # Return sorted copy: O(n log n)
numbers.reverse() # Reverse in-place: O(n)
found_index = numbers.index(5) # Find index of first occurrence: O(n)
count = numbers.count(2) # Count occurrences: O(n)
List comprehensions
python
squares = [x**2 for x in range(10)]
even_squares = [x**2 for x in range(10) if x % 2 == 0]
Slicing: O(k) where k is the slice length
python
first_three = numbers[:3]
last_three = numbers[-3:]
every_second = numbers[::2]
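A slice copies references to k elements into a new list, which is where the O(k) cost comes from. The copy is shallow, as this small sketch with made-up data suggests: the outer list is new, but nested objects are shared.

```python
matrix = [[1, 2], [3, 4]]   # illustrative nested list
copy = matrix[:]            # full slice: new outer list, same inner lists

copy.append([5, 6])         # changes only the copy
copy[0].append(99)          # changes the inner list shared by both

print(matrix)
print(copy)
```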
2. TUPLES
Tuples are ordered, immutable collections that can contain elements of different types.
Similar to lists but cannot be modified after creation.
Time Complexities:
Access by index: O(1)
Search: O(n)
Length: O(1)
Creating tuples
python
empty_tuple = ()
single_item = (1,) # Note: comma is needed for single-item tuples
coordinates = (10, 20)
mixed_tuple = (1, "hello", 3.14)
Tuple unpacking
python
x, y = coordinates
a, b, c = mixed_tuple
Common operations
python
length = len(coordinates)
concatenated = coordinates + (30, 40) # Creates new tuple
repeated = coordinates * 3 # Creates new tuple
element_index = mixed_tuple.index("hello") # O(n)
element_count = mixed_tuple.count(1) # O(n)
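One practical consequence of immutability is that a tuple of hashable elements is itself hashable, so it can be used as a dictionary key. The following sketch uses a hypothetical grid mapping to show this, and also shows that immutability is shallow.

```python
# A tuple of hashable elements can be a dictionary key; a list cannot.
grid = {(0, 0): "start", (2, 3): "goal"}  # hypothetical grid data
label = grid[(2, 3)]

try:
    grid[[1, 1]] = "wall"   # lists are mutable and therefore unhashable
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# Immutability is shallow: a mutable element inside a tuple can still change.
record = ("Alice", [85, 90])
record[1].append(92)

print(label, list_key_allowed, record)
```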
3. DICTIONARIES
Dictionaries are mutable collections of key-value pairs. Since Python 3.7 they preserve insertion order.
Implemented as hash tables in Python.
Time Complexities (average case):
Access/insert/delete by key: O(1)
Search by value: O(n)
Length: O(1)
Worst case for all operations becomes O(n) if many hash collisions occur.
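To make the collision worst case concrete, here is a small sketch with a deliberately bad key type. CollidingKey is a hypothetical class written for this illustration; every instance returns the same hash, so the dictionary stays correct but each lookup has to fall back on equality checks against other keys.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1  # every key collides with every other key

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.name == other.name


# Lookups still return the right value, but the dict can no longer
# jump straight to the entry: it compares against colliding keys one by one.
table = {CollidingKey(f"k{i}"): i for i in range(100)}
found = table[CollidingKey("k42")]
print(found)
```

With well-distributed hashes (the normal case for str, int, and tuples of those) this degradation does not occur.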
Creating dictionaries
python
empty_dict = {}
student = {"name": "Alice", "age": 20, "grades": [85, 90, 92]}
Accessing and modifying
python
name = student["name"] # O(1) average case
student["age"] = 21 # O(1) average case
student["major"] = "Computer Science" # O(1) average case
student.pop("minor", None) # Remove "minor" if present; returns None instead of raising KeyError when the key is missing
Safely accessing (to avoid KeyError)
python
age = student.get("age", 0) # Returns 0 if "age" not found
major = student.get("major", "Undeclared")
Dictionary methods
keys(), values(), and items() return live views, so any later change to the dict shows up in them immediately.
python
keys = student.keys() # View of keys; reflects later changes to the dict
values = student.values() # View of values
items = student.items() # View of (key, value) tuples
major = student.pop("major", None) # Remove "major" and return its value; returns None if the key is missing
default_value = student.setdefault("gpa", 4.0) # Set default if key doesn't exist
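if "name" in student: # Membership test on the dict itself checks its keys: O(1) average
    print("Has a name field")
first_key = next(iter(student)) # First key in insertion order, without copying all keys
first_value = next(iter(student.values()))
for k, v in student.items():
    print(f"{k} → {v}")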
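The difference between a live view and a copied list is easiest to see side by side. This is a minimal sketch with a made-up inventory dict; it also shows that key views support set-style operations.

```python
inventory = {"apples": 3, "pears": 5}   # illustrative data

keys_view = inventory.keys()            # live view
snapshot = list(inventory.keys())       # independent copy taken now

inventory["plums"] = 7                  # mutate the dict afterwards

view_now = list(keys_view)              # the view sees the new key
common = inventory.keys() & {"apples", "kiwis"}  # key views are set-like

print(view_now, snapshot, common)
```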
Dictionary comprehensions
python
squared_numbers = {x: x**2 for x in range(5)}
4. SETS
Sets are unordered collections of unique elements.
Implemented as hash tables with only keys (no values).
Time Complexities (average case):
Add/remove/check membership: O(1)
Union: O(len(s1) + len(s2))
Intersection: O(min(len(s1), len(s2)))
Difference (s1 - s2): O(len(s1))
Length: O(1)
Worst case becomes O(n) if many hash collisions occur.
Creating sets
python
empty_set = set() # Not {}, which creates an empty dictionary
fruits = {"apple", "banana", "orange"}
numbers_set = set([1, 2, 3, 2, 1]) # Creates {1, 2, 3}
Common operations
python
fruits.add("pear") # O(1) average case
fruits.remove("apple") # O(1) average case, raises KeyError if not found
popped_item = fruits.pop() # Remove and return arbitrary element
is_member = "banana" in fruits # O(1) average case
Set operations
python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
union = a | b # or a.union(b)
intersection = a & b # or a.intersection(b)
difference = a - b # or a.difference(b)
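A few related operations round out the ones above. The sketch below reuses the same two example sets and shows symmetric difference, subset and disjointness tests, and discard as the non-raising counterpart of remove.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

either_not_both = a ^ b              # or a.symmetric_difference(b)
is_subset = {3, 4} <= a              # or {3, 4}.issubset(a)
is_disjoint = a.isdisjoint({7, 8})   # True when no elements are shared

a.discard(99)                        # unlike remove(), no KeyError if missing

print(either_not_both, is_subset, is_disjoint)
```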
Set comprehensions
python
even_set = {x for x in range(10) if x % 2 == 0}
6. COLLECTIONS MODULE
The collections module provides specialized container data types that
extend the functionality of built-in types.
python
from collections import Counter, deque
Counter
Counter is a subclass of dict for counting hashable objects.
python
# Creating counters
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_counts = Counter(words) # Counter({'apple': 3, 'banana': 2, 'orange': 1})
most_common = word_counts.most_common(2) # [('apple', 3), ('banana', 2)]
total = sum(word_counts.values()) # Total count
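Because Counter is a dict subclass with counting semantics, it also handles missing keys and supports arithmetic between counters. A short sketch with illustrative fruit counts:

```python
from collections import Counter

stock = Counter(apple=3, banana=2)

missing = stock["cherry"]             # missing keys count as 0, no KeyError
stock.update(["apple", "cherry"])     # update() adds to existing counts

combined = stock + Counter(banana=1)      # element-wise addition
remaining = stock - Counter(apple=10)     # subtraction drops counts <= 0

print(missing, combined, remaining)
```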
deque (Double-ended queue)
deque is an optimized list-like container with fast appends and pops
on both ends.
Time Complexities:
Append/pop from either end: O(1)
Access by index: O(1) from ends, O(n) from middle
Length: O(1)
python
# Creating deques
empty_deque = deque()
numbers_deque = deque([1, 2, 3, 4, 5])
limited_deque = deque([1, 2, 3], maxlen=5) # Max size enforced
# Common operations
numbers_deque.append(6) # Add to right: O(1)
numbers_deque.appendleft(0) # Add to left: O(1)
right_item = numbers_deque.pop() # Remove from right: O(1)
left_item = numbers_deque.popleft() # Remove from left: O(1)
# dq[0] and dq[-1] are O(1) - very fast for ends
# dq[middle] is O(n) - slower for middle elements
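Two deque features that build on the O(1) end operations are maxlen, which turns the deque into a fixed-size window, and rotate. The readings below are made-up values for illustration.

```python
from collections import deque

# With maxlen set, appending to a full deque discards from the other end.
recent = deque(maxlen=3)
for reading in [10, 20, 30, 40, 50]:
    recent.append(reading)

# rotate(n) moves n items from the right end to the left end.
ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)

print(recent)
print(ring)
```

A bounded deque like this is a common way to keep "the last N items" without manual trimming.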
MATH MODULE
python
import math
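x = 16
print("sqrt:", math.sqrt(x)) # 4.0
print("pow:", math.pow(2, 5)) # 32.0 (math.pow always returns a float)
print("log_e:", math.log(100)) # Natural logarithm
print("log_base_2:", math.log(8, 2)) # Logarithm of 8 in base 2
a, b = 48, 18
print("gcd:", math.gcd(a, b)) # 6
print("lcm:", math.lcm(a, b)) # 144 (math.lcm requires Python 3.9+)
print("factorial:", math.factorial(5)) # 120
# The functions below are built-ins, not part of the math module
lst = [3, 1, 4, 1, 5, 9]
print("min:", min(lst))
print("max:", max(lst))
print("len:", len(lst))
print("sum:", sum(lst))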
7. HEAPQ MODULE (PRIORITY QUEUES)
python
import heapq
BASIC HEAP OPERATIONS
python
# Min heap (default)
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 3)
heapq.heappush(heap, 8)
print(heap) # [3, 5, 8]
# Get minimum (peek)
min_val = heap[0] # 3
# Extract minimum
min_val = heapq.heappop(heap) # Returns 3
# Convert list to heap
nums = [5, 3, 8, 1, 9]
heapq.heapify(nums) # O(n) time
print(nums) # [1, 3, 8, 5, 9]
MAX HEAP (using negatives)
python
# Max heap - negate values
max_heap = []
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -8)
# Extract max (remember to negate back)
max_val = -heapq.heappop(max_heap) # Returns 8
# Convert to max heap
nums = [5, 3, 8, 1, 9]
max_heap = [-x for x in nums]
heapq.heapify(max_heap)
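When only the few largest or smallest items are needed, heapq.nlargest and heapq.nsmallest avoid the negation trick altogether. A brief sketch with illustrative data:

```python
import heapq

scores = [72, 95, 58, 88, 91, 64]

top_three = heapq.nlargest(3, scores)     # largest first
bottom_two = heapq.nsmallest(2, scores)   # smallest first

# Both functions accept a key function, like sorted().
words = ["fig", "banana", "kiwi", "clementine"]
longest = heapq.nlargest(2, words, key=len)

print(top_three, bottom_two, longest)
```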
CUSTOM OBJECTS IN HEAPS
WHY WE NEED CUSTOM METHODS
python
# Problem: Python doesn't know how to compare custom objects
class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks
# The second push below will FAIL, because the heap has to compare the two objects:
# heap = []
# heapq.heappush(heap, Student("Alice", 85))
# heapq.heappush(heap, Student("Bob", 90))
# TypeError: '<' not supported between instances of 'Student' and 'Student'
# Solution: Define __lt__ method to tell Python how to compare
BASIC CUSTOM OBJECT
python
class Task:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name
    def __lt__(self, other):
        # This method is called when heap compares two Task objects
        # self < other means "should self come before other in heap?"
        # Lower priority number = higher importance
        return self.priority < other.priority
    def __repr__(self):
        # This makes print() show readable output instead of <object at 0x...>
        # !r puts quotes around the name, matching the output shown below
        return f"Task({self.priority}, {self.name!r})"
# Usage
heap = []
heapq.heappush(heap, Task(3, 'low'))
heapq.heappush(heap, Task(1, 'high')) # Goes to top because 1 < 3
heapq.heappush(heap, Task(2, 'medium'))
print(heap) # [Task(1, 'high'), Task(3, 'low'), Task(2, 'medium')]
# Pop returns the "smallest" (highest priority)
task = heapq.heappop(heap) # Task(1, 'high')
WHAT IF WE CHANGE THE COMPARISON?
python
class TaskReversed:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name
    def __lt__(self, other):
        # Changed: Higher number = higher priority
        return self.priority > other.priority # Notice the > instead of <
    def __repr__(self):
        # !r puts quotes around the name, matching the output shown below
        return f"TaskReversed({self.priority}, {self.name!r})"
# Now higher numbers come first
heap2 = []
heapq.heappush(heap2, TaskReversed(3, 'high'))
heapq.heappush(heap2, TaskReversed(1, 'low')) # Stays below the root because 1 > 3 is False
heapq.heappush(heap2, TaskReversed(2, 'medium'))
print(heap2) # [TaskReversed(3, 'high'), TaskReversed(1, 'low'), TaskReversed(2, 'medium')]
MULTI-LEVEL SORTING (TUPLE-LIKE)
python
class Person:
    def __init__(self, name, marks, age, salary):
        self.name = name
        self.marks = marks
        self.age = age
        self.salary = salary
    def __lt__(self, other):
        # Sort by: marks (ascending), then age (ascending), then salary (ascending)
        # If marks are different, compare by marks
        if self.marks != other.marks:
            return self.marks < other.marks
        # If marks are same, compare by age
        if self.age != other.age:
            return self.age < other.age
        # If both marks and age are same, compare by salary
        return self.salary < other.salary
    def __repr__(self):
        # Return as tuple for easy reading
        return f"({self.marks}, {self.age}, {self.salary})"
# Test multi-level sorting
people = [
    Person("Alice", 85, 22, 50000),
    Person("Bob", 85, 20, 55000), # Same marks, younger age
    Person("Charlie", 90, 25, 60000), # Higher marks
    Person("David", 85, 20, 45000), # Same marks, same age, lower salary
]
heap = []
for person in people:
    heapq.heappush(heap, person)
print("Sorted order:")
while heap:
    person = heapq.heappop(heap)
    print(person)
# Output:
# (85, 20, 45000) <- David: lowest marks, then age, then salary
# (85, 20, 55000) <- Bob: same marks/age, higher salary
# (85, 22, 50000) <- Alice: same marks, older age
# (90, 25, 60000) <- Charlie: highest marks
ALTERNATIVE: USING TUPLES DIRECTLY
python
# Instead of custom __lt__, you can use tuples
# Tuples compare element by element automatically
students = [
    (85, 22, 50000, "Alice"), # (marks, age, salary, name)
    (85, 20, 55000, "Bob"),
    (90, 25, 60000, "Charlie"),
    (85, 20, 45000, "David"),
]
heap = []
for student in students:
    heapq.heappush(heap, student)
print("\nUsing tuples:")
while heap:
    marks, age, salary, name = heapq.heappop(heap)
    print(f"({marks}, {age}, {salary}) - {name}")
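One caveat with plain tuples: if two entries tie on every leading field, Python goes on to compare the payload, which raises TypeError for objects that do not support <. A common workaround is to add a running counter as a tie-breaker. The sketch below assumes that approach; the push and pop helpers and the job dicts are hypothetical names made up for this example.

```python
import heapq
import itertools

heap = []
counter = itertools.count()  # increasing sequence number used as tie-breaker

def push(priority, payload):
    # The counter guarantees the comparison never reaches the payload.
    heapq.heappush(heap, (priority, next(counter), payload))

def pop():
    priority, _, payload = heapq.heappop(heap)
    return priority, payload

# Dicts cannot be compared with <, so equal priorities would otherwise fail.
push(2, {"job": "write report"})
push(1, {"job": "fix outage"})
push(2, {"job": "review PR"})

order = [pop()[1]["job"] for _ in range(3)]
print(order)
```

As a side effect, entries with equal priority come out in the order they were pushed.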
SUMMARY
1. Custom objects need an __lt__ method for heap comparison
2. A __repr__ method makes debugging easier (optional but recommended)
3. __lt__ returns True if self should come before other in heap
4. For multi-level sorting, compare one field at a time and fall through to the next field on ties
5. Alternative: Use tuples instead of custom objects
6. Min heap: smaller values come first
7. To reverse order, change < to > in the __lt__ method
COMMON LEETCODE PROBLEMS
1. Kth Largest Element in Stream
python
class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.heap = list(nums) # Copy so the caller's list is not modified
        heapq.heapify(self.heap)
        # Keep only k largest
        while len(self.heap) > k:
            heapq.heappop(self.heap)
    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)
        return self.heap[0] # Root of a size-k min heap is the kth largest
2. Meeting Rooms II
python
def min_meeting_rooms(intervals):
    if not intervals:
        return 0
    intervals.sort() # Sort by start time
    heap = [] # Track end times
    for start, end in intervals:
        # Remove finished meetings
        if heap and heap[0] <= start:
            heapq.heappop(heap)
        heapq.heappush(heap, end)
    return len(heap)
3. Task Scheduler with Cooldown
python
def task_scheduler(tasks, n):
    from collections import Counter
    count = Counter(tasks)
    # Max heap of frequencies
    heap = [-freq for freq in count.values()]
    heapq.heapify(heap)
    time = 0
    while heap:
        cycle = []
        # Process n+1 tasks (or until heap empty)
        for _ in range(n + 1):
            if heap:
                cycle.append(heapq.heappop(heap))
        # Put back tasks that still have work
        for freq in cycle:
            if freq < -1: # Still has work (remember it's negated)
                heapq.heappush(heap, freq + 1)
        # Add time for this cycle
        time += len(cycle) if not heap else n + 1
    return time
KEY POINTS TO REMEMBER
1. Python's heapq is MIN heap by default
2. For MAX heap, negate values: -x
3. Custom objects need an __lt__ method
4. Heap[0] is always the minimum (or maximum for max heap)
5. Time complexity: Insert/Extract O(log n), Peek O(1)
6. Space complexity: O(n)
8. CUSTOM DATA STRUCTURES
Linked List
Linked lists store elements in nodes that point to the next node.
Time Complexities:
Access: O(n)
Insertion/deletion at beginning: O(1)
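Insertion at end: O(n) (without tail pointer), O(1) (with tail pointer)
Deletion at end: O(n) in a singly linked list, even with a tail pointer, because the node before the tail must be found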
Search: O(n)
python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None # Optional: for O(1) append operations
        self.size = 0
    def prepend(self, data):
        """Add element to beginning of list - O(1)"""
        new_node = Node(data)
        if not self.head:
            self.head = new_node
            self.tail = new_node
        else:
            new_node.next = self.head
            self.head = new_node
        self.size += 1
    def append(self, data):
        """Add element to end of list - O(1) with tail pointer"""
        new_node = Node(data)
        if not self.head:
            self.head = new_node
            self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node
        self.size += 1
    def insert_after(self, prev_node, data):
        """Insert after a given node - O(1)"""
        if not prev_node:
            print("Previous node cannot be None")
            return
        new_node = Node(data)
        new_node.next = prev_node.next
        prev_node.next = new_node
        if prev_node == self.tail:
            self.tail = new_node
        self.size += 1
    def delete_node(self, key):
        """Delete first occurrence of node with data = key - O(n)"""
        if not self.head:
            return
        # If head node holds the key
        if self.head.data == key:
            self.head = self.head.next
            if not self.head:
                self.tail = None
            self.size -= 1
            return
        # Search for the key and keep track of previous node
        current = self.head
        while current.next and current.next.data != key:
            current = current.next
        # If key was found
        if current.next:
            if current.next == self.tail:
                self.tail = current
            current.next = current.next.next
            self.size -= 1
    def search(self, key):
        """Search for node with data = key - O(n)"""
        current = self.head
        while current:
            if current.data == key:
                return True
            current = current.next
        return False
    def get_at_index(self, index):
        """Get data at specified index - O(n)"""
        if index < 0 or index >= self.size:
            raise IndexError("Index out of range")
        current = self.head
        for i in range(index):
            current = current.next
        return current.data
    def print_list(self):
        """Print all elements in the list - O(n)"""
        current = self.head
        while current:
            print(current.data, end=" -> ")
            current = current.next
        print("None")
# Example usage
linked_list = LinkedList()
linked_list.append(1)
linked_list.append(2)
linked_list.prepend(0)
linked_list.print_list() # 0 -> 1 -> 2 -> None
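A classic exercise on this structure is reversing the list in place by re-pointing each node's next reference. The sketch below is self-contained and uses its own minimal Node class plus two hypothetical helpers, build and to_list, that exist only for this example; it is not a method of the LinkedList class above.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def build(values):
    """Hypothetical helper: build a chain of nodes from a Python list."""
    head = None
    for value in reversed(values):
        node = Node(value)
        node.next = head
        head = node
    return head

def to_list(head):
    """Hypothetical helper: collect node data into a Python list."""
    result = []
    while head:
        result.append(head.data)
        head = head.next
    return result

def reverse(head):
    """Reverse the chain in place - O(n) time, O(1) extra space."""
    previous = None
    current = head
    while current:
        next_node = current.next   # remember the rest of the list
        current.next = previous    # flip this node's pointer
        previous = current
        current = next_node
    return previous                # the old last node is the new head

reversed_values = to_list(reverse(build([1, 2, 3])))
print(reversed_values)
```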
Binary Search Tree
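Binary Search Trees (BST) store data in a hierarchical structure
where each node has at most two children. Keys in a node's left subtree are
smaller than the node's key, and keys in its right subtree are greater than or
equal to it (the implementation below sends duplicates to the right).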
Time Complexities (balanced tree):
Search/Insert/Delete: O(log n)
Worst case (unbalanced): O(n)
python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
class BinarySearchTree:
    def __init__(self):
        self.root = None
    def insert(self, key):
        """Insert a new key in BST - O(log n) average, O(n) worst case"""
        self.root = self._insert_recursive(self.root, key)
    def _insert_recursive(self, root, key):
        # If tree is empty, return a new node
        if not root:
            return TreeNode(key)
        # Otherwise, recur down the tree
        if key < root.key:
            root.left = self._insert_recursive(root.left, key)
        else:
            root.right = self._insert_recursive(root.right, key)
        # Return the unchanged node pointer
        return root
    def search(self, key):
        """Search for a key in BST - O(log n) average, O(n) worst case"""
        return self._search_recursive(self.root, key)
    def _search_recursive(self, root, key):
        # Base case: root is None or key is at root
        if not root or root.key == key:
            return root
        # Key is greater than root's key
        if key > root.key:
            return self._search_recursive(root.right, key)
        # Key is smaller than root's key
        return self._search_recursive(root.left, key)
    def delete(self, key):
        """Delete a key from BST - O(log n) average, O(n) worst case"""
        self.root = self._delete_recursive(self.root, key)
    def _delete_recursive(self, root, key):
        # Base case
        if not root:
            return root
        # Recursive calls for ancestors of node to be deleted
        if key < root.key:
            root.left = self._delete_recursive(root.left, key)
        elif key > root.key:
            root.right = self._delete_recursive(root.right, key)
        else:
            # Node with only one child or no child
            if not root.left:
                return root.right
            elif not root.right:
                return root.left
            # Node with two children: Get the inorder successor
            # (smallest in the right subtree)
            root.key = self._min_value_node(root.right).key
            # Delete the inorder successor
            root.right = self._delete_recursive(root.right, root.key)
        return root
    def _min_value_node(self, node):
        """Find the node with minimum key value in a subtree"""
        current = node
        while current.left:
            current = current.left
        return current
    def inorder_traversal(self):
        """Inorder traversal of BST - O(n)"""
        result = []
        self._inorder_recursive(self.root, result)
        return result
    def _inorder_recursive(self, root, result):
        if root:
            self._inorder_recursive(root.left, result)
            result.append(root.key)
            self._inorder_recursive(root.right, result)
    def preorder_traversal(self):
        """Preorder traversal of BST - O(n)"""
        result = []
        self._preorder_recursive(self.root, result)
        return result
    def _preorder_recursive(self, root, result):
        if root:
            result.append(root.key)
            self._preorder_recursive(root.left, result)
            self._preorder_recursive(root.right, result)
    def postorder_traversal(self):
        """Postorder traversal of BST - O(n)"""
        result = []
        self._postorder_recursive(self.root, result)
        return result
    def _postorder_recursive(self, root, result):
        if root:
            self._postorder_recursive(root.left, result)
            self._postorder_recursive(root.right, result)
            result.append(root.key)
    def height(self):
        """Calculate height of BST - O(n)"""
        return self._height_recursive(self.root)
    def _height_recursive(self, node):
        if not node:
            return -1
        left_height = self._height_recursive(node.left)
        right_height = self._height_recursive(node.right)
        return max(left_height, right_height) + 1
# Example usage
bst = BinarySearchTree()
keys = [50, 30, 70, 20, 40, 60, 80]
for key in keys:
    bst.insert(key)
print("Inorder traversal:", bst.inorder_traversal()) # [20, 30, 40, 50, 60, 70, 80]
print("Height of tree:", bst.height()) # 2
bst.delete(20)
print("After deleting 20:", bst.inorder_traversal()) # [30, 40, 50, 60, 70, 80]
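The gap between the balanced and unbalanced cases depends entirely on insertion order. The following self-contained sketch uses a stripped-down node class and free functions, written only for this illustration and separate from the BinarySearchTree class above, to compare the height produced by sorted input with the height produced by a level-by-level insertion order.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Iterative BST insert; duplicates go to the right."""
    new_node = TreeNode(key)
    if root is None:
        return new_node
    current = root
    while True:
        if key < current.key:
            if current.left is None:
                current.left = new_node
                return root
            current = current.left
        else:
            if current.right is None:
                current.right = new_node
                return root
            current = current.right

def height(node):
    """Height in edges; an empty tree has height -1."""
    if node is None:
        return -1
    return max(height(node.left), height(node.right)) + 1

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

sorted_keys = list(range(1, 16))                                   # 1..15 ascending
level_order = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]  # same keys

degenerate_height = height(build(sorted_keys))  # every node hangs to the right
balanced_height = height(build(level_order))    # perfectly balanced tree

print(degenerate_height, balanced_height)
```

With sorted input the tree degenerates into a chain, so search, insert, and delete all walk O(n) nodes; self-balancing variants such as AVL or red-black trees exist to prevent this.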