Python Ds

This document is a comprehensive guide on Python data structures and collections, covering built-in types like lists, tuples, dictionaries, and sets, as well as specialized types from the collections module and heapq for priority queues. It details their implementation, time complexities, common operations, and provides examples in Python. Additionally, it discusses custom data structures and includes practical applications such as Kth largest element in a stream and meeting room scheduling.
Copyright © All Rights Reserved
PYTHON DATA STRUCTURES AND COLLECTIONS

This comprehensive guide covers built-in Python data structures and additional collection types,
including implementation details and time complexities.

TABLE OF CONTENTS:

1. Lists
2. Tuples
3. Dictionaries
4. Sets
5. Strings (as collections)
6. Collections module
   Counter
   defaultdict
   OrderedDict
   deque
   namedtuple
7. Heapq module (Priority Queues)
8. Custom Data Structures
   Linked Lists
   Trees

1. LISTS
Lists are ordered, mutable collections that can contain elements of different types.
They are implemented as dynamic arrays.

Time Complexities:

Access by index: O(1)


Insertion/deletion at the end: Amortized O(1)

Insertion/deletion at beginning or middle: O(n)


Search: O(n)

Length: O(1)

Creating lists

python

empty_list = []
numbers = [1, 2, 3, 4, 5]
mixed_list = [1, "hello", 3.14, True]

Common operations

python

numbers.append(6) # Add to end: O(1) amortized
numbers.insert(0, 0) # Insert at specific position: O(n)
popped_item = numbers.pop() # Remove and return last item: O(1)
numbers.pop(0) # Remove from beginning: O(n)
numbers.remove(3) # Remove first occurrence of value: O(n)
numbers.sort() # In-place sort: O(n log n)
sorted_numbers = sorted(numbers) # Return sorted copy: O(n log n)
numbers.reverse() # Reverse in-place: O(n)
found_index = numbers.index(5) # Find index of first occurrence: O(n)
count = numbers.count(2) # Count occurrences: O(n)

List comprehensions

python

squares = [x**2 for x in range(10)]


even_squares = [x**2 for x in range(10) if x % 2 == 0]

Slicing: O(k) where k is the slice length

python

first_three = numbers[:3]
last_three = numbers[-3:]
every_second = numbers[::2]
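
The "amortized O(1)" append cost comes from the dynamic array growing in chunks rather than one slot at a time. The sketch below is an illustration, not part of the original guide: it assumes CPython, where sys.getsizeof changes only when the list's backing array is reallocated, and count_resizes is a hypothetical helper name.

```python
import sys

def count_resizes(n_appends):
    """Count how often the list's allocated size changes during n_appends appends."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for i in range(n_appends):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # the backing array was reallocated (CPython assumption)
            resizes += 1
            last_size = size
    return resizes

# Far fewer reallocations than appends: most appends just fill spare capacity.
print(count_resizes(1000))
```

The exact count depends on the interpreter version, so no specific number is claimed here; the point is only that it is much smaller than the number of appends.
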
2. TUPLES
Tuples are ordered, immutable collections that can contain elements of different types.
They are similar to lists but cannot be modified after creation.

Time Complexities:

Access by index: O(1)

Search: O(n)
Length: O(1)

Creating tuples

python

empty_tuple = ()
single_item = (1,) # Note: comma is needed for single-item tuples
coordinates = (10, 20)
mixed_tuple = (1, "hello", 3.14)

Tuple unpacking

python

x, y = coordinates
a, b, c = mixed_tuple

Common operations

python

length = len(coordinates)
concatenated = coordinates + (30, 40) # Creates new tuple
repeated = coordinates * 3 # Creates new tuple
element_index = mixed_tuple.index("hello") # O(n)
element_count = mixed_tuple.count(1) # O(n)
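
Immutability has two practical consequences that a short example makes concrete: item assignment fails, and tuples of hashable elements can serve as dictionary keys. The names point and grid below are illustrative only.

```python
point = (10, 20)

# Assigning to an index raises TypeError because tuples are immutable.
try:
    point[0] = 99
    mutated = True
except TypeError:
    mutated = False

# Tuples are hashable when their elements are, so they work as dict keys.
grid = {(0, 0): "origin", point: "target"}
label = grid[(10, 20)]

# A list in the same role fails, since lists are unhashable.
try:
    grid[[1, 2]] = "list key"
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(mutated, label, list_key_ok)  # False target False
```
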

3. DICTIONARIES
Dictionaries are mutable collections of key-value pairs. Since Python 3.7 they preserve insertion order.
They are implemented as hash tables.

Time Complexities (average case):

Access/insert/delete by key: O(1)

Search by value: O(n)

Length: O(1)

Worst case for all operations becomes O(n) if many hash collisions occur.

Creating dictionaries

python

empty_dict = {}
student = {"name": "Alice", "age": 20, "grades": [85, 90, 92]}

Accessing and modifying

python

name = student["name"] # O(1) average case
student["age"] = 21 # O(1) average case
student["major"] = "Computer Science" # O(1) average case
student.pop("minor", None) # Remove "minor" if present; returns None (no KeyError) if it is missing

Safely accessing (to avoid KeyError)

python

age = student.get("age", 0) # Returns 0 if "age" not found


major = student.get("major", "Undeclared")

Dictionary methods

The objects returned by keys(), values(), and items() are live views: any later change to the dictionary shows up in them immediately.

python

keys = student.keys() # View of keys; reflects later changes to the dict
values = student.values() # View of values
items = student.items() # View of (key, value) tuples
major = student.pop("major", None) # Remove "major" and return its value; returns None instead of raising KeyError if missing
default_value = student.setdefault("gpa", 4.0) # Insert "gpa" with 4.0 only if the key is absent; returns the stored value

if "name" in student: # Membership test on the dict itself: O(1) average case
    print("Has a name field")

first_key = next(iter(student)) # First key in insertion order, without copying all keys
first_value = next(iter(student.values()))

for k, v in student.items():
    print(f"{k} → {v}")


Dictionary comprehensions

python

squared_numbers = {x: x**2 for x in range(5)}
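
The difference between a live view and a copied list of keys is easiest to see when the dictionary changes after both are created. This is a small sketch with made-up data (profile is a hypothetical dictionary); it also shows that iteration follows insertion order.

```python
profile = {"name": "Alice", "age": 20}
keys_view = profile.keys()      # live view, not a copy
keys_snapshot = list(profile)   # independent copy of the keys at this moment

profile["major"] = "Physics"    # mutate after creating both
del profile["age"]

print(list(keys_view))   # ['name', 'major']
print(keys_snapshot)     # ['name', 'age']
```
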

4. SETS
Sets are unordered collections of unique elements.
Implemented as hash tables with only keys (no values).

Time Complexities (average case):

Add/remove/check membership: O(1)

Union: O(len(s1) + len(s2))

Intersection: O(min(len(s1), len(s2)))

Difference (s1 - s2): O(len(s1))

Length: O(1)

Worst case becomes O(n) if many hash collisions occur.

Creating sets

python
empty_set = set() # Not {}, which creates an empty dictionary
fruits = {"apple", "banana", "orange"}
numbers_set = set([1, 2, 3, 2, 1]) # Creates {1, 2, 3}

Common operations

python

fruits.add("pear") # O(1) average case


fruits.remove("apple") # O(1) average case, raises KeyError if not found
popped_item = fruits.pop() # Remove and return arbitrary element
is_member = "banana" in fruits # O(1) average case

Set operations

python

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
union = a | b # or a.union(b)
intersection = a & b # or a.intersection(b)
difference = a - b # or a.difference(b)

Set comprehensions

python

even_set = {x for x in range(10) if x % 2 == 0}
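
Since remove raises KeyError for a missing element, it can help to see the non-raising alternative next to it, together with a few other standard set operations not listed above. The example is a sketch with illustrative values.

```python
fruits = {"apple", "banana"}

fruits.discard("kiwi")  # no error even though "kiwi" is absent
try:
    fruits.remove("kiwi")
    removed = True
except KeyError:
    removed = False

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
either_not_both = a ^ b               # symmetric difference
is_subset = {3, 4} <= a               # subset test
share_nothing = a.isdisjoint({7, 8})  # True when no element is shared

print(removed, sorted(either_not_both), is_subset, share_nothing)
# False [1, 2, 5, 6] True True
```
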

6. COLLECTIONS MODULE
The collections module provides specialized container data types that
extend the functionality of built-in types.
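
As a brief illustration of what "extending built-in types" means, the sketch below uses defaultdict and namedtuple, two types listed in the table of contents. The data and the Point type are made up for the example.

```python
from collections import defaultdict, namedtuple

# defaultdict: a missing key is created on first access using the factory (here, list).
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# namedtuple: a tuple subclass whose fields can also be read by name.
Point = namedtuple("Point", ["x", "y"])
p = Point(10, 20)
moved = p._replace(x=0)  # returns a new tuple; p itself is unchanged

print(dict(by_letter))
print(p.x, p[1], moved)  # 10 20 Point(x=0, y=20)
```
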

python

from collections import Counter, deque

Counter
Counter is a subclass of dict for counting hashable objects.

python

# Creating counters
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_counts = Counter(words) # Counter({'apple': 3, 'banana': 2, 'orange': 1})
most_common = word_counts.most_common(2) # [('apple', 3), ('banana', 2)]
total = sum(word_counts.values()) # Total count
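
Because Counter is a dict subclass with counting semantics, it also supports updating and arithmetic. A minimal sketch with hypothetical inventory data:

```python
from collections import Counter

inventory = Counter(apple=3, banana=2)
inventory.update(["apple", "cherry"])   # adds to counts: apple becomes 4, cherry 1
sold = Counter(apple=1, banana=2)

remaining = inventory - sold            # subtraction drops counts that reach 0 or below
missing = inventory["durian"]           # missing keys read as 0, no KeyError

print(remaining)  # Counter({'apple': 3, 'cherry': 1})
print(missing)    # 0
```

Reading a missing key does not insert it, so "durian" is still absent from inventory afterwards.
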

deque (Double-ended queue)


deque is an optimized list-like container with fast appends and pops
on both ends.

Time Complexities:

Append/pop from either end: O(1)

Access by index: O(1) from ends, O(n) from middle

Length: O(1)

python

# Creating deques
empty_deque = deque()
numbers_deque = deque([1, 2, 3, 4, 5])
limited_deque = deque([1, 2, 3], maxlen=5) # Max size enforced

# Common operations
numbers_deque.append(6) # Add to right: O(1)
numbers_deque.appendleft(0) # Add to left: O(1)
right_item = numbers_deque.pop() # Remove from right: O(1)
left_item = numbers_deque.popleft() # Remove from left: O(1)
# numbers_deque[0] and numbers_deque[-1] are O(1): fast at the ends
# Indexing near the middle is O(n): slower than the same access on a list
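
The maxlen argument is what makes a deque convenient for "last k items" windows: once the deque is full, each append silently discards the item at the opposite end. The readings below are made-up values for illustration.

```python
from collections import deque

recent = deque(maxlen=3)
for reading in [10, 20, 30, 40]:
    recent.append(reading)    # once full, the oldest item falls off the left

window = list(recent)
average = sum(recent) / len(recent)

ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)                # move the last two items to the front

print(window, average, list(ring))  # [20, 30, 40] 30.0 [4, 5, 1, 2, 3]
```
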

MATH MODULE

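python

import math

x = 16
print("sqrt:", math.sqrt(x)) # 4.0
print("pow:", math.pow(2, 5)) # 32.0 (always returns a float)
print("log_e:", math.log(100)) # Natural logarithm
print("log_base_2:", math.log(8, 2)) # Logarithm with an explicit base

a, b = 48, 18
print("gcd:", math.gcd(a, b)) # 6
print("lcm:", math.lcm(a, b)) # 144 (math.lcm requires Python 3.9+)
print("factorial:", math.factorial(5)) # 120

# Built-in functions (not part of the math module)
lst = [3, 1, 4, 1, 5, 9]
print("min:", min(lst)) # 1
print("max:", max(lst)) # 9
print("len:", len(lst)) # 6
print("sum:", sum(lst)) # 23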

PYTHON HEAPS - CLEAN & CONCISE GUIDE

python

import heapq

BASIC HEAP OPERATIONS

python

# Min heap (default)
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 3)
heapq.heappush(heap, 8)
print(heap) # [3, 5, 8]

# Get minimum (peek)
min_val = heap[0] # 3

# Extract minimum
min_val = heapq.heappop(heap) # Returns 3

# Convert list to heap
nums = [5, 3, 8, 1, 9]
heapq.heapify(nums) # O(n) time
print(nums) # [1, 3, 8, 5, 9]
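
Note that a heapified list is not a sorted list: only heap[0] is guaranteed to be the minimum, and each parent is no larger than its children. The sketch below checks that invariant, then shows two standard ways to get ordered results out of a heap.

```python
import heapq

nums = [5, 3, 8, 1, 9]
heap = list(nums)
heapq.heapify(heap)

# Heap invariant: the parent at (i - 1) // 2 is never larger than the child at i.
heap_invariant_holds = all(
    heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap))
)

# Popping until empty yields ascending order (heap sort, O(n log n)).
work = list(heap)
in_order = [heapq.heappop(work) for _ in range(len(work))]

# For just a few extremes, nsmallest/nlargest avoid sorting everything by hand.
two_smallest = heapq.nsmallest(2, nums)
two_largest = heapq.nlargest(2, nums)

print(heap_invariant_holds, in_order, two_smallest, two_largest)
# True [1, 3, 5, 8, 9] [1, 3] [9, 8]
```
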

MAX HEAP (using negatives)

python

# Max heap - negate values
max_heap = []
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -8)

# Extract max (remember to negate back)
max_val = -heapq.heappop(max_heap) # Returns 8

# Convert to max heap
nums = [5, 3, 8, 1, 9]
max_heap = [-x for x in nums]
heapq.heapify(max_heap)
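
Forgetting to negate on the way in or out is an easy mistake with this trick. One option is to hide the negation behind a small wrapper. MaxHeap below is a hypothetical helper written for this guide, not part of heapq, and it assumes numeric values.

```python
import heapq

class MaxHeap:
    """Hypothetical helper: a max heap of numbers built on heapq via negation."""

    def __init__(self, values=()):
        self._data = [-v for v in values]
        heapq.heapify(self._data)

    def push(self, value):
        heapq.heappush(self._data, -value)

    def pop(self):
        return -heapq.heappop(self._data)

    def peek(self):
        return -self._data[0]

    def __len__(self):
        return len(self._data)

h = MaxHeap([5, 3, 8, 1, 9])
h.push(7)
print(h.peek())                     # 9
print([h.pop() for _ in range(3)])  # [9, 8, 7]
```
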

CUSTOM OBJECTS IN HEAPS

WHY WE NEED CUSTOM METHODS


python

# Problem: Python doesn't know how to compare custom objects
class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks

# This will FAIL:
# heap = []
# heapq.heappush(heap, Student("Alice", 85))
# heapq.heappush(heap, Student("Bob", 90))
# TypeError: '<' not supported between instances

# Solution: Define __lt__ method to tell Python how to compare

BASIC CUSTOM OBJECT

python

class Task:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name

    def __lt__(self, other):
        # This method is called when heap compares two Task objects
        # self < other means "should self come before other in heap?"
        # Lower priority number = higher importance
        return self.priority < other.priority

    def __repr__(self):
        # This makes print() show readable output instead of <object at 0x...>
        # !r keeps the quotes around the name, e.g. Task(1, 'high')
        return f"Task({self.priority}, {self.name!r})"

# Usage
heap = []
heapq.heappush(heap, Task(3, 'low'))
heapq.heappush(heap, Task(1, 'high')) # Goes to top because 1 < 3
heapq.heappush(heap, Task(2, 'medium'))

print(heap) # [Task(1, 'high'), Task(3, 'low'), Task(2, 'medium')]

# Pop returns the "smallest" (highest priority)
task = heapq.heappop(heap) # Task(1, 'high')

WHAT IF WE CHANGE THE COMPARISON?

python

class TaskReversed:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name

    def __lt__(self, other):
        # Changed: Higher number = higher priority
        return self.priority > other.priority # Notice the > instead of <

    def __repr__(self):
        # !r keeps the quotes around the name
        return f"TaskReversed({self.priority}, {self.name!r})"

# Now higher numbers come first
heap2 = []
heapq.heappush(heap2, TaskReversed(3, 'high'))
heapq.heappush(heap2, TaskReversed(1, 'low')) # Goes to bottom because 1 > 3 is False
heapq.heappush(heap2, TaskReversed(2, 'medium'))

print(heap2) # [TaskReversed(3, 'high'), TaskReversed(1, 'low'), TaskReversed(2, 'medium')]

MULTI-LEVEL SORTING (TUPLE-LIKE)

python

class Person:
    def __init__(self, name, marks, age, salary):
        self.name = name
        self.marks = marks
        self.age = age
        self.salary = salary

    def __lt__(self, other):
        # Sort by: marks (ascending), then age (ascending), then salary (ascending)
        # If marks are different, compare by marks
        if self.marks != other.marks:
            return self.marks < other.marks

        # If marks are same, compare by age
        if self.age != other.age:
            return self.age < other.age

        # If both marks and age are same, compare by salary
        return self.salary < other.salary

    def __repr__(self):
        # Return as tuple for easy reading
        return f"({self.marks}, {self.age}, {self.salary})"

# Test multi-level sorting
people = [
    Person("Alice", 85, 22, 50000),
    Person("Bob", 85, 20, 55000), # Same marks, younger age
    Person("Charlie", 90, 25, 60000), # Higher marks
    Person("David", 85, 20, 45000), # Same marks, same age, lower salary
]

heap = []
for person in people:
    heapq.heappush(heap, person)

print("Sorted order:")
while heap:
    person = heapq.heappop(heap)
    print(person)

# Output:
# (85, 20, 45000) <- David: lowest marks, then age, then salary
# (85, 20, 55000) <- Bob: same marks/age, higher salary
# (85, 22, 50000) <- Alice: same marks, older age
# (90, 25, 60000) <- Charlie: highest marks
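
A field-by-field __lt__ like the one above can also be generated automatically. The sketch below uses the standard dataclasses module with order=True, which compares instances field by field in declaration order. PersonRecord is a hypothetical name chosen for this example, and the name field is excluded from comparison with field(compare=False).

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PersonRecord:  # hypothetical class mirroring the Person fields
    marks: int
    age: int
    salary: int
    name: str = field(compare=False)  # not used when ordering

heap = []
for record in [
    PersonRecord(85, 22, 50000, "Alice"),
    PersonRecord(85, 20, 55000, "Bob"),
    PersonRecord(90, 25, 60000, "Charlie"),
    PersonRecord(85, 20, 45000, "David"),
]:
    heapq.heappush(heap, record)

order = [heapq.heappop(heap).name for _ in range(len(heap))]
print(order)  # ['David', 'Bob', 'Alice', 'Charlie']
```

The trade-off is less control: every compared field is ascending, so mixed ascending and descending keys still need a hand-written __lt__ or negated values.
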

ALTERNATIVE: USING TUPLES DIRECTLY

python

# Instead of custom __lt__, you can use tuples
# Tuples compare element by element automatically

students = [
    (85, 22, 50000, "Alice"), # (marks, age, salary, name)
    (85, 20, 55000, "Bob"),
    (90, 25, 60000, "Charlie"),
    (85, 20, 45000, "David"),
]

heap = []
for student in students:
    heapq.heappush(heap, student)

print("\nUsing tuples:")
while heap:
    marks, age, salary, name = heapq.heappop(heap)
    print(f"({marks}, {age}, {salary}) - {name}")
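
One caveat with tuples: when every leading element ties, Python falls through to comparing the payload, which fails if the payload has no ordering. A common remedy is to put a running counter between the priority and the payload. The Job class below is a hypothetical payload used only for this sketch.

```python
import heapq
from itertools import count

class Job:  # hypothetical payload with no __lt__ defined
    def __init__(self, name):
        self.name = name

# Without a tie-breaker, equal priorities make Python compare the Job objects.
try:
    bad = []
    heapq.heappush(bad, (1, Job("a")))
    heapq.heappush(bad, (1, Job("b")))
    tie_failed = False
except TypeError:
    tie_failed = True

# A monotonically increasing counter breaks ties and keeps insertion order.
sequence = count()
heap = []
for priority, name in [(2, "write"), (1, "plan"), (1, "review")]:
    heapq.heappush(heap, (priority, next(sequence), Job(name)))

order = [heapq.heappop(heap)[2].name for _ in range(len(heap))]
print(tie_failed, order)  # True ['plan', 'review', 'write']
```
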

SUMMARY

1. Custom objects need an __lt__ method for heap comparison
2. A __repr__ method makes debugging easier (optional but recommended)
3. __lt__ returns True if self should come before other in the heap
4. For multi-level sorting, compare one field at a time in an if chain
5. Alternative: use tuples instead of custom objects
6. Min heap: smaller values come first
7. To reverse the order, change < to > in the __lt__ method

COMMON LEETCODE PROBLEMS

1. Kth Largest Element in Stream

python

class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.heap = list(nums) # Copy so the caller's list is not modified
        heapq.heapify(self.heap)

        # Keep only k largest
        while len(self.heap) > k:
            heapq.heappop(self.heap)

    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)
        return self.heap[0] # Smallest of the k largest = kth largest
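
To see the min-heap-of-size-k idea at work, the sketch below restates the class compactly so it runs on its own, feeds it a small made-up stream, and cross-checks each answer against a brute-force sort. The input values are illustrative.

```python
import heapq

class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.heap = list(nums)
        heapq.heapify(self.heap)
        while len(self.heap) > k:
            heapq.heappop(self.heap)

    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)
        return self.heap[0]

tracker = KthLargest(3, [4, 5, 8, 2])
results = [tracker.add(v) for v in [3, 5, 10, 9, 4]]
print(results)  # [4, 5, 5, 8, 8]

# Brute-force cross-check: sort everything seen so far and take the 3rd largest.
seen = [4, 5, 8, 2]
expected = []
for v in [3, 5, 10, 9, 4]:
    seen.append(v)
    expected.append(sorted(seen, reverse=True)[2])
print(results == expected)  # True
```

Each add costs O(log k) with the heap, compared with O(n log n) per query for the brute-force version.
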

2. Meeting Rooms II

python

def min_meeting_rooms(intervals):
    if not intervals:
        return 0

    intervals.sort() # Sort by start time (sorts the caller's list in place)
    heap = [] # Track end times of meetings currently holding a room

    for start, end in intervals:
        # Reuse the room that frees up earliest, if it is free by this start time
        if heap and heap[0] <= start:
            heapq.heappop(heap)

        heapq.heappush(heap, end)

    return len(heap)
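
For comparison, the same count can be obtained without a heap by sweeping over sorted start and end times. This is an alternative technique, not the approach used above; min_meeting_rooms_sweep is a hypothetical name, and like the heap version it treats a meeting that ends at time t as freeing its room for one that starts at t.

```python
def min_meeting_rooms_sweep(intervals):
    """Two-pointer sweep over sorted start and end times."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    rooms = 0
    end_idx = 0
    for start in starts:
        if start >= ends[end_idx]:
            end_idx += 1   # the earliest-ending meeting is over; reuse its room
        else:
            rooms += 1     # every room is still busy; open a new one
    return rooms

print(min_meeting_rooms_sweep([[0, 30], [5, 10], [15, 20]]))  # 2
print(min_meeting_rooms_sweep([[7, 10], [2, 4]]))             # 1
```
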
3. Task Scheduler with Cooldown

python

from collections import Counter

def task_scheduler(tasks, n):
    count = Counter(tasks)

    # Max heap of frequencies (negated, because heapq is a min heap)
    heap = [-freq for freq in count.values()]
    heapq.heapify(heap)

    time = 0
    while heap:
        cycle = []
        # Process n+1 tasks (or until heap empty)
        for _ in range(n + 1):
            if heap:
                cycle.append(heapq.heappop(heap))

        # Put back tasks that still have work
        for freq in cycle:
            if freq < -1: # Still has work (remember it's negated)
                heapq.heappush(heap, freq + 1)

        # Add time for this cycle: a full n+1 slots unless this was the last cycle
        time += len(cycle) if not heap else n + 1

    return time
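
The heap simulation can be sanity-checked against a commonly cited closed form, which is not taken from this guide: the most frequent task forces (max_freq - 1) blocks of n + 1 slots, followed by one slot per task tied for the maximum, and the answer can never be less than the number of tasks. least_interval_formula is a hypothetical name, and the sketch assumes a non-empty task list.

```python
from collections import Counter

def least_interval_formula(tasks, n):
    """Closed-form schedule length; assumes tasks is non-empty."""
    counts = Counter(tasks)
    max_freq = max(counts.values())
    num_max = sum(1 for c in counts.values() if c == max_freq)
    framed = (max_freq - 1) * (n + 1) + num_max
    return max(len(tasks), framed)

print(least_interval_formula(["A", "A", "A", "B", "B", "B"], 2))  # 8
print(least_interval_formula(["A", "A", "A", "B", "B", "B"], 0))  # 6
```
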

KEY POINTS TO REMEMBER

1. Python's heapq is a MIN heap by default
2. For a MAX heap, negate values: -x
3. Custom objects need an __lt__ method
4. heap[0] is always the minimum (for a negated max heap, -heap[0] is the maximum)
5. Time complexity: Insert/Extract O(log n), Peek O(1), heapify O(n)
6. Space complexity: O(n)


8. CUSTOM DATA STRUCTURES

Linked List
Linked lists store elements in nodes that point to the next node.

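Time Complexities:

Access: O(n)

Insertion/deletion at beginning: O(1)

Insertion at end: O(n) (without tail pointer), O(1) (with tail pointer)

Deletion at end: O(n) in a singly linked list, even with a tail pointer, because the node before the tail must be found

Search: O(n)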

python

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None # Optional: for O(1) append operations
        self.size = 0

    def prepend(self, data):
        """Add element to beginning of list - O(1)"""
        new_node = Node(data)
        if not self.head:
            self.head = new_node
            self.tail = new_node
        else:
            new_node.next = self.head
            self.head = new_node
        self.size += 1

    def append(self, data):
        """Add element to end of list - O(1) with tail pointer"""
        new_node = Node(data)
        if not self.head:
            self.head = new_node
            self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node
        self.size += 1

    def insert_after(self, prev_node, data):
        """Insert after a given node - O(1)"""
        if not prev_node:
            print("Previous node cannot be None")
            return

        new_node = Node(data)
        new_node.next = prev_node.next
        prev_node.next = new_node

        if prev_node == self.tail:
            self.tail = new_node
        self.size += 1

    def delete_node(self, key):
        """Delete first occurrence of node with data = key - O(n)"""
        if not self.head:
            return

        # If head node holds the key
        if self.head.data == key:
            self.head = self.head.next
            if not self.head:
                self.tail = None
            self.size -= 1
            return

        # Search for the key and keep track of previous node
        current = self.head
        while current.next and current.next.data != key:
            current = current.next

        # If key was found
        if current.next:
            if current.next == self.tail:
                self.tail = current
            current.next = current.next.next
            self.size -= 1

    def search(self, key):
        """Search for node with data = key - O(n)"""
        current = self.head
        while current:
            if current.data == key:
                return True
            current = current.next
        return False

    def get_at_index(self, index):
        """Get data at specified index - O(n)"""
        if index < 0 or index >= self.size:
            raise IndexError("Index out of range")

        current = self.head
        for i in range(index):
            current = current.next
        return current.data

    def print_list(self):
        """Print all elements in the list - O(n)"""
        current = self.head
        while current:
            print(current.data, end=" -> ")
            current = current.next
        print("None")

# Example usage
linked_list = LinkedList()
linked_list.append(1)
linked_list.append(2)
linked_list.prepend(0)
linked_list.print_list() # 0 -> 1 -> 2 -> None
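
A classic exercise on this structure is reversing the list in place by re-pointing each node's next reference. The sketch below works on bare nodes so it is self-contained; build, to_list, and reverse are hypothetical helper names, not methods of the LinkedList class above.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def build(values):
    """Hypothetical helper: build a chain of nodes and return its head."""
    head = None
    for value in reversed(values):
        node = Node(value)
        node.next = head
        head = node
    return head

def to_list(head):
    """Collect node data into a Python list for easy inspection."""
    out = []
    while head:
        out.append(head.data)
        head = head.next
    return out

def reverse(head):
    """Reverse the chain in place and return the new head: O(n) time, O(1) extra space."""
    prev = None
    current = head
    while current:
        nxt = current.next   # remember the rest of the chain
        current.next = prev  # point this node backwards
        prev = current
        current = nxt
    return prev

print(to_list(reverse(build([0, 1, 2]))))  # [2, 1, 0]
```

If this were added to a class that tracks a tail pointer, the old head would also need to become the new tail.
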

Binary Search Tree


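Binary Search Trees (BST) store data in a hierarchical structure
where each node has at most two children. Keys in a node's left subtree are smaller
than the node's key, and keys in its right subtree are larger (in the implementation
below, duplicates also go to the right).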

Time Complexities (balanced tree):

Search/Insert/Delete: O(log n)

Worst case (unbalanced): O(n)

python

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        """Insert a new key in BST - O(log n) average, O(n) worst case"""
        self.root = self._insert_recursive(self.root, key)

    def _insert_recursive(self, root, key):
        # If tree is empty, return a new node
        if not root:
            return TreeNode(key)

        # Otherwise, recur down the tree
        if key < root.key:
            root.left = self._insert_recursive(root.left, key)
        else:
            root.right = self._insert_recursive(root.right, key)

        # Return the unchanged node pointer
        return root

    def search(self, key):
        """Search for a key in BST - O(log n) average, O(n) worst case"""
        return self._search_recursive(self.root, key)

    def _search_recursive(self, root, key):
        # Base case: root is None or key is at root
        if not root or root.key == key:
            return root

        # Key is greater than root's key
        if key > root.key:
            return self._search_recursive(root.right, key)
        # Key is smaller than root's key
        return self._search_recursive(root.left, key)

    def delete(self, key):
        """Delete a key from BST - O(log n) average, O(n) worst case"""
        self.root = self._delete_recursive(self.root, key)

    def _delete_recursive(self, root, key):
        # Base case
        if not root:
            return root

        # Recursive calls for ancestors of node to be deleted
        if key < root.key:
            root.left = self._delete_recursive(root.left, key)
        elif key > root.key:
            root.right = self._delete_recursive(root.right, key)
        else:
            # Node with only one child or no child
            if not root.left:
                return root.right
            elif not root.right:
                return root.left

            # Node with two children: Get the inorder successor
            # (smallest in the right subtree)
            root.key = self._min_value_node(root.right).key

            # Delete the inorder successor
            root.right = self._delete_recursive(root.right, root.key)

        return root

    def _min_value_node(self, node):
        """Find the node with minimum key value in a subtree"""
        current = node
        while current.left:
            current = current.left
        return current

    def inorder_traversal(self):
        """Inorder traversal of BST - O(n)"""
        result = []
        self._inorder_recursive(self.root, result)
        return result

    def _inorder_recursive(self, root, result):
        if root:
            self._inorder_recursive(root.left, result)
            result.append(root.key)
            self._inorder_recursive(root.right, result)

    def preorder_traversal(self):
        """Preorder traversal of BST - O(n)"""
        result = []
        self._preorder_recursive(self.root, result)
        return result

    def _preorder_recursive(self, root, result):
        if root:
            result.append(root.key)
            self._preorder_recursive(root.left, result)
            self._preorder_recursive(root.right, result)

    def postorder_traversal(self):
        """Postorder traversal of BST - O(n)"""
        result = []
        self._postorder_recursive(self.root, result)
        return result

    def _postorder_recursive(self, root, result):
        if root:
            self._postorder_recursive(root.left, result)
            self._postorder_recursive(root.right, result)
            result.append(root.key)

    def height(self):
        """Calculate height of BST - O(n)"""
        return self._height_recursive(self.root)

    def _height_recursive(self, node):
        if not node:
            return -1

        left_height = self._height_recursive(node.left)
        right_height = self._height_recursive(node.right)

        return max(left_height, right_height) + 1

# Example usage
bst = BinarySearchTree()
keys = [50, 30, 70, 20, 40, 60, 80]
for key in keys:
    bst.insert(key)

print("Inorder traversal:", bst.inorder_traversal()) # [20, 30, 40, 50, 60, 70, 80]
print("Height of tree:", bst.height()) # 2

bst.delete(20)
print("After deleting 20:", bst.inorder_traversal()) # [30, 40, 50, 60, 70, 80]
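
The O(n) worst case is easy to reproduce: inserting keys in sorted order makes every node a right child, so the tree degenerates into a chain. The sketch below is self-contained and uses hypothetical free functions (insert, height, build) rather than the class above; the insert is iterative and sends duplicates to the right.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Iterative BST insert; returns the (possibly new) root."""
    node = TreeNode(key)
    if root is None:
        return node
    current = root
    while True:
        if key < current.key:
            if current.left is None:
                current.left = node
                return root
            current = current.left
        else:
            if current.right is None:
                current.right = node
                return root
            current = current.right

def height(root):
    """Height in edges, computed level by level; -1 for an empty tree."""
    level = [root] if root else []
    depth = -1
    while level:
        depth += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return depth

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

print(height(build([50, 30, 70, 20, 40, 60, 80])))  # 2 (balanced insertion order)
print(height(build([20, 30, 40, 50, 60, 70, 80])))  # 6 (sorted order: a chain)
```

Self-balancing trees avoid this by restructuring on insert, which is beyond what this guide covers.
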
