Advanced Python
Data structures
•Sets
•Tuples
•Dictionary comprehensions
Sets
A set is an unordered collection of unique elements.
Key Features:
•No duplicates
•Unordered
•Mutable (can add/remove items)
•Elements must be immutable (e.g., no lists or dicts as elements)
Common Operations:
s = {1, 2, 3, 3, 4} # {1, 2, 3, 4}
s.add(5) # {1, 2, 3, 4, 5}
s.remove(2) # {1, 3, 4, 5}
s.discard(10) # No error if not present
s.pop() # Removes and returns an arbitrary element
Set Algebra:
a = {1, 2, 3}
b = {3, 4, 5}
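The operations themselves are not listed in the excerpt; the standard set-algebra operators on a and b are:
a | b   # Union: {1, 2, 3, 4, 5}
a & b   # Intersection: {3}
a - b   # Difference: {1, 2}
a ^ b   # Symmetric difference: {1, 2, 4, 5}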
Tuples
A tuple is an ordered, immutable collection of elements.
Key Features:
•Ordered
•Immutable (cannot be changed after creation)
•Can contain mixed data types
Usage:
t = (1, 2, 3)
t[0] #1
len(t) #3
a, b, c = t # Tuple unpacking
Dictionary Comprehensions
A dictionary comprehension is a concise way to create dictionaries.
Basic Syntax:
{key_expr: value_expr for item in iterable if condition}
Example:
nums = [1, 2, 3, 4, 5]
squared = {x: x**2 for x in nums} # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Inverting a Dictionary:
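The example itself is not included in the excerpt; the usual comprehension-based approach looks like:
squared = {1: 1, 2: 4, 3: 9}
inverted = {value: key for key, value in squared.items()}  # {1: 1, 4: 2, 9: 3}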
Summary Table
Data Structure        Ordered   Unique Elements   Mutable   Common Use Case
set                   ❌        ✅                ✅        Removing duplicates, membership tests
tuple                 ✅        ❌                ❌        Fixed collections, function returns
dict (comprehension)  ✅        ✅                N/A       Efficient dictionary construction
Iterators in Python
What is an Iterator?
An iterator is an object that implements two methods:
•__iter__() — returns the iterator object itself.
•__next__() — returns the next value and raises StopIteration when exhausted.
class Counter:  # example iterator class; the name and __init__ are assumed for completeness
    def __init__(self, low, high):
        self.current = low
        self.high = high
    def __iter__(self):
        return self
    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            current = self.current
            self.current += 1
            return current
my_list = [1, 2, 3]
iterator = iter(my_list)
print(next(iterator)) # 1
print(next(iterator)) # 2
Generators
What is a Generator?
A generator is a function that:
•Uses yield instead of return.
•Produces a generator object (which is an iterator).
•Suspends execution on yield and resumes from there on next call.
Example:
def countdown(n):
    while n > 0:
        yield n
        n -= 1

for i in countdown(5):
    print(i)
Advantages of Generators
Feature               Benefit
Lazy Evaluation       Memory-efficient (especially with big data)
Infinite Sequences    Naturally handles endless data
Composable            Easy to chain transformations (e.g. pipelines)
Use Cases
1. Reading large files
def read_large_file(file_name):
    with open(file_name) as f:
        for line in f:
            yield line
2. Infinite sequences
def naturals():
    i = 0
    while True:
        yield i
        i += 1
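Since naturals() never terminates on its own, it is usually consumed lazily; a small illustrative snippet using itertools.islice:
from itertools import islice

first_five = list(islice(naturals(), 5))  # [0, 1, 2, 3, 4]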
Generator vs Iterator Summary
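The table itself is not reproduced in the excerpt; the key points from the two sections above are:
Aspect          Iterator (class-based)                     Generator (function with yield)
Definition      Implements __iter__() and __next__()       Any function containing yield
State           Tracked manually in instance attributes    Saved automatically between yields
Boilerplate     More (explicit StopIteration handling)     Less (StopIteration raised automatically)
Evaluation      Lazy, one value at a time                  Lazy, one value at a time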
List Comprehensions
A list comprehension is a concise way to build a new list from an iterable.
Advanced Features:
1. Nested Comprehensions (Matrix Flattening):
matrix = [[1, 2], [3, 4], [5, 6]]
flattened = [num for row in matrix for num in row]
Output: [1, 2, 3, 4, 5, 6]
2. Calling Functions Inside:
def capitalize_name(name): return name.capitalize()
names = ["john", "jane"]
capitalized = [capitalize_name(n) for n in names]
3. Multiple Conditions:
filtered = [x for x in range(20) if x % 2 == 0 if x > 10]
1. lambda – Anonymous Functions
Defines a small, unnamed function in a single expression.
Syntax:
lambda arguments: expression
Example:
square = lambda x: x ** 2
print(square(5)) # Output: 25
Use Case:
When you need a quick function temporarily (e.g., inside map, filter, sorted, etc.)
2. map() – Apply a Function to All Items in an Iterable
Applies a function to every item of an iterable (like a list) and returns a map object
(convert to list if needed).
Syntax:
map(function, iterable)
Example:
numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared) # Output: [1, 4, 9, 16]
3. reduce() – Reduce an Iterable to a Single Value
Applies a function cumulatively to the items of an iterable, reducing them to a single value (available in functools).
Syntax:
reduce(function, iterable)
Example:
from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product) # Output: 24
numbers = [1, 2, 3, 4, 5, 6]
total = reduce(lambda x, y: x + y, numbers)  # sum of all elements
print(total)  # Output: 21
OOP & Design Patterns for AI Pipelines
Class inheritance & polymorphism in model wrappers, in detail
In AI pipeline development, Object-Oriented Programming (OOP) principles like inheritance and polymorphism are pivotal for creating modular, reusable, and maintainable code. These principles facilitate the construction of flexible model wrappers that can adapt to various machine learning models and tasks.
Inheritance lets concrete model wrappers share a common base-class interface and reuse its logic instead of duplicating redundant code.
Example:
class BaseModel:
    def train(self, data):
        pass

class RandomForestModel(BaseModel):
    def train(self, data):
        # Implement training logic
        pass
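Polymorphism then lets a pipeline treat every wrapper uniformly through the BaseModel interface; a small illustrative sketch (the second subclass and the print statement are hypothetical):
class GradientBoostingModel(BaseModel):  # hypothetical second wrapper
    def train(self, data):
        print("Training gradient boosting on", len(data), "samples")

models = [RandomForestModel(), GradientBoostingModel()]
for m in models:
    m.train([[0, 1], [1, 0]])  # the same call works for every subclass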
Practical Applications
•Scikit-learn: Utilizes inheritance to provide a consistent interface across various models, such as BaseEstimator, TransformerMixin, and ClassifierMixin. This allows for uniform handling of different models within pipelines.
•PyTorch: Employs inheritance through nn.Module to define neural network layers, enabling the construction of complex models by stacking simple components.
Best Practices
•Favor Composition Over Inheritance: Use composition to assemble behaviors, as it offers greater flexibility and reduces tight coupling between classes.
•Implement Common Interfaces: Define common interfaces for models to ensure consistency and ease of integration within pipelines.
•Use Design Patterns Appropriately: Apply design patterns like Adapter and Strategy to solve common problems in AI pipeline development, ensuring code maintainability and scalability.
1. __init__(self, ...)
The constructor method initializes a new instance of a class. It's automatically
called when a new object is created.
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price
2. __repr__(self)
Defines the "official" string representation of an object, useful for debugging and logging.
If __str__ is not defined, __repr__ is used as a fallback
class Product:
    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price!r})"
3. __str__(self)
Provides a user-friendly string representation of the object, commonly used by print()
and str().
class Product:
    def __str__(self):
        return f"{self.name} costs ${self.price:.2f}"
4. __eq__(self, other)
Allows comparison using the == operator
class Product:
    def __eq__(self, other):
        return self.name == other.name and self.price == other.price
5. __lt__(self, other)
Enables less-than comparison using the < operator.
class Product:
    def __lt__(self, other):
        return self.price < other.price
6. __add__(self, other)
Defines behavior for the + operator
class Product:
    def __add__(self, other):
        return self.price + other.price
7. __call__(self, ...)
Makes an instance callable like a function.
class Product:
    def __call__(self, discount):
        return self.price * (1 - discount)
8. __getitem__(self, key)
Allows indexing into an object using square brackets.
class Product:
    def __getitem__(self, key):
        return getattr(self, key)
9. __setattr__(self, name, value)
Intercepts attribute assignments.
class Product:
    def __setattr__(self, name, value):
        if name == "price" and value < 0:
            raise ValueError("Price cannot be negative")
        super().__setattr__(name, value)
10. __iter__(self)
Makes an object iterable, so instances can be used in for loops and with iter().
class ProductCollection:  # illustrative container class; the name and __init__ are assumed
    def __init__(self, products):
        self.products = products
    def __iter__(self):
        return iter(self.products)
12. __enter__(self) and __exit__(self)
Define behavior for context management, allowing the use of with statements.
class Product:
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        return False  # do not suppress exceptions
Putting several of these methods together in a single Product class:
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price
    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price!r})"
    def __str__(self):
        return f"{self.name} costs ${self.price:.2f}"
    def __iter__(self):
        return iter([self.name, self.price])
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        return False
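A short illustrative session with the combined class (the values are arbitrary):
p1 = Product("Widget", 9.99)
p2 = Product("Gadget", 19.99)
print(p1)        # Widget costs $9.99  (__str__)
print(repr(p2))  # Product(name='Gadget', price=19.99)  (__repr__)
print(list(p1))  # ['Widget', 9.99]  (__iter__)
with Product("Gizmo", 5.0) as p:  # __enter__ / __exit__
    print(p)     # Gizmo costs $5.00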
Singleton Example: ConfigurationManager
import json
import threading

class ConfigurationManager:
    _instance = None
    _lock = threading.Lock()

    def load_config(self):
        try:
            with open(self.config_file, 'r') as file:
                self.config = json.load(file)
        except FileNotFoundError:
            raise ValueError(f"Config file not found: {self.config_file}")
        except json.JSONDecodeError:
            raise ValueError(f"Invalid JSON format in config file: {self.config_file}")
Best Practices
•Lazy Initialization: Delay the creation of the Singleton instance until it's
needed.
•Thread Safety: Use synchronization mechanisms to ensure that the
instance is created safely in multi-threaded contexts.
•Avoid Overuse: Use the Singleton pattern judiciously to prevent hidden global state and unnecessary coupling.
Implementing the Singleton pattern in Python ensures that a class has only
one instance and provides a global point of access to it. This is particularly
useful for managing shared resources like configuration settings, logging
services, or database connections.
Classic Singleton Implementation
A straightforward approach to implementing the Singleton pattern in Python is by overriding the __new__ method to control the instance creation:
class Singleton:
    _instance = None
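    # The __new__ override referred to above is not reproduced in the excerpt;
    # a standard sketch:
    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance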
class SingletonMeta(type):
    _instances = {}
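    # The __call__ hook that enforces one instance per class is not shown in
    # the excerpt; a standard sketch:
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]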
class Singleton(metaclass=SingletonMeta):
    pass
With this implementation, any class that uses SingletonMeta as its metaclass will
follow the Singleton pattern.
Testing Singleton Classes
When testing Singleton classes, it's crucial to ensure that the singleton
instance behaves as expected across tests. One approach is to provide a
method to reset the singleton instance:
class TestableSingleton:
    _instance = None

    def increment_value(self):
        self.value += 1

    def reset_value(self):
        self.value = 0
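    # Instance creation and the reset hook mentioned above are not shown in
    # the excerpt; a minimal sketch:
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.value = 0
        return cls._instance

    @classmethod
    def reset_instance(cls):
        cls._instance = None  # lets each test start from a fresh instance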
Closures
A closure is an inner function that captures and remembers variables from its enclosing scope, even after the outer function has returned.
def outer(message):
    def inner():
        print(message)
    return inner
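The memoize decorator used below is not defined in the excerpt; a minimal closure-based sketch of what it could look like:
def memoize(func):
    cache = {}  # closure state shared across calls
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper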
@memoize
def expensive_computation(x):
    # Simulate a time-consuming computation
    return x * x
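Likewise, simple_decorator is not defined in the excerpt; judging from the output shown below, it wraps the call between two print statements:
def simple_decorator(func):
    def wrapper(*args, **kwargs):
        print("Before the function call")
        result = func(*args, **kwargs)
        print("After the function call")
        return result
    return wrapper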
@simple_decorator
def greet():
    print("Hello!")
greet()
Output:
Before the function call
Hello!
After the function call
In this example, wrapper is a closure that extends the behavior of the
greet function.
Example: Counting Function Calls with Decorators
def count_calls(func):
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"Call {wrapper.calls} of {func.__name__}")
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def say_hello():
    print("Hello!")
say_hello()
say_hello()
Output:
Call 1 of say_hello
Hello!
Call 2 of say_hello
Hello!
Here, wrapper is a closure that maintains a count of how many times say_hello has been called.
Combining Closures and Decorators
Decorators often utilize closures to retain state between function
calls. This combination allows decorators to be highly flexible and
powerful.
Example: Parameterized Decorator
def repeat(n):
    def decorator(func):
        def wrapper(*args, **kwargs):
            for _ in range(n):
                func(*args, **kwargs)
        return wrapper
    return decorator

@repeat(3)
def greet():
    print("Hello!")
greet()
Output:
Hello!
Hello!
Hello!
Relevance in AI Development
In AI development, closures and decorators can be used for:
•Stateful Callbacks: Closures can maintain state across multiple
invocations, useful for callbacks in machine learning pipelines.
•Performance Optimization: Decorators can implement caching or
logging mechanisms to monitor and optimize AI model performance.
•Code Modularity: Both closures and decorators promote modular
code, making it easier to manage and extend AI systems.
functools.lru_cache for memoization
In Python, the functools.lru_cache decorator provides an efficient
way to implement memoization, which is particularly useful in
artificial intelligence (AI) and machine learning tasks that involve
repetitive computations, such as recursive algorithms or dynamic
programming problems.
What Is functools.lru_cache?
The lru_cache decorator caches the results of function calls based on
their arguments. When the function is called again with the same arguments,
the cached result is returned instead of recomputing it, leading to performance
improvements. It uses a Least Recently Used (LRU) strategy to manage the
cache, discarding the least recently used entries when the cache exceeds its
size limit.
Syntax
from functools import lru_cache

@lru_cache(maxsize=128, typed=False)
def some_function(args):
    # function body
    ...
1. Fibonacci Sequence
@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
print(fibonacci.cache_info())
This will output information about cache hits, misses, and the current
cache size.
2. Factorial Function
The factorial function is another example where memoization can be
beneficial.
from functools import lru_cache
@lru_cache(maxsize=None)
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)
Important Considerations
•Argument Types: Arguments passed to the decorated function must be hashable, as they are used as keys in the cache.
•Mutable Arguments: Mutable types (e.g., lists, dictionaries) are not hashable and cannot be used as arguments to a cached function; convert them to tuples or frozensets first.
•Thread Safety: The cache is thread-safe, allowing the decorated function to be used in multi-threaded environments.
•Cache Size: Setting maxsize=None allows the cache to grow indefinitely, which can lead to increased memory usage. It's advisable to set an appropriate maxsize based on your application's memory constraints.
partial functions for hyperparameter tuning
Functional programming concepts, particularly partial functions and decorators, can be very powerful in AI and machine learning pipelines, especially when tuning hyperparameters, composing model components, or organizing code in a reusable and expressive way.
Advantages:
•Reusability: SVC_partial can be reused across experiments.
•Clean separation between model configuration and tuning.
from functools import partial
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Grid of hyperparameters
param_grid = {
    'svm__C': [0.1, 1, 10],
    'svm__gamma': [0.01, 0.1, 1]
}
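The construction of SVC_partial and the search itself are not included in the excerpt; a plausible completion (the fixed kernel='rbf' choice is an assumption):
# Fix non-tuned settings up front with partial
SVC_partial = partial(SVC, kernel='rbf')  # 'rbf' is an assumed fixed hyperparameter

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC_partial())
])
search = GridSearchCV(pipeline, param_grid, cv=5)
# search.fit(X_train, y_train)  # X_train / y_train are assumed to exist
A second, PyTorch-style use of partial is fixing a loss-function hyperparameter, as in the training snippet below.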
# During training
output = model(input)
loss = tuned_loss(output, target)
loss.backward()
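tuned_loss and penalty_factor are not defined in the excerpt; a sketch of the idea, assuming a simple penalized MSE (penalized_mse is a hypothetical helper):
import torch.nn.functional as F
from functools import partial

def penalized_mse(output, target, penalty_factor=0.0):
    # Hypothetical loss: MSE plus a penalty scaled by a tunable factor
    return F.mse_loss(output, target) + penalty_factor * output.abs().mean()

tuned_loss = partial(penalized_mse, penalty_factor=0.1)  # fix the hyperparameter up front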
Use case: You can easily adjust the penalty_factor and pass it to
hyperparameter search tools like Optuna or Ray Tune.
Decorators in AI: Use for Logging, Tuning, and Caching
Decorators let you wrap functions with additional behavior. Common
AI uses include:
•Timing model training
•Logging evaluation metrics
•Caching results
Example: Decorator to Log Model Accuracy
import time
from functools import wraps
def log_accuracy(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        acc = func(*args, **kwargs)
        print(f"Accuracy: {acc:.4f} | Time: {time.time() - start:.2f}s")
        return acc
    return wrapper

@log_accuracy
def evaluate_model(model, X_test, y_test):
    return model.score(X_test, y_test)
Exercise:
•Write a decorator to log input shapes for AI model training functions.
•Use functools.partial to fix certain hyperparameters in a model training function.
Data Handling with NumPy & Pandas
Efficiently process AI datasets, handle missing data, and normalize features.
NumPy vectorized operations
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Vectorized addition
# Vectorized z-score normalization (example values)
data = np.array([10.0, 12.0, 14.0, 16.0])
normalized_data = (data - data.mean()) / data.std()
import numpy as np
NumPy's vectorized addition and np.clip() ensure fast and safe transformation.
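The clipping example referred to above is not reproduced; the idea, with assumed pixel-style bounds, is:
values = np.array([-20, 10, 130, 300])   # example values
scaled = np.clip(values * 1.5, 0, 255)   # vectorized scale, then clamp to [0, 255]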
Two real-world examples, one with stock prices and one with sensor data, both using NumPy vectorized operations.
Stock Prices – Daily Returns and Moving Average
Daily Returns:
[ 0.0133  0.0066  0.0131  0.0129 -0.0063  0.0128  0.0127  0.0125 -0.0061]
7-Day Moving Average:
[154.7143 155.8571 157.2857 158.5714]
This avoids loops entirely and uses NumPy's convolve and slicing features.
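The computation itself is omitted in the excerpt; a sketch consistent with the description (the prices below are made up, so the numbers will differ from the sample output above):
prices = np.array([150.0, 152.0, 153.0, 155.0, 157.0, 156.0, 158.0, 160.0, 162.0, 161.0])
daily_returns = np.diff(prices) / prices[:-1]                    # vectorized daily returns
moving_avg = np.convolve(prices, np.ones(7) / 7, mode='valid')   # 7-day moving average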
Sensor Data – Temperature Threshold Alerts
Goal:
•Identify temperature spikes above a certain threshold
•Normalize temperature values between 0 and 1 (min-max scaling)
Simulated Sensor Data:
# Hourly temperature readings from a sensor (°C)
temps = np.array([22.5, 23.0, 25.1, 30.0, 31.2, 28.7, 26.5, 24.3, 23.7, 22.9])
# Threshold detection
threshold = 29.0
alerts = temps > threshold
Threshold Alerts:
[False False False True True False False False False False]
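The scaling step itself is not shown in the excerpt; the standard min-max formula is:
# Min-max scaling to [0, 1]
normalized = (temps - temps.min()) / (temps.max() - temps.min())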
Normalized Temperatures:
[0. 0.06 0.52 0.93 1. 0.76 0.58 0.32 0.26 0.04]
These are real-time operations that could feed into an alert system
or a dashboard—done efficiently with NumPy.
Example:
import pandas as pd
import numpy as np
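# The DataFrame itself is not constructed in the excerpt; a plausible setup
# (the numbers are made up):
data = {
    'City': ['New York', 'New York', 'Chicago', 'Chicago'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Sales': [250, 320, 280, 400],
    'Profit': [60, 80, 70, 110],
}
df = pd.DataFrame(data).set_index(['City', 'Quarter'])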
print(df)
This will output a DataFrame with a multi-level index, where each row is
uniquely identified by a combination of 'City' and 'Quarter'.
Accessing Data:
•All rows for a specific city: df.loc['New York']
•A specific city and quarter: df.loc[('New York', 'Q1')]
•One quarter across all cities: df.xs('Q1', level='Quarter')
Conditional Indexing
Conditional indexing allows you to filter data based on specific
conditions.
Example:
# Filtering rows where Sales are greater than 300
df_filtered = df[df['Sales'] > 300]
print(df_filtered)
This will return all rows where the 'Sales' value is greater than 300.
1. Basic Grouping
You can group data by one or more columns and perform aggregate functions like
sum, mean, or count.
Example:
# Grouping by 'City' and calculating the mean of 'Sales' and 'Profit'
df_grouped = df.groupby('City').agg({
    'Sales': 'mean',
    'Profit': 'mean'
})
print(df_grouped)
This will return the mean 'Sales' and 'Profit' for each city.
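2. Grouping by Multiple Columns
The code for this step is not reproduced in the excerpt; assuming the same df, it would look something like:
df_multi = df.groupby(['City', 'Quarter'])['Sales'].sum()
print(df_multi)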
This will return the total 'Sales' for each combination of 'City' and 'Quarter'.
3. Applying Custom Functions
You can apply custom functions to each group using the apply method.
Example:
# Defining a custom function to calculate the range (max - min)
def calc_range(group):
    return group.max() - group.min()
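Applying it to each group is not shown in the excerpt; one way, assuming the same df as above:
print(df.groupby('City')[['Sales', 'Profit']].apply(calc_range))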
This will group the data by the sum of 'Sales' and 'Profit' and return the mean 'Sales' for each group.
5. Advanced Aggregations
You can perform multiple aggregations at once using the agg method.
Example:
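# The data dictionary is not given in the excerpt; a plausible shape,
# consistent with the columns used below:
data = {
    'Store': ['A', 'A', 'B', 'B'],
    'Product': ['X', 'Y', 'X', 'Y'],
    'Revenue': [100, 150, 120, 180],
    'Units Sold': [10, 15, 12, 18],
}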
df_sales = pd.DataFrame(data)
# Grouping by 'Store' and 'Product' and calculating total revenue and units sold
df_sales_grouped = df_sales.groupby(['Store', 'Product']).agg({
    'Revenue': 'sum',
    'Units Sold': 'sum'
}).reset_index()
print(df_sales_grouped)
This will return a DataFrame showing the total 'Revenue' and 'Units Sold'
for each combination of 'Store' and 'Product'.
Data cleaning for AI input pipelines
Data cleaning is a crucial step in preparing data for AI input
pipelines. It ensures that the data is accurate, consistent, and
formatted correctly, which is essential for building effective AI
models.
Key Data Cleaning Techniques
1. Handling Missing Values
•Imputing missing entries (e.g., with the mean or median) or dropping them, depending on how much data is affected.
2. Removing Duplicates
•Identifying and eliminating duplicate records to prevent bias in
model training.
3. Standardizing Formats
•Converting text data to a consistent case (e.g., lowercase).
•Stripping leading/trailing spaces.
•Normalizing date formats.
4. Handling Outliers
•Detecting anomalies using statistical methods or domain knowledge.
•Deciding whether to correct, remove, or retain outliers based on their impact.
5. Data Transformation
•Scaling numerical values.
•Encoding categorical variables.
•Feature extraction and engineering.
Example: Retail Sales Data
Challenges:
•Inconsistent product names (e.g., "Product A" vs. "product a").
•Missing revenue entries.
•Duplicate records for the same transaction.
Cleaning Steps:
•Standardize product names to lowercase.
•Impute missing revenue using the mean of the category.
•Remove duplicate entries based on transaction ID.
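A pandas sketch of these steps (the column names product, revenue, category, and transaction_id are assumptions):
df['product'] = df['product'].str.lower().str.strip()
df['revenue'] = df['revenue'].fillna(df.groupby('category')['revenue'].transform('mean'))
df = df.drop_duplicates(subset='transaction_id')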
Example: Patient Records
Challenges:
•Inconsistent formatting of addresses.
•Missing patient names.
•Duplicate records due to data entry errors.
Cleaning Steps:
•Use regular expressions to extract and standardize address
components.
•Impute missing names using available information or flag for
review.
•Merge duplicate records based on patient ID.
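A pandas sketch of these steps as well (the column names address, name, and patient_id are assumptions):
df['address'] = df['address'].str.lower().str.strip()   # crude format standardization
df['name'] = df['name'].fillna('NEEDS REVIEW')          # flag missing names for review
df = df.drop_duplicates(subset='patient_id')            # keep one record per patient ID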
1. CSV (Comma-Separated Values)
Features:
Feature Details
Format Text (plain text, human-readable)
Structure Rows and columns (no nesting)
Language Support Universally supported
File Size Moderate
Speed Slower to read/write for large data
import pandas as pd
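The CSV example code is not included in the excerpt; a minimal pandas sketch (the file name and values are illustrative):
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [0.91, 0.84]})
df.to_csv('data.csv', index=False)   # write
df_loaded = pd.read_csv('data.csv')  # read back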
2. JSON (JavaScript Object Notation)
Real-world Example:
•Storing settings in web applications
•Transmitting data between frontend and backend systems
Features:
Feature Details
Format Text (human-readable, hierarchical)
Structure Key-value pairs (can nest objects/lists)
Language Support Wide support
File Size Moderate
Speed Slower for large files
import json
# Save to JSON
data = {'name': 'Alice', 'age': 25, 'skills': ['Python', 'ML']}
with open('data.json', 'w') as f:
    json.dump(data, f)
3. Pickle
Real-world Example:
•Saving trained machine learning models
•Storing intermediate computation results
Features:
Feature Details
Format Binary (Python-specific)
Structure Any Python object
Language Support Python-only
File Size Efficient
Speed Fast serialization/deserialization
import pickle
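The pickle example itself is omitted; a minimal sketch (the dictionary stands in for a trained model object):
model = {'weights': [0.1, 0.2, 0.3]}  # stand-in for a real model
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)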
4. Joblib
Optimized for large NumPy arrays and scikit-learn models.
Real-world Example:
•Saving a trained scikit-learn pipeline
•Caching computationally expensive functions
Features:
Feature Details
Format Binary (optimized for performance)
Structure Large data, NumPy arrays
Language Support Python-only
File Size Compressed and efficient
Speed Faster than Pickle for large numerical data
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier

# Train model
model = RandomForestClassifier().fit([[0, 0], [1, 1]], [0, 1])
# Save model
dump(model, 'rf_model.joblib')
# Load model
loaded_model = load('rf_model.joblib')
HDF5 basics with h5py
HDF5 is a versatile file format designed to store large, complex datasets efficiently.
The h5py library provides a Pythonic interface to HDF5, allowing seamless
integration with NumPy arrays and enabling the creation, manipulation,
and querying of HDF5 files.
1. Installation
To install h5py, you can use either pip or conda:
pip install h5py
# or
conda install h5py
Ensure you also have numpy installed, as it's commonly used alongside h5py.
2. Basic Operations
📂 Opening a File
To open an HDF5 file, use:
import h5py
f = h5py.File('myfile.h5', 'r')  # 'r' = read-only; use 'w' to create, 'a' to append
f.close()
Creating a File and Datasets
import h5py
import numpy as np
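# The creation code described below is not included in the excerpt; a minimal
# sketch consistent with the description:
with h5py.File('newfile.h5', 'w') as f:
    f.create_dataset('dataset1', data=np.random.random(100))
    f.create_dataset('dataset2', data=np.random.random((10, 10)))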
This code creates a file named newfile.h5 and adds two datasets, dataset1 and
dataset2, each containing random data.
Groups and Hierarchy
HDF5 files support a hierarchical structure similar to directories. You
can create groups within the file:
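The snippet itself is not reproduced in the excerpt; a minimal sketch:
with h5py.File('myfile.h5', 'a') as f:
    grp = f.create_group('group1')
    grp.create_dataset('data', data=np.arange(10))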
This creates a group named group1 and adds a dataset data within it.
Accessing Data
To read data from a dataset:
with h5py.File('myfile.h5', 'r') as f:
    data = f['group1/data'][:]
    print(data)
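The attribute example referred to below is not included; a sketch consistent with the description:
with h5py.File('myfile.h5', 'a') as f:
    dset = f['group1/data']
    dset.attrs['description'] = 'example sensor readings'
    print(dset.attrs['description'])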
This adds a description attribute to the dataset data and prints its value.
Instead of storing each image as a separate file, which can be cumbersome and
inefficient, we can organize them into an HDF5 file using h5py. This approach
allows for efficient storage and easy access.
Organizing Data in HDF5
We create an HDF5 file with two main groups: ships and no_ships. Each group
contains a dataset named data, which stores the images as NumPy arrays.
Additionally, we can store metadata such as image dimensions and labels as
attributes.
Code Example
import h5py
import numpy as np
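# The file, group, and dataset creation is not shown in the excerpt; a sketch
# (ships_images and no_ships_images are assumed to be NumPy image arrays, and
# the file should be closed with f.close() when finished):
ships_images = np.random.randint(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)
no_ships_images = np.random.randint(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)
f = h5py.File('satellite_images.h5', 'w')
ships_group = f.create_group('ships')
no_ships_group = f.create_group('no_ships')
ships_group.create_dataset('data', data=ships_images)
no_ships_group.create_dataset('data', data=no_ships_images)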
# Add attributes
ships_group.attrs['label'] = 'ship'
no_ships_group.attrs['label'] = 'no-ship'
ships_group.attrs['image_shape'] = ships_images.shape
no_ships_group.attrs['image_shape'] = no_ships_images.shape
Accessing Data
To access the data, we can open the HDF5 file and
retrieve the images and their metadata:
with h5py.File('satellite_images.h5', 'r') as f:
    ships_data = f['ships/data'][:]
    no_ships_data = f['no_ships/data'][:]
    ships_label = f['ships'].attrs['label']
    no_ships_label = f['no_ships'].attrs['label']
    ships_shape = f['ships'].attrs['image_shape']
    no_ships_shape = f['no_ships'].attrs['image_shape']