
ADVANCED PYTHON

Data structures
•Sets
•Tuples
•Dictionary comprehensions

Sets
A set is an unordered collection of unique elements.

Key Features:
•No duplicates
•Unordered
•Mutable (can add/remove items)
•Elements must be immutable (e.g., no lists or dicts as elements)
Common Operations:

s = {1, 2, 3, 3, 4} # {1, 2, 3, 4}
s.add(5) # {1, 2, 3, 4, 5}
s.remove(2) # {1, 3, 4, 5}
s.discard(10) # No error if not present
s.pop() # Removes and returns an arbitrary element
Set Algebra:

a = {1, 2, 3}
b = {3, 4, 5}

a | b   # Union → {1, 2, 3, 4, 5}
a & b   # Intersection → {3}
a - b   # Difference → {1, 2}
a ^ b   # Symmetric difference → {1, 2, 4, 5}

Tuples
A tuple is an ordered, immutable collection of
elements.

Key Features:
•Ordered
•Immutable (cannot be changed after creation)
•Can contain mixed data types
Usage:
t = (1, 2, 3)
t[0] #1
len(t) #3
a, b, c = t # Tuple unpacking

nested = (1, (2, 3), [4, 5])

Even though tuples are immutable, they can contain mutable elements like lists.

Dictionary Comprehensions
A dictionary comprehension is a concise way to create dictionaries.

Basic Syntax:
{key_expr: value_expr for item in iterable if condition}

Example:
nums = [1, 2, 3, 4, 5]
squared = {x: x**2 for x in nums}  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Inverting a Dictionary:

original = {'a': 1, 'b': 2}
inverted = {v: k for k, v in original.items()}  # {1: 'a', 2: 'b'}

Summary Table

Data Structure       | Ordered | Mutable | Unique Elements | Common Use Case
set                  | ❌      | ✅      | ✅              | Removing duplicates, membership tests
tuple                | ✅      | ❌      | ❌              | Fixed collections, function returns
dict (comprehension) | ✅      | ✅      | N/A             | Efficient dictionary construction
Iterators in Python

What is an Iterator?
An iterator is an object that implements two methods:
•__iter__() — returns the iterator object itself.
•__next__() — returns the next value and raises StopIteration when exhausted.

Example: Custom Iterator


class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.high:
            raise StopIteration
        else:
            current = self.current
            self.current += 1
            return current

for num in Counter(1, 5):
    print(num)
Iterable vs Iterator

Concept  | Description                         | Example
Iterable | Has an __iter__() method            | Lists, strings, tuples
Iterator | Has both __iter__() and __next__()  | What iter() returns

my_list = [1, 2, 3]
iterator = iter(my_list)
print(next(iterator)) # 1
print(next(iterator)) # 2
Generators

What is a Generator?
A generator is a function that:
•Uses yield instead of return.
•Produces a generator object (which is an iterator).
•Suspends execution on yield and resumes from there on next call.

Example:
def countdown(n):
while n > 0:
yield n
n -= 1

for i in countdown(5):
print(i)

Each time yield is hit:
•The function’s state is saved.
•Execution pauses.
•Execution resumes when next() is called again.
Generator Expressions
Just like list comprehensions, but lazy (do not compute all at once):

gen = (x * x for x in range(5))


print(next(gen)) # 0

Advantages of Generators

Feature            | Benefit
Lazy Evaluation    | Memory-efficient (especially with big data)
Infinite Sequences | Naturally handles endless data
Composable         | Easy to chain transformations (e.g. pipelines)
Use Cases
1.Reading large files
def read_large_file(file_name):
    with open(file_name) as f:
        for line in f:
            yield line

2. Infinite sequences
def naturals():
    i = 0
    while True:
        yield i
        i += 1
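
3. Chaining transformations
Because each stage consumes and yields items lazily, generators compose naturally into pipelines (the "Composable" advantage above). A minimal sketch, reusing read_large_file from use case 1 and assuming a hypothetical server.log file:

def grep(lines, keyword):
    # Keep only lines containing the keyword, lazily
    return (line for line in lines if keyword in line)

def strip_newlines(lines):
    # Remove trailing newlines, lazily
    return (line.rstrip("\n") for line in lines)

errors = strip_newlines(grep(read_large_file("server.log"), "ERROR"))
for line in errors:
    print(line)  # each line is read, filtered, and printed one at a time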
Generator vs Iterator Summary

Feature     | Iterator                      | Generator
Syntax      | Class with __iter__/__next__  | Function with yield
Boilerplate | More                          | Less
Readability | Lower                         | Higher
Performance | Similar                       | Slightly better in lazy evaluation
List Comprehension
Basic Syntax:
[expression for item in iterable if condition]

Example:

squares = [x**2 for x in range(10)]

Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


With Condition:
even_squares = [x**2 for x in range(10) if x % 2 == 0]

Output: [0, 4, 16, 36, 64]

Advanced Features:
1. Nested Comprehensions (Matrix Flattening):
matrix = [[1, 2], [3, 4], [5, 6]]
flattened = [num for row in matrix for num in row]

Output: [1, 2, 3, 4, 5, 6]
2. Calling Functions Inside:
def capitalize_name(name): return name.capitalize()
names = ["john", "jane"]
capitalized = [capitalize_name(n) for n in names]

3. Multiple Conditions:
filtered = [x for x in range(20) if x % 2 == 0 if x > 10]

lambda (Anonymous Functions)

A lambda is a way to create small, unnamed (anonymous) functions in a single line.


Syntax:
lambda arguments: expression

Example:
square = lambda x: x ** 2
print(square(5)) # Output: 25

Use Case:
When you need a quick function temporarily (e.g., inside map, filter, sorted, etc.)
2. map() – Apply a Function to All Items in an Iterable

Applies a function to every item of an iterable (like a list) and returns a map object
(convert to list if needed).
Syntax:
map(function, iterable)

Example:
numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared) # Output: [1, 4, 9, 16]

3. filter() – Filter Items in an Iterable


Filters elements from an iterable for which the function returns True.
Syntax:
filter(function, iterable)
Example:
numbers = [1, 2, 3, 4, 5, 6]
even = list(filter(lambda x: x % 2 == 0, numbers))
print(even) # Output: [2, 4, 6]
4. reduce() – Cumulative Operation on an Iterable
Applies a rolling/cumulative function to the iterable (must be imported from
functools).
Syntax:
from functools import reduce
reduce(function, iterable)

Example:
from functools import reduce
numbers = [1, 2, 3, 4]
product = reduce(lambda x, y: x * y, numbers)
print(product) # Output: 24

Function  | Purpose                              | Returns       | Use Case
lambda    | Anonymous inline function            | Function      | Quick functions without def
map()     | Transform each element               | Map object    | Apply logic to all elements
filter()  | Select elements based on a condition | Filter object | Keep only matching elements
reduce()  | Combine all into one value           | Single value  | Summarize or aggregate data
Example: Using All Together

from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Step 1: Filter even numbers
even = filter(lambda x: x % 2 == 0, numbers)   # [2, 4, 6]

# Step 2: Square them
squared = map(lambda x: x ** 2, even)          # [4, 16, 36]

# Step 3: Sum them
total = reduce(lambda x, y: x + y, squared)    # 56

print(total)
OOP & Design Patterns for AI Pipelines
Class inheritance & polymorphism in model wrappers, in detail
In AI pipeline development, Object-Oriented
Programming (OOP) principles like inheritance and
polymorphism are pivotal for creating modular,
reusable, and maintainable code. These principles
facilitate the construction of flexible model wrappers
that can adapt to various machine learning models and
tasks.
Inheritance lets a subclass reuse attributes and methods defined in a base class, avoiding redundant code.
Example:
class BaseModel:
    def train(self, data):
        pass

    def predict(self, data):
        pass

class RandomForestModel(BaseModel):
    def train(self, data):
        # Implement training logic
        pass

    def predict(self, data):
        # Implement prediction logic
        pass
In this example, RandomForestModel inherits from BaseModel and overrides the train and predict methods defined by the base class.
Polymorphism enables objects of different classes to be
treated as instances of a common superclass, typically
through method overriding. This is particularly useful in
AI pipelines where multiple models share a common
interface.
Example:
class ModelWrapper:
    def __init__(self, model):
        self.model = model

    def train(self, data):
        self.model.train(data)

    def predict(self, data):
        return self.model.predict(data)

Here, ModelWrapper can wrap any model that implements train and predict methods, allowing for interchangeable use of different models.
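
A rough usage sketch (reusing the BaseModel, RandomForestModel, and ModelWrapper classes above; LogisticRegressionModel and the empty data are placeholders):

class LogisticRegressionModel(BaseModel):
    def train(self, data):
        pass  # placeholder training logic

    def predict(self, data):
        return ["positive" for _ in data]  # placeholder predictions

# The same wrapper code works for either concrete model (polymorphism)
for model in (RandomForestModel(), LogisticRegressionModel()):
    wrapper = ModelWrapper(model)
    wrapper.train(data=[])
    print(wrapper.predict(data=[1, 2]))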
Design Patterns in AI Pipelines
Applying design patterns can further enhance the structure and flexibility of AI
pipelines.
•Adapter Pattern: This pattern allows incompatible interfaces to work together by providing a wrapper that translates one interface into another. In AI pipelines, it enables integration of diverse models with different interfaces (a sketch follows this list).
•Composition over Inheritance: This principle suggests that classes should achieve polymorphic behavior and code reuse by composing objects with the desired functionality rather than through inheritance. In AI pipelines, this approach can lead to more flexible and maintainable code.
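
A minimal Adapter sketch, assuming a hypothetical third-party model that exposes fit/predict_proba instead of the pipeline's train/predict interface:

class ThirdPartyModel:
    # Hypothetical external model with an incompatible interface
    def fit(self, data):
        pass

    def predict_proba(self, data):
        return [0.5 for _ in data]

class ThirdPartyAdapter(BaseModel):
    # Translates the pipeline's train/predict calls into the external interface
    def __init__(self, external_model):
        self.external_model = external_model

    def train(self, data):
        self.external_model.fit(data)

    def predict(self, data):
        return self.external_model.predict_proba(data)

adapted = ThirdPartyAdapter(ThirdPartyModel())
adapted.train([])
print(adapted.predict([1, 2, 3]))  # [0.5, 0.5, 0.5]

Note that the adapter composes the external model rather than inheriting from it, which is the composition-over-inheritance principle in action.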

Practical Applications
•Scikit-learn: Utilizes inheritance to provide a consistent interface across various models, such
as BaseEstimator, TransformerMixin, and ClassifierMixin. This allows for uniform
handling of different models within pipelines.
• PyTorch: Employs inheritance through nn.Module to define neural network layers,
enabling the construction of complex models by stacking simple components.
Best Practices
•Favor Composition Over Inheritance: Use
composition to assemble behaviors, as it offers greater
flexibility and reduces tight coupling between classes.
•Implement Common Interfaces: Define common
interfaces for models to ensure consistency and ease of
integration within pipelines.
•Use Design Patterns Appropriately: Apply design
patterns like Adapter and Strategy to solve common
problems in AI pipeline development, ensuring code
maintainability and scalability.

By leveraging OOP principles and design patterns, AI pipeline developers can create robust, adaptable, and maintainable systems that efficiently handle diverse machine learning models and tasks.
Magic methods for clean model
interfaces
Magic methods in Python, often referred to as "dunder"
methods due to their double underscores, allow you to
define or customize the behavior of your classes in a
way that integrates seamlessly with Python's syntax and
built-in functions. This capability is particularly valuable
when designing clean and expressive model interfaces,
as it enables your objects to behave like built-in types
and interact naturally with Python's syntax.
Core Magic Methods for Clean Model Interfaces

1. __init__(self, ...)
The constructor method initializes a new instance of a class. It's automatically
called when a new object is created.

class Product:
def __init__(self, name, price):
self.name = name
self.price = price
2. __repr__(self)
Defines the "official" string representation of an object, useful for debugging and logging.
If __str__ is not defined, __repr__ is used as a fallback.

class Product:
def __repr__(self):
return f"Product(name={self.name!r}, price={self.price!r})"

3. __str__(self)
Provides a user-friendly string representation of the object, commonly used by print()
and str().
class Product:
def __str__(self):
return f"{self.name} costs ${self.price:.2f}"

4. __eq__(self, other)
Allows comparison using the == operator
class Product:
def __eq__(self, other):
return self.name == other.name and self.price == other.price
5. __lt__(self, other)
Enables less-than comparison using the < operator.
class Product:
def __lt__(self, other):
return self.price < other.price

6. __add__(self, other)
Defines behavior for the + operator
class Product:
def __add__(self, other):
return self.price + other.price

7. __call__(self, ...)
Makes an instance callable like a function.
class Product:
def __call__(self, discount):
return self.price * (1 - discount)

8. __getitem__(self, key)
Allows indexing into an object using square brackets.
class Product:
def __getitem__(self, key):
return getattr(self, key)
9. __setattr__(self, name, value)
Intercepts attribute assignments.
class Product:
def __setattr__(self, name, value):
if name == "price" and value < 0:
raise ValueError("Price cannot be negative")
super().__setattr__(name, value)

10. __delattr__(self, name)


Intercepts attribute deletions.
class Product:
def __delattr__(self, name):
if name == "price":
raise AttributeError("Cannot delete price attribute")
super().__delattr__(name)
11. __iter__(self)
Makes an object iterable, enabling iteration in loops.
class ProductList:
def __init__(self, products):
self.products = products

def __iter__(self):
return iter(self.products)
12. __enter__(self) and __exit__(self, exc_type, exc_value, traceback)
Define behavior for context management, allowing the use of with statements.
class Product:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        pass

Example: Building a Clean Model Interface


Here's an example that combines several of these magic methods to create a Product
class with a clean and intuitive interface:

This class allows you to:
•Create instances with Product("Laptop", 999.99)
•Print instances to get a user-friendly string representation
•Compare instances using == and <
•Add prices with +
•Apply discounts with ()
•Access attributes with []
•Iterate over attributes
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price

    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price!r})"

    def __str__(self):
        return f"{self.name} costs ${self.price:.2f}"

    def __eq__(self, other):
        return self.name == other.name and self.price == other.price

    def __lt__(self, other):
        return self.price < other.price

    def __add__(self, other):
        return self.price + other.price

    def __call__(self, discount):
        return self.price * (1 - discount)

    def __getitem__(self, key):
        return getattr(self, key)

    def __setattr__(self, name, value):
        if name == "price" and value < 0:
            raise ValueError("Price cannot be negative")
        super().__setattr__(name, value)

    def __delattr__(self, name):
        if name == "price":
            raise AttributeError("Cannot delete price attribute")
        super().__delattr__(name)

    def __iter__(self):
        return iter([self.name, self.price])

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        pass
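
A short usage sketch of the combined class (values are illustrative):

laptop = Product("Laptop", 999.99)
mouse = Product("Mouse", 24.99)

print(laptop)              # Laptop costs $999.99
print(laptop == mouse)     # False
print(mouse < laptop)      # True
print(laptop + mouse)      # 1024.98
print(laptop(0.10))        # 899.991 (10% discount via __call__)
print(laptop["name"])      # Laptop (indexing via __getitem__)
for value in laptop:       # iterates over name, then price
    print(value)
with laptop as p:          # context manager via __enter__/__exit__
    print(p.price)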
Singleton pattern for configuration
management
The Singleton pattern is a design pattern that ensures a class has only one
instance and provides a global point of access to it. It's particularly useful
for managing shared resources, such as configuration settings, logging
services, or database connections, where multiple instances could lead to
inconsistent states or resource wastage.
The Singleton pattern restricts the instantiation of a class to a single object.
This is achieved by:
1.Private Constructor: Prevents external instantiation.
2.Static Instance Variable: Holds the single instance.
3.Public Static Method: Provides access to the instance, creating it if
necessary.
This approach ensures that the class is only instantiated once, and
subsequent calls return the same instance.
Singleton for Configuration Management
In configuration management, the Singleton pattern ensures that all components
of an application access the same configuration settings, preventing
discrepancies and redundant resource usage.
Key Benefits:
•Centralized Configuration: All modules access the same configuration,
ensuring consistency.
•Resource Efficiency: Avoids the overhead of loading configuration settings
multiple times.
•Thread Safety: Proper implementation ensures that the configuration instance
is safely shared across threads.
Example Implementation in Python
import json
import threading

class ConfigurationManager:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls, config_file):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super(ConfigurationManager, cls).__new__(cls)
                cls._instance.config_file = config_file
                cls._instance.load_config()
        return cls._instance

    def load_config(self):
        try:
            with open(self.config_file, 'r') as file:
                self.config = json.load(file)
        except FileNotFoundError:
            raise ValueError(f"Config file not found: {self.config_file}")
        except json.JSONDecodeError:
            raise ValueError(f"Invalid JSON format in config file: {self.config_file}")

    def get_settings(self, section=None):
        if section is None:
            return self.config
        return self.config.get(section)
In this example, ConfigurationManager ensures that the configuration file is loaded
only once, and all parts of the application access the same settings.
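
A quick usage sketch, assuming a config.json file with a 'database' section exists on disk:

config_a = ConfigurationManager('config.json')
config_b = ConfigurationManager('config.json')

print(config_a is config_b)               # True: both names refer to the same instance
print(config_a.get_settings('database'))  # e.g. {'host': 'localhost', 'port': 5432}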
Real-World Applications
•AI Model Training: Ensures consistent hyperparameters across training
modules.
•Inference Pipelines: Maintains uniform configuration across
preprocessing, model loading, and post-processing steps.
•Distributed Systems: Guarantees that all nodes in a cluster use the
same configuration settings.
Considerations and Pitfalls
While the Singleton pattern offers several advantages, it's important to be
aware of potential drawbacks:
•Hidden Dependencies: Over-reliance on a Singleton can make the system
harder to understand and maintain.
•Testing Challenges: The global state introduced by Singletons can
complicate unit testing.
•Concurrency Issues: Improper implementation can lead to race conditions in
multi-threaded environments.

Best Practices
•Lazy Initialization: Delay the creation of the Singleton instance until it's
needed.
•Thread Safety: Use synchronization mechanisms to ensure that the
instance is created safely in multi-threaded contexts.
•Avoid Overuse: Use the Singleton pattern judiciously to prevent hidden coupling and hard-to-test global state.

Implementing the Singleton pattern in Python ensures that a class has only one instance and provides a global point of access to it. This is particularly useful for managing shared resources like configuration settings, logging services, or database connections.
Classic Singleton Implementation
A straightforward approach to implementing the Singleton pattern in Python is by
overriding the __new__ method to control the instance creation:

class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)  # object.__new__ takes no extra arguments
        return cls._instance

In this implementation, the __new__ method checks if an instance already exists. If not, it creates one; otherwise, it returns the existing instance.
Metaclass-Based Singleton
Using a metaclass is a more Pythonic and flexible approach. A metaclass
allows you to control the creation of classes, making it ideal for implementing
the Singleton pattern:

class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]

class Singleton(metaclass=SingletonMeta):
    pass

With this implementation, any class that uses SingletonMeta as its metaclass will
follow the Singleton pattern.
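
A quick check that the metaclass enforces a single instance:

first = Singleton()
second = Singleton()
print(first is second)  # True: both calls return the same object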
Testing Singleton Classes
When testing Singleton classes, it's crucial to ensure that the singleton
instance behaves as expected across tests. One approach is to provide a
method to reset the singleton instance:

class TestableSingleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)  # object.__new__ takes no extra arguments
            cls._instance.value = 0
        return cls._instance

    def increment_value(self):
        self.value += 1

    def reset_value(self):
        self.value = 0

In this example, the TestableSingleton class provides a reset_value method to reset its state for testing purposes.
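
Another common approach, sketched here assuming pytest, is to clear the class-level _instance between tests so each test starts from a fresh singleton:

import pytest

@pytest.fixture(autouse=True)
def reset_singleton():
    # Clear the cached instance before and after each test
    TestableSingleton._instance = None
    yield
    TestableSingleton._instance = None

def test_counter_starts_at_zero():
    instance = TestableSingleton()
    assert instance.value == 0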
Considerations
•Thread Safety: In multi-threaded environments, ensure that the
Singleton implementation is thread-safe to prevent multiple instances
from being created simultaneously.
•Testing Challenges: Singletons can make unit testing difficult due
to their global state. Consider providing methods to reset the
singleton instance between tests.
•Overuse: While useful, overusing the Singleton pattern can lead to
tightly coupled code and make the system harder to maintain. Use it
judiciously.

By implementing the Singleton pattern thoughtfully, you can manage shared resources effectively while maintaining clean and maintainable code.
Functional Programming & Decorators in
AI
Closures and decorators
Understanding Closures
A closure occurs when a nested function captures and remembers
the environment in which it was created, even after the outer
function has finished executing. This allows the inner function to
access variables from the outer function's scope

Example: Basic Closure

def outer(message):
    def inner():
        print(message)
    return inner

closure = outer("Hello, World!")
closure()  # Output: Hello, World!

In this example, inner is a closure that captures the message variable from its enclosing scope (outer).
Example: Memoization with Closures
Closures can be used to implement memoization, a technique to
cache function results for improved performance
def memoize(func):
    cache = {}
    def wrapper(arg):
        if arg not in cache:
            cache[arg] = func(arg)
        return cache[arg]
    return wrapper

@memoize
def expensive_computation(x):
    # Simulate a time-consuming computation
    return x * x

Here, wrapper is a closure that retains access to the cache variable, allowing it to store and retrieve computed results efficiently.
Understanding Decorators
A decorator is a function that takes another function and extends
its behavior without explicitly modifying it. Decorators are often
implemented using closures.
Example: Basic Decorator
def simple_decorator(func):
    def wrapper():
        print("Before the function call")
        func()
        print("After the function call")
    return wrapper

@simple_decorator
def greet():
    print("Hello!")

greet()

Output:
Before the function call
Hello!
After the function call
In this example, wrapper is a closure that extends the behavior of the
greet function.
Example: Counting Function Calls with Decorators

def count_calls(func):
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"Call {wrapper.calls} of {func.__name__}")
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def say_hello():
    print("Hello!")

say_hello()
say_hello()

Output:

Call 1 of say_hello
Hello!
Call 2 of say_hello
Hello!
Here, wrapper is a closure that maintains a count of how many times say_hello has been called.
Combining Closures and Decorators
Decorators often utilize closures to retain state between function
calls. This combination allows decorators to be highly flexible and
powerful.
Example: Parameterized Decorator

def repeat(n):
    def decorator(func):
        def wrapper(*args, **kwargs):
            for _ in range(n):
                func(*args, **kwargs)
        return wrapper
    return decorator

@repeat(3)
def greet():
print("Hello!")

greet()

Output:
Hello!
Hello!
Hello!
Relevance in AI Development
In AI development, closures and decorators can be used for:
•Stateful Callbacks: Closures can maintain state across multiple
invocations, useful for callbacks in machine learning pipelines.
•Performance Optimization: Decorators can implement caching or
logging mechanisms to monitor and optimize AI model performance.
•Code Modularity: Both closures and decorators promote modular
code, making it easier to manage and extend AI systems.
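
As an illustration of the stateful-callback point above, a closure can remember the best validation accuracy seen so far across epochs (the accuracy values below are made up):

def make_best_tracker():
    best = {"accuracy": 0.0}
    def on_epoch_end(epoch, accuracy):
        # The closure remembers `best` between calls
        if accuracy > best["accuracy"]:
            best["accuracy"] = accuracy
            print(f"Epoch {epoch}: new best accuracy {accuracy:.3f}")
    return on_epoch_end

callback = make_best_tracker()
for epoch, acc in enumerate([0.71, 0.78, 0.76, 0.81], start=1):
    callback(epoch, acc)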
functools.lru_cache for
memoization
In Python, the functools.lru_cache decorator provides an efficient
way to implement memoization, which is particularly useful in
artificial intelligence (AI) and machine learning tasks that involve
repetitive computations, such as recursive algorithms or dynamic
programming problems.

What Is functools.lru_cache?
The lru_cache decorator caches the results of function calls based on
their arguments. When the function is called again with the same arguments,
the cached result is returned instead of recomputing it, leading to performance
improvements. It uses a Least Recently Used (LRU) strategy to manage the
cache, discarding the least recently used entries when the cache exceeds its
size limit.
Syntax

from functools import lru_cache

@lru_cache(maxsize=128, typed=False)
def some_function(args):
    ...  # function body

•maxsize: Maximum number of entries to store in the cache. Setting it to None means an unlimited cache size.
•typed: If True, arguments of different types will be cached separately (e.g., 3 and 3.0 will be treated as different).
Practical Examples
1. Fibonacci Sequence Calculation
Calculating Fibonacci numbers is a classic example where memoization can significantly improve performance.
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))  # Output: 832040

In this example, the fibonacci function is decorated with lru_cache, which caches the results of previous calls. For instance, when fibonacci(30) is called, it computes the result and stores it in the cache. Subsequent calls with the same argument retrieve the result from the cache, avoiding redundant calculations.
You can inspect the cache's performance using the cache_info() method:

print(fibonacci.cache_info())

This will output information about cache hits, misses, and the current
cache size.

2. Factorial Function
The factorial function is another example where memoization can be
beneficial.
from functools import lru_cache

@lru_cache(maxsize=None)
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

print(factorial(5))  # Output: 120

In this case, the factorial function calculates the factorial of a number n. By using lru_cache, it stores the results of previous calculations. For example, when factorial(5) is called, it computes the result and caches it. If factorial(5) is called again, the cached result is returned, saving computation time.

Important Considerations
•Argument Types: Arguments passed to the decorated function must be hashable,
as they are used as keys in the cache.
•Mutable Arguments: Avoid using mutable types (e.g., lists, dictionaries) as
arguments, as their hash values can change, leading to inconsistencies in the cache.
•Thread Safety: The cache is thread-safe, allowing the decorated function to be
used in multi-threaded environments.
•Cache Size: Setting maxsize=None allows the cache to grow indefinitely,
which can lead to increased memory usage. It's advisable to set an appropriate
maxsize based on your application's memory constraints
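
One common workaround for the mutable-argument limitation noted above is to convert lists to tuples before the cached call; a small sketch:

from functools import lru_cache

@lru_cache(maxsize=256)
def total(values):
    # values must be hashable, so callers pass a tuple instead of a list
    return sum(values)

data = [1, 2, 3, 4]
print(total(tuple(data)))  # 10; repeated calls with the same tuple hit the cache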
partial functions for hyperparameter
tuning
Functional programming concepts, particularly partial functions and
decorators, can be very powerful in AI and machine learning
pipelines, especially when tuning hyperparameters, composing model
components, or organizing code in a reusable and expressive way.

Functional Programming in AI: Why It Matters

Functional programming emphasizes immutability, pure functions, and first-class functions. These principles are valuable in AI:
•Clean and testable code: Functions are easier to debug and reuse.
•Composability: Combine transformations like pipelines (e.g.,
sklearn.pipeline.Pipeline).
•Reproducibility: Less side-effect-prone code is easier to reproduce in
experiments.
functools.partial for Hyperparameter Tuning
The functools.partial function allows you to pre-fill some arguments of a
function, returning a new function with fewer arguments. This is useful in
hyperparameter tuning where you want to try different models with
preset configurations.

Example 1: Tuning an SVM Classifier

Advantages:
•Reusability: SVC_partial can be reused across experiments.
•Clean separation between model configuration and tuning.
from functools import partial
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Create a partial function with some fixed parameters
SVC_partial = partial(SVC, kernel='rbf', probability=True)

# Now use this in a pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC_partial())  # Instantiate it here
])

# Grid of hyperparameters
param_grid = {
    'svm__C': [0.1, 1, 10],
    'svm__gamma': [0.01, 0.1, 1]
}

# Perform grid search
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
Example 2: Using partial for Custom Loss in Neural Networks
(PyTorch)
In PyTorch, you can pass a custom loss function with certain fixed
parameters to a training loop.
import torch
import torch.nn as nn
from functools import partial

# Custom loss with weighted penalties
def custom_loss(output, target, penalty_factor=1.0):
    loss = nn.MSELoss()(output, target)
    penalty = penalty_factor * torch.mean(output**2)
    return loss + penalty

# Fix penalty_factor for tuning
tuned_loss = partial(custom_loss, penalty_factor=0.1)

# During training
output = model(input)
loss = tuned_loss(output, target)
loss.backward()

Use case: You can easily adjust the penalty_factor and pass it to
hyperparameter search tools like Optuna or Ray Tune.
Decorators in AI: Use for Logging, Tuning, and Caching
Decorators let you wrap functions with additional behavior. Common
AI uses include:
•Timing model training
•Logging evaluation metrics
•Caching results
Example: Decorator to Log Model Accuracy
import time
from functools import wraps

def log_accuracy(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        acc = func(*args, **kwargs)
        print(f"Accuracy: {acc:.4f} | Time: {time.time() - start:.2f}s")
        return acc
    return wrapper

@log_accuracy
def evaluate_model(model, X_test, y_test):
    return model.score(X_test, y_test)
Exercise:
•Write a decorator to log input shapes for AI model training functions.
•Use functools.partial to fix certain hyperparameters in a model training function.
Data Handling with NumPy &
Pandas
Efficiently process AI datasets, handle missing data, and normalize features.
NumPy vectorized operations

NumPy's vectorized operations are fundamental to its performance and popularity in scientific computing. They allow you to apply operations to entire arrays (vectors, matrices, etc.) without using explicit loops, making the code more concise and significantly faster due to underlying C-optimized implementations.

What Are Vectorized Operations?

In NumPy, vectorized operations refer to operations that act element-wise on entire arrays using simple syntax similar to scalar operations.
For example:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

c = a + b # Vectorized addition

This performs element-wise addition: [1+4, 2+5, 3+6] → array([5, 7, 9])

Benefits of Vectorized Operations


•Performance: Faster than Python loops due to optimized C backend
•Simplicity: Cleaner, more readable code
•Memory Efficiency: Operations work in-place (where possible)
Practical Example 1: Normalizing a Dataset
Problem: Given a 2D dataset, normalize each feature (column) to
have zero mean and unit variance.
import numpy as np

# Simulate a dataset: 5 samples, 3 features
data = np.array([[1.0, 200.0, 50.0],
                 [2.0, 210.0, 60.0],
                 [3.0, 190.0, 55.0],
                 [4.0, 205.0, 58.0],
                 [5.0, 195.0, 52.0]])

# Vectorized mean and std dev (per column)
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)

# Vectorized normalization
normalized_data = (data - mean) / std

print("Normalized Data:\n", normalized_data)

This eliminates the need for manual iteration over each feature column.
Practical Example 2: Image Brightness Adjustment
Problem: Increase the brightness of a grayscale image represented as a 2D
NumPy array.

import numpy as np

# Simulated grayscale image (3x3)
image = np.array([[100, 120, 130],
                  [90, 110, 140],
                  [80, 105, 125]], dtype=np.uint8)

# Increase brightness by 30; widen the dtype first so values above 225
# don't wrap around in uint8, then clip to the valid 0-255 range
brightened = np.clip(image.astype(np.int16) + 30, 0, 255).astype(np.uint8)

print("Brightened Image:\n", brightened)

NumPy's vectorized addition and np.clip() ensure fast and safe transformation.
Two real-world examples—one with stock prices and one with
sensor data—both using NumPy vectorized operations.

Stock Prices – Daily Returns & Moving Average


Goal:
•Compute daily returns for a stock
•Calculate a 7-day moving average of the closing prices
Simulated Stock Price Data:
import numpy as np

# Simulated closing prices for 10 days
prices = np.array([150, 152, 153, 155, 157, 156, 158, 160, 162, 161])

# Daily returns: (today's price - yesterday's price) / yesterday's price
daily_returns = (prices[1:] - prices[:-1]) / prices[:-1]

print("Daily Returns:\n", daily_returns)

# 7-day moving average using convolution (vectorized!)
weights = np.ones(7) / 7
moving_avg = np.convolve(prices, weights, mode='valid')

print("7-Day Moving Average:\n", moving_avg)


Output:

Daily Returns:
[ 0.0133  0.0066  0.0131  0.0129 -0.0063  0.0128  0.0127  0.0125 -0.0061]
7-Day Moving Average:
[154.4286 155.8571 157.2857 158.4286]

This avoids loops entirely and uses NumPy’s convolve and slicing
features.
Sensor Data – Temperature Threshold Alerts
Goal:
•Identify temperature spikes above a certain threshold
•Normalize temperature values between 0 and 1 (min-max scaling)
Simulated Sensor Data:
# Hourly temperature readings from a sensor (°C)
temps = np.array([22.5, 23.0, 25.1, 30.0, 31.2, 28.7, 26.5, 24.3, 23.7, 22.9])

# Threshold detection
threshold = 29.0
alerts = temps > threshold

print("Threshold Alerts:\n", alerts)

# Normalize using min-max scaling
min_temp = temps.min()
max_temp = temps.max()
normalized_temps = (temps - min_temp) / (max_temp - min_temp)

print("Normalized Temperatures:\n", normalized_temps)


Output:

Threshold Alerts:
[False False False True True False False False False False]
Normalized Temperatures:
[0.   0.06 0.3  0.86 1.   0.71 0.46 0.21 0.14 0.05]

These are real-time operations that could feed into an alert system
or a dashboard—done efficiently with NumPy.

Summary of Vectorized Ops Used:

Task           | Vectorized Operation
Daily return   | (prices[1:] - prices[:-1]) / prices[:-1]
Moving average | np.convolve()
Thresholding   | temps > threshold
Normalization  | (array - min) / (max - min)
Pandas advanced indexing and
groupby
Pandas provides powerful tools for data manipulation, with advanced indexing
and the groupby method being central to its functionality

Advanced Indexing in Pandas


1. Hierarchical (Multi-Level) Indexing
Multi-level indexing allows you to work with higher-dimensional data in a lower-
dimensional DataFrame. This is particularly useful for representing data with
multiple dimensions, such as time series data across different categories.

Example:
import pandas as pd
import numpy as np

# Creating a multi-level index
arrays = [['New York', 'New York', 'Los Angeles', 'Los Angeles'],
          ['Q1', 'Q2', 'Q1', 'Q2']]
index = pd.MultiIndex.from_arrays(arrays, names=('City', 'Quarter'))

# Creating the DataFrame
df = pd.DataFrame({
    'Sales': np.random.randint(100, 500, 4),
    'Profit': np.random.rand(4)
}, index=index)

print(df)
This will output a DataFrame with a multi-level index, where each row is
uniquely identified by a combination of 'City' and 'Quarter'.
Accessing Data:
•To access data for a specific city and quarter:

df.loc[('New York', 'Q1')]

• To access all data for a specific city:

df.loc['New York']

• To access data for all cities in a specific quarter:

df.xs('Q1', level='Quarter')
Conditional Indexing
Conditional indexing allows you to filter data based on specific
conditions.
Example:
# Filtering rows where Sales are greater than 300
df_filtered = df[df['Sales'] > 300]
print(df_filtered)

This will return all rows where the 'Sales' value is greater than 300.

Grouping Data with groupby


The groupby method in Pandas is used to split the data into groups based
on some criteria, apply a function to each group independently, and then
combine the results back into a DataFrame.

1. Basic Grouping
You can group data by one or more columns and perform aggregate functions like
sum, mean, or count.
Example:
# Grouping by 'City' and calculating the mean of 'Sales' and 'Profit'
df_grouped = df.groupby('City').agg({
    'Sales': 'mean',
    'Profit': 'mean'
})
print(df_grouped)
This will return the mean 'Sales' and 'Profit' for each city.

2. Grouping with Multiple Keys


You can group by multiple columns to perform more granular
analysis.
Example:
# Grouping by 'City' and 'Quarter' and calculating the sum of 'Sales'
df_grouped = df.groupby(['City', 'Quarter'])['Sales'].sum()
print(df_grouped)

This will return the total 'Sales' for each combination of 'City' and
'Quarter'.
3. Applying Custom Functions
You can apply custom functions to each group using the apply method.
Example:
# Defining a custom function to calculate the range (max - min)
def calc_range(group):
    return group.max() - group.min()

# Applying the custom function to 'Sales' grouped by 'City'
df_range = df.groupby('City')['Sales'].apply(calc_range)
print(df_range)
This will return the range of 'Sales' for each city.

4. Grouping with Expressions


You can group by expressions or transformations of the data.
Example:
# Grouping by the sum of 'Sales' and 'Profit' and calculating the mean of 'Sales'
df_grouped = df.groupby(df['Sales'] + df['Profit'])['Sales'].mean()
print(df_grouped)

This will group the data by the sum of 'Sales' and 'Profit' and return
the mean 'Sales' for each group.
5. Advanced Aggregations
You can perform multiple aggregations at once using the agg method.
Example:

# Grouping by 'City' and performing multiple aggregations on 'Sales'
df_agg = df.groupby('City')['Sales'].agg(['sum', 'mean', 'std'])
print(df_agg)
This will return the sum, mean, and standard deviation of 'Sales' for
each city.

Real-World Example: Sales Analysis


Let's consider a real-world scenario where we analyze sales
data across different stores and products.
Example:
# Sample sales data
data = {
    'Store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store B'],
    'Product': ['A', 'B', 'A', 'B', 'C', 'C'],
    'Revenue': [100, 200, 150, 250, 300, 350],
    'Units Sold': [10, 20, 15, 25, 30, 35]
}

df_sales = pd.DataFrame(data)

# Grouping by 'Store' and 'Product' and calculating total revenue and units sold
df_sales_grouped = df_sales.groupby(['Store', 'Product']).agg({
    'Revenue': 'sum',
    'Units Sold': 'sum'
}).reset_index()

print(df_sales_grouped)
This will return a DataFrame showing the total 'Revenue' and 'Units Sold'
for each combination of 'Store' and 'Product'.
Data cleaning for AI input pipelines
Data cleaning is a crucial step in preparing data for AI input
pipelines. It ensures that the data is accurate, consistent, and
formatted correctly, which is essential for building effective AI
models.
Key Data Cleaning Techniques

1. Handling Missing Values


Missing data can lead to biased or inaccurate models. Depending on
the context, you can either fill in missing values or remove them.

2. Removing Duplicates
•Identifying and eliminating duplicate records to prevent bias in
model training.
3. Standardizing Formats
•Converting text data to a consistent case (e.g., lowercase).
•Stripping leading/trailing spaces.
•Normalizing date formats.

4. Handling Outliers
•Detecting anomalies using statistical methods or domain knowledge.
•Deciding whether to correct, remove, or retain outliers based on their impact.

5. Data Transformation
•Scaling numerical values.
•Encoding categorical variables.
•Feature extraction and engineering.

6. Automated Cleaning Pipelines


•Utilizing tools like Pandas in Python to automate repetitive cleaning tasks.
• Implementing modular pipelines for scalability and reusability
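
A minimal Pandas sketch combining several of these techniques (the column names and values are hypothetical):

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget A', 'widget a ', 'Widget B', None],
    'price': [9.99, 9.99, None, 14.50],
    'date': ['2024-01-05', '2024-01-05', '2024-01-06', '2024-01-07'],
})

df['product'] = df['product'].str.lower().str.strip()  # standardize text case and spacing
df['price'] = df['price'].fillna(df['price'].mean())   # impute missing numeric values
df['date'] = pd.to_datetime(df['date'])                # normalize date format
df = df.drop_duplicates()                              # remove duplicate rows
df = df.dropna(subset=['product'])                     # drop rows missing key fields
print(df)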
Examples
1. Retail Sales Data Pipeline
Scenario: A retail company collects daily sales data, including product
names, categories, units sold, and revenue.

Challenges:
•Inconsistent product names (e.g., "Product A" vs. "product a").
•Missing revenue entries.
•Duplicate records for the same transaction.

Cleaning Steps:
•Standardize product names to lowercase.
•Impute missing revenue using the mean of the category.
•Remove duplicate entries based on transaction ID.

Outcome: A clean dataset ready for analysis, enabling accurate sales forecasting and inventory management.
2. Healthcare Patient Data Pipeline
Scenario: A healthcare provider maintains a database of patient
information, including names, addresses, and medical histories.

Challenges:
•Inconsistent formatting of addresses.
•Missing patient names.
•Duplicate records due to data entry errors.

Cleaning Steps:
•Use regular expressions to extract and standardize address
components.
•Impute missing names using available information or flag for
review.
•Merge duplicate records based on patient ID.

Outcome: A unified and standardized patient dataset, improving the quality of medical analytics and patient care.

Implementing robust data cleaning practices is essential for developing effective AI models. By addressing data quality issues early in the pipeline, you ensure that models learn from accurate, consistent input.
File I/O & Model Persistence
 File I/O & Model Persistence" refers to the ability of a program—often
in the context of machine learning or data science—to:
1. File I/O (Input/Output): Read from and write data to files on disk.
2. Model Persistence: Save (serialize) and load (deserialize) trained
models for later use without retraining.
 1. File I/O (Input/Output)
This involves working with files to store or retrieve data.
Python Examples:
• Reading a file:
with open('data.txt', 'r') as file:
    data = file.read()

• Writing to a file:
with open('output.txt', 'w') as file:
    file.write("Hello, world!")
1. CSV (Comma-Separated Values)
Use Case:
Used for storing tabular data like spreadsheets or databases.
Real-world Example:
•Exporting Excel data for data analysis
•Uploading financial transaction records to a system

Features:

Feature Details
Format Text (plain text, human-readable)
Structure Rows and columns (no nesting)
Language Support Universally supported
File Size Moderate
Speed Slower to read/write for large data
import pandas as pd

# Save DataFrame to CSV
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df.to_csv('data.csv', index=False)

# Read from CSV
df_loaded = pd.read_csv('data.csv')

2. JSON (JavaScript Object Notation)

Best for structured data with nesting, such as configuration files


or API data.

Real-world Example:
•Storing settings in web applications
•Transmitting data between frontend and backend systems
Features:

Feature Details
Format Text (human-readable, hierarchical)
Structure Key-value pairs (can nest objects/lists)
Language Support Wide support
File Size Moderate
Speed Slower for large files

import json

# Save to JSON
data = {'name': 'Alice', 'age': 25, 'skills': ['Python', 'ML']}
with open('data.json', 'w') as f:
    json.dump(data, f)

# Read from JSON
with open('data.json', 'r') as f:
    data_loaded = json.load(f)
3. Pickle (Python Object Serialization)
Used for saving Python objects (models, lists, dicts, custom
classes) for reuse.

Real-world Example:
•Saving trained machine learning models
•Storing intermediate computation results

Features:
Feature Details
Format Binary (Python-specific)
Structure Any Python object
Language Support Python-only
File Size Efficient
Speed Fast serialization/deserialization
import pickle

# Save Python object
model = {'model': 'RandomForest', 'accuracy': 0.92}
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load Python object
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

4. Joblib
Optimized for large NumPy arrays and scikit-learn models.

Real-world Example:
•Saving a trained scikit-learn pipeline
•Caching computationally expensive functions
Features:
Feature Details
Format Binary (optimized for performance)
Structure Large data, NumPy arrays
Language Support Python-only
File Size Compressed and efficient
Speed Faster than Pickle for large numerical data

from sklearn.ensemble import RandomForestClassifier


from joblib import dump, load

# Train model
model = RandomForestClassifier().fit([[0, 0], [1, 1]], [0, 1])

# Save model
dump(model, 'rf_model.joblib')

# Load model
loaded_model = load('rf_model.joblib')
HDF5 basics with h5py
HDF5 is a versatile file format designed to store large, complex datasets efficiently.
The h5py library provides a Pythonic interface to HDF5, allowing seamless
integration with NumPy arrays and enabling the creation, manipulation,
and querying of HDF5 files.

1. Installation
To install h5py, you can use either pip or conda:
pip install h5py
# or
conda install h5py

Ensure you also have numpy installed, as it's commonly used alongside h5py.
2. Basic Operations
📂 Opening a File
To open an HDF5 file, use:

import h5py

# Open an existing file in read mode
with h5py.File('myfile.h5', 'r') as f:
    ...  # Interact with the file

The with statement ensures the file is properly closed after operations.


Creating a File and Adding Data
To create a new file and add datasets:

import h5py
import numpy as np

# Create a new file and add datasets
with h5py.File('newfile.h5', 'w') as f:
    data1 = np.random.random((100, 100))
    data2 = np.random.random((200, 200))
    f.create_dataset('dataset1', data=data1)
    f.create_dataset('dataset2', data=data2)

This code creates a file named newfile.h5 and adds two datasets, dataset1 and
dataset2, each containing random data.
Groups and Hierarchy
HDF5 files support a hierarchical structure similar to directories. You
can create groups within the file:

with h5py.File('myfile.h5', 'a') as f:
    group = f.create_group('group1')
    group.create_dataset('data', data=np.arange(10))

This creates a group named group1 and adds a dataset data within it.

Accessing Data
To read data from a dataset:
with h5py.File('myfile.h5', 'r') as f:
    data = f['group1/data'][:]
    print(data)

This retrieves the entire dataset stored under group1/data.


Attributes
You can add metadata to datasets or groups using attributes:

with h5py.File('myfile.h5', 'a') as f:
    f['group1/data'].attrs['description'] = 'Sample data'
    print(f['group1/data'].attrs['description'])

This adds a description attribute to the dataset data and prints its value.

Example: Storing Satellite Imagery for Ship Detection


In a practical scenario, consider a dataset containing satellite
images labeled as either "ship" or "no-ship." This dataset comprises
4,000 images, with 1,000 labeled as "ship" and 3,000 as "no-ship."
Each image is resized to 80x80 pixels with 3 color channels (RGB),
resulting in a shape of (80, 80, 3) for each image.

Instead of storing each image as a separate file, which can be cumbersome and
inefficient, we can organize them into an HDF5 file using h5py. This approach
allows for efficient storage and easy access.
Organizing Data in HDF5
We create an HDF5 file with two main groups: ships and no_ships. Each group
contains a dataset named data, which stores the images as NumPy arrays.
Additionally, we can store metadata such as image dimensions and labels as
attributes.
Code Example
import h5py
import numpy as np

# Generate dummy data
ships_images = np.random.random((1000, 80, 80, 3))
no_ships_images = np.random.random((3000, 80, 80, 3))

# Create HDF5 file
with h5py.File('satellite_images.h5', 'w') as f:
    # Create groups for ships and no_ships
    ships_group = f.create_group('ships')
    no_ships_group = f.create_group('no_ships')

    # Create datasets for images
    ships_group.create_dataset('data', data=ships_images)
    no_ships_group.create_dataset('data', data=no_ships_images)

    # Add attributes
    ships_group.attrs['label'] = 'ship'
    no_ships_group.attrs['label'] = 'no-ship'
    ships_group.attrs['image_shape'] = ships_images.shape
    no_ships_group.attrs['image_shape'] = no_ships_images.shape
Accessing Data
To access the data, we can open the HDF5 file and
retrieve the images and their metadata:
with h5py.File('satellite_images.h5', 'r') as f:
    ships_data = f['ships/data'][:]
    no_ships_data = f['no_ships/data'][:]

    ships_label = f['ships'].attrs['label']
    no_ships_label = f['no_ships'].attrs['label']

    ships_shape = f['ships'].attrs['image_shape']
    no_ships_shape = f['no_ships'].attrs['image_shape']

print(f"Ships label: {ships_label}, Shape: {ships_shape}")
print(f"No-ships label: {no_ships_label}, Shape: {no_ships_shape}")
Benefits
•Efficient Storage: HDF5 allows for the storage of large datasets in a
compact binary format, reducing disk space usage.
•Fast Access: Data can be accessed quickly without loading the entire
dataset into memory.
•Organized Structure: The hierarchical structure of HDF5 files enables
logical organization of data.
•Metadata Support: Attributes can store metadata, providing context to
the data.
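
Because h5py datasets support NumPy-style slicing, only the requested portion is read from disk, which is what makes the fast-access benefit above possible. A short sketch reusing the satellite_images.h5 file created earlier:

with h5py.File('satellite_images.h5', 'r') as f:
    first_batch = f['ships/data'][:32]   # reads only the first 32 images from disk
    single_image = f['ships/data'][0]    # reads one (80, 80, 3) image
    print(first_batch.shape, single_image.shape)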
This approach is particularly beneficial in machine learning workflows, where large labeled datasets need to be stored compactly and read back in batches during training.
