
Course Material for AIML105 Exam Preparation

This course material includes expanded examples, additional context, and deeper
insights to ensure comprehensive preparation for the AIML105 examination.

Module I: Introduction to Machine Learning

What is Machine Learning?

Machine Learning (ML) is a branch of Artificial Intelligence that allows systems to learn
and make decisions without explicit programming. It is achieved by training algorithms
on data to identify patterns and make predictions.

Real-Life Example: Spam email detection uses ML algorithms like Naive Bayes to
classify emails as spam or legitimate based on patterns in the content, such as specific
keywords or email structure.

Additional Context: ML is also used in personalized recommendations, such as Netflix suggesting shows based on a user’s viewing history. It involves supervised, unsupervised, and reinforcement learning approaches depending on the type of task and data availability.
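
Code Sketch: A minimal version of the spam-detection example, assuming scikit-learn is available; the tiny email dataset below is hypothetical and for illustration only.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",         # spam
    "limited offer click here",     # spam
    "meeting agenda for monday",    # legitimate
    "project report attached",      # legitimate
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

# Convert raw text into word-count features, then fit a Naive Bayes classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)

# Classify a new, unseen email (most likely flagged as spam here).
print(model.predict(vectorizer.transform(["free prize offer"])))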

Overfitting and Underfitting

• Overfitting: When a model learns the noise in the training data along with the underlying signal, leading to excellent training performance but poor generalization on new data.

• Underfitting: When a model is too simplistic to capture the underlying structure of the data, resulting in poor performance on both training and test data.

Preventing Overfitting:

1. Use regularization techniques (e.g., L1, L2 penalties).

2. Employ cross-validation methods to validate model performance.

3. Reduce the model’s complexity or perform feature selection.

Example: Imagine training a house price prediction model with square footage as the
only feature. An overfitted model might predict perfectly for training data but fail on
unseen properties.

Additional Context: Regularization penalizes overly complex models by adding a term to the cost function, discouraging reliance on large coefficients. Techniques like dropout in neural networks also reduce overfitting by randomly deactivating neurons during training.
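
Code Sketch: A minimal illustration of L2 regularization, assuming scikit-learn; the synthetic data and the degree-12 polynomial are hypothetical choices for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=30).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, size=30)

# A flexible degree-12 polynomial with no penalty can chase the noise.
overfit_model = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
# The same features with a Ridge (L2) penalty discourage large coefficients.
regularized = make_pipeline(PolynomialFeatures(degree=12), Ridge(alpha=1.0))

overfit_model.fit(x, y)
regularized.fit(x, y)

# The penalized model's largest coefficient is typically far smaller.
print(abs(overfit_model.named_steps["linearregression"].coef_).max())
print(abs(regularized.named_steps["ridge"].coef_).max())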

Feature Scaling

Feature scaling ensures that numerical data is on a similar scale, preventing large
values from dominating small ones during model training.

• Min-Max Scaling: Rescales features to a fixed range, often [0, 1].

• Standardization: Rescales features to a mean of 0 and a standard deviation of 1.

Example: In a dataset with weight (in kg) and height (in cm), scaling ensures both
features contribute equally to a BMI prediction model.

Additional Context: Without scaling, features with larger magnitudes might disproportionately affect model training, particularly for distance-based algorithms like K-Nearest Neighbors (KNN) or Principal Component Analysis (PCA).
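
Code Sketch: Both scaling methods applied to a tiny weight/height table, assuming scikit-learn; the values are hypothetical.

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Columns: weight (kg), height (cm)
X = np.array([[60.0, 165.0],
              [75.0, 180.0],
              [90.0, 172.0]])

print(MinMaxScaler().fit_transform(X))    # each column rescaled to [0, 1]
print(StandardScaler().fit_transform(X))  # each column: mean 0, standard deviation 1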

Module II: Machine Learning: Classification, Regression & Clustering

Classification vs Regression

• Classification: Assigns data points to categories. Examples include spam detection and medical diagnosis (e.g., disease vs no disease).

• Regression: Predicts continuous outcomes, like predicting house prices based on size, location, and number of bedrooms.

Additional Context: Classification can extend beyond binary (e.g., multi-class classification in handwritten digit recognition). Regression can also model time-series data, like stock price prediction.
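
Code Sketch: A small contrast between a classifier (categorical output) and a regressor (continuous output), assuming scikit-learn; the toy house data is hypothetical.

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

sizes = [[600], [800], [1800], [2200]]  # square footage

# Classification: size -> a category ("small" or "large").
labels = ["small", "small", "large", "large"]
clf = DecisionTreeClassifier().fit(sizes, labels)
print(clf.predict([[2000]]))  # -> ['large']

# Regression: size -> a continuous price estimate.
prices = [120_000, 150_000, 310_000, 380_000]
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[2000]]))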

Logistic Regression

A classification algorithm that predicts probabilities of categorical outcomes using the sigmoid function.

Real-Life Example: Predicting customer churn (yes or no) based on behaviour metrics
like time spent on the platform and number of purchases.

Additional Context: Logistic regression is widely used due to its simplicity and
interpretability. It works well for binary outcomes but can be extended to multi-class
problems using techniques like One-vs-Rest (OvR).
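
Code Sketch: A minimal logistic-regression version of the churn example, assuming scikit-learn; the behaviour metrics (hours on the platform, purchases) are hypothetical.

from sklearn.linear_model import LogisticRegression

X = [[0.5, 1], [1.0, 2], [8.0, 12], [10.0, 15]]  # [hours on platform, purchases]
y = [1, 1, 0, 0]                                  # 1 = churned, 0 = retained

model = LogisticRegression().fit(X, y)

# The sigmoid output is a probability of churn for a new customer.
print(model.predict_proba([[2.0, 3]]))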

K-Means Clustering

An unsupervised algorithm that groups data into ‘k’ clusters by minimizing the intra-
cluster distance.

Steps:

1. Initialize ‘k’ cluster centroids randomly.

2. Assign each data point to the nearest centroid.

3. Recalculate centroids based on cluster memberships.

4. Repeat until centroids stabilize.

Real-Life Example: Segmenting customers based on purchasing patterns to create targeted marketing strategies.

Additional Context: Choosing the optimal number of clusters can be challenging and
is often determined using the elbow method, where the sum of squared distances
within clusters is plotted against the number of clusters.
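
Code Sketch: The clustering steps above, run end to end with scikit-learn's KMeans; the two-feature customer data (annual spend, number of orders) is hypothetical.

import numpy as np
from sklearn.cluster import KMeans

customers = np.array([[200, 2], [220, 3], [240, 2],         # low spenders
                      [1500, 20], [1600, 25], [1550, 22]])  # high spenders

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster assignment for each customer
print(kmeans.cluster_centers_)  # final centroids after convergence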

Cross-Validation

This technique evaluates model performance by splitting data into training and
validation subsets multiple times to ensure the model generalizes well.

Example: In a 5-fold cross-validation, data is split into 5 parts. Each part is used as
validation data while the others train the model, cycling through all parts.

Additional Context: Cross-validation helps prevent data leakage and ensures that the
model’s performance is not skewed by a particular split of training and testing data.
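
Code Sketch: 5-fold cross-validation in a few lines, assuming scikit-learn; the Iris dataset is used purely for illustration.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)         # accuracy on each of the 5 validation folds
print(scores.mean())  # averaged estimate of how well the model generalizes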

Module III: Introduction to Convolutional Neural Networks (CNNs)

Pooling Layers in CNNs

Pooling layers downsample feature maps to reduce spatial dimensions and computation, retaining essential information.

• Max Pooling: Retains the maximum value in a pooling window.

• Average Pooling: Computes the average of values in a pooling window.

Example: In facial recognition, pooling identifies critical features like eyes or nose,
ignoring unnecessary details like background.

Additional Context: Modern architectures like ResNet often use global average
pooling to reduce feature maps to a single value per channel before classification.
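
Code Sketch: 2x2 max pooling and average pooling on a small feature map, using plain NumPy; the values are hypothetical.

import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [5, 6, 1, 2],
                        [7, 2, 9, 4],
                        [0, 1, 3, 8]])

# Split the 4x4 map into non-overlapping 2x2 windows (stride 2).
windows = feature_map.reshape(2, 2, 2, 2).swapaxes(1, 2)

print(windows.max(axis=(2, 3)))   # max pooling  -> [[6, 2], [7, 9]]
print(windows.mean(axis=(2, 3)))  # average pooling of each window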

Data Augmentation

Data augmentation increases training data by applying transformations like flipping, rotation, or zooming to prevent overfitting and improve generalization.

Example: Generating variations of a dog’s photo by rotating it ensures the model recognizes the dog regardless of orientation.

Additional Context: Advanced augmentation techniques like Mixup or CutMix combine multiple images to generate more diverse training samples, improving robustness in neural networks.
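
Code Sketch: A typical augmentation pipeline, assuming torchvision is available (the course does not prescribe a library); the transform parameters are hypothetical.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                    # flipping
    transforms.RandomRotation(degrees=15),                     # rotation
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # zoom/crop
    transforms.ToTensor(),
])

# Applying `augment` to the same image repeatedly yields a differently
# transformed tensor each time, effectively enlarging the training set.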

Module IV: Model Parameters and Optimization

Hyperparameter Optimization

Hyperparameters are settings not learned by the model (e.g., learning rate, number
of layers). Optimization ensures the best model performance.

• Grid Search: Tests all combinations of hyperparameters systematically.

• Random Search: Samples random combinations of hyperparameters within defined ranges.

• Bayesian Optimization: Uses probabilistic models to predict the best hyperparameters based on past evaluations.

Example: Finding the optimal learning rate and batch size for training a deep neural
network.

Additional Context: Early stopping can also be used to halt training when
performance on a validation set stops improving, saving computational resources.
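
Code Sketch: Grid search and random search side by side, assuming scikit-learn; the parameter ranges and the Iris dataset are hypothetical illustration choices.

from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Grid search: every combination in the grid is evaluated with 5-fold CV.
grid = GridSearchCV(model, {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X, y)
print(grid.best_params_)

# Random search: a fixed number of random draws from the given distribution.
rand = RandomizedSearchCV(model, {"C": uniform(0.01, 10)}, n_iter=5,
                          cv=5, random_state=0).fit(X, y)
print(rand.best_params_)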

Optimization Algorithms

• Stochastic Gradient Descent (SGD): Updates weights using a small, random subset of data points, reducing computation.

• Adam Optimizer: Combines SGD with adaptive learning rates for faster
convergence.

Comparison:

• SGD is simple but sensitive to hyperparameters.

• Adam converges faster and handles sparse gradients better.

Additional Context: Optimizers like RMSprop are particularly useful for handling non-
stationary objectives, such as those encountered in reinforcement learning.
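
Code Sketch: Compiling the same small Keras model with SGD and with Adam, assuming TensorFlow/Keras; the architecture and learning rates are hypothetical.

from tensorflow import keras

def build_model(optimizer):
    model = keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=optimizer, loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# SGD: plain mini-batch updates, simple but sensitive to the learning rate.
sgd_model = build_model(keras.optimizers.SGD(learning_rate=0.01))
# Adam: adaptive per-parameter learning rates, usually faster to converge.
adam_model = build_model(keras.optimizers.Adam(learning_rate=0.001))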

Practice Questions with Answers

1. Explain overfitting and underfitting with examples. How can they be prevented?

Answer: Overfitting occurs when a model captures noise along with the signal,
leading to excellent performance on training data but poor generalization to new
data. Example: A model that memorizes exact data points in a training dataset. To
prevent overfitting, use regularization, cross-validation, or data augmentation.

Underfitting happens when a model is too simple, failing to capture the underlying
data patterns. Example: A linear regression model applied to non-linear data.
Solutions include increasing model complexity or training longer.

2. Describe the k-means clustering algorithm. Provide a real-world example of its application.

Answer: K-means clustering is an unsupervised algorithm that divides data into ‘k’
clusters by iteratively updating centroids and assigning data points based on
proximity. Real-world example: Segmenting customers by purchasing patterns in e-
commerce to target specific marketing campaigns.

Additional Context: Applications of k-means extend to anomaly detection, where outliers are identified as poorly fitting to any cluster.

3. What is data augmentation, and why is it important in CNNs? Provide practical use cases.

Answer: Data augmentation artificially increases the size of a dataset by applying transformations like rotations, flips, and zooms. This technique improves model generalization by introducing variability. Practical use: Enhancing medical image datasets by flipping or rotating X-rays to train robust CNN models for diagnosis.

Additional Context: In autonomous vehicles, augmented datasets improve robustness against different lighting or weather conditions.

4. Compare grid search and Bayesian optimization for hyperparameter tuning. Which is more efficient for complex models?

Answer: Grid search exhaustively tests all combinations of hyperparameters, which is computationally expensive but guarantees finding the best configuration within a defined grid. Bayesian optimization uses probabilistic models to predict optimal hyperparameters efficiently, making it more suitable for complex models where computation is a concern.

Additional Context: Random search is a simpler alternative for scenarios with a limited
computational budget, often yielding comparable results to grid search.

5. How does anomaly detection work in unsupervised learning? Give examples in cybersecurity.

Answer: Anomaly detection identifies patterns or behaviours that deviate from the
norm. In unsupervised learning, clustering algorithms like DBSCAN or statistical
methods detect outliers without labelled data. Example: Detecting unusual login
patterns or excessive file access in cybersecurity to identify potential breaches.

Additional Context: Deep learning methods like autoencoders can learn compressed
representations of normal data, flagging deviations as anomalies.
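
Code Sketch: DBSCAN flagging an unusual login session as noise, assuming scikit-learn; the two features (login hour, files accessed) and all values are hypothetical.

import numpy as np
from sklearn.cluster import DBSCAN

logins = np.array([[9, 10], [10, 12], [9, 11], [10, 9],   # typical sessions
                   [3, 400]])                              # unusual session

labels = DBSCAN(eps=3, min_samples=2).fit_predict(logins)
print(labels)  # points labelled -1 fit no cluster and are flagged as anomalies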
