AI Module4
Bivariate Data
• One variable may be independent (predictor) and the other dependent (response).
Analysis techniques:
• Scatter plots
Real-life example:
In an insurance dataset, Age vs Premium Amount is a bivariate relationship. As age
increases, premium may also increase.
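As a quick illustration of the scatter-plot technique, here is a minimal sketch of the insurance example; the age and premium values are made-up placeholders, not real data.

import matplotlib.pyplot as plt

# Illustrative values: age (predictor) vs premium amount (response)
ages = [25, 32, 40, 48, 55, 63]
premiums = [200, 240, 310, 380, 450, 540]

plt.scatter(ages, premiums)
plt.xlabel("Age")
plt.ylabel("Premium Amount")
plt.title("Age vs Premium (bivariate)")
plt.show()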
Multivariate Data
• Example: Predicting house price using features like area, location, number of rooms,
and age of the building.
Analysis techniques:
• Multivariate statistics
• Dimensionality reduction
Multivariate Statistics
Definition
Multivariate statistics involve the observation and analysis of more than two statistical
outcomes (variables) at the same time. These techniques are used when the problem
involves multiple dependent and/or independent variables.
1. Descriptive Statistics
Summarizes multiple variables using central tendency and dispersion measures for each variable, and their relationships (e.g., correlation matrix).
2. Correlation Matrix
• +1: Perfect positive linear relationship
• −1: Perfect negative linear relationship
• 0: No linear relationship
Example:
In a dataset with height, weight, and age, the correlation matrix helps identify which
variables are strongly related.
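As a sketch of this, pandas can compute the pairwise Pearson correlation matrix directly; the height, weight, and age values below are illustrative placeholders.

import pandas as pd

df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 68, 80, 92],
    "age":    [20, 25, 30, 35, 40],
})

# Pairwise correlations: values near +1 or -1 indicate strong linear
# relationships; values near 0 indicate little or none.
print(df.corr())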
4. Multivariate Regression
Predicts an outcome using several input variables at once (e.g., the house-price example above).
• Matrix: A 2D structure of numbers with rows and columns. In ML, a matrix is often used to store data where:
• Rows represent samples (observations)
• Columns represent features (variables)
Notation:
• A matrix with m rows and n columns is called an m × n matrix.
Use in ML:
• Performing transformations
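A small sketch of both points, with placeholder values: a 2 × 3 data matrix (rows as samples, columns as features) transformed by an arbitrary 3 × 2 weight matrix.

import numpy as np

# A 2 x 3 matrix (m = 2 rows, n = 3 columns): each row is a sample,
# each column a feature. Values are placeholders.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# A linear transformation mapping 3 features down to 2,
# via an arbitrary 3 x 2 weight matrix.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

print(X @ W)  # the transformed data is a 2 x 2 matrix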
• Covariance Matrix is a square matrix giving the covariance between each pair of
variables in a multivariate dataset.
Example:
For features like age, income, and spending score, the covariance matrix can help determine
if higher income is associated with higher spending.
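A minimal NumPy sketch of this, using made-up age, income, and spending-score rows; a positive income/spending entry in the result would point in that direction.

import numpy as np

# Rows are observations; columns are age, income, spending score.
data = np.array([
    [25, 30000, 40],
    [32, 45000, 55],
    [40, 60000, 60],
    [48, 80000, 75],
])

# rowvar=False treats each column as a variable, giving a 3 x 3
# covariance matrix (variances on the diagonal, covariances off it).
print(np.cov(data, rowvar=False))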
• In Principal Component Analysis (PCA), the first principal component is the direction with the highest variance.
5. Distance Measures
• Manhattan Distance: Distance measured along axes at right angles (like grid
streets).
d = |x₁ − x₂| + |y₁ − y₂|
These metrics are used in classification (k-NN), clustering, and anomaly detection, and are essential for ML algorithms that rely on distance (e.g., k-NN, SVM).
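As a small sketch of the formula above, with the common Euclidean (straight-line) distance included for comparison; the two points are arbitrary.

import numpy as np

p1 = np.array([1.0, 2.0])
p2 = np.array([4.0, 6.0])

# Manhattan distance: sum of absolute coordinate differences.
manhattan = np.sum(np.abs(p1 - p2))          # |1-4| + |2-6| = 7
# Euclidean distance, shown for comparison.
euclidean = np.sqrt(np.sum((p1 - p2) ** 2))  # 5.0

print(manhattan, euclidean)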
Conclusion
Mathematics forms the backbone of machine learning and data science. For multivariate
data, understanding vector spaces, distance metrics, matrix transformations, and statistics
is essential to perform accurate analysis, build efficient models, and interpret the results
correctly.
Hypothesis Formulation and Learning Theory
This section bridges the gap between data and algorithms, focusing on the process of learning and how we can evaluate it theoretically.
• In supervised learning, the hypothesis maps feature vectors (x) to output labels (y).
• The hypothesis is chosen from a set of candidate functions, called the hypothesis
space (H).
Example:
In a spam detection model, a hypothesis could be:
“If an email contains the word ‘lottery’, classify it as spam.”
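That rule can be written directly as a function; a minimal sketch, where the single keyword check stands in for a real spam filter.

def hypothesis(email_text):
    # "If an email contains the word 'lottery', classify it as spam."
    return "spam" if "lottery" in email_text.lower() else "not spam"

print(hypothesis("You have won the LOTTERY!"))   # spam
print(hypothesis("Meeting at 10 am tomorrow"))   # not spam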
Input: labeled training examples
Output: a hypothesis that generalizes to unseen inputs
This process is guided by an inductive bias, which is the set of assumptions a learner uses
to predict outputs for unseen inputs.
3. Concept of Target Function and Hypothesis Space
• Target function (f): The ideal function the learner is trying to approximate.
• Hypothesis (h): The learner's current approximation of f, drawn from the hypothesis space H.
4. Types of Errors
• Overfitting: Hypothesis too complex → low training error, high test error.
• Underfitting: Hypothesis too simple → high training error, high test error.
5. Empirical Risk Minimization (ERM)
• ERM is the principle of choosing the hypothesis that minimizes the error on the training data.
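A minimal ERM sketch under assumed toy data: the hypothesis space H contains threshold rules of the form "predict 1 if x > t", and we keep the rule with the lowest training error.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1]  # training labels

def training_error(t):
    preds = [1 if x > t else 0 for x in xs]
    return sum(p != y for p, y in zip(preds, ys)) / len(ys)

# Hypothesis space H: one threshold classifier per candidate t.
candidates = [0.5, 1.5, 2.5, 3.5, 4.5]
best = min(candidates, key=training_error)
print(best, training_error(best))  # 3.5 achieves zero training error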
Conclusion
Hypothesis formulation and learning theory form the backbone of machine learning. They
offer a framework to understand what the model is learning, how well it is performing, and
how it will behave with new data. Without learning theory, ML would lack generalization
guarantees and reliability.
Feature Engineering
Definition
Feature engineering is the process of selecting, modifying, or creating new input variables
(features) to improve the performance of a machine learning model. It is considered one of
the most critical steps in the ML pipeline.
1. Feature Creation
Real-world example:
In credit scoring, new features like “credit utilization ratio” can be engineered using credit
limit and outstanding balance—these often perform better than raw inputs.
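A minimal sketch of that engineered feature, assuming hypothetical column names credit_limit and outstanding_balance.

import pandas as pd

df = pd.DataFrame({
    "credit_limit": [10000, 5000, 20000],
    "outstanding_balance": [2500, 4500, 2000],
})

# Engineered feature: utilization = balance / limit.
df["credit_utilization_ratio"] = df["outstanding_balance"] / df["credit_limit"]
print(df)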
Dimensionality Reduction
Definition
Dimensionality reduction is the process of reducing the number of input variables in a
dataset while retaining as much relevant information as possible. It simplifies models and
reduces computation time.
Principal Component Analysis (PCA)
• Unsupervised method.
Example:
Reducing a 20-feature dataset to 3 principal components while preserving 95% of the data
variance.
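A minimal scikit-learn sketch of exactly this: passing a float to n_components asks PCA to keep enough components to explain that fraction of the variance. The random matrix stands in for a real 20-feature dataset.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 20)  # placeholder for a real 20-feature dataset

pca = PCA(n_components=0.95)  # keep 95% of the variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())

# On real, correlated data this often keeps far fewer components
# than on the uncorrelated random data used here.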
Conclusion
Feature engineering and dimensionality reduction are essential steps in preparing data for
machine learning. Thoughtful feature creation can enhance model accuracy, while
dimensionality reduction simplifies data without significant loss of information. Together,
they ensure better model performance and interpretability.
Similarity-Based Learning: k-Nearest Neighbors (k-NN)
Definition
k-NN is a non-parametric, instance-based learning algorithm. It classifies a new data point
based on the majority label of its 'k' closest neighbors in the training data.
Algorithm:
1. Choose the number of neighbors, k.
2. Compute the distance between the new point and every training point.
3. Select the k training points closest to the new point.
4. Assign the class label that is most frequent among these k neighbors.
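A minimal scikit-learn sketch of these steps on toy 2-D points; the data and the choice k = 3 are illustrative assumptions.

from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y_train = ["A", "A", "A", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# The 3 nearest neighbors of (1.5, 1.5) are all class "A".
print(knn.predict([[1.5, 1.5]]))  # ['A']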
Visualization Example
• Based on which class has the majority among its closest neighbors, the red point is assigned to that class.
Distance Metrics
The distance metric plays a major role in deciding which points are "nearest."
Weighted k-NN
Example:
With inverse-distance weighting (weight = 1/d), if two neighbors are 0.5 and 1.0 units away, the closer one receives a weight of 1/0.5 = 2 while the farther one receives 1/1.0 = 1.
This technique helps when neighboring classes are mixed, but proximity still matters.
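In scikit-learn this corresponds to weights="distance", which applies exactly the inverse-distance weighting above; a minimal sketch reusing the toy data from the previous example.

from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y_train = ["A", "A", "A", "B", "B", "B"]

# Each neighbor at distance d votes with weight 1/d.
wknn = KNeighborsClassifier(n_neighbors=3, weights="distance")
wknn.fit(X_train, y_train)
print(wknn.predict([[5, 5]]))  # ['B'] - the closest points dominate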
Applications of k-NN
• Recommendation systems
• Fraud detection
• Medical diagnosis (e.g., k-NN used for cancer detection based on gene expression)
Advantages
• Simple to understand and implement, with no explicit training phase (lazy learning)
• Makes no assumptions about the underlying data distribution (non-parametric)
• Naturally handles multi-class problems
Disadvantages
• Prediction is slow on large datasets, since distances to all training points must be computed
• Sensitive to irrelevant features and to the scale of the features
• Performance depends heavily on the choice of k and the distance metric
Conclusion
Similarity-based learning methods like k-NN are powerful tools when class boundaries are
not complex. Weighted versions improve robustness by giving more importance to closer
neighbors. Though simple, these algorithms serve as strong baselines and are still widely
used across various domains.
Types of Learning
a) Supervised Learning: the model learns from labeled examples (input-output pairs)
b) Unsupervised Learning: the model finds structure in unlabeled data
c) Semi-supervised Learning: the model learns from a small labeled set combined with a larger unlabeled set
d) Reinforcement Learning: an agent learns by interacting with an environment and receiving rewards
Key Concepts
a) PAC (Probably Approximately Correct) Learning
• A framework where learning is feasible if the algorithm can learn a function that is approximately correct with high probability.
b) Consistency
• A hypothesis is consistent if it agrees with all of the training examples.
1. Choosing a model: Decide which algorithm to use (e.g., decision tree, SVM, neural network)
2. Selecting features: Decide which input variables to use
Concept Learning
• The goal is to learn a concept that maps inputs to YES/NO (positive or negative class).
Example:
Learning the concept “fruit is an apple” based on attributes like color, shape, and taste.
Famous Algorithm:
• Find-S Algorithm: Starts with the most specific hypothesis and generalizes it as needed, using positive examples only.
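A minimal Find-S sketch on the apple example; the attributes (color, shape, taste) follow the text, while the specific training examples are made up.

# Training examples: (attributes, is_apple)
examples = [
    (("red", "round", "sweet"), True),
    (("red", "round", "sour"), True),
    (("green", "long", "sweet"), False),  # negatives are ignored by Find-S
]

h = [None, None, None]  # most specific hypothesis: nothing accepted yet

for attrs, is_positive in examples:
    if not is_positive:
        continue  # Find-S generalizes only on positive examples
    for i, value in enumerate(attrs):
        if h[i] is None:
            h[i] = value   # first positive example: adopt its values
        elif h[i] != value:
            h[i] = "?"     # conflicting value: generalize the attribute

print(h)  # ['red', 'round', '?'] - red and round, any taste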
Conclusion
Basics of learning theory provide foundational understanding for how learning algorithms
behave, how they generalize, and what makes learning computationally feasible. These
concepts form the backbone of theoretical ML and guide practical implementations.