
Module 4: Understanding Data, Basics of Learning Theory, Similarity Based Learning


Bivariate and Multivariate Data
Bivariate Data

• Involves two variables.

• The goal is to study the relationship between them.

• One variable may be independent (predictor) and the other dependent (response).

• Example: Hours Studied (X) vs Exam Score (Y).

Analysis techniques:

• Scatter plots

• Correlation coefficients (e.g., Pearson’s r)

• Simple linear regression

Real-life example:
In an insurance dataset, Age vs Premium Amount is a bivariate relationship. As age
increases, premium may also increase.
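As a quick illustration, here is a minimal sketch (using NumPy and entirely made-up values for hours studied and exam scores) of the usual bivariate workflow: computing Pearson's r and fitting a simple linear regression.

```python
# A minimal sketch of bivariate analysis on hypothetical data:
# hours studied (X) vs exam score (Y).
import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])            # X: hours studied
scores = np.array([52, 55, 61, 60, 68, 72, 75, 80])   # Y: exam scores (invented)

# Pearson's r: covariance of X and Y divided by the product of their std devs.
r = np.corrcoef(hours, scores)[0, 1]
print(f"Pearson's r = {r:.3f}")   # close to +1 -> strong positive linear relationship

# Simple linear regression (least-squares fit of Y on X).
slope, intercept = np.polyfit(hours, scores, deg=1)
print(f"score ≈ {slope:.2f} * hours + {intercept:.2f}")
```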

Multivariate Data

• Involves more than two variables simultaneously.

• Used to understand interactions and correlations among multiple attributes.

• Requires more complex mathematical tools to analyze.

• Example: Predicting house price using features like area, location, number of rooms,
and age of the building.

Common analysis methods:

• Multivariate statistics

• Dimensionality reduction

• Machine learning models (multiple regression, decision trees)


Real-life example:
In health analytics, patient diagnosis can depend on multiple features like blood pressure,
age, heart rate, and cholesterol level.

Multivariate Statistics
Definition
Multivariate statistics involve the observation and analysis of more than two statistical
outcomes (variables) at the same time. These techniques are used when the problem
involves multiple dependent and/or independent variables.

Multivariate analysis allows us to study relationships among variables, reduce dimensions, and detect underlying patterns.

Common Multivariate Statistical Methods

1. Multivariate Descriptive Statistics

Summarizes multiple variables using central tendency and dispersion measures for each
variable, and their relationships (e.g., correlation matrix).

2. Correlation Matrix

A table showing correlation coefficients between variables.

• Values range from -1 to +1.

• +1: Perfect positive correlation

• -1: Perfect negative correlation

• 0: No linear relationship

Example:
In a dataset with height, weight, and age, the correlation matrix helps identify which
variables are strongly related.
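For instance, a hedged pandas sketch of such a correlation matrix, with invented height, weight, and age values:

```python
# Hypothetical sketch: correlation matrix for height, weight, and age using pandas.
import pandas as pd

df = pd.DataFrame({
    "height_cm": [160, 172, 158, 181, 169],
    "weight_kg": [55, 70, 52, 85, 66],
    "age_years": [23, 31, 27, 40, 35],
})

# Pairwise Pearson correlations; every value lies between -1 and +1.
print(df.corr())
```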

3. Multivariate Normal Distribution

• Generalization of the normal distribution to multiple variables.

• Useful in probabilistic learning and classification tasks.

4. Multivariate Regression

• Extension of linear regression where multiple independent variables are used to predict one dependent variable.
• Example: Predicting salary based on age, education level, and experience.
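A minimal sketch of this kind of regression using NumPy least squares, with entirely made-up salary data:

```python
# Sketch of regression with several independent variables, via NumPy least squares.
# The data (age, education level, years of experience, salary) are invented.
import numpy as np

# Each row: [age, education_level, years_experience]
X = np.array([[25, 16, 2],
              [32, 18, 7],
              [45, 16, 20],
              [29, 21, 4],
              [38, 18, 12]], dtype=float)
y = np.array([40_000, 65_000, 90_000, 55_000, 78_000], dtype=float)  # salaries

# Add an intercept column and solve y ≈ X_aug @ beta in the least-squares sense.
X_aug = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print("intercept and coefficients:", beta)
```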

5. Principal Component Analysis (PCA) (covered in dimensionality reduction)

• A technique used to reduce dimensionality while preserving variance.

Why Multivariate Statistics Matter in Machine Learning

• Helps understand how features relate to one another.

• Assists in selecting relevant variables (feature selection).

• Foundation for many ML techniques (clustering, PCA, regression models).

• Identifies redundancies and dependencies in data.

Essential Mathematics for Multivariate Data


Introduction
Multivariate data consists of datasets with more than two variables. To analyze such data
and build machine learning models, several mathematical foundations are essential. These
include concepts from linear algebra, statistics, and calculus, which help in data
representation, transformation, and learning.

Understanding these foundational topics is necessary for techniques like regression, classification, PCA, and neural networks.

Key Mathematical Concepts

1. Vectors and Matrices

• Vector: An ordered list of numbers, usually representing a feature or observation.

o Example: A student's test scores in three subjects → [80, 75, 90] is a 3-dimensional vector.

• Matrix: A 2D structure of numbers with rows and columns. In ML, a matrix is often
used to store data where:

o Rows → observations (examples, records)

o Columns → features (variables, attributes)

Notation:
• A matrix with m rows and n columns is called an m × n matrix.

• Common operations: addition, scalar multiplication, dot product, transpose.
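To make this concrete, a small sketch (assuming NumPy; the first row is the example scores above, the other rows are made up) of a vector and an m × n data matrix:

```python
# Minimal sketch: representing the examples above with NumPy.
import numpy as np

scores = np.array([80, 75, 90])   # a 3-dimensional vector (one student's scores)

# A 4 x 3 data matrix: 4 observations (rows), 3 features (columns).
X = np.array([[80, 75, 90],
              [60, 88, 72],
              [95, 70, 85],
              [70, 80, 78]])
print(X.shape)   # (4, 3) -> m = 4 rows, n = 3 columns
```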

2. Matrix Operations Used in ML

• Transpose (Aᵀ): Switches rows with columns.

• Dot Product (A·B): Measures similarity between vectors.

• Matrix Multiplication (AB): Combines features and weights in ML models (e.g., linear regression).

• Inverse (A⁻¹): Used in solving systems of equations.

Use in ML:

• Calculating weighted sums

• Performing transformations

• Defining model parameters in linear models (e.g., y = Xβ + ε)
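A short NumPy sketch of these operations on small arbitrary arrays (not tied to any particular dataset):

```python
# Sketch of the matrix operations listed above, with small made-up arrays.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([2.0, 1.0])
x = np.array([1.0, 3.0])

print(A.T)               # transpose: rows become columns
print(np.dot(x, b))      # dot product of two vectors
print(A @ A)             # matrix multiplication
print(np.linalg.inv(A))  # inverse (A must be square and non-singular)

# Linear model as a weighted sum: y = X @ beta (ignoring the noise term epsilon).
X = np.array([[1.0, 2.0],
              [1.0, 5.0],
              [1.0, 9.0]])   # first column of ones acts as the intercept
beta = np.array([0.5, 2.0])
print(X @ beta)
```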

3. Covariance and Covariance Matrix

• Covariance indicates how much two variables vary together.

o Positive covariance → variables increase together

o Negative covariance → one increases, the other decreases

• Covariance Matrix is a square matrix giving the covariance between each pair of
variables in a multivariate dataset.

Example:
For features like age, income, and spending score, the covariance matrix can help determine
if higher income is associated with higher spending.
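A hedged sketch of how such a covariance matrix could be computed with NumPy, using invented values for age, income, and spending score:

```python
# Sketch: covariance matrix for age, income, and spending score (made-up values).
import numpy as np

# Columns: age, income, spending_score; rows: individual customers.
data = np.array([[23, 30_000, 60],
                 [35, 52_000, 55],
                 [41, 70_000, 72],
                 [29, 45_000, 50],
                 [52, 88_000, 80]], dtype=float)

# rowvar=False tells NumPy that each column is a variable.
cov_matrix = np.cov(data, rowvar=False)
print(cov_matrix)   # off-diagonal entries show how pairs of features vary together
```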

4. Eigenvalues and Eigenvectors

• Key tools in dimensionality reduction (PCA).

• Eigenvectors define directions in which data varies.

• Eigenvalues indicate how much variance is explained by each eigenvector.


In PCA:

• The first principal component is the direction with the highest variance.
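For illustration, one possible NumPy sketch of extracting eigenvalues and eigenvectors from a covariance matrix (random synthetic data, not from any real dataset):

```python
# Sketch: eigen-decomposition of a covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # 100 synthetic observations, 3 features
cov = np.cov(X, rowvar=False)

# eigh is used because a covariance matrix is symmetric.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# The eigenvector with the largest eigenvalue is the first principal component.
first_pc = eigenvectors[:, np.argmax(eigenvalues)]
print("variance explained per direction:", eigenvalues)
print("first principal component direction:", first_pc)
```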

5. Distance Measures

Understanding distances is fundamental in clustering and k-NN algorithms.

• Euclidean Distance: Straight-line distance between two points in space.


d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

• Manhattan Distance: Distance measured along axes at right angles (like grid streets).
d = |x_1 - x_2| + |y_1 - y_2|

These metrics are used in classification (k-NN), anomaly detection, and clustering.
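A minimal sketch of both metrics in NumPy (the two points are arbitrary):

```python
# Sketch: Euclidean and Manhattan distances between two points.
import numpy as np

p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))   # straight-line distance -> 5.0
manhattan = np.sum(np.abs(p - q))           # grid distance -> 7.0
print(euclidean, manhattan)
```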

6. Standardization and Normalization

• Standardization: Rescales data to have a mean of 0 and standard deviation of 1.


z = \frac{x - \mu}{\sigma}

• Normalization: Scales data into a fixed range, typically [0, 1].

These are essential for ML algorithms that rely on distance (e.g., k-NN, SVM).
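A short sketch of both rescalings on a made-up feature vector, assuming NumPy:

```python
# Sketch: standardization (z-score) and min-max normalization on a made-up feature.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

z = (x - x.mean()) / x.std()                   # mean 0, standard deviation 1
x_norm = (x - x.min()) / (x.max() - x.min())   # rescaled to the range [0, 1]
print(z)
print(x_norm)
```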

Why These Math Concepts Are Important in ML

• ML models represent data as vectors and matrices.

• Models learn by manipulating these structures mathematically.

• Understanding covariance helps with identifying feature relationships.

• Distance metrics are core to classification and clustering.

• Matrix algebra simplifies computation of model parameters.

Conclusion
Mathematics forms the backbone of machine learning and data science. For multivariate
data, understanding vector spaces, distance metrics, matrix transformations, and statistics
is essential to perform accurate analysis, build efficient models, and interpret the results
correctly.

Overview of Hypothesis and Learning Theory


Introduction
In machine learning, a hypothesis is a function that maps inputs to outputs based on
observed data. Learning theory provides the mathematical foundation for understanding
how well a machine can learn such a function from data. It helps us answer: How much can a
model generalize from seen data to unseen data?

This section bridges the gap between data and algorithms, focusing on the process of
learning and how we can evaluate it theoretically.

1. What is a Hypothesis in Machine Learning?

• A hypothesis (denoted as h) is a potential solution or function that approximates the target concept (c).

• In supervised learning, the hypothesis maps feature vectors (x) to output labels (y).

• The hypothesis is chosen from a set of candidate functions, called the hypothesis
space (H).

Example:
In a spam detection model, a hypothesis could be:
“If an email contains the word ‘lottery’, classify it as spam.”

2. Learning Process Overview

The learning process in ML can be understood as follows:

Input:

• Training dataset D = \{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\}

• Hypothesis space H

Output:

• Best hypothesis h ∈ H that minimizes prediction error.

This process is guided by an inductive bias, which is the set of assumptions a learner uses
to predict outputs for unseen inputs.
3. Concept of Target Function and Hypothesis Space

• Target function (f): The ideal function the learner is trying to approximate.

• Hypothesis space (H): All possible functions the learner considers.

Goal: Find a hypothesis h ∈ H such that h ≈ f.

4. Types of Errors

• Training Error: Error on the training dataset.

• Generalization Error: Error on unseen (test) data.

Why errors occur:

• Underfitting: Hypothesis too simple → high training + test error.

• Overfitting: Hypothesis too complex → low training error, high test error.

5. Consistency and Convergence

• A hypothesis is said to be consistent if it correctly classifies all training examples.

• Convergence refers to how closely a hypothesis approaches the target function as the number of training examples increases.

6. Empirical Risk Minimization (ERM)

• ERM is the principle where we choose a hypothesis that minimizes the error on the
training data.

• Formalized as minimizing the empirical risk:


R_{emp}(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i)
Where L is the loss function (e.g., 0-1 loss, squared loss).
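As an illustration, a small sketch of empirical risk under 0-1 loss, with a deliberately simple hypothetical threshold hypothesis and invented labels:

```python
# Sketch: empirical risk of a hypothesis under 0-1 loss on a tiny labeled sample.
import numpy as np

def h(x):
    """A hypothetical, very simple hypothesis: predict 1 if the feature exceeds 0.5."""
    return 1 if x > 0.5 else 0

X = [0.2, 0.7, 0.9, 0.4, 0.6]   # made-up inputs
y = [0, 1, 0, 0, 1]             # made-up true labels

# R_emp(h) = (1/n) * sum of L(h(x_i), y_i), with 0-1 loss L.
losses = [0 if h(xi) == yi else 1 for xi, yi in zip(X, y)]
print("empirical risk:", np.mean(losses))   # fraction of misclassified examples
```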

7. Importance of Learning Theory

• Helps determine the amount of data needed for good generalization.


• Provides theoretical tools to measure model performance.

• Guides the trade-off between complexity and performance (bias-variance trade-off).

Real-World Example

In facial recognition, a hypothesis could be a model trained on thousands of images to recognize a person.
Learning theory helps us understand how many images are needed, how accurate the model
will be, and how to avoid overfitting.

Conclusion
Hypothesis formulation and learning theory form the backbone of machine learning. They
offer a framework to understand what the model is learning, how well it is performing, and
how it will behave with new data. Without learning theory, ML would lack generalization
guarantees and reliability.

Feature Engineering and Dimensionality Reduction Techniques


Part 1: Feature Engineering

Definition
Feature engineering is the process of selecting, modifying, or creating new input variables
(features) to improve the performance of a machine learning model. It is considered one of
the most critical steps in the ML pipeline.

Why it's important:

• Good features → better learning and prediction

• Helps reduce model complexity

• Makes the model more interpretable

• Improves accuracy and generalization

Common Techniques in Feature Engineering

1. Feature Creation

• Creating new features from existing ones.

• Example: From a "Date of Birth" column, create "Age".


2. Feature Transformation

• Applying mathematical operations to improve distribution.

o Log, square root, or exponential transforms.

o Example: Applying log transformation to income to reduce skewness.

3. Encoding Categorical Variables

• Converting non-numeric data into numeric form.

o One-hot encoding: Each category becomes a separate binary feature.

o Label encoding: Assigns an integer to each category.

4. Feature Scaling (Normalization & Standardization)

• Scales features to a similar range for better model performance.

o Normalization: Scales between 0 and 1.

o Standardization: Scales data to mean = 0, std = 1.

5. Handling Missing Data

• Imputation techniques: replacing missing values using mean, median, or predictive models.

Role of Feature Engineering in ML

• Allows algorithms to focus on patterns rather than noise.

• Reduces risk of overfitting.

• Enhances training speed and model simplicity.

Real-world example:
In credit scoring, new features like “credit utilization ratio” can be engineered using credit
limit and outstanding balance—these often perform better than raw inputs.
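A hedged pandas sketch combining a few of these techniques on an invented credit dataset (column names and values are purely illustrative):

```python
# Sketch of several feature engineering steps on a hypothetical credit dataset.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["1990-05-01", "1985-11-23", "2000-02-14"]),
    "income": [30_000, 120_000, 45_000],
    "city": ["Bengaluru", "Mumbai", "Delhi"],
    "credit_limit": [100_000, 500_000, 150_000],
    "outstanding_balance": [20_000, 50_000, 90_000],
})

# 1. Feature creation: age from date of birth, credit utilization ratio.
df["age"] = (pd.Timestamp("2024-01-01") - df["date_of_birth"]).dt.days // 365
df["credit_utilization"] = df["outstanding_balance"] / df["credit_limit"]

# 2. Feature transformation: log transform to reduce income skewness.
df["log_income"] = np.log(df["income"])

# 3. Encoding: one-hot encode the categorical city column.
df = pd.get_dummies(df, columns=["city"])
print(df.head())
```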

Part 2: Dimensionality Reduction

Definition
Dimensionality reduction is the process of reducing the number of input variables in a
dataset while retaining as much relevant information as possible. It simplifies models and
reduces computation time.

Why Reduce Dimensions?

• High-dimensional data increases computation and storage needs.

• Can cause overfitting due to sparsity (curse of dimensionality).

• Helps in data visualization and pattern recognition.

Popular Dimensionality Reduction Techniques

1. Principal Component Analysis (PCA)

• Converts correlated features into a set of linearly uncorrelated components (principal components).

• First few components retain most of the variance.

• Unsupervised method.

Example:
Reducing a 20-feature dataset to 3 principal components while preserving 95% of the data
variance.

2. Linear Discriminant Analysis (LDA)

• Supervised technique that reduces dimensionality by maximizing class separability.

• Often used in classification tasks.

3. t-SNE (t-distributed Stochastic Neighbor Embedding)

• Used for visualization (usually 2D/3D) of high-dimensional data.

• Preserves local relationships in data.

Steps in PCA (Textbook-aligned)

1. Standardize the dataset.

2. Compute the covariance matrix.


3. Calculate eigenvectors and eigenvalues.

4. Select top k eigenvectors with largest eigenvalues.

5. Project data onto new subspace.
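One possible NumPy sketch of these five steps on synthetic data (the dataset and the choice of k = 2 components are assumptions for illustration):

```python
# Sketch: the five PCA steps above implemented directly with NumPy.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))      # synthetic dataset: 200 rows, 5 features
k = 2                              # number of components to keep

# 1. Standardize the dataset.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix.
cov = np.cov(X_std, rowvar=False)

# 3. Calculate eigenvectors and eigenvalues.
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Select the top k eigenvectors with the largest eigenvalues.
order = np.argsort(eigvals)[::-1]
top_k = eigvecs[:, order[:k]]

# 5. Project the data onto the new subspace.
X_reduced = X_std @ top_k
print(X_reduced.shape)   # (200, 2)
print("variance retained:", eigvals[order[:k]].sum() / eigvals.sum())
```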

Benefits of Dimensionality Reduction

• Removes redundant and noisy features.

• Improves model efficiency and speed.

• Enhances generalization by eliminating overfitting.

Conclusion
Feature engineering and dimensionality reduction are essential steps in preparing data for
machine learning. Thoughtful feature creation can enhance model accuracy, while
dimensionality reduction simplifies data without significant loss of information. Together,
they ensure better model performance and interpretability.

Similarity-Based Learning (Including k-NN and Weighted k-NN)


Introduction

Similarity-based learning is a class of machine learning methods that make predictions based on the similarity between new input data and existing examples. These methods do not build explicit models during training. Instead, they store the data and perform computations during inference (also known as lazy learning).

The most widely used similarity-based algorithm is k-Nearest Neighbors (k-NN).

k-Nearest Neighbors (k-NN)

Definition
k-NN is a non-parametric, instance-based learning algorithm. It classifies a new data point
based on the majority label of its 'k' closest neighbors in the training data.

Working of k-NN Algorithm

1. Choose the number of neighbors k.


2. Calculate the distance between the new data point and all training data points
(commonly using Euclidean distance).

3. Select the k data points closest to the new point.

4. Assign the class label that is most frequent among these k neighbors.
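A minimal sketch of these steps in NumPy, on a toy two-class dataset (points and labels are invented):

```python
# Sketch: the four k-NN steps above, implemented with NumPy for a toy 2-D dataset.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from the new point to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the k neighbors' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [2, 1], [1, 2],     # class 0 (e.g., "blue")
                    [6, 6], [7, 6], [6, 7]])    # class 1 (e.g., "green")
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([5, 5]), k=3))   # -> 1
```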

Visualization Example

• Blue points → Class 0

• Green points → Class 1

• Red “X” → New point to classify

• Based on which class has the majority in the closest neighbors, the red point will be
classified

Distance Metrics

• Euclidean Distance (most common):


d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
• Manhattan Distance:
d = |x_1 - x_2| + |y_1 - y_2|

Distance metric plays a major role in deciding which points are "nearest."

Choosing the Right k

• If k is too small → Model is sensitive to noise (overfitting).

• If k is too large → Model becomes too generalized (underfitting).

• Usually odd values are used for k to avoid ties.

Weighted k-NN

Improvement over k-NN:
In weighted k-NN, closer neighbors have more influence on the classification than farther ones.

• A common weight is the inverse of the distance:
w = \frac{1}{d}

Example:
If two neighbors are 0.5 and 1.0 units away, the closer one will have a weight of 2 while the
farther one has 1.

This technique helps when neighboring classes are mixed, but proximity still matters.
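A hedged sketch of weighted k-NN with inverse-distance weights, reusing the same toy-data style as above (the small epsilon guard is an added assumption to avoid division by zero):

```python
# Sketch: weighted k-NN with inverse-distance weights (w = 1/d) on toy data.
import numpy as np

def weighted_knn_predict(X_train, y_train, x_new, k=3, eps=1e-9):
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + eps)   # eps guards against division by zero

    # Sum the weights per class and pick the class with the largest total.
    votes = {}
    for label, w in zip(y_train[nearest], weights):
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

X_train = np.array([[1, 1], [2, 1], [6, 6], [7, 6], [6, 7]])
y_train = np.array([0, 0, 1, 1, 1])
print(weighted_knn_predict(X_train, y_train, np.array([2, 2]), k=3))   # -> 0
```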

Applications of k-NN

• Image and handwriting recognition

• Recommendation systems

• Fraud detection

• Medical diagnosis (e.g., k-NN used for cancer detection based on gene expression)

Advantages

• Simple and intuitive


• No training time (lazy learner)

• Works well with small to medium-sized datasets

Disadvantages

• Computationally expensive at prediction time

• Sensitive to irrelevant features and feature scaling

• Performance degrades with high-dimensional data (curse of dimensionality)

Conclusion

Similarity-based learning methods like k-NN are powerful tools when class boundaries are
not complex. Weighted versions improve robustness by giving more importance to closer
neighbors. Though simple, these algorithms serve as strong baselines and are still widely
used across various domains.

Basics of Learning Theory

Introduction to Learning and Its Types


Definition
Learning is the process of improving system performance from experience (data). In
machine learning, learning occurs when a system can generalize from past examples to
unseen data.

Types of Learning

a) Supervised Learning

• Learns a function from labeled data (input-output pairs).

• Goal: Predict output for new inputs.

• Example: Email spam classification, house price prediction.

b) Unsupervised Learning

• Learns structure from unlabeled data.

• Goal: Group similar data (clustering), reduce features (dimensionality reduction).


• Example: Customer segmentation, anomaly detection.

c) Semi-supervised Learning

• Uses both labeled and unlabeled data.

• Useful when labeled data is scarce.

d) Reinforcement Learning

• Learns via feedback in the form of rewards/punishments.

• Example: Game AI, robotics.

Introduction to Computational Learning Theory


Definition
Computational learning theory provides mathematical models to study how learning
algorithms work and how well they generalize.

Goals of Computational Learning Theory:

• Understand the limits of learnability

• Measure sample complexity (how much data is needed)

• Analyze error bounds and convergence guarantees

Key Concepts

a) Hypothesis Space (H)

• Set of all possible functions the model can learn.

b) Consistency

• A hypothesis is consistent if it correctly classifies all training examples.

c) Probably Approximately Correct (PAC) Learning

• Introduced by Leslie Valiant.

• A framework where learning is feasible if the algorithm can learn a function that is
approximately correct with high probability.

Design of a Learning System


Steps in the Learning System Design:

1. Choosing a model: Decide algorithm (e.g., decision tree, SVM, neural network)
2. Selecting features: What input variables to use

3. Defining hypothesis space (H): All possible candidate solutions

4. Loss function: Measures prediction error (e.g., 0-1 loss, MSE)

5. Training algorithm: Optimizes the model parameters

6. Evaluation: Assess generalization using test/validation data

Diagram (conceptual):

Input Data → Feature Extraction → Learning Algorithm → Hypothesis (Model) → Evaluation

Introduction to Concept Learning


Definition
Concept learning is the process of inferring a boolean function from training data. It is one
of the earliest forms of machine learning studied in theory.

• Each instance is described by a set of attributes.

• The goal is to learn a concept that maps inputs to YES/NO (positive or negative
class).

Example:
Learning the concept “fruit is an apple” based on attributes like color, shape, and taste.

Key Terms in Concept Learning

• Instance space (X): All possible examples

• Concept (c): Target function

• Hypothesis (h): Learner’s approximation of the concept

• Version space: Set of hypotheses consistent with training data

Famous Algorithm:

• Find-S Algorithm: Starts with most specific hypothesis and generalizes as needed.
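A minimal sketch of the Find-S idea on a hypothetical "fruit is an apple" style dataset (the attribute values, labels, and the wildcard '?' convention are illustrative assumptions):

```python
# Sketch of the Find-S idea: start with the most specific hypothesis and
# generalize it only as far as the positive examples require.

def find_s(examples):
    """examples: list of (attribute_tuple, label) with label 'yes' or 'no'."""
    hypothesis = None
    for attrs, label in examples:
        if label != "yes":
            continue                      # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attrs)      # first positive example: most specific h
        else:
            # Generalize any attribute that disagrees to the wildcard '?'.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attrs)]
    return hypothesis

data = [(("red", "round", "sweet"), "yes"),
        (("red", "round", "sour"), "yes"),
        (("green", "long", "sweet"), "no")]
print(find_s(data))    # -> ['red', 'round', '?']
```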

Conclusion
Basics of learning theory provide foundational understanding for how learning algorithms
behave, how they generalize, and what makes learning computationally feasible. These
concepts form the backbone of theoretical ML and guide practical implementations.
