Module -02 Machine Learning(BCS602)
Module-2
Bivariate Data
Bivariate data involves two variables. The goal of bivariate analysis is to
explore the relationship between them, such as whether one variable tends to
increase or decrease as the other changes.
Consider Table 2.3, which records the temperature in a shop alongside its
sales of sweaters.
VI Semester Machine Learning (BCS602)
RV Institute of Technology & Management®
Scatter Plot
Strength: Indicates how closely the data points fit a pattern or trend.
Shape: Helps in identifying the type of relationship (linear, quadratic, etc.).
Direction: Shows whether the relationship is positive, negative, or neutral.
Outliers: Helps identify any points that deviate significantly from the trend.
Scatter plots are often used in the exploratory phase of data analysis before
calculating correlation coefficients or fitting regression models.
Bivariate Statistics
Covariance and Correlation
Covariance values:
Correlation
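As a minimal sketch of how covariance and the Pearson correlation coefficient relate, the example below computes both for temperature and sweater sales. The numbers are hypothetical and are not the values of Table 2.3. Dividing by N (the population form) is an assumption; a text may instead divide by N - 1.

```python
import math

# Hypothetical readings for illustration only (not the values of Table 2.3).
temperature = [5.0, 10.0, 15.0, 20.0, 25.0]     # shop temperature
sweater_sales = [50.0, 42.0, 30.0, 21.0, 12.0]  # sweaters sold


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance: average product of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def pearson_correlation(x, y):
    """Covariance rescaled by both standard deviations; lies in [-1, 1]."""
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)


cov_xy = covariance(temperature, sweater_sales)
r_xy = pearson_correlation(temperature, sweater_sales)
print(f"covariance = {cov_xy:.2f}, correlation = {r_xy:.3f}")
```

With this made-up data both values are negative: sales fall as temperature rises. The covariance depends on the units of the two variables, while the correlation is unit-free.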
Multivariate Statistics
Multivariate data refers to data that involves more than two variables, and in
machine learning, most datasets are multivariate.
Multivariate analysis can involve multiple dependent (response) variables and
is often used to analyze more complex data scenarios.
Regression Analysis
Principal Component Analysis (PCA)
Path Analysis
The mean vector is used to represent the mean of multiple variables, and
the covariance matrix represents the variance and relationships among all
variables.
The mean vector is also known as the centroid, while the covariance
matrix is also referred to as the dispersion matrix.
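The mean vector and covariance matrix described above can be sketched as follows. The dataset is hypothetical, and dividing by n - 1 (the sample covariance) is an assumption of this sketch.

```python
# Hypothetical dataset: each row is one observation of three variables.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 6.0],
]

n = len(data)      # number of observations
d = len(data[0])   # number of variables

# Mean vector (centroid): one mean per variable (column).
mean_vector = [sum(row[j] for row in data) / n for j in range(d)]

# Covariance (dispersion) matrix: entry (j, k) is the covariance between
# variable j and variable k; the diagonal holds each variable's variance.
# Dividing by n - 1 (sample covariance) is an assumption of this sketch.
cov_matrix = [
    [
        sum((row[j] - mean_vector[j]) * (row[k] - mean_vector[k]) for row in data)
        / (n - 1)
        for k in range(d)
    ]
    for j in range(d)
]

print("mean vector:", mean_vector)
for row in cov_matrix:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, because the covariance of variables j and k equals the covariance of k and j.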
Multivariate Analysis Techniques
Regression Analysis:
Factor Analysis:
Data
Heatmap
Applications:
Heatmaps are useful for visualizing complex data like traffic patterns or
patient health data, where you can easily identify regions of higher or lower
values.
Example:
In vehicle traffic data, regions with heavy traffic are highlighted with dark
colors, making it easy to spot problem areas.
Visual Layout: Each scatter plot in the matrix shows the relationship
between two variables.
Usefulness: By examining the pairplot, you can easily identify patterns,
correlations, or clusters among the variables.
Linear Algebra
Statistics
Probability
Optimization
Optimization is key to finding the best model for multivariate data. Many
machine learning algorithms are formulated as optimization problems.
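To illustrate how a learning task can be posed as an optimization problem, the sketch below fits a line by minimizing the mean squared error with gradient descent. The data, learning rate, and iteration count are assumptions chosen for illustration; gradient descent is one possible optimizer, not the only one.

```python
# Hypothetical data generated from y = 2x + 1 (no noise), for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse(w, b):
    """Objective: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


w, b = 0.0, 0.0        # initial guess
learning_rate = 0.05   # step size (assumed; needs tuning in practice)
for _ in range(2000):
    dw, db = gradient(w, b)
    w -= learning_rate * dw   # step against the gradient
    b -= learning_rate * db

print(f"w = {w:.3f}, b = {b:.3f}, mse = {mse(w, b):.6f}")
```

Each iteration moves the parameters a small step in the direction that reduces the objective, so the estimates approach the slope and intercept that generated the data.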
Multivariate Analysis
Dimensionality Reduction
They ensure that models are not only accurate but also efficient,
interpretable, and scalable.
1. Feature Engineering
1. Feature Creation
2. Feature Transformation
o Normalization: Scaling values to a specific range, typically [0,1].
o Standardization: Transforming features to have a mean
of 0 and a standard deviation of 1.
o Log Transformation: Reducing the impact of large values by
applying the log function.
o Power Transformation: Stabilizing variance by applying
functions like square root or exponential transformations.
3. Handling Missing Values
o Imputation: Filling missing values with statistical
measures (mean, median, mode) or predictions from
models.
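The feature transformations and the mean imputation listed above can be sketched with the Python standard library. The sample values and function names are hypothetical. Using log(1 + x) for the log transformation and the population standard deviation for standardization are assumptions of this sketch.

```python
import math
import statistics

values = [2.0, 4.0, 6.0, 8.0, 100.0]   # hypothetical feature with one large value


def normalize(xs):
    """Min-max scaling to the range [0, 1] (assumes max(xs) > min(xs))."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs):
    """Shift to mean 0 and rescale to (population) standard deviation 1."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def log_transform(xs):
    """log(1 + x) compresses large values; requires x > -1."""
    return [math.log1p(x) for x in xs]


def sqrt_transform(xs):
    """Square-root power transformation; requires x >= 0."""
    return [math.sqrt(x) for x in xs]


def impute_mean(xs):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [x for x in xs if x is not None]
    fill = statistics.fmean(observed)
    return [fill if x is None else x for x in xs]


print(normalize(values))
print([round(z, 3) for z in standardize(values)])
print([round(v, 3) for v in log_transform(values)])
print([round(v, 3) for v in sqrt_transform(values)])
print(impute_mean([1.0, None, 3.0]))
```

The median or mode could replace the mean in `impute_mean` when the feature is skewed or categorical.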
Dimensionality Reduction
It helps combat issues like overfitting, high computational costs, and the curse
of dimensionality.
5. Feature Agglomeration
o Purpose: Groups features with similar characteristics
(hierarchical clustering for features).
o Combines redundant features into a single representative feature.
7. Factor Analysis
o Purpose: Identifies underlying latent variables (factors)
that explain observed variables.
o Assumes that observed data is influenced by a smaller
number of unobservable factors.
Hybrid Methods: For example:
Applications
Text Data:
Image Data:
Genomic Data:
Sensor Data:
Best Practices
Chapter – 02
The first step in building a learning system is selecting the type of training
experience it will use to learn. This involves determining the source of data
and how it will be used.
Direct Experience:
Indirect Experience:
In supervised training, a supervisor labels all valid moves for a given board state.
The target function represents the knowledge the system needs to learn.
It specifies the goal of the learning system and what it is trying to predict or
optimize.
Once the target function is defined, the next step is deciding how to represent
it. The representation depends on the complexity of the problem and the
available computational resources.
Common Representations:
Lookup Tables:
Used for simple problems where all possible states and actions can be
enumerated.
Example: A small chessboard with a limited number of moves.
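A lookup-table representation can be sketched as a dictionary from board states to moves. The tic-tac-toe style encoding, the stored moves, and the function name `choose_move` are hypothetical and used only for illustration.

```python
# Hypothetical lookup table for a tiny game: each key is a board state
# (a 3x3 grid flattened to a 9-character string) and each value is the
# index (0-8) of the stored move for X.
lookup_table = {
    "X.O......": 4,   # take the centre
    "XO.X.....": 6,   # complete the left column
    "OO.XX....": 5,   # complete the middle row
}


def choose_move(board_state, default=None):
    """Return the stored move for a state, or `default` if it was never enumerated."""
    return lookup_table.get(board_state, default)


print(choose_move("X.O......"))   # a state present in the table
print(choose_move("........."))   # an unseen state: the table cannot generalize
```

The second call shows the main limitation: a table only answers for states that were enumerated in advance, which is why it suits small problems.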
Mathematical Functions:
Function Approximation
Approaches to Approximation:
Parametric Models:
Non-Parametric Models:
Models that adapt their complexity to the amount of data (e.g., k-nearest
neighbors, decision trees).
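As an illustration of a model whose complexity grows with the data, here is a minimal k-nearest-neighbors classifier. The training points, labels, and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled points: (feature vector, class label).
training_data = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 5.5), "B"),
    ((6.5, 7.0), "B"),
]


def knn_predict(query, data, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(data, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict((1.2, 1.4), training_data))
print(knn_predict((6.4, 6.1), training_data))
```

The classifier has no fixed set of learned parameters. It stores every training point, so the model grows as more data is added.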
Learning Algorithms:
Target Function:
Define the target function as selecting the best move M given the board state B:
Use a deep neural network to represent the target function, where inputs are
board states and outputs are move probabilities.
Function Approximation:
Train the neural network using reinforcement learning, with rewards based
on the outcome of games played by the system.
It involves:
Categorization:
Boolean-Valued Function:
Example:
Components of Concept Learning
Input:
Output:
Testing:
Process of Concept Learning
Training:
Hypothesis Formation:
o Example: "An elephant has a trunk and tusks" could be the hypothesis to
classify an elephant.
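The elephant hypothesis can be written as a Boolean-valued function over an animal's attributes. The attribute names and the example animals are hypothetical.

```python
# Each instance is described by Boolean attributes (hypothetical feature names).
def is_elephant(animal):
    """Hypothesis h: an animal is an elephant if it has a trunk and tusks."""
    return animal["has_trunk"] and animal["has_tusks"]


animals = {
    "elephant": {"has_trunk": True, "has_tusks": True},
    "walrus": {"has_trunk": False, "has_tusks": True},
    "horse": {"has_trunk": False, "has_tusks": False},
}

for name, attributes in animals.items():
    print(name, "->", is_elephant(attributes))
```

The function returns True only for instances that satisfy every condition in the hypothesis, which is what classifying a new instance against a learned concept amounts to.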
Generalization:
Output: Target concept for an elephant, e.g., "has a trunk," "has tusks,"
and "large size."
Testing: New animal instances are classified based
Goals: Selecting the right model, training effectively, reducing training time,
and achieving high performance on unseen data.
Types of Parameters:
Error Types:
o Training Error (In-sample Error): Error when the model is tested on training
data.
o Test Error (Out-of-sample Error): Error when predicting on unseen test data.
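The two error types can be compared with a small sketch: a line is fitted to a training set, then the mean squared error is measured on both the training data and held-out test data. The data points and the choice of squared error are assumptions made for illustration.

```python
# Hypothetical noisy samples of a roughly linear relationship.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
test = [(5.0, 10.3), (6.0, 11.6)]


def fit_line(points):
    """Closed-form least-squares fit of y = w*x + b."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    w = sxy / sxx
    return w, my - w * mx


def mean_squared_error(points, w, b):
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)


w, b = fit_line(train)
training_error = mean_squared_error(train, w, b)   # in-sample error
test_error = mean_squared_error(test, w, b)        # out-of-sample error
print(f"training error = {training_error:.4f}, test error = {test_error:.4f}")
```

In this example the test error is larger than the training error, which is common because the model was fitted to the training points. The gap between the two is one indicator of how well the model generalizes.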
dataset.
Evaluation
Challenges:
underfitting). Approaches:
Resampling Methods
into folds:
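Splitting a dataset into folds, as done in k-fold cross-validation, can be sketched as below. The function name `k_fold_splits` is hypothetical, and the sketch assumes the samples are already in random order (no shuffling is performed).

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


for train_idx, val_idx in k_fold_splits(n_samples=7, k=3):
    print("train:", train_idx, "validation:", val_idx)
```

Each sample appears in exactly one validation fold, so every observation is used for both training and validation across the k rounds.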
Precision-Recall Curve:
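A precision-recall curve is traced by computing precision and recall at several decision thresholds. The sketch below does this for a few thresholds; the labels, scores, and thresholds are hypothetical.

```python
# Hypothetical true labels (1 = positive) and classifier scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]


def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    # Convention (assumed): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# One (precision, recall) point per threshold traces out the curve.
for threshold in (0.85, 0.5, 0.15):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold raises recall in this example while precision drops, which is the trade-off the curve makes visible.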
Selects the simplest model with the fewest bits to represent both data and
predictions.