
Rashtreeya Sikshana Samithi Trust

RV Institute of Technology and Management®


(Affiliated to VTU, Belagavi)

JP Nagar, Bengaluru - 560076

Department of Information Science and Engineering

Course Name: Machine Learning

Course Code: BCS602


VI Semester
2022 Scheme

Module-2

Chapter – 01 - Understanding Data – 2

Bivariate Data and Multivariate Data

Bivariate Data

Bivariate data involves two variables, and the goal of bivariate analysis is to
explore the relationship between them. This relationship can help in making
comparisons, identifying causes, and guiding further exploration of the data.
The aim is to find relationships among the data and the causes behind them.

Consider the following Table 2.3, which records the temperature in a shop and
the corresponding sales of sweaters.


Scatter Plot

A scatter plot is a useful graphical method for visualizing bivariate data.

It is particularly effective for illustrating the relationship between two variables.


The key features of a scatter plot are:

 Strength: Indicates how closely the data points fit a pattern or trend.
 Shape: Helps in identifying the type of relationship (linear, quadratic, etc.).
 Direction: Shows whether the relationship is positive, negative, or neutral.
 Outliers: Helps identify any points that deviate significantly from the trend.

Scatter plots are often used in the exploratory phase of data analysis before
calculating correlation coefficients or fitting regression models.
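As an illustration, a minimal matplotlib sketch of such a plot; the temperature and sales values below are invented stand-ins for the data of Table 2.3:

```python
import matplotlib.pyplot as plt

# Invented temperature (x) and sweater-sales (y) values, standing in for Table 2.3
temperature = [5, 8, 10, 14, 18, 22, 25, 28]
sales = [300, 280, 250, 200, 150, 100, 60, 40]

plt.scatter(temperature, sales)
plt.xlabel("Temperature")
plt.ylabel("Sweater sales")
plt.title("Scatter plot of temperature vs. sweater sales")
plt.show()
```

Here the points would trace a negative, roughly linear trend: sales fall as temperature rises.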

Bivariate Statistics

There are various statistical measures to describe the relationship between
two variables. Two important bivariate statistics are covariance and
correlation.

Covariance

Covariance measures the joint variability of two random variables. It tells
you whether an increase in one variable results in an increase or decrease
in the other variable.

Mathematically, the covariance between two variables X and Y is defined as:
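$$\mathrm{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$$

where $\bar{x}$ and $\bar{y}$ are the means of X and Y and N is the number of observations (this is the standard population form; the sample covariance divides by N − 1 instead).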


Covariance values:

 Positive covariance: As one variable increases, the other variable also
increases.
 Negative covariance: As one variable increases, the other variable
decreases.
 Zero covariance: No linear relationship between the variables.

Correlation

While covariance measures the direction of the relationship, correlation
quantifies the strength of the relationship between two variables.

The most common measure of correlation is the Pearson correlation coefficient:
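$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y. The coefficient r always lies between −1 and +1.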

Unlike covariance, correlation is dimensionless, meaning it is not affected
by the units of the variables.
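A minimal NumPy sketch computing both statistics on the illustrative temperature/sales values used earlier:

```python
import numpy as np

x = np.array([5, 8, 10, 14, 18, 22, 25, 28])          # temperature
y = np.array([300, 280, 250, 200, 150, 100, 60, 40])  # sweater sales

cov_xy = np.cov(x, y)[0, 1]         # sample covariance (N - 1 denominator)
corr_xy = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient

print(cov_xy, corr_xy)  # a strong negative relationship gives r close to -1
```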


Multivariate Statistics

Multivariate data refers to data that involves more than two variables, and in
machine learning, most datasets are multivariate.

The goal of multivariate analysis is to understand relationships among
multiple variables simultaneously.

This can involve multiple dependent (response) variables, and is often used
for analyzing more complex data scenarios.

Multivariate analysis techniques include:

 Regression Analysis
 Principal Component Analysis (PCA)
 Path Analysis

The mean vector is used to represent the mean of multiple variables, and
the covariance matrix represents the variance and relationships among all
variables.

The mean vector is also known as the centroid, while the covariance
matrix is also referred to as the dispersion matrix.
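A short NumPy sketch of both quantities, assuming a small illustrative data matrix whose rows are observations and columns are variables:

```python
import numpy as np

# Illustrative data: 5 observations of 3 variables
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.5, 2.5],
              [3.0, 4.0, 4.5],
              [4.0, 5.5, 5.0],
              [5.0, 6.0, 6.5]])

mean_vector = X.mean(axis=0)           # the centroid
cov_matrix = np.cov(X, rowvar=False)   # the dispersion matrix (3 x 3)

print(mean_vector)
print(cov_matrix)
```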

Multivariate Analysis Techniques

Regression Analysis:

Used to model the relationship between multiple independent variables and
a dependent variable.

Factor Analysis:


A statistical method used to identify underlying relationships between
observed variables.


Multivariate Analysis of Variance (MANOVA):

Extends ANOVA to analyze multiple dependent variables simultaneously.

Visualization Techniques for Multivariate Data

Heatmap

A heatmap is a graphical representation of a 2D matrix where values are
represented by colors. In a heatmap:

 Darker colors indicate larger values.
 Lighter colors indicate smaller values.

Applications:

Heatmaps are useful for visualizing complex data like traffic patterns or
patient health data, where you can easily identify regions of higher or lower
values.

Example:


In vehicle traffic data, regions with heavy traffic are highlighted with dark
colors, making it easy to spot problem areas.
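A minimal seaborn sketch, with a random matrix standing in for real traffic intensities:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Random 10 x 10 matrix standing in for, e.g., traffic intensity by region and hour
data = np.random.default_rng(0).random((10, 10))

sns.heatmap(data, cmap="viridis")  # the color of each cell encodes its value
plt.title("Heatmap of a 2D matrix")
plt.show()
```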

Pairplot (or Scatter Matrix)

A pairplot (or scatter matrix) is a matrix of scatter plots that shows
relationships between every pair of variables in a multivariate dataset.

This method allows you to visually examine correlations or relationships
between variables.

A random matrix of three columns is chosen, and the relationships between
the columns are plotted as a pairplot (or scatter matrix), as shown in
Figure 2.14.

 Visual Layout: Each scatter plot in the matrix shows the relationship
between two variables.
 Usefulness: By examining the pairplot, you can easily identify
patterns, correlations, or clusters among the variables.
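A minimal seaborn sketch of such a pairplot, using a random three-column matrix as in the Figure 2.14 setup (the column names are illustrative):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Random matrix of three columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 3)), columns=["var1", "var2", "var3"])

sns.pairplot(df)  # one scatter plot per pair of columns, histograms on the diagonal
plt.show()
```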


Essential Mathematics for Multivariate Data

In the realm of machine learning and multivariate data analysis, several
mathematical concepts are foundational.

These include concepts from Linear Algebra, Statistics, Probability, and
Optimization. Below is an overview of essential mathematical tools that are
necessary for understanding and working with multivariate data.

Linear Algebra

Linear algebra is crucial in machine learning as it provides the tools for
dealing with data in the form of vectors and matrices. Here's a breakdown of
important topics:

 Vectors: A vector is an ordered list of numbers. It can represent
data points or features of an observation in a multivariate dataset.
o Dot product and cross product are used to compute
projections and angles between vectors.
 Matrices: A matrix is a 2D array of numbers. In machine learning,
matrices often represent data where rows are instances and columns
are features.
o Matrix multiplication allows the transformation of data
and is used in various algorithms like linear regression,
neural networks, and more.
 Eigenvalues and Eigenvectors: These are important for
dimensionality reduction techniques such as Principal Component
Analysis (PCA). They are used to transform data into a new basis
that captures the most variance.
 Determinants and Inverses: The determinant of a matrix tells us if
the matrix is invertible (non-singular). The inverse of a matrix is used
to solve linear systems of equations.

 Singular Value Decomposition (SVD): This is a factorization
method used in PCA and other dimensionality reduction techniques to
decompose a matrix into singular values and vectors.
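A brief NumPy sketch of these operations on an arbitrary 2 x 2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([1.0, 2.0])

dot = v @ v                          # dot product of v with itself
Av = A @ v                           # matrix-vector multiplication
eigvals, eigvecs = np.linalg.eig(A)  # eigenvalues and eigenvectors
det = np.linalg.det(A)               # nonzero determinant => invertible
A_inv = np.linalg.inv(A)             # matrix inverse
U, s, Vt = np.linalg.svd(A)          # singular value decomposition

print(eigvals, det, s)
```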


Statistics

Statistics is key to understanding the relationships between different
variables in multivariate data. Key concepts include:

 Mean and Variance: Measures of central tendency (mean) and
spread (variance) are essential to understanding the distribution of
each variable.
 Covariance: Covariance measures the relationship between two
variables. A positive covariance indicates that as one variable
increases, the other tends to increase.
 Correlation: Correlation is a normalized measure of covariance that
indicates the strength and direction of the relationship between two
variables.
 Multivariate Normal Distribution: Many machine learning
algorithms assume that the data follows a multivariate normal
distribution, which extends the idea of normal distribution to more
than one variable.
 Principal Component Analysis (PCA): PCA is used to reduce the
dimensionality of the dataset while retaining as much variance as
possible. It uses eigenvectors and eigenvalues to identify the principal
components.
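As a small illustration of the multivariate normal distribution, drawing samples with NumPy; the mean vector and covariance matrix below are invented:

```python
import numpy as np

mean = np.array([0.0, 2.0])     # mean vector
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])    # covariance matrix (symmetric, positive semi-definite)

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=1000)

print(samples.mean(axis=0))            # close to the mean vector
print(np.cov(samples, rowvar=False))   # close to the covariance matrix
```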

Probability

Probability theory underpins the concept of uncertainty, which is inherent
in real-world data:

 Random Variables: A random variable represents a quantity whose
value is subject to chance. In multivariate data, we deal with vectors of
random variables.
 Probability Distributions: These describe the likelihood of various
outcomes. Common distributions in machine learning include the
normal distribution and the multinomial distribution.


 Bayes' Theorem: This theorem describes the probability of an event,
based on prior knowledge of related events. It's fundamental to
algorithms like Naive Bayes and Bayesian Inference.
 Markov Chains: These are used for modeling systems that undergo
transitions from one state to another with a certain probability,
without memory of previous states.
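For instance, Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), in a tiny numeric sketch; the disease-test probabilities below are invented purely for illustration:

```python
# Invented numbers: P(disease), P(positive | disease), P(positive | healthy)
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Total probability of a positive test, P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(disease | positive) via Bayes' theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # about 0.161
```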

Optimization

Optimization is key to finding the best model for multivariate data. Many
machine learning algorithms are formulated as optimization problems.

 Gradient Descent: An iterative optimization algorithm used to
minimize a cost function (such as in linear regression or neural
networks).
 Convex Optimization: Involves minimizing convex functions,
and plays a significant role in machine learning, as many cost
functions are convex.
 Lagrange Multipliers: Used for optimizing functions subject to
constraints, which is often seen in constrained optimization problems
in machine learning.
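A minimal gradient-descent sketch minimizing the convex cost f(w) = (w − 3)², whose gradient is 2(w − 3); the cost function and learning rate are chosen only for illustration:

```python
def grad(w):
    return 2.0 * (w - 3.0)  # derivative of f(w) = (w - 3)^2

w = 0.0     # initial guess
lr = 0.1    # learning rate (a hyperparameter)
for _ in range(100):
    w -= lr * grad(w)       # step in the direction of steepest descent

print(w)  # converges toward the minimizer w = 3
```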

Multivariate Analysis

 Multivariate Regression: This is the extension of linear
regression to predict multiple dependent variables using a set of
independent variables.
 Multivariate Analysis of Variance (MANOVA): An extension of
ANOVA used when there are two or more dependent variables. It tests
for differences between groups.
 Factor Analysis: A method for identifying the underlying
relationships between observed variables. It’s often used in
exploratory data analysis.

Graphical Techniques for Multivariate Data

 Scatter Plots: A scatter plot can be used to visualize the relationship
between two variables. For multivariate data, pair plots or scatter
matrices are used to examine the relationships between all pairs of
variables.
 Heatmaps: Used to visualize correlation matrices or covariance
matrices, where color intensity represents the strength of the
relationship.

Multivariate Data Models

 Multivariate Normal Distribution: A generalization of the
univariate normal distribution to multiple variables, frequently
assumed in multivariate statistical analysis.
 Multivariate Linear Models: Models such as multiple regression,
where multiple independent variables are used to predict a set of
dependent variables.

Dimensionality Reduction

Dimensionality reduction is used to reduce the number of variables in a
dataset while maintaining the essential information:

 Principal Component Analysis (PCA): A technique that reduces the
dimensionality of the dataset by projecting the data onto a set of
orthogonal axes (principal components) that explain the most
variance.
 t-SNE: A technique for dimensionality reduction that is well-suited for
visualizing high-dimensional data in 2D or 3D space.
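A minimal scikit-learn sketch of both techniques on a random high-dimensional matrix; the data and parameter choices are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.default_rng(0).random((200, 20))  # 200 samples, 20 features

X_pca = PCA(n_components=2).fit_transform(X)    # project onto top-2 principal components
X_tsne = TSNE(n_components=2).fit_transform(X)  # nonlinear 2D embedding

print(X_pca.shape, X_tsne.shape)  # (200, 2) (200, 2)
```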

Feature Engineering and Dimensionality Reduction Techniques

Feature engineering and dimensionality reduction are critical steps in
machine learning workflows.


They ensure that models are not only accurate but also efficient,
interpretable, and scalable.

1. Feature Engineering

Feature engineering involves creating, modifying, or selecting features
(variables) from raw data to improve the performance of machine learning
models.

Techniques in Feature Engineering

1. Feature Creation: Deriving new features from existing raw variables
(e.g., ratios, interaction terms, or date components).

2. Feature Transformation
o Normalization: Scaling values to a specific range, typically [0,1].
o Standardization: Transforming features to have a mean
of 0 and a standard deviation of 1.
o Log Transformation: Reducing the impact of large values by
applying the log function.
o Power Transformation: Stabilizing variance by applying
functions like square root or exponential transformations.
3. Handling Missing Values
o Imputation: Filling missing values with statistical
measures (mean, median, mode) or predictions from
models.

o Dropping Features or Rows: Removing features or samples
with excessive missing data.

4. Encoding Categorical Features


o Label Encoding: Assigning numerical values to categories.
o One-Hot Encoding: Creating binary columns for each category.
o Target Encoding: Replacing categories with the mean
of the target variable.
5. Feature Selection
o Filter Methods: Using statistical tests (e.g., correlation, chi-
square) to select features.
o Wrapper Methods: Selecting features based on the
performance of a model (e.g., recursive feature
elimination).
o Embedded Methods: Feature selection integrated into model
training (e.g., regularization methods like LASSO).
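A compact sketch of several of these techniques with pandas and scikit-learn; the column names and values are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 38],
    "income": [30000, 52000, 47000, np.nan, 61000],
    "city":   ["blr", "del", "blr", "mum", "del"],
})

# Imputation: fill missing numeric values with the column mean
num = SimpleImputer(strategy="mean").fit_transform(df[["age", "income"]])

num_minmax = MinMaxScaler().fit_transform(num)   # normalization to [0, 1]
num_std = StandardScaler().fit_transform(num)    # standardization to mean 0, std 1

# One-hot encoding: one binary column per category
cat = OneHotEncoder().fit_transform(df[["city"]]).toarray()

X = np.hstack([num_std, cat])  # final feature matrix
print(X.shape)                 # (5, 5): 2 numeric + 3 one-hot columns
```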

Dimensionality Reduction

Dimensionality reduction aims to reduce the number of features while
preserving as much relevant information as possible.

It helps combat issues like overfitting, high computational costs, and the curse
of dimensionality.

Techniques for Dimensionality Reduction

1. Principal Component Analysis (PCA)


o Purpose: Identifies directions (principal components) in the
data that explain the maximum variance.
o Projects data onto a new coordinate system where each axis
represents a principal component.
o Captures the most variance in the first few components.

Applications: Commonly used in image compression, gene
expression analysis, and exploratory data analysis.

2. Linear Discriminant Analysis (LDA)


o Purpose: Similar to PCA but focuses on maximizing class
separability in supervised learning tasks.
o Projects data onto a lower-dimensional space while
maintaining class distinction.

Applications: Often used in classification problems.

3. t-Distributed Stochastic Neighbor Embedding (t-SNE)


o Purpose: Reduces high-dimensional data to 2D or 3D for visualization.
o Preserves the local structure of the data while sacrificing global
structure.

Applications: Useful for visualizing clusters in high-dimensional
data like embeddings.

4. Autoencoders (Deep Learning-Based Reduction)


o Purpose: Learns a compressed representation of the data
using neural networks.
o The encoder compresses the data, and the decoder reconstructs it.
o The bottleneck layer represents the reduced dimensions.

Applications: Image compression, anomaly detection, and
generative models.

5. Feature Agglomeration
o Purpose: Groups features with similar characteristics
(hierarchical clustering for features).
o Combines redundant features into a single representative feature.

Applications: Useful for datasets with many correlated features.


6. Independent Component Analysis (ICA)


o Purpose: Decomposes data into statistically independent components.
o Useful for signals with non-Gaussian distributions.

Applications: Signal processing, such as separating audio
signals in the "cocktail party problem."

7. Factor Analysis
o Purpose: Identifies underlying latent variables (factors)
that explain observed variables.
o Assumes that observed data is influenced by a smaller
number of unobservable factors.

Applications: Psychometrics, finance, and social sciences.

8. Backward Feature Elimination


o Purpose: Iteratively removes features that have the least
impact on the target variable.
o Uses a trained model's performance as the criterion.

Applications: Effective for small datasets where computational cost
isn't a concern.
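A small sketch of this idea using scikit-learn's recursive feature elimination (RFE), a closely related wrapper method that repeatedly refits a model and drops the weakest feature; the synthetic dataset is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic dataset: 10 features, only 4 of them informative
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # 1 = kept; higher ranks were eliminated earlier
```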

Combining Feature Engineering and Dimensionality Reduction

Pipeline Integration:

Many machine learning frameworks (e.g., scikit-learn) support building
pipelines where feature engineering and dimensionality reduction steps are
automated.

Hybrid Methods: For example:

o Combine PCA with feature selection to reduce noise and


retain relevant features.
o Use autoencoders to generate compact features, then apply
supervised learning techniques.
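A minimal scikit-learn pipeline sketch chaining feature transformation, dimensionality reduction, and a supervised model; the steps and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),    # feature transformation
    ("pca", PCA(n_components=5)),   # dimensionality reduction
    ("clf", LogisticRegression()),  # supervised model
])

pipe.fit(X, y)
print(pipe.score(X, y))  # accuracy of the whole pipeline on the training data
```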

Applications

Text Data:

o Use TF-IDF for feature creation and Latent Semantic
Analysis (LSA) for dimensionality reduction.

Image Data:

o Apply Convolutional Autoencoders or PCA for reducing pixel-based data
dimensions.

Genomic Data:

o Use PCA or t-SNE to visualize high-dimensional gene expression data.

Sensor Data:

o Combine Fourier transforms for feature extraction and PCA for
dimensionality reduction.

Best Practices

Understand Data: Always begin with exploratory data analysis (EDA) to
understand feature importance and relationships.

Domain Knowledge: Incorporate domain expertise to create meaningful features.

Avoid Over-Reduction: Ensure that dimensionality reduction techniques retain
sufficient information to build an accurate model.


Evaluate: Continuously evaluate feature engineering and dimensionality
reduction using cross-validation.


Chapter – 02

Basic Learning Theory

Design of Learning System

A learning system is a computational system that uses algorithms to learn
from data or experiences to improve its performance over time.

The design of such systems focuses on the following essential steps:

Choosing a Training Experience

The first step in building a learning system is selecting the type of training
experience it will use to learn. This involves determining the source of data
and how it will be used.

Types of Training Experience:

Direct Experience:

 The system is explicitly provided with examples of board states and
their correct moves.
 Example: In a chess game, the system is given specific board states and
the optimal moves for those states.

Indirect Experience:

 Instead of explicit guidance, the system is provided with sequences of
moves and their results.
 Example: The system observes the outcome (win or loss) of
different move sequences and learns to optimize its strategy.


Supervised vs. Unsupervised Training:

In supervised training, a supervisor labels all valid moves for a given board state.

In the absence of a supervisor, the system uses self-play or exploration to
learn. For example, a chess agent can play games against itself and
identify successful moves.

Training Data Distribution:

o For reliable performance, training samples must cover a wide range of
scenarios.
o If the training data and testing data have similar distributions,
the system's performance will be better.

Determining the Target Function

The target function represents the knowledge the system needs to learn.

It specifies the goal of the learning system and what it is trying to predict or
optimize.


Representation of the Target Function

Once the target function is defined, the next step is deciding how to represent
it. The representation depends on the complexity of the problem and the
available computational resources.

Common Representations:

Lookup Tables:

 Used for simple problems where all possible states and actions can be
enumerated.
 Example: A small chessboard with a limited number of moves.

Mathematical Functions:

 Represented using equations or models (e.g., linear regression or
polynomial equations).

Machine Learning Models:

 For complex systems, models like neural networks, decision trees, or
support vector machines are used to approximate the target function.
 Example: Using a neural network to predict the best chess moves
based on board states.

Function Approximation

In most real-world problems, the target function is too complex to be
represented exactly. Instead, an approximation of the target function is
learned.

Approaches to Approximation:


Parametric Models:


Models with a fixed number of parameters (e.g., linear regression, neural
networks).

Non-Parametric Models:

Models that adapt their complexity to the amount of data (e.g., k-nearest
neighbors, decision trees).

Learning Algorithms:

o Algorithms like gradient descent, reinforcement learning, or evolutionary
algorithms are used to optimize the parameters of the function.
o Example: In a chess game, reinforcement learning allows the agent to
learn by trial and error, optimizing its strategy over time.

Practical Example: Designing a Chess Learning System

Training Experience:

Use a combination of self-play (indirect experience) and historical game data
(direct experience).

Target Function:

Define the target function as selecting the best move M given the board state B:
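M = f(B), i.e., f maps each board state B to the best move M.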

Representation of the Target Function:

Use a deep neural network to represent the target function, where inputs are
board states and outputs are move probabilities.

Function Approximation:


Train the neural network using reinforcement learning, with rewards based
on the outcome of games played by the system.

Introduction to Concept Learning

Concept learning is a strategy in machine learning that involves acquiring
abstract knowledge or inferring general concepts from the given training
data.

It enables the learner to generalize from specific training examples and
classify objects or instances based on common, relevant features.

What is Concept Learning?

Concept learning is the process of abstraction and generalization from data,
where:

 The learner identifies common features shared by positive examples.
 It uses these features to classify new instances into categories.

It involves:

 Comparing and contrasting categories by analyzing
positive and negative examples.
 Simplifying observations from training data into a model or hypothesis.
 Applying this model to classify future data.

This process is also known as learning from experience.

Features of Concept Learning

Categorization:

o Concept learning enables classification of objects based on a set of
relevant features.


o For example, humans classify animals like elephants, cats, or dogs
based on specific distinguishing features.

Boolean-Valued Function:

o Each concept or category learned is represented as a Boolean function
that returns true or false:
 True for positive examples that belong to the category.
 False for negative examples that do not belong to the category.

Example:

o Humans categorize animals by recognizing features such as:
 Size, shape, color, and behavior.
o For example, to identify an elephant:
 Large size, trunk, tusks, and big ears are the specific features.
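As a toy sketch, such a learned concept can be written as a Boolean-valued function over these features; the rule below is invented purely for illustration:

```python
def is_elephant(animal: dict) -> bool:
    """Hypothetical target concept: True for positive examples, False otherwise."""
    return (animal.get("size") == "large"
            and animal.get("has_trunk") is True
            and animal.get("has_tusks") is True)

print(is_elephant({"size": "large", "has_trunk": True, "has_tusks": True}))   # True
print(is_elephant({"size": "small", "has_trunk": False, "has_tusks": False})) # False
```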

Formal Definition of Concept Learning

Concept learning is the process of inferring a Boolean-valued function
by processing training examples.

The goal is to:

1. Identify a set of specific or common features.
2. Use these features to define a target concept for classifying objects.

Components of Concept Learning

Input:

o A labeled training dataset consisting of:


 Positive examples: Instances that belong to the target concept.

 Negative examples: Instances that do not belong to the target concept.
o The learner uses this past experience to train the model.

Output:

o The Target Concept or Target Function f(x):


 A function f(x) maps input x to output y.
 The output is used to determine the relevant
features for classification.
o Example: Identifying an elephant requires a specific set of features
such as "has a trunk" and "has tusks."

Testing:

o New instances are provided to test the learned model.


o The system classifies these new instances based on the hypothesis
derived during training.

Process of Concept Learning

Training:

o The learner observes a set of labeled examples (positive and negative
instances).
o It identifies common, relevant features from the positive examples
and contrasts them with negative examples.

Hypothesis Formation:

o The system generates a hypothesis to represent the target concept.


o Example: "An elephant has a trunk and tusks" could be the hypothesis to
classify an elephant.


Generalization:

o The hypothesis is generalized to classify new instances correctly.

Testing and Validation:

o The learned model is tested on unseen data to evaluate its performance.

Example: Concept Learning for Animals

Input: Training dataset of animals with labeled features.

o Positive examples: Animals labeled as "elephants."


o Negative examples: Animals not labeled as "elephants."

Output: Target concept for an elephant, e.g., "has a trunk," "has tusks,"
and "large size."

Testing: New animal instances are classified based on the learned concept.

Applications of Concept Learning

1. Natural Language Processing: Categorizing words or
sentences based on grammatical or semantic features.
2. Image Recognition: Identifying objects or patterns in images.
3. Recommendation Systems: Classifying products or services
to provide personalized recommendations.
4. Medical Diagnosis: Identifying diseases based on symptoms and
medical test results.

Modelling in Machine Learning

A machine learning model abstracts a training dataset and makes
predictions on unseen data.


Training: Involves feeding training data into a machine learning algorithm,
tuning parameters, and generating a predictive model.

Goals: Selecting the right model, training effectively, reducing training time,
and achieving high performance on unseen data.

Types of Parameters:

Model Parameters: Learnable directly from training data (e.g.,
regression coefficients, decision tree splits, neural network weights).

Hyperparameters: Cannot be learned directly and must be set (e.g.,
regularization strength, number of trees in random forests).
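A brief scikit-learn sketch of the distinction on a synthetic regression dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

model = Ridge(alpha=1.0)  # alpha (regularization strength) is a hyperparameter
model.fit(X, y)

print(model.coef_, model.intercept_)  # model parameters, learned from the data
```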

Evaluation and Error Metrics

Dataset Splitting:

o Training dataset: Used to train the model.
o Test dataset: Used to evaluate the model's ability to generalize.

Error Types:

o Training Error (In-sample Error): Error when the model is tested on training
data.
o Test Error (Out-of-sample Error): Error when predicting on unseen test data.

Loss Function: Measures prediction error. Example: Mean Squared Error
(MSE); a smaller value indicates higher accuracy.
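MSE is computed as MSE = (1/N) Σ (yᵢ − ŷᵢ)²; a minimal sketch with invented values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual values
y_pred = np.array([2.8, 5.4, 2.0, 6.5])  # model predictions

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.175
```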

Steps in Machine Learning Process

Algorithm Selection: Choose a model suitable for the problem and dataset.

Training: Train the selected algorithm on the dataset.


Tuning: Adjust parameters to improve accuracy.

Evaluation: Validate the model using test data.

Model Selection and Evaluation

Challenges:

Balancing performance (accuracy) and complexity (overfitting or underfitting).

Approaches:

1. Resampling methods like splitting datasets or cross-validation.
2. Calculating accuracy or error metrics.
3. Probabilistic frameworks for scoring model performance.

Resampling Methods

Random Train/Test Splits: Randomly split the data for training and testing.

Cross-Validation: Tune models by splitting data into folds:

o K-fold Cross-Validation: Split data into k parts, train on k-1 folds,
and test on the remaining fold.


o Stratified K-fold: Ensures each fold contains a proportionate
distribution of class labels.


o Leave-One-Out Cross-Validation (LOOCV): Train on all data except
one instance; repeat for every instance.
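A short scikit-learn sketch of these resampling schemes on a synthetic dataset; the fold counts are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (KFold, LeaveOneOut, StratifiedKFold,
                                     cross_val_score, train_test_split)

X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000)

# Random train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

print(cross_val_score(model, X, y, cv=KFold(n_splits=5)).mean())            # k-fold
print(cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5)).mean())  # stratified
print(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())                # LOOCV
```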


Visualizing Model Performance

ROC Curve (Receiver Operating Characteristic):

o Plots True Positive Rate vs. False Positive Rate.


o Area Under the Curve (AUC): Measures classifier
performance (1.0 = perfect, closer to diagonal = less
accurate).

Precision-Recall Curve:

o Useful for imbalanced datasets to evaluate precision and recall.
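A minimal scikit-learn sketch computing both curves and the AUC on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Predicted probabilities for the positive class
scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, scores)                        # ROC curve points
precision, recall, _ = precision_recall_curve(y_te, scores)  # PR curve points
print(roc_auc_score(y_te, scores))                           # AUC: 1.0 = perfect
```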

Scoring and Complexity Methods

Scoring Models: Combine model performance and complexity into a single score.

Example: Minimum Description Length (MDL):

Selects the simplest model with the fewest bits to represent both data and
predictions.

