ML Unit-2

Uploaded by

pavankumarvoore3
Copyright
© © All Rights Reserved

separate matrices, capturing the singular values, left singular vectors, and right
singular vectors. SVD is utilized for dimensionality reduction, recommendation
systems, image compression, and more.

These are just a few examples of how linear algebra concepts are applied in
machine learning. Understanding and applying linear algebra operations and
concepts allow for efficient manipulation of data, designing models, solving
optimization problems, and gaining insights from the data in the field of
machine learning.

UNIT-II

Supervised Learning in Machine Learning:

Supervised learning is a type of machine learning where the algorithm learns
from labeled data, consisting of input features and their corresponding output
labels. The goal of supervised learning is to build a predictive model that can
accurately map inputs to their correct outputs, enabling the model to make
predictions on unseen data.

The process of supervised learning involves the following steps:

1. Data Collection: Gather a dataset that contains input features and their
associated output labels. The dataset should be representative of the problem
you are trying to solve.
2. Data Preprocessing: Clean the data by handling missing values, outliers,
and irrelevant features. It may involve techniques like data normalization,
feature scaling, or feature engineering to prepare the data for modeling.
3. Training-Validation Split: Split the dataset into two parts: a training set
and a validation set. The training set is used to train the model, while the

Downloaded by Pavankumar Voore ([email protected])


validation set is used to evaluate its performance during training and tune
hyperparameters.
4. Model Selection: Choose an appropriate algorithm or model architecture
for the specific problem. The choice of model depends on the characteristics of
the data and the desired output.
5. Model Training: Train the selected model on the training data. The model
learns to find patterns and relationships between the input features and the
corresponding output labels. During training, the model adjusts its internal
parameters iteratively to minimize the difference between predicted outputs and
true labels.
6. Model Evaluation: Evaluate the trained model's performance on the
validation set. Common evaluation metrics for supervised learning include
accuracy, precision, recall, F1 score, or mean squared error, depending on the
nature of the problem (classification or regression).
7. Hyperparameter Tuning: Adjust the hyperparameters of the model to
optimize its performance. Hyperparameters are configuration settings that are
not learned from the data but need to be set before training, such as learning
rate, regularization parameters, or the number of hidden layers in a neural
network.
8. Model Deployment: Once the model has been trained and evaluated
satisfactorily, it can be deployed to make predictions on new, unseen data.
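
The steps above can be sketched end to end on a toy problem. The following is a minimal illustration in plain Python, not a prescribed implementation. It assumes a made-up one-feature dataset (y is roughly 2x + 1 plus noise), an 80/20 training-validation split, and closed-form least squares as the chosen model. The helper names (train_val_split, fit_line, mse) are hypothetical and do not come from any library.

```python
import random

def train_val_split(xs, ys, val_fraction=0.2, seed=0):
    """Shuffle the indices and split the data into training and validation parts."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(xs) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    pick = lambda data, ids: [data[i] for i in ids]
    return pick(xs, train_idx), pick(ys, train_idx), pick(xs, val_idx), pick(ys, val_idx)

def fit_line(xs, ys):
    """Model training: closed-form least squares for y = w * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x

def mse(ys, preds):
    """Model evaluation: mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Data collection (hypothetical labeled data: inputs xs, labels ys)
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]

# Split, train on the training set, evaluate on the validation set
x_tr, y_tr, x_va, y_va = train_val_split(xs, ys)
w, b = fit_line(x_tr, y_tr)
val_mse = mse(y_va, [w * x + b for x in x_va])
print(f"w={w:.3f} b={b:.3f} validation MSE={val_mse:.4f}")
```

Because the validation points are never used in fit_line, the validation MSE is an estimate of how the fitted line would behave on unseen inputs.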

Supervised learning algorithms include linear regression, logistic regression,
decision trees, random forests, support vector machines (SVM), naive Bayes,
k-nearest neighbors (KNN), and various neural network architectures.

Supervised learning is widely used in applications such as image classification,
sentiment analysis, fraud detection, recommendation systems, medical
diagnosis, and many more, where the availability of labeled data allows for
learning patterns and making accurate predictions.

Rationale and Basics:

Supervised learning is based on the principle of learning from labeled data. It is
widely used because it allows machines to learn patterns and relationships
directly from labeled examples, enabling accurate predictions or classifications
on unseen data. The rationale behind supervised learning is to leverage the
knowledge provided by labeled data to train models that can generalize well and
make informed decisions.

Basics of Supervised Learning:

1. Labeled Data: Supervised learning requires a labeled dataset, where each
data point consists of input features and corresponding output labels. The input
features represent the characteristics or attributes of the data, while the output
labels represent the desired prediction or classification associated with those
features.
2. Training Phase: In the training phase, the supervised learning algorithm
learns from the labeled data by finding patterns and relationships between the
input features and output labels. It adjusts its internal parameters iteratively to
minimize the difference between predicted outputs and the true labels in the
training data.
3. Prediction or Inference: After the model is trained, it can make
predictions or classifications on new, unseen data by applying the learned
patterns and relationships. The trained model takes input features as input and
produces predicted output labels based on the learned knowledge.
4. Evaluation: The performance of the trained model is evaluated using
evaluation metrics appropriate for the specific problem. Accuracy, precision,
recall, F1 score, mean squared error, or area under the receiver operating
characteristic curve (AUC-ROC) are some common evaluation metrics used in
supervised learning.
5. Model Selection and Tuning: Various algorithms and model architectures
can be used in supervised learning. The choice of model depends on the nature
of the problem (classification or regression), the characteristics of the data, and
the desired outcome. Hyperparameters, such as learning rate, regularization
parameters, or network structure, may need to be tuned to optimize the model's
performance.
6. Generalization: The goal of supervised learning is to build models that
can generalize well to unseen data. A well-generalized model can make accurate
predictions or classifications on new, previously unseen examples beyond the
training data. To achieve good generalization, overfitting (memorizing the
training data) should be avoided by applying regularization techniques and
using appropriate evaluation and validation strategies.

Supervised learning provides a powerful framework for solving a wide range of
prediction and classification tasks. By utilizing labeled data, it enables machines
to learn from examples and make informed decisions on new, unseen data. The
success of supervised learning relies on the availability of high-quality labeled
data and the choice of appropriate algorithms and techniques for the specific
problem at hand.

Learning from observations:

Learning from observations is a fundamental concept in machine learning and
artificial intelligence. It refers to the process of acquiring knowledge, patterns,
or insights by analyzing and extracting information from observed data.
Learning from observations forms the basis for developing models, making
predictions, and gaining understanding from real-world data. Here are some key
aspects and techniques related to learning from observations:

1. Data Collection: The first step in learning from observations is to gather
data from the real world or from a specific domain. Data can be collected
through various sources such as sensors, databases, surveys, or web scraping.
2. Data Preprocessing: Once the data is collected, it often requires
preprocessing to clean and transform it into a suitable format for analysis. This
may involve handling missing values, removing outliers, normalizing or scaling
features, and encoding categorical variables.
3. Exploratory Data Analysis: Exploratory data analysis involves
understanding the data by visualizing and summarizing its characteristics. This
step helps in identifying patterns, relationships, trends, or anomalies in the data.
Techniques such as statistical summaries, data visualization, and data profiling
can be used for exploratory data analysis.
4. Feature Engineering: Feature engineering involves creating new features
or transforming existing features to improve the performance of machine
learning models. This step may include selecting relevant features, combining
features, encoding categorical variables, or creating derived features based on
domain knowledge.
5. Model Selection: Learning from observations involves selecting an
appropriate model or algorithm that can capture the patterns and relationships in
the data. The choice of model depends on the nature of the problem, the
available data, and the desired output. Common models include decision trees,
neural networks, support vector machines (SVM), and linear regression.
6. Model Training: Once the model is selected, it is trained on the observed
data to learn patterns or relationships between input features and output labels.
The model's parameters or weights are adjusted iteratively to minimize the
difference between predicted outputs and the true labels in the training data.
7. Model Evaluation: After training, the model's performance is evaluated
on unseen data to assess its generalization ability. Evaluation metrics such as
accuracy, precision, recall, F1 score, or mean squared error are used to measure
the model's performance and assess its effectiveness in making predictions or
classifications.
8. Model Deployment: Once the model has been trained and evaluated
satisfactorily, it can be deployed to make predictions on new, unseen data. The
model is applied to new observations to generate predictions or gain insights.

Learning from observations is a continuous process that involves refining
models, incorporating new data, and updating knowledge as more observations
become available. It is a key component of machine learning and data-driven
decision-making, enabling systems to learn, adapt, and make informed decisions
based on real-world data.

Bias and Why Learning Works

Bias, in the context of machine learning, refers to the tendency of a learning
algorithm to make predictions or classifications that consistently deviate from
the true values or labels, that is, a systematic error rather than a random one.
Bias can arise from various factors, such as the choice of model, assumptions
made during training, or limitations in the representation of the data.
Understanding bias is crucial in evaluating and improving the performance of
machine learning algorithms.

Why Learning Works: Learning in machine learning refers to the process of
training a model on data to make predictions or classifications. Learning works
in machine learning due to several key factors:

1. Generalization: Learning allows models to generalize from the observed
data to make accurate predictions on unseen or new data. By learning patterns
and relationships in the training data, models aim to capture the underlying
structure of the data, enabling them to make informed decisions on similar,
previously unseen instances.
2. Bias-Variance Trade-off: Learning works by striking a balance between
bias and variance. Bias refers to the error introduced by approximating a
complex problem with a simplified model, while variance refers to the
sensitivity of the model to variations in the training data. Reducing one
typically increases the other, so learning algorithms aim for the balance that
minimizes the total error, leading to models that generalize well and perform
effectively on new data.
3. Model Complexity: Learning allows models to adapt their complexity to
the complexity of the underlying problem. More complex models, such as deep
neural networks, have the capacity to learn intricate patterns and relationships in
the data. On the other hand, simpler models, such as linear regression, may have
lower capacity but can still capture linear relationships. The learning process
adjusts the model's parameters to find an appropriate level of complexity that
best fits the data.
4. Optimization: Learning involves optimizing model parameters or weights
to minimize the difference between predicted outputs and true labels in the
training data. This optimization process uses various optimization algorithms,
such as gradient descent, to iteratively update the model's parameters and
improve its performance.
5. Feature Representation: Learning is effective when the data is properly
represented in a way that captures the relevant information for the task. Feature
engineering or feature learning techniques help to transform the raw data into a
more suitable representation, enabling the model to learn meaningful patterns
and relationships.
6. Regularization: Learning algorithms often incorporate regularization
techniques to prevent overfitting and improve generalization. Regularization
helps to control model complexity, reduce noise, and prevent the model from
excessively fitting the training data. Techniques such as L1 or L2 regularization
and dropout are commonly used to regularize models.
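
The optimization and regularization points above can be made concrete with a small sketch. The code below assumes a one-feature linear model y = w * x + b, a mean-squared-error objective, and an optional L2 penalty l2 * w**2 on the weight. The function name gradient_descent, the learning rate, and the toy data are illustrative choices rather than part of the notes.

```python
def gradient_descent(xs, ys, lr=0.05, l2=0.0, steps=2000):
    """Fit y = w * x + b by minimizing MSE + l2 * w**2 with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the objective with respect to w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n + 2 * l2 * w
        grad_b = 2 * sum(errors) / n
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = gradient_descent(xs, ys)          # no regularization
w_reg, b_reg = gradient_descent(xs, ys, l2=1.0)      # L2 penalty shrinks w
print(w_plain, b_plain, w_reg, b_reg)
```

With the penalty switched on, the learned weight is pulled toward zero, which is the sense in which L2 regularization limits model complexity.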

Learning in machine learning works through these mechanisms, allowing
models to learn from data, adapt to the underlying problem complexity,
generalize to new instances, and make accurate predictions or classifications.

Computational Learning Theory

Computational learning theory is a subfield of machine learning that focuses on
studying the theoretical foundations of learning algorithms and their
computational capabilities. It provides a framework for understanding the
fundamental principles of learning, analyzing the complexity of learning
problems, and establishing theoretical guarantees for the performance of
learning algorithms. The main goal of computational learning theory is to
provide insights into what can be learned, how efficiently it can be learned, and
the limitations of learning algorithms.

Key concepts and ideas in computational learning theory include:

1. Sample Complexity: Sample complexity refers to the number of training
examples required by a learning algorithm to achieve a certain level of accuracy
or generalization performance. Computational learning theory investigates the
relationship between the complexity of the underlying learning problem and the
amount of training data needed to learn it accurately.
2. Generalization and Overfitting: Generalization is the ability of a learning
algorithm to perform well on unseen data. Computational learning theory
examines the conditions under which learning algorithms can generalize from a
limited set of observed training examples to make accurate predictions on new,
unseen instances. It also investigates the causes and prevention of overfitting,
where a model becomes too complex and memorizes the training data instead of
learning the underlying patterns.
3. PAC Learning: Probably Approximately Correct (PAC) learning is a
theoretical framework introduced in computational learning theory. It provides a
formal definition of learning, where a learning algorithm is considered
successful if, with probability at least 1 - δ (the confidence), it outputs a
hypothesis whose error is at most ε (the accuracy), using a number of training
examples and an amount of computation that are polynomial in 1/ε, 1/δ, and the
size of the problem. PAC learning theory explores the relationship between the
accuracy, confidence, sample complexity, and computational complexity of
learning algorithms.
4. Computational Complexity: Computational learning theory also considers
the computational aspects of learning algorithms, analyzing their time and space
complexity. It examines the efficiency of learning algorithms in terms of their
computational requirements and explores the relationship between the
complexity of learning problems and the computational resources required to
solve them.
5. Bounds and Convergence: Computational learning theory provides
bounds and convergence guarantees for learning algorithms. These bounds give
theoretical guarantees on the expected error or performance of a learning
algorithm and help in understanding the trade-offs between the complexity of
the learning problem, the number of training examples, and the achievable
accuracy.
6. Intractability and No-Free-Lunch Theorems: Computational learning
theory explores the inherent limitations and intractability of learning problems.
No-Free-Lunch theorems state that there is no universally superior learning
algorithm that works well for all possible learning problems. These theorems
highlight the importance of considering problem-specific characteristics and
assumptions when designing learning algorithms.
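
To make the ideas of sample complexity and PAC learning above more tangible, here is a small sketch based on a standard textbook bound. It assumes a finite hypothesis space H and a learner that always outputs a hypothesis consistent with the training data. In that setting, m >= (1/ε)(ln|H| + ln(1/δ)) examples are sufficient. The function name pac_sample_bound and the example hypothesis space (conjunctions over 10 Boolean variables, where each variable appears positively, negated, or not at all, giving 3^10 hypotheses) are illustrative assumptions.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon.

    Assumes a finite hypothesis space and a consistent learner.
    """
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# Error at most 0.1 with probability at least 0.95 over 3**10 hypotheses
m = pac_sample_bound(hypothesis_count=3 ** 10, epsilon=0.1, delta=0.05)
print(m)

# Demanding a smaller error requires more examples
m_tighter = pac_sample_bound(hypothesis_count=3 ** 10, epsilon=0.05, delta=0.05)
print(m_tighter)
```

The bound grows only logarithmically in the size of the hypothesis space and in 1/δ, but linearly in 1/ε.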

By studying computational learning theory, researchers aim to understand the
theoretical underpinnings of machine learning, establish the capabilities and
limitations of learning algorithms, and develop rigorous mathematical
frameworks for analyzing and designing effective learning systems. It provides
theoretical foundations that guide the development and analysis of learning
algorithms in practice.

Occam's Razor Principle, Overfitting Avoidance, and Heuristic Search in Inductive Learning:

Occam's Razor Principle and Overfitting Avoidance:

Occam's Razor is a principle in machine learning and statistical modeling that
suggests choosing the simplest explanation or model that adequately explains
the data. It is a guiding principle that favors simpler models over more complex
ones when multiple models have similar predictive performance. Occam's Razor
helps to prevent overfitting, which occurs when a model captures noise or
irrelevant patterns in the training data, leading to poor generalization on unseen
data.

Overfitting occurs when a model becomes too complex and captures the noise
or idiosyncrasies present in the training data, instead of learning the underlying
true patterns. This results in a model that performs well on the training data but
fails to generalize to new data. Overfitting can be mitigated or avoided by
applying various techniques:

1. Regularization: Regularization is a technique that adds a penalty term to
the model's objective function, discouraging overly complex models.
Regularization techniques, such as L1 (Lasso) or L2 (Ridge) regularization,
limit the magnitudes of the model's parameters, effectively reducing overfitting.
2. Cross-Validation: Cross-validation is a technique to estimate the
performance of a model on unseen data. By dividing the available data into
multiple subsets for training and validation, cross-validation helps to assess the
model's generalization ability. If a model performs significantly better on the
training data than on the validation data, it is an indication of overfitting.
3. Early Stopping: Early stopping is a strategy that monitors the model's
performance during training and stops the training process before overfitting
occurs. It involves monitoring the validation error and stopping the training
when the error starts increasing, indicating that the model has started to overfit
the training data.
4. Feature Selection: Feature selection involves identifying the most
informative and relevant features for the model. Removing irrelevant or
redundant features can reduce model complexity and prevent overfitting.
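
The early stopping rule described in the list above can be sketched as follows. This is a simplified illustration that works on an already recorded list of per-epoch validation errors. The function name early_stopping_epoch and the patience parameter (how many non-improving epochs to tolerate before stopping) are assumptions introduced for the example.

```python
def early_stopping_epoch(val_errors, patience=2):
    """Return the index of the epoch whose model would be kept.

    Training stops once the validation error has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # New best validation error: remember it and reset the counter
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error has stopped improving
    return best_epoch

# Hypothetical validation errors: they fall, then rise as overfitting sets in
val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
best = early_stopping_epoch(val_errors)
print(best)
```

In practice the model parameters from the best epoch are saved during training so they can be restored when the loop stops.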

Heuristic Search in Inductive Learning:

Heuristic search is a strategy used in inductive learning to guide the search for
the best hypothesis or model among a space of possible hypotheses. It involves
exploring the space of potential hypotheses by considering specific search
directions or rules based on domain-specific knowledge or heuristics. The goal
is to efficiently find a hypothesis that fits the available data well and generalizes
to new, unseen instances.

Heuristic search algorithms in inductive learning employ various techniques,
such as:

1. Greedy Search: Greedy search algorithms iteratively make locally
optimal choices at each step of the search. They prioritize immediate gains or
improvements without considering the long-term consequences. Greedy
algorithms can be efficient but may not always find the globally optimal
solution.
2. Genetic Algorithms: Genetic algorithms are inspired by the process of
natural evolution. They maintain a population of candidate solutions
(hypotheses) and apply genetic operators (selection, crossover, mutation) to
generate new candidate solutions. Genetic algorithms explore the search space
through a combination of random exploration and exploitation of promising
solutions.
3. Beam Search: Beam search is a search strategy that keeps track of a fixed
number of most promising hypotheses at each stage of the search. It avoids
exhaustive exploration of the entire search space and focuses on the most
promising paths based on certain evaluation criteria or heuristics.
4. Best-First Search: Best-first search algorithms prioritize the most
promising hypotheses based on a heuristic evaluation function. They explore the
search space by expanding the most promising nodes or hypotheses first, guided
by the heuristic estimates of their potential quality.
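
As a rough illustration of the search strategies listed above, the sketch below implements a generic beam search over hypotheses. Here a hypothesis is a set of selected features, and the scoring function is entirely made up: FEATURE_GAIN and PENALTY are hypothetical numbers standing in for a real evaluation such as validation accuracy. Setting beam_width to 1 reduces the procedure to greedy search.

```python
def beam_search(start, expand, score, beam_width=2, max_steps=3):
    """Keep only the `beam_width` highest-scoring hypotheses at each step."""
    beam = [start]
    best = start
    for _ in range(max_steps):
        # Generate all successors of the hypotheses currently in the beam
        candidates = {h for hyp in beam for h in expand(hyp)}
        if not candidates:
            break
        # Rank by score; the sorted feature tuple breaks ties deterministically
        ranked = sorted(candidates, key=lambda h: (score(h), tuple(sorted(h))), reverse=True)
        beam = ranked[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best

FEATURE_GAIN = {0: 0.30, 1: 0.05, 2: 0.25, 3: 0.01}  # hypothetical usefulness per feature
PENALTY = 0.04  # hypothetical cost per feature, favoring simpler hypotheses

def score(features):
    """Heuristic evaluation: total gain minus a complexity penalty."""
    return sum(FEATURE_GAIN[f] for f in features) - PENALTY * len(features)

def expand(features):
    """Successor hypotheses: add one feature that is not yet selected."""
    return [features | {f} for f in FEATURE_GAIN if f not in features]

best = beam_search(frozenset(), expand, score, beam_width=2, max_steps=4)
print(sorted(best), round(score(best), 2))
```

The complexity penalty in the score plays the role of Occam's Razor: feature 3 is left out because its small gain does not justify the added complexity.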

Heuristic search techniques in inductive learning aim to efficiently navigate the
space of possible hypotheses and find the best-fitting hypothesis based on the
available data. These strategies leverage domain-specific knowledge, heuristics,
or evaluation functions to guide the search process and optimize the learning
outcome.

Estimating Generalization Errors:

Estimating generalization errors is a crucial aspect of machine learning that
allows us to assess how well a trained model is likely to perform on unseen
data. Generalization error refers to a model's expected error on new, unseen
data drawn from the same distribution as the training data. The difference
between this error and the error on the training data is known as the
generalization gap. An estimate of the generalization error indicates how well
the model can apply its learned patterns to make accurate predictions or
classifications in real-world scenarios.

Here are some common techniques for estimating generalization errors:

1. Holdout Method: The holdout method involves splitting the available
data into two separate sets: a training set and a test set. The model is trained on
the training set, and its performance is evaluated on the test set. The test set
serves as a proxy for unseen data, and the evaluation metrics obtained on the
test set provide an estimate of the model's generalization error.
2. Cross-Validation: Cross-validation is a technique that estimates the
generalization error by partitioning the available data into multiple subsets or
"folds." The model is trained and evaluated iteratively, each time using a
different combination of training and validation folds. The average performance
across all iterations provides an estimate of the generalization error. Common
cross-validation methods include k-fold cross-validation, stratified k-fold cross-
validation, and leave-one-out cross-validation.
3. Bootstrapping: Bootstrapping is a resampling technique that estimates the
generalization error by creating multiple bootstrap samples from the original
dataset. Each bootstrap sample is generated by randomly selecting data points
with replacement. The model is trained on each bootstrap sample and evaluated
on the data points left out of that sample, and the average performance across
all iterations provides an estimate of the generalization error.
4. Out-of-Bag Error (OOB): OOB error is a technique specific to ensemble
methods that use bootstrap sampling, such as random forests. In random forests,
each decision tree is trained on a different bootstrap sample. The OOB error is
estimated by predicting each data point using only the trees whose bootstrap
sample did not include it. The error of these predictions, averaged over all
data points, provides an estimate of the generalization error.
5. Nested Cross-Validation: Nested cross-validation is a technique that
combines cross-validation with an outer loop and an inner loop. The outer loop
performs cross-validation to estimate the generalization error, while the inner
loop performs cross-validation for hyperparameter tuning. This approach allows
for unbiased estimation of the generalization error while selecting the best
hyperparameters.
6. Validation Curve: A validation curve plots the performance of a model on
both the training and validation sets as a function of a specific hyperparameter.
By analyzing the gap between the training and validation performance, we can
estimate the generalization error. If the model performs well on the training data
but poorly on the validation data, it indicates a higher generalization error.
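
The k-fold cross-validation procedure from the list above can be sketched in a few lines. The example assumes a deliberately trivial "model" that always predicts the mean of the training labels, so that the focus stays on how the folds are built and averaged. The function names k_fold_indices and cross_val_mse and the toy labels are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

def cross_val_mse(ys, k=5):
    """Average validation MSE of a predict-the-training-mean baseline."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        # "Training": the baseline just memorizes the mean training label
        mean_y = sum(ys[j] for j in train_idx) / len(train_idx)
        # Evaluation on the held-out fold only
        fold_mse = sum((ys[j] - mean_y) ** 2 for j in val_idx) / len(val_idx)
        scores.append(fold_mse)
    return sum(scores) / k

# Hypothetical labels
ys = [2.0 * i + 1.0 for i in range(20)]
cv_mse = cross_val_mse(ys, k=5)
print(cv_mse)
```

Replacing the mean baseline with any other training routine leaves the fold logic unchanged, which is why cross-validation can be wrapped around almost any model.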

These techniques provide estimates of the generalization error by simulating the
model's performance on unseen data. It is important to note that these estimates
are approximations and depend on the quality and representativeness of the
data. Additionally, it is crucial to ensure that the evaluation data is truly
representative of the target population to obtain accurate estimates of
generalization errors.

Metrics for assessing regression:

When assessing regression models, several metrics are commonly used to
evaluate their performance and quantify the accuracy of predicted continuous
values. Here are some of the key metrics for assessing regression models:

1. Mean Squared Error (MSE): MSE is one of the most widely used metrics
for regression. It calculates the average squared difference between the
predicted values and the true values. The lower the MSE, the better the model's
performance. However, since MSE is in squared units, it may not be easily
interpretable in the original scale of the target variable.
2. Root Mean Squared Error (RMSE): RMSE is the square root of the MSE,
which provides a metric in the same units as the target variable. It represents the
typical size of the deviation between the predicted values and the true values,
with large errors weighted more heavily than small ones. RMSE is
commonly used as a more interpretable alternative to MSE.
3. Mean Absolute Error (MAE): MAE calculates the average absolute
difference between the predicted values and the true values. It measures the
average magnitude of the errors without considering their direction. MAE is
easy to interpret as it is in the same units as the target variable.
4. R-squared (R²) or Coefficient of Determination: R-squared represents the
proportion of the variance in the target variable that can be explained by the
model. A value of 1 indicates a perfect fit, and 0 indicates that the model does
no better than always predicting the mean of the target variable. On held-out
data R-squared can also be negative, which means the model performs worse than
predicting the mean. R-squared provides an indication of how well the model
captures the variation in the target variable.
5. Mean Absolute Percentage Error (MAPE): MAPE calculates the average
absolute percentage difference between the predicted values and the true values,
relative to the true values. It is often used when the percentage error is more
meaningful than the absolute error, for example when comparing variables
measured on different scales or when the target variable varies widely across
its range. MAPE is undefined when a true value is zero and becomes unstable
when true values are close to zero.
6. Explained Variance Score: The explained variance score quantifies the
proportion of variance in the target variable that is explained by the model. It
is computed as one minus the ratio of the variance of the prediction errors to
the variance of the target variable, so it equals R-squared when the errors
average to zero. The best possible score is 1, indicating a perfect fit, and
lower values (including negative ones) indicate a worse fit.
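
The regression metrics listed above can be computed directly from their definitions. The following sketch uses plain Python on a tiny made-up example. The function name regression_metrics and the sample values are illustrative only, and the MAPE line assumes that no true value is zero.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, R-squared and MAPE from paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # Assumption: no true value is zero, otherwise MAPE is undefined
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1 - ss_res / ss_tot,
        "mape": mape,
    }

# Hypothetical true and predicted values
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]
m = regression_metrics(y_true, y_pred)
print(m)
```

Comparing mse and mae on the same predictions shows how squaring emphasizes the larger errors.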

It is important to note that the choice of the appropriate evaluation metric
depends on the specific problem and the context in which the regression model
is being applied. Different metrics may be more relevant or interpretable
depending on the particular requirements and characteristics of the problem at
hand.

Metrics for assessing classification:

When assessing classification models, several metrics are commonly used to
evaluate their performance in predicting categorical or binary outcomes. These
metrics provide insights into the accuracy, precision, recall, and overall
performance of the model. Here are some key metrics for assessing
classification models:

1. Accuracy: Accuracy is one of the most straightforward metrics,
measuring the proportion of correctly classified instances out of the total
number of instances. It provides an overall measure of the model's performance
but can be misleading if the classes are imbalanced.
2. Precision: Precision calculates the proportion of true positive predictions
out of all positive predictions. It measures the model's ability to correctly
identify positive instances and is particularly useful when the cost of false
positives is high. A high precision indicates a low rate of false positives.
3. Recall (Sensitivity or True Positive Rate): Recall calculates the
proportion of true positive predictions out of all actual positive instances. It
measures the model's ability to capture all positive instances and is particularly
useful when the cost of false negatives is high. A high recall indicates a low rate
of false negatives.
4. F1 Score: The F1 score combines precision and recall into a single
metric, balancing the trade-off between the two. It is the harmonic mean of
precision and recall, F1 = 2 × (precision × recall) / (precision + recall), so it
is high only when both precision and recall are high. The F1 score is useful
when the class distribution is imbalanced.
5. Specificity (True Negative Rate): Specificity calculates the proportion of
true negative predictions out of all actual negative instances. It measures the
model's ability to correctly identify negative instances and is particularly
relevant in binary classification problems with imbalanced classes.
6. Area Under the Receiver Operating Characteristic Curve (AUC-ROC):
AUC-ROC quantifies the performance of a binary classification model across
different classification thresholds. The ROC curve plots the true positive rate
(sensitivity) against the false positive rate (1 - specificity) at various
threshold settings, and AUC is the area under this curve. An AUC of 0.5
corresponds to random guessing and 1.0 to perfect separation of the classes. A
higher AUC-ROC indicates better overall classification performance, regardless
of the threshold chosen.
7. Confusion Matrix: A confusion matrix provides a tabular representation
of the model's predicted classes compared to the true classes. It shows the true
positives, true negatives, false positives, and false negatives, enabling a more
detailed analysis of the model's performance.
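
The threshold-based metrics above all derive from the four cells of the confusion matrix. The sketch below computes them for a binary problem in plain Python. The function names and the toy labels are illustrative, and returning 0.0 when a ratio has a zero denominator is a convention assumed here, not something stated in the notes.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def safe_div(num, den):
    # Assumption: report 0.0 when the ratio is undefined (zero denominator)
    return num / den if den else 0.0

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, specificity and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }

# Hypothetical labels: 4 positives and 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts, metrics)
```

Looking at the counts alongside the derived metrics makes it easier to see which type of error (false positives or false negatives) drives a low score.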

These metrics help evaluate different aspects of a classification model's
performance, such as its accuracy, ability to correctly identify positive or
negative instances, and the balance between precision and recall. The choice of
metric depends on the specific problem, the class distribution, and the relative
importance of different types of errors in the context of the application. It is
often advisable to consider multiple metrics to gain a comprehensive
understanding of the model's performance.
UNIT-III

Statistical Learning:

Statistical learning, also known as statistical machine learning, is a subfield of
machine learning that focuses on developing and applying statistical models and
