Data Science-Unit-4 - 05.10.23
• With the help of supervised learning, the model can predict the output on the
basis of prior experiences.
• In supervised learning, we can have an exact idea about the classes of objects.
• Supervised learning model helps us to solve various real-world problems such
as fraud detection, spam filtering, etc.
• Supervised learning models are not suitable for handling complex tasks.
• Supervised learning cannot predict the correct output if the test data is different
from the training dataset.
• Training requires a lot of computation time.
• In supervised learning, we need enough knowledge about the classes of objects.
UNSUPERVISED LEARNING
• Unsupervised learning is a type of machine learning in which models
are trained using unlabeled dataset and are allowed to act on that
data without any supervision
• K-means clustering (see the sketch after this list)
• Hierarchical clustering
• Anomaly detection
• Neural networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
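To make one of these concrete, here is a minimal sketch of K-means clustering with scikit-learn; the synthetic blob data and the parameter choices (k=3, seeds) are illustrative assumptions, not part of these notes.

```python
# Minimal K-means sketch: the model receives no labels and must discover
# the grouping structure on its own.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic unlabeled data: 300 points drawn around 3 hidden centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-means with k=3; fit_predict returns a cluster id per point.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

print("cluster centers:\n", kmeans.cluster_centers_)
print("first 10 assignments:", cluster_ids[:10])
```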
OVERFITTING AND UNDERFITTING
• Overfitting occurs when our machine learning model tries to cover all the data points, or more data points than required, in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and these factors reduce the efficiency and accuracy of the model.
• The overfitted model has low bias and high variance.
OVERFITTING AND UNDERFITTING
• Before discussing overfitting and underfitting, let's define some basic terms that will help in understanding this topic:
• Signal: The true underlying pattern of the data that helps the machine learning model learn from the data.
• Noise: Unnecessary and irrelevant data that reduces the performance of the model.
• Bias: A prediction error introduced into the model by oversimplifying the machine learning algorithm; equivalently, the difference between the predicted values and the actual values.
• Variance: The error that occurs when the machine learning model performs well on the training dataset but does not perform well on the test dataset.
OVERFITTING AND UNDERFITTING
As we can see from the above graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not. The goal of a regression model is to find the best-fit line; here we have not obtained one, so the model will generate prediction errors.
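As an illustration, the sketch below fits polynomials of two different degrees to made-up sine-plus-noise data; the degrees and noise level are arbitrary choices. The high-degree fit covers the training points almost perfectly yet does worse on fresh data, which is the overfitting pattern described above.

```python
# Overfitting sketch: compare a simple and a very flexible polynomial fit.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # signal + noise

for degree in (1, 15):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    # Fresh points from the same process act as a crude test set.
    x_test = rng.uniform(0, 1, 100)
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=x_test.shape)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```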
HOW TO AVOID UNDERFITTING IN THE MODEL
As we can see from the above diagram, the model is unable to capture the data points present in the plot.
How to avoid underfitting (see the sketch below):
• By increasing the training time of the model.
• By increasing the number of features.
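As a sketch of the second remedy, the example below uses synthetic quadratic data (an illustrative assumption) and shows a linear model underfitting until an extra squared feature is added.

```python
# Underfitting sketch: one linear feature cannot represent a curve;
# adding x^2 as a feature lets the same linear model fit well.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=100)  # quadratic signal + noise

# Underfit: a straight line through a parabola scores poorly.
lin = LinearRegression().fit(X, y)
print("linear R^2   :", round(lin.score(X, y), 3))

# Add x^2 as an extra feature; the fit improves dramatically.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
quad = LinearRegression().fit(X_poly, y)
print("quadratic R^2:", round(quad.score(X_poly, y), 3))
```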
CORRECTNESS
• Data scientists know that when they build training sets, they need to
watch out for data leakage in order to ensure that a model is only trained
on the correct data.
• Data leakage occurs when models are trained on information that would not actually have been available at prediction time in the real world.
• In time-series models, data leakage typically is caused by adding features
to your training set that occurred after a given prediction would have
occurred.
• When feature generation, predictions, and label generation occur at
different points in time, data leakage can easily be introduced into your
training sets.
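A minimal sketch of building a leakage-safe training set for a time series follows; the column names, dates, and values are hypothetical. The key ideas are to derive features only from data that existed before the prediction moment (here via a lag) and to split train and test chronologically.

```python
# Leakage-safe time-series setup: lag features look only backwards,
# and the train/test split follows the timeline.
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=10, freq="D"),
    "value": [3, 5, 4, 6, 7, 6, 8, 9, 8, 10],
})

# Feature from the past only: yesterday's value (shift never peeks ahead).
df["value_lag_1"] = df["value"].shift(1)

# Chronological split: training rows all precede the test period.
cutoff = pd.Timestamp("2023-01-08")
train = df[df["timestamp"] < cutoff].dropna()
test = df[df["timestamp"] >= cutoff]

print(train[["timestamp", "value_lag_1", "value"]])
```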
THE BIAS-VARIANCE TRADE-OFF
• Low Bias: A low bias model will make fewer assumptions about the form
of the target function.
• High Bias: A model with a high bias makes more assumptions, and the
model becomes unable to capture the important features of our
dataset. A high bias model also cannot perform well on new data.
• Generally, a linear algorithm has high bias, which makes it learn fast. The simpler the algorithm, the more bias it is likely to introduce, whereas a nonlinear algorithm often has low bias.
THE BIAS-VARIANCE TRADE-OFF
• Bias is one type of error that occurs due to wrong assumptions
about data such as assuming data is linear when in reality, data
follows a complex function.
• On the other hand, variance gets introduced with high
sensitivity to variations in training data.
• This also is one type of error since we want to make our model
robust against noise.
THE BIAS-VARIANCE TRADE-OFF
• Before coming to the mathematical definitions, we need to know about
random variables and functions.
• Let's say $f(x)$ is the function which our given data follows. We will build a few models, each denoted $\hat{f}(x)$.
• Each point on this function is a random variable having a number of values equal to the number of models.
• To correctly approximate the true function $f(x)$, we take the expected value of $\hat{f}(x)$: $E[\hat{f}(x)]$.
THE BIAS-VARIANCE TRADE-OFF
• Bias: $E[\hat{f}(x)] - f(x)$
• Variance: $E[\hat{f}(x)^2] - E[\hat{f}(x)]^2 = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$
• Let's see what importance both of these terms hold, with a small simulation below.
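The simulation below estimates these quantities directly: it trains many models $\hat{f}$ (each a straight line, an illustrative choice) on fresh noisy samples from an invented data-generating process, then measures bias and variance at a single query point.

```python
# Bias/variance estimated by simulation, matching the formulas above.
import numpy as np

rng = np.random.default_rng(2)

def f(x):                  # the true function the data follows
    return np.sin(x)

x0 = 1.0                   # query point where bias and variance are measured
preds = []
for _ in range(500):       # 500 models, each trained on a fresh noisy sample
    x = rng.uniform(0, 3, 30)
    y = f(x) + rng.normal(0, 0.3, 30)
    coeffs = np.polyfit(x, y, 1)        # each f_hat is a fitted line
    preds.append(np.polyval(coeffs, x0))

preds = np.array(preds)
print("bias    :", preds.mean() - f(x0))  # E[f_hat(x0)] - f(x0)
print("variance:", preds.var())           # E[(f_hat - E[f_hat])^2]
```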
THE BIAS-VARIANCE TRADE-OFF
• In machine learning, the performance and complexity of the model depend not only on certain parameters, assumptions, and conditions,
• but also on the quality of the data used to train the model. That is why cleaning and standardizing the data is a step everyone goes through.
• If the data is not cleaned and standardized, then no matter how finely tuned the model parameters and hyper-parameters are, the model will not be able to provide the best solution.
SKEWNESS IN DATA
• In simple words, skewness is the measure of how much the probability
distribution of a random variable deviates from the normal distribution
(probability distribution without any skewness).
SKEWNESS IN DATA
• If our data is positively skewed, it means that it has a higher number of
data points having low values.
• So, when we train our model on this data, it will perform better at
predicting data points with lower values as compared to those with higher
values.
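A quick sketch of measuring skewness with scipy, and of reducing positive skew with a log transform, follows; the exponential sample is an illustrative stand-in for "many low values, few high values" data.

```python
# Measure skewness before and after a log1p transform.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
data = rng.exponential(scale=2.0, size=1000)  # positively skewed sample

print("skewness before:", round(skew(data), 3))            # well above 0
print("skewness after :", round(skew(np.log1p(data)), 3))  # near 0
```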
• Bias-vs-Variance Trade-Off
• It is one of the important concepts to understand for supervised machine learning and predictive modeling use cases. The main goal is to choose the model that offers the best bias-variance trade-off for that dataset or business use case.
Feature Extraction and Selection
FEATURE EXTRACTION
• Feature extraction is quite a complex concept concerning the translation of raw data into the inputs that a particular machine learning algorithm requires.
• Features must represent the information of the data in a format that will
best fit the needs of the algorithm that is going to be used to solve the
problem.
• Some of the most popular methods of feature extraction are:
• Bag-of-Words
• TF-IDF
FEATURE EXTRACTION
• Bag of Words: Bag-of-Words is one of the most fundamental methods to transform tokens into a set of features; a short sketch of it and of TF-IDF follows below.
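A minimal sketch of both methods with scikit-learn follows; the two-document toy corpus is invented for illustration.

```python
# Bag-of-Words vs. TF-IDF on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Bag-of-Words: each column is a token, each cell a raw count.
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())
print(bow.get_feature_names_out())

# TF-IDF: counts reweighted so tokens shared by every document matter less.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))
```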
FEATURE SELECTION
• We collect a huge amount of data to train our model and help it learn better.
• Generally, the dataset consists of noisy data, irrelevant data, and some portion of useful data.
• Moreover, the huge amount of data also slows down the training process of the model, and with noise and irrelevant data the model may not predict and perform well.
• Below are some benefits of using feature selection in machine learning (a short selection sketch follows the list):
• It helps in avoiding the curse of dimensionality.
• It helps in the simplification of the model so that it can be easily interpreted by researchers.
• It reduces the training time.
• It reduces overfitting and hence enhances generalization.
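As one illustration, the sketch below applies a filter-style method, SelectKBest with the ANOVA F-score; the iris dataset and k=2 are assumptions made for the demo.

```python
# Filter-style feature selection: keep the k features that score
# highest against the class label.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)  # keep the 2 best features
X_small = selector.fit_transform(X, y)

print("original shape:", X.shape)                        # (150, 4)
print("reduced shape :", X_small.shape)                   # (150, 2)
print("kept columns  :", selector.get_support(indices=True))
```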
DECISION TREE
• In a Decision tree, there are two nodes, which are the Decision Node and
Leaf Node. Decision nodes are used to make any decision and have
multiple branches, whereas Leaf nodes are the output of those decisions
and do not contain any further branches.
• The decisions or the test are performed on the basis of features of the
given dataset.
• It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like
structure.
• In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
• The diagram below explains the general structure of a decision tree:
• Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated
further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-
nodes according to the given conditions.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: A node that splits into sub-nodes is called a parent node, and the sub-nodes are called child nodes.
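A minimal sketch of training and inspecting such a tree with scikit-learn's CART-based DecisionTreeClassifier follows; the iris dataset and the depth limit are illustrative choices.

```python
# Train a shallow decision tree and print its root, decision, and leaf nodes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# export_text renders the splits and leaves as readable rules.
print(export_text(tree, feature_names=data.feature_names))
```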
LINEAR REGRESSION
• Machine Learning is a branch of Artificial intelligence that focuses on
the development of algorithms and statistical models that can learn
from and make predictions on data.
• Linear regression is one of the simplest such models: it fits a straight-line relationship between a dependent variable and one or more explanatory variables. The simple linear regression model takes the form

Y = a + bX + u

where:
Y = the dependent variable
X = the explanatory (independent) variable(s)
a = the y-intercept
b = the slope (beta coefficient) of the explanatory variable(s)
u = the regression residual or error term
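As a sketch, a and b can be estimated by ordinary least squares; the synthetic data below (with true a = 2 and b = 3) is purely illustrative.

```python
# Estimate the intercept a and slope b of Y = a + bX + u by least squares.
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, 50)
u = rng.normal(0, 1, 50)      # residual / error term
Y = 2 + 3 * X + u             # true intercept a=2, slope b=3

b_hat, a_hat = np.polyfit(X, Y, 1)  # degree-1 fit returns [slope, intercept]
print(f"estimated a={a_hat:.2f}, b={b_hat:.2f} (true a=2, b=3)")
```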
Applications of linear regression
• Market analysis.
• Financial analysis.
• Sports analysis.
• Environmental health.
• Medicine.
• Least squares.
• Predicting outcomes.
Naive Bayes
• The Naïve Bayes classifier is a supervised machine learning algorithm, which
is used for classification tasks, like text classification.
• Mental state prediction: using MRI data, naïve Bayes has been leveraged to predict different cognitive states among humans. The goal of this research was to assist in better understanding hidden cognitive states, particularly among brain-injury patients.
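A minimal text-classification sketch with scikit-learn's MultinomialNB follows; the tiny spam/ham corpus and its labels are invented for illustration.

```python
# Naive Bayes text classification: Bag-of-Words counts fed to MultinomialNB.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "cheap prize win", "meeting at noon", "lunch tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vec.transform(["win a cheap prize", "see you at lunch"])))
```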