ML Unit-1 (CEC)
Machine learning is a data-driven technology. It is broadly divided into three types:
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised learning is a type of machine learning method in
which we provide labeled sample data to the machine learning
system in order to train it; on that basis, it predicts the
output for new inputs.
The system builds a model from the labeled data to understand
the dataset and learn from each example. Once training and
processing are done, we test the model on unseen sample data
to check whether it predicts the correct output.
The goal of supervised learning is to map input data to
output data. Supervised learning is based on supervision,
much like a student learning things under the supervision
of a teacher. A typical example of supervised learning is
spam filtering.
Supervised learning algorithms can be grouped into two
categories:
Classification
Regression
How does supervised machine learning work?
Supervised learning algorithms are good for the
following tasks:
Binary classification: Dividing data into two
categories.
Multi-class classification: Choosing among more than two
categories.
Regression Modeling: Predicting continuous values.
Ensembling: Combining the predictions of multiple
machine learning models to produce a more accurate
prediction.
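The train-then-predict workflow described above can be sketched with a toy 1-nearest-neighbour classifier. The data and names here are illustrative only, not from any particular library:

```python
def predict(train, x):
    # train: list of (feature, label) pairs -- the labeled sample data
    # 1-NN: return the label of the training example nearest to x
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# labeled training data, e.g. a message feature score vs. a spam/ham label
train = [(1.0, "spam"), (1.2, "spam"), (5.0, "ham"), (5.5, "ham")]

print(predict(train, 1.1))  # nearest neighbour is (1.0, "spam")
```

Once trained (here, simply storing the labeled pairs), the model is tested on new sample data it has not seen before, exactly as the notes describe.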
Unsupervised learning is a learning method in which a machine
learns patterns from unlabeled data, without any supervision.
Raw data often contains issues such as:
Missing values
Duplicate data
Invalid data
Noise
So, we use various filtering techniques to clean the data.
It is essential to detect and remove the above issues
because they can negatively affect the quality of the
outcome.
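As a rough sketch of this filtering step (assuming rows are represented as dicts, with None marking a missing value; only the missing-value and duplicate cases are handled here for brevity):

```python
def clean(rows):
    # drop rows with missing values, then drop exact duplicates
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing value: discard the row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate: keep only the first occurrence
        seen.add(key)
        cleaned.append(row)
    return cleaned

rows = [
    {"age": 25, "income": 50000},
    {"age": 25, "income": 50000},   # duplicate
    {"age": 31, "income": None},    # missing value
    {"age": 40, "income": 72000},
]
print(clean(rows))  # two unique, complete rows remain
```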
Now the cleaned and prepared data is passed on to
the analysis step.
Weights: Weights are real values attached to each
input/feature; they convey the importance of that feature
in predicting the final output.
Bias: Bias shifts the activation function to the left or right;
you can compare it to the y-intercept in the equation of a line.
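As a minimal sketch, a single artificial neuron combines weights and bias as a weighted sum; the numbers below are illustrative only:

```python
def neuron(inputs, weights, bias):
    # weighted sum: each weight scales the importance of its input
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # step activation: the bias shifts the threshold at which the
    # neuron fires, just as a y-intercept shifts a line up or down
    return 1 if z >= 0 else 0

# pre-activation: 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1 >= 0, so it fires
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))
```

Changing only the bias (e.g. from 0.1 to -0.2) moves the same weighted sum below the threshold, which is exactly the shifting role described above.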
Healthcare
The ROC curve has various applications in the healthcare sector. For
example, it can be used to help detect cancer in patients. It does this
by comparing true positive and false positive rates, and accuracy
depends on the threshold value chosen on the curve.
Binary Classification
The AUC-ROC curve is mainly used to evaluate the performance of
binary classification models.
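AUC can also be computed directly as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A minimal pure-Python sketch (O(n·m) over all positive/negative pairs, fine for small data; the data is illustrative):

```python
def auc(labels, scores):
    # labels: 1 for positive, 0 for negative; scores: model outputs
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # count positive/negative pairs where the positive is ranked
    # higher, counting ties as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 1.0 means the classifier ranks every positive above every negative; 0.5 is no better than random ranking.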
While making predictions, a difference occurs
between the values predicted by the model and the
actual/expected values; this difference is known as
bias error, or error due to bias. It can be defined as
the inability of a machine learning algorithm such as
Linear Regression to capture the true relationship
between the data points. Every algorithm begins with
some amount of bias, because bias arises from
assumptions in the model that make the target
function simpler to learn. A model has either:
Low bias: A low-bias model makes fewer
assumptions about the form of the target function.
High bias: A high-bias model makes more
assumptions and becomes unable to capture the
important features of the dataset. A high-bias model
also cannot perform well on new data.
Some examples of machine learning algorithms with
low bias are Decision Trees, k-Nearest Neighbours,
and Support Vector Machines. Algorithms with high
bias include Linear Regression, Linear Discriminant
Analysis, and Logistic Regression.
Variance specifies how much the prediction would vary
if different training data were used. In simple words,
variance tells how much a random variable differs from
its expected value. Ideally, a model should not vary too
much from one training dataset to another, which means
the algorithm should be good at understanding the hidden
mapping between input and output variables. Variance
errors are either low variance or high variance.
Low variance means there is a small variation in the
prediction of the target function with changes in the
training data set.
High variance shows a large variation in the
prediction of the target function with changes in the
training dataset.
Low-Bias, Low-Variance:
The combination of low bias and low variance indicates
an ideal machine learning model.
Low-Bias, High-Variance:
Predictions are accurate on average but inconsistent;
this is typical of overfitting.
High-Bias, Low-Variance:
Predictions are consistent but inaccurate on average;
this is typical of underfitting.
High-Bias, High-Variance:
With high bias and high variance, predictions are
both inconsistent and inaccurate on average.
While building a machine learning model, it is
really important to take care of bias and variance in
order to avoid overfitting and underfitting. If the
model is very simple, with few parameters, it may
have low variance but high bias. Whereas if the
model has a large number of parameters, it will have
high variance and low bias. So we need to strike a
balance between bias and variance errors, and this
balance between the bias error and the variance error
is known as the Bias-Variance trade-off.
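The trade-off can be demonstrated numerically. The sketch below (all names and data are illustrative) repeatedly resamples a noisy training set drawn from a quadratic target and compares two models at one test point: predicting the training mean (a very simple, high-bias model) versus a 1-nearest-neighbour prediction (a flexible, low-bias but high-variance model):

```python
import random
import statistics

random.seed(0)

def f(x):
    return x ** 2  # the true (unknown) target function

def simulate(trials=2000, n=20, noise_sd=1.0, x0=2.0):
    preds_simple, preds_flexible = [], []
    for _ in range(trials):
        # draw a fresh noisy training set for each trial
        xs = [random.uniform(0.0, 2.0) for _ in range(n)]
        ys = [f(x) + random.gauss(0.0, noise_sd) for x in xs]
        # simple model: always predicts the training mean (high bias)
        preds_simple.append(sum(ys) / n)
        # flexible model: 1-NN prediction at x0 (low bias, high variance)
        i = min(range(n), key=lambda j: abs(xs[j] - x0))
        preds_flexible.append(ys[i])
    return preds_simple, preds_flexible

preds_simple, preds_flexible = simulate()
bias_simple = abs(statistics.mean(preds_simple) - f(2.0))
bias_flexible = abs(statistics.mean(preds_flexible) - f(2.0))
var_simple = statistics.pvariance(preds_simple)
var_flexible = statistics.pvariance(preds_flexible)
print(f"simple:   bias={bias_simple:.2f}  variance={var_simple:.2f}")
print(f"flexible: bias={bias_flexible:.2f}  variance={var_flexible:.2f}")
```

The simple model ends up with the larger bias and the smaller variance, and the flexible model the reverse, matching the trade-off described above.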