Unit I
Machine Learning
Why do we need machine learning? Simply put, machine learning lets the user feed a
computer algorithm a large amount of data and have the computer analyze it and
make data-driven recommendations and decisions based only on that input data.
As per McKinsey & Co., machine learning is based on algorithms that can learn from
data without relying on rules-based programming.
Tom Mitchell’s book on machine learning says “A computer program is said to learn
from experience E with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with experience E.”
(or)
A machine learning system learns from historical data, builds prediction models,
and predicts the output whenever it receives new data. The accuracy of the
predicted output depends in part on the amount of data: more representative data
generally helps to build a better model, which predicts the output more accurately.
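As a rough illustration of Mitchell's definition and of how more data can help, the sketch below uses a made-up task T (deciding whether an integer score from 0 to 100 is a pass, where the true rule is "50 or above"), a performance measure P (accuracy on a fixed test set), and experience E (the labeled examples seen so far). The data, the function names, and the midpoint rule are all assumptions chosen for illustration, not part of the original notes.

```python
def learn_threshold(examples):
    """Place the threshold midway between the largest 'fail' and the smallest 'pass' seen."""
    largest_fail = max(x for x, label in examples if label == 0)
    smallest_pass = min(x for x, label in examples if label == 1)
    return (largest_fail + smallest_pass) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test examples classified correctly."""
    correct = sum(1 for x, label in test_set if int(x >= threshold) == label)
    return correct / len(test_set)


# Experience E: a hypothetical stream of labeled examples (score, label).
experience = [(10, 0), (60, 1), (30, 0), (55, 1), (45, 0), (52, 1)]

# Fixed test set for task T: the true rule is "pass if score >= 50".
test_set = [(x, int(x >= 50)) for x in range(101)]

scores = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_set))
    print(f"examples seen: {n}, threshold: {threshold}, accuracy: {scores[-1]:.3f}")
```

With this particular data the accuracy rises as more examples are seen. Real datasets are noisier, so the improvement is usually less regular.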
Machine learning accesses vast amounts of data (both structured and unstructured)
and learns from it to predict the future. It learns from the data by using multiple
algorithms and techniques. Below is a diagram that shows how a machine learns from
data.
Now that you have been introduced to the basics of machine learning and how it
works, let’s see the different types of machine learning methods.
1. Supervised Learning
The system builds a model from labeled data, learning how each input relates to
its label. Once training is done, we test the model on sample data to check
whether it predicts the correct output.
The goal of supervised learning is to map input data to output data.
Supervised learning is based on supervision, in the same way that a student
learns under the supervision of a teacher. An example of supervised learning is
spam filtering.
Commonly used supervised learning algorithms include:
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
Note: These algorithms are discussed in detail in later chapters.
Advantages of supervised learning:
o With the help of supervised learning, the model can predict the output on
the basis of prior experience.
o In supervised learning, we have an exact idea of the classes of objects.
o Supervised learning models help us to solve various real-world problems such
as fraud detection and spam filtering.
Disadvantages of supervised learning:
o Supervised learning models are not well suited to some very complex tasks.
o A supervised model may fail to predict the correct output if the test data
differs substantially from the training data.
o Training can require a lot of computation time.
o In supervised learning, we need sufficient knowledge about the classes of
objects.
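The train-then-test workflow of supervised learning can be sketched with the simplest algorithm in the list above, linear regression. This is a minimal illustration in plain Python, assuming a tiny made-up dataset that follows y = 2x + 1. The function names `fit_line` and `predict` are hypothetical and not taken from any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Labeled training data: inputs together with their known outputs.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# Held-out sample data used only to check the trained model.
test_x = [5.0, 6.0]
test_y = [11.0, 13.0]

model = fit_line(train_x, train_y)
mse = sum((predict(model, x) - y) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)
print("slope, intercept:", model)
print("mean squared error on test data:", mse)
```

Because the made-up data is noise-free, the fitted line recovers the underlying rule and the test error is essentially zero. With real data the test error shows how well the learned mapping generalizes.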
2. Unsupervised Learning
Unsupervised learning algorithms use unlabeled data to discover patterns in the
data on their own. These systems can identify hidden features in the input data
provided. Once the data is organized into a more readable form (for example, grouped
into clusters), the patterns and similarities become more evident.
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
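To make the idea of finding patterns in unlabeled data concrete, here is a minimal sketch of K-means clustering, the first algorithm in the list above, written in plain Python. The points, the choice of two clusters, and the fixed starting centroids are assumptions made for illustration. Practical implementations usually choose the starting centroids more carefully.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return centroids, clusters


# Unlabeled data: no output column, only input points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print("centroids:", centroids)
print("clusters:", clusters)
```

No labels were supplied, yet the algorithm separates the points near (1, 1) from those near (8, 8).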
3. Reinforcement Learning
Product recommendations
Deep learning is a subset of machine learning that deals with algorithms inspired by
the structure and function of the human brain. Deep learning algorithms can work
with an enormous amount of both structured and unstructured data. Deep learning’s
core concept lies in artificial neural networks, which enable machines to make
decisions.
The major difference between deep learning and machine learning is the way data is
presented to the machine. Machine learning algorithms usually require structured data,
whereas deep learning networks rely on multiple layers of artificial neural networks.
The network has an input layer that accepts inputs from the data. The hidden layers
extract hidden features from the data. The output layer then provides the
predicted output.
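The flow from input layer to hidden layer to output layer can be sketched as a forward pass through a very small network. The weights, biases, layer sizes, and the use of a sigmoid activation below are arbitrary values assumed for illustration. In a trained network the weights and biases would be learned from data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each neuron computes a weighted sum of the
    inputs, adds its bias, and passes the result through the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Input layer: two features taken from the data.
inputs = [0.5, -1.0]

# Hidden layer with two neurons (one weight per input, one bias per neuron).
hidden_weights = [[0.1, 0.4], [-0.3, 0.2]]
hidden_biases = [0.0, 0.1]

# Output layer with one neuron that reads the two hidden activations.
output_weights = [[0.7, -0.5]]
output_biases = [0.05]

hidden = layer_forward(inputs, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print("hidden activations:", hidden)
print("network output:", output)
```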
Here is an example of a neural network that uses large sets of unlabeled data of eye
retinas. The network model is trained on this data to find out whether or not a person
has diabetic retinopathy.
Now that we have an idea of what deep learning is, let’s see how it works.
3. The activation function takes the weighted sum of the inputs plus a bias as its
input and decides whether the neuron should fire or not.
5. The model output is compared with the actual output. During training, the
network uses the backpropagation method to adjust its weights and improve its
performance. The cost function measures the error that training tries to reduce.
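The steps above (weighted sum plus bias, activation, comparison with the actual output, and weight updates) can be sketched for a single neuron. This is a minimal illustration, assuming a sigmoid activation, a mean squared error cost, plain gradient descent, and a made-up OR-gate dataset. In a multi-layer network, backpropagation applies the same chain rule layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical training data: the OR gate, as (inputs, actual output) pairs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]


def neuron_output(weights, bias, xs):
    # Weighted sum of the inputs plus bias, passed through the activation.
    return sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias)


def cost(weights, bias):
    """Mean squared error between the model output and the actual output."""
    return sum((neuron_output(weights, bias, xs) - y) ** 2 for xs, y in data) / len(data)


weights, bias, learning_rate = [0.0, 0.0], 0.0, 0.5
initial_cost = cost(weights, bias)

for _ in range(2000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for xs, y in data:
        out = neuron_output(weights, bias, xs)
        # Chain rule: d(cost)/d(weighted sum) for this example.
        delta = 2 * (out - y) * out * (1 - out) / len(data)
        grad_w = [g + delta * x for g, x in zip(grad_w, xs)]
        grad_b += delta
    # Gradient descent step: move against the gradient to reduce the cost.
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b

final_cost = cost(weights, bias)
print("cost before training:", initial_cost)
print("cost after training:", final_cost)
```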
In the following example, deep learning and neural networks are used to identify the
number on a license plate. This technique is used in many countries to identify traffic
rule violators and speeding vehicles.
Convolutional Neural Network (CNN) - CNN is a class of deep neural networks most
commonly used for image analysis.
Generative Adversarial Network (GAN) - GANs are algorithmic architectures that use
two neural networks to create new, synthetic instances of data that pass for real data.
A GAN trained on photographs can generate new photographs that look at least
superficially authentic to human observers.
Deep Belief Network (DBN) - A DBN is a generative graphical model composed
of multiple layers of latent variables called hidden units. There are connections
between the layers, but not between the units within each layer.
Music generation
Image coloring
Object detection
Supervised learning compared with unsupervised learning:
o Supervised learning algorithms are trained using labeled data. Unsupervised
learning algorithms are trained using unlabeled data.
o A supervised learning model takes direct feedback to check whether it is
predicting the correct output. An unsupervised learning model does not take any
feedback.
o A supervised learning model predicts the output. An unsupervised learning model
finds the hidden patterns in data.
o In supervised learning, input data is provided to the model along with the
output. In unsupervised learning, only input data is provided to the model.
o The goal of supervised learning is to train the model so that it can predict the
output when it is given new data. The goal of unsupervised learning is to find the
hidden patterns and useful insights in the unknown dataset.
o Supervised learning needs supervision to train the model. Unsupervised learning
does not need any supervision to train the model.
o Supervised learning can be used for those cases where we know the input as well
as the corresponding outputs. Unsupervised learning can be used for those cases
where we have only input data and no corresponding output data.
o A supervised learning model produces an accurate result. An unsupervised
learning model may give a less accurate result compared to supervised learning.
o Supervised learning is not close to true Artificial Intelligence because we first
train the model for each data, and only then can it predict the correct output.
Unsupervised learning is closer to true Artificial Intelligence because it learns in
the way a child learns daily routine things from experience.
o Supervised learning includes various algorithms such as Linear Regression,
Logistic Regression, Support Vector Machine, Multi-class Classification, Decision
tree, Bayesian Logic, etc. Unsupervised learning includes various algorithms such
as Clustering, KNN, and the Apriori algorithm.
Machine learning professionals face many challenges in building ML skills and
creating an application from scratch. We will discuss seven major
challenges faced by machine learning professionals.
-> Data plays a significant role in the machine learning process. One of the
significant issues that machine learning professionals face is the absence of good
quality data.
-> Unclean and noisy data can make the whole process extremely exhausting. We
don’t want our algorithm to make inaccurate or faulty predictions.
-> Hence the quality of data is essential to improve the output. Therefore, we need
to ensure that data preprocessing, which includes removing outliers,
filtering missing values, and removing unwanted features, is done with great
care.
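The three preprocessing steps named above can be sketched in plain Python. The records, the column names, and the outlier rule (dropping values more than `k` median absolute deviations from the median) are assumptions for illustration. The right rule depends on the dataset, and libraries such as pandas are normally used for this kind of work.

```python
from statistics import median

# Hypothetical raw records with a missing value, an outlier, and an unwanted column.
rows = [
    {"customer_name": "A", "age": 25, "income": 40000},
    {"customer_name": "B", "age": 30, "income": 52000},
    {"customer_name": "C", "age": None, "income": 48000},  # missing value
    {"customer_name": "D", "age": 28, "income": 45000},
    {"customer_name": "E", "age": 27, "income": 47000},
    {"customer_name": "F", "age": 400, "income": 50000},   # implausible outlier
]


def drop_missing(rows):
    """Filter out rows in which any value is missing."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_features(rows, unwanted):
    """Remove columns that should not be fed to the model."""
    return [{k: v for k, v in r.items() if k not in unwanted} for r in rows]


def drop_outliers(rows, column, k=5.0):
    """Keep rows whose value lies within k median absolute deviations of the median."""
    values = [r[column] for r in rows]
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return [r for r in rows if abs(r[column] - center) <= k * mad]


cleaned = drop_missing(rows)
cleaned = drop_features(cleaned, unwanted={"customer_name"})
cleaned = drop_outliers(cleaned, column="age")
print(cleaned)
```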
-> Overfitting occurs when a machine learning model fits its training data too
closely, including its noise and bias, which negatively affects its performance on
new data.
-> It is like trying to fit into oversized jeans. Unfortunately, this is one of the
significant issues faced by machine learning professionals.
-> It often arises when the algorithm is trained with noisy and biased data, which
affects its overall performance.
-> Let’s understand this with the help of an example. Consider a model trained
to differentiate between a cat, a rabbit, a dog, and a tiger. The training data
contains 1000 cats, 1000 dogs, 1000 tigers, and 4000 rabbits. There is then a
considerable probability that the model will identify a cat as a rabbit. In this
example, we had a vast amount of data, but it was biased toward rabbits; hence the
prediction was negatively affected.
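The imbalance in this example can be quantified with a few lines of Python. The sketch computes the share of each class in the training data and, as one possible adjustment, per-class weights that are inversely proportional to class frequency. The weighting formula (total / (number of classes * class count)) is a common convention assumed here, not something prescribed by these notes.

```python
# Class counts from the example above.
counts = {"cat": 1000, "dog": 1000, "tiger": 1000, "rabbit": 4000}

total = sum(counts.values())

# Share of the training data taken up by each class.
priors = {label: n / total for label, n in counts.items()}

# A model that leans on class frequency alone would favor the majority class.
majority_class = max(counts, key=counts.get)

# Weights inversely proportional to class frequency, so that rare classes
# count for more during training.
class_weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print("priors:", priors)
print("majority class:", majority_class)
print("class weights:", class_weights)
```

Here rabbits make up 4000 of the 7000 examples, so each rabbit example receives a weight of 0.4375 while each cat, dog, or tiger example receives 1.75.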
We can tackle this issue by:
-> The machine learning industry is young and is continuously changing. Rapid
trial-and-error experiments are being carried out.
-> The process is still evolving, and hence there is a high chance of error, which
makes learning complex.
-> It includes analyzing the data, removing data bias, training models, applying
complex mathematical calculations, and a lot more. Hence it is a really complicated
process, which is another big challenge for machine learning professionals.
-> The most important task in the machine learning process is to train the model
on enough data to achieve an accurate output. Too little training data will
produce inaccurate or overly biased predictions.
-> Let us understand this with the help of an example. Think of training a machine
learning algorithm as similar to teaching a child.
-> One day you decide to explain to a child how to distinguish between an apple
and a watermelon. You take an apple and a watermelon and show him the
difference between them based on their color, shape, and taste.
-> In this way, he will soon become very good at differentiating between the two.
A machine learning algorithm, on the other hand, needs a lot of data to
distinguish between them.
-> For complex problems, it may even require millions of data points to be trained.
Therefore we need to ensure that machine learning algorithms are trained with
sufficient amounts of data.
6. Slow Implementation
-> This is one of the common issues faced by machine learning professionals.
Machine learning models can provide highly accurate results, but producing them
can take a tremendous amount of time.
-> Slow programs, data overload, and excessive requirements usually take a lot of
time to provide accurate results.
-> Further, it requires constant monitoring and maintenance to deliver the best
output.
-> So you have found quality data, trained a model on it well, and the predictions
are really precise and accurate.
-> You have learned how to create a machine learning algorithm! But wait,
there is a twist: the model may become useless in the future as data grows.
-> The best model of the present may become inaccurate in the future and
require further adjustment.
-> So you need regular monitoring and maintenance to keep the algorithm working.
This is one of the most exhausting issues faced by machine learning professionals.
-> Since any machine learning model must generalize well to new data, regular
monitoring and maintenance become compulsory.
-> Different actions produce different results and require changes to the data;
hence editing the code and allocating resources for monitoring also become
necessary.
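One simple form of the monitoring described above is to track the model's accuracy over its most recent predictions and flag it for retraining when that accuracy drops. The class below is a hypothetical sketch. The name `AccuracyMonitor`, the window size, and the threshold are assumptions, and real systems usually track more signals than accuracy alone.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks whether recent predictions were correct over a sliding window."""

    def __init__(self, window_size=5, threshold=0.8):
        self.outcomes = deque(maxlen=window_size)  # oldest outcomes drop out automatically
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only judge the model once the window is full.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window_size=4, threshold=0.75)

# Four correct predictions: the model still looks healthy.
for predicted, actual in [("spam", "spam"), ("ham", "ham"), ("spam", "spam"), ("ham", "ham")]:
    monitor.record(predicted, actual)
healthy = monitor.needs_retraining()

# Two wrong predictions arrive: the window now holds two correct and two wrong.
for predicted, actual in [("ham", "spam"), ("ham", "spam")]:
    monitor.record(predicted, actual)
degraded = monitor.needs_retraining()

print("flag after correct predictions:", healthy)
print("flag after errors appear:", degraded)
```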
-> Although machine learning and artificial intelligence are continuously growing in
the market, these industries are still young in comparison to others.
-> The shortage of skilled people is also an issue. Hence, we need people with
in-depth knowledge of mathematics, science, and technology to develop and manage
machine learning systems.
-> It is also a challenge to identify which customers paid for the recommendations
shown by the model and which do not even check them.
-> Hence, an algorithm is necessary to recognize customer behavior and trigger a
relevant recommendation for the user based on past experience.
-> Data bias is also a big challenge in machine learning. These errors arise
when certain elements of the dataset are weighted more heavily or given more
importance than others.
-> Biased data leads to inaccurate results, skewed outcomes, and other analytical
errors. However, we can resolve this error by determining where the data is actually
biased in the dataset and then taking the necessary steps to reduce it.
-> A lack of explainability is also found in machine learning algorithms, which
reduces the credibility of the algorithms.