UNIT-1
Machine learning
Machine learning is a growing technology that enables computers to learn automatically from past
data. Machine learning uses various algorithms to build mathematical models and make
predictions using historical data or information. Currently, it is used for various tasks such
as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender
systems, and many more.
Machine Learning is a subset of artificial intelligence that is mainly concerned with
the development of algorithms that allow a computer to learn from data and past
experience on its own. The term machine learning was first introduced by Arthur
Samuel in 1959. We can define it in a summarized way as:
Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.
With the help of sample historical data, which is known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions
without being explicitly programmed. Machine learning brings computer science and
statistics together for creating predictive models. Machine learning constructs or uses the
algorithms that learn from historical data. The more data we provide, the better the
performance.
A machine has the ability to learn if it can improve its performance by gaining more data.
Suppose we have a complex problem in which we need to make some predictions. Instead
of writing code for it, we just need to feed the data to generic algorithms; with the help
of these algorithms, the machine builds the logic from the data and predicts the output. Machine
learning has changed the way we think about such problems. The block diagram below
explains the working of a machine learning algorithm:
Features of Machine Learning:
o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
Machine learning is much similar to data mining, as both deal with huge amounts of data.
Classification of Machine Learning
Machine learning can be broadly classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts
the output.
The system creates a model using labeled data to understand the dataset and learn about each
example. Once training and processing are done, we test the model by providing sample
data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised
learning is based on supervision, much as a student learns things under the
supervision of a teacher. An example of supervised learning is spam filtering.
Supervised learning can be further divided into two categories of algorithms:
o Classification
o Regression
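The following is a minimal, hedged sketch of supervised learning (a classification task) using the scikit-learn library, assuming it is installed. The tiny "spam" dataset and its two features are invented purely for illustration.

```python
# A minimal supervised-learning sketch (classification) with scikit-learn.
# The tiny "spam" dataset below is invented purely for illustration.
from sklearn.linear_model import LogisticRegression

# Labeled training data: [number_of_links, number_of_spam_words] per email
X_train = [[0, 1], [1, 0], [8, 9], [7, 10], [1, 2], [9, 7]]
y_train = [0, 0, 1, 1, 0, 1]           # 0 = not spam, 1 = spam

model = LogisticRegression()
model.fit(X_train, y_train)            # learn a mapping from inputs to labels

# Predict the class of a new, unseen email
print(model.predict([[6, 8]]))         # expected output: [1] (spam)
```

Once trained on the labeled examples, the model predicts the output for new data points, which is exactly the input-to-output mapping described above.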
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any supervision.
The goal of unsupervised learning is to restructure the input data into new features or a group
of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find
useful insights from the huge amount of data. It can be further classified into two categories
of algorithms:
o Clustering
o Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning agent gets a
reward for each right action and a penalty for each wrong action. The agent learns
automatically from this feedback and improves its performance. In reinforcement learning,
the agent interacts with the environment and explores it. The goal of an agent is to get the
most reward points, and hence, it improves its performance.
A robotic dog that automatically learns the movements of its limbs is an example of
reinforcement learning.
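Below is a minimal sketch of one common reinforcement learning algorithm, tabular Q-learning, to illustrate the reward-and-penalty feedback loop described above. The corridor environment, reward values, and hyperparameters are all invented for illustration and are not from the text.

```python
# A minimal tabular Q-learning sketch on an invented 5-state corridor:
# the agent starts at state 0 and gets a reward of +1 for reaching state 4,
# with a small penalty for every other step. All numbers are illustrative.
import random

n_states, actions = 5, [-1, +1]        # move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:
        if random.random() < epsilon:                      # explore
            a = random.choice(actions)
        else:                                              # exploit best known action
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else -0.01       # reward / penalty feedback
        # Q-learning update: improve the estimate from the feedback received
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next

# Learned policy: the best action in each non-terminal state (should be +1, i.e. move right)
print({s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)})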
History of Machine Learning
o 1834: In 1834, Charles Babbage, the father of the computer, conceived a device that
could be programmed with punch cards. The machine was never built, but
all modern computers rely on its logical structure.
o 1936: In 1936, Alan Turing described how a machine can determine and
execute a set of instructions.
o 1950: In 1950, Alan Turing published a seminal paper, "Computing Machinery and
Intelligence," on the topic of artificial intelligence. In his paper, he asked, "Can
machines think?"
o 1952: Arthur Samuel, the pioneer of machine learning, created a program
that helped an IBM computer play checkers. It performed better the more it
played.
o 1959: In 1959, the term "Machine Learning" was first coined by Arthur Samuel.
o The period from 1974 to 1980 was a tough time for AI and ML researchers, and this
period was called the AI winter.
o During this period, machine translation failed and people lost interest in AI,
which led to reduced government funding for research.
o 1959: In 1959, the first neural network was applied to a real-world problem to remove
echoes over phone lines using an adaptive filter.
o 1985: In 1985, Terry Sejnowski and Charles Rosenberg invented a neural
network NETtalk, which was able to teach itself how to correctly pronounce 20,000
words in one week.
o 1997: IBM's Deep Blue intelligent computer won a chess match against chess
expert Garry Kasparov and became the first computer to beat a human chess expert.
Machine Learning in the 21st century
o 2006: In 2006, computer scientist Geoffrey Hinton gave neural-net research the new
name "deep learning," and nowadays it has become one of the most
trending technologies.
o 2012: In 2012, Google created a deep neural network that learned to recognize
images of humans and cats in YouTube videos.
o 2014: In 2014, the chatbot "Eugene Goostman" was claimed to have passed the Turing Test. It was
the first chatbot to convince 33% of the human judges that it was not a machine.
o 2014: DeepFace was a deep neural network created by Facebook, which claimed
that it could recognize a person with the same precision as a human.
o 2016: AlphaGo beat the world's second-ranked Go player, Lee Sedol. In
2017, it beat the top-ranked player, Ke Jie.
o 2017: In 2017, Alphabet's Jigsaw team built an intelligent system able to
learn about online trolling. It read millions of comments from different websites in
order to learn to stop online trolling.
Well-Posed Learning Problem
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured by P, improves with
experience E. A well-posed learning problem is therefore identified by three features:
Task
Performance Measure
Experience
Some examples that clearly define a well-posed learning problem are –
1. A checkers learning problem
Task – playing the game of checkers
Performance Measure – percent of games won against opponents
Experience – playing practice games against itself
2. A handwriting recognition problem
Task – recognizing handwritten words within images
Performance Measure – percent of words correctly classified
Experience – a database of handwritten words with given classifications
3. A robot driving problem
Task – driving on public four-lane highways using vision sensors
Performance Measure – average distance travelled before an error
Experience – a sequence of images and steering commands recorded while
observing a human driver
The steps for designing a learning system are:
Step 1 - Choosing the Training Experience: The first and most important
task is to choose the training data or training experience that will be fed to
the machine learning algorithm. It is important to note that the data or
experience we feed to the algorithm has a significant impact on the
success or failure of the model, so the training data or experience should be
chosen wisely.
Step 2 - Choosing the Target Function: The next important step is choosing the
target function. Based on the knowledge fed to the algorithm, the
machine learning system chooses a NextMove function that describes which
legal move should be taken. For example, while playing chess against
an opponent, when the opponent makes a move, the machine learning algorithm
decides which of the possible legal moves to take in order to succeed.
Step 3 - Choosing a Representation for the Target Function: Once the machine
knows all the possible legal moves, the next step is to choose a representation
for the target function, e.g., linear equations, a hierarchical graph representation,
a tabular form, etc. Using this representation, the NextMove function
selects, out of all the legal moves, the move that offers the highest
success rate. For example, while playing chess, if the machine has four possible
moves, it will choose the move that is most likely to lead to success.
Step 4 - Choosing a Function Approximation Algorithm: An optimized move
cannot be chosen from the training data alone. The training data has to be
worked through as a set of examples; from these examples the machine
approximates which moves to choose and then receives feedback on them.
For example, when training data for playing chess is fed to the algorithm, the
machine does not initially know whether a move will fail or succeed; from each
failure or success it learns which move to choose next and what its
success rate is.
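To make Steps 3 and 4 concrete, here is a small hedged sketch of a linear representation of the target function together with a Least Mean Squares (LMS) weight update, in the spirit of the classic checkers design. The board features (x1..x3), the sample training values, and the learning rate are all invented for illustration.

```python
# Sketch of a linear target-function representation and the LMS weight-update
# rule. The board features and the sample training values are illustrative only.

def v_hat(weights, features):
    """Approximate board value: V_hat(b) = w0 + w1*x1 + w2*x2 + w3*x3."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr=0.01):
    """LMS rule: w_i <- w_i + lr * (V_train(b) - V_hat(b)) * x_i."""
    error = v_train - v_hat(weights, features)
    weights[0] += lr * error                      # bias term uses x0 = 1
    for i, x in enumerate(features, start=1):
        weights[i] += lr * error * x
    return weights

# Toy training examples: (board features, training value assigned to that board)
examples = [([3, 1, 0], 10.0), ([1, 3, 2], -8.0), ([2, 2, 1], 0.0)]
weights = [0.0, 0.0, 0.0, 0.0]
for _ in range(100):                              # repeatedly adjust the weights
    for features, v_train in examples:
        weights = lms_update(weights, features, v_train)
print([round(w, 2) for w in weights])
```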
Step 5 - Final Design: The final design is created at last, after the system has gone
through a number of examples, failures and successes, correct and incorrect
decisions, and so on. Example: Deep Blue, an ML-based intelligent computer,
won a chess match against chess expert Garry Kasparov and became the first
computer to beat a human chess expert.
Artificial Neural Network (ANN)
The given figure illustrates the typical diagram of a biological neural network, and a typical
artificial neural network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural
Networks, cell nucleus represents Nodes, synapse represents Weights, and Axon
represents Output.
Relationship between biological neural network and artificial neural network:
Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer is present between the input and output layers. It performs all the calculations
to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
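Below is a hedged NumPy sketch of the computation just described: a weighted sum of the inputs plus a bias, passed through a transfer (activation) function. The sigmoid choice and the example numbers are assumptions made for illustration.

```python
# Sketch of a single artificial neuron: weighted sum of inputs plus a bias,
# passed through a transfer (activation) function. The sigmoid choice and
# the example numbers are illustrative assumptions.
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias      # weighted sum + bias
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid transfer function

x = np.array([0.5, 0.2, 0.8])               # inputs from the input layer
w = np.array([0.4, -0.6, 0.9])              # connection weights (synapses)
print(neuron(x, w, bias=0.1))               # output passed to the next layer
```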
Advantages of Artificial Neural Network
Parallel processing capability:
Artificial neural networks can perform more than one task simultaneously.
Capability to work with incomplete knowledge:
After ANN training, the network may produce output even with inadequate data. The loss
of performance here depends on the importance of the missing data.
Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to train
the network according to the desired output by showing these examples to the network.
The success of the network is directly proportional to the selected instances; if the
event cannot be shown to the network in all its aspects, the network can produce false output.
Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this
feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network
Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks;
the appropriate network structure is achieved through experience and trial and error.
Unrecognized behavior of the network:
This is the most significant issue with ANNs. When an ANN produces a solution, it does not
give any insight into why and how, which decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance with their
structure. Therefore, realizing them depends on suitable hardware.
Difficulty of presenting the problem to the network:
ANNs can work only with numerical data, so problems must be converted into numerical values
before being introduced to the ANN. The representation mechanism chosen here will
directly impact the performance of the network, and it relies on the user's abilities.
Clustering
Clustering is an unsupervised learning technique that groups the unlabelled dataset. It does this
by finding similar patterns in the unlabelled dataset, such as shape, size, color, behavior,
etc., and divides the data according to the presence or absence of those patterns.
After applying this clustering technique, each cluster or group is given a cluster-ID, and
the ML system can use this ID to simplify the processing of large and complex datasets.
The clustering technique can be widely used in various tasks. Some of the most common uses of
this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, it is used by Amazon in its recommendation system to
provide recommendations based on a user's past product searches. Netflix also uses this
technique to recommend movies and web series to its users based on their watch history.
The diagram below explains the working of the clustering algorithm: different
fruits are divided into several groups with similar properties.
Types of Clustering Methods
The clustering methods are broadly divided into hard clustering (each data point belongs to only
one group) and soft clustering (a data point can also belong to more than one group). Various
other approaches to clustering also exist. Below are the main clustering methods
used in machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
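As a concrete illustration of the first method above, here is a minimal, hedged sketch of partitioning clustering (K-Means) using scikit-learn, assuming it is installed. The 2-D points are invented; in practice the features could be shape, size, or color measurements.

```python
# A minimal partitioning-clustering (K-Means) sketch with scikit-learn.
# The 2-D points below are invented for illustration.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [2, 3],      # one natural group
     [9, 8], [8, 9], [10, 10]]    # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                     # cluster-ID assigned to each data point
print(kmeans.predict([[0, 0], [9, 9]]))   # cluster-IDs for new points
```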
o Decision Tree is a Supervised learning technique that can be used for both classification
and Regression problems, but mostly it is preferred for solving Classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches, whereas
leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the
tree into subtrees.
o Below diagram explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
o Decision Trees usually mimic human thinking while making a decision, so they are easy to
understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like
structure.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be split further after
reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according
to the given conditions.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are called
the child nodes.
In a decision tree, for predicting the class of the given dataset, the algorithm starts from the
root node of the tree. This algorithm compares the values of root attribute with the record
(real dataset) attribute and, based on the comparison, follows the branch and jumps to the
next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes
and moves further. It continues this process until it reaches a leaf node of the tree. The
complete process can be better understood using the below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
o Step-3: Divide S into subsets that contain possible values for the best attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created
in Step-3. Continue this process until a stage is reached where you cannot further
classify the nodes; call the final node a leaf node.
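Step-2 above relies on an Attribute Selection Measure; a common choice is information gain based on entropy. Below is a small hedged sketch of that calculation; the tiny salary/distance dataset is invented for illustration and is not the author's example.

```python
# A small sketch of an Attribute Selection Measure (ASM): information gain
# based on entropy. The tiny yes/no dataset is invented for illustration.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Gain = entropy(parent) - weighted entropy of the subsets after the split."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Toy data: each row is [Salary, Distance]; label = whether the offer is accepted
rows   = [["high", "near"], ["high", "far"], ["low", "near"], ["low", "far"]]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, 0))   # splitting on Salary: gain = 1.0
print(information_gain(rows, labels, 1))   # splitting on Distance: gain = 0.0
```

The attribute with the higher gain (here, Salary) would be chosen as the decision node.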
Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or not. To solve this problem, the decision tree starts with the root
node (Salary attribute by ASM). The root node splits further into the next decision node
(distance from the office) and one leaf node based on the corresponding labels. The next
decision node further gets split into one decision node (Cab facility) and one leaf node.
Finally, the decision node splits into two leaf nodes (Accepted offers and Declined offer).
Consider the below diagram:
Advantages of the Decision Tree
o It is simple to understand, as it follows the same process that a human follows while
making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other algorithms.
Bayesian Network
A Bayesian network is a probabilistic graphical model that represents a set of variables and their
conditional dependencies using a directed graph. It is also called a Bayes network, belief
network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between
multiple events, we need a Bayesian network. It can also be used in various tasks
including prediction, anomaly detection, diagnostics, automated insight, reasoning, time
series prediction, and decision making under uncertainty.
A Bayesian Network can be used for building models from data and expert opinions, and it
consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities
The generalized form of a Bayesian network that represents and solves decision problems under
uncertain knowledge is known as an Influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where each node
corresponds to a random variable and each arc represents a direct causal relationship or
conditional dependence between the connected variables.
Note: The Bayesian network graph does not contain any cycles. Hence, it is known as
a directed acyclic graph, or DAG.
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)),
which determines the effect of the parents on that node.
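To make the conditional probability tables concrete, here is a tiny hedged sketch of a two-node network, Cloudy -> Rain. Every probability value is invented for illustration; the point is only that each node stores P(Xi | Parent(Xi)) and that joint probabilities are products of these local distributions.

```python
# A tiny Bayesian-network sketch with two nodes, Cloudy -> Rain.
# All probability values are invented for illustration.

p_cloudy = {True: 0.5, False: 0.5}                     # P(Cloudy)
p_rain_given_cloudy = {True: {True: 0.8, False: 0.2},  # P(Rain | Cloudy=True)
                       False: {True: 0.1, False: 0.9}} # P(Rain | Cloudy=False)

def joint(cloudy, rain):
    """P(Cloudy=cloudy, Rain=rain) = P(Cloudy) * P(Rain | Cloudy)."""
    return p_cloudy[cloudy] * p_rain_given_cloudy[cloudy][rain]

# P(Rain=True), obtained by summing out the parent variable
p_rain = sum(joint(c, True) for c in (True, False))
print(joint(True, True))   # 0.5 * 0.8 = 0.4
print(p_rain)              # 0.4 + 0.05 = 0.45
```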
Support Vector Machine (SVM)
Support Vector Machine is a supervised learning algorithm used for classification as well as
regression problems, though it is mostly used for classification. The goal of the SVM algorithm
is to create the best line or decision boundary that can segregate n-dimensional space into
classes so that we can easily put a new data point in the correct category in the future. This
best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector
Machine. Consider the below diagram, in which two different categories are
classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used in the KNN
classifier. Suppose we see a strange cat that also has some features of a dog. If we want a
model that can accurately identify whether it is a cat or a dog, such a model can be created
using the SVM algorithm. We first train our model with lots of images of cats and
dogs so that it can learn their different features, and then we test it with this
strange creature. The SVM creates a decision boundary between the two classes
(cat and dog) and chooses the extreme cases (support vectors) of each class; on the basis of
these support vectors, it will classify the creature as a cat. Consider the below
diagram:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a
dataset can be classified into two classes by using a single straight line, then such data
is termed linearly separable data, and the classifier used is called a Linear SVM
classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified by using a straight line, then such data is
termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
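The following is a minimal, hedged scikit-learn sketch contrasting the two types above: a linear SVM on linearly separable points and an RBF-kernel SVM on points that no single straight line can separate. All toy points and the kernel parameter are assumptions for illustration.

```python
# A minimal SVM sketch with scikit-learn: a linear SVM for linearly separable
# data and an RBF-kernel (non-linear) SVM for data a straight line cannot separate.
from sklearn.svm import SVC

# Linearly separable toy data
X_lin = [[0, 0], [1, 1], [8, 8], [9, 9]]
y_lin = [0, 0, 1, 1]
linear_svm = SVC(kernel="linear").fit(X_lin, y_lin)
print(linear_svm.support_vectors_)          # the extreme points defining the hyperplane

# XOR-like data that no single straight line can separate
X_xor = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_xor = [0, 0, 1, 1]
nonlinear_svm = SVC(kernel="rbf", gamma=1.0).fit(X_xor, y_xor)
print(nonlinear_svm.predict([[0.9, 0.1]]))  # likely [1], the class of the nearby point
```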
Genetic Algorithm
The genetic algorithm works on an evolutionary generational cycle to generate high-quality
solutions. It uses different operations that either enhance or replace the
population to give an improved fit solution.
It basically involves five phases to solve complex optimization problems, which are given
below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
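The following is a minimal, hedged sketch of a genetic algorithm that walks through the five phases above. The toy goal (maximizing the number of 1-bits in a 10-bit string) and all parameter values are invented purely for illustration.

```python
# A minimal genetic-algorithm sketch illustrating the five phases above.
# The toy fitness goal and all parameters are illustrative assumptions.
import random

GENES, POP_SIZE, GENERATIONS, MUTATION_RATE = 10, 20, 50, 0.05

def fitness(individual):                    # Fitness assignment
    return sum(individual)

def select(population):                     # Selection (tournament of size 2)
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def reproduce(p1, p2):                      # Reproduction: crossover + mutation
    cut = random.randint(1, GENES - 1)
    child = p1[:cut] + p2[cut:]
    return [1 - g if random.random() < MUTATION_RATE else g for g in child]

# Initialization: a random population of bit strings
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):                # Termination: fixed number of generations
    population = [reproduce(select(population), select(population))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(best, fitness(best))                  # the fittest solution found
```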
Issues in Machine Learning
1. Poor quality of data
Data plays a significant role in the machine learning process. One of the
significant issues that machine learning professionals face is the absence of
good quality data. Unclean and noisy data can make the whole process
extremely exhausting. We don't want our algorithm to make inaccurate or
faulty predictions, so the quality of the data is essential for enhancing the output.
2. Non-representative training data
To make sure our training model generalizes well, we have to ensure that the
sample training data is representative of the new cases to which we need to
generalize. The training data must cover the cases that have already occurred as
well as those that are occurring.
3. Lack of quality and quantity of data
A major issue that arises while using machine learning algorithms is the lack of quality as
well as quantity of data. Although data plays a vital role in machine learning, several
problems with the data itself can occur:
o Noisy data - It is responsible for inaccurate predictions that affect decisions as
well as the accuracy of classification tasks.
o Incorrect data - It is also responsible for faulty programming and faulty results obtained from
machine learning models; hence, incorrect data may affect the accuracy of the results.
o Generalizing output data - Sometimes generalizing output data becomes complex,
which results in comparatively poorer future actions.
4. Overfitting
Overfitting is one of the most common issues faced by machine learning engineers and data
scientists. When a machine learning model is trained with a huge amount of data, it starts
capturing the noise and inaccurate entries in the training data set, which negatively affects the
performance of the model.
5. Underfitting:
Underfitting is just the opposite of overfitting. When a machine learning model is trained with
too little data, it produces incomplete and inaccurate results and destroys the
accuracy of the machine learning model.
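A small hedged sketch can make the overfitting/underfitting contrast concrete: fitting polynomials of different degrees to noisy data drawn from a quadratic curve. The degrees, noise level, and random seed are arbitrary choices for illustration.

```python
# A small sketch contrasting underfitting and overfitting: fitting polynomials
# of different degrees to noisy data drawn from a quadratic curve.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20)
y = x**2 + rng.normal(scale=1.0, size=x.size)   # true pattern + noise

x_test = np.linspace(-3, 3, 200)
y_test_true = x_test**2

for degree in (1, 2, 9):                        # underfit, good fit, overfit
    coeffs = np.polyfit(x, y, degree)
    y_pred = np.polyval(coeffs, x_test)
    error = np.mean((y_pred - y_test_true) ** 2)
    print(f"degree {degree}: test error = {error:.2f}")
# Typically the too-simple degree-1 model (underfitting) and the degree-9 model
# that captures the noise (overfitting) both show larger test error than degree 2.
```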
Comparison between Data Science and Machine Learning:

Data Science: It deals with understanding and finding hidden patterns or useful insights from the
data, which helps in taking smarter business decisions.
Machine Learning: It is a subfield of data science that enables the machine to learn from past data
and experiences automatically.

Data Science: It is used for discovering insights from the data.
Machine Learning: It is used for making predictions and classifying the result for new data points.

Data Science: It is a broad term that includes various steps to create a model for a given problem
and deploy the model.
Machine Learning: It is used in the data modeling step of the data science process.

Data Science: A data scientist needs to have skills to use big data tools like Hadoop, Hive and Pig,
statistics, and programming in Python, R, or Scala.
Machine Learning: A machine learning engineer needs to have skills such as computer science
fundamentals, programming skills in Python or R, statistics and probability concepts, etc.

Data Science: It can work with raw, structured, and unstructured data.
Machine Learning: It mostly requires structured data to work on.

Data Science: Data scientists spend a lot of time handling the data, cleansing the data, and
understanding its patterns.
Machine Learning: ML engineers spend a lot of time managing the complexities that occur during
the implementation of algorithms and the mathematical concepts behind them.