0% found this document useful (0 votes)
624 views42 pages

ML Unit 1

Ml

Uploaded by

226y1a6760
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
624 views42 pages

ML Unit 1

Ml

Uploaded by

226y1a6760
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 42

www.android.universityupdates.in | www.universityupdates.in | https://telegram.

me/jntuh

R22 B.Tech. CSE (AI and ML) Syllabus JNTU Hyderabad


AM502PC MACHINE LEARNING

UNIT - I
Learning –

The Machine Learning Tutorial covers both the fundamentals and more complex ideas
of machine learning. Students and professionals in the workforce can benefit from our
machine learning tutorial.

A rapidly developing field of technology, machine learning allows computers to


automatically learn from previous data. For building mathematical models and making
predictions based on historical data or information, machine learning employs a
variety of algorithms. It is currently being used for a variety of tasks, including speech
recognition, email filtering, auto-tagging on Facebook, a recommender system, and
image recognition.

You will learn about the many different methods of machine learning, including
reinforcement learning, supervised learning, and unsupervised learning, in this
machine learning tutorial. Regression and classification models, clustering techniques,
hidden Markov models, and various sequential models will all be covered.

What is Machine Learning


In the real world, we are surrounded by humans who can learn everything from their
experiences with their learning capability, and we have computers or machines which
work on our instructions. But can a machine also learn from experiences or past data
like a human does? So here comes the role of Machine Learning.

Introduction to Machine Learning

A subset of artificial intelligence known as machine learning focuses primarily on the


creation of algorithms that enable a computer to independently learn from data and

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

previous experiences. Arthur Samuel first used the term "machine learning" in 1959. It
could be summarized as follows:

Without being explicitly programmed, machine learning enables a machine to


automatically learn from data, improve performance from experiences, and predict
things.

Machine learning algorithms create a mathematical model that, without being


explicitly programmed, aids in making predictions or decisions with the assistance of
sample historical data, or training data. For the purpose of developing predictive
models, machine learning brings together statistics and computer science. Algorithms
that learn from historical data are either constructed or utilized in machine learning.
The performance will rise in proportion to the quantity of information we provide.

Types of Machine Learning

Machine learning is a subset of AI, which enables the machine to automatically


learn from data, improve performance from past experiences, and make
predictions. Machine learning contains a set of algorithms that work on a huge
amount of data. Data is fed to these algorithms to train them, and on the basis of
training, they build the model & perform a specific task.

These ML algorithms help to solve different business problems like Regression,


Classification, Forecasting, Clustering, and Associations, etc.

Based on the methods and way of learning, machine learning is divided into mainly
four types, which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

. Supervised Machine Learning


As its name suggests, Supervised machine learning is based on supervision. It means
in the supervised learning technique, we train the machines using the "labelled"
dataset, and based on the training, the machine predicts the output. Here, the labelled
data specifies that some of the inputs are already mapped to the output. More
preciously, we can say; first, we train the machine with the input and corresponding
output, and then we ask the machine to predict the output using the test dataset.

Let's understand supervised learning with an example. Suppose we have an input


dataset of cats and dog images. So, first, we will provide the training to the machine
to understand the images, such as the shape & size of the tail of cat and dog, Shape
of eyes, colour, height (dogs are taller, cats are smaller), etc. After completion of
training, we input the picture of a cat and ask the machine to identify the object and
predict the output. Now, the machine is well trained, so it will check all the features of
the object, such as height, shape, colour, eyes, ears, tail, etc., and find that it's a cat. So,
it will put it in the Cat category. This is the process of how the machine identifies the
objects in Supervised Learning.

The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y). Some real-world applications of supervised
learning are Risk Assessment, Fraud Detection, Spam filtering, etc.

o Classification
o Regression

a) Classification

Classification algorithms are used to solve the classification problems in which the
output variable is categorical, such as "Yes" or No, Male or Female, Red or Blue, etc.
The classification algorithms predict the categories present in the dataset. Some real-
world examples of classification algorithms are Spam Detection, Email filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm


o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

b) Regression

Regression algorithms are used to solve regression problems in which there is a linear
relationship between input and output variables. These are used to predict continuous
output variables, such as market trends, weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm


o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Advantages and Disadvantages of Supervised Learning


Advantages:

o Since supervised learning work with the labelled dataset so we can have an exact idea
about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

o These algorithms are not able to solve complex tasks.


o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Unsupervised Machine Learning


Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning,
the machine is trained using the unlabeled dataset, and the machine predicts the
output without any supervision.

In unsupervised learning, the models are trained with the data that is neither classified
nor labelled, and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categories


the unsorted dataset according to the similarities, patterns, and
differences. Machines are instructed to find the hidden patterns from the input
dataset.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Let's take an example to understand it more preciously; suppose there is a basket of


fruit images, and we input it into the machine learning model. The images are totally
unknown to the model, and the task of the machine is to find the patterns and
categories of the objects.

So, now the machine will discover its patterns and differences, such as colour
difference, shape difference, and predict the output when it is tested with the test
dataset.

Categories of Unsupervised Machine Learning


Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the
data. It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of
other groups. An example of the clustering algorithm is grouping the customers by
their purchasing behaviour.

Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm


o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis

2) Association

Association rule learning is an unsupervised learning technique, which finds interesting


relations among variables within a large dataset. The main aim of this learning
algorithm is to find the dependency of one data item on another data item and map
those variables accordingly so that it can generate maximum profit. This algorithm is
mainly applied in Market Basket analysis, Web usage mining, continuous
production, etc.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat,
FP-growth algorithm.

Advantages and Disadvantages of Unsupervised Learning


Algorithm
Advantages:

o These algorithms can be used for complicated tasks compared to the supervised ones
because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled
dataset is easier as compared to the labelled dataset.

Disadvantages:

o The output of an unsupervised algorithm can be less accurate as the dataset is not
labelled, and algorithms are not trained with the exact output in prior.
o Working with Unsupervised learning is more difficult as it works with the unlabelled
dataset that does not map with the output.

Applications of Unsupervised Learning


o Network Analysis: Unsupervised learning is used for identifying plagiarism and
copyright in document network analysis of text data for scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different web
applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised
learning, which can identify unusual data points within the dataset. It is used to discover
fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is used to
extract particular information from the database. For example, extracting information
of each user located at a particular location.

Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies
between Supervised and Unsupervised machine learning. It represents the
intermediate ground between Supervised (With Labelled training data) and

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.

ADVERTISEMENT

Although Semi-supervised learning is the middle ground between supervised and


unsupervised learning and operates on the data that consists of a few labels, it mostly
consists of unlabeled data. As labels are costly, but for corporate purposes, they may
have few labels. It is completely different from supervised and unsupervised learning
as they are based on the presence & absence of labels.

To overcome the drawbacks of supervised learning and unsupervised learning


algorithms, the concept of Semi-supervised learning is introduced. The main aim
of semi-supervised learning is to effectively use all the available data, rather than only
labelled data like in supervised learning. Initially, similar data is clustered along with an
unsupervised learning algorithm, and further, it helps to label the unlabeled data into
labelled data. It is because labelled data is a comparatively more expensive acquisition
than unlabeled data.

We can imagine these algorithms with an example. Supervised learning is where a


student is under the supervision of an instructor at home and college. Further, if that
student is self-analysing the same concept without any help from the instructor, it
comes under unsupervised learning. Under semi-supervised learning, the student has
to revise himself after analyzing the same concept under the guidance of an instructor
at college.

Advantages and disadvantages of Semi-supervised


Learning
Advantages:

o It is simple and easy to understand the algorithm.


o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.

Disadvantages:

o Iterations results may not be stable.


o We cannot apply these algorithms to network-level data.
o Accuracy is low.

4. Reinforcement Learning

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Reinforcement learning works on a feedback-based process, in which an AI agent


(A software component) automatically explore its surrounding by hitting & trail,
taking action, learning from experiences, and improving its performance. Agent
gets rewarded for each good action and get punished for each bad action; hence the
goal of reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data like supervised learning, and agents
learn from their experiences only.

ADVERTISEMENT

The reinforcement learning process is similar to a human being; for example, a child
learns various things by experiences in his day-to-day life. An example of
reinforcement learning is to play a game, where the Game is the environment, moves
of an agent at each step define states, and the goal of the agent is to get a high score.
Agent receives feedback in terms of punishment and rewards.

Due to its way of working, reinforcement learning is employed in different fields such
as Game theory, Operation Research, Information theory, multi-agent systems.

A reinforcement learning problem can be formalized using Markov Decision


Process(MDP). In MDP, the agent constantly interacts with the environment and
performs actions; at each action, the environment responds and generates a new state.

Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning specifies


increasing the tendency that the required behaviour would occur again by adding
something. It enhances the strength of the behaviour of the agent and positively
impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works exactly
opposite to the positive RL. It increases the tendency that the specific behaviour would
occur again by avoiding the negative condition.

Real-world Use cases of Reinforcement Learning


o Video Games:
RL algorithms are much popular in gaming applications. It is used to gain super-human
performance. Some popular games that use RL algorithms are AlphaGO and AlphaGO
Zero.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed that
how to use RL in computer to automatically learn and schedule resources to wait for
different jobs in order to minimize average job slowdown.
o Robotics:
RL is widely being used in Robotics applications. Robots are used in the industrial and
manufacturing area, and these robots are made more powerful with reinforcement
learning. There are different industries that have their vision of building intelligent
robots using AI and Machine learning technology.
o Text Mining
Text-mining, one of the great applications of NLP, is now being implemented with the
help of Reinforcement Learning by Salesforce company.

Advantages and Disadvantages of Reinforcement Learning


Advantages

o It helps in solving complex real-world problems which are difficult to be solved by


general techniques.
o The learning model of RL is similar to the learning of human beings; hence most
accurate results can be found.
o Helps in achieving long term results.

Disadvantage

o RL algorithms are not preferred for simple problems.


o RL algorithms require huge data and computations.
o Too much reinforcement learning can lead to an overload of states which can weaken
the results.

The Brain and the Neuron


THE BRAIN AND THE NEURON In animals, learning occurs within the brain. If we can understand how
the brain works, then there might be things in there for us to copy and use for our machine learning
systems. While the brain is an impressively powerful and complicated system, the basic building
blocks that it is made up of are fairly simple and easy to understand. We’ll look at them shortly, but
it’s worth noting that in computational terms the brain does exactly what we want. It deals with
noisy and even inconsistent data, and produces answers that are usually correct from very high
dimensional data (such as images) very quickly. All amazing for something that weighs about 1.5 kg
and is losing parts of itself all the time (neurons die as you age at impressive/depressing rates), but

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Artificial Neural Network Tutorial provides basic and


advanced concepts of ANNs. Our Artificial Neural
Network tutorial is developed for beginners as well as
professions.

The term "Artificial neural network" refers to a


biologically inspired sub-field of artificial intelligence
modeled after the brain. An Artificial neural network is
usually a computational network based on biological
neural networks that construct the structure of the
human brain. Similar to a human brain has neurons interconnected to each other,
artificial neural networks also have neurons that are linked to each other in various
layers of the networks. These neurons are known as nodes.

Artificial neural network tutorial covers all the aspects related to the artificial neural
network. In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen
self-organizing map, Building blocks, unsupervised learning, Genetic algorithm, etc.

What is Artificial Neural Network?


The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are
known as nodes.

The given figure illustrates the typical diagram


of Biological Neural Network.

The typical Artificial Neural Network looks something


like the given figure.

Dendrites from Biological Neural Network represent inputs in Artificial Neural


Networks, cell nucleus represents Nodes, synapse represents Weights, and Axon
represents Output.

Relationship between Biological neural network and


artificial neural network:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Biological Neural Artificial Neural Network


Network

Dendrites Inputs

Cell nucleus Nodes

Synapse Weights

Axon Output

An Artificial Neural Network in the field of Artificial intelligence where it attempts


to mimic the network of neurons makes up a human brain so that computers will have
an option to understand things and make decisions in a human-like manner. The
artificial neural network is designed by programming computers to behave simply like
interconnected brain cells.

There are around 1000 billion neurons in the human brain. Each neuron has an
association point somewhere in the range of 1,000 and 100,000. In the human brain,
data is stored in such a manner as to be distributed, and we can extract more than one
piece of this data when necessary from our memory parallelly. We can say that the
human brain is made up of incredibly amazing parallel processors.

We can understand the artificial neural network with an example, consider an example
of a digital logic gate that takes an input and gives an output. "OR" gate, which takes
two inputs. If one or both the inputs are "On," then we get "On" in output. If both the
inputs are "Off," then we get "Off" in output. Here the output depends upon input. Our
brain does not perform the same task. The outputs to inputs relationship keep
changing because of the neurons in our brain, which are "learning."

The architecture of an artificial neural network:


To understand the concept of the architecture of an artificial neural network, we have
to understand what a neural network consists of. In order to define a neural network
that consists of a large number of artificial neurons, which are termed units arranged
in a sequence of layers. Lets us look at various types of layers available in an artificial
neural network.

Artificial Neural Network primarily consists of three layers:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Input Layer:

As the name suggests, it accepts


inputs in several different formats
provided by the programmer.

Hidden Layer:

The hidden layer presents in-between input and output layers. It performs all the
calculations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.

The artificial neural network takes input and computes the weighted sum of the inputs
and includes a bias. This computation is represented in the form of a transfer function.

It determines weighted total is passed as an input to an


activation function to produce the output.
Activation functions choose whether a node should fire or not. Only those who are
fired make it to the output layer. There are distinctive activation functions available
that can be applied upon the sort of task we are performing.

Advantages of Artificial Neural Network (ANN)


Parallel processing capability:

Artificial neural networks have a numerical value that can perform more than one task
simultaneously.

Storing data on the entire network:

Data that is used in traditional programming is stored on the whole network, not on a
database. The disappearance of a couple of pieces of data in one place doesn't prevent
the network from working.

Capability to work with incomplete knowledge:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

After ANN training, the information may produce output even with inadequate data.
The loss of performance here relies upon the significance of missing data.

Having a memory distribution:

For ANN is to be able to adapt, it is important to determine the examples and to


encourage the network according to the desired output by demonstrating these
examples to the network. The succession of the network is directly proportional to the
chosen instances, and if the event can't appear to the network in all its aspects, it can
produce false output.

Having fault tolerance:

Extortion of one or more cells of ANN does not prohibit it from generating output,
and this feature makes the network fault-tolerance.

Disadvantages of Artificial Neural Network:


Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural


networks. The appropriate network structure is accomplished through experience, trial,
and error.

Unrecognized behavior of the network:

It is the most significant issue of ANN. When ANN produces a testing solution, it does
not provide insight concerning why and how. It decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, as per their
structure. Therefore, the realization of the equipment is dependent.

Difficulty of showing the issue to the network:

ANNs can work with numerical data. Problems must be converted into numerical
values before being introduced to ANN. The presentation mechanism to be resolved
here will directly impact the performance of the network. It relies on the user's abilities.

The duration of the network is unknown:

The network is reduced to a specific value of the error, and this value does not give us
optimum results.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Science artificial neural networks that have steeped into the world in the mid-20th century are
exponentially developing. In the present time, we have investigated the pros of artificial neural
networks and the issues encountered in the course of their utilization. It should not be overlooked
that the cons of ANN networks, which are a flourishing science branch, are eliminated individually,
and their pros are increasing day by day. It means that artificial neural networks will turn into an
irreplaceable part of our lives progressively important.

How do artificial neural networks work?


Artificial Neural Network can be best represented as a weighted directed graph, where
the artificial neurons form the nodes. The association between the neurons outputs
and neuron inputs can be viewed as the directed edges with weights. The Artificial
Neural Network receives the input signal from the external source in the form of a
pattern and image in the form of a vector. These inputs are then mathematically
assigned by the notations x(n) for every n number of inputs.

Afterward, each of the input is multiplied by


its corresponding weights ( these weights are
the details utilized by the artificial neural
networks to solve a specific problem ). In
general terms, these weights normally
represent the strength of the interconnection
between neurons inside the artificial neural
network. All the weighted inputs are
summarized inside the computing unit.

If the weighted sum is equal to zero, then bias is added to make the output non-zero
or something else to scale up to the system's response. Bias has the same input, and
weight equals to 1. Here the total of weighted inputs can be in the range of 0 to
positive infinity. Here, to keep the response in the limits of the desired value, a certain
maximum value is benchmarked, and the total of weighted inputs is passed through
the activation function.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Choosing the Training Experience:

The first design choice we face is to choose the type of training experience from which our
system will learn. The type of training experience available can have a significant impact on
success or failure of the learner. One key attribute is whether the training experience provides
direct or indirect feedback regarding the choices made by the performance system. For
example, in learning to play checkers, the system might learn from direct training examples
consisting of individual checkers board states and the correct move for each.

In order to complete the design of the learning system, we must now choose

1. the exact type of knowledge to be learned

2. a representation for this target knowledge

3. a learning mechanism

Choosing the Target Function:

The next design choice is to determine exactly what type of knowledge will be learned and
how this will be used by the performance program. Let us begin with a checkers-playing
program that can generate the legal moves from any board state. The program needs only to
learn how to choose the best move from among these legal moves.

Choosing a Representation for the Target Function:

X1: the number of black pieces on the board

x2: the number of red pieces on the board

x3: the number of black kings on the board

x4: the number of red kings on the board

x5: the number of black pieces threatened by red (i.e., which can be captured on red's next
turn)

X6: the number of red pieces threatened by black

Thus, our learning program will represent V(b) as a linear function of the form

V(b)=w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

where wo through w6
6 are numerical coefficients, or weights, to be chosen by the learning
algorithm.

Partial design of a checkers learning program:

Task T: playing checkers

Performance measure P: percent of games won in the world tournament

Training experience E: games played against itself

Target function: V:BoardR

Target function representation

V(b)=w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6

Choosing a Function Approximation A


Algorithm

In order to learn the target function f we require a set of training examples, each describing a
specific board state b and the training value Vtrain(b) for bb.

ESTIMATING TRAINING VALUES

Rule for estimating training values.

Vtrain (b) V(Successor(b))

ADJUSTING THE WEIGHTS

The Final Design:

The Performance System is the module that must solve the given per-
per formance task, in this
case playing checkers, by using the learned target function(s). It takes an instance of a new
problem (new game) as input and produces a trace of its solution (game history) as output.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

The Critic takes as input the history or trace of the ga


game
me and produces as output a set of
training examples of the target function. As shown in the diagram, each training example in
this case corresponds to some game state in the trace, along with an estimate Vtrain,
Vtrai of the
target function value for this example.
examp

The Generalizer takes as input the training examples and produces an output hypothesis that
is its estimate of the target function. It generalizes from the specific training examples,
hypothesizing a general function that covers these examples and othe
otherr cases beyond the
training examples.

The Experiment Generator takes as input the current hypothesis (currently learned
function) and outputs a new problem (i.e., initial board state) for the Performance System to
explore. Its role is to pick new practice problems that will maximize the learning rate of the
overall system.

Fig: Final Design of Checkers Learning Problem

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Fig: Summary of choices in designing checkers learning problem.

PERSPECTIVES AND ISSUES IN MACHINE LEARNING


LEARNING:

One useful perspective on machine learning is that it involves searching a very large space of
possible hypotheses to determine one that best fits the observed data and any prior knowledge
held by the learner. For example, consider the space of hypotheses that could in principle be
output by the above checkers learner. This hypothesis space consists of all evaluation
functions that can be represented by some choice of values for the weights wo through w6.
The learner's task is thus to search through this vast space to locate the hyp
hypothesis
othesis that is most
consistent with the available training examples. The LMS algorithm for fitting weights
achieves this goal by iteratively tuning the weights, adding a correction to each weight each
time the hypothesized evaluation function predicts a vvalue
alue that differs from the training value.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

This algorithm works well when the hypothesis representation considered by the learner
defines a continuously parameterized space of potential hypotheses.

Issues in Machine Learning

 What algorithms exist for learning general target functions from specific training
examples? In what settings will particular algorithms converge to the desired function,
given sufficient training data? Which algorithms perform best for which types of
problems and representations?
 How much training data is sufficient? What general bounds can be found to relate the
confidence in learned hypotheses to the amount of training experience and the
character of the learner's hypothesis space?
 When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is only
approximately correct?
 What is the best strategy for choosing a useful next training experience, and how does
the choice of this strategy alter the complexity of the learning problem?
 What is the best way to reduce the learning task to one or more function
approximation problems? Put another way, what specific functions should the system
attempt to learn? Can this process itself be automated?
 How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?

Concept Learning:

Concept learning: Inferring a boolean-valued function from training examples of its input
and output.

A CONCEPT LEARNING TASK:

What hypothesis representation shall we provide to the learner in this case? Let us begin by
considering a simple representation in which each hypothesis consists of a conjunction of
constraints on the instance attributes. In particular, let each hypothesis be a vector of six
constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water,
and Forecast. For each attribute, the hypothesis will either

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

indicate by a "?' that any value is acceptable for this attribute,

specify a single
ingle required value (e.g., Warm) for the attribute, or

indicate by a "θ" that no value is acceptable.

The inductive learning hypothesis. Any hypothesis found to approximate the target
function well over a sufficiently large set of training examples will also approximate the
target function well over other unobserved examples.

CONCEPT LEARNING AS SEARCH Concept learning can be viewed as the task of


searching through a large space of hypotheses implicitly defined by the hypothesis
representation. The goal off this search is to find the hypothesis that best fits the training
examples. It is important to note that by selecting a hypothesis representation, the designer of
the learning algorithm implicitly defines the space of all hypotheses that the program can ever
represent and therefore can ever learn.

General-to-Specific
Specific Ordering of Hypotheses Many algorithms for concept learning
organize the search through the hypothesis space by relying on a very useful structure that
exists for any concept learning problem
problem: a general-to-specific
specific ordering of hypotheses. By
taking advantage of this naturally occurring structure over the hypothesis space, we can
design learning algorithms that exhaustively search even infinite hy
hypothesis
pothesis spaces without
explicitly enumerating every
very hypothesis.

To illustrate the general-to-specific


specific ordering, consider the two hypotheses

h1 = (Sunny, ?, ?, Strong, ?, ?)

h2 = (Sunny, ?, ?, ?, ?, ?)

Now consider the sets of instances that are classified positive by hl and by h2. Because h2
imposess fewer constraints on the instance, it classifies more instances as positive. In fact, any
instance classified positive by hl will also be classified positive by h2. Therefore, we say that
h2 is more general than hl.

Definition: Let hj and hk be boolean


boolean-valued
alued functions defined over X. Then hj is more
general-than-or-equal-to
to hk (written hj >= hk) if and only if

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

FIND-S:
S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
HYPOTHESIS:

Fig: Find-S Algorithm

VERSION SPACES AND THE CANDIDATE


CANDIDATE-ELIMINATION
ELIMINATION ALGORITHM:
ALGORITHM

The CANDIDATE-ELIMINATION
ELIMINATION algorithm finds all describable hypotheses that are
consistent with the observed training examples. In order to define this algorithm precisely, we
begin with a few basic definitions. First, let us say that a hypothesis is consist
consistent
ent with the
training examples if it correctly classifies these examples.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

CANDIDATE-ELIMINATION
ELIMINATION Learning Algorithm
Algorithm:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

REMARKS ON VERSION SPACES AND CANDIDATE-ELIMINATION


CANDIDATE ATION
ALGORITHM:

Will the CANDIDATE-ELIMINATION


ELIMINATION Algorithm Converge to the Correct
Hypothesis?

The version space learned by the CANDIDATE


CANDIDATE-ELIMINATION
ELIMINATION algorithm will con
con- verge
toward the hypothesis that correctly describes the target concept, provided (1) there are no
errors
rs in the training examples, and (2) there is some hypothesis in H that correctly describes
the target concept.

What Training Example Should the Learner Request Next? Up to this point we have
assumed that training examples are provided to the learner by so
some
me external teacher. Suppose
instead that the learner is allowed to conduct experiments in which it chooses the next
instance, then obtains the correct classification for this instance from an external oracle (e.g.,
nature or a teacher).

How Can Partially Learned Concepts Be Used? Suppose that no additional training
examples are available beyond the four in our example above, but that the learner is now
required to classify new instances that it has not yet observed. Even though the version space
still contains
ains multiple hypotheses, indicating that the target concept has not yet been fully
learned, it is possible to classify certain examples with the same degree of confidence as if
the target concept had been uniquely identified.

INDUCTIVE BIAS:

Inductive bias of CANDIDATE-ELIMINATION


CANDIDATE ELIMINATION algorithm. The target concept c is
contained in the given hypothesis space H.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Linear Discriminant Analysis (LDA) in Machine


Learning
Linear Discriminant Analysis (LDA) is one of the commonly used dimensionality
reduction techniques in machine learning to solve more than two-class
classification problems. It is also known as Normal Discriminant Analysis (NDA)
or Discriminant Function Analysis (DFA).

This can be used to project the features of higher dimensional space into lower-
dimensional space in order to reduce resources and dimensional costs. In this topic,
"Linear Discriminant Analysis (LDA) in machine learning”, we will discuss the LDA
algorithm for classification predictive modeling problems, limitation of logistic
regression, representation of linear Discriminant analysis model, how to make a
prediction using LDA, how to prepare data for LDA, extensions to LDA and much more.
So, let's start with a quick introduction to Linear Discriminant Analysis (LDA) in machine
learning.

What is Linear Discriminant Analysis (LDA)?


Although the logistic regression algorithm is limited to only two-class, linear
Discriminant analysis is applicable for more than two classes of classification problems.

Linear Discriminant analysis is one of the most popular dimensionality reduction


techniques used for supervised classification problems in machine learning. It is
also considered a pre-processing step for modeling differences in ML and applications
of pattern classification.

Whenever there is a requirement to separate two or more classes having multiple


features efficiently, the Linear Discriminant Analysis model is considered the most
common technique to solve such classification problems. For e.g., if we have two
classes with multiple features and need to separate them efficiently. When we classify
them using a single feature, then it may show overlapping.

To overcome the overlapping issue in the classification process, we must increase the
number of features regularly.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Example:
Let's assume we have to classify two different classes having two sets of data points in
a 2-dimensional plane as shown below image:

However, it is impossible to draw a straight line in a 2-d plane that can separate these
data points efficiently but using linear Discriminant analysis; we can dimensionally
reduce the 2-D plane into the 1-D plane. Using this technique, we can also maximize
the separability between multiple classes.

How Linear Discriminant Analysis (LDA) works?


Linear Discriminant analysis is used as a dimensionality reduction technique in machine
learning, using which we can easily transform a 2-D and 3-D graph into a 1-
dimensional plane.

Let's consider an example where we have two classes in a 2-D plane having an X-Y
axis, and we need to classify them efficiently. As we have already seen in the above
example that LDA enables us to draw a straight line that can completely separate the
two classes of the data points. Here, LDA uses an X-Y axis to create a new axis by
separating them using a straight line and projecting data onto a new axis.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Hence, we can maximize the separation between these classes and reduce the 2-D
plane into 1-D.

To create a new axis, Linear Discriminant Analysis uses the following criteria:

o It maximizes the distance between means of two classes.


o It minimizes the variance within the individual class.

Using the above two conditions, LDA generates a new axis in such a way that it can
maximize the distance between the means of the two classes and minimizes the
variation within each class.

In other words, we can say that the new axis will increase the separation between the
data points of the two classes and plot them onto the new axis.

Why LDA?
o Logistic Regression is one of the most popular classification algorithms that perform
well for binary classification but falls short in the case of multiple classification
problems with well-separated classes. At the same time, LDA handles these quite
efficiently.
o LDA can also be used in data pre-processing to reduce the number of features, just as
PCA, which reduces the computing cost significantly.
o LDA is also used in face detection algorithms. In Fisherfaces, LDA is used to extract
useful data from different faces. Coupled with eigenfaces, it produces effective results.

Drawbacks of Linear Discriminant Analysis (LDA)


Although, LDA is specifically used to solve supervised classification problems for two
or more classes which are not possible using logistic regression in machine learning.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

But LDA also fails in some cases where the Mean of the distributions is shared. In this
case, LDA fails to create a new axis that makes both the classes linearly separable.

To overcome such problems, we use non-linear Discriminant analysis in machine


learning.

Extension to Linear Discriminant Analysis (LDA)


Linear Discriminant analysis is one of the most simple and effective methods to solve
classification problems in machine learning. It has so many extensions and variations
as follows:

1. Quadratic Discriminant Analysis (QDA): For multiple input variables, each class
deploys its own estimate of variance.
2. Flexible Discriminant Analysis (FDA): it is used when there are non-linear groups of
inputs are used, such as splines.
3. Flexible Discriminant Analysis (FDA): This uses regularization in the estimate of the
variance (actually covariance) and hence moderates the influence of different variables
on LDA.

Real-world Applications of LDA


Some of the common real-world applications of Linear discriminant Analysis are given
below:

o Face Recognition
Face recognition is the popular application of computer vision, where each face is
represented as the combination of a number of pixel values. In this case, LDA is used
to minimize the number of features to a manageable number before going through
the classification process. It generates a new template in which each dimension
consists of a linear combination of pixel values. If a linear combination is generated
using Fisher's linear discriminant, then it is called Fisher's face.
o Medical
In the medical field, LDA has a great application in classifying the patient disease on
the basis of various parameters of patient health and the medical treatment which is
going on. On such parameters, it classifies disease as mild, moderate, or severe. This
classification helps the doctors in either increasing or decreasing the pace of the
treatment.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

o Customer Identification
In customer identification, LDA is currently being applied. It means with the help of
LDA; we can easily identify and select the features that can specify the group of
customers who are likely to purchase a specific product in a shopping mall. This can
be helpful when we want to identify a group of customers who mostly purchase a
product in a shopping mall.
o For Predictions
LDA can also be used for making predictions and so in decision making. For example,
"will you buy this product” will give a predicted result of either one or two possible
classes as a buying or not.
o In Learning
Nowadays, robots are being trained for learning and talking to simulate human work,
and it can also be considered a classification problem. In this case, LDA builds similar
groups on the basis of different parameters, including pitches, frequencies, sound,
tunes, etc.

Perceptron in Machine Learning


In Machine Learning and Artificial Intelligence, Perceptron is the most commonly used
term for all folks. It is the primary step to learn Machine Learning and Deep Learning
technologies, which consists of a set of weights, input values or scores, and a
threshold. Perceptron is a building block of an Artificial Neural Network. Initially,
in the mid of 19th century, Mr. Frank Rosenblatt invented the Perceptron for
performing certain calculations to detect input data capabilities or business
intelligence. Perceptron is a linear Machine Learning algorithm used for supervised
learning for various binary classifiers. This algorithm enables neurons to learn elements
and processes them one by one during preparation. In this tutorial, "Perceptron in
Machine Learning," we will discuss in-depth knowledge of Perceptron and its basic
functions in brief. Let's start with the basic introduction of Perceptron.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

What is the Perceptron model in Machine


Learning?
Perceptron is Machine Learning algorithm for supervised learning of various binary
classification tasks. Further, Perceptron is also understood as an Artificial Neuron
or neural network unit that helps to detect certain input data computations in
business intelligence.

Perceptron model is also treated as one of the best and simplest types of Artificial
Neural networks. However, it is a supervised learning algorithm of binary classifiers.
Hence, we can consider it as a single-layer neural network with four main parameters,
i.e., input values, weights and Bias, net sum, and an activation function.

What is Binary classifier in Machine Learning?


In Machine Learning, binary classifiers are defined as the function that helps in
deciding whether input data can be represented as vectors of numbers and belongs
to some specific class.

Backward Skip 10sPlay VideoForward Skip 10s

Binary classifiers can be considered as linear classifiers. In simple words, we can


understand it as a classification algorithm that can predict linear predictor
function in terms of weight and feature vectors.

Basic Components of Perceptron


Mr. Frank Rosenblatt invented the perceptron model as a binary classifier which
contains three main components. These are as follows:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

ADVERTISEMENT

o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into the
system for further processing. Each input node contains a real numerical value.

o Wight and Bias:

Weight parameter represents the strength of the connection between units. This is
another most important parameter of Perceptron components. Weight is directly
proportional to the strength of the associated input neuron in deciding the output.
Further, Bias can be considered as the line of intercept in a linear equation.

o Activation Function:

These are the final and important components that help to determine whether the
neuron will fire or not. Activation Function can be considered primarily as a step
function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

The data scientist uses the activation function to take a subjective decision based on
various problem statements and forms the desired outputs. Activation function may
differ (e.g., Sign, Step, and Sigmoid) in perceptron models by checking whether the
learning process is slow or has vanishing or exploding gradients.

How does Perceptron work?


In Machine Learning, Perceptron is considered as a single-layer neural network that
consists of four main parameters named input values (Input nodes), weights and Bias,
net sum, and an activation function. The perceptron model begins with the
multiplication of all input values and their weights, then adds these values together to
create the weighted sum. Then this weighted sum is applied to the activation function
'f' to obtain the desired output. This activation function is also known as the step
function and is represented by 'f'.

This step function or Activation function plays a vital role in ensuring that output is
mapped between required values (0,1) or (-1,1). It is important to note that the weight
of input is indicative of the strength of a node. Similarly, an input's bias value gives the
ability to shift the activation function curve up or down.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Perceptron model works in two important steps as follows:

Step-1

In the first step first, multiply all input values with corresponding weight values and
then add them to determine the weighted sum. Mathematically, we can calculate the
weighted sum as follows:

∑wi*xi = x1*w1 + x2*w2 +…wn*xn

Add a special term called bias 'b' to this weighted sum to improve the model's
performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied with the above-mentioned


weighted sum, which gives us output either in binary form or a continuous value as
follows:

Y = f(∑wi*xi + b)

Linear separability
Linear separability is an important concept in machine learning, particularly in the field
of supervised learning. It refers to the ability of a set of data points to be separated
into distinct categories using a linear decision boundary. In other words, if there exists
a straight line that can cleanly divide the data into two classes, then the data is said to
be linearly separable.

Linear separability is a concept in machine learning that refers to the ability to separate
data points in binary classification problems using a linear decision boundary. If the
data points can be separated using a line, linear function, or flat hyperplane, they are
considered linearly separable. Linear separability is an important concept in neural
networks, and it is introduced in the context of linear algebra and optimization theory.

In the context of machine learning, linear separability is an important property because


it makes classification problems easier to solve. If the data is linearly separable, we can
use a linear classifier, such as logistic regression or support vector machines (SVMs),
to accurately classify new instances of data.

Linearly separable data points can be separated using a line, linear function, or flat
hyperplane. In practice, there are several methods to determine whether data is linearly

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

separable[3]. One method is linear programming, which defines an objective function


subjected to constraints that satisfy linear separability. Another method is to train and
test on the same data - if there is a line that separates the data points, then the
accuracy or AUC should be close to 100%. If there is no such line, then training and
testing on the same data will result in at least some error. Multi-layer neural networks
can learn hidden features and patterns in data that linear classifiers cannot

To understand the concept of linear separability, it is helpful to first consider a simple


two-dimensional example. Imagine we have a set of data points in a two-dimensional
space, where each point is labeled either "red" or "blue". If these data points can be
separated by a straight line, such that all the red points are on one side of the line and
all the blue points are on the other side, then the data is linearly separable.

Python provides several methods to determine whether data is linearly separable. One
method is linear programming, which defines an objective function subjected to
constraints that satisfy linear separability. Another method is clustering, where if two
clusters with cluster purity of 100% can be found using some clustering methods such
as k-means, then the data is linearly separable.

However, not all data sets are linearly separable. In some cases, it may be impossible
to draw a straight line that can separate the data into distinct categories. For example,
imagine a set of data points that are arranged in a circular pattern, with red and blue
points interspersed throughout. In this case, it is impossible to draw a straight line that
separates the data into two classes.

When faced with data that is not linearly separable, machine learning algorithms must
use more complex decision boundaries to accurately classify the data. For example, a
decision tree or a neural network may be able to accurately classify data that is not
linearly separable.

Linear separability is not only important in the context of machine learning, but it also
has applications in other fields such as physics, biology, and economics. For example,
in physics, linear separability can be used to analyze the relationship between two
physical quantities. In biology, it can be used to study the behavior of animals or to
analyze genetic data. In economics, it can be used to analyze the relationship between
two economic variables.

Example:
One way to test for linear separability is to use linear programming. Linear
programming defines an objective function subject to constraints that satisfy linear
separability. The scipy.optimize.linprog() function in Python can be used to solve linear
programming problems. Here's an example of using scipy.optimize.linprog() to test for
linear separability:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

ADVERTISEMENT

1. import numpy as np
2. from scipy.optimize import linprog
3.
4. # Define the data points
5. X = np.array([[1, 2], [2, 3], [3, 1], [4, 3]])
6. y = np.array([1, 1, -1, -1])
7.
8. # Define the objective function and constraints
9. c = np.zeros(X.shape[1] + 1)
10. c[-1] = 1
11. A = np.zeros((X.shape, X.shape[1] + 1))
12. A[:, :-1] = -y[:, np.newaxis] * X
13. A[:, -1] = -y
14. b = -np.ones(X.shape)
15.
16. # Solve the linear programming problem
17. res = linprog(c, A_ub=A, b_ub=b)
18.
19. if res.success:
20. print("The data is linearly separable.")
21. else:
22. print("The data is not linearly separable.")

In this example, we define a set of data points X and their corresponding labels y. We
then define the objective function and constraints for the linear programming
problem. The objective function is simply a vector of zeros with a 1 in the last position,
which corresponds to the bias term in the linear decision boundary. The constraints
are defined such that the dot product of each data point with the weight vector and
bias term is greater than or equal to -1 for negative examples and less than or equal
to 1 for positive examples. We then solve the linear programming problem using
scipy.optimize.linprog() and check if the solution is successful. If the solution is
successful, the data is linearly separable.

Another way to test for linear separability is to use a linear classifier or support vector
machine (SVM) with a linear kernel. For linearly separable datasets, a linear classifier or
SVM with a linear kernel can achieve 100% accuracy to classify data[4]. Here's an
example of using a linear SVM to test for linear separability:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

1. from sklearn import svm


2.
3. # Define the data points
4. X = np.array([[1, 2], [2, 3], [3, 1], [4, 3]])
5. y = np.array([1, 1, -1, -1])
6.
7. # Train a linear SVM
8. clf = svm.SVC(kernel='linear')
9. clf.fit(X, y)
10. if clf.score(X, y) == 1.0:
11. print("The data is linearly separable.")
12. else:
13. print("The data is not linearly separable.")

In this example, we define a set of data points X and their corresponding labels y. We
then train a linear SVM using the svm.SVC() function from scikit-learn with a linear
kernel. We then check if the accuracy of the SVM on the training data is 100%.

Moreover, it is important to note that linear separability is not the only criterion for the
effectiveness of a classification algorithm. In some cases, even if the data is linearly
separable, a linear classifier may not be the best choice. For example, if the data is
high-dimensional or contains complex nonlinear relationships, a nonlinear classifier
may be more effective. In such cases, machine learning algorithms such as decision
trees, random forests, or neural networks may be more appropriate

In practice, determining whether a set of data points is linearly separable can be


challenging. There are several approaches that can be used to determine linear
separability, including visual inspection and mathematical analysis. One common
approach is to use a scatter plot to visualize the data and see if a straight line can be
drawn that separates the data into two classes.

Another approach is to use mathematical tools to determine if the data is linearly


separable. For example, the Perceptron algorithm is a simple algorithm that can be
used to determine linear separability. The algorithm works by iterating through the
data points and updating a set of weights that define the decision boundary. If the
algorithm is able to converge on a set of weights that separates the data, then the data
is linearly separable.

Another important aspect to consider is that linear separability is not an inherent


property of the data itself, but rather a property of the representation of the data. In
other words, the way in which the data is transformed or encoded can affect whether

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

or not it is linearly separable. This means that preprocessing techniques such as feature
selection,

ADVERTISEMENT

In conclusion, linear separability is an important concept in machine learning that


refers to the ability of a set of data points to be separated into distinct categories using
a linear decision boundary. Linear separability makes classification problems easier to
solve, but not all data sets are linearly separable. linear separability is an important
concept in machine learning that allows us to determine if a binary classification
problem can be separated using a linear decision boundary. In practice, determining
whether a set of data points is linearly separable can be challenging, and there are
several approaches that can be used to determine linear separability, including visual
inspection and mathematical analysis.

Linear Regression in Machine Learning


Linear regression is one of the easiest and most popular Machine Learning algorithms.
It is a statistical method that is used for predictive analysis. Linear regression makes
predictions for continuous/real or numeric variables such as sales, salary, age,
product price, etc.

Linear regression algorithm shows a linear relationship between a dependent (y) and
one or more independent (y) variables, hence called as linear regression. Since linear
regression shows the linear relationship, which means it finds how the value of the
dependent variable is changing according to the value of the independent variable.

The linear regression model provides a sloped straight line representing the
relationship between the variables. Consider the below image:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Mathematically, we can represent a linear regression as:

y= a0+a1x+ ε

Here,

Y= Dependent Variable (Target Variable)


X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value).
ε = random error

The values for x and y variables are training datasets for Linear Regression model
representation.

Types of Linear Regression


Linear regression can be further divided into two types of the algorithm:

ADVERTISEMENT

o Simple Linear Regression:


If a single independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Simple Linear Regression.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

o Multiple Linear regression:


If more than one independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Multiple Linear
Regression.

Linear Regression Line


A linear line showing the relationship between the dependent and independent
variables is called a regression line. A regression line can show two types of
relationship:

o Positive Linear Relationship:


If the dependent variable increases on the Y-axis and independent variable increases
on X-axis, then such a relationship is termed as a Positive linear relationship.

o Negative Linear Relationship:


If the dependent variable decreases on the Y-axis and independent variable increases
on the X-axis, then such a relationship is called a negative linear relationship.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Finding the best fit line:


When working with linear regression, our main goal is to find the best fit line that
means the error between predicted values and actual values should be minimized. The
best fit line will have the least error.

ADVERTISEMENT

The different values for weights or the coefficient of lines (a0, a1) gives a different line
of regression, so we need to calculate the best values for a 0 and a1 to find the best fit
line, so to calculate this we use cost function.

Cost function-
o The different values for weights or coefficient of lines (a0, a1) gives the different line of
regression, and the cost function is used to estimate the values of the coefficient for
the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures how a linear
regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which
maps the input variable to the output variable. This mapping function is also known
as Hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is
the average of squared error occurred between the predicted values and actual values.
It can be written as:

For the above linear equation, MSE can be calculated as:

Where,

N=Total number of observation


Yi = Actual value
(a1xi+a0)= Predicted value.

Residuals: The distance between the actual value and predicted values is called
residual. If the observed points are far from the regression line, then the residual will
be high, and so cost function will high. If the scatter points are close to the regression
line, then the residual will be small and hence the cost function.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost
function.
o A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.
o It is done by a random selection of values of coefficient and then iteratively update the
values to reach the minimum cost function.

Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations.
The process of finding the best model out of various models is called optimization. It
can be achieved by below method:

1. R-squared method:

o R-squared is a statistical method that determines the goodness of fit.


o It measures the strength of the relationship between the dependent and independent
variables on a scale of 0-100%.
o The high value of R-square determines the less difference between the predicted values
and actual values and hence represents a good model.
o It is also called a coefficient of determination, or coefficient of multiple
determination for multiple regression.
o It can be calculated from the below formula:

Assumptions of Linear Regression


Below are some important assumptions of Linear Regression. These are some formal
checks while building a Linear Regression model, which ensures to get the best
possible result from the given dataset.

o Linear relationship between the features and target:


Linear regression assumes the linear relationship between the dependent and
independent variables.
o Small or no multicollinearity between the features:
Multicollinearity means high-correlation between the independent variables. Due to

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

multicollinearity, it may difficult to find the true relationship between the predictors
and target variables. Or we can say, it is difficult to determine which predictor variable
is affecting the target variable and which is not. So, the model assumes either little or
no multicollinearity between the features or independent variables.
o Homoscedasticity Assumption:
Homoscedasticity is a situation when the error term is the same for all the values of
independent variables. With homoscedasticity, there should be no clear pattern
distribution of data in the scatter plot.
o Normal distribution of error terms:
Linear regression assumes that the error term should follow the normal distribution
pattern. If error terms are not normally distributed, then confidence intervals will
become either too wide or too narrow, which may cause difficulties in finding
coefficients.
It can be checked using the q-q plot. If the plot shows a straight line without any
deviation, which means the error is normally distributed.
o No autocorrelations:
The linear regression model assumes no autocorrelation in error terms. If there will be
any correlation in the error term, then it will drastically reduce the accuracy of the
model. Autocorrelation usually occurs if there is a dependency between residual errors.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh

You might also like