Unit 1 ML
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to
automatically learn and improve from experience without being explicitly programmed. Machine
learning focuses on the development of computer programs that can access data and use it to learn for
themselves.
Learning System
Types of Learning
Supervised Learning is learning in which the process is guided by a teacher. We have a dataset which acts as the teacher, and its role is to train the model or the machine. Once the model is trained, it can start making predictions or decisions when new data is given to it.
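A minimal sketch of this idea, assuming scikit-learn is available; the two-feature fruit dataset and its labels are made up for illustration.

```python
# Supervised learning sketch: labelled examples act as the "teacher",
# then the trained model predicts the label of unseen data.
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features: [weight_in_grams, colour_score]; labels supervise the learning.
X_train = [[150, 0.9], [170, 0.8], [130, 0.2], [120, 0.1]]
y_train = ["apple", "apple", "mango", "mango"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)            # training guided by the labels

print(model.predict([[160, 0.85]]))    # prediction on new, unseen data
```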
Unsupervised Learning
In unsupervised learning, the model learns through observation and finds structures in the data. Once the model is given a dataset, it automatically finds patterns and relationships in the dataset by creating clusters in it. What it cannot do is add labels to the clusters: it cannot say this is a group of apples or mangoes, but it will separate all the apples from the mangoes.
Suppose we presented images of apples, bananas and mangoes to the model. Based on some patterns and relationships, it creates clusters and divides the dataset into those clusters. Now if new data is fed to the model, it adds it to one of the created clusters.
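A minimal sketch of this clustering behaviour, assuming scikit-learn is available; the feature values below are made up, and the cluster indices carry no labels such as "apple" or "mango".

```python
# Unsupervised learning sketch using k-means clustering.
from sklearn.cluster import KMeans

# Hypothetical two-feature data points (no labels are given to the model).
X = [[150, 0.9], [155, 0.85], [120, 0.1], [118, 0.15], [300, 0.5], [310, 0.55]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                  # cluster index for each point (no names attached)
print(kmeans.predict([[152, 0.88]]))   # new data is assigned to one of the created clusters
```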
What is Reinforcement Learning?
It is the ability of an agent to interact with the environment and find out what the best outcome is. It follows the concept of trial and error. The agent is rewarded or penalized with a point for a correct or a wrong answer, and based on the positive reward points gained, the model trains itself. Again, once trained, it is ready to predict when new data is presented to it.
Definition:
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience
E.
• Whether the training experience provides direct or indirect feedback regarding the choices
made by the performance system:
• Example:
– Indirect training examples consist of the move sequences and final outcomes of various games played, in which information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost – the credit assignment problem.
• The degree to which the learner controls the sequence of training examples:
• Example:
– The learner might rely on the teacher to select informative board states and to
provide the correct move for each
– The learner might itself propose board states that it finds particularly confusing and
ask the teacher for the correct move. Or the learner may have complete control over
the board states and (indirect) classifications, as it does when it learns by playing
against itself with no teacher present.
• How well it represents the distribution of examples over which the final system performance
P must be measured: In general learning is most reliable when the training examples follow
a distribution similar to that of future test examples.
• Example:
– If the training experience in playing checkers consists only of games played against itself, the learner might never encounter certain crucial board states that are very likely to be played by the human checkers champion. (Note, however, that most current machine learning theory rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples.)
• To determine what type of knowledge will be learned and how this will be used by the
performance program:
• Example:
– In playing checkers, the program needs to learn to choose the best move among the legal moves: ChooseMove: B -> M, which accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.
• Since a target function such as ChooseMove turns out to be very difficult to learn given the kind of indirect training experience available to the system, an alternative target function is an evaluation function that assigns a numerical score to any given board state, V: B -> R.
• Given the ideal target function V, we choose a representation that the learning system will use to describe the function V' that it will learn:
• Example:
– In playing checkers, V' can be represented as a linear function of board features, V'(b) = w0 + w1x1(b) + ... + wnxn(b), where each feature xi(b) counts some property of the board (for example, the number of black or red pieces or kings) and the wi are numerical weights to be learned.
• Each training example is given by <b, Vtrain(b)>, where Vtrain(b) is the training value for a board b.
• Adjusting the weights: To specify the learning algorithm for choosing the weights wi that best fit the set of training examples {<b, Vtrain(b)>}, we minimize the squared error E between the training values and the values predicted by the hypothesis V':
E = ∑ (Vtrain(b) − V'(b))², where the sum is over the training examples <b, Vtrain(b)>.
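A minimal sketch of one way to tune the weights, assuming the linear representation V'(b) = w0 + w1x1(b) + ... + wnxn(b) above; the LMS-style update rule, learning rate and the toy feature vectors are illustrative choices, not a prescribed implementation.

```python
# LMS-style weight tuning for the checkers evaluation function V'(b).
# The board features x(b) and the training values Vtrain(b) below are made-up numbers.
ETA = 0.1  # learning rate

def v_hat(weights, features):
    """Current estimate V'(b) for a board described by its feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train):
    """One LMS step: w_i <- w_i + eta * (Vtrain(b) - V'(b)) * x_i."""
    error = v_train - v_hat(weights, features)
    weights[0] += ETA * error                      # bias term uses x0 = 1
    for i, x in enumerate(features, start=1):
        weights[i] += ETA * error * x
    return weights

# Hypothetical training pairs <b, Vtrain(b)>, with b given as a feature vector.
training = [([3, 0, 1], 100.0), ([0, 3, 0], -100.0)]
w = [0.0, 0.0, 0.0, 0.0]
for features, v_train in training:
    w = lms_update(w, features, v_train)
print(w)
```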
The Final Design
• Performance System: To solve the given performance task by using the learned target function(s). It takes an instance of a new problem (a new game) as input and produces a trace of its solution (the game history) as output.
• Critic: To take as input the history or trace of the game and produce as output a set of training
examples of the target function.
• Generalizer: To take as input the training examples and produce an output hypothesis that is
its estimate of the target function. It generalizes from the specific training examples,
hypothesizing a general function that covers these examples and other cases beyond the
training examples.
• Experiment Generator: To take as input the current hypothesis (the currently learned function) and output a new problem (i.e., an initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system.
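A schematic sketch of how these four modules could be wired into one training cycle; every function below is a made-up placeholder (the "game" is a trivial number-guessing stand-in, not checkers), intended only to show the flow of control between the modules.

```python
import random

def generate_problem(hypothesis):            # Experiment Generator: pick a new practice problem
    return random.randint(0, 10)

def play_game(state, hypothesis):            # Performance System: solve it with the learned function
    return [(state, hypothesis)]             # a trivial one-step "game trace"

def make_examples(trace):                    # Critic: turn the trace into training pairs
    TARGET = 5.0                             # made-up "true" value the critic can infer
    return [(state, TARGET) for state, _ in trace]

def fit_hypothesis(examples, old):           # Generalizer: fit a new hypothesis to the examples
    mean = sum(v for _, v in examples) / len(examples)
    return old + 0.5 * (mean - old)

hypothesis = 0.0
for _ in range(20):                          # the overall learning loop
    state = generate_problem(hypothesis)
    trace = play_game(state, hypothesis)
    examples = make_examples(trace)
    hypothesis = fit_hypothesis(examples, hypothesis)

print(hypothesis)                            # moves toward the critic's target value
```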
History of ML
• Pattern Recognition
• Quinlan’s ID3
• NLP
• Data Mining
• Text Learning
• Reinforcement Learning
• Ensembles
An Artificial Neural Network (ANN) has hidden layers which are used to respond to more complicated
tasks than the earlier perceptrons could. ANNs are a primary tool used for Machine Learning. Neural
networks use input and output layers and, normally, include a hidden layer (or layers) designed to
transform the input into data that can be used by the output layer. The hidden layers are excellent for
finding patterns too complex for a human programmer to detect, meaning a human could not find the
pattern and then teach the device to recognize it.
An Artificial Neural Network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve a specific problem.
The following diagram represents the general model of an ANN, which is inspired by a biological neuron. It is also called a perceptron.
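A minimal sketch of the perceptron just described: a weighted sum of inputs passed through a threshold activation. The weights and bias below are hand-picked so that the unit computes a logical AND; they are illustrative, not learned.

```python
# Perceptron sketch: weighted sum of inputs followed by a step activation.
def perceptron(inputs, weights, bias):
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= 0 else 0

# Example: a perceptron computing logical AND of two binary inputs.
w, b = [1.0, 1.0], -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, w, b))
```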
Clustering of this kind has many practical applications:
Marketing: It can be used to characterize and discover customer segments for marketing purposes.
Biology: It can be used for classification among different species of plants and animals.
Libraries: It is used for clustering different books on the basis of topics and information.
Insurance: It is used to group customers and their policies and to identify frauds.
City Planning: It is used to make groups of houses and to study their values based on their geographical locations and other factors.
Earthquake studies: By learning the earthquake-affected areas, we can determine the dangerous zones.
Reinforcement Learning
Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that, in supervised learning, the training data comes with the answer key, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer: the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.
Example: We have an agent and a reward, with many hurdles in between. The agent is supposed to find the best possible path to reach the reward.
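A minimal sketch of this trial-and-error idea using tabular Q-learning on a one-dimensional track (the hurdles are omitted for brevity); the environment, the reward of 1 at the goal cell and all learning parameters are made-up illustrative choices.

```python
import random

N_STATES, GOAL = 6, 5
ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.2, 500
ACTIONS = [-1, +1]                              # move left or move right

Q = [[0.0, 0.0] for _ in range(N_STATES)]       # Q[state][action]

for _ in range(EPISODES):
    state = 0
    while state != GOAL:
        if random.random() < EPSILON:           # explore (trial and error)
            a = random.randrange(2)
        else:                                   # exploit the best known action
            a = max((0, 1), key=lambda i: Q[state][i])
        nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if nxt == GOAL else 0.0    # reward only when the goal is reached
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][a])
        state = nxt

# After training, the greedy policy should choose "move right" (index 1) everywhere.
print([max((0, 1), key=lambda i: Q[s][i]) for s in range(GOAL)])
```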
Decision tree learning is one of the predictive modelling approaches used in statistics, data
mining and machine learning. It uses a decision tree (as a predictive model) to go from observations
about an item (represented in the branches) to conclusions about the item's target value (represented
in the leaves). Tree models where the target variable can take a discrete set of values are
called classification trees; in these tree structures, leaves represent class labels and branches
represent conjunctions of features that lead to those class labels. Decision trees where the target
variable can take continuous values (typically real numbers) are called regression trees. Decision trees
are among the most popular machine learning algorithms given their intelligibility and simplicity.
• Decision tree learning is one of the most widely used and practical methods for inductive
inference.
– Its inductive bias is a preference for small trees over large trees.
• The decision tree algorithms such as ID3 and C4.5 are very popular inductive inference algorithms, and they have been successfully applied to many learning tasks.
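A minimal sketch of decision tree learning, assuming scikit-learn is available; the tiny weather-style dataset is made up, and scikit-learn's entropy-based classifier stands in for ID3/C4.5 rather than reproducing them exactly.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [outlook (0=sunny, 1=rain), humidity (0=normal, 1=high)]
X = [[0, 1], [0, 0], [1, 0], [1, 1]]
y = ["no", "yes", "yes", "no"]       # target class labels at the leaves

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["outlook", "humidity"]))  # branches = feature tests
print(tree.predict([[0, 0]]))        # classify a new observation
```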
Bayesian Networks
Bayesian networks are a type of Probabilistic Graphical Model that can be used to build models from
data and/or expert opinion.
They can be used for a wide range of tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction and decision making under uncertainty. The figure below shows these capabilities in terms of the four major analytics disciplines: descriptive analytics, diagnostic analytics, predictive analytics and prescriptive analytics.
They are also commonly referred to as Bayes nets, Belief networks and sometimes Causal networks.
A Bayes net is a model. It reflects the states of some part of a world that is being modeled and it
describes how those states are related by probabilities. The model might be of your house, or your
car, your body, your community, an ecosystem, a stock-market, etc. Absolutely anything can be
modeled by a Bayes net. All the possible states of the model represent all the possible worlds that can
exist, that is, all the possible ways that the parts or states can be configured. The car engine can be running normally or giving trouble. Its tires can be inflated or flat. Your body can be sick or healthy, and so on.
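A minimal sketch of a Bayes net, using the classic rain/sprinkler/wet-grass example with made-up probability tables; the query is answered by brute-force enumeration rather than a dedicated inference library.

```python
# A tiny Bayes net: Rain -> Sprinkler, and (Rain, Sprinkler) -> WetGrass.
P_rain = {True: 0.2, False: 0.8}                        # P(Rain)
P_sprinkler = {True: {True: 0.01, False: 0.99},         # P(Sprinkler | Rain=True)
               False: {True: 0.4, False: 0.6}}          # P(Sprinkler | Rain=False)
P_wet_true = {(True, True): 0.99, (True, False): 0.9,   # P(Wet=True | Sprinkler, Rain)
              (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability P(Rain, Sprinkler, Wet) via the network's chain rule."""
    p_wet = P_wet_true[(sprinkler, rain)]
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * (p_wet if wet else 1 - p_wet)

# Diagnostic query: how likely is rain, given that the grass is observed to be wet?
numerator = sum(joint(True, s, True) for s in (True, False))
denominator = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(numerator / denominator)                          # P(Rain=True | Wet=True)
```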
Support Vector Machine (SVM) is a relatively simple Supervised Machine Learning Algorithm used for
classification and/or regression. It is preferred for classification but is sometimes very useful for
regression as well. Basically, SVM finds a hyper-plane that creates a boundary between the types of
data. In 2-dimensional space, this hyper-plane is nothing but a line.
In SVM, we plot each data item in the dataset in an N-dimensional space, where N is the number of features/attributes in the data. Next, we find the optimal hyperplane that separates the data. From this it follows that, inherently, SVM can only perform binary classification (i.e., choose between two classes). However, there are various techniques to use for multi-class problems.
SVM works very well without any modifications for linearly separable data. Linearly Separable Data is
any data that can be plotted in a graph and can be separated into classes using a straight line.
We use Kernelized SVM for non-linearly separable data. Say, we have some non-linearly separable
data in one dimension. We can transform this data into two-dimensions and the data will become
linearly separable in two dimensions. This is done by mapping each 1-D data point to a
corresponding 2-D ordered pair.
So for any non-linearly separable data in any dimension, we can just map the data to a higher
dimension and then make it linearly separable. This is a very powerful and general transformation.
A kernel is nothing but a measure of similarity between data points. The kernel function in a kernelized SVM tells you, given two data points in the original feature space, what the similarity is between the points in the newly transformed feature space.
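A minimal sketch contrasting a linear SVM with a kernelized (RBF) SVM on non-linearly separable data, assuming scikit-learn is available; the synthetic concentric-circles dataset and the parameters are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # poor: data not linearly separable
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))     # near 1.0 after the implicit mapping
```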
Genetic Algorithms
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger class of evolutionary algorithms. Genetic algorithms are based on the ideas of natural selection and genetics. They are an intelligent exploitation of random search, provided with historical data, to direct the search into the region of better performance in the solution space. They are commonly used to generate high-quality solutions for optimization problems and search problems.
Genetic algorithms simulate the process of natural selection, which means that those species which can adapt to changes in their environment are able to survive, reproduce and go on to the next generation. In simple words, they simulate “survival of the fittest” among individuals of consecutive generations to solve a problem. Each generation consists of a population of individuals, and each individual represents a point in the search space and a possible solution. Each individual is represented as a string of characters/integers/floats/bits. This string is analogous to the chromosome.
Genetic algorithms are based on an analogy with the genetic structure and behavior of chromosomes in a population. The following is the foundation of GAs, based on this analogy:
• Those individuals who are successful (fittest) mate to create more offspring than others.
• Genes from the “fittest” parents propagate throughout the generation; that is, sometimes parents create offspring which are better than either parent.
Search space
The population of individuals is maintained within the search space. Each individual represents a solution in the search space for the given problem. Each individual is coded as a finite-length vector (analogous to a chromosome) of components. These variable components are analogous to genes. Thus, a chromosome (individual) is composed of several genes (variable components).
Fitness Score
A fitness score is given to each individual, which shows the ability of that individual to “compete”. Individuals having an optimal (or near-optimal) fitness score are sought.
A GA maintains a population of n individuals (chromosomes/solutions) along with their fitness scores. The individuals having better fitness scores are given more chances to reproduce than others. The individuals with better fitness scores are selected to mate and produce better offspring by combining the chromosomes of the parents. The population size is static, so room has to be created for new arrivals. Thus, some individuals die and get replaced by new arrivals, eventually creating a new generation once all the mating opportunities of the old population are exhausted. It is hoped that over successive generations better solutions will arrive while the least fit die out.
Each new generation has, on average, more “good genes” than the individuals (solutions) of previous generations. Thus, each new generation has better “partial solutions” than previous generations. Once the offspring produced show no significant difference from the offspring produced by previous populations, the population has converged. The algorithm is then said to have converged to a set of solutions for the problem.
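A minimal sketch of a genetic algorithm on the toy “OneMax” problem, where each chromosome is a bit string and the fitness score is simply the number of 1s; the population size, rates and selection/crossover/mutation choices below are one simple illustrative configuration.

```python
import random

LENGTH, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 40, 0.01

def fitness(chrom):
    return sum(chrom)                            # number of 1s in the bit string

def select(population):
    # Tournament selection: fitter individuals get more chances to reproduce.
    return max(random.sample(population, 3), key=fitness)

def crossover(p1, p2):
    point = random.randrange(1, LENGTH)          # single-point crossover
    return p1[:point] + p2[point:]

def mutate(chrom):
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]      # the new generation replaces the old

best = max(population, key=fitness)
print(fitness(best), best)                       # converges towards the all-ones string
```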
Issues in Machine Learning
• What algorithms exist for learning general target functions from specific training examples?
• When and how can prior knowledge held by the learner guide the process of generalizing from examples?
• What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
• What is the best way to reduce the learning task to one or more function approximation problems?
• How can the learner automatically alter its representation to improve its ability to represent and learn the target function?