
UNIT – 1 Introduction to Machine learning

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to
learn and improve from experience automatically, without being explicitly programmed. Machine
learning focuses on the development of computer programs that can access data and use it to learn
for themselves.


Traditional Learning v/s Machine Learning

Learning System
Types of Learning

Machine learning is sub-categorized into three types:

• Supervised Learning – Train me!

• Unsupervised Learning – I am self-sufficient in learning

• Reinforcement Learning – My life, my rules! (Hit & Trial)

What is Supervised Learning?

Supervised learning is the kind of learning you can think of as being guided by a teacher. We have
a dataset which acts as the teacher, and its role is to train the model or the machine. Once the model
is trained, it can start making predictions or decisions when new data is given to it.
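As a minimal sketch of this teacher/dataset idea, here is a 1-nearest-neighbour classifier in plain Python: the labelled dataset plays the role of the teacher, and predictions for new data come from the closest training example. The fruit measurements and labels below are invented purely for illustration; no particular library is assumed.

```python
# Minimal supervised learning: 1-nearest-neighbour classification.
# The labelled dataset plays the role of the "teacher".

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train, query):
    """Return the label of the training example closest to `query`.
    `train` is a list of (features, label) pairs."""
    features, label = min(train, key=lambda ex: distance(ex[0], query))
    return label

# Toy data: (weight_g, diameter_cm) -> fruit label (invented values).
train = [((150, 7.0), "apple"), ((160, 7.5), "apple"),
         ((400, 12.0), "melon"), ((420, 13.0), "melon")]

print(predict(train, (155, 7.2)))   # -> apple
print(predict(train, (410, 12.5)))  # -> melon
```

Once "trained" (here, simply storing the data), the model classifies any new measurement by analogy with what the teacher showed it.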

What is Unsupervised Learning?

The model learns through observation and finds structures in the data. Once the model is given a
dataset, it automatically finds patterns and relationships in the dataset by creating clusters in it. What
it cannot do is add labels to the clusters: it cannot say this is a group of apples and that a group of
mangoes, but it will separate all the apples from the mangoes.

Suppose we present images of apples, bananas and mangoes to the model. Based on patterns and
relationships in the images, it creates clusters and divides the dataset among them. When new data
is fed to the model, it adds it to one of the existing clusters.
What is Reinforcement Learning?

Reinforcement learning is the ability of an agent to interact with the environment and find out which
actions give the best outcome. It follows the concept of the hit-and-trial (trial-and-error) method. The
agent is rewarded or penalized with points for a correct or a wrong answer, and it trains itself on the
basis of the positive reward points gained. Once trained, it is ready to make predictions when new
data is presented to it.
Supervised Learning vs Unsupervised Learning

Well-Posed Learning Problems

Definition:

A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience
E.

Well-Posed Learning Problems: Examples

• A checkers learning problem

o Task T : playing checkers

o Performance measure P : percent of games won against opponents

o Training experience E : playing practice games against itself

• A handwriting recognition learning problem

– Task T : recognizing and classifying handwritten words within images

– Performance measure P : percent of words correctly classified

– Training experience E : a database of handwritten words with given classifications


• A robot driving learning problem

– Task T : driving on public four-lane highways using vision sensors

– Performance measure P : average distance traveled before an error (as judged by a
human overseer)

– Training experience E : a sequence of images and steering commands recorded while
observing a human driver

Designing a Learning System

• Choosing the Training Experience

• Choosing the Target Function

• Choosing a Representation for the Target Function

• Choosing a Function Approximation Algorithm

• The Final Design

Choosing the Training Experience

• Whether the training experience provides direct or indirect feedback regarding the choices
made by the performance system:

• Example:

– Direct training examples in learning to play checkers consist of individual checkers
board states and the correct move for each.

– Indirect training examples in the same game consist of the move sequences and final
outcomes of various games played in which information about the correctness of
specific moves early in the game must be inferred indirectly from the fact that the
game was eventually won or lost – credit assignment problem.

• The degree to which the learner controls the sequence of training examples:

• Example:

– The learner might rely on the teacher to select informative board states and to
provide the correct move for each

– The learner might itself propose board states that it finds particularly confusing and
ask the teacher for the correct move. Or the learner may have complete control over
the board states and (indirect) classifications, as it does when it learns by playing
against itself with no teacher present.

• How well it represents the distribution of examples over which the final system performance
P must be measured: In general learning is most reliable when the training examples follow
a distribution similar to that of future test examples.
• Example:

– If the training experience in playing checkers consists only of games played against itself,
the learner might never encounter certain crucial board states that are very likely to be played by a
human checkers champion. (Note, however, that most current theory of machine learning rests on
the crucial assumption that the distribution of training examples is identical to the distribution of
test examples.)

Choosing the Target Function

• To determine what type of knowledge will be learned and how this will be used by the
performance program:

• Example:

– In playing checkers, the system needs to learn to choose the best move among the legal
moves: ChooseMove : B -> M, which accepts as input any board from the set of legal
board states B and produces as output some move from the set of legal moves M.

• Since a target function such as ChooseMove turns out to be very difficult to learn given the
kind of indirect training experience available to the system, an alternative target function is
an evaluation function that assigns a numerical score to any given board state: V : B -> R.

Choosing a Representation for the Target Function

• Given the ideal target function V, we choose a representation that the learning system will
use to describe the function V' that it will learn:

• Example:

– In playing checkers,

V'(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

– where wi is a numerical coefficient (weight) determining the relative importance of the
corresponding board feature, and xi is the value of the i-th board feature (for example, the
number of pieces of a given kind on the board).

Choosing a Function Approximation Algorithm

• Each training example is given by <b, Vtrain(b)> where Vtrain(b) is the training value for a board
b.

• Estimating Training Values:

Vtrain(b) <- V' (Successor(b)).

• Adjusting the weights: To specify a learning algorithm that chooses the weights wi to best
fit the set of training examples {<b, Vtrain(b)>}, we minimize the squared error E between
the training values and the values predicted by the hypothesis V':

E = ∑ (Vtrain(b) − V'(b))^2

where the sum runs over the training examples <b, Vtrain(b)>.
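The weight adjustment itself is usually done with the LMS (least mean squares) update rule, wi <- wi + eta * (Vtrain(b) − V'(b)) * xi, which nudges each weight in the direction that reduces the squared error. The sketch below uses hypothetical feature values standing in for real board features such as piece counts:

```python
# LMS weight update for the linear evaluation function
# V'(b) = w0 + w1*x1 + ... + w6*x6 (the checkers example above).
# The feature values x1..x6 are placeholders; real features would be
# e.g. piece counts. eta is the learning rate.

def v_hat(w, x):
    """Linear evaluation: w[0] + sum(w[i] * x[i-1])."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_update(w, x, v_train, eta=0.01):
    """One LMS step: w_i <- w_i + eta * (Vtrain - V') * x_i."""
    error = v_train - v_hat(w, x)
    w = list(w)
    w[0] += eta * error            # bias term uses x0 = 1
    for i, xi in enumerate(x):
        w[i + 1] += eta * error * xi
    return w

w = [0.0] * 7                      # weights w0..w6, initially zero
x = [3, 2, 1, 0, 4, 1]             # hypothetical feature values for board b
for _ in range(200):               # repeated updates shrink the error
    w = lms_update(w, x, v_train=10.0)
print(round(v_hat(w, x), 2))       # -> 10.0
```

After repeated updates on the same example, the predicted value V'(b) converges to the training value Vtrain(b), exactly the behaviour the squared-error criterion asks for.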
The Final Design

• Performance System: To solve the given performance task by using the learned target
function(s). It takes an instance of a new problem (a new game) as input and produces a
trace of its solution (the game history) as output.

• Critic: To take as input the history or trace of the game and produce as output a set of training
examples of the target function.

• Generalizer: To take as input the training examples and produce an output hypothesis that is
its estimate of the target function. It generalizes from the specific training examples,
hypothesizing a general function that covers these examples and other cases beyond the
training examples.

• Experiment Generator: To take as input the current hypothesis (the currently learned
function) and produce as output a new problem (i.e., an initial board state) for the
Performance System to explore. Its role is to pick new practice problems that will maximize
the learning rate of the overall system.
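One way to picture how the four modules hand data to each other is the skeleton below. Every function body is a trivial stub invented only to show the data flow (problem -> trace -> training examples -> hypothesis -> next problem); it is not a working checkers learner.

```python
# Skeleton of the final design: four interacting modules.
# All bodies are placeholder stubs; only the data flow is the point.

def performance_system(board, hypothesis):
    """Plays a game from `board` using `hypothesis`; returns the trace."""
    return [board]                      # stub: a one-state game history

def critic(trace):
    """Turns a game history into <b, Vtrain(b)> training examples."""
    return [(b, 0.0) for b in trace]    # stub training values

def generalizer(examples):
    """Fits a hypothesis (here: the mean training value) to the examples."""
    return sum(v for _, v in examples) / len(examples)

def experiment_generator(hypothesis):
    """Proposes the next practice problem (here: always the start state)."""
    return "initial board"

hypothesis = 0.0
for _ in range(3):                      # one loop iteration = one practice game
    problem = experiment_generator(hypothesis)
    trace = performance_system(problem, hypothesis)
    examples = critic(trace)
    hypothesis = generalizer(examples)
```

The loop makes the division of labour explicit: the Critic never plays, the Generalizer never sees the board, and the Experiment Generator closes the cycle by choosing what to practise next.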

History of ML

1950s: Samuel’s Checker-Playing Program

1960s: Neural Network: Rosenblatt’s Perceptron (Inventor of ANN)

Pattern Recognition

Minsky & Papert Prove Limitations of Perceptron

1970s: Symbolic Concept Introduction

Expert Systems & Knowledge Acquisition Bottleneck

Quinlan’s ID3

NLP

1980s: Advanced Decision Trees & Rule Learning


Focus on experimental methodology

Resurgence of Neural Network

90s: ML & Statistics

Support Vector Machines

Data Mining

Adaptive Agents & Web Applications

Text Learning

Reinforcement Learning

Ensembles

Bayes Net Learning

1994: Self Driving Cars Road Test

1997: Deep Blue defeated Garry Kasparov in Chess Exhibition Match.

2009: Google Builds Self Driving Cars

2011: Watson wins Jeopardy

2014: ML systems surpass human-level performance on some vision tasks

ANN(ARTIFICIAL NEURAL NETWORK)

An Artificial Neural Network (ANN) has hidden layers, which let it respond to more complicated
tasks than the earlier perceptrons could. ANNs are a primary tool used for machine learning. Neural
networks use input and output layers and normally include one or more hidden layers designed to
transform the input into data that can be used by the output layer. The hidden layers are excellent at
finding patterns too complex for a human programmer to detect, meaning patterns a human could
not find and then teach the device to recognize.

An Artificial Neural Network is an information-processing paradigm inspired by the way biological
nervous systems, such as the brain, process information. It is composed of a large number of highly
interconnected processing elements (neurons) working in unison to solve a specific problem.

The general model of an ANN is inspired by the biological neuron.

A single-layer neural network is called a perceptron. It gives a single output.
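A minimal perceptron can be trained with the classic perceptron rule; the sketch below learns the AND function. Integer weights and a learning rate of 1 keep the arithmetic exact. This is a generic illustration, not tied to any particular library.

```python
# A single-layer perceptron learning the AND function.

def step(z):
    """Threshold activation: fire (1) when the weighted sum is >= 0."""
    return 1 if z >= 0 else 0

def train_perceptron(data, epochs=10, lr=1):
    """Perceptron rule: w <- w + lr * (target - output) * x."""
    w, b = [0, 0], 0                   # one weight per input, plus a bias
    for _ in range(epochs):
        for x, target in data:
            out = step(w[0] * x[0] + w[1] * x[1] + b)
            err = target - out
            w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
            b += lr * err
    return w, b

# Truth table of AND: output is 1 only when both inputs are 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
print([step(w[0] * x[0] + w[1] * x[1] + b) for x, _ in data])  # -> [0, 0, 0, 1]
```

AND is linearly separable, so the perceptron convergence theorem guarantees this training loop finds a separating line; for a non-separable function such as XOR it would fail, which is precisely the limitation Minsky and Papert proved and hidden layers later overcame.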


Clustering in Machine Learning

Clustering is a type of unsupervised learning method, i.e., a method in which we draw inferences
from datasets consisting of input data without labelled responses. Generally, it is used as a process
to find meaningful structure, explanatory underlying processes, generative features, and groupings
inherent in a set of examples.
Clustering is the task of dividing the population or data points into a number of groups such that
data points in the same group are more similar to each other than to data points in other groups. It
is basically a collection of objects grouped on the basis of their similarity and dissimilarity.
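A bare-bones illustration of such grouping is k-means clustering. The sketch below is plain Python on invented 1-D data (fruit weights in grams), with fixed initial centroids so the run is deterministic; real uses would start from random centroids and multi-dimensional features.

```python
# A bare-bones k-means clustering sketch (pure Python, fixed initial
# centroids so the run is deterministic).

def kmeans(points, centroids, iters=10):
    """Alternate assignment and centroid-update steps."""
    for _ in range(iters):
        # Assignment: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[j].append(p)
        # Update: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# Toy 1-D data: fruit weights in grams (invented values).
weights = [148, 152, 155, 300, 310, 305]
centroids, clusters = kmeans(weights, centroids=[100, 400])
print(clusters)   # -> [[148, 152, 155], [300, 310, 305]]
```

Note that the algorithm separates the light fruit from the heavy fruit without ever being told which is which: it finds the groups but cannot name them, exactly as described above.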

Applications of Clustering in different fields

Marketing : It can be used to characterize & discover customer segments for marketing purposes.

Biology : It can be used for classification among different species of plants and animals.

Libraries : It is used to cluster different books on the basis of topics and information.

Insurance : It is used to identify customer segments, analyze their policies and detect frauds.

City Planning : It is used to group houses and to study their values based on their geographical
locations and other factors.
Earthquake studies : By clustering earthquake-affected areas, we can determine the dangerous
zones.

Reinforcement Learning

Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize
reward in a particular situation. It is employed by various software and machines to find the best
possible behaviour or path to take in a specific situation. Reinforcement learning differs from
supervised learning in that supervised training data comes with the answer key, so the model is
trained with the correct answers themselves, whereas in reinforcement learning there is no answer
key and the reinforcement agent decides what to do to perform the given task. In the absence of a
training dataset, it is bound to learn from its own experience.
Example: We have an agent and a reward, with many hurdles in between. The agent is supposed to
find the best possible path to reach the reward.
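The hit-and-trial loop can be sketched with tabular Q-learning on a tiny corridor world. The environment, states and reward below are invented for illustration: the agent starts at position 0, the reward sits at position 4, and over repeated episodes the agent learns that moving right is the best policy.

```python
import random

# Tabular Q-learning on a tiny corridor: states 0..4, reward at state 4.
# Actions: 0 = left, 1 = right. Purely illustrative toy environment.

def step_env(state, action):
    """Move along the corridor; earn reward 1 on reaching state 4."""
    nxt = max(0, min(4, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == 4 else 0.0)

def greedy(q, s):
    """Best-known action for state s; ties broken randomly."""
    if q[(s, 0)] == q[(s, 1)]:
        return random.choice((0, 1))
    return 0 if q[(s, 0)] > q[(s, 1)] else 1

def q_learning(episodes=200, alpha=0.5, gamma=0.9, eps=0.1):
    random.seed(0)
    q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        while s != 4:                      # episode ends at the reward
            a = random.choice((0, 1)) if random.random() < eps else greedy(q, s)
            nxt, r = step_env(s, a)
            best_next = max(q[(nxt, 0)], q[(nxt, 1)])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = nxt
    return q

q = q_learning()
policy = [greedy(q, s) for s in range(4)]
print(policy)   # the greedy policy should move right everywhere: [1, 1, 1, 1]
```

No answer key is ever given: the agent only sees the reward signal, and the positive points earned on reaching state 4 propagate backwards until every state prefers the action leading toward the reward.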

Decision Tree Learning

Decision tree learning is one of the predictive modelling approaches used in statistics, data
mining and machine learning. It uses a decision tree (as a predictive model) to go from observations
about an item (represented in the branches) to conclusions about the item's target value (represented
in the leaves). Tree models where the target variable can take a discrete set of values are
called classification trees; in these tree structures, leaves represent class labels and branches
represent conjunctions of features that lead to those class labels. Decision trees where the target
variable can take continuous values (typically real numbers) are called regression trees. Decision trees
are among the most popular machine learning algorithms given their intelligibility and simplicity.

• Decision tree learning is a method for approximating discrete-valued target functions.

• The learned function is represented by a decision tree.

– A learned decision tree can also be re-represented as a set of if-then rules.

• Decision tree learning is one of the most widely used and practical methods for inductive
inference.

• It is robust to noisy data and capable of learning disjunctive expressions.

• Decision tree learning searches a completely expressive hypothesis space.

– It avoids the difficulties of restricted hypothesis spaces.

– Its inductive bias is a preference for small trees over large trees.

• Decision tree algorithms such as ID3 and C4.5 are very popular inductive inference
algorithms, and they have been successfully applied to many learning tasks.
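The quantity ID3 uses to pick the attribute to split on at each node is information gain, built on entropy. The sketch below computes both on an invented toy dataset (a miniature of the classic weather example); it illustrates the selection criterion, not a full tree builder.

```python
import math

# Entropy and information gain: the quantities ID3 uses to choose the
# attribute to split on at each node of the tree.

def entropy(labels):
    """H(S) = -sum p_i * log2(p_i) over the class proportions."""
    total = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(examples, attr):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    labels = [label for _, label in examples]
    values = {feats[attr] for feats, _ in examples}
    remainder = 0.0
    for v in values:
        subset = [label for feats, label in examples if feats[attr] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

# Toy examples: ({attribute: value}, class_label); the data is invented.
examples = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
            ({"outlook": "rain"}, "yes"), ({"outlook": "rain"}, "yes")]
print(information_gain(examples, "outlook"))   # -> 1.0 (a perfect split)
```

A gain of 1.0 means splitting on "outlook" removes all uncertainty about the class; ID3 would pick this attribute for the root, which is how the bias toward small trees arises in practice.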

Bayesian Networks

Bayesian networks are a type of Probabilistic Graphical Model that can be used to build models from
data and/or expert opinion.

They can be used for a wide range of tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time-series prediction and decision making under uncertainty. These
capabilities span the four major analytics disciplines: descriptive analytics, diagnostic analytics,
predictive analytics and prescriptive analytics.

They are also commonly referred to as Bayes nets, Belief networks and sometimes Causal networks.

A Bayes net is a model. It reflects the states of some part of a world that is being modeled and it
describes how those states are related by probabilities. The model might be of your house, or your
car, your body, your community, an ecosystem, a stock-market, etc. Absolutely anything can be
modeled by a Bayes net. All the possible states of the model represent all the possible worlds that can
exist, that is, all the possible ways that the parts or states can be configured. The car engine can be
running normally or giving trouble. Its tires can be inflated or flat. Your body can be sick or healthy,
and so on.
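A sketch of the idea with a two-node net (Rain -> WetGrass) and invented probabilities: the joint distribution factorizes into P(Rain) * P(Wet | Rain), and querying the model, e.g. "given the grass is wet, did it rain?", follows from Bayes' rule by enumerating the possible worlds.

```python
# A tiny Bayes net (Rain -> WetGrass) evaluated by brute-force
# enumeration. All probabilities are invented for illustration.

P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True: {True: 0.9, False: 0.1},   # P(Wet | Rain=True)
                    False: {True: 0.2, False: 0.8}}  # P(Wet | Rain=False)

def joint(rain, wet):
    """P(Rain = rain, Wet = wet) = P(rain) * P(wet | rain)."""
    return P_rain[rain] * P_wet_given_rain[rain][wet]

def p_rain_given_wet():
    """Bayes' rule by enumeration: P(Rain = True | Wet = True)."""
    num = joint(True, True)
    den = joint(True, True) + joint(False, True)
    return num / den

print(round(p_rain_given_wet(), 3))   # -> 0.529
```

Seeing wet grass raises the probability of rain from the prior 0.2 to about 0.53: the net's states (the possible worlds) and the probabilities linking them are all the model needs for this kind of reasoning under uncertainty.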

Support Vector Machines

Support Vector Machine (SVM) is a relatively simple supervised machine learning algorithm used for
classification and/or regression. It is preferred for classification but is sometimes very useful for
regression as well. Basically, SVM finds a hyperplane that creates a boundary between the types of
data. In 2-dimensional space, this hyperplane is nothing but a line.
In SVM, we plot each data item in the dataset in an N-dimensional space, where N is the number of
features/attributes in the data. Next, we find the optimal hyperplane to separate the data. So
inherently, SVM can only perform binary classification (i.e., choose between two classes); however,
there are various techniques for handling multi-class problems.

SVM works very well without any modifications for linearly separable data. Linearly Separable Data is
any data that can be plotted in a graph and can be separated into classes using a straight line.

A: Linearly Separable Data B: Non-Linearly Separable Data

We use Kernelized SVM for non-linearly separable data. Say, we have some non-linearly separable
data in one dimension. We can transform this data into two-dimensions and the data will become
linearly separable in two dimensions. This is done by mapping each 1-D data point to a
corresponding 2-D ordered pair.
So for any non-linearly separable data in any dimension, we can just map the data to a higher
dimension and then make it linearly separable. This is a very powerful and general transformation.
A kernel is nothing but a measure of similarity between data points. The kernel function in a
kernelized SVM tells you, given two data points in the original feature space, what the similarity is
between the points in the newly transformed feature space.
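As an illustration, the widely used RBF (Gaussian) kernel scores identical points as 1 and distant points as near 0, and the 1-D to 2-D lift described above can be as simple as x -> (x, x^2). This is a generic sketch, not tied to any particular SVM library.

```python
import math

# The RBF (Gaussian) kernel as a similarity measure between points:
# k(x, y) = exp(-gamma * ||x - y||^2). Identical points score 1;
# distant points score close to 0.

def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))         # -> 1.0 (identical)
print(rbf_kernel((0.0, 0.0), (5.0, 5.0)) < 1e-6)  # distant -> near zero

# The 1-D -> 2-D lift mentioned above: mapping x -> (x, x**2) can make
# points that interleave on a line separable by a straight line in 2-D.
def lift(x):
    return (x, x * x)

print(lift(3))   # -> (3, 9)
```

The practical benefit of the kernel is that a kernelized SVM never has to construct the higher-dimensional points explicitly: it only ever needs these pairwise similarity scores.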

Genetic Algorithms

Genetic Algorithms (GAs) are adaptive heuristic search algorithms belonging to the larger class of
evolutionary algorithms. Genetic algorithms are based on the ideas of natural selection and genetics.
They are an intelligent exploitation of random search, provided with historical data, to direct the
search into regions of better performance in the solution space. They are commonly used to generate
high-quality solutions for optimization and search problems.

Genetic algorithms simulate the process of natural selection, in which species that can adapt to
changes in their environment survive, reproduce, and pass on to the next generation. In simple words,
they simulate "survival of the fittest" among individuals of consecutive generations to solve a
problem. Each generation consists of a population of individuals, and each individual represents a
point in the search space and a possible solution. Each individual is represented as a string of
characters/integers/floats/bits. This string is analogous to a chromosome.

Foundation of Genetic Algorithms

Genetic algorithms are based on an analogy with genetic structure and behavior of chromosome of
the population. Following is the foundation of GAs based on this analogy –

Individuals in a population compete for resources and mates.

Those individuals who are most successful (fittest) mate to create more offspring than others.

Genes from the "fittest" parents propagate throughout the generation; that is, sometimes parents
create offspring that are better than either parent.

Thus, each successive generation is better suited to its environment.

Search space

The population of individuals is maintained within the search space. Each individual represents a
solution in the search space for the given problem. Each individual is coded as a finite-length vector
of components (analogous to a chromosome). These variable components are analogous to genes.
Thus, a chromosome (individual) is composed of several genes (variable components).

Fitness Score

A fitness score is assigned to every individual and shows its ability to "compete". Individuals with
optimal (or near-optimal) fitness scores are sought.

The GA maintains a population of n individuals (chromosomes/solutions) along with their fitness
scores. Individuals with better fitness scores are given more chances to reproduce than others: they
are selected to mate and produce better offspring by combining the chromosomes of the parents.
The population size is static, so room has to be created for new arrivals; some individuals die and
are replaced by new arrivals, eventually creating a new generation once all the mating opportunities
of the old population are exhausted. It is hoped that over successive generations better solutions
will arrive while the least fit die out.
Each new generation has, on average, more "good genes" than the individuals (solutions) of previous
generations; thus each new generation has better "partial solutions" than previous generations.
Once the offspring produced show no significant difference from the offspring produced by previous
populations, the population has converged, and the algorithm is said to have converged to a set of
solutions for the problem.
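The loop described above (selection, crossover, mutation, replacement) can be sketched on the classic OneMax toy problem, where a chromosome is a bit string and its fitness is simply the number of 1-bits. All parameters below are invented for illustration.

```python
import random

# A minimal genetic algorithm on the OneMax problem: evolve a bit
# string whose fitness is the number of 1s. Purely illustrative.

random.seed(0)
LENGTH, POP, GENS = 20, 30, 60

def fitness(chrom):
    return sum(chrom)

def select(pop):
    """Tournament selection: the fitter of two random individuals."""
    return max(random.sample(pop, 2), key=fitness)

def crossover(a, b):
    """Single-point crossover of two parent chromosomes."""
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.01):
    """Flip each gene (bit) with a small probability."""
    return [1 - g if random.random() < rate else g for g in chrom]

# Initial population of random chromosomes.
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

best = max(pop, key=fitness)
print(fitness(best))   # with this seed, at or near the optimum of 20
```

Each generation replaces the whole population with offspring of tournament winners, so "good genes" (1-bits) accumulate over the generations until the population converges near the all-ones optimum.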

Issues in Machine Learning

• What algorithms exist for learning general target functions from specific training examples ?

• How does the number of training examples influence accuracy ?

• When and how can prior knowledge held by the learner guide the process of generalizing
from examples ?

• What is the best strategy for choosing a useful next training experience, and how does the
choice of this strategy alter the complexity of the learning problem ?

• What is the best way to reduce the learning task to one or more function approximation
problems ?

• How can the learner automatically alter its representation to improve its ability to represent
and learn the target function ?

Data Science vs Machine Learning
