MLT UNIT-3 Notes

UNIT – 3

 Decision Tree:

o Decision Tree is a Supervised learning technique that can be used for both classification and Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.

o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or tests are performed based on the features of the given dataset.
o It is a graphical representation for getting all the
possible solutions to a problem/decision based on
given conditions.
o It is called a decision tree because, like a tree, it starts with
the root node, which expands on further branches and
constructs a tree-like structure.
o To build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question and based on the
answer (Yes/No), it further splits the tree into subtrees.
Decision Tree Terminology:

 Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.

 Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated
further after getting a leaf node.

 Splitting: Splitting is the process of dividing the decision node/root node into sub-
nodes according to the given conditions.

 Branch/Sub Tree: A subtree formed by splitting a node of the tree.

 Pruning: Pruning is the process of removing unwanted branches from the tree.

 Parent/Child node: The root node of the tree is called the parent node, and other
nodes are called the child nodes.

Example:

Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an attribute selection measure, ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
 Attribute selection measures used in decision trees are:
1. Entropy: Entropy is a metric that measures the impurity in an attribute. It specifies the randomness in the data. For a two-class problem, the value of entropy ranges from 0 to 1.
Entropy can be calculated as:
Entropy(S) = -P(Yes) log₂ P(Yes) - P(No) log₂ P(No)
where,
S -> the set of samples
P(Yes) -> probability of Yes
P(No) -> probability of No
A log function of base 2 is used because entropy is measured in bits.

2. Information Gain:
Information gain is the difference between before and after a split
on a given attribute. It measures how much information a feature
provides about a target.

Constructing a decision tree is about repeatedly finding the feature that returns the highest information gain. At each step, the feature with the highest information gain produces the best split, classifying the training dataset better according to the target variable.

Information gain has the following formula:

Gain(S, a) = Entropy(S) - Σ (|Sv| / |S|) × Entropy(Sv), summed over every value v of attribute a

Where:

 a is the specific attribute or class label.
 Entropy(S) is the entropy of dataset S.
 |Sv| / |S| is the proportion of examples in S for which attribute a takes the value v (the size of the subset Sv relative to the size of S).

3. Gain Ratio:
Gain Ratio or Uncertainty Coefficient is used to normalize the
information gain of an attribute against how much entropy that
attribute has. The information gain measure is biased towards tests
with many outcomes.
The formula for gain ratio is given by:

Gain Ratio = Information Gain / Split Information

where Split Information is the entropy of the attribute's own distribution of values (it grows when an attribute splits the data into many small branches).
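To make these three measures concrete, here is a minimal Python sketch (not part of the original notes). It assumes the dataset is represented as a list of dictionaries with a discrete target column named "target"; the function and column names are illustrative only.

```python
# Hedged sketch of the three attribute selection measures described above.
import math
from collections import Counter

def entropy(rows, target="target"):
    """Entropy(S) = -sum(p * log2(p)) over the target values present in S."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target="target"):
    """Gain(S, a) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv))."""
    total = len(rows)
    gain = entropy(rows, target)
    for value in set(row[attribute] for row in rows):
        subset = [row for row in rows if row[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset, target)
    return gain

def split_information(rows, attribute):
    """SplitInfo(S, a) = -sum(|Sv|/|S| * log2(|Sv|/|S|))."""
    total = len(rows)
    counts = Counter(row[attribute] for row in rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(rows, attribute, target="target"):
    """Gain Ratio = Information Gain / Split Information."""
    si = split_information(rows, attribute)
    return information_gain(rows, attribute, target) / si if si > 0 else 0.0
```

On the 14-example weather dataset used later in this unit, information_gain(rows, "Outlook") would come out to roughly 0.24.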


 ID3 Algorithm:

ID3 stands for Iterative Dichotomiser 3 and is named so because the algorithm iteratively (repeatedly) dichotomizes (divides) features into two or more groups at each step. ID3 was invented by Ross Quinlan to generate a decision tree from a dataset, and it is one of the most popular algorithms used to construct trees.

ID3 is the core algorithm for building a decision tree. It employs a top-down greedy search through the space of all possible branches with no backtracking. This algorithm uses information gain and entropy to construct a classification decision tree.

Steps:

1. Calculate the entropy of the whole dataset.

2. For each attribute:

Calculate the entropy of each of its categorical values.

Calculate the information gain for the attribute.

3. Find the attribute with the maximum information gain and split on it.

4. Repeat on each branch until the tree is complete (every branch ends in a leaf node, or no attributes remain).
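A rough sketch of this recursive loop, assuming categorical features stored as a list of dictionaries and reusing the information_gain helper from the earlier sketch (names are illustrative):

```python
# Hedged sketch of the ID3 steps above (not the original author's code).
from collections import Counter

def id3(rows, attributes, target="target"):
    labels = [row[target] for row in rows]
    # Stop if the node is pure or no attributes remain (return a leaf label).
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-3: pick the attribute with the maximum information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    # Step 4: split on each value of the chosen attribute and recurse.
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```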

Example Of ID3 Algorithm:


Suppose we had the following dataset of 14 weather observations (with the attributes Outlook, Temp, Humidity, and Wind, and the target Play Volleyball):

From the above example dataset, we are required to construct a decision tree to help us decide whether we should play volleyball based on the weather conditions.

To construct a decision tree, we need to pick the features that will best guide us to make a viable decision on whether we should play or not play volleyball.

We can't randomly select a feature from the dataset to build the tree, so entropy and information gain are good criteria for this problem.

To begin with, we have four features we need to consider:


 Outlook
 Temp
 Humidity
 Wind

Finding the root node feature

Since we cannot just pick one of the features to start our decision
tree, we need to make calculations to get the feature with the
highest information gain from which we start splitting.

Calculate the entropy of the entire dataset (Entropy(S))

We can see that we have 5 Noes (negatives) and 9 Yeses (positives). The total number of entries is 14.
The entropy of the whole dataset is:

Entropy(S) = -(9/14) log₂(9/14) - (5/14) log₂(5/14) ≈ 0.94
Calculate the information gain for the Outlook feature

Outlook has 3 values:

 Sunny
 Overcast
 Rain

So, we will calculate the entropy of each of the corresponding subsets (Sv) as follows:

We have 5 Sunny examples for Outlook:

 3 negative Sunny Outlooks (when Play Volleyball is No).
 2 positive Sunny Outlooks (when Play Volleyball is Yes).

Let's calculate:

Entropy(S_Sunny) = -(2/5) log₂(2/5) - (3/5) log₂(3/5) ≈ 0.97

We have 4 Overcast examples for Outlook:

 0 negative Overcast Outlooks.
 4 positive Overcast Outlooks.

Let's calculate:

Entropy(S_Overcast) = -(4/4) log₂(4/4) - 0 = 0

We have 5 Rain examples for Outlook:

 2 negative Rain examples.
 3 positive Rain examples.

Let's calculate:

Entropy(S_Rain) = -(3/5) log₂(3/5) - (2/5) log₂(2/5) ≈ 0.97

The information gain for Outlook

We have the following entropies:

 Entropy(S) = 0.94
 Entropy(S_Sunny) = 0.97
 Entropy(S_Overcast) = 0
 Entropy(S_Rain) = 0.97

We use the formula for information gain to calculate the gain. So:

Gain(S, Outlook) = 0.94 - (5/14)(0.97) - (4/14)(0) - (5/14)(0.97) ≈ 0.24

The information gain for Outlook is 0.24.

Similarly, we must calculate the information gain for the other features.
Calculate the information gain for the Temp feature.

Temp has 3 values:

 Hot
 Mild
 Cool

Since we already have the entropy for the entire dataset (Entropy(S)), we will calculate the entropy of each value (Entropy(Sv)) of Temp, just as we did with Outlook.

The entropy of Hot:

The entropy of Mild:

The entropy of Cool:

Calculate the information gain for the Temp feature:

Information gain for Temp is 0.03.

Similarly, calculate the information gain for Humidity and Wind. All
information gain values will be:

 Gain(S, Outlook) = 0.24


 Gain(S, Temp) = 0.03
 Gain(S, Humidity) = 0.15
 Gain(S, Wind) = 0.04
💡 Outlook gives the highest information gain about our target variable. It will act as the root node of our tree, from where the splitting will begin.

Note that for the Sunny and Rain branches we cannot immediately conclude a Yes or a No, since each contains events where Play Volleyball is Yes and events where it is No. Their entropy is therefore greater than zero (they are impure), so we need to split them further.

💡 Overcast is a branch with zero entropy, since all of its events have Play Volleyball = Yes, so it automatically becomes a leaf node.

Finding the internal nodes

We will calculate information gain for the rest of the features when
the Outlook is Sunny and when the Outlook is Rain:
Splitting on the Sunny attribute

Calculate the information gain for Temp.

Values (Temp) = Hot, Mild, Cool.

The entropy for Hot:

The entropy for Mild:

The entropy for Cool:

The Information gain for Temp:

Calculate the information gain for Humidity


Values (Humidity) = High, Normal.

The entropy for Sunny:

Entropy(S_Sunny) = 0.97

The entropy for High:

Entropy(S_High) = 0

The entropy for Normal:

Entropy(S_Normal) = 0

The Information gain for Humidity:

Gain(S_Sunny, Humidity) = 0.97 - (|S_High| / |S_Sunny|)(0) - (|S_Normal| / |S_Sunny|)(0) = 0.97

Calculate the information gain for Wind.

Values (Wind) = Strong, Weak.

The entropy for Sunny:

Entropy(S_Sunny) = 0.97

The entropy for Strong:

Entropy(S_Strong) = 1.0

The entropy for Weak:


The Information gain for Wind:


Humidity gives the highest information gain value (0.97) on the Sunny branch. So far, our tree will look like this:
Splitting on the Rain attribute

Calculate the information gain for Temp.

Values (Temp) = Mild, Cool

The entropy for Mild:

The entropy for Cool:

Entropy(S_Cool) = 1.0

The information gain for Temp:

Calculate the information gain for Humidity.

Values (Humidity) = High, Normal.

The entropy for High:

Entropy(S_High) = 1.0


The Entropy for Normal:

The Information gain for Humidity:

Calculate the information gain for Wind.

Values (Wind) = Strong, Weak.

The entropy for Strong:

Entropy(S_Strong) = 0

The entropy for Weak:

Entropy(S_Weak) = 0

The information gain for Wind:

Gain(S_Rain, Wind) = Entropy(S_Rain) - (|S_Strong| / |S_Rain|)(0) - (|S_Weak| / |S_Rain|)(0) = 0.97

Wind gives the highest information gain value (0.97). Now we can
complete our Decision Tree.
A complete decision tree with Entropy and Information
gain criteria:

Issues in Decision Tree Learning:


Decision tree learning is a popular machine learning algorithm used for both
classification and regression tasks. Like any algorithm, decision trees come with
their own set of challenges and issues. Here are some common issues associated
with decision tree learning:
1. Overfitting:
Decision trees are prone to overfitting, especially when they are deep and capture
noise in the training data. Overfitting occurs when a model learns the training data
too well, including the noise and outliers, and performs poorly on new, unseen
data.
2. High Variance:
Decision trees can have high variance, meaning that small changes in the training
data can result in significantly different tree structures. This can lead to instability
in the model.
3. Sensitivity to Small Variations in Data:
Small changes in the input data can lead to different tree structures. This sensitivity
can make decision trees less robust, especially when dealing with noisy or
imprecise data.
4. Bias towards Dominant Classes:
In classification tasks with imbalanced class distributions, decision trees may have
a bias towards the dominant class. They might perform well on the majority class
but struggle to accurately predict instances from the minority class.
5. Limited Expressiveness:
Decision trees may not be expressive enough to capture complex relationships in
the data. They are considered "weak learners" compared to some other algorithms.
6. Difficulty Handling Missing Data:
Decision trees can struggle with datasets that have missing values. The way
missing data is handled (or not handled) can affect the performance of the model.
7. Lack of Interpretability for Deep Trees:
While decision trees are generally interpretable, deep trees can become complex
and difficult to interpret. This can be a challenge when trying to explain the model
to non-experts.
8. Computational Complexity:
Building a decision tree involves recursively splitting the dataset, which can
become computationally expensive, especially for large datasets or deep trees. This
complexity can affect both training and prediction times.

Mitigation Strategies:
1. Pruning:
Pruning involves removing branches from the tree that do not provide significant
predictive power. This helps to reduce overfitting and make the tree more
generalizable.
2. Minimum Samples per Leaf or Split:
Setting a minimum number of samples required to make a split or form a leaf node
can help control the tree's depth and mitigate overfitting.
3. Feature Selection:
Carefully selecting relevant features and avoiding irrelevant ones can improve the
tree's ability to generalize to new data.
4. Ensemble Methods:
Using ensemble methods like Random Forests or Gradient Boosting can improve
the overall performance and robustness of decision trees by combining multiple
trees.
5. Handling Imbalanced Data:
Techniques like resampling, using different evaluation metrics, or using specialized
algorithms can address issues related to imbalanced class distributions.
6. Feature Engineering:
Preprocessing the data and engineering informative features can enhance the
performance of decision trees.
7. Cross-Validation:
Employing techniques like cross-validation helps to assess the model's
performance on different subsets of the data, reducing the risk of overfitting.
8. Hyperparameter Tuning:
Tuning the hyperparameters of the decision tree, such as the maximum depth,
minimum samples per leaf, and others, can significantly impact the model's
performance.
By carefully addressing these issues and applying appropriate mitigation strategies,
decision trees can be powerful and effective models in machine learning.
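The strategies above map directly onto common library options. Here is a hedged example, assuming scikit-learn (the dataset and parameter values are illustrative only):

```python
# Sketch of several mitigation strategies: limited depth, minimum samples
# per leaf, cost-complexity pruning, and cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    max_depth=4,          # cap the tree's depth (hyperparameter tuning)
    min_samples_leaf=5,   # minimum samples per leaf (controls overfitting)
    ccp_alpha=0.01,       # cost-complexity pruning
    random_state=0,
)

# Cross-validation: assess performance on different subsets of the data.
scores = cross_val_score(tree, X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())
```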

 Instance Based Learning:


Instance-based learning, often associated with k-Nearest Neighbors (k-NN)
algorithms, is a type of lazy learning approach. Instead of building a model
during the training phase, instance-based learning stores the training
instances and makes predictions based on the similarity between new
instances and the stored examples. In the case of k-NN, the "k" nearest
neighbors in the training data are used to determine the prediction for a new
instance.
While these approaches are typically used independently, you might use an
instance-based method to generate predictions for instances and then use
those predictions as features in a decision tree. However, this is more of an
ensemble learning approach rather than directly combining instance-based
learning with decision tree learning.

Here's a simplified example:

1. Use k-Nearest Neighbors to predict labels for instances in your dataset.

2. Use these predicted labels, along with the other original features, as input to a decision tree.
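A rough sketch of this two-step idea, assuming scikit-learn (the dataset and parameters are illustrative; out-of-fold predictions are used so the tree does not simply memorize the k-NN output):

```python
# Sketch: k-NN predictions appended as an extra feature for a decision tree.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 1: k-NN predictions, generated out-of-fold to avoid label leakage.
knn_preds = cross_val_predict(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

# Step 2: use the predictions, plus the original features, to fit a tree.
X_augmented = np.column_stack([X, knn_preds])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_augmented, y)
```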
Ensemble methods, such as Random Forests, can be considered as a
combination of decision trees, but they typically don't integrate instance-
based learning directly.
It's essential to carefully consider the nature of your data and the problem
you're trying to solve when choosing and combining different machine
learning techniques. Each method has its strengths and weaknesses, and the
effectiveness of the combination will depend on the specific characteristics
of your dataset and the goals of your machine learning task.

 Inductive Inference:

Inductive inference in machine learning is the process of learning patterns,


relationships, or rules from data to make predictions or decisions on new,
unseen data. The goal is to generalize from specific examples in the training
set to make accurate predictions on previously unseen instances. One
common approach for inductive inference is to use machine learning
algorithms that can automatically identify and capture patterns in the data.

In supervised learning, a prevalent form of inductive inference, the


algorithm learns from a labeled training dataset, where input features are
associated with corresponding output labels. The learned model aims to map
input features to the correct output labels, enabling predictions on new,
unseen data.

Decision trees, support vector machines, neural networks, and k-Nearest


Neighbors are examples of machine learning algorithms that engage in
inductive inference. The success of inductive inference depends on factors
like the quality and representativeness of the training data, the chosen
algorithm, and the appropriate tuning of hyperparameters.

 K- Nearest Neighbor’s (KNN algorithm):

o K-Nearest Neighbors is one of the simplest Machine Learning


algorithms based on Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category by using the K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as for
Classification but mostly it is used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make
any assumption on underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and, at the time of classification, performs an action on it.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category most similar to the new data.

Suppose there are two categories, Category A and Category B, and we have a new data point x1; we need to determine which of these categories this data point belongs to. To solve this type of problem, we need the K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:

 Steps Of K-NN Algorithm:


The K-NN working can be explained based on the below algorithm:
o Step-1: Select the number K of the neighbors.
o Step-2: Calculate the Euclidean distance of K number of neighbors.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
o Step-4: Among these k neighbors, count the number of data points in
each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready.
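A from-scratch sketch of these six steps (illustrative names and toy data only):

```python
# Minimal K-NN: Euclidean distances, the K nearest neighbours, majority vote.
import math
from collections import Counter

def knn_classify(train_points, train_labels, new_point, k=3):
    # Step-2: Euclidean distance from the new point to every stored point.
    distances = [
        (math.dist(new_point, p), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Step-3: take the K nearest neighbours.
    nearest = sorted(distances)[:k]
    # Steps 4-5: count the categories and assign the majority category.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two categories, A and B, and a new data point x1.
train_points = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_points, train_labels, (2, 2), k=3))  # -> "A"
```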

 Locally Weighted Regression:

Locally Weighted Regression (LWR), also known as Locally Weighted


Scatterplot Smoothing (LOESS), is a non-parametric regression technique
used in machine learning and statistics. It is particularly useful when
dealing with non-linear relationships between variables. LWR gives more
weight to data points that are close to the target point during the training
process, allowing the model to focus on the local behavior of the data.

How Locally Weighted Regression works:

 Basic Idea:
For each prediction, LWR assigns different weights to the data points
based on their proximity to the point where the prediction is
being made. Points closer to the prediction point receive higher
weights, while points farther away receive lower weights.

 Weighting Function:
The weights are assigned using a weighting function, which is typically a Gaussian (bell-shaped) function:

w^i = exp( -(x^i - x)^2 / (2τ^2) )

where,
x^i is the feature value of the data point,
x is the feature value of the prediction point, and
τ is a bandwidth parameter that controls the width of the weighting function.
 Local Regression:
LWR fits a regression model locally for each prediction point using the
weighted data. The weights are incorporated into the regression
algorithm to give more importance to nearby points.
 Prediction:
To make a prediction at a new point, the model computes a weighted
least squares regression using only the data points close to the
prediction point.
 Bandwidth Parameter:
The bandwidth parameter (τ) is crucial in controlling the degree of
locality. A smaller bandwidth focuses more on local details, but it may
lead to overfitting, while a larger bandwidth considers more global
patterns.

Pros and Cons:

Pros: LWR is flexible and can capture complex, non-linear relationships in


the data. It is adaptive to local patterns.

Cons: LWR may be computationally expensive for large datasets. The


choice of the bandwidth parameter is critical and can affect the model's
performance.

LWR is often used in situations where the underlying relationship between


variables is expected to change across different regions of the input
space. It's worth noting that while LWR can provide accurate predictions
in certain scenarios, it may not be the best choice for all types of data.
The choice of the bandwidth parameter is an important consideration, and
it may require some tuning to achieve optimal performance.
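A short NumPy sketch of locally weighted regression for a single feature, using the Gaussian weighting function above and a weighted least-squares fit around each query point (the data and parameter values are illustrative):

```python
# Locally weighted regression: fit a weighted linear model per query point.
import numpy as np

def lwr_predict(x_train, y_train, x_query, tau=0.5):
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x_train), x_train])
    # Gaussian weights: points near x_query get weights close to 1.
    w = np.exp(-((x_train - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y_train
    return theta[0] + theta[1] * x_query

# Toy example on a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 6, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
print(lwr_predict(x, y, x_query=3.0, tau=0.5))
```

A smaller tau makes the fit more local (and more prone to overfitting); a larger tau approaches an ordinary global linear regression.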
 Radial Basis Function networks:

A Radial Basis Function Network (RBFN) is a particular


type of neural network. In this article, I’ll be describing
its use as a non-linear classifier.

Generally, when people talk about neural networks or


“Artificial Neural Networks” they are referring to
the Multilayer Perceptron (MLP). Each neuron in an MLP
takes the weighted sum of its input values. That is, each
input value is multiplied by a coefficient, and the results
are all summed together. A single MLP neuron is a
simple linear classifier, but complex non-linear classifiers
can be built by combining these neurons into a network.

To me, the RBFN approach is more intuitive than the


MLP. An RBFN performs classification by measuring the
input’s similarity to examples from the training set. Each
RBFN neuron stores a “prototype”, which is just one of
the examples from the training set. When we want to
classify a new input, each neuron computes the
Euclidean distance between the input and its prototype.
Roughly speaking, if the input more closely resembles
the class A prototypes than the class B prototypes, it is
classified as class A.
 RBF Network Architecture

The above illustration shows the typical architecture of


an RBF Network. It consists of an input vector, a layer of
RBF neurons, and an output layer with one node per
category or class of data.

 The Input Vector

The input vector is the n-dimensional vector that you are


trying to classify. The entire input vector is shown to
each of the RBF neurons.

 The RBF Neurons

Each RBF neuron stores a “prototype” vector which is


just one of the vectors from the training set. Each RBF
neuron compares the input vector to its prototype, and
outputs a value between 0 and 1 which is a measure of
similarity. If the input is equal to the prototype, then the
output of that RBF neuron will be 1. As the distance
between the input and prototype grows, the response
falls off exponentially towards 0. The shape of the RBF
neuron’s response is a bell curve, as illustrated in the
network architecture diagram.

 The neuron’s response value is also called its


“activation” value.

 The prototype vector is also often called the neuron’s


“center” since it’s the value at the center of the bell
curve.

 The Output Nodes

The output of the network consists of a set of nodes, one


per category that we are trying to classify. Each output
node computes a sort of score for the associated
category. Typically, a classification decision is made by
assigning the input to the category with the highest
score.

 The score is computed by taking a weighted sum of the


activation values from every RBF neuron. By weighted
sum we mean that an output node associates a weight
value with each of the RBF neurons and multiplies the
neuron’s activation by this weight before adding it to the
total response.

 Because each output node is computing the score for a


different category, every output node has its own set of
weights. The output node will typically give a positive
weight to the RBF neurons that belong to its category,
and a negative weight to the others.

 RBF Neuron Activation Function

 Each RBF neuron computes a measure of the similarity


between the input and its prototype vector (taken from
the training set). Input vectors which are more similar to
the prototype return a result closer to 1. There are
different possible choices of similarity functions, but the
most popular is based on the Gaussian. Below is the equation for a Gaussian with a one-dimensional input:

f(x) = (1 / (sigma * sqrt(2 * pi))) * exp( -(x - mu)^2 / (2 * sigma^2) )

 Where x is the input, mu is the mean, and sigma is the standard deviation. This produces the familiar bell curve, which is centered at the mean mu (in the original plot, the mean is 5 and sigma is 1).

 The RBF neuron activation function is slightly different, and is typically written as:

phi(x) = exp( -beta * ||x - mu||^2 )

 In the Gaussian distribution, mu refers to the mean of the distribution. Here, it is the prototype vector which is at the center of the bell curve.
 For the activation function, phi, we aren’t directly
interested in the value of the standard deviation, sigma,
so we make a couple simplifying modifications.

 The first change is that we’ve removed the outer


coefficient, 1 / (sigma * sqrt (2 * pi)). This term normally
controls the height of the Gaussian. Here, though, it is
redundant with the weights applied by the output nodes.
During training, the output nodes will learn the correct
coefficient or “weight” to apply to the neuron’s
response.

 The second change is that we’ve replaced the inner


coefficient, 1 / (2 * sigma^2), with a single parameter
‘beta’. This beta coefficient controls the width of the bell
curve. Again, in this context, we don’t care about the
value of sigma, we just care that there’s some
coefficient which is controlling the width of the bell
curve. So, we simplify the equation by replacing the
term with a single variable.

 RBF Neuron activation for different values of beta


 There is also a slight change in notation here when we
apply the equation to n-dimensional vectors. The double
bar notation in the activation equation indicates that we
are taking the Euclidean distance between x and mu and
squaring the result. For the 1-dimensional Gaussian, this
simplifies to just (x - mu) ^2.

 It’s important to note that the underlying metric here for


evaluating the similarity between an input vector and a
prototype is the Euclidean distance between the two
vectors.

 Also, each RBF neuron will produce its largest response


when the input is equal to the prototype vector. This
allows us to take it as a measure of similarity and sum
up the results from all the RBF neurons.

 As we move out from the prototype vector, the response falls off exponentially. Recall from the RBFN architecture illustration that the output node for each category takes the weighted sum of every RBF neuron in the network; in other words, every neuron in the network will have some influence over the classification decision. The exponential fall-off of the activation function, however, means that the neurons whose prototypes are far from the input vector will contribute very little to the result.
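A small sketch of the activation function phi and the weighted-sum output scores described above, assuming NumPy (the prototypes, weights, and beta value are illustrative):

```python
# RBF neuron activation and per-category output scores.
import numpy as np

def rbf_activation(x, prototype, beta=1.0):
    # phi(x) = exp(-beta * ||x - mu||^2): 1 at the prototype, falling toward 0.
    return np.exp(-beta * np.sum((x - prototype) ** 2))

def output_scores(x, prototypes, output_weights, beta=1.0):
    # Each output node takes a weighted sum of all RBF neuron activations.
    activations = np.array([rbf_activation(x, p, beta) for p in prototypes])
    return output_weights @ activations  # one score per category

# Two prototypes per class, and one output node per class (A and B).
prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
weights = np.array([[1.0, 1.0, -1.0, -1.0],   # class A output weights
                    [-1.0, -1.0, 1.0, 1.0]])  # class B output weights
scores = output_scores(np.array([0.5, 0.5]), prototypes, weights, beta=1.0)
print("Predicted class:", "A" if scores[0] > scores[1] else "B")
```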

 Case Based Learning:

Case based format encourages active learning and demonstrates how to apply
theoretical concepts to surgical practice.

1. Can be an element of curriculum.


2. Based on issue(s) that arise in a clinical case
3. Self-directed or structured
4. Structure depends on the level of the learner.
Case based learning instruction is one of the learner-oriented teaching approaches, since it promotes students' active participation so that they can construct their own learning. It helps students transfer knowledge and expectations from their learning.

It is often defined as a teaching method which requires students to actively


participate in real or hypothetical problem situations, reflecting the kinds of
experiences naturally encountered in the discipline under study.
Cases are stories with a message which students analyze and consider the
solutions of these stories.

Functions of case-based learning algorithm are as follows:

1. Preprocessor: This prepares the input for processing, e.g., normalizing the range of numeric-valued features to ensure that they are treated with equal importance by the similarity function, formatting the raw input into a set of cases, etc.
2. Similarity: This function assesses the similarity of a given case with the previously stored cases in the concept description. Assessment may involve explicit encoding or dynamic computation; most practical CBL similarity functions are a compromise along the continuum between these two extremes.
3. Prediction: This function inputs the similarity assessments and generates a prediction for the value of the given case's goal feature (i.e., a classification when the goal feature has symbolic values).
4. Memory Updating: This updates the stored case base, e.g., by modifying or abstracting previously stored cases, forgetting cases presumed to be noisy, or updating a feature's relevance-weight settings.
 Case-based learning cycle with different schemes of CBL:

1. Case retrieval: After the problem situation has been assessed, the best
matching case is searched in the case base and an approximate solution is
retrieved.
2. Case adaptation: The retrieved solution is adapted to better fit the new problem.
3. Solution evaluation: The adapted solution can be evaluated either before the
solution is applied to the problem or after the solution has been applied. In any
case, if the accomplished result is not satisfactory, the retrieved solution must be
adapted again or more cases should be retrieved.
4. Case-based updating: If the solution was verified as correct, the new case
may be added to the case base.
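An illustrative sketch of this retrieve / adapt / evaluate / update cycle, assuming each case is a (problem-features, solution) pair and plain Euclidean distance as the similarity measure (all names are hypothetical):

```python
# Toy case-based learning cycle.
import math

case_base = [
    ((1.0, 2.0), "solution-A"),
    ((5.0, 5.0), "solution-B"),
]

def retrieve(problem):
    # 1. Case retrieval: find the best-matching stored case.
    return min(case_base, key=lambda case: math.dist(case[0], problem))

def adapt(solution, problem):
    # 2. Case adaptation: tweak the retrieved solution (domain-specific;
    #    this placeholder simply reuses it unchanged).
    return solution

def solve(problem, is_satisfactory):
    best_case = retrieve(problem)
    solution = adapt(best_case[1], problem)
    # 3. Solution evaluation and 4. case-base updating if it worked.
    if is_satisfactory(solution):
        case_base.append((problem, solution))
    return solution

print(solve((1.5, 2.5), is_satisfactory=lambda s: True))  # -> "solution-A"
```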

Terms used in the cycle:

A new problem is matched against the cases in the case base, and one or more similar cases are Retrieved. A solution suggested by the matching cases is then Reused.

 The benefits of CBR as a lazy problem-solving method are:

 Ease of knowledge elicitation


 Absence of problem-solving bias
 Incremental learning
 Suitability for complex and not-fully formalized solution spaces
 Suitability for sequential problem solving.
 Ease of explanation
 Ease of maintenance

 The Limitations of Case Based Learning are as follows:

 Handling large case bases


 Dynamic problem domains
 Handling noisy data
 Fully automatic operation

 Applications of Case Based Learning are:


 Advising as a process of resolving diagnosed problems
 Design as a process of satisfying a number of posed constraints.
 Planning as a process of arranging a sequence of actions in time.
 Interpretation as a process of evaluating situations/problems in some
context.
 Classification as a process of explaining a number of encountered
symptoms.

 Major Paradigms of Machine Learning include:

 Rote Learning: It deals with One-to-one mapping from inputs to stored


representation. "Learning by memorization” Association-based storage and
retrieval.
 Induction: It uses specific examples to reach general conclusions
 Clustering: It involves automatically discovering natural grouping in data.
 Analogy: Helps to determine correspondence between two different
representations
 Discovery: It is a type of unsupervised learning in which a specific
goal/outcome is not provided.
 Genetic Algorithms: It is a method for solving both constrained and
unconstrained optimization problems based on a natural selection process
that mimics biological evolution.
 Reinforcement Learning: Only feedback (a positive or negative reward) is given at the end of a sequence of steps. This requires assigning reward to individual steps by solving the credit assignment problem, i.e., determining which steps should receive credit or blame for the final result.

 The Inductive Learning Problem:

 Extrapolate from a given set of examples so that we can make accurate


predictions about future examples.
 Supervised vs Unsupervised learning
Want to learn an unknown function f(x) = y, where x is an input example
and y is the desired output. Supervised learning implies we are given a set
of (x, y) pairs by a "teacher." Unsupervised learning means we are only
given the xs. In either case, the goal is to estimate f.
 Concept learning
Given a set of examples of some concept/class/category, determine if a
given example is an instance of the concept or not. If it is an instance, we
call it a positive example. If it is not, it is called a negative example.
 Problem Example
Supervised Concept Learning by Induction
Given a training set of positive and negative examples of a concept,
construct a description that will accurately classify whether future
examples are positive or negative. That is, learn some good estimate of
function f given a training set {(x1, y1), (x2, y2), ..., (xn, yn)} where each yi
is either + (positive) or - (negative).
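A toy sketch of this setup, assuming scikit-learn and a made-up training set of positive (+) and negative (-) examples:

```python
# Supervised concept learning by induction: learn an estimate of f from
# labelled (x, y) pairs and classify a previously unseen example.
from sklearn.tree import DecisionTreeClassifier

# Training set {(x1, y1), ..., (xn, yn)}: each x is a feature vector,
# each y is "+" (positive example) or "-" (negative example).
X_train = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
y_train = ["-", "-", "-", "-", "+", "+", "+", "+"]

# Induce a description of the concept from the examples.
concept = DecisionTreeClassifier().fit(X_train, y_train)

# Classify a future, unseen example.
print(concept.predict([[2.5, 2.5]]))  # expected: ["+"]
```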
