Python Unit 4
What is a pattern?
A pattern is some phenomenon that repeats regularly based on a set rule or condition.
(Or)
Patterns are everywhere in the digital world. A pattern can either be observed physically or it can
be observed mathematically by applying algorithms.
Example:
Slide #1 shows a set of training examples and the corresponding label associated with each
example.
Some of you will try to discover the labeling pattern by looking at the data, because we have only a few
examples. Others, with access to machine learning software, will immediately plug in these training
examples and select their favored method of learning, for example decision trees, to figure out the
relationship between the numbers and their labels.
Let us look at our problem again. But this time let us use a different representation for our problem
as shown in Slide #2.
Pattern recognition is the process of recognizing patterns by using machine learning algorithms.
Pattern recognition can be defined as the classification of data based on knowledge already
gained or on statistical information extracted from patterns and/or their representation.
Features:
A feature is a property or characteristic of a pattern. For example, weight and height are
two different features that characterize objects such as chairs and humans.
Types of Features
There are different types of features. We may categorize them as follows:
1. Nominal data:
A nominal feature fN assumes values from a set of distinct values.
Ex: Type of Curve: A possible domain of this feature is: {line, parabola, circle, ellipse}.
2. Ordinal data:
The elements of the domain of the variable are ordered, in addition to being distinct.
EX: Height of an object: domain = {very tall, tall, medium, short, very short}.
Operations possible on ordinal data include comparison, mode, median, and percentile.
3. Interval-valued Data:
For interval-valued data, the differences between values are meaningful, and the values also need to
satisfy the properties of ordinal data.
Mean and standard deviation are two possible operations on such data.
• Here, the training dataset may be represented as a matrix of size (n × d), where each row
corresponds to a pattern and each column represents a feature.
• The class label is a dependent attribute which depends on the ‘d’ independent attributes.
In this case, n=7 and d=6. As can be seen, each pattern has six attributes (or features). Each
attribute in this case is a number between 1 and 9. The last number in each line gives the class of
the pattern. In this case, the class of the patterns is either 1, 2 or 3.
• Patterns may also be represented as strings of characters. For example, a DNA segment is written as
GTGCATCTGACTCCT...
the corresponding RNA is expressed as
GUGCAUCUGACUCCU....
and a protein sequence is written as
VHLTPEEK ....
Each string of characters represents a pattern. Operations like pattern matching or finding the
similarity between strings are carried out with these patterns.
• Another example would be if (has-trunk(x)) and (color(x) = black) and (size(x) = large) then
elephant(x)
In ML, curse of dimensionality can be defined as follows: as the number of features or dimensions
‘d’ grows, the amount of data we require to generalize accurately grows exponentially. As the
dimensions increase the data becomes sparse and as the data becomes sparse it becomes hard to
generalize the model.
Curse of Dimensionality refers to a set of problems that arise when working with high-
dimensional data.
A dataset with a large number of attributes, generally of the order of a hundred or more, is
referred to as high dimensional data.
For example:
Suppose we are building several machine learning models to analyze the performance of a Formula
One (F1) driver. Consider the following cases:
i) Model_1 consists of only two features say the circuit name and the country name.
ii) Model_2 consists of 4 features say weather and max speed of the car including the above two.
iii) Model_3 consists of 8 features say driver’s experience, number of wins, car condition, and
driver’s physical fitness including all the above features.
iv) Model_4 consists of 16 features say driver’s age, latitude, longitude, driver’s height, hair color,
car color, the car company, and driver’s marital status including all the above features.
The following figure shows the decrease in the standard deviation of the distribution as the
number of dimensions increases.
It is observed that on increasing the number of features the accuracy tends to increase until a
certain threshold value and after that, it starts to decrease. From the above example the accuracy
of Model_1 < accuracy of Model_2 < accuracy of Model_3 but if we try to extrapolate this trend it
doesn’t hold true for all the models having more than 8 features.
If we think logically, some of the features provided to Model_4 do not actually contribute anything
towards analyzing the performance of the F1 driver. For example, the driver's height, hair color, car
color, car company, and marital status give no useful information to the model; the model gets
confused by all this extra information, and the accuracy starts to go down.
Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional
space, i.e. a space with fewer dimensions.
A 3-D classification problem can be hard to visualize, whereas a 2-D one can be mapped to a
simple 2 dimensional space, and a 1-D problem to a simple line.
Principal Component Analysis (PCA):
c) Next, we need to center the values in each column by subtracting the mean column
value.
e) Finally, we calculate the eigenvalues and eigenvectors. The eigenvectors can be
sorted by the eigenvalues in descending order to provide a ranking of the
components or axes of the new subspace for A.
g) Once chosen, data can be projected onto the subspace via matrix multiplication:
P = B^T · A
where A is the original data that we wish to project, B^T is the transpose of the
chosen principal components, and P is the projection of A.
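These PCA steps can be sketched with NumPy; the small matrix A below is purely illustrative and is not taken from the notes.

import numpy as np

# A minimal PCA sketch on a tiny illustrative matrix (hypothetical data).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Center the values in each column by subtracting the column mean.
M = A.mean(axis=0)
C = A - M

# Covariance matrix of the centered data, then its eigenvalues and eigenvectors.
V = np.cov(C.T)
eigvals, eigvecs = np.linalg.eig(V)

# Sort eigenvectors by eigenvalue in descending order to rank the components.
order = np.argsort(eigvals)[::-1]
B = eigvecs[:, order]

# Project the centered data onto the new subspace: P = B^T . A (centered).
P = B.T.dot(C.T)
print(P.T)   # projected data, one row per original pattern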
Linear Discriminant Analysis (LDA):
The first step is to calculate the separability between the different classes (i.e. the distance between the
means of the different classes), which is called the between-class variance.
The second step is to calculate the distance between the mean and the samples of each class, which is called
the within-class variance.
The third step is to construct the lower-dimensional space that maximizes the between-class
variance and minimizes the within-class variance; let P denote the projection onto this lower-dimensional space.
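As a hedged illustration, scikit-learn's LinearDiscriminantAnalysis performs this kind of projection; the iris dataset is used here only as stand-in data.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA sketch: project the data onto a space that maximizes between-class
# variance and minimizes within-class variance.
X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
X_lower = lda.fit_transform(X, y)   # lower-dimensional projection
print(X_lower.shape)                # (150, 2)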
Classification:
Given a pattern, the task of identifying the class to which the pattern belongs is called classification.
In classification, a program learns from the given dataset or observations and then classifies new
observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat
or dog, etc. Classes can also be called targets, labels, or categories.
The main goal of the Classification algorithm is to identify the category of a given dataset, and these
algorithms are mainly used to predict the output for the categorical data.
It can be performed on both structured and unstructured data. The process starts with predicting
the class of given data points. The classes are often referred to as target, label or categories.
Generally, a set of patterns is given where the class label of each pattern is known. This is known as
the training data.
The algorithm which implements the classification on a dataset is known as a classifier. There are
two types of Classifications:
Binary Classifier: If the classification problem has only two possible outcomes, then it is
called as Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, then it is
called as Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
Classification algorithms can be further divided into two main categories:
Linear classifier
A linear classifier makes a classification decision based on the value of a linear combination of the
features (characteristics).
Non-linear classifier
Non-linear functions can be used to separate instances that are not linearly separable.
1) Bayesian Classifier:
A Bayesian classifier can be trained by determining the mean vectors and the covariance matrices of
the discriminant functions for the abnormal and normal classes from the training data. Instead of
computing the maximum of the two discriminant functions g_abnormal(x) and g_normal(x), the
decision is based on the ratio g_abnormal(x) / g_normal(x).
A decision threshold T was set, such that if the ratio is larger than T the unknown pattern vector is
classified as abnormal, else as normal.
By changing T, the sensitivity/specificity trade-off of the Bayes classifier can be altered. A larger T
will result in lower TP and FP rates, while a smaller T will result in higher TP and FP rates.
2) Perceptron:
The basic perceptron algorithm is a binary linear classifier for supervised learning. The idea behind
the binary linear classifier can be described as follows:
h(x) = sign(θ · x + θ₀)
where x is the feature vector, θ is the weight vector, and θ₀ is the bias (offset). The sign function is used to
distinguish x as either a positive (+1) or a negative (-1) label.
The decision boundary that separates the data with different labels occurs at θ · x + θ₀ = 0.
The data will be labeled as positive in the region where θ · x + θ₀ > 0, and labeled as negative in the
region where θ · x + θ₀ < 0.
If all the instances in a given dataset are linearly separable, there exist a θ and a θ₀ such that y⁽ⁱ⁾ (θ ·
x⁽ⁱ⁾ + θ₀) > 0 for every i-th data point, where y⁽ⁱ⁾ is the label.
The figure illustrates these concepts for the 2-D case, where x = [x₁ x₂]ᵀ, θ = [θ₁ θ₂]ᵀ and θ₀ is an offset scalar.
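A minimal sketch of the perceptron update and the sign decision rule, assuming a tiny linearly separable toy dataset (the points below are made up for illustration).

import numpy as np

# Hypothetical 2-D training points with labels +1 / -1.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

theta = np.zeros(2)   # weight vector
theta0 = 0.0          # bias / offset scalar

for _ in range(10):                                     # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (np.dot(theta, xi) + theta0) <= 0:      # misclassified point
            theta += yi * xi                            # perceptron update
            theta0 += yi

# Classify a new point using the sign of theta . x + theta0
x_new = np.array([1.0, 1.0])
print(np.sign(np.dot(theta, x_new) + theta0))           # +1 or -1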
3) Nearest Neighbour classifier:
Among the various methods of supervised statistical pattern recognition, the Nearest Neighbour
rule achieves consistently high performance, without a priori assumptions about the distributions
from which the training examples are drawn.
It involves a training set of both positive and negative cases. A new sample is classified by
calculating the distance to the nearest training case; the class (sign) of that nearest case then
determines the classification of the sample.
The k-NN classifier extends this idea by taking the k nearest points and assigning the sign of the
majority. It is common to select k small and odd to break ties.
Larger k values help reduce the effects of noisy points within the training data set, and the choice of
k is often performed through cross-validation.
There are many techniques available for improving the performance and speed of a nearest
neighbour classification.
One approach to this problem is to pre-sort the training sets in some way.
Another solution is to choose a subset of the training data such that classification by the 1-NN rule
(using the subset) approximates the Bayes classifier.
This can result in significant speed improvements as k can now be limited to 1 and redundant data
points have been removed from the training set.
These data modification techniques can also improve the performance through removing points
that cause mis-classifications.
Example:
Consider the following data concerning credit default. Age and Loan are two numerical variables
(predictors) and Default is the target.
We can now use the training set to classify an unknown case (Age = 48 and Loan = $142,000) using
Euclidean distance. With K=1, the nearest neighbour is a Default=Y case, so the unknown case is classified as Default=Y.
With K=3, there are two Default=Y and one Default=N among the three closest neighbours. The prediction
for the unknown case is again Default=Y.
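Since the full Age/Loan table is not reproduced here, the training rows in the sketch below are hypothetical stand-ins; the point is only the Euclidean-distance and majority-vote mechanics of k-NN.

import numpy as np

# Hypothetical (Age, Loan) training cases and their Default labels.
train = np.array([[25, 40000], [35, 60000], [45, 80000], [20, 20000],
                  [35, 120000], [52, 18000], [23, 95000], [40, 62000],
                  [60, 100000], [48, 220000], [33, 150000]])
labels = np.array(['N', 'N', 'N', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'Y', 'Y'])

query = np.array([48, 142000])                          # Age = 48, Loan = $142,000
dist = np.sqrt(((train - query) ** 2).sum(axis=1))      # Euclidean distances

k = 3
nearest = labels[np.argsort(dist)[:k]]                  # labels of the k closest cases
values, counts = np.unique(nearest, return_counts=True)
print(values[np.argmax(counts)])                        # majority vote among the k neighbours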
4) Logistic regression
Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or
False, etc. but instead of giving the exact value as 0 and 1, it gives the probabilistic values
which lie between 0 and 1.
Logistic Regression is quite similar to Linear Regression except in how it is used:
Linear Regression is used for solving regression problems, whereas Logistic Regression is
used for solving classification problems.
In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic
function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something such as whether
the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
Logistic Regression can be used to classify observations using different types of data and
can easily determine the most effective variables for the classification. The image below
shows the logistic function:
The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
It maps any real value into another value within a range of 0 and 1.
The output of logistic regression must be between 0 and 1 and cannot go beyond this
limit, so it forms a curve like the "S" form. The S-shaped curve is called the sigmoid function or
the logistic function.
In logistic regression, we use the concept of a threshold value, which defines the
probability of either 0 or 1: values above the threshold tend towards 1, and values
below the threshold tend towards 0.
This function can be represented as:
f(x) = 1 / (1 + e^(-x))
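A small sketch of the sigmoid mapping together with a simple threshold rule; the 0.5 threshold is an illustrative assumption.

import numpy as np

# The logistic (sigmoid) function maps any real value into the range (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 2.5])
p = sigmoid(z)
print(p)                         # probabilities between 0 and 1
print((p >= 0.5).astype(int))    # threshold: above 0.5 -> class 1, below -> class 0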
On the basis of the categories, Logistic Regression can be classified into three types:
Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
Multinomial: In multinomial Logistic regression, there can be 3 or more possible
unordered types of the dependent variable, such as "cat", "dogs", or "sheep"
Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as "low", "Medium", or "High".
5) Naïve-Bayes
Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem
and used for solving classification problems.
It is mainly used in text classification that includes a high-dimensional training dataset.
The Naïve Bayes classifier is one of the simplest and most effective classification algorithms,
helping to build fast machine learning models that can make quick predictions.
The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of the other features. For example, if a fruit is identified on the
basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying it as an apple, without
depending on the other features.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the
probability of a hypothesis with prior knowledge. It depends on the conditional probability.
The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed evidence B.
P(B|A) is the likelihood probability: the probability of the evidence given that the hypothesis is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
Working of the Naïve Bayes classifier can be understood with the help of the example below.
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using
this dataset, we need to decide whether we should play on a particular day according to the
weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table of the weather conditions:
Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4
Likelihood table of the weather conditions:
Weather     No    Yes
Overcast     0     5     5/14 = 0.35
Rainy        2     2     4/14 = 0.29
Sunny        2     3     5/14 = 0.35
All          4/14 = 0.29     10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.30 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.50 * 0.29 / 0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), we predict that the player can play on a Sunny day.
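The same calculation can be reproduced in Python from the table counts; exact fractions are used here, so the second posterior comes out as 0.40 rather than the rounded 0.41 above.

# Naive Bayes posterior for "Play" on a Sunny day, using the table counts.
p_sunny_given_yes = 3 / 10       # Sunny days among the 10 "Yes" days
p_sunny_given_no  = 2 / 4        # Sunny days among the 4 "No" days
p_yes, p_no, p_sunny = 10 / 14, 4 / 14, 5 / 14

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny  = p_sunny_given_no  * p_no  / p_sunny
print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))   # 0.6 and 0.4
# Since P(Yes|Sunny) > P(No|Sunny), the prediction is to play.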
6) Decision trees
Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node.
Decision nodes are used to make any decision and have multiple branches, whereas Leaf
nodes are the output of those decisions and do not contain any further branches.
The decisions or the test are performed on the basis of features of the given dataset.
It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
Below diagram explains the general structure of a decision tree:
In a decision tree, for predicting the class of the given dataset, the algorithm starts from the root
node of the tree. This algorithm compares the values of root attribute with the record (real dataset)
attribute and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and
moves further. It continues the process until it reaches a leaf node of the tree. The complete
process can be better understood using the algorithm below:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset.
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where you cannot further classify the
nodes; the final nodes are called leaf nodes.
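As a sketch, scikit-learn's DecisionTreeClassifier implements an optimized version of CART; the iris dataset and the max_depth value below are illustrative choices, not part of the notes.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# CART-style decision tree: recursively splits on the best attribute.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict(X[:5]))   # classes predicted for the first five patterns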
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should
accept the offer or not. To solve this problem, the decision tree starts with the root node. The
root node splits further into the next decision node (distance from the office) and one leaf node
based on the corresponding labels. The next decision node further splits into one decision node
(cab facility) and one leaf node. Finally, that decision node splits into two leaf nodes (Accepted
offer and Declined offer). Consider the diagram below:
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on
various subsets of the given dataset and takes the average to improve the predictive accuracy of
that dataset." Instead of relying on one decision tree, the random forest takes the prediction from
each tree and, based on the majority vote of the predictions, predicts the final output.
The greater number of trees in the forest leads to higher accuracy and prevents the problem of
overfitting.
The below diagram explains the working of the Random Forest algorithm:
Random Forest works in two phases: the first is to create the random forest by combining N decision
trees, and the second is to make predictions with each tree created in the first phase.
The working process can be explained in the steps and diagram below:
Step-1: Select K random data points (a random subset) from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N for decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2 until N trees have been built.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data
points to the category that wins the majority vote.
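A hedged sketch with scikit-learn's RandomForestClassifier; the dataset and the choice of N = 100 trees are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Random Forest: N trees built on random subsets, combined by majority vote.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)   # N = 100 trees
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the majority-vote predictions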
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset is given to
the Random forest classifier. The dataset is divided into subsets and given to each decision tree.
During the training phase, each decision tree produces a prediction result, and when a new data
point occurs, then based on the majority of results, the Random Forest classifier predicts the final
decision. Consider the below image:
There are mainly four sectors where Random Forest is mostly used:
Boosting
Boosting algorithms are a family of algorithms that combine weak learners into a strong learner.
Instead of training the models in parallel, we can train them sequentially; this is the main idea of
boosting.
The idea behind boosting algorithms is to learn weak classifiers that are only slightly correlated with
the true classification and to combine them into a strong classifier that is well correlated with the
true classification.
A boosting algorithm iteratively learns weak classifiers and adds them to a final strong classifier. The
added weak classifiers are usually weighted according to their accuracy. After each iteration, the
training data is reweighted so that misclassified instances gain weight and correctly classified
instances lose weight. On the next iteration, the new weak learner concentrates mostly on the
previously misclassified instances. Boosting algorithms mostly differ in the reweighting approach applied
to the training set. Popular boosting algorithms include:
AdaBoost
GBM
XGBM
Light GBM
CatBoost
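As one concrete example from the list above, a short AdaBoost sketch with scikit-learn; the dataset and the number of estimators are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# AdaBoost: weak learners are trained sequentially, with misclassified
# instances reweighted so that later learners concentrate on them.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))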
As discussed before, bagging is an ensemble technique mainly used to reduce the variance of our
predictions by combining the results of multiple classifiers modelled on different sub-samples of the
same dataset.
Working of bagging
Creating multiple datasets: Sampling is done with replacement on the original dataset
and new datasets are formed from the original dataset.
Building multiple classifiers: On each of these smaller datasets, a classifier is built, usually,
the same classifier is built on all the datasets.
Combining classifiers: The predictions of all the individual classifiers are now combined to
give a better classifier, usually with much lower variance than before.
Bagging is similar to Divide and conquer. It is a group of predictive models run on multiple subsets
from the original dataset combined together to achieve better accuracy and model stability.
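A minimal bagging sketch with scikit-learn's BaggingClassifier, whose default base estimator is a decision tree; the dataset is an illustrative stand-in.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Bagging: the same base classifier is trained on bootstrap samples
# (sampling with replacement) and the predictions are combined.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(n_estimators=10, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))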
Clustering:
Clustering is basically a grouping of objects on the basis of the similarity and dissimilarity between them.
Hierarchical Clustering
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group
unlabeled data points into clusters; it is also known as hierarchical cluster analysis or HCA.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped
structure is known as the dendrogram.
The agglomerative hierarchical clustering algorithm is a popular example of HCA. To group the
data points into clusters, it follows the bottom-up approach: the algorithm considers each
data point as a single cluster at the beginning, and then starts combining the closest pairs of clusters.
It does this until all the clusters are merged into a single cluster that contains all the
data points.
The working of the AHC algorithm can be explained using the steps below:
Step-1: Treat each data point as a single cluster; hence, there will be N clusters at the start.
Step-2: Take the two closest data points or clusters and merge them to form one cluster. So,
there will now be N-1 clusters.
Step-3: Again, take the two closest clusters and merge them together to form one cluster.
There will be N-2 clusters.
Step-4: Repeat Step-3 until only one cluster is left, giving the sequence of clusterings.
Consider the images below:
Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to
divide the clusters as per the problem.
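A short agglomerative clustering sketch with scikit-learn; the six 2-D points and the ward linkage are illustrative assumptions.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Bottom-up clustering of a tiny hypothetical 2-D dataset into two clusters.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

hca = AgglomerativeClustering(n_clusters=2, linkage="ward")
print(hca.fit_predict(X))   # cluster label for each point, e.g. [1 1 1 0 0 0]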
Divisive Hierarchical Clustering:
Also known as the top-down approach. This algorithm also does not require prespecifying the number
of clusters. Top-down clustering requires a method for splitting a cluster that contains the whole
data, and proceeds by splitting clusters recursively until each individual data point has been split into a
singleton cluster.
Partitioning Clustering:
It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the
centroid-based method. The most common example of partitioning clustering is the K-Means
Clustering algorithm.
In this type, the dataset is divided into a set of K groups, where K defines the number of
pre-defined groups. The cluster centres are created in such a way that the distance between the data
points within a cluster and their own centroid is minimal compared with the distance to the other cluster centroids.
There are essentially three stopping criteria that can be adopted to stop the K-means algorithm:
1. The centroids of the newly formed clusters do not change.
2. Data points remain in the same cluster.
3. The maximum number of iterations is reached.
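A minimal K-Means sketch; the toy points and K = 2 are assumptions for illustration, and max_iter shows one of the stopping criteria in code.

import numpy as np
from sklearn.cluster import KMeans

# Partition hypothetical toy points into K = 2 groups around centroids.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

km = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)  # max_iter is one stopping criterion
km.fit(X)
print(km.labels_)            # cluster assignment for each point
print(km.cluster_centers_)   # final centroids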
Regression:
We can understand the concept of regression analysis using the example below:
Example: Suppose there is a marketing company A that runs various advertisements every year and
gets sales from them. The list below shows the advertisements made by the company in the last 5 years
and the corresponding sales:
Now the company wants to run a $200 advertisement in the year 2019 and wants to know the
prediction for the sales this year. To solve such prediction problems in machine
learning, we need regression analysis.
Types of Regression:
There are various types of regression which are used in data science and machine learning.
Linear Regression:
o Linear regression models the linear relationship between a continuous dependent variable Y and
an independent variable X, and can be represented as:
Y = aX + b
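A sketch of fitting Y = aX + b; since the advertisement/sales table is not reproduced here, the numbers below are made up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical advertisement spend (X) and corresponding sales (Y).
X = np.array([[90], [120], [150], [100], [130]])
y = np.array([1000, 1300, 1800, 1200, 1380])

reg = LinearRegression().fit(X, y)
print(reg.coef_[0], reg.intercept_)   # a and b in Y = aX + b
print(reg.predict([[200]]))           # predicted sales for a $200 advertisement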
Logistic Regression:
o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No,
True or False, Spam or not spam, etc.
o Logistic regression uses the sigmoid function or logistic function, which is a complex cost
function. This sigmoid function is used to model the data in logistic regression. The function
can be represented as:
f(x) = 1 / (1 + e^(-x))
To estimate how poorly models perform, cost functions are employed. Simply put, a cost function is
a measure of how inaccurate the model is in estimating the connection between X and y. This is
usually stated as a difference or separation between the expected and actual values.
The term ‘loss' in machine learning refers to the difference between the anticipated and actual
value. The "Loss Function" is a function that is used to quantify this loss in the form of a single real
number during the training phase. These are utilised in algorithms that apply optimization
approaches in supervised learning.
There are many cost functions in machine learning and each has its use cases depending on
whether it is a regression problem or classification problem.
Regression models deal with predicting a continuous value for example salary of an employee, price
of a car, loan prediction, etc. A cost function used in the regression problem is called “Regression
Cost Function”.
The (categorical) cross-entropy cost function is used in classification problems where there are multiple
classes and each input belongs to exactly one class. Let us now understand how cross-entropy is calculated.
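A small sketch of how cross-entropy is calculated for a single example; the probabilities below are made up for illustration.

import numpy as np

# Cross-entropy for one example with three classes.
y_true = np.array([0, 1, 0])          # one-hot encoded actual class
y_pred = np.array([0.2, 0.7, 0.1])    # predicted class probabilities

cross_entropy = -np.sum(y_true * np.log(y_pred))
print(cross_entropy)                  # -log(0.7) ~ 0.357; lower is better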
K-Nearest Neighbour (K-NN):
The K-NN algorithm can be used for regression as well as for classification, but it is mostly used
for classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption on
underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and, at the time of classification, performs an
action on the dataset.
The KNN algorithm simply stores the dataset during the training phase, and when it gets new data,
it classifies that data into the category most similar to the new data.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to
know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on
a similarity measure. Our KNN model will find the features of the new data that are similar to the cat
and dog images, and based on the most similar features it will put the image in either the cat or the dog category.
The K-NN working can be explained on the basis of the below algorithm:
Below are some points to remember while selecting the value of K in the K-NN algorithm:
There is no particular way to determine the best value for "K", so we need to try some
values to find the best out of them. The most preferred value for K is 5.
A very low value for K, such as K=1 or K=2, can be noisy and lead to the effects of outliers in
the model.
Large values for K are good, but too large a value may cause difficulties.
In particular, three data sets are commonly used in different stages of the creation of the model:
training, validation and test sets.
The model is initially fit on a training data set, which is a set of examples used to fit the parameters
(e.g. weights of connections between neurons in artificial neural networks) of the model.
The model (e.g. a naive Bayes classifier) is trained on the training data set using a supervised
learning method, for example using optimization methods such as gradient descent or stochastic
gradient descent.
In practice, the training data set often consists of pairs of an input vector (or scalar) and the
corresponding output vector (or scalar), where the answer key is commonly denoted as the target
(or label).
The current model is run with the training data set and produces a result, which is then compared
with the target, for each input vector in the training data set.
Based on the result of the comparison and the specific learning algorithm being used, the
parameters of the model are adjusted. The model fitting can include both variable selection and
parameter estimation.
Successively, the fitted model is used to predict the responses for the observations in a second data
set called the validation data set.
The validation data set provides an unbiased evaluation of a model fit on the training data set while
tuning the model's hyperparameters (e.g. the number of hidden units, the number of layers, and the
layer widths in a neural network).
Validation datasets can be used for regularization by early stopping (stopping training when the
error on the validation data set increases, as this is a sign of over-fitting to the training data set).
This simple procedure is complicated in practice by the fact that the validation dataset's error may
fluctuate during training, producing multiple local minima. This complication has led to the creation
of many ad-hoc rules for deciding when over-fitting has truly begun.
The term "validation set" is sometimes used instead of "test set" in some literature (e.g., if the
original data set was partitioned into only two subsets, the test set might be referred to as the
validation set).
Deciding the sizes and strategies for dividing a data set into training, test, and validation sets
depends heavily on the problem and the data available.
Cross-validation
The goal of cross-validation is to test the model's ability to predict new data that was not used in
estimating it.
1) Validation:
In this method, we perform training on 50% of the given dataset and the remaining 50% is used for
testing. The major drawback of this method is that, since we train on only 50% of the dataset, the
remaining 50% may contain important information that is left out while training our model, i.e.
higher bias.
2) LOOCV (Leave One Out Cross Validation):
In this method, we perform training on the whole dataset while leaving out only one data point, and we
iterate this for each data point. It has some advantages as well as disadvantages.
An advantage of this method is that we make use of all data points, and hence the bias is low.
3) K-Fold Cross Validation:
In this method, we split the dataset into k subsets (known as folds), then perform training on
k-1 of the subsets and leave one subset out for the evaluation of the trained model.
We iterate k times, with a different subset reserved for testing each time.
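A hedged k-fold sketch using scikit-learn's cross_val_score; the dataset, the logistic regression model, and k = 5 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# k-fold cross-validation: train on k-1 folds, evaluate on the held-out fold,
# and repeat k times with a different fold reserved for testing each time.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)          # one accuracy value per fold
print(scores.mean())   # average performance estimate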
Imbalanced data:
This is a scenario where the number of observations belonging to one class is significantly lower
than the number belonging to the other classes.
This problem is predominant in scenarios where anomaly detection is crucial like fraudulent
transactions in banks, identification of rare diseases, etc.
In this situation, the predictive model developed using conventional machine learning algorithms
could be biased and inaccurate.
This happens because Machine Learning Algorithms are usually designed to improve accuracy by
reducing the error. Thus, they do not take into account the class distribution / proportion or
balance of classes.
Strategies for dealing with imbalanced datasets either improve the classification algorithms or balance the
classes in the training data before providing the data as input to the machine learning algorithm. The latter
technique is preferred as it has wider application.
The main objective of balancing classes is either to increase the frequency of the minority class or
to decrease the frequency of the majority class.
This is done in order to obtain approximately the same number of instances for both the classes.
Let us look at a few resampling techniques:
1) Random Under-Sampling
Random Undersampling aims to balance class distribution by randomly eliminating majority class
examples. This is done until the majority and minority class instances are balanced out.
For example, consider a dataset of 1,000 observations of which 20 are fraudulent, so the event rate is 2%.
Randomly keeping only 10% of the non-fraudulent observations (98) together with the 20 fraudulent
observations gives a new dataset of 118 observations.
Event rate for the new dataset after under-sampling = 20/118 = 17%
2) Random Over-Sampling
Over-Sampling increases the number of instances in the minority class by randomly replicating
them in order to present a higher representation of the minority class in the sample.
Using the same dataset (1,000 observations, 20 fraudulent, event rate 2%), the 20 fraudulent
observations are replicated 20 times, giving 400 minority instances alongside the 980 non-fraudulent ones.
Event rate for the new dataset after over-sampling = 400/1380 = 29%
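The under- and over-sampling arithmetic above can be sketched with plain NumPy; the 1,000-observation label array below is an assumption built to match the 2% event rate.

import numpy as np

# Hypothetical labels: 1 = fraudulent (minority), 0 = non-fraudulent (majority).
rng = np.random.default_rng(0)
y = np.array([1] * 20 + [0] * 980)            # 2% event rate
minority_idx = np.where(y == 1)[0]
majority_idx = np.where(y == 0)[0]

# Random under-sampling: keep a random 10% of the majority class.
under_majority = rng.choice(majority_idx, size=98, replace=False)
under_idx = np.concatenate([minority_idx, under_majority])
print(len(minority_idx) / len(under_idx))     # new event rate ~ 17%

# Random over-sampling: replicate the minority class 20 times.
over_minority = np.repeat(minority_idx, 20)
over_idx = np.concatenate([over_minority, majority_idx])
print(len(over_minority) / len(over_idx))     # new event rate ~ 29%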
3) Synthetic Minority Over-sampling Technique (SMOTE):
This technique is followed to avoid the overfitting that occurs when exact replicas of minority
instances are added to the main dataset. A subset of data is taken from the minority class as an
example, and new synthetic similar instances are then created. These synthetic instances are
added to the original dataset, and the new dataset is used as a sample to train the classification models.
For example, with 20 fraudulent observations, a sample of 15 instances is taken from the minority class
and similar synthetic instances are generated 20 times.
An alternative approach is to modify existing classification algorithms to make them appropriate
for imbalanced datasets.
The main objective of the ensemble methodology is to improve the performance of single classifiers.
The approach involves constructing several two-stage classifiers from the original data and then
aggregating their predictions.
Bagging is used for reducing Overfitting in order to create strong learners for generating accurate
predictions. Unlike boosting, bagging allows replacement in the bootstrapped sample.
Consider again a dataset with an event rate of 2%. Ten bootstrapped samples are chosen from the
population with replacement. Each sample contains 200 observations, and each sample is different from
the original dataset but resembles it in distribution and variability.
Machine learning algorithms such as logistic regression, neural networks, or decision trees are fitted to
each bootstrapped sample of 200 observations, and the classifiers c1, c2, ..., c10 are aggregated to
produce a compound classifier. This ensemble methodology produces a stronger compound
classifier, since it combines the results of the individual classifiers to come up with an improved one.
Confusion matrix:
A confusion matrix is a technique for summarizing the performance of a classification algorithm.
Classification accuracy alone can be misleading if you have an unequal number of observations in
each class or if you have more than two classes in your dataset.
Calculating a confusion matrix can give you a better idea of what your classification model is getting
right and what types of errors it is making.
The number of correct and incorrect predictions are summarized with count values and broken
down by each class. This is the key to the confusion matrix.
The confusion matrix shows the ways in which your classification model
is confused when it makes predictions.
It gives you insight not only into the errors being made by your classifier but more importantly the
types of errors that are being made.
To calculate a confusion matrix, you need a test dataset or a validation dataset with expected outcome
values, and a prediction from the model for each row in that dataset. The matrix is then laid out as follows:
Expected down the side: each row of the matrix corresponds to an actual (expected) class.
Predicted across the top: each column of the matrix corresponds to a predicted class.
The total number of correct predictions for a class goes into the expected row for that class value and
the predicted column for that same class value.
In the same way, the total number of incorrect predictions for a class goes into the expected row for
that class value and the predicted column of the class that was actually predicted.
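A small sketch with scikit-learn's confusion_matrix, whose rows are the actual (expected) classes and whose columns are the predicted classes; the label lists are made up for illustration.

from sklearn.metrics import confusion_matrix

# Expected (actual) labels versus predicted labels for a binary problem.
expected  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(expected, predicted))
# [[3 1]    3 true negatives, 1 false positive
#  [2 4]]   2 false negatives, 4 true positives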
Evaluation metrics
Evaluation metrics are used to measure the quality of the statistical or machine learning model.
Evaluating machine learning models or algorithms is essential for any project.
There are many different types of evaluation metrics available to test a model. These include
classification accuracy, logarithmic loss, confusion matrix, and others.
Classification accuracy is the ratio of the number of correct predictions to the total number of input
samples, which is usually what we refer to when we use the term accuracy.
In a classification task, the precision for a class is the number of true positives divided by the total
number of elements the classifier labeled as belonging to the positive class (i.e. true positives plus false positives).
Logarithmic loss, also called log loss, works by penalizing the false classifications. Log loss is one of
the most popular measurements of error in applied machine learning.
Errors play an essential role in the machine learning process, as discovering and minimizing them
ultimately maximizes the accuracy of the process.
Evaluating a model or algorithm typically involves using a combination of these individual evaluation
metrics.