ML Unit-2
Regression is a type of supervised learning used to predict continuous numeric values. The
goal of regression is to model the relationship between input features (independent variables)
and the target variable (dependent variable).
Examples:
• Predicting house prices based on features like size, location, and age.
• Forecasting stock prices or sales revenue.
• Estimating temperature based on weather conditions.
Key Characteristics:
• Output: Continuous numeric values (e.g., y=32.5).
• Evaluation Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), R-
squared (R2).
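As a quick illustration (with made-up true and predicted values, not data from these notes), these metrics can be computed with scikit-learn:

# Minimal sketch: common regression metrics with scikit-learn.
# The y_true / y_pred values below are hypothetical, for illustration only.
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, 5.5, 7.2, 10.1]   # observed target values (hypothetical)
y_pred = [2.8, 6.0, 7.0, 9.5]    # model predictions (hypothetical)

print("MSE:", mean_squared_error(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))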
Distance Metrics: Distance is a measure of similarity or dissimilarity between two data points.
K-Nearest Neighbour
Suppose we have a new data point and we need to put it in the required category. Consider the
below image:
• Firstly, we will choose the number of neighbors, so we will choose the k=5.
• Next, we will calculate the Euclidean distance between the data points. The Euclidean
distance is the distance between two points, which we have already studied in geometry.
It can be calculated as:
• By calculating the Euclidean distance we got the nearest neighbors, as three nearest
neighbors in category A and two nearest neighbors in category B. Consider the below
image:
• As we can see the 3 nearest neighbors are from category A, hence this new data point
must belong to category A.
Example: The table below represents our data set. We have two columns Brightness and Saturation.
Each row in the table has a class of either Red or Blue. Before we introduce a new data entry, let's
assume the value of K is 5.
Here's the new data entry:
We have a new entry, but it doesn't have a class yet. To know its class, we must calculate the
distance from the latest entry to other entries in the data set using the Euclidean distance
formula.
Here's the formula: d = √((X₂ − X₁)² + (Y₂ − Y₁)²)
Where:
• X₂ = New entry's brightness (20).
• X₁= Existing entry's brightness.
• Y₂ = New entry's saturation (35).
• Y₁ = Existing entry's saturation.
Let's do the calculation together. I'll calculate the first three.
Distance #1
For the first row, d1:
Here's what the table will look like after all the distances have been calculated:
Since we chose 5 as the value of K, we'll only consider the first five rows. That is:
As you can see above, the majority class within the 5 nearest neighbors to the new entry is Red.
Therefore, we'll classify the new entry as Red.
New entry: Brightness = 20, Saturation = 35 → Class = Red
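For reference, the same procedure can be run with scikit-learn. The training rows below are illustrative stand-ins (the original table values are not reproduced here); only the new entry (brightness = 20, saturation = 35) and K = 5 come from the example above:

# Minimal KNN sketch with scikit-learn (hypothetical training rows).
from sklearn.neighbors import KNeighborsClassifier

X_train = [[40, 20], [50, 50], [60, 90], [10, 25], [70, 70],
           [60, 10], [25, 80], [55, 45], [30, 35], [15, 30]]
y_train = ["Red", "Blue", "Blue", "Red", "Blue",
           "Red", "Blue", "Blue", "Red", "Red"]

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.predict([[20, 35]]))  # class of the new entry (majority of the 5 nearest)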
Decision Trees
Decision tree induction is the learning of decision trees from class-labeled training tuples. A
decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes
a test on an attribute, each branch represents an outcome of the test, and each leaf node (or
terminal node) holds a class label. The topmost node in a tree is the root node. A typical decision
tree is shown in below figure. It represents the concept buys computer, that is, it predicts
whether a customer at AllElectronics is likely to purchase a computer. Rectangles denote
internal nodes, and ovals denote leaf nodes. Some decision tree algorithms produce only binary
trees (where each internal node branches to exactly two other nodes), whereas others can
produce nonbinary trees.
A decision tree for the concept buys computer, indicating whether an AllElectronics customer
is likely to purchase a computer. Each internal (nonleaf) node represents a test on an
attribute. Each leaf node represents a class
“How are decision trees used for classification?” Given a tuple, X, for which the associated
class label is unknown, the attribute values of the tuple are tested against the decision tree. A
path is traced from the root to a leaf node, which holds the class prediction for that tuple.
Decision trees can easily be converted to classification rules.
“Why are decision tree classifiers so popular?” The construction of decision tree classifiers
does not require any domain knowledge or parameter setting, and therefore is appropriate for
exploratory knowledge discovery. Decision trees can handle multidimensional data. Their
representation of acquired knowledge in tree form is intuitive and generally easy to assimilate
by humans. The learning and classification steps of decision tree induction are simple and fast.
In general, decision tree classifiers have good accuracy. However, successful use may depend
on the data at hand. Decision tree induction algorithms have been used for classification in
many application areas such as medicine, manufacturing and production, financial analysis,
astronomy, and molecular biology. Decision trees are the basis of several commercial rule
induction systems.
Information Gain
ID3 uses information gain as its attribute selection measure. The expected information needed
to classify a tuple in D is given by
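Info(D) = −Σ pi log2(pi), summed over the m classes i = 1, …, m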
where pi is the nonzero probability that an arbitrary tuple in D belongs to class Ci and is
estimated by |Ci,D| / |D|. Note that, at this point, the information we have is based solely on the
proportions of tuples of each class. Info(D) is also known as the entropy of D. How much more
information would we still need (after the partitioning) to arrive at an exact classification? This
amount is measured by
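Info_A(D) = Σ (|Dj| / |D|) × Info(Dj), summed over the v partitions D1, …, Dv produced by splitting D on attribute A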
Information gain is defined as the difference between the original information requirement (i.e.,
based on just the proportion of classes) and the new requirement (i.e., obtained after
partitioning on A). That is,
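Gain(A) = Info(D) − Info_A(D)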
The class label attribute, buys computer, has two distinct values (namely, {yes, no}); therefore, there are two distinct classes (i.e., m = 2). Let class C1 correspond to yes and class C2 correspond to no. There are nine tuples of class yes and five tuples of class no. A (root) node N is created for the tuples in D. To find the splitting criterion for these tuples, we must compute
the information gain of each attribute.
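We first compute the expected information needed to classify a tuple in D:
Info(D) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940 bits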
Next, we need to compute the expected information requirement for each attribute. Let’s start
with the attribute age. We need to look at the distribution of yes and no tuples for each category
of age. For the age category “youth,” there are two yes tuples and three no tuples. For the
category “middle aged,” there are four yes tuples and zero no tuples.For the category “senior,”
there are three yes tuples and two no tuples.
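Hence
Info_age(D) = (5/14) × (−(2/5) log2(2/5) − (3/5) log2(3/5)) + (4/14) × 0 + (5/14) × (−(3/5) log2(3/5) − (2/5) log2(2/5)) = 0.694 bits,
so Gain(age) = Info(D) − Info_age(D) = 0.940 − 0.694 = 0.246 bits.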
Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit rating) = 0.048 bits. Because age has the highest information gain among the attributes, it is selected as the splitting attribute. Node N is labeled with age, and branches are grown for each of the attribute's values. Notice that the tuples falling into the partition for age = middle aged all belong to the same class. Because they all belong to class "yes," a leaf should therefore be created at the end of this branch and labeled "yes."
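The same calculation can be checked with a few lines of Python, using only the class counts stated above (9 yes / 5 no overall; 2/3, 4/0 and 3/2 for the three age groups):

# Sketch: computing Info(D), Info_age(D) and Gain(age) from the stated counts.
from math import log2

def info(counts):
    """Entropy of a class-count list, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

D = [9, 5]                                  # yes / no counts in D
age_partitions = [[2, 3], [4, 0], [3, 2]]   # youth, middle aged, senior

info_D = info(D)
info_age = sum(sum(p) / sum(D) * info(p) for p in age_partitions)
print(round(info_D, 3), round(info_age, 3), round(info_D - info_age, 3))
# -> 0.94 0.694 0.246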
Gain Ratio
The information gain measure is biased toward tests with many outcomes. That is, it prefers to
select attributes having a large number of values. For example, consider an attribute that acts
as a unique identifier such as product ID. A split on product ID would result in a large number
of partitions (as many as there are values), each one containing just one tuple. Because each
partition is pure, the information required to classify data set D based on this partitioning would
be Info_product ID(D) = 0. Therefore, the information gained by partitioning on this attribute is
maximal. Clearly, such a partitioning is useless for classification.
C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which
attempts to overcome this bias. It applies a kind of normalization to information gain using a
“split information” value defined analogously with Info(D) as
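SplitInfo_A(D) = −Σ (|Dj| / |D|) × log2(|Dj| / |D|), summed over the v partitions produced by the split on A.
The gain ratio is then defined as GainRatio(A) = Gain(A) / SplitInfo_A(D).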
The attribute with the maximum gain ratio is selected as the splitting attribute. Note, however,
that as the split information approaches 0, the ratio becomes unstable. A constraint is added to
avoid this, whereby the information gain of the test selected must be at least as great as
the average gain over all tests examined.
Computation of gain ratio for the attribute income. A test on income splits the data of Table
8.1 into three partitions, namely low, medium, and high, containing four, six, and four tuples,
respectively. To compute the gain ratio of income.
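SplitInfo_income(D) = −(4/14) log2(4/14) − (6/14) log2(6/14) − (4/14) log2(4/14) = 1.557,
so GainRatio(income) = 0.029 / 1.557 ≈ 0.019.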
Gini Index
The Gini index is used in CART. Using the notation previously described, the Gini index
measures the impurity of D, a data partition or set of training tuples, as
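Gini(D) = 1 − Σ pi², where pi is the probability that a tuple in D belongs to class Ci (estimated by |Ci,D| / |D|) and the sum runs over the m classes.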
For example, if income has three possible values, namely {low, medium, high}, then the
possible subsets are {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low},
{medium}, {high}, and {}. We exclude the full set, {low, medium, high}, and the empty set
from consideration since, conceptually, they do not represent a split. If a binary split on A
partitions D into D1 and D2, the Gini index of D given that partitioning is
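Gini_A(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)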
The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-
valued attribute A is
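ΔGini(A) = Gini(D) − Gini_A(D)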
Induction of a decision tree using the Gini index. There are nine tuples belonging to the class
buys_computer = yes and the remaining five tuples belong to the class buys_computer = no. A
(root) node N is created for the tuples in D. Use the Gini index to compute the impurity of D:
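Gini(D) = 1 − (9/14)² − (5/14)² = 0.459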
We need to compute the Gini index for each attribute. Let’s start with the attribute income and
consider each of the possible splitting subsets. Consider the subset {low, medium}. This would
result in 10 tuples in partition D1 satisfying the condition “income ∈ {low, medium}.” The
remaining four tuples of D would be assigned to partition D2.
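Using the class counts of the AllElectronics table for these partitions (7 yes / 3 no in D1 and 2 yes / 2 no in D2, an assumption since the table is not reproduced here):
Gini_income ∈ {low,medium}(D) = (10/14) × (1 − (7/10)² − (3/10)²) + (4/14) × (1 − (2/4)² − (2/4)²) ≈ 0.443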
Similarly, the Gini index values for splits on the remaining subsets are 0.458 (for the subsets
{low, high} and {medium}) and 0.450 (for the subsets {medium, high} and {low}). Therefore,
the best binary split for attribute income is on {low, medium} (or {high}) because it minimizes
the Gini index. The attribute and split with the lowest Gini index are selected as the splitting criterion.
Naive Bayes
“What are Bayesian classifiers?” Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes’ theorem, described next. Studies comparing
classification algorithms have found a simple Bayesian classifier known as the naïve Bayesian
classifier to be comparable in performance with decision tree and selected neural network
classifiers. Bayesian classifiers have also exhibited high accuracy and speed when applied to
large databases.
Naïve Bayesian Classification:
The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, …, xn), depicting n measurements made on the tuple from n attributes, respectively, A1, A2, …, An.
2. Suppose that there are m classes, C1, C2, …, Cm. Given a tuple, X, the classifier will predict
that X belongs to the class having the highest posterior probability, conditioned on X. That is,
the naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
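P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i.
By Bayes’ theorem, P(Ci | X) = P(X | Ci) P(Ci) / P(X); since P(X) is the same for every class, only P(X | Ci) P(Ci) needs to be maximized.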
Thus, we maximize P(Ci | X). The class Ci for which P(Ci | X) is maximized is called the maximum a posteriori hypothesis.
Example: Predicting a class label using naïve Bayesian classification. We wish to predict the class label of a tuple using naïve Bayesian classification, given the same training data as discussed in decision tree induction. The data tuples are described by the attributes age, income, student, and credit rating. The class label attribute, buys computer, has two distinct values (namely, {yes, no}). Let C1 correspond to the class buys computer = yes and C2 correspond to buys computer = no. The tuple we wish to classify is X = (age = youth, income = medium, student = yes, credit rating = fair).
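The full probability calculation is not reproduced here. A minimal sketch with scikit-learn's CategoricalNB is shown below; the encoded rows follow the class distributions stated in these notes for the AllElectronics data, but should be treated as an illustrative encoding rather than the exact textbook table:

# Sketch: naive Bayes on label-encoded categorical data with scikit-learn.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# columns: age(0=youth,1=middle,2=senior), income(0=low,1=medium,2=high),
#          student(0=no,1=yes), credit(0=fair,1=excellent)  -- illustrative rows
X = np.array([[0, 2, 0, 0], [0, 2, 0, 1], [1, 2, 0, 0], [2, 1, 0, 0],
              [2, 0, 1, 0], [2, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 0],
              [0, 0, 1, 0], [2, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 1],
              [1, 2, 1, 0], [2, 1, 0, 1]])
y = np.array(["no", "no", "yes", "yes", "yes", "no", "yes",
              "no", "yes", "yes", "yes", "yes", "yes", "no"])  # buys_computer

clf = CategoricalNB()
clf.fit(X, y)
x_new = np.array([[0, 1, 1, 0]])   # youth, medium income, student, fair credit
print(clf.predict(x_new), clf.predict_proba(x_new))  # predicted class and P(Ci | X)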
Linear Models
Linear models are statistical techniques used to describe the relationship between one or more
independent variables (x) and a dependent variable (y). They assume that this relationship can
be expressed as a linear equation.
General Form of Linear Models
The general form of a linear model is:
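y = β0 + β1x1 + β2x2 + … + βkxk + ϵ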
Where:
• y: Dependent variable (response variable).
• x1,x2,…,xk: Independent variables (predictors).
• β0: Intercept (value of y when all x variables are 0).
• β1,β2,…,βk: Coefficients representing the effect of each independent variable on y.
• ϵ: Error term representing the difference between the observed and predicted y.
Linear models are widely used because they are simple, interpretable, and effective in many
practical applications.
Linear Regression
Linear Regression is one of the simplest machine learning algorithms. It comes under the supervised learning technique and is used for solving regression problems, i.e., for predicting a continuous dependent variable with the help of independent variables.
The goal of linear regression is to find the best-fit line that can accurately predict the output for the continuous dependent variable. If a single independent variable is used for prediction, it is called Simple Linear Regression; if two or more independent variables are used, it is called Multiple Linear Regression. By finding the best-fit line, the algorithm establishes the relationship between the dependent variable and the independent variables, and this relationship should be linear in nature. The output of linear regression should only be continuous values such as price, age, salary, etc. The relationship between the dependent variable and independent variable can be shown in the below image:
In the above image, the dependent variable (salary) is on the y-axis and the independent variable (experience) is on the x-axis. The regression line can be written as:
The formula y=α+β*x is the common way to represent the equation of a simple linear
regression model, where:
• y: Dependent variable (e.g., Salary)
• x: Independent variable (e.g., Experience)
• α: Intercept (value of y when x=0)
• β: Slope (rate of change of y with respect to x)
This representation is equivalent to y=c+mx, but with different symbols.
The slope β in linear regression is usually calculated as:
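β = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²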
where x̄ and ȳ are the means of the x and y values; the intercept is then α = ȳ − β·x̄.
Prediction
For 6 years of experience (x=6):
y=30,000+3,800*6=30,000+22,800=52,800
The predicted salary is $52,800.
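A minimal sketch of fitting such a model with scikit-learn; the experience/salary pairs are hypothetical and are chosen to lie exactly on the line y = 30,000 + 3,800x so the fitted coefficients are easy to check:

# Sketch: simple linear regression with scikit-learn (hypothetical data).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # years of experience
y = 30_000 + 3_800 * X.ravel()            # salaries placed exactly on a line

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])   # ~30000.0, ~3800.0
print(model.predict([[6]]))               # ~[52800.]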
Multiple Linear Regression
Multiple linear regression models the relationship between one dependent variable (y) and two
or more independent variables (x1,x2,…,xk). It generalizes simple linear regression to handle
multiple predictors.
The equation for multiple linear regression is:
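y = β0 + β1x1 + β2x2 + … + βkxk + ϵ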
Where:
• y: Dependent variable (response variable).
• x1,x2,…,xk: Independent variables (predictors).
• β0: Intercept (value of y when all x variables are 0).
• β1,β2,…,βk: Coefficients representing the effect of each independent variable on y.
• ϵ: Error term representing the difference between the observed and predicted y.
Example:Predicting House Prices
You want to predict the price of a house (y) based on:
1. Size in square feet (x1).
2. Number of Bedrooms (x2).
3. Distance to the City Center (x3).
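A minimal sketch for this example with scikit-learn; all data values below are hypothetical:

# Sketch: multiple linear regression for the house-price example.
# Column order: (size_sqft, bedrooms, distance_to_center_km); rows are made up.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1500, 3, 10], [2000, 4, 8], [1200, 2, 15],
              [1800, 3, 5],  [2500, 4, 3]])
y = np.array([300_000, 400_000, 220_000, 380_000, 520_000])  # prices

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # beta_0 and (beta_1, beta_2, beta_3)
print(model.predict([[1600, 3, 7]]))   # predicted price for a new house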
Logistic Regression
Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It can be used for classification as well as regression problems, but it is mainly used for classification. Logistic regression is used to predict a categorical dependent variable with the help of independent variables, and its output always lies between 0 and 1.
Logistic regression can be used wherever the probability of belonging to one of two classes is required, for example whether it will rain today or not (0 or 1, true or false). Logistic regression is based on the concept of Maximum Likelihood Estimation: the parameters are chosen so that the observed data are most probable. In logistic regression, we pass the weighted sum of inputs through an activation function that maps values between 0 and 1. This activation function is known as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve. Consider the below image:
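σ(z) = 1 / (1 + e^(−z))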
where z = β0 + β1x1 + … + βkxk is the weighted sum of the inputs, and σ(z) always lies between 0 and 1.
Types of Logistic Regression
1. Binary Logistic Regression:
o Predicts one of two possible outcomes.
o Example: Predicting if a customer will buy a product (yes/no).
2. Multinomial Logistic Regression:
o Predicts outcomes with three or more unordered categories.
o Example: Predicting the type of transport (car, bus, train).
3. Ordinal Logistic Regression:
o Predicts outcomes with three or more ordered categories.
o Example: Predicting customer satisfaction (low, medium, high).
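A minimal sketch of the binary case (type 1 above) with scikit-learn; the hours-studied values and pass/fail labels are hypothetical:

# Sketch: binary logistic regression with scikit-learn (hypothetical data).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # hours studied
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                   # 0 = fail, 1 = pass

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)      # beta_0 and beta_1
print(clf.predict_proba([[4.5]]))     # [P(fail), P(pass)] for 4.5 hours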
Steps to calculate β0 , β1
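The worked steps are not reproduced here. As a rough sketch, β0 and β1 can be estimated numerically by gradient ascent on the log-likelihood (using hypothetical one-feature data):

# Sketch: estimating beta_0 and beta_1 by gradient ascent on the log-likelihood.
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # hypothetical feature values
y = np.array([0, 0, 1, 0, 0, 1, 1, 1], dtype=float)   # hypothetical 0/1 labels

b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(10_000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))   # sigmoid of the linear predictor
    b0 += lr * np.sum(y - p)                    # gradient w.r.t. beta_0
    b1 += lr * np.sum((y - p) * X)              # gradient w.r.t. beta_1
print(b0, b1)   # estimated intercept and slope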
Multinomial Logistic Regression
Multinomial logistic regression is an extension of binary logistic regression that is used when
the dependent variable (y) has more than two categories. Here, y is categorical and takes on
one of k classes (y∈{1,2,…,k}).
Example Problem
Dataset
We have the following dataset with x1 (Age) and x2 (Education Level) as predictors and y (Job
Type) as the outcome variable. y has three categories: y=1 (Engineer), y=2 (Teacher), and y=3
(Doctor).
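A minimal sketch with scikit-learn; the rows below are hypothetical stand-ins for the Age/Education table, with education ordinal-encoded and job types coded 1 = Engineer, 2 = Teacher, 3 = Doctor:

# Sketch: multinomial logistic regression (hypothetical data).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[25, 1], [30, 2], [35, 3], [28, 2], [45, 3],
              [40, 1], [33, 2], [50, 3], [27, 1], [38, 2]])  # (age, education)
y = np.array([1, 1, 3, 2, 3, 2, 1, 3, 2, 1])                  # job type codes

clf = LogisticRegression(max_iter=1000).fit(X, y)  # multinomial (softmax) by default
print(clf.predict([[32, 2]]))          # predicted job type
print(clf.predict_proba([[32, 2]]))    # probabilities for classes 1, 2, 3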
Ordinal Logistic Regression:
Ordinal Logistic Regression is used when the dependent variable (y) is ordinal—i.e., it has
a natural order but the distances between categories are not assumed to be equal. For instance,
ratings like "Low," "Medium," and "High" have an order, but the difference between "Low"
and "Medium" may not be the same as between "Medium" and "High."
Example:
Consider the following dataset where we predict customer satisfaction (y) based on their
monthly income (x1) and hours spent shopping online (x2):
Here:
• y (Satisfaction) is the ordinal dependent variable with values 1 (Low), 2 (Medium), 3
(High).
• x1 (Income) and x2 (Hours Online) are independent variables.
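The model for the cumulative probabilities is:
logit(P(y ≤ j)) = ln( P(y ≤ j) / (1 − P(y ≤ j)) ) = θj − (β1x1 + β2x2)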
Where:
• j is the category (j=1,2).
• θj are the threshold parameters (cut-points) separating the categories.
• β1 and β2 are the coefficients for x1 (Income) and x2(Hours Online), respectively.
The cumulative probabilities are modeled as:
1. P(y≤1) (Low satisfaction or below),
2. P(y≤2) (Medium satisfaction or below).
The probability for y=3(High satisfaction) is:
P(y=3) = 1−P(y≤2)
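A small sketch turning the cumulative-logit model into the three class probabilities; the θ and β values below are made-up numbers purely for illustration:

# Sketch: from cumulative logits to P(Low), P(Medium), P(High).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = [2.0, 4.0]             # cut-points theta_1, theta_2 (assumed values)
beta = np.array([0.05, 0.3])   # coefficients for income (in $1000s) and hours online (assumed)
x = np.array([40, 5])          # one customer: income = 40k, 5 hours online

p_le_1 = sigmoid(theta[0] - beta @ x)       # P(y <= 1): Low or below
p_le_2 = sigmoid(theta[1] - beta @ x)       # P(y <= 2): Medium or below
print([p_le_1, p_le_2 - p_le_1, 1 - p_le_2])  # P(Low), P(Medium), P(High)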
Difference between Linear regression vs Logistic Regression
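In brief, drawing on the two sections above:
• Linear regression predicts a continuous numeric output; logistic regression predicts a categorical outcome via a probability.
• Linear regression fits a straight best-fit line; logistic regression fits an S-shaped sigmoid curve.
• Linear regression is typically estimated by least squares; logistic regression by maximum likelihood estimation.
• Linear regression output is unbounded; logistic regression output always lies between 0 and 1.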
Generalized Linear Model (GLM)
A Generalized Linear Model (GLM) is an extension of ordinary linear regression that allows
the dependent variable (y) to have a distribution other than the normal distribution. GLMs are
highly flexible and can handle different types of data such as binary, count, and categorical
outcomes.
Key Components of a GLM
1. Random Component:
• Specifies the probability distribution of the response variable (y).
• Examples: Normal, Binomial, Poisson, etc.
2. Systematic Component:
• A linear predictor is used to combine the independent variables (x1,x2,…).
• Form: η = β0 + β1x1 + β2x2 + …
• Where η is the linear predictor.
3. Link Function:
• Links the expected value of the response variable (E(y)=μ) to the linear
predictor.
• Form: g(μ)=η
Types of GLMs
1. Linear Regression
• Distribution: Normal.
• Link Function: Identity g(μ)=η.
• Example: Predicting house prices based on area and location.
Price=β0+β1⋅Area+β2⋅Location
2. Logistic Regression
• Distribution: Binomial.
• Link Function: Logit, g(μ) = ln(μ / (1 − μ)).
• Example: Predicting whether a student passes (1) or fails (0) based on study hours.
ln(μ / (1 − μ)) = β0 + β1⋅Study Hours
3. Poisson Regression
• Distribution: Poisson.
• Link Function: Log, g(μ) = ln(μ).
• Example: Predicting the number of customers arriving at a store based on
time of day.
ln(μ)=β0+β1⋅Time of Day
4. Multinomial Logistic Regression
• Distribution: Multinomial.
• Link Function: Generalized Logit.
• Example: Predicting the choice of transportation (car, bus, train) based on
income and distance.
Example:
A hospital wants to model the number of daily patient arrivals (y) based on the number of staff
on duty (x1) and whether it is a weekend (x2).
Data:
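The data table is not reproduced here; a minimal Poisson GLM sketch with statsmodels and hypothetical rows:

# Sketch: Poisson GLM for the hospital example (hypothetical data).
import numpy as np
import statsmodels.api as sm

staff    = np.array([5, 8, 6, 10, 4, 9, 7, 12])     # staff on duty
weekend  = np.array([0, 0, 1, 0, 1, 0, 1, 0])       # 1 = weekend
arrivals = np.array([20, 28, 30, 35, 22, 31, 33, 40])  # daily patient arrivals

X = sm.add_constant(np.column_stack([staff, weekend]))  # adds the intercept column
model = sm.GLM(arrivals, X, family=sm.families.Poisson()).fit()
print(model.params)                       # beta_0, beta_1 (staff), beta_2 (weekend), log scale
print(model.predict([[1.0, 7.0, 1.0]]))   # expected arrivals with 7 staff on a weekend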
Support Vector Machines
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms and is used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that new data points can easily be put in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs, and we want a model that can accurately identify whether it is a cat or a dog. Such a model can be created using the SVM algorithm. We first train the model with many images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. Because SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors), it will consider the extreme cases of cats and dogs and, on the basis of the support vectors, classify the creature as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text categorization, etc.
Types of SVM:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, it is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, it is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Hyperplane, Support Vectors and Margin in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-
dimensional space, but we need to find out the best decision boundary that helps to classify the
data points. This best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features (as shown in the image), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e., the maximum distance to the closest data points of either class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Margin
The margin is the separation gap between the two lines drawn through the closest data points of each class. It is calculated as the perpendicular distance from the decision boundary to the support vectors (the closest data points). In SVM, we try to maximize this separation gap so that we get the maximum margin.
Since this is a 2-D space, we can separate the two classes just by using a straight line. But there can be multiple lines that separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called the hyperplane. The SVM algorithm finds the closest points of both classes to the boundary. These points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.
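A minimal sketch of a linear SVM on a hypothetical 2-D dataset with scikit-learn; support_vectors_ returns the extreme points discussed above:

# Sketch: linear SVM on toy 2-D data (hypothetical points).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)        # the support vectors
print(clf.coef_, clf.intercept_)   # w and b of the separating hyperplane
print(clf.predict([[4, 4]]))       # classify a new point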
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear
data, we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used the two dimensions x and y, so for non-linear data we add a third dimension z. It can be calculated as: z = x² + y²
By adding the third dimension, the sample space becomes as in the below image:
So now, SVM will divide the datasets into classes in the following way. Consider the below image:
Since we are now in 3-D space, the decision boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes a circular boundary in the original x–y plane.
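A small sketch of this idea using scikit-learn's make_circles data (generated, not from these notes): points that a linear SVM cannot separate in (x, y) become separable once the extra feature z = x² + y² is added:

# Sketch: adding z = x**2 + y**2 makes circular data linearly separable.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)   # the extra dimension
X3 = np.hstack([X, z])                              # (x, y, z)

linear_2d = SVC(kernel="linear").fit(X, y)          # struggles in 2-D
linear_3d = SVC(kernel="linear").fit(X3, y)         # separable in 3-D
print(linear_2d.score(X, y), linear_3d.score(X3, y))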
Kernel trick
In practice, the SVM algorithm is implemented using a kernel, via a technique called the kernel trick. In simple words, a kernel is a function that maps the data to a higher-dimensional space where the data become separable. A kernel transforms a low-dimensional input data space into a higher-dimensional space, thereby converting non-linearly separable problems into linearly separable ones by adding more dimensions. Thus, the kernel trick helps us build a more accurate classifier, and it is especially useful in non-linear separation problems.
We can define a kernel function as K(x, y) = φ(x)·φ(y), the inner product of the mapped inputs, where φ is the mapping into the higher-dimensional feature space.
Linear kernel
In linear kernel, the kernel function takes the form of a linear function as follows-
Linear kernel: K(xi, xj) = xiᵀ xj
Linear kernel is used when the data is linearly separable. It means that data can be separated
using a single line. It is one of the most common kernels to be used. It is mostly used when
there are large number of features in a dataset. Linear kernel is often used for text classification
purposes.Training with a linear kernel is usually faster, because we only need to optimize the
C regularization parameter. When training with other kernels, we also need to optimize the γ
parameter. So, performing a grid search will usually take more time.Linear kernel can be
visualized with the following figure.
Linear Kernel
Polynomial Kernel
Polynomial kernel represents the similarity of vectors (training samples) in a feature space over
polynomials of the original variables. The polynomial kernel looks not only at the given features of the input samples to determine their similarity, but also at combinations of these features. For degree-d polynomials, the polynomial kernel is defined as follows –
Polynomial kernel: K(xi, xj) = (γ xiᵀ xj + r)ᵈ, γ > 0
Polynomial kernel is very popular in Natural Language Processing. The most common degree
is d = 2 (quadratic), since larger degrees tend to overfit on NLP problems. It can be visualized
with the following diagram.
Polynomial Kernel
Radial Basis Function Kernel
Radial basis function kernel is a general purpose kernel. It is used when we have no prior
knowledge about the data. The RBF kernel on two samples x and y is defined by the following
equation –
Radial Basis Function kernel: K(x, y) = exp(−γ ‖x − y‖²), γ > 0
The following diagram demonstrates the SVM classification with RBF kernel.
SVM Classification with RBF kernel
Sigmoid kernel
Sigmoid kernel has its origin in neural networks. We can use it as the proxy for neural networks.
Sigmoid kernel is given by the following equation.
Sigmoid kernel: K(x, y) = tanh(α xᵀ y + c)
Sigmoid kernel can be visualized with the following diagram-
Sigmoid kernel
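For reference, the four kernels above correspond to the kernel parameter of scikit-learn's SVC; a rough comparison on one generated dataset (results will vary with the data and hyperparameters):

# Sketch: comparing SVC kernels on generated two-moons data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, gamma="scale").fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))   # test accuracy per kernel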
Types of Classification Models: Binary, Multiclass, and Multilabel
Machine learning classification has numerous applications across various fields. It ranges from
spam detection in emails to medical diagnosis and sentiment analysis in customer reviews.
Binary classification is a fundamental aspect of machine learning, categorizing data into two
distinct classes. This method is essential for tasks like email spam detection and medical
diagnostics. It provides a clear decision boundary, making it a cornerstone of many
applications.
Logistic regression is a widely used algorithm for binary classification. It determines the
probability of a sample falling into one of two classes. This approach is particularly effective
when a simple yes-or-no decision is necessary.
When evaluating binary classification models, several key metrics are crucial. Accuracy
measures the overall correctness, while precision focuses on true positives. Recall evaluates
the model's ability to identify all positive instances. The F1 score, a balanced measure,
combines precision and recall to assess model performance.
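These metrics can be computed with scikit-learn; the true/predicted labels below are hypothetical:

# Sketch: binary classification metrics with scikit-learn (hypothetical labels).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))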
Despite its simplicity and efficiency, binary classification faces challenges with imbalanced
datasets and setting appropriate decision boundaries.
Multiclass Classification: Expanding Possibilities
Multiclass classification elevates binary classification by categorizing data into three or more
classes. This method is essential for tackling complex problems that require more than simple
yes-or-no answers. It's a powerful tool for a variety of real-world applications.
In multiclass classification, each data point is assigned to one of several classes. Unlike binary
classification, which limits itself to two categories, multiclass models can manage multiple
distinct groups. This flexibility makes it a valuable asset for many tasks.
• One-vs-rest strategy: Trains a separate classifier for each class against all others
• One-vs-one strategy: Trains a separate classifier for every pair of classes
These strategies facilitate the effective management of complex classification tasks. They make multiclass classification a versatile tool in machine learning.
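A minimal sketch of the one-vs-rest and one-vs-one strategies with scikit-learn's wrappers, using the built-in iris dataset (three classes):

# Sketch: OvR and OvO wrappers around a binary classifier.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# k classes -> k OvR models and k*(k-1)/2 OvO models (both equal 3 here)
print(len(ovr.estimators_), len(ovo.estimators_))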
Softmax activation: a direct multiclass approach that outputs probabilities for each class, but requires a neural network architecture.
Multi-label classification is a method designed to handle complex scenarios where data points
can fit into multiple categories at once. Unlike traditional binary or multiclass models, this
technique allows for detailed categorization. It's particularly useful for tasks such as document
tagging and image annotation.
In this approach, each instance can be linked to several labels. For instance, a news article could
be classified under "politics," "economy," and "international affairs" simultaneously. This
flexibility is essential for real-world applications where items often possess multiple attributes
or fall into overlapping categories.
One common strategy in multi-label classification is binary relevance. This method breaks
the problem down into several binary classification tasks, one for each label. Although
straightforward, it might not fully capture the relationships between labels. Other strategies,
like label powerset and algorithm adaptation methods, focus on enhancing these relationships.
When evaluating multi-label models, specialized metrics are necessary. Hamming loss
measures the proportion of incorrectly predicted labels. Precision at k and recall at k evaluate
the model's performance for the top k predicted labels. These metrics are crucial for assessing
the model's accuracy in complex labeling scenarios.
However, multi-label classification also presents challenges. Managing label
correlations and dealing with large label spaces can be complex. As the number of possible
label combinations increases, so does the computational complexity. Researchers are
continually exploring new techniques to address these challenges and enhance multi-label
classification performance across various domains.
MNIST
The MNIST dataset stands for "Modified National Institute of Standards and Technology".
The dataset contains a large collection of handwritten digits that is commonly used for training
various image processing systems. The dataset was created by re-mixing samples from NIST's
original datasets, which were taken from American Census Bureau employees and high school
students. It is designed to help scientists develop and test machine learning algorithms in
pattern recognition and machine learning. It contains 60,000 training images and 10,000 testing
images, each of which is a grayscale image of size 28x28 pixels.
The MNIST dataset is a collection of 70,000 handwritten digits (0-9), with each image being
28x28 pixels. Here is the dataset information in the specified format:
• Target: Column represents the digit (0-9) corresponding to the handwritten image
• Pixel 1-784: Each pixel value (0-255) represents the grayscale intensity of the
corresponding pixel in the image.
1. Training Set: Consists of 60,000 images along with their labels, commonly
used for training machine learning models.
2. Test Set: Contains 10,000 images with their corresponding labels, used for
evaluating the performance of trained models.
The MNIST dataset, which currently represents a primary input for many tasks in image
processing and machine learning, can be traced back to the National Institute of Standards and
Technology (NIST). NIST, a US government agency focused on measurement science and
standards, curates various datasets, including two particularly relevant to handwritten digits:
• Special Database 3 (SD-3): This set was collected from US Census Bureau employees. Because these writers handle written figures on a regular basis, their samples are comparatively clean and easy to recognize, which makes them a convenient source for training.
• Special Database 1 (SD-1): This set was collected from high-school students. These samples look less "official" than those from the Census Bureau, but they cover a much wider variety of writing styles.
These datasets could not be used directly; they first had to be transformed and divided into separate training and test data for machine learning models. NIST's original split between the two collections created a potential bias:
• SD-3 was used as the training set. Because it came from experienced writers, a model trained only on these "clean" digits could become overly biased toward them.
• SD-1 was used as the test set. Without being exposed to more varied writing styles during training, a model could perform poorly when tested on SD-1.
To tackle this bias and obtain a more balanced dataset, the MNIST developers mixed samples from the two NIST special databases, so that both the MNIST training set and the MNIST test set contain digits drawn from SD-1 and SD-3. By using this approach, the data used for training and testing became more representative of the wide range of writing styles, resulting in more generally applicable image processing and machine learning models.
Loading the MNIST dataset in Python can be done in several ways, depending on the libraries
and tools you prefer to use. Below are some of the most common methods to load the MNIST
dataset using different Python libraries:
Loading MNIST dataset using TensorFlow/Keras
This code snippet loads the MNIST dataset using Keras, retrieves the training images and labels, and then plots four images in a row with their corresponding labels. Each image is displayed in grayscale.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Load the MNIST training and test splits
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Plot the first four training images with their labels
plt.figure(figsize=(10, 5))
for i in range(4):
    plt.subplot(1, 4, i+1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()
Output:
Loading MNIST dataset Using PyTorch
In this example, we explore how to load the MNIST dataset with PyTorch. PyTorch offers a similar utility through torchvision.datasets, which is very convenient, especially when combined with torchvision.transforms to perform basic preprocessing like converting images to tensor format.
import matplotlib.pyplot as plt
import torch
from torchvision import datasets, transforms

# Define the transformation to convert images to PyTorch tensors
transform = transforms.Compose([transforms.ToTensor()])

# Load the MNIST dataset with the specified transformation
mnist_pytorch = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

# Create a DataLoader to load the dataset in batches
train_loader_pytorch = torch.utils.data.DataLoader(mnist_pytorch, batch_size=1, shuffle=False)

# Create a figure to display the images
plt.figure(figsize=(15, 3))

# Print the first few images in a row
for i, (image, label) in enumerate(train_loader_pytorch):
    if i < 5:  # Print the first 5 samples
        plt.subplot(1, 5, i + 1)
        plt.imshow(image[0].squeeze(), cmap='gray')
        plt.title(f"Label: {label.item()}")
        plt.axis('off')
    else:
        break  # Exit the loop after printing 5 samples
plt.tight_layout()
plt.show()
Output:
Applications of MNIST
While it's primarily used for educational purposes and in benchmarking algorithms in academic
studies, learning and experimenting with the MNIST dataset can also have practical
applications. MNIST dataset finds applications in the Banking Sector, Postal Services, and
Document Management:
1. Banking Sector
• Recognizing Handwritten Numbers on Checks: Banks routinely process handwritten checks. MNIST-style digit recognition is at the core of training systems that classify the digits making up the amount on a check. This reduces manual data entry, eliminates errors, and speeds up check handling.
2. Postal Services
• Automating Postal Code Reading: Accurate parcel sorting and timely delivery depend heavily on correct recognition of postal codes. The MNIST dataset is used to train image recognition models that read zip codes on envelopes despite varied handwriting and print quality. This speeds up sorting and dispatch, facilitating faster delivery and fewer delays.
3. Document Management
• Digitizing Written Documents and Recognizing Numbers: Many documents such as invoices, receipts, and forms contain handwritten numbers. MNIST can be applied in developing systems that extract and recognize those figures during scanning and digitization. Automating this data entry streamlines processing, simplifies data mining, and improves document searchability.
Ranking in Machine Learning
Ranking is a type of machine learning problem where the goal is to predict the order or
preference of items rather than the exact label. It is commonly used in applications like search
engines, recommendation systems, and advertisements.
Types of Ranking Problems:
1. Pointwise Ranking:
o In pointwise ranking, each item is treated independently, and the model predicts
a score for each item. The items are then ranked based on their scores.
o Example: Ranking products based on their relevance to a user’s query.
2. Pairwise Ranking:
o In pairwise ranking, the model learns to predict the relative ranking of two
items. For each pair, the model predicts which item is preferred.
o Example: Ranking search results by learning which document is better between
two given documents.
o RankNet, SVMRank are popular pairwise ranking algorithms.
3. Listwise Ranking:
o Listwise ranking algorithms consider the entire list of items simultaneously and
predict the best order for all items.
o Example: Ranking search results where the whole list of results is considered to
maximize the quality of ranking as a whole.
o LambdaRank, LambdaMART are examples of listwise ranking methods.
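As a minimal illustration of the pointwise approach described above: score items with a regression model and sort by the predicted score (all feature values and relevance labels are hypothetical):

# Sketch: pointwise ranking -- predict a relevance score per item, then sort.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.9, 3], [0.2, 1], [0.5, 2], [0.7, 5]])  # item features (hypothetical)
relevance = np.array([3.0, 0.0, 1.0, 2.0])              # graded relevance labels

model = LinearRegression().fit(X, relevance)
scores = model.predict(X)
ranking = np.argsort(-scores)          # item indices, best first
print(ranking, scores[ranking])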
Applications of Ranking:
• Search Engines: Ranking search results based on relevance to the query.
• Recommendation Systems: Ranking items like products, movies, or music based on a
user’s preferences.
• Ad Placement: Ranking ads based on predicted click-through rates to optimize
advertising revenue.