ICT202B AI ML and Emerging Technologies UNIT 3 (Classification and Regression) 2

The document outlines a course on AI, ML, and emerging technologies, focusing on classification and regression techniques. It covers various machine learning algorithms, including K-Nearest Neighbors, Naive Bayes, and Support Vector Machines, along with regression analysis methods like linear and polynomial regression. The course aims to equip students with skills in data analysis, programming in Python, and applying machine learning to biological data.


ICT202B: AI, ML and Emerging technologies

UNIT 3 Classification & Regression


AI, ML and Emerging technologies
Course Outcomes (Credits: 02)

CO1 :: Remember the concepts of data analysis in genomics and proteomics

CO2 :: Discuss the advanced topics of Python language used for programming

CO3 :: Apply neural networks for medical diagnosis by medical image classification

CO4 :: Analyze clustering of unlabeled data used for training

CO5 :: Evaluate the concepts of machine learning

CO6 :: Validate the application of machine learning for biological data analysis
UNIT I
Data & Feature Engineering: Data vs. information; types of data: numerical data (discrete and continuous), categorical data (ordinal and nominal), time-series data, unstructured data; data labelling; what a feature is; importance of feature selection; feature selection algorithms: sequential forward selection, sequential backward selection, bidirectional feature selection; feature extraction

UNIT II

Advanced Python packages: Introduction to NumPy; creation and accessing of nD arrays; operations on nD arrays; introduction to pandas; DataFrame; reading CSV/Excel data; dimensionality reduction, PCA and LDA; visualization using Matplotlib: line plot, subplots, scatter plot, bar graph, histogram, pie chart

UNIT III

Classification & Regression : Introduction to classification, KNN, Decision Tree, Naive Bayes classifier, Support
Vector Machine classifier, classification on a given dataset, Introduction to regression, linear regression,
Polynomial regression, regression on a given dataset
Machine Learning
Introduction to classification
The K-Nearest Neighbors (K-NN) algorithm is a versatile and widely used
machine learning algorithm, valued primarily for its simplicity and ease of
implementation.

• It does not require any assumptions about the underlying data distribution.
• It can also handle both numerical and categorical data, making it a flexible
choice for various types of datasets in classification and regression tasks.
• It is a non-parametric method that makes predictions based on the
similarity of data points in a given dataset.
• With a suitably large value of K, K-NN is relatively robust to noisy training points, since each prediction is a vote over several neighbours rather than a single point.
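The K-NN idea above can be sketched from scratch with NumPy on a hypothetical toy dataset (in practice, a library classifier such as scikit-learn's KNeighborsClassifier would usually be used):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the class of x_new by majority vote among its k nearest neighbours."""
    # Euclidean distance from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbours' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy 2-D dataset: class 0 clustered near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.5, 0.5])))  # → 0
print(knn_predict(X, y, np.array([5.5, 5.5])))  # → 1
```

Note that K-NN is non-parametric: nothing is "trained" here; the whole dataset is consulted at prediction time.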
Introduction to classification
Naive Bayes classifier

Naive Bayes classifiers are a family of classification algorithms based on
Bayes' Theorem. It is not a single algorithm but a family of algorithms that
share a common principle: every pair of features being classified is assumed
to be independent of each other.

Despite this "naive" assumption of feature independence, these classifiers
are widely used for their simplicity and efficiency in machine learning.
Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence. Bayes' Theorem states:

P(A|B) = P(B|A) · P(A) / P(B)

• P(A) is the prior probability of A (the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, event B).
• P(B) is the marginal probability: the probability of the evidence.
• P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
• P(B|A) is the likelihood: the probability of observing the evidence given that hypothesis A is true.
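The terms above can be made concrete with a small numeric sketch in plain Python. The scenario and all the probabilities below are hypothetical, chosen only to illustrate the calculation:

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical example: A = "email is spam", B = "email contains the word 'offer'"

p_a = 0.2               # prior: 20% of emails are spam
p_b_given_a = 0.6       # likelihood: 60% of spam emails contain 'offer'
p_b_given_not_a = 0.05  # 5% of non-spam emails contain 'offer'

# Marginal probability of the evidence, by the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior: probability an email is spam given that it contains 'offer'
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # → 0.75
```

Seeing the evidence raises the probability of spam from the prior of 0.2 to a posterior of 0.75; a Naive Bayes classifier repeats this update once per feature, multiplying the likelihoods under the independence assumption.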
Support Vector Machines (SVM)

• In SVM, the boundary used to separate the classes is
referred to as a hyperplane. The data points on either side
of the hyperplane that lie closest to it are called
support vectors; these are the points that determine
where the boundary is drawn.
• In SVM classification, the data can be either linearly or
non-linearly separable. Different kernels can be set in
an SVM classifier. For a linearly separable dataset, we can
set the kernel to 'linear'.
• For a non-linear dataset, two common kernels are
'rbf' and 'polynomial'. These map the data to a higher
dimension, where it is easier to draw a separating
hyperplane; the resulting boundary is then projected back
down to the original dimension.
Consider a dataset with two classes of shapes, rectangles and circles. When it is difficult to draw a separating
line in the 2D plane, we map the data points to a higher dimension (a 3D space) and draw the hyperplane there.
The resulting decision boundary is then projected back down to the original plane.
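The kernel choice described above can be sketched as follows, assuming scikit-learn is available (the toy dataset is hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable dataset: class 0 near the origin, class 1 near (4, 4)
X = np.array([[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Linear kernel, since this dataset is linearly separable;
# for non-linear data we would set kernel='rbf' or kernel='poly' instead.
clf = SVC(kernel='linear')
clf.fit(X, y)

# The support vectors are the training points closest to the hyperplane
print(clf.support_vectors_)
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # → [0 1]
```

Only the support vectors influence the fitted boundary; removing any other training point would leave the hyperplane unchanged.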
Introduction to regression

Regression is a statistical method that tries to determine the strength and
character of the relationship between one dependent variable and a
series of other (independent) variables.

or

Regression analysis is a set of statistical operations for estimating the
relationship between an independent variable (X) and a dependent
variable (Y).
A regression line is defined as a
statistical concept that describes and
predicts the relationship between two or
more variables: a straight line that
reflects the best-fit relationship
between the independent and dependent
variables in a dataset.

The equation of a simple linear regression line is given by:

Y = a + bX + ε

Here,
• Y is the dependent variable
• X is the independent variable
• a is the y-intercept, which represents the value of Y when X is 0
• b is the slope, which represents the change in Y for a unit change in X
• ε is the residual error
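Fitting the line Y = a + bX can be sketched with NumPy's polyfit on hypothetical noise-free data, so the fit recovers the generating coefficients exactly:

```python
import numpy as np

# Toy data generated from Y = 2 + 3X (a hypothetical example, no noise),
# so the fit should recover intercept a = 2 and slope b = 3.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 2 + 3 * X

# np.polyfit returns coefficients highest degree first: [b, a]
b, a = np.polyfit(X, Y, deg=1)
print(a, b)  # intercept ≈ 2, slope ≈ 3
```

With real data the points would not lie exactly on the line, and the fit minimises the sum of squared residuals ε instead of reproducing the coefficients exactly.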
Polynomial regression

Polynomial regression is a statistical and machine learning technique that
models the relationship between variables using higher-degree
polynomials: it fits the data using higher-degree functions of the
independent variable, such as squares and cubes.
When to use it
Polynomial regression is useful when there is a non-linear relationship between the
variables, like when predicting how many likes a social media post will get over time.

Model complexity
As the degree of the model increases, its fit to the training data may improve, but so does
the risk of over-fitting; too low a degree, by contrast, risks under-fitting the data.

Model selection
There are two approaches to choosing the order of a polynomial model: forward selection
and backward elimination.

Bias and variance

A model with high bias is too simple to capture the patterns in the data (under-fitting),
while a model with high variance fits the training points too closely, including their noise
(over-fitting). Ideally, a model should have both low bias and low variance, but in practice
there is a trade-off between the two.
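The points above can be illustrated with NumPy on hypothetical quadratic data: a degree-1 (linear) model under-fits the curve, while a degree-2 polynomial recovers it:

```python
import numpy as np

# Toy non-linear data from Y = 1 + 2X + 0.5X^2 (hypothetical, noise-free),
# which a straight line cannot fit but a degree-2 polynomial can.
X = np.linspace(-3, 3, 20)
Y = 1 + 2 * X + 0.5 * X**2

# Degree-1 (linear) fit: high bias, leaves large residual error on curved data
lin_resid = np.sum((np.polyval(np.polyfit(X, Y, 1), X) - Y) ** 2)

# Degree-2 polynomial fit recovers the generating coefficients
coeffs = np.polyfit(X, Y, 2)  # highest degree first: ≈ [0.5, 2, 1]
quad_resid = np.sum((np.polyval(coeffs, X) - Y) ** 2)

print(lin_resid > quad_resid)  # → True
```

Raising the degree further would keep driving the training residual toward zero, which is exactly the over-fitting risk described above: the extra flexibility would start fitting noise rather than signal on real data.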
