DSUP Exp6

The document presents an experiment comparing Support Vector Machines (SVM) and Random Forest Classifier using the Iris dataset to predict flower species. It discusses the theoretical background of machine learning, the characteristics of both algorithms, and their performance outcomes, concluding that SVM generally performs better on this specific dataset. The choice between the two algorithms depends on the dataset's nature, interpretability needs, and available computational resources.

Uploaded by

Chetan

Name: Nikhil Namade

Roll No. A16


ID - TU4F2223015

Experiment No. 6

AIM:
Implement and compare any one case study using SVM classifier and
Random Forest Classifier with web deployment

Theory:
Machine Learning is the field of study that gives computers the capability to
learn without being explicitly programmed. ML is one of the most exciting
technologies one could come across. As the name suggests, it gives the
computer the quality that makes it more similar to humans: the ability to
learn. Machine learning is actively used today, perhaps in many more places
than one would expect.
Machine Learning is an essential skill for any aspiring data analyst or data
scientist, and for anyone who wishes to turn a massive amount of raw data
into trends and predictions.
Supervised learning is the type of machine learning in which machines are
trained on well "labelled" training data and, on the basis of that data,
predict the output. Labelled data means that each input is already tagged
with the correct output. In supervised learning, the training data acts as
the supervisor that teaches the machine to predict the output correctly.
In other words, supervised learning provides both the input data and the
correct output data to the machine learning model. The aim of a supervised
learning algorithm is to find a mapping function that maps the input
variable (x) to the output variable (y).
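
As a small illustration of the mapping from x to y, the sketch below fits a classifier on a handful of labelled points and then applies it to new inputs. The toy data and the choice of LogisticRegression are assumptions made only for this example; they are not part of the experiment.

```python
from sklearn.linear_model import LogisticRegression

# Labelled training data (toy values, assumed for illustration):
# each input x is already tagged with the correct output y.
X_train = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Fitting the model estimates a mapping function from x to y.
model = LogisticRegression()
model.fit(X_train, y_train)

# The learned mapping is then applied to inputs the model has not seen.
predictions = model.predict([[1.5], [8.5]])
print(predictions)
```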

Fortunately, with libraries such as scikit-learn, it’s now easy to study
structured or unstructured data using scientific methods, algorithms and
systems to extract knowledge.

Here we are going to discuss two of the most popular algorithms: Support
Vector Machines (SVMs) and Random Forests.

SUPPORT VECTOR MACHINES:



Support Vector Machine is a supervised learning model which can be used
for both classification and regression problems. However, it is mostly used
in classification problems where the classes are well separated (easy to
classify). We perform classification by finding the hyperplane that best
separates the two classes, that is, the one with the largest margin between
them.
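
The separating hyperplane can be inspected directly when a linear kernel is used. The sketch below is a minimal example with scikit-learn's SVC on made-up 2D points (the data and the value C=1.0 are assumptions for illustration); it reads the hyperplane w . x + b = 0 from the fitted model and classifies new points by the sign of w . x + b.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in 2D (toy data, assumed for illustration).
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)

# The support vectors are the training points that define the margin.
print("support vectors:")
print(clf.support_vectors_)

# The sign of w . x + b decides which side of the hyperplane a point is on.
new_points = np.array([[1.5, 1.5], [4.5, 4.5]])
scores = new_points @ w + b
print(scores, clf.predict(new_points))
```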

RANDOM FOREST:
Random Forest is also one of the most used algorithms in machine learning.
It can be used for both classification and regression tasks. The “forest” it
builds is an ensemble of decision trees, usually trained with the “bagging”
method. The general idea of bagging is to train several models on random
resamples of the training data and combine them, which improves the overall
result. Basically, Random Forest builds multiple decision trees and merges
their predictions to get a more accurate and stable prediction.
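
The bagging idea can be sketched by hand: draw bootstrap samples, fit one decision tree per sample, and let the trees vote. This is a simplified illustration, not the library's implementation; the number of trees and the seed are assumptions, and scikit-learn's RandomForestClassifier combines trees by averaging predicted probabilities rather than by the hard vote used here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)  # fixed seed, assumed for reproducibility

n_trees = 25  # illustrative value
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw as many rows as the dataset has, with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Each tree votes; the majority class is the ensemble's prediction.
votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
ensemble_pred = np.array(
    [np.bincount(col, minlength=3).argmax() for col in votes.T]
)
print("training accuracy of the vote:", (ensemble_pred == y).mean())
```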

Aspect | Support Vector Machines (SVM) | Random Forests
Model Type | Discriminative model | Ensemble model (Bagging)
Algorithm | Finds optimal separating hyperplane | Constructs multiple decision trees and combines their outputs
Interpretability | Less interpretable (black box) | More interpretable (can extract feature importances)
Handling Nonlinearity | Can handle nonlinear relationships using kernel tricks | Inherently handles nonlinear relationships
Handling Outliers | Sensitive to outliers | More robust to outliers
Feature Importance | No direct feature importance measure | Provides feature importance scores
Overfitting | Less prone to overfitting for nonlinear kernels | Less prone to overfitting due to ensemble approach
Scaling | Requires scaling of features | Does not require scaling of features
Parallelization | Limited parallelization | Parallelization possible for training and prediction
Multiclass Problems | Handles multiclass problems using one-vs-one or one-vs-rest | Handles multiclass problems natively
Memory Usage | Scales poorly with large number of samples and features | Scales well with number of samples
Hyperparameters | Kernel type, regularization parameter, gamma | Number of trees, max depth, max features, etc.
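
Several rows of the comparison (scaling, hyperparameters, parallelization, feature importance) map directly onto scikit-learn settings. The sketch below shows one way this might look; the specific hyperparameter values are assumptions chosen for illustration, not tuned settings from the experiment.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# SVM: features are scaled first; kernel, C (regularization) and gamma
# are the main hyperparameters.
svm_model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
svm_model.fit(X, y)

# Random Forest: no scaling step; number of trees, max depth and
# max features are the main hyperparameters; n_jobs=-1 uses all cores.
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=None,
    max_features="sqrt",
    n_jobs=-1,
    random_state=0,
)
rf_model.fit(X, y)

# Only the Random Forest exposes feature importance scores directly.
for name, score in zip(iris.feature_names, rf_model.feature_importances_):
    print(f"{name}: {score:.3f}")
```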

Implementation:
Dataset:
We compare SVM and Random Forests using the Iris dataset (data of flowers).
The task is to predict the species of a flower from four features: sepal
width, sepal length, petal width and petal length.
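
A minimal sketch of loading the dataset and holding out a test set, assuming the copy of Iris bundled with scikit-learn and an 80/20 stratified split (the split ratio and seed are assumptions, not taken from the experiment):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

print(iris.feature_names)   # the four sepal/petal measurements
print(iris.target_names)    # the three species to predict
print(X.shape)

# Hold out 20% of the rows for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```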

Code:

Random Forest Classifier:
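
A minimal sketch of how a Random Forest classifier could be trained and scored on Iris with scikit-learn. The 80/20 stratified split, 100 trees and the seed are assumptions made for this example and are not necessarily the settings used in the experiment.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train an ensemble of 100 decision trees (assumed setting).
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Accuracy on the held-out test set.
rf_accuracy = accuracy_score(y_test, rf.predict(X_test))
print(f"Random Forest accuracy: {rf_accuracy:.3f}")
```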

SVM Classifier:
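
A minimal sketch of the corresponding SVM classifier, again with scikit-learn. Scaling the features inside a pipeline, the RBF kernel, and the split settings are assumptions made for this example and are not necessarily those used in the experiment.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler is fitted on the training data only, then reused on the test data.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)

# Accuracy on the held-out test set.
svm_accuracy = accuracy_score(y_test, svm.predict(X_test))
print(f"SVM accuracy: {svm_accuracy:.3f}")
```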

Output:

Accuracy of Random Forest Classifier:

Accuracy of SVM Classifier:



Conclusion:

On this dataset SVM works faster and provides better results, because the
classes are well separated and easy to classify. Random Forest also gives
good results but does not match SVM for this particular dataset. The choice
of algorithm depends upon the desired outcome. Both models are good in their
place, but an algorithm’s performance depends very much on the quality of
the data. The choice between SVM and Random Forest depends on the specific
requirements of your project, including the nature of your dataset, the
importance of interpretability, and the computational resources available.
In some cases, SVMs may outperform Random Forests, especially in tasks where
a clear margin separates the classes. Conversely, Random Forests may be more
suitable when interpretability is crucial (they provide feature importance
scores), for tasks with a large number of features, or when dealing with
complex, non-linearly separable data.
