Support Vector Machines - Problem - Statement

The document discusses support vector machines (SVM), a supervised machine learning algorithm. It gives instructions for a student to build SVM models on salary and forest-fire datasets and analyze the results. R and Python code snippets load the data, train SVM models with different kernels, and evaluate them on test data using accuracy.

Uploaded by

Dathu Gurram

Topic: Support Vector Machine (SVM)

Instructions
Please share your answers inline in the Word document. Submit Python and R code files wherever applicable.

Please ensure you update all the details:


Name: GURRAM DATHU SWAMY
Batch Id: DS_08032021
Topic: Support Vector Machines.

1. Business Problem
1.1. Objective
1.2. Constraints (if any)

2. Work on each feature of the dataset to create a data dictionary as displayed in the below
image:

2.1 Make a table as shown above and, for each feature, give its data type, a description, and its relevance to model building; if a feature is not relevant, give the reason.
Using R and Python code, perform:
3. Data Pre-processing
3.1 Data Cleaning, Feature Engineering, etc.
3.2 Outlier Imputation
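
One common way to handle step 3.2 is to cap values that fall outside the interquartile-range fences (winsorization). The sketch below is only an illustration using the Python standard library; the function name `cap_outliers_iqr`, the multiplier `k = 1.5`, and the sample values are assumptions, not part of the assignment.

```python
import statistics


def cap_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical hours-per-week sample with two extreme entries (99 and 1).
hours = [40, 38, 45, 40, 50, 42, 40, 99, 1, 40]
capped = cap_outliers_iqr(hours)
print(capped)
```

Capping keeps every row, which matters when the dataset is small; dropping the rows or replacing them with the median are alternatives with different trade-offs.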

4. Exploratory Data Analysis (EDA):


4.1. Summary
4.2. Univariate analysis
4.3. Bivariate analysis

5. Model Building
5.1 Build the model on the scaled data (try multiple options)
5.2 Perform Support Vector Machines.
5.3 Train and test the model, compare accuracies using the confusion matrix, and try
different hyperparameters
5.4 Briefly explain the model output in the documentation
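
Steps 5.1 to 5.3 can be sketched in Python with scikit-learn as follows. This is a minimal sketch under stated assumptions: the data is synthetic (generated with `make_classification`), and the parameter grid, fold count, and random seeds are illustrative values rather than recommendations from the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the assignment data (assumption: binary target).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is re-fit on each training fold.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# Score the best configuration on the held-out test rows.
test_accuracy = search.score(X_test, y_test)
matrix = confusion_matrix(y_test, search.predict(X_test))
print(search.best_params_, round(test_accuracy, 3))
print(matrix)
```

Keeping the scaler in the pipeline means the test rows never influence the scaling parameters, so the reported accuracy stays an honest held-out estimate.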

6. Share the benefits/impact of the solution: how the business (client) benefits from the
solution provided

Note:
The assignment should be submitted in the following format:

• R code
• Python code
• Code modularization should be maintained
• Documentation of the model building (elaborating on the steps mentioned above)

Problem Statement: -

A construction firm wants to develop a suburban locality with new infrastructure, but it risks
incurring losses if it cannot sell the properties. To reduce this risk, the firm consults an
analytics firm for insights into how densely the area is populated and which income groups
reside there. As a Data Scientist, apply the Support Vector Machines algorithm to the given
dataset, bring out informative insights, and comment on whether investment in that area is
viable.
R-code:
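##### Support Vector Machines

# Load the datasets (file.choose() opens a file picker: pick the test file first, then the train file)
salarydata_test <- read.csv(file.choose(), stringsAsFactors = TRUE)
salarydata_train <- read.csv(file.choose(), stringsAsFactors = TRUE)

summary(salarydata_test)
summary(salarydata_train)

# Training a model on the data ----

# Begin by training a simple linear SVM
# Install kernlab only if it is missing, then load it
if (!requireNamespace("kernlab", quietly = TRUE)) install.packages("kernlab")
library(kernlab)

# Note: the first model is fit on the test set and the second on the train set;
# each is later scored on the same rows it was fit on
salarydatatest_classifier <- ksvm(Salary ~ ., data = salarydata_test, kernel = "vanilladot")
salarydatatrain_classifier <- ksvm(Salary ~ ., data = salarydata_train, kernel = "vanilladot")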

## Evaluating model performance ----

# Predictions of each model on the data it was fit on
salarytest_predictions <- predict(salarydatatest_classifier, salarydata_test)
salarytrain_predictions <- predict(salarydatatrain_classifier, salarydata_train)

### Test data: confusion matrix, then share of correct predictions ###
table(salarytest_predictions, salarydata_test$Salary)
agreement <- salarytest_predictions == salarydata_test$Salary
table(agreement)
prop.table(table(agreement))

###for train data###

table(salarytrain_predictions, salarydata_train$Salary)

© 2013 - 2020 360DigiTMG. All Rights Reserved.


# Compare the train predictions with the train labels (not the test labels)
agreement1 <- salarytrain_predictions == salarydata_train$Salary
table(agreement1)
prop.table(table(agreement1))

## Improving model performance ----

# RBF kernel fit on the train data and scored on the test data
salarydatatest_classifier_rbf <- ksvm(Salary ~ ., data = salarydata_train, kernel = "rbfdot")
salarytest_predictions_rbf <- predict(salarydatatest_classifier_rbf, salarydata_test)
agreement_rbf <- salarytest_predictions_rbf == salarydata_test$Salary
table(agreement_rbf)
prop.table(table(agreement_rbf))

# RBF kernel fit on the test data and scored on the train data
salarydatatrain_classifier_rbf <- ksvm(Salary ~ ., data = salarydata_test, kernel = "rbfdot")
salarytrain_predictions_rbf <- predict(salarydatatrain_classifier_rbf, salarydata_train)
agreement_rbf1 <- salarytrain_predictions_rbf == salarydata_train$Salary
table(agreement_rbf1)
prop.table(table(agreement_rbf1))
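
The salary workflow has only an R listing, so here is a rough Python counterpart that fits on the train rows and scores on the test rows. It is a sketch under assumptions: the two small data frames are made up (they only reuse a few column names from the salary data), and the choice of `StandardScaler`, `OneHotEncoder`, and a linear kernel is illustrative.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Tiny made-up rows that reuse a few column names from the salary data.
train = pd.DataFrame({
    "age": [25, 38, 52, 46, 29, 61, 33, 44],
    "hoursperweek": [40, 50, 45, 60, 35, 40, 38, 55],
    "workclass": ["Private", "Private", "Self-emp-inc", "Private",
                  "Local-gov", "Self-emp-inc", "Private", "Local-gov"],
    "Salary": ["<=50K", ">50K", ">50K", ">50K",
               "<=50K", ">50K", "<=50K", ">50K"],
})
test = pd.DataFrame({
    "age": [27, 55],
    "hoursperweek": [36, 48],
    "workclass": ["Private", "State-gov"],  # State-gov is unseen in train
    "Salary": ["<=50K", ">50K"],
})

features = ["age", "hoursperweek", "workclass"]
preprocess = make_column_transformer(
    (StandardScaler(), ["age", "hoursperweek"]),
    (OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
)
model = make_pipeline(preprocess, SVC(kernel="linear"))

# Fit on the training rows only; the test rows are used purely for scoring.
model.fit(train[features], train["Salary"])
predictions = model.predict(test[features])
accuracy = (predictions == test["Salary"]).mean()

print(pd.crosstab(predictions, test["Salary"],
                  rownames=["predicted"], colnames=["actual"]))
print(accuracy)
```

With the real files, `train` and `test` would come from `pd.read_csv` and `features` would list every predictor column.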

Output:
> # Load the Dataset
> salarydata_test <- read.csv(file.choose(), stringsAsFactors = TRUE)
> salarydata_train <- read.csv(file.choose(), stringsAsFactors = TRUE)
> summary(salarydata_test)
      age              workclass            education        educationno
 Min.   :17.00   Federal-gov     :  463   HS-grad     :4943   Min.   : 1.00
 1st Qu.:28.00   Local-gov       : 1033   Some-college:3221   1st Qu.: 9.00
 Median :37.00   Private         :11021   Bachelors   :2526   Median :10.00
 Mean   :38.77   Self-emp-inc    :  572   Masters     : 887   Mean   :10.11
 3rd Qu.:48.00   Self-emp-not-inc: 1297   Assoc-voc   : 652   3rd Qu.:13.00
 Max.   :90.00   State-gov       :  667   11th        : 571   Max.   :16.00
                 Without-pay     :    7   (Other)     :2260

       maritalstatus                occupation           relationship
 Divorced             :2083   Exec-managerial:1992   Husband       :6203
 Married-AF-spouse    :  11   Craft-repair   :1990   Not-in-family :3976
 Married-civ-spouse   :6990   Prof-specialty :1970   Other-relative: 460
 Married-spouse-absent: 182   Sales          :1824   Own-child     :2160
 Never-married        :4872   Adm-clerical   :1819   Unmarried     :1576
 Separated            : 472   Other-service  :1596   Wife          : 685
 Widowed              : 450   (Other)        :3869

         race                   sex         capitalgain      capitalloss
 Amer-Indian-Eskimo:  149   Female: 4913   Min.   :    0   Min.   :   0.00
 Asian-Pac-Islander:  408   Male  :10147   1st Qu.:    0   1st Qu.:   0.00
 Black             : 1411                  Median :    0   Median :   0.00
 Other             :  122                  Mean   : 1120   Mean   :  89.04
 White             :12970                  3rd Qu.:    0   3rd Qu.:   0.00
                                           Max.   :99999   Max.   :3770.00

  hoursperweek         native              Salary
 Min.   : 1.00   United-States:13788   <=50K:11360
 1st Qu.:40.00   Mexico       :  293   >50K : 3700
 Median :40.00   Philippines  :   95
 Mean   :40.95   Puerto-Rico  :   66
 3rd Qu.:45.00   Germany      :   65
 Max.   :99.00   Canada       :   56
                 (Other)      :  697
> summary(salarydata_train)
      age              workclass            education        educationno
 Min.   :17.00   Federal-gov     :  943   HS-grad     :9840   Min.   : 1.00
 1st Qu.:28.00   Local-gov       : 2067   Some-college:6677   1st Qu.: 9.00
 Median :37.00   Private         :22285   Bachelors   :5044   Median :10.00
 Mean   :38.44   Self-emp-inc    : 1074   Masters     :1627   Mean   :10.12
 3rd Qu.:47.00   Self-emp-not-inc: 2499   Assoc-voc   :1307   3rd Qu.:13.00
 Max.   :90.00   State-gov       : 1279   11th        :1048   Max.   :16.00
                 Without-pay     :   14   (Other)     :4618

       maritalstatus                 occupation           relationship
 Divorced             : 4214   Prof-specialty :4038   Husband       :12463
 Married-AF-spouse    :   21   Craft-repair   :4030   Not-in-family : 7726
 Married-civ-spouse   :14065   Exec-managerial:3992   Other-relative:  888
 Married-spouse-absent:  370   Adm-clerical   :3721   Own-child     : 4466
 Never-married        : 9725   Sales          :3584   Unmarried     : 3212
 Separated            :  939   Other-service  :3212   Wife          : 1406
 Widowed              :  827   (Other)        :7584

         race                   sex         capitalgain      capitalloss
 Amer-Indian-Eskimo:  286   Female: 9781   Min.   :    0   Min.   :   0.0
 Asian-Pac-Islander:  895   Male  :20380   1st Qu.:    0   1st Qu.:   0.0
 Black             : 2817                  Median :    0   Median :   0.0
 Other             :  231                  Mean   : 1092   Mean   :  88.3
 White             :25932                  3rd Qu.:    0   3rd Qu.:   0.0
                                           Max.   :99999   Max.   :4356.0

  hoursperweek         native              Salary
 Min.   : 1.00   United-States:27504   <=50K:22653
 1st Qu.:40.00   Mexico       :  610   >50K : 7508
 Median :40.00   Philippines  :  188
 Mean   :40.93   Germany      :  128
 3rd Qu.:45.00   Puerto-Rico  :  109
 Max.   :99.00   Canada       :  107
                 (Other)      : 1515
> # Training a model on the data ----
> # Begin by training a simple linear SVM
> install.packages("kernlab")
Error in install.packages : Updating loaded packages

Restarting R session...

> install.packages("kernlab")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/HP/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/kernlab_0.9-29.zip'
Content type 'application/zip' length 2849843 bytes (2.7 MB)
downloaded 2.7 MB

package ‘kernlab’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\HP\AppData\Local\Temp\Rtmpu0W708\downloaded_packages
> library(kernlab)
> View(salarydata_test)
> salarydatatest_classifier <- ksvm(Salary ~ ., data = salarydata_test, kernel =
"vanilladot")
Setting default kernel parameters
> salarydatatrain_classifier <- ksvm(Salary ~ ., data = salarydata_train, kernel =
"vanilladot")
Setting default kernel parameters
> ## Evaluating model performance ----
> # predictions on testing dataset
> salarytest_predictions <- predict(salarydatatest_classifier, salarydata_test)
> salarytrain_predictions <- predict(salarydatatrain_classifier, salarydata_test)
> salarytrain_predictions <- predict(salarydatatrain_classifier, salarydata_train)
> table(salarytest_predictions, salarydata_test$Salary)

salarytest_predictions <=50K  >50K
                 <=50K 10612  1541
                 >50K    748  2159
> agreement <- salarytest_predictions == salarydata_test$Salary
> table(agreement)
agreement
FALSE TRUE
2289 12771
> prop.table(table(agreement))
agreement
FALSE TRUE
0.151992 0.848008
> table(salarytrain_predictions, salarydata_train$Salary)

salarytrain_predictions <=50K  >50K
                  <=50K 21191  3114
                  >50K   1462  4394
> agreement1 <- salarytrain_predictions == salarydata_test$letter
> table(agreement1)
< table of extent 0 >
> prop.table(table(agreement1))
numeric(0)
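> agreement1 <- salarytrain_predictions == salarydata_test$Salary
Warning messages:
1: In `==.default`(salarytrain_predictions, salarydata_test$Salary) :
  longer object length is not a multiple of shorter object length
2: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
> table(agreement1)
agreement1
FALSE  TRUE
10357 19804
> prop.table(table(agreement1))
agreement1
    FALSE      TRUE
0.3433905 0.6566095

Note: this comparison sets the train predictions (30161 rows) against the test labels (15060 rows), which is why R warns about mismatched lengths, so 0.6566 is not a valid accuracy. From the train confusion matrix above, the train accuracy of the vanilladot model is (21191 + 4394) / 30161, or about 0.8483.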
> ## Improving model performance----
> salarydatatest_classifier_rbf <- ksvm(Salary ~ ., data = salarydata_train, kernel =
"rbfdot")
> salarytest_predictions_rbf <- predict(salarydatatest_classifier_rbf, salarydata_test)
> agreement_rbf <- salarytest_predictions_rbf == salarydata_test$Salary
> table(agreement_rbf)
agreement_rbf
FALSE TRUE
2197 12863
> prop.table(table(agreement_rbf))
agreement_rbf
FALSE TRUE
0.1458831 0.8541169
> salarydatatrain_classifier_rbf <- ksvm(Salary ~ ., data = salarydata_test, kernel =
"rbfdot")
> salarytrain_predictions_rbf <- predict(salarydatatrain_classifier_rbf, salarydata_train)
> agreement_rbf1 <- salarytrain_predictions_rbf == salarydata_train$Salary
> table(agreement_rbf1)
agreement_rbf1
FALSE TRUE
4517 25644
> prop.table(table(agreement_rbf1))
agreement_rbf1
FALSE TRUE
0.1497629 0.8502371

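Inferences: On both the salary test and train data, the rbfdot kernel is slightly more accurate
than the vanilladot kernel: about 85.4% vs 84.8% on the test data, and about 85.0% vs 84.8% on
the train data, where the vanilladot train figure is read off the confusion matrix,
(21191 + 4394) / 30161. The models being compared were not all fit on the same data, so the
differences are indicative only.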
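
The accuracies can be checked directly from the confusion matrices printed in the output above. The short Python sketch below copies the vanilladot counts from those tables; the helper name `accuracy` and the extra recall calculation are illustrative additions.

```python
def accuracy(matrix):
    """Share of predictions on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Rows are predicted classes, columns are actual classes (<=50K, >50K),
# copied from the vanilladot tables in the R output.
test_matrix = [[10612, 1541], [748, 2159]]
train_matrix = [[21191, 3114], [1462, 4394]]

print(round(accuracy(test_matrix), 4))   # 0.848
print(round(accuracy(train_matrix), 4))  # 0.8483

# Recall for the >50K class on the test data: correctly predicted >50K rows
# divided by all rows whose actual class is >50K.
recall_high = test_matrix[1][1] / (test_matrix[0][1] + test_matrix[1][1])
print(round(recall_high, 4))             # 0.5835
```

Accuracy alone hides how differently the two classes are treated: most of the errors fall on the smaller >50K class, which is what the recall figure brings out.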

Problem Statement: -
In California, annual forest fires can cause huge loss of wildlife and human life, and
property damage can run into billions. Local officials would like to predict the size of the
area burned in forest fires each year so that they can be better prepared for future
calamities.
Build a Support Vector Machines model on the dataset and share your insights on it
in the documentation.
Note: - size_category is the output variable.


R-code:

##### Support Vector Machines

# Load the dataset
forestfires <- read.csv(file.choose(), stringsAsFactors = TRUE)

summary(forestfires)

# Partition the data into train (first 413 rows) and test (last 104 rows) sets
forestfires_train <- forestfires[1:413, ]
forestfires_test <- forestfires[414:517, ]

# Training a model on the data ----

# Begin by training a simple linear SVM
# Install kernlab only if it is missing, then load it
if (!requireNamespace("kernlab", quietly = TRUE)) install.packages("kernlab")
library(kernlab)

forestfires_classifier <- ksvm(size_category ~ ., data = forestfires_train, kernel = "vanilladot")

# ?ksvm  # optional: opens the help page for ksvm

## Evaluating model performance ----

# Predictions on the test dataset
forestfires_predictions <- predict(forestfires_classifier, forestfires_test)

# Confusion matrix, then share of correct predictions
table(forestfires_predictions, forestfires_test$size_category)
agreement <- forestfires_predictions == forestfires_test$size_category
table(agreement)
prop.table(table(agreement))

## Improving model performance ----

# Refit with the RBF kernel and score on the same test set
forestfires_classifier_rbf <- ksvm(size_category ~ ., data = forestfires_train, kernel = "rbfdot")
forestfires_predictions_rbf <- predict(forestfires_classifier_rbf, forestfires_test)
agreement_rbf <- forestfires_predictions_rbf == forestfires_test$size_category
table(agreement_rbf)
prop.table(table(agreement_rbf))
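
The partition above takes the first 413 rows for training and the last 104 for testing. If the file happens to be ordered in some way, a shuffled, stratified split is a common alternative. The following is a minimal Python sketch using only the standard library; the function name `stratified_split`, the 20% test fraction, and the seed are illustrative choices, and the labels are rebuilt from the class counts reported by summary(forestfires) rather than read from the file.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within the class before cutting
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts reported by summary(forestfires): 139 large, 378 small.
labels = ["large"] * 139 + ["small"] * 378
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_counts)
```

With these counts the split yields 413 training and 104 test rows, the same sizes as the sequential partition, while keeping the large/small ratio similar in both parts.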

Output:

> # Load the Dataset


> salarydata_test <- read.csv(file.choose(), stringsAsFactors = TRUE)
Error in file.choose() : file choice cancelled
> # Load the Dataset
> forestfires <- read.csv(file.choose(), stringsAsFactors = TRUE)

> summary(forestfires)
     month       day         FFMC            DMC              DC             ISI
 aug    :184   fri:85   Min.   :18.70   Min.   :  1.1   Min.   :  7.9   Min.   : 0.000
 sep    :172   mon:74   1st Qu.:90.20   1st Qu.: 68.6   1st Qu.:437.7   1st Qu.: 6.500
 mar    : 54   sat:84   Median :91.60   Median :108.3   Median :664.2   Median : 8.400
 jul    : 32   sun:95   Mean   :90.64   Mean   :110.9   Mean   :547.9   Mean   : 9.022
 feb    : 20   thu:61   3rd Qu.:92.90   3rd Qu.:142.4   3rd Qu.:713.9   3rd Qu.:10.800
 jun    : 17   tue:64   Max.   :96.20   Max.   :291.3   Max.   :860.6   Max.   :56.100
 (Other): 38   wed:54

      temp             RH               wind            rain               area
 Min.   : 2.20   Min.   : 15.00   Min.   :0.400   Min.   :0.00000   Min.   :   0.00
 1st Qu.:15.50   1st Qu.: 33.00   1st Qu.:2.700   1st Qu.:0.00000   1st Qu.:   0.00
 Median :19.30   Median : 42.00   Median :4.000   Median :0.00000   Median :   0.52
 Mean   :18.89   Mean   : 44.29   Mean   :4.018   Mean   :0.02166   Mean   :  12.85
 3rd Qu.:22.80   3rd Qu.: 53.00   3rd Qu.:4.900   3rd Qu.:0.00000   3rd Qu.:   6.57
 Max.   :33.30   Max.   :100.00   Max.   :9.400   Max.   :6.40000   Max.   :1090.84

     dayfri           daymon           daysat           daysun           daythu
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000
 Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000   Median :0.000
 Mean   :0.1644   Mean   :0.1431   Mean   :0.1625   Mean   :0.1838   Mean   :0.118
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000

     daytue           daywed          monthapr          monthaug          monthdec
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   :0.00000
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000
 Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000   Median :0.00000
 Mean   :0.1238   Mean   :0.1044   Mean   :0.01741   Mean   :0.3559   Mean   :0.01741
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:0.00000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :1.00000

    monthfeb          monthjan           monthjul         monthjun          monthmar
 Min.   :0.00000   Min.   :0.000000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000
 1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000
 Median :0.00000   Median :0.000000   Median :0.0000   Median :0.00000   Median :0.0000
 Mean   :0.03868   Mean   :0.003868   Mean   :0.0619   Mean   :0.03288   Mean   :0.1044
 3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.0000
 Max.   :1.00000   Max.   :1.000000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000

    monthmay           monthnov           monthoct          monthsep      size_category
 Min.   :0.000000   Min.   :0.000000   Min.   :0.00000   Min.   :0.0000   large:139
 1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.0000   small:378
 Median :0.000000   Median :0.000000   Median :0.00000   Median :0.0000
 Mean   :0.003868   Mean   :0.001934   Mean   :0.02901   Mean   :0.3327
 3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:1.0000
 Max.   :1.000000   Max.   :1.000000   Max.   :1.00000   Max.   :1.0000

> forestfires_train <- letter[1:413, ]


Error: object 'letter' not found
> forestfires_test <- letter[414:517, ]
Error: object 'letter' not found
> # Partition Data into train and test data
> forestfires_train <- forestfires[1:413, ]
> forestfires_test <- forestfires[414:517, ]
> library(kernlab)
> forestfires_classifier <- ksvm(forestfires ~ ., data = forestfires_train, kernel = "vanilladot")
Error in model.frame.default(data = ..1, formula = x) :
invalid type (list) for variable 'forestfires'
> View(forestfires)
> forestfires_classifier <- ksvm(size_category ~ ., data = forestfires_train, kernel = "vanilladot")
Setting default kernel parameters
Warning message:
In .local(x, ...) : Variable(s) `' constant. Cannot scale data.
> ## Evaluating model performance ----
> # predictions on testing dataset
> forestfires_predictions <- predict(forestfires_classifier, forest_test)
Error in is(newdata, "list") : object 'forest_test' not found
> ## Evaluating model performance ----
> # predictions on testing dataset
> forestfires_predictions <- predict(forestfires_classifier, forestfires_test)
> table(forestfires_predictions, forestfires_test$size_category)

forestfires_predictions large small
                  large    30     1
                  small     1    72
> agreement <- forestfires_predictions == letters_test$size_category
Error: object 'letters_test' not found
> agreement <- forestfires_predictions == forestfires_test$size_category
> table(agreement)
agreement
FALSE  TRUE
    2   102
> prop.table(table(agreement))
agreement
FALSE TRUE
0.01923077 0.98076923
> ## Improving model performance ----
> forestfires_classifier_rbf <- ksvm(size_category ~ ., data = forestfires_train, kernel = "rbfdot")
Warning message:
In .local(x, ...) : Variable(s) `' constant. Cannot scale data.
> forestfires_predictions_rbf <- predict(forestfires_classifier_rbf, forestfires_test)
> agreement_rbf <- forestfires_predictions_rbf == forestfires_test$size_category
> table(agreement_rbf)
agreement_rbf
FALSE TRUE
31 73
> prop.table(table(agreement_rbf))
agreement_rbf
FALSE TRUE
0.2980769 0.7019231

Inferences: The vanilladot kernel gives higher accuracy on the test data (about 98.1%) than the rbfdot kernel (about 70.2%).

Python-code:

import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

forestfires = pd.read_csv("D:/360DigiTMG/Assignment/BlackBox technique/Datasets_SVM/forestfires.csv")
forestfires.describe()

# Hold out 20% of the rows for testing (the split is random, so results vary between runs)
train, test = train_test_split(forestfires, test_size = 0.20)

# Predictors are columns 3 to 28 and the target is column 30 (size_category).
# Note: assuming the column order shown by summary(forestfires), this slice
# leaves out FFMC (column 2) and monthsep (column 29).
train_X = train.iloc[:, 3:29]
train_y = train.iloc[:, 30]
test_X = test.iloc[:, 3:29]
test_y = test.iloc[:, 30]

# kernel = linear
# help(SVC)  # optional: shows the SVC documentation
model_linear = SVC(kernel = "linear")
model_linear.fit(train_X, train_y)
pred_test_linear = model_linear.predict(test_X)

# Test accuracy: share of predictions that match the true labels
np.mean(pred_test_linear == test_y)

# kernel = rbf
model_rbf = SVC(kernel = "rbf")
model_rbf.fit(train_X, train_y)
pred_test_rbf = model_rbf.predict(test_X)

np.mean(pred_test_rbf == test_y)

Output:

import pandas as pd

import numpy as np

forestfires = pd.read_csv("D:/360DigiTMG/Assignment/BlackBox technique/Datasets_SVM/forestfires.csv")

forestfires.describe()
Out[47]:
             FFMC         DMC          DC  ...    monthnov    monthoct    monthsep
count  517.000000  517.000000  517.000000  ...  517.000000  517.000000  517.000000
mean    90.644681  110.872340  547.940039  ...    0.001934    0.029014    0.332689
std      5.520111   64.046482  248.066192  ...    0.043980    0.168007    0.471632
min     18.700000    1.100000    7.900000  ...    0.000000    0.000000    0.000000
25%     90.200000   68.600000  437.700000  ...    0.000000    0.000000    0.000000
50%     91.600000  108.300000  664.200000  ...    0.000000    0.000000    0.000000
75%     92.900000  142.400000  713.900000  ...    0.000000    0.000000    1.000000
max     96.200000  291.300000  860.600000  ...    1.000000    1.000000    1.000000

[8 rows x 28 columns]

from sklearn.svm import SVC

from sklearn.model_selection import train_test_split

train,test = train_test_split(forestfires, test_size = 0.20)

train_X = train.iloc[:, 3:29]

train_y = train.iloc[:, 30]

test_X = test.iloc[:, 3:29]

test_y = test.iloc[:, 30]

model_linear = SVC(kernel = "linear")

model_linear.fit(train_X, train_y)
Out[56]: SVC(kernel='linear')

pred_test_linear = model_linear.predict(test_X)

np.mean(pred_test_linear == test_y)
Out[58]: 0.9711538461538461

# kernel = rbf

model_rbf = SVC(kernel = "rbf")

model_rbf.fit(train_X, train_y)
Out[61]: SVC()

pred_test_rbf = model_rbf.predict(test_X)

np.mean(pred_test_rbf==test_y)
Out[63]: 0.7596153846153846
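
One possible reason the RBF kernel trails the linear kernel in this run is feature scale: columns such as DC reach 860.6 while the dummy columns are 0/1, and scikit-learn's SVC does not scale its inputs. The sketch below illustrates the effect on made-up data, not on the forest-fire file; the variable names, the noise scale of 1000, and the seeds are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
# Hypothetical data: one informative feature on a small scale and one
# uninformative feature on a much larger scale.
signal = rng.normal(size=n)
noise = rng.normal(scale=1000.0, size=n)
X = np.column_stack([signal, noise])
y = (signal > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Same RBF model, fit once on raw features and once behind a scaler.
raw_rbf = SVC(kernel="rbf").fit(X_train, y_train)
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

raw_accuracy = raw_rbf.score(X_test, y_test)
scaled_accuracy = scaled_rbf.score(X_test, y_test)
print(f"unscaled: {raw_accuracy:.3f}, scaled: {scaled_accuracy:.3f}")
```

Whether scaling closes the gap on the forest-fire data would have to be checked by rerunning the model with a scaler in front of `SVC`.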
