04 SVM

The document outlines a laboratory exercise focused on Support Vector Machines (SVM) for classification and regression tasks. It includes instructions for data preparation, model building, accuracy calculation, and hyperparameter tuning using datasets such as breast cancer and iris plants. The final deliverable is a Python script that implements the exercises and generates required output files.

Uploaded by

stifi.extra

Laboratory: SVM

1 Goal/Scope
• SVM classification.
• Scaling.
• Building pipelines.
• SVM regression.
• Hyperparameter search.

2 Data preparation for classification

Load the datasets that will be used for classification.

from sklearn import datasets

The first dataset contains data derived from images of breast cancer cases:

data_breast_cancer = datasets.load_breast_cancer(as_frame=False)
print(data_breast_cancer['DESCR'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

Data Set Characteristics:

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three
worst/largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 0 is Mean Radius, field
10 is Radius SE, field 20 is Worst Radius.

- class:
- WDBC-Malignant
- WDBC-Benign

:Summary Statistics:

===================================== ====== ======
                                      Min    Max
===================================== ====== ======
radius (mean):                        6.981  28.11
texture (mean):                       9.71   39.28
perimeter (mean):                     43.79  188.5
area (mean):                          143.5  2501.0
smoothness (mean):                    0.053  0.163
compactness (mean):                   0.019  0.345
concavity (mean):                     0.0    0.427
concave points (mean):                0.0    0.201
symmetry (mean):                      0.106  0.304
fractal dimension (mean):             0.05   0.097
radius (standard error):              0.112  2.873
texture (standard error):             0.36   4.885
perimeter (standard error):           0.757  21.98
area (standard error):                6.802  542.2
smoothness (standard error):          0.002  0.031
compactness (standard error):         0.002  0.135
concavity (standard error):           0.0    0.396
concave points (standard error):      0.0    0.053
symmetry (standard error):            0.008  0.079
fractal dimension (standard error):   0.001  0.03
radius (worst):                       7.93   36.04
texture (worst):                      12.02  49.54
perimeter (worst):                    50.41  251.2
area (worst):                         185.2  4254.0
smoothness (worst):                   0.071  0.223
compactness (worst):                  0.027  1.058
concavity (worst):                    0.0    1.252
concave points (worst):               0.0    0.291
symmetry (worst):                     0.156  0.664
fractal dimension (worst):            0.055  0.208
===================================== ====== ======

:Missing Attribute Values: None

:Class Distribution: 212 - Malignant, 357 - Benign

:Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian

:Donor: Nick Street

:Date: November, 1995

This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.
https://goo.gl/U2Uwz2

Features are computed from a digitized image of a fine needle
aspirate (FNA) of a breast mass. They describe
characteristics of the cell nuclei present in the image.

Separating plane described above was obtained using
Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
Construction Via Linear Programming." Proceedings of the 4th
Midwest Artificial Intelligence and Cognitive Science Society,
pp. 97-101, 1992], a classification method which uses linear
programming to construct a decision tree. Relevant features
were selected using an exhaustive search in the space of 1-4
features and 1-3 separating planes.

The actual linear program used to obtain the separating plane
in the 3-dimensional space is that described in:
[K. P. Bennett and O. L. Mangasarian: "Robust Linear
Programming Discrimination of Two Linearly Inseparable Sets",
Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

|details-start|
**References**
|details-split|

- W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction
for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on
Electronic Imaging: Science and Technology, volume 1905, pages 861-870,
San Jose, CA, 1993.
- O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and
prognosis via linear programming. Operations Research, 43(4), pages 570-577,
July-August 1995.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques
to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994)
163-171.

|details-end|

The second contains the "classic" dataset of iris measurements:

data_iris = datasets.load_iris()
print(data_iris['DESCR'])

.. _iris_dataset:

Iris plants dataset
--------------------

Data Set Characteristics:

:Number of Instances: 150 (50 in each of three classes)

:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica

:Summary Statistics:

============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
============== ==== ==== ======= ===== ====================

:Missing Attribute Values: None

:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%[email protected])
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature. Fisher's paper is a classic in the field and
is referenced frequently to this day. (See Duda & Hart, for example.) The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

|details-start|
**References**
|details-split|

- Fisher, R.A. "The use of multiple measurements in taxonomic problems"
Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
Mathematical Statistics" (John Wiley, NY, 1950).
- Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
(Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
- Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
Structure and Classification Rule for Recognition in Partially Exposed
Environments". IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. PAMI-2, No. 1, 67-71.
- Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
on Information Theory, May 1972, 431-433.
- See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
conceptual clustering system finds 3 classes in the data.
- Many, many more …

|details-end|

Hint: the load_... functions return NumPy objects by default, but if you pass the argument
as_frame=True, the data and target elements will be pandas structures, and an additional
frame element will be available, containing data combined with target.
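
The difference described in the hint can be seen in a short sketch. The variable names (bc_np, bc_pd, X_sel) are chosen here for illustration only; the column names "mean area" and "mean smoothness" are the ones scikit-learn assigns to this dataset.

```python
from sklearn import datasets

# Default call: data and target are NumPy arrays.
bc_np = datasets.load_breast_cancer()
print(type(bc_np["data"]))

# as_frame=True: data is a DataFrame, target is a Series,
# and frame holds data with target appended as the last column.
bc_pd = datasets.load_breast_cancer(as_frame=True)
X = bc_pd["data"]
y = bc_pd["target"]
frame = bc_pd["frame"]

# With pandas structures, features can be selected by name.
X_sel = X[["mean area", "mean smoothness"]]
print(X_sel.shape, frame.shape)
```

Selecting columns by name is usually less error-prone than indexing the NumPy array by position, which is why the pandas form may be more convenient for the exercises that follow.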

3 Classification
1. Split the dataset into training and test sets in an 80/20 ratio.
2. Build SVM classification models using the mean values of the area and smoothness features;
create two models:
1. LinearSVC with the "hinge" loss function,
2. LinearSVC with the "hinge" loss function, preceded by automatic scaling of the feature
values.
3. Compute the accuracy of both classifiers separately on the training and test sets, and
store the values in a list in this order: training set without scaling, test set
without scaling, training set with scaling, test set with scaling. Save the list in the
Pickle file bc_acc.pkl.
4 pts.
4. Did scaling make a difference?
5. Repeat the experiment for the iris dataset; build a model that detects whether a given sample
belongs to the species Virginica, based on two features: petal length and petal width.
6. Compute the accuracy of both classifiers separately on the training and test sets, and
store the values in a list in this order: training set without scaling, test set without
scaling, training set with scaling, test set with scaling. Save the list in the
Pickle file iris_acc.pkl.
4 pts.
7. Did scaling make a difference?
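
One possible way to organize the steps above is sketched below; it is not the reference solution. The helper fit_and_score, the seed value 42, and the choice of StandardScaler as the "automatic scaling" step are assumptions made for this example. Without scaling, LinearSVC may emit a convergence warning on the breast cancer features, which is part of what the exercise is meant to show.

```python
import pickle

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def fit_and_score(X, y, seed=42):
    """Return [train, test] accuracy without scaling, then with scaling."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)

    plain = LinearSVC(loss="hinge", random_state=seed)
    scaled = Pipeline([
        ("scaler", StandardScaler()),  # assumed scaler; the handout names none
        ("svc", LinearSVC(loss="hinge", random_state=seed)),
    ])

    acc = []
    for model in (plain, scaled):
        model.fit(X_train, y_train)
        acc.append(float(model.score(X_train, y_train)))
        acc.append(float(model.score(X_test, y_test)))
    return acc


# Breast cancer: mean area and mean smoothness.
bc = datasets.load_breast_cancer(as_frame=True)
bc_acc = fit_and_score(bc.data[["mean area", "mean smoothness"]], bc.target)

# Iris: is the sample Virginica (class index 2)? Petal length and width only.
iris = datasets.load_iris(as_frame=True)
X_iris = iris.data[["petal length (cm)", "petal width (cm)"]]
y_iris = (iris.target == 2).astype(int)
iris_acc = fit_and_score(X_iris, y_iris)

for name, values in (("bc_acc.pkl", bc_acc), ("iris_acc.pkl", iris_acc)):
    with open(name, "wb") as f:
        pickle.dump(values, f)

print(bc_acc)
print(iris_acc)
```

Comparing the first pair of values in each list with the second pair answers the question of whether scaling made a difference; the effect is typically larger for the breast cancer features, whose ranges differ by several orders of magnitude.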

4 Data preparation for regression

1. Use the same function as in the regression laboratory.

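import numpy as np
import pandas as pd
size = 900
# 900 points drawn uniformly from [-2.5, 2.5)
X = np.random.rand(size)*5-2.5
# coefficients of the 4th-degree polynomial
w4, w3, w2, w1, w0 = 1, 2, 1, -4, 2
# polynomial value plus Gaussian noise (standard deviation 8, mean -4)
y = w4*(X**4) + w3*(X**3) + w2*(X**2) + w1*X + w0 + np.random.randn(size)*8-4
df = pd.DataFrame({'x': X, 'y': y})
df.plot.scatter(x='x',y='y')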

<Axes: xlabel='x', ylabel='y'>
2. Split the data into training and test sets in an 80:20 ratio.

5 Regression
1. Build a pipeline that expands the features to 4 dimensions using a 4th-degree polynomial,
followed by a LinearSVR regressor with default parameters.
2. Compute the MSE for the training set and the test set. The results should be similar to the
best results from the regression exercise, or even better.
3. Repeat the training for an SVR regressor with a poly kernel of degree 4 and the remaining
parameters at their default values. The MSE results should be … disappointing.
4. Which hyperparameters should be used so that SVR reaches a quality similar to LinearSVR? Use
GridSearchCV on the whole dataset (not only the training set!) to find the optimal pair of
parameters coef0 and C. Use neg_mean_squared_error as the scoring function. Search for the
optimal values among: "C" : [0.1, 1, 10], "coef0" : [0.1, 1, 10].
5. Train SVR once more with the optimal hyperparameter values found. Compute the MSE for the
training and test sets.
6. Store the MSE results from steps 2 and 5 in a list (4 elements), then save it in a Pickle file
named reg_mse.pkl.
4 pts.
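
The regression steps could be arranged roughly as follows. This is a sketch under several assumptions: the data is regenerated here with a seeded generator (the handout uses unseeded np.random calls), random_state=0 is an arbitrary choice, include_bias=False is used so that the polynomial step yields exactly four columns, and the helper mse_pair is introduced only for this example.

```python
import pickle

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR, LinearSVR

# Same polynomial and noise as in section 4, but seeded for repeatability.
rng = np.random.default_rng(0)
size = 900
x = rng.random(size) * 5 - 2.5
y = x**4 + 2 * x**3 + x**2 - 4 * x + 2 + rng.standard_normal(size) * 8 - 4
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)


def mse_pair(model):
    """Return [train MSE, test MSE] for a fitted model."""
    return [float(mean_squared_error(y_train, model.predict(X_train))),
            float(mean_squared_error(y_test, model.predict(X_test)))]


# Steps 1-2: polynomial expansion (x, x^2, x^3, x^4) followed by LinearSVR.
lin_svr = Pipeline([
    ("poly", PolynomialFeatures(degree=4, include_bias=False)),
    ("svr", LinearSVR()),
])
lin_svr.fit(X_train, y_train)
lin_mse = mse_pair(lin_svr)

# Step 3: kernelized SVR with default coef0 and C.
svr_default = SVR(kernel="poly", degree=4).fit(X_train, y_train)
default_mse = mse_pair(svr_default)

# Step 4: grid search over C and coef0 on the whole dataset.
param_grid = {"C": [0.1, 1, 10], "coef0": [0.1, 1, 10]}
search = GridSearchCV(SVR(kernel="poly", degree=4), param_grid,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

# Step 5: retrain on the training set with the best parameters found.
svr_tuned = SVR(kernel="poly", degree=4, **search.best_params_)
svr_tuned.fit(X_train, y_train)
tuned_mse = mse_pair(svr_tuned)

# Step 6: results of steps 2 and 5, four values in total.
reg_mse = lin_mse + tuned_mse
with open("reg_mse.pkl", "wb") as f:
    pickle.dump(reg_mse, f)

print(search.best_params_, reg_mse)
```

With the default coef0=0 the poly kernel contains only the pure 4th-degree term, which is a plausible reason for the disappointing result in step 3; a non-zero coef0 lets lower-degree terms contribute as well.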
6 Submit the report
Submit a file named lab04/lab04.py that implements the exercises above.
The check verifies that the Python script creates all required files and that their contents
are correct.
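
Before submitting, the generated files can be checked locally. The sketch below is a hypothetical self-check, not the grading procedure, which the handout does not describe; the function check_results and the EXPECTED table are made up for this example and only test what the exercises state, namely that each file holds a list of four numbers.

```python
import pickle
from pathlib import Path

# File name -> expected list length, taken from the exercise descriptions.
EXPECTED = {"bc_acc.pkl": 4, "iris_acc.pkl": 4, "reg_mse.pkl": 4}


def check_results(directory="."):
    """Return {file name: problem}; an empty dict means every check passed."""
    problems = {}
    for name, length in EXPECTED.items():
        path = Path(directory) / name
        if not path.is_file():
            problems[name] = "missing"
            continue
        with path.open("rb") as f:
            values = pickle.load(f)
        if not isinstance(values, list) or len(values) != length:
            problems[name] = f"expected a list of {length} values"
        elif not all(isinstance(v, (int, float)) for v in values):
            problems[name] = "non-numeric entry"
    return problems


print(check_results())
```

Running it from the lab04 directory after executing lab04.py should print an empty dict if all three files were written as described.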
