
Classification and dimension reduction


In this assignment, we will apply dimension reduction techniques to reduce the data dimension,
and then construct several classification models.

In [1]: # import packages


# For data manipulation
import pandas as pd
import numpy as np

# For dimensionality reduction


from sklearn.decomposition import PCA
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# For classification models


from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# For evaluation
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# For visualizations
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap

Load dataset
This is a classification dataset. For more details on the features and labels, please check this
documentation.

Run the following cell to load features (X) and labels (y).

In [2]: from sklearn.datasets import load_breast_cancer

# read data
data = load_breast_cancer()
X = data.data
y = data.target

Part 1: Train test split (5 pts)


Use 80% of the data to train the model and 20% of the data to test. (2 pts)


In [3]: # Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # random_state value assumed

Question: Why do you need to do the train-test split before you run dimension reduction
algorithms? (3 pts)

Answer: Performing train-test split before dimension reduction prevents data leakage by
ensuring that the test data remains unseen during training. If you apply dimension reduction on
the entire dataset first, information from the test set could influence the transformation, leading
to biased results. By reducing dimensions only on the training set, you ensure a fair evaluation,
with the transformation learned solely from training data and then applied to the test data for
consistent comparison.
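
For instance, here is a minimal sketch of the leak-free pattern (illustrative only; the variable names are placeholders):

# Learn the projection from the training split only, then apply the same fitted
# projection to the test split without refitting.
from sklearn.decomposition import PCA

reducer = PCA(n_components=2)
X_train_reduced = reducer.fit_transform(X_train)  # fit and transform on training data
X_test_reduced = reducer.transform(X_test)        # transform only; test data never influences the fit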

Part 2: Dimension Reduction and visualization (25 pts)


The goal of this part is to implement PCA, kernel PCA, and LLE for your training samples, and
then visualize the reduced training samples in a 2D plane.

As you may know, the best dimension reduction technique depends on the task and your data.
Therefore, we will try several methods and select the best one based on the visualization. Feel
free to use any commands from sklearn.

Sample plots:
The sample plot is given here:

On the left is standard PCA. In the middle is kernel PCA; you should use the RBF kernel for this
assignment and select a good hyperparameter on your own. On the right, you will implement
LLE; similarly, you should select the number of neighbors.

To simplify your code, you do not need to show how you found the hyperparameters.
However, you should include your choices in your visualization; see my sample plots. Moreover,
you should include all visualizations in one figure using subplots. You should add informative
labels, legends, and titles to make your plots clear.

Your plots will be different from my sample plots due to the different hyperparameters and
random train-test split, but the layout (plot labels, legends, etc.) should be similar.


Grading policy:
1. You should implement each algorithm correctly. (5 pts each)
2. You do not need to write a function for this part, but you should add inline comments to
explain each step. (5 pts)
3. Visualization is clear and meets the requirements. (5 pts)
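
Although the hyperparameter search does not need to appear in the submission, here is a minimal sketch of one informal way to compare candidate gamma values for the RBF kernel before settling on one (the candidate values are only illustrative; a similar loop over n_neighbors works for LLE):

# Fit kernel PCA for a few candidate gamma values and compare the 2D embeddings visually.
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA

candidate_gammas = [0.0005, 0.001, 0.005, 0.01]
fig, axes = plt.subplots(1, len(candidate_gammas), figsize=(16, 4))
for ax, gamma in zip(axes, candidate_gammas):
    embedding = KernelPCA(n_components=2, kernel='rbf', gamma=gamma).fit_transform(X_train)
    ax.scatter(embedding[:, 0], embedding[:, 1], c=y_train, cmap='coolwarm', s=15)
    ax.set_title(f"gamma = {gamma}")
plt.tight_layout()
plt.show()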

In [4]: # 1. Standard PCA


pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_train)

# 2. Kernel PCA with RBF kernel


# Select a good value for gamma through experimentation
kernel_pca = KernelPCA(n_components=2, kernel='rbf', gamma=0.0015)
X_kpca = kernel_pca.fit_transform(X_train)

# 3. Locally Linear Embedding (LLE)


# Choose the number of neighbors based on experimentation
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=20)
X_lle = lle.fit_transform(X_train)

# Visualization
fig, axs = plt.subplots(1, 3, figsize=(15, 5))

# Plot standard PCA
sc_pca = axs[0].scatter(X_pca[:, 0], X_pca[:, 1], c=y_train, cmap='coolwarm', s=30, alpha=0.7)
axs[0].set_title("PCA")
axs[0].legend(handles=sc_pca.legend_elements()[0], labels=[0, 1])

# Plot kernel PCA
sc_kpca = axs[1].scatter(X_kpca[:, 0], X_kpca[:, 1], c=y_train, cmap='coolwarm', s=30, alpha=0.7)
axs[1].set_title("Kernel PCA (RBF, gamma = 0.0015)")
axs[1].legend(handles=sc_kpca.legend_elements()[0], labels=[0, 1])

# Plot LLE
sc_lle = axs[2].scatter(X_lle[:, 0], X_lle[:, 1], c=y_train, cmap='coolwarm', s=30, alpha=0.7)
axs[2].set_title("LLE (n_neighbors=20)")
axs[2].legend(handles=sc_lle.legend_elements()[0], labels=[0, 1])

# Add axis labels to every subplot
for ax in axs:
    ax.set_xlabel("First Principal Component")
    ax.set_ylabel("Second Principal Component")

plt.tight_layout()
plt.show()


Part 3: Train classification models (50 pts)


In this part, you will train several models using reduced training samples.

First of all, based on your visualization result in part 2, please determine which dimension
reduction technique you want to use for part 3 and state the reason.

Second, please do dimension reduction on the training samples using the technique you select.
(This step is the same as part 2, so you do not need to repeat the code; you can use what
you obtained from part 2.)

Third, train k-nearest-neighbors, logistic regression, decision tree, random forest, and
voting classifier models (use all models mentioned before) on reduced training samples and
then report the test accuracy.

Last, show the decision region for each model. Please look at this reference code and
visualize the decision regions. You should write a function to draw the decision region for any
classification model and any data samples. A function docstring is required. (Hint: the
reference I provide is in good shape, but you cannot use the code directly; slight
modification is required.)

Please follow the instructions below to finish part 3. Inline comments are required for your
code.

3(a) Determine the dimension reduction technique you will use and state the
reason. (5 pts)
I chose LLE because it provides a clear separation between the classes in the 2D visualization,
which can help classification models distinguish between classes more effectively.

3(b) Do dimension reduction


You can repeat part 2 or use results from part 2

In [5]: # Apply LLE for dimension reduction


lle = LocallyLinearEmbedding(n_neighbors=20, n_components=2)  # Adjust n_neighbors if needed

X_train_lle = lle.fit_transform(X_train)
X_test_lle = lle.transform(X_test) # Transform the test set with the trained LLE

3(c) Write a function to draw decision region (5 pts)


Your function will look similar to the function in the sample code, but you should modify the code
because the sample only works for the penguin data.

Grading policy:

1. Function docstring and inline comments are required.


2. The correctness of your function will be checked later.

In [6]: def plot_decision_regions(model, X, y, feature_names=('Dim 1', 'Dim 2'), class_labels=None, ax=None):
    """
    Plot decision regions for a classification model in 2D feature space.

    Parameters:
    - model: Trained model with a `.predict` method.
    - X: Array-like of shape (n_samples, 2) holding the two features.
    - y: 1D array of labels for the dataset.
    - feature_names: Tuple of axis names for the plot (default: ('Dim 1', 'Dim 2')).
    - class_labels: List of class labels for the legend (default: None).
    - ax: Optional Matplotlib axis to plot on.
    """
    # Define min/max values for the grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1

    # Create a mesh grid
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))

    # Make predictions on the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    # Plot the decision boundary using a filled contour plot
    if ax is None:
        fig, ax = plt.subplots()
    ax.contourf(xx, yy, Z, alpha=0.3, cmap=ListedColormap(('lightblue', 'lightgreen')))

    # Scatter plot for the actual data points
    scatter = ax.scatter(X[:, 0], X[:, 1], c=y, s=30, edgecolor='k',
                         cmap=ListedColormap(('blue', 'green')))
    ax.set_xlabel(feature_names[0])
    ax.set_ylabel(feature_names[1])

    # Add a legend if class labels are provided
    if class_labels:
        ax.legend(handles=scatter.legend_elements()[0], labels=class_labels)

3(d) Train k-nearest-neighbors model (10 pts)


You should try different k's and select the best one. You can either use cross-validation or train
models using different k manually.

Print the test accuracy for all models, and draw the decision region of the model with the best k.

Grading policy: Do the following correctly to receive credit

1. Try different k values and report all test accuracies (5 pts)


2. Select the best k and visualize the decision region (5 pts)
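
If you take the cross-validation route mentioned above, here is a minimal sketch (assuming the reduced training data X_train_lle and labels y_train from 3(b); the k values are illustrative):

# Select k with 5-fold cross-validation on the reduced training split only,
# so the test set is not used for model selection.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

cv_scores = {}
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(knn, X_train_lle, y_train, cv=5, scoring='accuracy').mean()

best_k_cv = max(cv_scores, key=cv_scores.get)
print(cv_scores, '-> best k:', best_k_cv)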

In [7]: #### Your code here


k_values = [1, 3, 5, 7, 9]
best_k, best_accuracy = 0, 0
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_lle, y_train)
    accuracy = accuracy_score(y_test, knn.predict(X_test_lle))
    print(f"KNN with k={k}: Test Accuracy = {accuracy:.2f}")
    if accuracy > best_accuracy:
        best_k, best_accuracy = k, accuracy

# Train KNN with best k and plot decision regions


best_knn = KNeighborsClassifier(n_neighbors=best_k)
best_knn.fit(X_train_lle, y_train)

# Plotting
fig, ax = plt.subplots()
plot_decision_regions(best_knn, X_train_lle, y_train, feature_names=('Dim 1', 'Dim 2'), ax=ax)
ax.set_title(f"K-Nearest Neighbors (k={best_k}) Decision Region")
plt.show()

KNN with k=1: Test Accuracy = 0.92


KNN with k=3: Test Accuracy = 0.92
KNN with k=5: Test Accuracy = 0.93
KNN with k=7: Test Accuracy = 0.93
KNN with k=9: Test Accuracy = 0.93


3(e) Train logistic regression model (5 pts)


No hyperparameter selection required for the logistic regression model.

Grading policy: Do the following correctly to receive credit

1. Train a model, report test accuracy, and visualize the decision region (5 pts)

In [8]: #### Your code here


# Train logistic regression and evaluate
log_reg = LogisticRegression()
log_reg.fit(X_train_lle, y_train)
log_accuracy = accuracy_score(y_test, log_reg.predict(X_test_lle))
print(f"Logistic Regression Test Accuracy = {log_accuracy:.2f}")

# Plotting
fig, ax = plt.subplots()
plot_decision_regions(log_reg, X_train_lle, y_train, feature_names=('Dim 1', 'Dim 2'), ax=ax)
ax.set_title("Logistic Regression Decision Region")
plt.show()

Logistic Regression Test Accuracy = 0.69


3(f) Train decision tree (10 pts)


You should try different max_depth and select the best one.

Grading policy: Do the following correctly to receive credit

1. Try different max_depth and report all test accuracies (5 pts)


2. Select the best max_depth and visualize the decision region (5 pts)
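
A cross-validated way to compare max_depth values is also possible; here is a minimal sketch using validation_curve on the reduced training data (the depth grid and random_state are illustrative):

# Mean 5-fold CV accuracy for each candidate max_depth, computed on the training split.
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

depths = [3, 5, 7, 9]
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X_train_lle, y_train,
    param_name='max_depth', param_range=depths, cv=5, scoring='accuracy')
for depth, score in zip(depths, val_scores.mean(axis=1)):
    print(f"max_depth={depth}: mean CV accuracy = {score:.3f}")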

In [9]: #### Your code here


# Tune max_depth in Decision Tree
depth_values = [3, 5, 7, 9]
best_depth, best_accuracy = 0, 0
for depth in depth_values:
    dt = DecisionTreeClassifier(max_depth=depth)
    dt.fit(X_train_lle, y_train)
    accuracy = accuracy_score(y_test, dt.predict(X_test_lle))
    print(f"Decision Tree with max_depth={depth}: Test Accuracy = {accuracy:.2f}")
    if accuracy > best_accuracy:
        best_depth, best_accuracy = depth, accuracy

# Train Decision Tree with best depth and plot decision region
best_dt = DecisionTreeClassifier(max_depth=best_depth)
best_dt.fit(X_train_lle, y_train)

# Plotting
fig, ax = plt.subplots()
plot_decision_regions(best_dt, X_train_lle, y_train, feature_names=('Dim 1', 'Dim 2'), ax=ax)

ax.set_title(f"Decision Tree (max_depth={best_depth}) Decision Region")
plt.show()

Decision Tree with max_depth=3: Test Accuracy = 0.94


Decision Tree with max_depth=5: Test Accuracy = 0.94
Decision Tree with max_depth=7: Test Accuracy = 0.96
Decision Tree with max_depth=9: Test Accuracy = 0.93

3(g) Train random forest (10 pts)


You should try different max_depth and n_estimators, then select the best
hyperparameters.

Grading policy: Do the following correctly to receive credit

1. Try different max_depth and n_estimators, then report all test accuracies (5 pts)
2. Select the best max_depth and n_estimators, then visualize the decision region (5 pts)
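
A more compact way to run this search is GridSearchCV on the reduced training data; here is a minimal sketch (the grid values and random_state are illustrative):

# 5-fold cross-validated grid search over max_depth and n_estimators on the training split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {'max_depth': [5, 10], 'n_estimators': [50, 100]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train_lle, y_train)
print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", round(grid.best_score_, 3))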

In [10]: #### Your code here


# Tune max_depth and n_estimators in Random Forest
forest_depth_values = [5, 10]
n_estimators_values = [50, 100]
best_depth, best_n_estimators, best_accuracy = 0, 0, 0
for depth in forest_depth_values:
    for n_estimators in n_estimators_values:
        rf = RandomForestClassifier(max_depth=depth, n_estimators=n_estimators, random_state=42)  # random_state value assumed
        rf.fit(X_train_lle, y_train)
        accuracy = accuracy_score(y_test, rf.predict(X_test_lle))
        print(f"Random Forest with max_depth={depth}, n_estimators={n_estimators}: Test Accuracy = {accuracy:.2f}")
        if accuracy > best_accuracy:
            best_depth, best_n_estimators, best_accuracy = depth, n_estimators, accuracy

# Train Random Forest with best parameters and plot decision region
best_rf = RandomForestClassifier(max_depth=best_depth, n_estimators=best_n_estimators, random_state=42)  # random_state value assumed
best_rf.fit(X_train_lle, y_train)

# Plotting
fig, ax = plt.subplots()
plot_decision_regions(best_rf, X_train_lle, y_train, feature_names=('Dim 1', 'Dim 2'), ax=ax)
ax.set_title(f"Random Forest (max_depth={best_depth}, n_estimators={best_n_estimators}) Decision Region")
plt.show()

Random Forest with max_depth=5, n_estimators=50: Test Accuracy = 0.94


Random Forest with max_depth=5, n_estimators=100: Test Accuracy = 0.94
Random Forest with max_depth=10, n_estimators=50: Test Accuracy = 0.95
Random Forest with max_depth=10, n_estimators=100: Test Accuracy = 0.95

3(h) Train max voting classifier (5 pts)


You should use the hyperparameters you selected before.

Grading policy: Do the following correctly to receive credit

1. Train a model, report test accuracy, and visualize the decision region (5 pts)
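
The solution below uses hard (majority) voting, which is the VotingClassifier default. As an aside, since all four base models expose predict_proba, a soft-voting variant is a small change; here is a sketch reusing the tuned models from 3(d)-3(g):

# Soft voting averages predicted class probabilities instead of counting majority votes.
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

soft_voting_clf = VotingClassifier(
    estimators=[('knn', best_knn), ('log_reg', log_reg), ('dt', best_dt), ('rf', best_rf)],
    voting='soft')
soft_voting_clf.fit(X_train_lle, y_train)
print("Soft voting test accuracy:",
      round(accuracy_score(y_test, soft_voting_clf.predict(X_test_lle)), 2))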

In [11]: #### Your code here


# Define the best models as a voting ensemble
voting_clf = VotingClassifier(estimators=[
('knn', best_knn),
('log_reg', log_reg),
('dt', best_dt),
('rf', best_rf)
])
voting_clf.fit(X_train_lle, y_train)
voting_accuracy = accuracy_score(y_test, voting_clf.predict(X_test_lle))
print(f"Voting Classifier Test Accuracy = {voting_accuracy:.2f}")

# Plotting
fig, ax = plt.subplots()
plot_decision_regions(voting_clf, X_train_lle, y_train, feature_names=('Dim 1', 'Dim 2'), ax=ax)
ax.set_title("Voting Classifier Decision Region")
plt.show()

Voting Classifier Test Accuracy = 0.95

In [ ]:

In [ ]:
