CRT2 LDA Assignment

This document provides instructions for a peer review assignment on linear discriminant analysis (LDA). Students work through a notebook, answering questions and solving problems related to LDA; marking schemes are given for both theoretical questions and practical problems. The document introduces LDA and shows how it differs from principal component analysis (PCA): both perform dimensionality reduction, but LDA maximizes class separation. Students apply LDA and PCA to the wine dataset, visualize the projected 2D data, and compare how well each method preserves class structure.


Peer Review Assignment 2 - Part II¶

Name:
Date: 12 September 2021
Instructions¶
• Work through the notebook, answer all questions, and do all problems
• You are allowed to consult the internet, and discuss on the module forum
• Your answers and solutions to the problems should be added to this notebook
• Submit your final work as an html file
• Note that the solutions to the problems used Python version 3.6.4.
Marking Scheme (Theoretical Questions)¶
• All questions are marked out of 3.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 1 mark
• 'Perfect' answer: 3 marks
Marking Scheme (Practical Problems)¶
• All problems are marked out of 5.
• No valid answer: 0 marks
• Demonstration of grasp of basic idea: 2 marks
• Working code: 5 marks

Linear Discriminant Analysis (LDA)¶


The PCA encountered in the previous exercise can be viewed as a dimensionality reduction scheme, projecting onto the directions with maximal variance.
LDA is also a dimensionality reduction scheme but operates on a very different principle. Now we are given data that belong to different classes: both the data value $x$ and a class label $y$. If we have $k$ classes, then $y$ takes on $k$ labels, in Python typically the values 0 through $k-1$.
The idea is to project the data onto a lower-dimensional space in such a way that maximal class separation is achieved in that space.
You can learn more about the scikit-learn implementation at http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html
You will investigate the difference between PCA and LDA using the wine data set; for more information see http://archive.ics.uci.edu/ml/datasets/Wine. Since the wine dataset is 13-dimensional, the difference between PCA and LDA is more pronounced than with, say, the Iris data set.
We project down to 2 dimensions for easy visualization. In fact, since there are only 3 classes, LDA provides at most $k-1 = 2$ informative directions, so one does not retain any more information by using higher dimensions.
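To make the projection principle concrete, here is a minimal NumPy sketch of the Fisher criterion underlying LDA: build the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$, then project onto the leading eigenvectors of $S_W^{-1} S_B$. The function fisher_lda and its signature are illustrative only, not part of the assignment or of scikit-learn's API.

import numpy as np

def fisher_lda(X, y, n_components=2):
    # Within-class (S_W) and between-class (S_B) scatter matrices
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Class separation is maximized along the leading eigenvectors of S_W^{-1} S_B
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    return X @ W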
Import packages¶
In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Import different modules for use with the notebook
from IPython.display import display
from IPython.display import Image

Simple example¶
As a warmup, run the example from the scikit-learn website.
In [2]:
# Create synthetic data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Instantiate & fit the model: LDA


clf = LDA()
clf.fit(X, y)

print(clf.predict([[-0.8, -1]]))
[1]
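For a two-class problem, the fitted model is a single linear decision boundary whose parameters can be inspected directly. A small optional check, not part of the original example:

# Coefficients w and intercept b of the decision boundary w.x + b = 0
print(clf.coef_, clf.intercept_)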

Loading the data¶


Read the data, extract the class labels from the last column, then extract the names of
the classes using the convenient set function in Python.
In [3]:
# import training data
wine_train = np.loadtxt('./data/wine/wine_train.txt',delimiter = ',')
wine_train_labels = wine_train[:,-1]
wine_train_classes = list(set(wine_train_labels))
wine_train_classes = np.array(wine_train_classes, dtype=int)
wine_train_labels = np.array(wine_train_labels, dtype = int)
wine_train = wine_train[:,:-1]

# import testing data


wine_test = np.loadtxt('./data/wine/wine_test.txt', delimiter = ',')
wine_test_labels = wine_test[:,-1]
wine_test_classes = list(set(wine_test_labels))
wine_test_classes = np.array(wine_test_classes, dtype=int)
wine_test_labels = np.array(wine_test_labels, dtype = int)
wine_test = wine_test[:, :-1]
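A caveat before projecting: PCA is sensitive to feature scales, and the 13 wine attributes have very different ranges. If the PCA projection looks dominated by one feature, standardizing first may help. This is an optional sketch, not a required step of the assignment:

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler().fit(wine_train)
wine_train_std = scaler.transform(wine_train)
wine_test_std = scaler.transform(wine_test)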

PCA¶
Problem 1: (5 marks)¶
Project the data onto 2 PCA components and display the classes of the dimension-reduced data.
You should see something like:
In [4]:
# Insert code to produce the image below

# fit the model on training data


pca = PCA(n_components=2)
pca.fit(wine_train)
pr_data = pca.transform(wine_test)

# Plot the 3 classes


col = ['r*','yo','k+']
for cl in wine_test_classes:
    cl_labels = (wine_test_labels == cl)
    dat_cl = pr_data[cl_labels, :]
    plt.plot(dat_cl[:, 0], dat_cl[:, 1], col[int(cl - 1)])

plt.title('The projection onto 2 PCA components')


plt.show()

In [5]:
display(Image(filename='./Wine_PCA.png'))
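As an optional sanity check (not required by the problem), the explained_variance_ratio_ attribute of the fitted PCA object reports how much of the total variance the two retained components capture:

# Fraction of the total variance captured by each of the 2 components
print(pca.explained_variance_ratio_)
print('Total: %.3f' % pca.explained_variance_ratio_.sum())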

LDA¶
Problem 2: (5 marks)¶
Fit an LDA model to the data using 2 components, and display the different classes of the projected data.
You should see:
In [6]:
# Insert code to produce the image below

# Fit LDA on training data


lda = LDA(n_components=2)
lda.fit(wine_train, wine_train_labels)

# Transform training and test data


wine_train_lda = lda.transform(wine_train)
wine_test_lda = lda.transform(wine_test)

# Plot the 3 classes


col = ['r*','yo','k+']
means = np.zeros((2,3))

for cl in wine_train_classes:
    cl_labels = (wine_test_labels == cl)
    wine_cl = wine_test_lda[cl_labels, :]
    means[:, int(cl - 1)] = np.mean(wine_cl, axis=0)
    plt.plot(wine_cl[:, 0], wine_cl[:, 1], col[int(cl - 1)])

plt.title('Transformation of test data, 2 LDA components')


plt.show()

In [7]:
display(Image(filename='./LDA_pr.png'))
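Since scikit-learn's LDA is also a classifier, its accuracy on the held-out test set gives a quick quantitative counterpart to the plot. An optional addition, not part of the original solution:

# Classification accuracy of the fitted LDA model on the test set
print('LDA test accuracy: %.3f' % lda.score(wine_test, wine_test_labels))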

Note that the LDA projection is much better at preserving the class structure.