Week 2 exercises-SOLN
• The exercise below should be completed using tools provided by numpy arrays.
• You can use other functions given or developed in the class notebook.
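import numpy as np
import matplotlib.pyplot as plt
# %pylab also pulls NumPy/pyplot names (sqrt, argsort, scatter, ...) into the namespace
%pylab inline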
from pathlib import Path
import requests
import gzip
from sklearn.datasets import make_blobs
from scipy import stats as st
Exercise 1
• Create a function get_MNIST() that downloads (or loads) the MNIST data, reshapes it,
and stores it appropriately into the variables images and labels.
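def get_MNIST():
    # download data (only if it is not already on disk)
    mnist_url = "http://yann.lecun.com/exdb/mnist/"
    img_file = "train-images-idx3-ubyte.gz"
    labels_file = "train-labels-idx1-ubyte.gz"
    for fname in (img_file, labels_file):
        if not Path(fname).exists():
            Path(fname).write_bytes(requests.get(mnist_url + fname).content)
    # skip the IDX headers (16 bytes for images, 8 for labels); one flat 784-pixel row per image
    images = np.frombuffer(gzip.decompress(Path(img_file).read_bytes()), np.uint8, offset=16).reshape(-1, 784)
    labels = np.frombuffer(gzip.decompress(Path(labels_file).read_bytes()), np.uint8, offset=8)
    return images, labels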
images,labels = get_MNIST()
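The MNIST files use the IDX format: a 4-byte magic number (two zero bytes, a type code, and the number of dimensions) followed by one big-endian 32-bit size per dimension, then the raw data. Instead of hard-coding the header length, the header can be decoded directly. The following is a minimal sketch assuming that layout; `parse_idx` and `IDX_DTYPES` are hypothetical names, not part of any library.

```python
import struct
import numpy as np

# Element-type codes from the IDX format description (data is big-endian)
IDX_DTYPES = {0x08: np.uint8, 0x09: np.int8, 0x0B: ">i2",
              0x0C: ">i4", 0x0D: ">f4", 0x0E: ">f8"}

def parse_idx(data: bytes) -> np.ndarray:
    """Decode raw (already decompressed) IDX bytes into an ndarray."""
    zero1, zero2, type_code, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX file: magic number must start with two zero bytes")
    # one big-endian unsigned 32-bit size per dimension follows the magic number
    shape = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim  # 16 bytes for 3-D image files, 8 for 1-D label files
    dtype = np.dtype(IDX_DTYPES[type_code])
    count = int(np.prod(shape))
    arr = np.frombuffer(data, dtype=dtype, count=count, offset=offset)
    return arr.reshape(shape)
```

Applied to the decompressed training-image file, `parse_idx` should return a `(60000, 28, 28)` array; `.reshape(-1, 784)` then gives the flat rows that the plotting code below reshapes back to 28x28.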
Exercise 2
• Create a function visualize_6_digits() that creates a figure with subpanels to visualize 6
MNIST images. The 6 indices must be passed as an input.
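def visualize_6_digits(image_ids):
    # one row of 6 panels; each flat 784-pixel image is reshaped back to 28x28
    fig, ax = plt.subplots(1, 6, figsize=(12, 2))
    for i in range(6):
        ax[i].imshow(images[image_ids[i]].reshape(28, 28), cmap="Greys")
        ax[i].axis("off")  # hide ticks, which carry no information for images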
image_ids = [5,2,10,44,25,5]
visualize_6_digits(image_ids)
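The same subpanel idea extends to any number of digits laid out on a grid. Below is a sketch, assuming flat 784-pixel rows as in Exercise 1; `visualize_digits` and its `ncols` parameter are hypothetical, and titling each panel with its label is an addition not required by the exercise. The demo uses random stand-in arrays, not MNIST.

```python
import matplotlib.pyplot as plt
import numpy as np

def visualize_digits(images, labels, image_ids, ncols=6):
    """Plot the selected flat 28x28 images on a grid, titled with their labels."""
    n = len(image_ids)
    nrows = -(-n // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(2 * ncols, 2 * nrows), squeeze=False)
    for ax in axes.ravel():
        ax.axis("off")  # hide every panel's axes, including unused ones
    for ax, idx in zip(axes.ravel(), image_ids):
        ax.imshow(images[idx].reshape(28, 28), cmap="Greys")
        ax.set_title(str(labels[idx]))
    return fig

# demo with random stand-in data (not MNIST)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(10, 784), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=10)
fig = visualize_digits(fake_images, fake_labels, [0, 3, 5, 7, 9, 1, 2, 4])
```

Passing the data explicitly, rather than reading the global `images`, lets the same function plot any subset or any other dataset of 28x28 images.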
Exercise 3
• Create a find_kNN() function and apply it to the example blob dataset to correctly find
the 10 nearest neighbors.
• [Hint: use your function from last week that computes Euclidean distances.]
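def dist(X, w):
    # Euclidean distance from every row of X to the point w (w is broadcast over rows)
    return np.sqrt(np.sum((X - w)**2, axis=1))

def find_kNN(p, k):
    # indices of the k points in the global dataset X closest to p, nearest first
    return np.argsort(dist(X, p))[:k]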
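Sorting all n distances costs O(n log n), but only k of them are needed. A sketch of a variant, assuming the data is passed explicitly instead of read from a global `X`: `np.argpartition` moves the k smallest distances to the front in linear time, and only those k are then sorted. Squared distances rank points identically to true distances, so the square root is skipped. `find_kNN_fast` is a hypothetical name.

```python
import numpy as np

def find_kNN_fast(X, p, k):
    """Indices of the k rows of X closest to p, nearest first."""
    # squared Euclidean distances give the same ordering as true distances
    d2 = np.sum((X - p) ** 2, axis=1)
    # the k smallest distances end up (in arbitrary order) in the first k slots
    idx = np.argpartition(d2, k - 1)[:k]
    # sort just those k candidates by distance
    return idx[np.argsort(d2[idx])]

pts = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
print(find_kNN_fast(pts, np.array([0.0, 0.0]), 3))  # prints [0 2 4]
```

For 200 points the speed difference is negligible; the partition step pays off when the dataset is large and k is small.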
X, y = make_blobs(
    n_samples=200,
    n_features=2,
    centers=5,
    cluster_std=1,
    random_state=1,
)
w = array([0, 0])
nearest_neighbors = find_kNN(p=w,k=10)
nearest_neighbors
array([ 79, 74, 43, 90, 71, 45, 185, 176, 22, 96])
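One way to confirm that the neighbors are found correctly is to compare against scikit-learn's `NearestNeighbors`, whose default metric (Minkowski with p=2) is Euclidean. The sketch below rebuilds the same blob dataset so it runs on its own, and repeats the brute-force argsort inline rather than calling `find_kNN()`; it does not assume any particular printed indices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# same blob dataset and query point as above
X, y = make_blobs(n_samples=200, n_features=2, centers=5, cluster_std=1, random_state=1)
w = np.array([0, 0])

# brute-force answer: sort all Euclidean distances to w
ours = np.argsort(np.sqrt(np.sum((X - w) ** 2, axis=1)))[:10]

# reference answer: kneighbors returns (distances, indices), nearest first
nn = NearestNeighbors(n_neighbors=10).fit(X)
dists, idx = nn.kneighbors(w.reshape(1, -1))
print(np.array_equal(ours, idx[0]))
```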
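plt.scatter(X[:, 0], X[:, 1], c=y);
plt.scatter(w[0], w[1], marker='*', c='black', s=100)
# ring the 10 nearest neighbors so they stand out from their cluster colors
plt.scatter(X[nearest_neighbors, 0], X[nearest_neighbors, 1], s=80, facecolors='none', edgecolors='black');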