K-Nearest Neighbor in Python

The document discusses the k-nearest neighbors (k-NN) algorithm, a non-parametric classification and regression method. It explains that k-NN involves finding the k closest training examples in feature space to a new data point. For classification, a majority vote of the neighbors' classes determines the new point's class, while for regression, the new point takes the average value of its neighbors. The document also provides an example Python code to implement k-NN classification on a sample dataset.


Reported by: Kenn Rolph Ocuma

BSCS 3-A
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

- In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

- In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the
function is only approximated locally and all computation is deferred
until classification. The k-NN algorithm is among the simplest of all
machine learning algorithms.
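
To make the idea concrete, here is a minimal from-scratch sketch of k-NN classification by majority vote. It is illustrative only: it is not the scikit-learn code used later, and the tiny dataset below is made up.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Euclidean distance from the new point to every training example
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example: two classes in a 2-D feature space
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.0, 4.9]), k=3))  # -> 1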

For both classification and regression, a useful technique is to weight the contributions of the neighbors, so that nearer neighbors contribute more to the result than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor.
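
As a sketch of how this 1/d weighting can be applied to k-NN regression (illustrative only; scikit-learn exposes the same idea through the weights='distance' option of KNeighborsClassifier and KNeighborsRegressor):

import numpy as np

def weighted_knn_regress(X_train, y_train, x_new, k=5):
    # Euclidean distances from the new point to every training example
    d = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    # Weight each neighbor by 1/d, guarding against a zero distance
    w = 1.0 / np.maximum(d[nearest], 1e-12)
    # Weighted average of the neighbors' values
    return np.sum(w * y_train[nearest]) / np.sum(w)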
The neighbors are taken from a set of objects for which the class (for k-
NN classification) or the object property value (for k-NN regression) is
known. This can be thought of as the training set for the algorithm,
though no explicit training step is required.

A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm is not to be confused with k-means, another popular machine learning technique.

Let's imagine a scenario with 2 categories and 2 independent variables, and add a new point. Where should it fall, in the green or the red data-point area?
To solve this problem we first need to choose the number K of neighbors (usually 5) and measure the Euclidean distances to them. We can recall from high school the Euclidean distance formula:
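For two points p = (p₁, p₂) and q = (q₁, q₂), it is d(p, q) = √((q₁ − p₁)² + (q₂ − p₂)²); with more than two features, the squared differences are summed over every coordinate before taking the square root.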
To implement k-NN in Python we first create our classifier through the sklearn.neighbors library and its KNeighborsClassifier class: we build the classifier object and specify the number of neighbors and the distance metric, in this case 'minkowski' with p = 2, which corresponds exactly to the Euclidean distance.
# Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
# (this step is needed so that X_train / X_test exist for the scaling below)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Fitting the classifier to the Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

Having fitted the classifier to the training set and predicted the test set results, we next build the confusion matrix and finally visualise the results.

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
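
As a quick check (not part of the original slides), the confusion matrix can be summarised into a single accuracy figure with scikit-learn's accuracy_score:

# Optional: overall accuracy on the Test set
from sklearn.metrics import accuracy_score
print(cm)
print('Accuracy:', accuracy_score(y_test, y_pred))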

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
# Build a fine grid over the two (scaled) features
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by the class the classifier predicts for it
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Plot the training points on top, coloured by their true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
# Same decision-boundary plot as above, now with the test points
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
