
K Nearest Neighbour: A Dynamic Model

Tanya Tripathi #1, Dr Deevena Raju *2

# B.Tech III year, Faculty of Science and Technology (FST), IFHE, Hyderabad
1 [email protected]
* Faculty of Science and Technology, IFHE, Hyderabad
2 [email protected]

Abstract- This study proposes a novel dynamic model to allocate the value of 'K' in the K Nearest Neighbour algorithm based on geometrical calculations. The K-nearest neighbour algorithm is a well-known machine-learning algorithm widely used to classify data, but it faces certain disadvantages when working on a large dataset or data with several dimensions. One of the significant factors that affect the accuracy of the model is the value of K. Fundamentally, the value of K is fixed for all data points. In this study, the value is calculated at the local level for every test point, and a comparative study is performed between the suggested model and the traditional KNN model.

I. INTRODUCTION
On the principle that one size does not fit all, we look for different values of k for different data points and evaluate the impact on the accuracy of the model of dynamically allocating the value of K. To improve the prediction of unknown values, we hypothesize that removing the dependency on K and shifting the voting criteria to the density of data points surrounding the test subject will improve the prediction. In this model, we construct geometric constraints that are flexible as per the location of each data point and that also act as barriers to avoid taking votes from outliers and to reduce the impact of noise in the decision-making process. The datasets will first be pre-processed, and then both models will be tested to compare their accuracy.

II. LITERATURE REVIEW
The traditional K Nearest Neighbour (KNN) algorithm was first proposed by Cover and Hart in 1967. It works by finding the k nearest neighbours to the unknown data point [1], taking the votes of all k data points, and assigning the majority-voted value. The KNN model has been improved in different ways over the years, resulting in a number of notable variations of the original model, including weighted KNN by Aha et al. in 1991 [2]. In weighted KNN, all votes are not treated equally: the closer a neighbour is, the more influence it has through its voting power. Another method was proposed by Gongde Guo and Hui Wang to overcome the limitation of depending on a fixed value of k, in which certain regions are outlined to represent sets of classified data points [3]. In the selection of each representative set, an optimal but different k allocated by the dataset itself is used, abolishing the dependency on k without the user's intervention. Another approach is Radius Neighbour Classification, proposed by Leo Breiman and Jerome Friedman in 1984 [4]. In this approach, a certain radius is fixed for a particular dataset; all the points within the circular area are considered for voting, and the majority vote is applied to the unknown data point. More often than not, data is available in a continuous form whose values are not classified. In such cases, the mean of the values of all the nearest voters is taken to evaluate the value of the test data point, as suggested by Altman in 1992 [5]. Introducing other major geometrical constructions is often fruitful for the accuracy and efficiency of the model, as in the case of dynamic nearest neighbour queries in Euclidean space by Mohammed Eunus Ali, which revolves around the construction of Voronoi diagrams [6].
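
For context, the weighted and radius-based variants described above are available off the shelf. The sketch below, which uses scikit-learn with synthetic data, is only an illustrative assumption of how they might be exercised and is not part of the original study; the parameter values are arbitrary.

# Minimal sketch of the two classical KNN variants discussed above,
# using scikit-learn. The synthetic data and parameter values are
# illustrative assumptions, not taken from the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weighted KNN: closer neighbours get proportionally more voting power.
weighted_knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
weighted_knn.fit(X_train, y_train)
print("Weighted KNN accuracy:", weighted_knn.score(X_test, y_test))

# Radius Neighbour Classification: every training point inside a fixed
# radius votes, regardless of how many such points there are.
radius_knn = RadiusNeighborsClassifier(radius=3.0, outlier_label="most_frequent")
radius_knn.fit(X_train, y_train)
print("Radius neighbours accuracy:", radius_knn.score(X_test, y_test))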
III. METHODOLOGY
As one shoe can't fit all, one value of K isn't optimal for all test points. In this model, we calculate the value of K for each test point using geometry.

First, we calculate the distance from the test point to all training points. Then we find the index of the closest point and the distance between the test point and that nearest data point; let us call this distance "r". Now we calculate K as half of the value of "r". We did this because we needed the value of K to be flexible, yielding and adaptable to the close neighbourhood of the test point. If we declare a fixed value of K, it can be troublesome where the local density of data points varies drastically. Overfitting can occur when noisy or isolated data points dominate the classification choice if k is too small; underfitting can occur when the classification judgement is influenced by data points that are too far from the test point if k is set too high. Conversely, if the data points are more densely packed, the closest neighbours will be nearer and the value of K will be smaller, and if the data points are more sparsely packed, the nearest neighbour will be further away and we will require a bigger value of K.

FIG 1: WORKING MODEL

The next step is to find all the training points within a radius equivalent to twice the distance to the nearest data point. The majority vote is taken from all points in that area, and the unknown data point takes the value of the majority vote. We repeat this process for all the test data points.

The following is the step-by-step algorithm of the process:
Step 1 : Load the data set
Step 2 : Split the dataset into test and train
Step 3 : Create a function to predict the values
    3.1 : Calculate the distance from the test point to all data points
    3.2 : Find the index of the nearest neighbour and its distance
    3.3 : Calculate K as half the distance to the nearest neighbour
    3.4 : Find all the training points within a radius of, let us say, 2d
    3.5 : Take a majority vote of the classes of the training points within the radius
    3.6 : Make predictions for the test points
Step 4 : Calculate the accuracy of the model
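
The following is a minimal sketch of the prediction routine described above, assuming NumPy arrays and Euclidean distance. The function name and the radius_factor parameter are illustrative choices, and step 3.3 (K = r/2, which the voting itself does not use) is noted only in a comment; this is not the authors' exact implementation.

import numpy as np
from collections import Counter

def dynamic_knn_predict(X_train, y_train, X_test, radius_factor=2.0):
    """Sketch of the dynamic KNN procedure: vote among training points
    that fall within radius_factor times the nearest-neighbour distance."""
    predictions = []
    for x in X_test:
        # Step 3.1: distance from the test point to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Step 3.2: nearest neighbour and its distance d (step 3.3 would set K = d / 2)
        nearest_idx = np.argmin(dists)
        d = dists[nearest_idx]
        # Step 3.4: all training points within a radius of 2d
        in_radius = dists <= radius_factor * d
        # Step 3.5: majority vote among the classes inside the radius
        votes = Counter(y_train[in_radius])
        predictions.append(votes.most_common(1)[0][0])
    # Step 3.6: return the predictions for all test points
    return np.array(predictions)

It could be called as y_pred = dynamic_knn_predict(X_train, y_train, X_test), after which the accuracy against the held-out labels gives Step 4.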
IV. RESULTS AND DISCUSSION
As a next step, we compare the predictions of both algorithms on various data sets. To analyse the algorithms, we have taken publicly available data sets, such as diabetes, wine, heart disease, seeds, and glass. All pre-processing steps have been performed on all datasets, and they are adequate for the modelling process.
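
As an illustration of how such a comparison might be run, the sketch below evaluates the classical model and the dynamic prediction function from Section III on a single dataset. The use of scikit-learn's bundled wine data, the 80/20 split, and k = 5 are assumptions made for the example, not the paper's exact experimental setup.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Illustrative setup only: dataset source, split ratio and k are assumptions.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classical = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
classical_acc = accuracy_score(y_test, classical.predict(X_test))

# dynamic_knn_predict is the sketch given in Section III.
dynamic_acc = accuracy_score(y_test, dynamic_knn_predict(X_train, y_train, X_test))

print(f"Classical KNN accuracy: {classical_acc:.3f}")
print(f"Dynamic KNN accuracy:   {dynamic_acc:.3f}")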

Given below is the table and graph for comparison.

Table 1: Comparison of both algorithms in terms of accuracy.

Dataset         No. of Observations   KNN Accuracy   Dynamic KNN Accuracy
Heart Disease   303                   66.7           73.3
Wine            178                   88.9           88.9
Diabetes        768                   74.0           74.7
Seeds           351                   90.5           95.2
Glass           214                   100            100

FIG 2: LINE GRAPH SHOWING COMPARISON BETWEEN BOTH ALGORITHMS

V. CONCLUSION
We have tested our model on various data sets like heart disease, wine, diabetes, seeds and glass. In all the cases tested above, the dynamic KNN has provided accuracy equal to or greater than the classical KNN model. One of the main reasons for the greater accuracy is overcoming the fixed nature of the value of K and determining the value of K locally for each test point; while voting, the model also restricts the influence of data points that are beyond a certain range.

VI. FUTURE SCOPE
The model can be extended to regression and clustering algorithms in the future, and it can be tested on further datasets so that the accuracy patterns of both algorithms can be compared. The model can also be extended to 10-fold cross validation.

References
[1] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, January 1967. doi: 10.1109/TIT.1967.1053964.
[2] Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6(1), 37-66. doi: 10.1023/A:1022683801
[3] Guo, G., & Wang, H. (2004). A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(4), 2021-2033.
[4] Breiman, L., & Friedman, J. H. (1984). Classification and regression trees. CRC Press.
[5] Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175-185.
[6] Ali, M. E. (2016). Dynamic nearest neighbor queries in Euclidean space using Voronoi diagrams. IEEE Access, 4, 2996-3010. doi: 10.1109/ACCESS.2016.2566778.
