Enhancing DBSCAN Algorithm For Data Mining
Abstract—Data mining is widely used today by companies with a strong consumer focus, such as retail, financial, communication and marketing organizations. Technically, data mining is the process of extracting required information from huge databases. It allows users to analyze data from many different dimensions or angles, categorize it and summarize the relationships identified. The goal of this paper is to propose a methodology for improving the DBSCAN algorithm so as to increase clustering accuracy. The proposed improvement uses the back propagation algorithm to calculate the Euclidean distance in a dynamic manner. The paper also presents the results obtained from implementations of the proposed and existing methods and compares them in terms of execution time and accuracy.

Keywords—Data Mining; DBSCAN; I-DBSCAN Clustering; MATLAB

I. INTRODUCTION
Data mining is one of the promising technologies in the field of computer science [1]. It is used to extract information from large collections of data and mainly deals with large databases [2]. Data mining is a technique for analyzing data and converting it into useful information or knowledge for decision making [3]. It takes data as its input and gives knowledge as the required output.
Data mining can be carried out through various approaches and algorithms, among which clustering is one of the most important. Clustering simply means collecting and presenting similar data items [4]. The process of finding similarities between data items and grouping similar items into clusters is called clustering [1]. Several clustering algorithms are based on density and are called density based clustering algorithms. DBSCAN is one such algorithm, and it is the one used in this paper for the process of data mining.

II. LITERATURE REVIEW
Drawing on the past literature, this section discusses some of the research work in the domain of data mining, as follows:
In 2015, Ahmad M. Bakr, et al. [10] proposed an algorithm that enhances the incremental clustering process, resulting in a significant improvement in performance. In 2016, Michael Hahsler and Matthew Bolaños [11] presented a paper on clustering data streams based on shared density between micro-clusters, in which space and time complexities are discussed. In 2013, Aastha Joshi and Rajneet Kaur [12] presented a review paper in which a comparative study of various clustering techniques in data mining is discussed. In 2013, Nagpal, et al. [14] presented a review paper on data clustering algorithms, in which it is observed that there is no optimal solution for handling problems with large data sets of mixed and categorical attributes. In 2012, Glory H. Shah [17] presented a paper in which a new approach to density based clustering is discussed. In 2011, Pooja Batra, et al. [15] presented a paper in which a comparative study of density based clustering algorithms is performed based on several parameters. In 2006, Donghai, Zeng [18] presented a study of a clustering algorithm based on grid-density and a spatial partition tree. In 2005, Moreira, et al. [16] presented a paper on density based clustering with DBSCAN and SNN, where the role of the clustering algorithms is to identify clusters of POIs and then use the clusters to automatically characterize geographic regions. In 2004, El-Sonbaty, et al. [13] proposed a density based clustering algorithm for large datasets; experimental evaluation on synthetic datasets shows that the new clustering algorithm is faster and more scalable than the original DBSCAN.

III. PROPOSED METHODOLOGY
The density based technique is a type of algorithm in which the density of the whole dataset is calculated and the densest region is identified in order to find similarity between the elements of the dataset. The complete implementation of the work is shown in the workflow figure.
In the existing work, density based clustering is applied: the density of the whole dataset is calculated and the dense region is identified. On the dense region, the EPS value is calculated and the Euclidean distance is applied to analyze similarity between the elements. The EPS is calculated dynamically to achieve maximum accuracy, but the Euclidean distance is calculated in a static manner, so the maximum accuracy is not reached. In this work, an improvement to the DBSCAN algorithm is proposed that calculates the Euclidean distance iteratively to increase clustering accuracy.
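As a point of reference, the baseline density based procedure can be sketched in Python as follows. This is a minimal sketch of standard DBSCAN with a static Euclidean distance and a fixed EPS, not the improved method proposed in this paper. The function names `region_query` and `dbscan`, the parameter `min_pts`, and the sample points are assumptions introduced purely for illustration.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within Euclidean distance eps of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN; returns one cluster label per point (-1 means noise)."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster_id += 1                  # points[i] is a core point: new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # noise reachable from a core point: border
            if labels[j] is not UNVISITED:
                continue                 # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # points[j] is also core: keep expanding
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
sample = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (8.0, 8.0), (8.1, 7.9), (7.9, 8.2),
          (4.5, 0.0)]
print(dbscan(sample, eps=0.5, min_pts=3))
```

In this sketch both EPS and the distance function stay fixed for the whole run, which corresponds to the static behaviour the proposed improvement aims to replace.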
International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS-2017)
Fig. 6. Euclidean distance value calculation
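For reference, the standard Euclidean distance between two n-dimensional points, and the EPS-neighborhood that DBSCAN derives from it, can be written as follows. The symbols p, q, n, D and N_EPS are notation assumed here, not taken from the paper.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2},
\qquad
N_{\mathrm{EPS}}(p) = \{\, q \in D : d(p, q) \le \mathrm{EPS} \,\}
```

Here D denotes the dataset, and a point p is treated as lying in a dense region when N_EPS(p) contains at least a minimum number of points.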
[7] Gupta, Swati. "A Regression Modeling Technique on Data Mining." International Journal of Computer Applications 116.9 (2015).
[8] Singh, Yashpal, and Alok Singh Chauhan. "Neural networks in data mining." Journal of Theoretical and Applied Information Technology 5.6 (2009): 36-42.
[9] Maheshwari, Aayushi, Garima Kharbanda, and Harsh Patel. "Association Rules in Data Mining."
[10] Bakr, Ahmad M., Nagia M. Ghanem, and Mohamed A. Ismail. "Efficient incremental density-based algorithm for clustering large datasets." Alexandria Engineering Journal 54.4 (2015): 1147-115
[11] Hahsler, Michael, and Matthew Bolaños. "Clustering data streams based on shared density between micro-clusters." IEEE Transactions on Knowledge and Data Engineering 28.6 (2016): 1449-1461.
[12] Joshi, Aastha, and Rajneet Kaur. "A review: Comparative study of various clustering techniques in data mining." International Journal of Advanced Research in Computer Science and Software Engineering 3.3 (2013).
[13] El-Sonbaty, Yasser, M. A. Ismail, and Mohamed Farouk. "An efficient density based clustering algorithm for large databases." Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on. IEEE, 2004.
[14] Nagpal, Arpita, Arnan Jatain, and Deepti Gaur. "Review based on data clustering algorithms." Information & Communication Technologies (ICT), 2013 IEEE Conference on. IEEE, 2013.
[15] Nagpal, Pooja Batra, and Priyanka Ahlawat Mann. "Comparative study of density based clustering algorithms." International Journal of Computer Applications 27.11 (2011): 421-435.
[16] Moreira, Adriano, Maribel Y. Santos, and Sofia Carneiro. "Density-based clustering algorithms–DBSCAN and SNN." University of Minho-Portugal (2005).
[17] Shah, Glory H. "An improved DBSCAN, a density based clustering algorithm with parameter selection for high dimensional data sets." Engineering (NUiCONE), 2012 Nirma University International Conference on. IEEE, 2012.
[18] Donghai, Zeng. "The Study of Clustering Algorithm Based on Grid-Density and Spatial Partition Tree." XiaMen University, PRC (2006).