
International Journal of Pure and Applied Mathematics
Volume 119 No. 7 2018, 679-690
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue

An Improved K-means Cluster Algorithm Using MapReduce Techniques to Mine Inter- and Intra-cluster Data in Big Data Analytics

T. Mohana Priya1, Dr. A. Saradha2

1 Research Scholar, Bharathiar University, Coimbatore, Tamil Nadu;
Dr. SNS Rajalakshmi College of Arts and Science, Coimbatore
2 Professor and Head, Department of Computer Science and Engineering, Institute of Road and Transport Technology, Erode, Tamil Nadu, India

Abstract
K-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set into a certain number of clusters fixed a priori. The main idea is to define k centres, one for each cluster. These centres should be placed carefully, because different locations cause different results. In this research work, the proposed algorithm performs better when handling clusters of circularly distributed data points and slightly overlapping clusters.
Keywords: K-means algorithm, cluster, big data, Hadoop, MapReduce, web logs

I. Introduction

With the rapid development of the mobile Internet, cloud computing, the Internet of Things, social network services, and other emerging services, data has recently been growing at an explosive rate. How to analyse data quickly and effectively, and thereby maximize its value, has become a focus of attention. The "four Vs" model of big data (variety, volume, velocity, and value) has made traditional methods of data analysis inapplicable. Therefore, new techniques for big data analysis, such as distributed or parallelized processing, feature extraction, and sampling, have received wide attention.

Nowadays the Internet of Things is becoming one of the most important sources of data, since this data can be used in many smart-city applications that make human life easier and more comfortable. The demand for data mining methods that extract information from this valuable source therefore becomes ever more vital. Data mining algorithms should be run using a suitable computing technique such as distributed computing. Distributed computing is a model for performing heavy computation over a set of connected systems. Each individual system interconnected on the network is called a node, and a collection of nodes that forms a network is called a cluster. Clustering is an essential method of data analysis through which an original data set can be partitioned into several subsets according to the similarity of data points. It is an underlying tool for outlier detection, biology, indexing, and so on. In fuzzy clustering analysis, each object in the data set no longer belongs to a single group but may belong to any group.

Apache Hadoop is an open-source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. It supports running applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers. The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are handled automatically by the framework.

II. Related Work

A great deal of research has been carried out to make the K-means algorithm more efficient and effective, with researchers approaching the problem from different viewpoints and with different ideas. Krishna and Murty [4] proposed the genetic K-means algorithm (GKA), which integrates a genetic algorithm with K-means in order to achieve a global search and fast convergence.

Jain and Dubes [1] recommend running the algorithm several times with random initial partitions; the clustering results of these different runs provide some insight into the quality of the final clusters. Forgy's method [2] generates the initial partition by first randomly selecting K points as prototypes and then assigning the remaining points based on their distance from these seeds.

Likas et al. [5] proposed a global K-means algorithm consisting of a series of K-means clustering procedures with the number of clusters varying from 1 to K. One disadvantage of the algorithm lies in the requirement to execute K-means N times for each value of K, which causes a high computational burden for large data sets.

Bradley and Fayyad [3] presented a refinement algorithm that applies K-means M times to M random subsets sampled from the original data. The most common initialization was proposed by Pena, Lozano et al. [6]; this method randomly selects K points from the data set as centroids. Its main advantages are simplicity and the opportunity to cover the solution space rather well through multiple initializations of the algorithm. Ball and Hall proposed the ISODATA algorithm [7], which estimates K dynamically. To select a proper K, a sequence of clustering structures can be obtained by running K-means several times from a possible minimum Kmin to a maximum Kmax [12]. These structures are then evaluated using constructed indices, and the expected clustering solution is determined by choosing the one with the best index [8]. A popular approach for evaluating the number of clusters in K-means is the Cubic Clustering Criterion [9] used in SAS Enterprise Miner.

III. Cluster Analysis

Data mining is an interdisciplinary topic that can be defined in many ways. A number of data mining methods are used to determine the types of patterns to be found in a data mining task. These methods include discrimination and characterization, frequent pattern mining, correlations and associations, classification and regression, clustering analysis, and outlier analysis.


Clustering is one of the most interesting topics in data mining. It is used in many application areas such as business intelligence, image pattern recognition, biology, security, and Web search. The objective of clustering is to explore the intrinsic structure of data and arrange it into expressive subgroups. The basic concept of cluster analysis is the division of a large set of objects into small subsets. Each subset is a single cluster, formed so that inter-cluster similarity is minimized and intra-cluster similarity is maximized. Similarity and dissimilarity are assessed from the feature values describing the objects, using various distance measures; we measure an object's similarity or dissimilarity by comparing objects with each other. For numeric data, these include distance measures such as the supremum distance, the Manhattan distance, and the Euclidean distance between two objects. Cluster analysis is a vast topic, and many clustering algorithms are available to group data sets. On the basis of implementation, clustering algorithms can be grouped into:

Partitioning Method
 K-means
 K-medoids

Hierarchical Method
 Chameleon
 BIRCH

Density Based Clustering Method
 OPTICS
 DBSCAN

Grid Based Clustering Method
 CLIQUE
 STING

Hierarchical Clustering algorithms:
There are two approaches to hierarchical clustering: agglomerative (bottom-up) and divisive (top-down). The agglomerative approach starts from individual objects and successively merges neighbouring objects based on the distance criterion (minimum, maximum, or average). The process continues until the desired cluster is formed. The divisive approach treats the whole set of objects as a single cluster and divides it into further clusters until the desired number of clusters is formed. BIRCH, CURE, ROCK, Chameleon, Echidna, Ward's method, SNN, GRIDCLUST, and CACTUS are some hierarchical clustering algorithms, in which non-convex and arbitrary hyper-rectangular clusters are formed.

Density based Clustering algorithms:
Data objects are categorized into core points, border points, and noise points. The core points are connected together based on density to form clusters. Arbitrarily shaped clusters are formed by clustering algorithms such as DBSCAN, OPTICS, DBCLASD, GDBSCAN, DENCLUE, and SUBCLU.

Grid based Clustering algorithms:
A grid-based algorithm partitions the data set into a number of cells to form a grid structure, and clusters are formed based on this grid structure. To form clusters, grid algorithms use subspace and hierarchical clustering techniques; examples include STING, CLIQUE, WaveCluster, BANG, OptiGrid, MAFIA, ENCLUS, PROCLUS, ORCLUS, FC, and STIRR. Compared with the other clustering algorithms, grid algorithms are very fast. Uniform grid algorithms are not always sufficient to form the desired clusters.

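A minimal sketch of the three distance measures named above (Manhattan, Euclidean, and the supremum, i.e. Chebyshev, distance); this snippet is illustrative and not part of the original paper:

```python
# Illustrative distance measures between two numeric objects (vectors).
def manhattan(x, y):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # L2 distance: square root of the sum of squared differences
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def supremum(x, y):
    # L-infinity (Chebyshev) distance: largest coordinate difference
    return max(abs(a - b) for a, b in zip(x, y))

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(manhattan(x, y))  # 7.0
print(euclidean(x, y))  # 5.0
print(supremum(x, y))   # 4.0
```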

To overcome this problem, adaptive grid algorithms such as MAFIA and AMR are used, and arbitrarily shaped clusters are formed by the grid cells.

IV. Methodology

Big Data Analytics
Big data analytics is the process of examining big data to discover hidden patterns, unknown correlations, and other useful information that can be used to make better decisions. To perform any kind of analysis on such large and complicated data, scaling up the hardware platform becomes necessary, and choosing the right platform becomes a crucial decision for satisfying the user's requirements in less time. Various big data platforms with different characteristics are available; to choose the right platform for a specific application, one should know the advantages and limitations of all of them. The chosen platform must be able to cater to increased data processing demands if analytics-based solutions are to be built on it.

This data comes from many different sources: smart phones and the data they generate and consume; sensors embedded into everyday objects, which result in billions of new, constantly updating data feeds containing location, climate, and other information; posts to social media sites; digital photos and videos; and purchase transaction records. This data is called big data. The first organizations to exploit it were online and start-up firms; companies such as Facebook, Google, and LinkedIn were built around big data from the beginning.

"Big data" refers to data sets so large and complicated, containing structured, semi-structured, and unstructured data, that they are very difficult to handle with traditional software tools. In many organizations the volume of data is bigger, or it moves faster, or it exceeds current processing capacity. An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data containing billions to trillions of records of millions of users, all from different sources such as social media, banking, the web, mobile devices, and employee and customer data. These types of data are typically loosely structured, often incomplete, and inaccessible.

K-means algorithm
Lloyd's algorithm, mostly known as the k-means algorithm, is used to solve the k-means clustering problem and works as follows. First, decide the number of clusters k.

Clustering is the process of partitioning a group of data points into a small number of clusters. For instance, the items in a supermarket are clustered into categories (butter, cheese, and milk are grouped as dairy products). Of course this is a qualitative kind of partitioning. A quantitative approach would be to measure certain features of the products, say the percentage of milk and others, and products with a high percentage of milk would be grouped together. In general, we have n data points x_i, i = 1...n, that have to be partitioned into k clusters. The goal is to assign a cluster to each data point. K-means is a clustering method that aims to find the positions μ_i, i = 1...k, of the clusters that minimize the distance from the data points to the cluster. K-means clustering solves

argmin_c ∑_{i=1}^{k} ∑_{x∈c_i} d(x, μ_i) = argmin_c ∑_{i=1}^{k} ∑_{x∈c_i} ‖x − μ_i‖₂²

where c_i is the set of points that belong to cluster i.


The K-means clustering uses the square of the Euclidean distance, d(x, μ_i) = ‖x − μ_i‖₂². This problem is not trivial (in fact it is NP-hard), so the K-means algorithm can only hope to find the global minimum, possibly getting stuck in a different solution.

The K-means clustering algorithm implemented using MapReduce can be divided into the following phases:

1. Initialization
(i) The given input data set is split into sub-datasets. The sub-datasets are formed into <Key, Value> lists, and these <Key, Value> lists are input to the map function.
(ii) Select k points randomly from the data set as initial cluster centroids.

2. Mapper
a) Update the cluster centroids: calculate the distance between each point in the given data set and the k centroids.
b) Assign each data point to the nearest cluster, until all the data have been processed.
c) Output an <ai, zj> pair, where ai is the centre of cluster zj.

3. Reducer
(i) Read the <ai, zj> pairs from the Map stage and collect all the data records; then output the k clusters and their data points.
(ii) Calculate the average of each cluster, which is selected as the new cluster centre.

Initializing the position of the clusters
It is really up to you! Here are some common methods:
 Forgy: set the positions of the k clusters to k observations chosen randomly from the dataset.
 Random partition: assign a cluster randomly to each observation and compute the means as in step 3.
Since the algorithm may stop in a local minimum, the initial position of the clusters is very important.

The pseudocode for the (master/slave) k-means clustering algorithm is given below:

Input: Data points D, number of clusters k
Step 1: Slaves read their part of the data
Step 2: do until the global centroids converge:
Step 3:   Master broadcasts the centroids to the slaves
Step 4:   Slaves assign data instances to the closest centroids
Step 5:   Slaves compute the new local centroids and local cluster sizes
Step 6:   Slaves send local centroids and cluster sizes to the master
Step 7:   Master aggregates local centroids weighted by local cluster sizes into global centroids
Output: Data points with cluster memberships.

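The phases above can be condensed into a single-machine sketch of Lloyd's iteration with Forgy initialization; the names and structure here are illustrative assumptions, not the paper's implementation:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm sketch: Forgy init, then alternate assignment/update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Forgy: k random observations
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged to a (local) minimum
            break
        centroids = new_centroids
    return centroids, clusters
```

Because the result depends on the random initial centroids, running it several times with different seeds, as Jain and Dubes [1] recommend, gives some insight into the quality of the final clusters.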

Figure 1: K-means algorithm. Training examples are shown as dots, and cluster centroids are shown as crosses. (a) Original dataset. (b) Random initial cluster centroids. (c-f) Illustration of running two iterations of k-means. In each iteration, we assign each training example to the closest cluster centroid (shown by "painting" the training examples the same colour as the cluster centroid to which each is assigned); then we move each cluster centroid to the mean of the points assigned to it.

In the clustering problem, we are given a training set x(1), ..., x(m) and want to group the data into a few cohesive "clusters". Here, we are given feature vectors x(i) ∈ R^n for each data point as usual, but no labels y(i) (making this an unsupervised learning problem). Our goal is to predict k centroids and a label c(i) for each data point.

V. AN OPTIMIZED K-MEANS CLUSTERING USING MAP-REDUCE TECHNIQUE

The first step in designing MapReduce code for the K-means algorithm is to specify the input and output of the implementation. Input is given as a <key, value> pair, where "key" is a cluster mean and "value" is the serializable representation of a vector in the dataset. The prerequisite for implementing the Map and Reduce routines is to have two files: the first should contain the clusters with their centroid values, and the other should contain the objects to be clustered. Arranging the chosen centroids and the objects to be clustered in these two split files is the initial step in clustering data with the K-means algorithm using the MapReduce method of Apache Hadoop.

This is done by implementing MapReduce routines for K-means clustering as follows. The initial set of centroids is stored in the input directory of HDFS prior to the Map routine call, and they form the "key" field in the <key, value> pair. The instructions required to compute the distance between a given data point and a cluster centroid, fed in as a <key, value> pair, are coded in the Mapper routine. The Mapper function calculates the distance between the object value and each of the cluster centroids in the cluster set, while keeping track of the cluster to which the given object is closest. Once the distance computation is complete, the object is assigned to the closest cluster. After all objects have been assigned to their associated clusters, the centroid of each cluster is recomputed. The recalculation is done by the Reduce routine, which also restructures the clusters to avoid generating clusters of extreme sizes. Finally, once the centroid of each cluster has been revised, the new set of objects and clusters is rewritten to memory, ready for the next iteration.

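The Map and Reduce routines described in this section can be sketched as a single-process simulation of one K-means iteration (a dictionary stands in for Hadoop's shuffle stage; the function names are illustrative, not the paper's code):

```python
from collections import defaultdict

def mapper(point, centroids):
    """Emit (index of nearest centroid, point) for one input record."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists)), point

def reducer(cluster_id, points):
    """Recompute one cluster's centroid as the mean of its points."""
    n = len(points)
    return cluster_id, tuple(sum(coords) / n for coords in zip(*points))

def one_iteration(points, centroids):
    groups = defaultdict(list)  # stands in for the shuffle/sort stage
    for p in points:
        cid, val = mapper(p, centroids)
        groups[cid].append(val)
    return dict(reducer(cid, pts) for cid, pts in groups.items())

pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
print(one_iteration(pts, [(0.0, 0.0), (10.0, 10.0)]))
# {0: (0.0, 1.0), 1: (10.0, 11.0)}
```

In actual Hadoop code the same logic would live in Mapper and Reducer classes, with the centroids read from HDFS before each iteration as described above.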

Figure 2: Acquisition of the metadata for reduce tasks.

VI. Repartitioning
The repartitioning process divides the collected virtual partitions into new partitions of the same number as the reduce tasks. The data size of the biggest partition can be minimized by the repartitioning process. It can also reduce the processing time needed for the maximum partition, thereby speeding up the completion of the entire reduce phase and increasing the rate of completed jobs as well as system throughput. As previously analysed, the repartitioning process recombines each virtual partition generated in the map phase. However, due to the limitation of available memory, these virtual partitions must be written to the local file system. If repartitioning is not restricted, it is likely to lead to a plurality of discrete virtual partitions in one partition following the balancing process, resulting in non-sequential reads of the disk.

Algorithm 1: Repartitioning algorithm.
Data: A = {a1, a2, ..., an}, K
Result: R: an index of subsequences
Step 1: low ← max ai
Step 2: high ← n
Step 3: num ← 0
Step 4: while low < high do
Step 5:   mid ← low + (high − low)/2
Step 6:   foreach ai ∈ A do
Step 7:     sum ← sum + ai
Step 8:     if sum > mid then
Step 9:       num++
Step 10:      sum ← ai
Step 11:      R ← R ∪ {i}
Step 12:    end
Step 13:  end
Step 14:  if num ≤ K then
Step 15:    high ← mid − 1
Step 16:  end
Step 17:  else if num > K then
Step 18:    low ← mid + 1
Step 19:  end
Step 20: end
Step 21: return R

VII. Comparison of Clustering Algorithms

Volume:
Volume refers to the ability of an algorithm to deal with large amounts of data. With respect to the Volume property, the criteria to consider for clustering algorithms are (a) the size of the data set, (b) high dimensionality, and (c) handling of outliers.
Size of the data set: A data set is a collection of attributes. The attributes may be categorical, nominal, ordinal, interval, or ratio. Many clustering algorithms support numerical and categorical data.
High dimensionality: In big data, as the size of the data set increases, the number of dimensions also increases. This is the curse of dimensionality.
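Algorithm 1 can be sketched in Python as a binary search over the maximum partition size; note this sketch assumes `high` starts at the total size (Step 2's `high ← n` appears garbled) and that Steps 7 and 10 accumulate `ai` rather than `a1`:

```python
def repartition(sizes, k):
    """Binary-search the smallest capacity that packs the virtual
    partitions into at most k contiguous partitions, then return the
    start index of each partition (a sketch of Algorithm 1)."""
    low, high = max(sizes), sum(sizes)
    while low < high:
        mid = (low + high) // 2
        # Count partitions needed when no partition may exceed `mid`.
        num, acc = 1, 0
        for s in sizes:
            if acc + s > mid:
                num += 1
                acc = s
            else:
                acc += s
        if num <= k:
            high = mid       # mid is feasible; try a smaller capacity
        else:
            low = mid + 1    # mid is too small
    # Rebuild the cut indices for the minimal capacity `low`.
    cuts, acc = [0], 0
    for i, s in enumerate(sizes):
        if acc + s > low:
            cuts.append(i)
            acc = s
        else:
            acc += s
    return cuts

# e.g. repartition([7, 2, 5, 10, 8], 2) -> [0, 3]
```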


Outliers: Many clustering algorithms are capable of handling outliers; noise data cannot form a group with the other data points.

Variety:
Variety refers to the ability of a clustering algorithm to handle different types of data sets, such as numerical, categorical, nominal, and ordinal. The criteria for clustering algorithms here are (a) the type of data set and (b) the cluster shape.
Type of data set: Whether the data set is small or big, many clustering algorithms support large data sets for big data mining.
Cluster shape: The shape of the cluster formed depends on the size and type of the data set.

Velocity:
Velocity refers to the computational cost of a clustering algorithm, based on the criterion of (a) the running time complexity of the algorithm.
Time complexity: If an algorithm requires few computations, it has a short run time. Run times of algorithms are expressed using Big O notation.

Value:
For a clustering algorithm to process the data accurately and to form clusters with little computation, the input parameters play a key role.

VIII. MapReduce Processing Model
Hadoop MapReduce processes big data in parallel and provides output with efficient performance. MapReduce consists of a Map function and a Reduce function. The Map function performs filtering and sorting of large data sets; the Reduce function performs the summary operation, combining the results to produce the enhanced final output. Hadoop HDFS and MapReduce are delineated with the help of the Google file system: the Google File System (GFS), developed by Google, is a distributed file system that provides organized and adequate access to data using large clusters of commodity servers.

Map phase: The master node accepts the input and divides a large problem into smaller sub-problems. It then distributes these sub-problems among worker nodes in a multi-level tree structure. The sub-problems are processed by the worker nodes, which execute them and send the results back to the master node.
Reduce phase: The Reduce function combines the output of all sub-problems, collects it at the master node, and produces the final output. Each map function is associated with a reduce function.

Figure 3: MapReduce Programming Model

The operation mechanism of MapReduce is as follows:
(1) Input: a MapReduce framework based on Hadoop requires a pair of Map and Reduce functions implementing the appropriate interface or abstract class; the input and output locations and other operating parameters should also be specified. In this stage, the large data in the input directory is divided into several independent data blocks for parallel processing by the Map function.


(2) The MapReduce framework treats the application input as a set of key-value pairs <key, value>. In the Map stage, the framework calls the user-defined Map function to process each key-value pair <key, value>, generating a new batch of intermediate key-value pairs <key, value>.
(3) Shuffle: to ensure that the Reduce input produced by Map has been sorted, in the Shuffle stage the framework uses HTTP to fetch the associated key-value pairs <key, value> of the Map outputs for each Reduce; the MapReduce framework then groups the input of the Reduce phase by key.
(4) Reduce: this phase traverses the intermediate data for each unique key and executes the user-defined Reduce function. The input parameter is <key, {a list of values}>, and the output is a set of new key-value pairs <key, value>.
(5) Output: this stage writes the results of the Reduce to the specified output directory location.

IX. Results and Discussion

Experimental setup
To implement the k-means algorithm we installed a cluster composed of four nodes in AWS:
1. One master node (instance) of type m4.
2. Four slave nodes (instances) of type m4.
3. Hadoop 2.4.1.
4. JDK 1.7.
This distributed environment of four instances in AWS was used to implement and run the optimized repartitioned k-means clustering algorithm and to save the results.

Data set description
To scale the optimized repartitioning k-means clustering algorithm, a smart-city dataset was used. We used the pollution data set, which consists of 449 files. Each file contains around 17500 observations of the pollutant ratios of five attributes.

Evaluation
To measure the performance of the scaled k-means algorithms using Hadoop MapReduce, we executed the algorithms on 10 different samples of data. After execution of the algorithm, we calculated the inter-cluster and intra-cluster similarity measures.
The inter-cluster distance d(i, j) between two clusters is measured as the distance between the centroids of the clusters.
The intra-cluster distance is measured between all pairs of objects within a cluster.

The following table and figure present the experimental results of the K-means algorithm on different data samples, where k = 3.

Table 1: Execution results of an optimized repartitioned K-means cluster algorithm

Sample   Sample size   Inter-Cluster Density   Intra-Cluster Density
S1             78290              0.689142                0.556309
S2           1576718              0.740337                0.561887
S3           2368512              0.73014                 0.562767
S4           3153530              0.748691                0.5684
S5           3942470              0.802399                0.567079
S6           4732842              0.676724                0.611366
S7           5522887              0.74722                 0.563842
S8           6312932              0.783907                0.565958
S9           7099392              0.704998                0.56797
S10          7887974              0.771926                0.572288
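The evaluation measures can be sketched as follows; since the paper does not give exact formulas for its density values, the centroid-to-centroid inter-cluster distance and the mean pairwise intra-cluster distance used here are assumptions:

```python
from itertools import combinations

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def inter_cluster(centroids):
    """Distance d(i, j) between every pair of cluster centroids."""
    return {(i, j): euclidean(ci, cj)
            for (i, ci), (j, cj) in combinations(enumerate(centroids), 2)}

def intra_cluster(cluster):
    """Mean distance over all pairs of objects within one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(euclidean(x, y) for x, y in pairs) / len(pairs)

print(inter_cluster([(0.0, 0.0), (3.0, 4.0)]))  # {(0, 1): 5.0}
print(intra_cluster([(0.0, 0.0), (0.0, 2.0), (0.0, 4.0)]))
# mean of the pair distances 2.0, 4.0, 2.0
```

Larger inter-cluster values indicate better-separated clusters, while smaller intra-cluster values indicate more homogeneous clusters, matching the way Table 1 is read in the discussion.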


Figure 4: Comparison between inter-cluster and intra-cluster density (y-axis: density; x-axis: data samples s1-s10; series: Inter-Cluster Density and Intra-Cluster Density).

From the results it is clear that sample s5 shows the maximum inter-cluster density of 0.802399, which indicates good separation between the different clusters. Similarly, the inter-cluster density for sample s8 is calculated as 0.783907, also separating the data clusters very well. The intra-cluster density for sample s1 shows the minimum value, which gives a clear indication that similar objects lie in the same cluster.

X. Conclusion
In this paper the optimized repartitioned k-means clustering algorithm was scaled up for application to a huge dataset containing around 10 million objects, each a vector of six attributes. Inter- and intra-cluster measurements were computed to find the maximum value of the inter-cluster density and the minimum value of the intra-cluster measurements. This research work was done using the Hadoop and MapReduce framework, which gives high performance in big data analysis.

References

[1] Jain A.K., Dubes R.C., Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey. ISBN: 0-13-022278-X
[2] Forgy E. (1965) Cluster analysis of multivariate data: efficiency vs. interpretability of classifications. Biometrics, 21: pp 768-780
[3] Bradley P., Fayyad U. (1998) Refining initial points for K-means clustering. International Conference on Machine Learning (ICML-98), pp 91-99
[4] Krishna K., Murty M. (1999) Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 29(3): pp 433-439
[5] Likas A., Vlassis N., Verbeek J. (2003) The global K-means clustering algorithm. Pattern Recognition, 36(2): pp 451-461
[6] Pena J.M., Lozano J.A., Larranaga P. (1999) An empirical comparison of four initialization methods for the K-means algorithm. Pattern Recognition Letters, 20: pp 1027-1040
[7] "Challenges and Authentication in Wireless Sensor Networks", IEEE EDS Madras Chapter sponsored International Conference on Emerging Trends in VLSI, Embedded Systems, Nano Electronics and Telecommunication Systems (ICEVENT 2013), S K P Engineering College
[8] Milligan G., Cooper M. (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50: pp 150-179
[9] SAS Institute Inc. (1983) SAS Technical Report A-108: Cubic clustering criterion. Cary, NC: SAS Institute Inc., 56 pp
[10] Rajeshkanna R., Saradha A., "Cluster Based Load Balancing Techniques to Improve the Lifetime of Mobile Adhoc Networks", International Journal of Trend in Research and Development (IJTRD), ISSN: 2394-9333, Volume-2, Issue-5, October 2015
