Enabling Lazy Learning For Uncertain Data Streams
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov.-Dec. 2014), PP 01-07
www.iosrjournals.org
Abstract: Lazy learning, exemplified by the k-nearest neighbor (kNN) algorithm, is used for classification and, similarly, for clustering; both rely on Euclidean distance. Lazy learning is well suited to complex and dynamic learning on data streams, but it suffers from high memory consumption and low prediction efficiency, which makes it a poor fit for data stream applications. A lazy learner stores the training data and defers the inductive process until a query arrives. In data stream applications, records arrive continuously in huge volumes, and class-label predictions need to be made in a timely manner. This paper provides a systematic solution to these memory and efficiency problems. We propose an indexing technique that dynamically maintains historical (outdated) stream records, realized as a novel tree structure called the Lazy tree (L-tree). The L-tree is a height-balanced tree whose traversal operations maintain the training data, reducing memory consumption, prediction cost, and overall time complexity. The L-tree continuously absorbs newly arriving stream records and discards historical ones, so it adapts efficiently to dynamic changes in the stream. We report experiments on real-world and uncertain data streams obtained from the UCI repository.
Key Words: data streams, clustering, exemplar, lazy learning.
I. Introduction
Data stream classification is currently attracting increasing attention in data mining. Clustering techniques play an important role in real-world applications: data stream clustering underlies spam detection, real-time intrusion detection [12], and malicious Web page detection. In these applications, data streams flow continuously, and the ultimate goal is efficient prediction for each incoming stream record within a bounded time. Several data stream clustering models exist [11], such as partitioning algorithms and incremental clustering. Here we focus on lazy learning; its counterpart, eager learning, compiles the training data into a summarizing hypothesis (model) and then discards the data completely. Examples of eager learners include neural networks, decision trees, and naive Bayes classifiers. Eager learning methods [8] consume little memory and answer queries efficiently, which suits data stream applications.
The prototypical lazy learner [5] is the k-nearest neighbor (kNN) classifier. It is a nonparametric, instance-based method: the training stream records are simply stored in memory, and the inductive process is deferred until a query is given. Lazy learning methods incur little or no computational cost during training but much higher cost when answering queries. Compared to lazy learning, eager learning consumes less memory and answers queries more efficiently, and the gap widens on huge data streams. Lazy learning also requires large storage and does not scale well to large data sets. In data stream applications, records arrive continuously in huge volumes, making it unrealistic to store all training records. In addition, stream applications are time critical: class predictions must be made in a timely manner. Lazy learning falls short of meeting these requirements, which have not previously been considered for data stream classification. This paper proposes the novel Lazy-tree (L-tree) index, a technique that dynamically maintains compact, high-level summaries of outdated stream records. The L-tree is extended from the M-tree [4]. When the L-tree is constructed, the exemplar structure maintains the clusters, and the exemplars in turn maintain the L-tree: the root node holds the distances to its leaf nodes, and each leaf entry holds an exemplar address, which reduces memory consumption. An exemplar is a sphere of a certain size generated from a cluster. The exemplars are organized in a height-balanced L-tree, which yields sub-linear prediction time. L-trees support three key operations: search, deletion, and insertion. The search operation traverses the L-tree to retrieve the k nearest exemplars; the insertion operation adds a new stream record into the L-tree.
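To make the lazy-learning setting above concrete, the following is an illustrative sketch of a plain kNN classifier (the function names and toy data are our own, not the paper's implementation): all work is deferred to query time, which is exactly the cost the L-tree is designed to reduce.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two equal-length numeric tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training, query, k=3):
    """Lazy learning: no model is built in advance; every stored
    (point, label) record is examined when the query arrives."""
    nearest = sorted(training, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Note that prediction touches every stored record, so its cost grows linearly with the stream; this is the O(n) behavior the paper's index avoids.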
Section III reviews related work on the clustering process and the exemplar structure.
Section IV presents lazy learning and the L-tree structure, which organizes exemplars to answer queries in a timely manner and to reduce memory consumption.
Section V reports experiments on real-world and uncertain data streams.
II. Problem Description
In enabling lazy learning [3] on data streams, the main concerns are prediction time and memory consumption. Consider a data stream or uncertain data set S consisting of a huge number of records {S1, S2, S3, ..., Sn} with time stamps {T1, T2, ..., Tn}, where n depends on the number of clusters given by the user (or a fixed number of clusters generated automatically), and class labels {C1, C2, ..., Ci}; the query (incoming record) is X. Without loss of generality, we take kNN as the lazy learning algorithm; it is also one of the basic clustering algorithms. In a data stream environment, prediction is challenging because of both time and memory constraints: it is impractical to keep all of the (n-1) earlier records {S1, S2, ..., Sn-1} for estimation, so a memory-efficient algorithm is needed. Moreover, a naive prediction requires a linear scan over all (n-1) records, corresponding to O(n) time per query, which is unacceptable for data stream applications. An efficient search algorithm is therefore needed to reduce the prediction cost toward O(log n); the linear-scan search used as a baseline in this paper costs O(n).
III. Related Work
A cluster is a group of data objects that have similar characteristics to each other within the cluster and dissimilar characteristics to objects belonging to other clusters [1]. Data clustering is a young scientific discipline under vigorous development. A large number of research papers are scattered across conference proceedings and periodicals, mostly in the fields of data mining, statistics, machine learning, spatial databases, biology, and marketing, with different emphases and different techniques. Owing to the large amounts of data collected from diverse sources, cluster analysis has recently become a highly active topic in data mining research.
IV. Lazy Learning
L-Tree Structure:
L-trees extend M-trees. M-trees [4] index objects in metric spaces such as video, image, voice, numerical data, and text. L-trees index exemplars, which are spherical spatial objects; this differs from spatial indexing structures such as R*-trees and R-trees, which index rectangular spatial objects. An L-tree consists of two types of nodes: routing nodes and leaf nodes. The root can be viewed as a special routing node with no parent. A leaf node contains a batch of exemplars represented as (pointer, distance) pairs, where pointer references the memory location of an exemplar and distance is the distance between the exemplar and its parent node. A routing node contains entries of the form (center, radius, child, distance), where child is a pointer referencing its child node and distance is the distance from the entry to its parent node.
L-trees have the following properties:
Routing node: a routing node contains a number of such entries; the root is a routing node.
Root node: the root contains entries holding a centroid and the distance to each child.
Leaf node: leaf nodes are organized similarly, and all leaf nodes are at the same level; they maintain the exemplars and their distances to their parent entries. The tree supports search and insert operations.
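The node layout above might be modeled as follows. This is a hypothetical sketch of the (pointer, distance) and (center, radius, child, distance) entries; the class and field names are our own choices, not the paper's data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Exemplar:
    center: tuple        # centroid of the cluster sphere
    radius: float        # radius of the sphere
    label_counts: dict   # class-label histogram of the absorbed records

@dataclass
class LeafNode:
    # (pointer, distance) pairs: an exemplar and its distance
    # to the parent routing entry's center.
    entries: List[tuple] = field(default_factory=list)

@dataclass
class RoutingEntry:
    center: tuple        # routing centroid
    radius: float        # covering radius of the subtree
    child: object        # pointer to the child node (leaf or routing node)
    distance: float      # distance of this entry to its parent's center
```

Storing precomputed parent distances in each entry is what lets the search reuse triangle-inequality bounds instead of recomputing distances at every level.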
Search:
The baseline search algorithm is a linear scan: when a query arrives in the stream, it computes the distance from the query to every exemplar. Other search strategies, such as binary search and branch-and-bound, can also be used. The tree-based search algorithm first traverses the L-tree to find the k nearest exemplars in the leaf nodes, then derives the class label for the query from those exemplars. Organizing the exemplars in a height-balanced L-tree significantly reduces the search cost from O(m) to O(log m), where m is the total number of exemplars in the L-tree. The search can be further improved with the branch-and-bound technique.
Example: let e be an exemplar and let d(e, x) denote the distance from e to the query (incoming record) x. The tree is searched in a best-first manner. The purpose of the algorithm is to retrieve the clustering result with as few comparisons as possible by making full use of a bound b. Given a query x, the search initially examines the entries e1 and e2 and computes the distances d(x, e1) and d(x, e2); if d(x, e2) is the smaller, the traversal continues along e2. The process then repeats, comparing x with the entries of each level down to the leaves. This requires far fewer comparisons than a linear scan over all m exemplars.
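A best-first branch-and-bound search over such a tree can be sketched as below. The node layout (plain dicts with center, radius, children, and exemplars keys) is our own illustrative choice, not the paper's structure; the bound b prunes any subtree whose covering sphere cannot contain an exemplar closer than the current k-th best.

```python
import heapq
import math

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_exemplars(root, x, k):
    """Best-first branch-and-bound retrieval of the k nearest exemplars."""
    heap = [(0.0, id(root), root)]   # (lower bound on distance, tie-break, node)
    best = []                        # heap of (-distance, tie-break, exemplar)
    while heap:
        lb, _, node = heapq.heappop(heap)
        if len(best) == k and lb > -best[0][0]:
            break                    # bound b: this subtree cannot improve the k-th best
        if node["leaf"]:
            for ex in node["exemplars"]:
                d = dist(x, ex["center"])
                heapq.heappush(best, (-d, id(ex), ex))
                if len(best) > k:
                    heapq.heappop(best)   # drop the current farthest
        else:
            for child in node["children"]:
                # Lower bound: distance to the sphere surface, never negative.
                child_lb = max(0.0, dist(x, child["center"]) - child["radius"])
                heapq.heappush(heap, (child_lb, id(child), child))
    return [ex for _, _, ex in sorted(best, reverse=True)]   # closest first
```

The lower bound d(x, center) - radius is valid by the triangle inequality, which is what makes the pruning safe in any metric space.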
Insertion:
The insert operation absorbs new data stream records into the L-tree so that it can quickly adapt to new patterns in the stream. Algorithm 2 gives the procedure for the inserting operation. For an incoming record X, the search algorithm is invoked to find the nearest leaf node O, and X is then inserted into the retrieved leaf node O.
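A minimal sketch of this insertion step, under the same illustrative node layout as before (our own, not the paper's): a new record is absorbed by an exemplar sphere that already covers it, otherwise it starts a new exemplar; leaf splitting is deliberately omitted.

```python
import math

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def insert(root, x, label, radius=1.0):
    """Descend to the nearest leaf; absorb x into a covering exemplar
    sphere if one exists, otherwise create a fresh exemplar for x."""
    node = root
    while not node["leaf"]:
        node = min(node["children"], key=lambda c: dist(x, c["center"]))
    for ex in node["exemplars"]:
        if dist(x, ex["center"]) <= ex["radius"]:
            ex["counts"][label] = ex["counts"].get(label, 0) + 1
            return
    node["exemplars"].append({"center": x, "radius": radius, "counts": {label: 1}})
    # A full L-tree would split the leaf here once it exceeds its capacity.
```

Because absorbed records only update a histogram inside an exemplar, memory stays bounded by the number of exemplars rather than the number of stream records.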
V. Experimental Works
All experiments were run on a 2.50 GHz Intel Core i3 machine with 4 GB of main memory running the Windows 7 operating system. We implemented a GUI tool in Java. The results below show that lazy learning for uncertain data streams with the L-tree indexing technique and its tree-traversal algorithm consumes less memory and makes predictions in a timely manner.
Results before inserting the query:
Figure 4: Performance graph of exemplars versus time, using the GUI-based tool.
Figure 5: Performance graph of exemplar memory versus time taken for stream records, using the GUI-based tool.
Results after inserting the query:
Figure 7: Performance graph of exemplar memory versus time taken after inserting the query into the stream records.
The graph results above were obtained by running the GUI tool on the Soybean-Large dataset [15], which includes 668 instances and 36 attributes; each stream record corresponds to one soybean instance. Further real-world and synthetic data sets are available from the data stream mining repository [14] and the UCI repository [13].
VI. Conclusion
This paper presented a k-means-based approach for generating clusters and maintaining them in a tree structure, reducing both memory usage and time cost, with particular attention to uncertain data streams. The x-means algorithm can also be used in place of k-means; x-means [2] is an extension of the k-means algorithm.
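As a hedged illustration of how exemplar spheres might be produced from k-means clusters (plain k-means, then each cluster wrapped as a center plus the covering radius of its farthest member; all names are our own):

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def kmeans_exemplars(points, k, iters=20, seed=0):
    """Run plain k-means, then summarize each cluster as an exemplar
    sphere: its mean as the center, its farthest member as the radius."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[i].append(p)
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [
        {"center": centers[i],
         "radius": max((dist(p, centers[i]) for p in cl), default=0.0)}
        for i, cl in enumerate(clusters)
    ]
```

Swapping in x-means would amount to letting the algorithm choose k itself; the exemplar-wrapping step would stay the same.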
Future work will analyze different tree algorithms and search algorithms. Similarly, different clustering algorithms [11] or classification techniques [6] can be used to obtain further results. Such tests may improve efficiency, reduce time, and allow the exemplar process to be evaluated and its performance compared.
References
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]