Development of An Algorithm For Reducing Errors During The Prediction of Stream Data

This document summarizes an algorithm that uses non-linear regression to predict trends in two-dimensional stream data and reduce errors during prediction. It discusses an existing FPT-DS algorithm that uses linear regression on stream data. The proposed algorithm collects real-time stream data through sliding windows, computes sequence support, and applies non-linear regression to forecast future trends. It preprocesses the two-dimensional stream data to transform it into dependent and independent variables for the regression analysis. The non-linear regression model is then used to predict uncertain items and reduce errors in the predictions.

Uploaded by

Vinod Malik
Copyright
© All Rights Reserved

Development of an Algorithm for Reducing Errors During the Prediction of Stream Data

Pinki Sagar*, Neha Kaushik**, Poonam Katyal***


Assistant Professor, Manav Rachna International University,
Assistant Professor, Manav Rachna International University,
Assistant Professor, Manav Rachna International University
Email: [email protected]; [email protected]; [email protected]

Abstract: This paper presents a non-linear regression based algorithm for two-dimensional stream data that forecasts sequence trends in real-time stream data and reduces the errors made during prediction. After gathering real-time stream data through a sliding window, the algorithm computes the support for a specified sequence and uses non-linear regression to forecast future sequence trends. Under the traditional transaction approach, frequent item mining is impossible in this setting, because it requires analyzing which items in the continuously incoming stream data are frequent and which are likely to become frequent. This paper proposes a way to predict frequent items by applying a regression model to continuously incoming two-dimensional stream data, treated like time series data. Once a non-linear regression model has been established from the stream data, it may be used as a prediction model for uncertain items.

Keywords: FPT-DS, Stream data

1. Literature review: Wei-Guang Teng [5] and Feng Zhao and Qing-Hua A Li [4] introduced the FPT-DS (Frequent Pattern Temporal Data Stream) algorithm, which uses linear regression to perform prediction on two-dimensional stream data. It is a very effective method, but it omits some exceptional cases, which can lead to failure. The regression-based FPT-DS algorithm has been more effective for mining frequent data streams. Duck Jin Chai, Eun Hee Kim and Long Jin [7] observed that data mining on stream data handles quality and data analysis over an extremely large, effectively infinite amount of data using disk or memory of limited capacity. Under the traditional transaction approach, frequent item mining is impossible in this setting, because it requires analyzing which items in the continuously incoming stream data are frequent and which are likely to become frequent. This paper proposes a way to predict frequent items by applying a non-linear regression model to continuously incoming one-dimensional stream data, treated like time series data. Once the regression model has been established from the stream data, it may be used as a prediction model for uncertain items. The proposed approach demonstrates its effectiveness through experiments on stream data.

2. Introduction to FPT-DS: Chai, Eun Hee Kim and Long Jin proposed FPT-DS (a frequent item prediction method) to predict frequent items using a linear regression model, which may be used as a prediction model for stream data. Stream data is continuous and unbounded. They proposed an algorithm that predicts frequent items from one-dimensional stream data using a simple linear regression model. Because stream data is continuous and complex over time, it can only be accessed temporarily. Stream data also has sequential characteristics, so it can be treated as time series data. Prediction on time series data estimates the future by analyzing data from the past. In the FPT-DS method, one-dimensional stream data is first preprocessed to establish a simple linear regression model. Once the regression model has been generated, the likelihood that items become frequent is predicted from it. The stream data is reorganized by the time at which each one-dimensional data item was input. During this reorganization, the input times of identical data items are collected and the differences between the times at which the data occurred are calculated. At a later stage, each item is paired with the time calculated during the reorganization.

3. Features of FPT-DS: The following are the features of FPT-DS:
1. Algorithm FPT-DS scans online transaction flows and generates candidate frequent patterns in real time.
2. The other important feature of algorithm FPT-DS is its regression-based compact pattern representation. Specifically, to meet the space constraint, a compact ATF (Accumulated Time and Frequency) form is devised for pattern representation; it aggregates all the information required for regression analysis.
3. FPT-DS uses segmentation tuning and segment relaxation to enhance the regression technique.
4. FPT-DS performs trend detection very effectively in comparison with other linear regression models.
5. It is based on two-dimensional stream data.

4. Method of Existing FPT-DS: Existing FPT-DS consists of the following steps:
Input: The window size N and the support threshold.
Step 1: Identify the ids at which a particular stream data sequence is present within the specified sliding windows.
Step 2: Calculate the support for the stream data sequence in every sliding window using the following formula:

$$F = \frac{\text{number of ids at which the stream data is present}}{\text{total number of ids in the stream sequence}}$$

Step 3: The support for each sliding window is the dependent variable and the ending time of the sliding window is the independent variable. Rearrange the support sequence and the ending times of the sliding windows into dependent and independent variables. Through preprocessing of the training data we obtain the frequency as the dependent variable and time as the independent variable; for FPT-DS we then calculate the coefficients and fit the regression model. The following symbols are used for calculating the coefficients and the regression model:
∑f : Sum of all frequencies (or supports), the dependent variables (1, 2, ..., n)
∑t : Sum of all times, the independent variables (1, 2, ..., n)
∑tf : Sum of the products of time and frequency (1, 2, ..., n)
For example, for the sliding windows [0,3], [1,4] and [2,5] the ending times are 3, 4 and 5, so that:
∑f = 0.4 + 0.4 + 0.2 = 1.0
∑t = 3 + 4 + 5 = 12
∑tf = 0.4*3 + 0.4*4 + 0.2*5 = 3.8
Step 4: To calculate the regression model we use the following equations:
(Equations (1) to (4) were images in the original and are not reproduced here.)
where Ý = new support Y(f), b0 = coefficient (a), and b1 = coefficient (b).
Step 5: Fit the regression model of FIPM:

$$\hat{y} = b_0 + b_1 x + E \qquad (5)$$

Step 6: Calculate the error. The new support is the predicted y:

$$\text{Error} = y - \hat{y} \qquad (6)$$

5. Preprocessing of Stream Data in FPT-DS: FPT-DS (Frequent Pattern Temporal Data Stream) is a linear regression method for two-dimensional frequent stream data. In this method, preprocessing is applied to the stream data to obtain the dependent and independent variables for regression analysis. Stream data consists of characters, symbols or items that appear repeatedly in a sequential file.

In the proposed FPT-DS we apply a non-linear regression method to the stream data. The data is organized according to the times and ids at which it appears. For example, let the sequence <bd> appear at ids 2, 6 and 9 in the sliding window 0 to 3, out of a total of 15 ids. The support is the number of ids at which the data appears divided by the total number of ids, here 3/15 = 0.2. FPT-DS is a linear regression method for two-dimensional stream data. This algorithm is used for mining frequent data items.

It is the easiest way to predict frequent items using a regression model for continuously incoming two-dimensional stream data. The method first preprocesses the two-dimensional stream data and transforms it into sample values for regression. Linear regression is then applied to the organized stream data. FPT-DS has two major features: a single data scan for online statistics collection, and a regression-based compact pattern representation. FPT-DS can not only conduct mining with variable time intervals but also perform trend detection effectively. The FPT-DS algorithm is based on linear regression. Examples of data streams include computer network traffic, phone conversations, ATM transactions, web searches, and sensor data. Under the traditional transaction approach, frequent item mining is impossible in this setting, because it requires analyzing which items in the continuously incoming stream data are frequent.

6. Example: In the training data of Table 1 we calculate the frequencies (or supports) for the <bd> data sequence according to the sliding windows and their ids. In the sliding window 0 to 3, the <bd> data sequence appears at id 2 out of a total of 15 ids. In the second sliding window, 1 to 4, the <bd> data sequence appears at ids 2, 4, 6, 9 and 10 out of a total of 15 ids. In each sliding window we find
the ids for the <bd> data sequence. Using these ids we can calculate the support for the data sequence in each sliding window.

Table 1: Training data for FPT-DS for <bd> sequence

7. Results by Existing FPT-DS: For the <bd> stream data sequence, the support sequence (Y), the new support (predicted Y) and the errors are calculated in Table 2 using existing FPT-DS. The support Y is the dependent variable and time is the independent variable. After determining the dependent and independent variables, we applied the linear regression model. The values are shown in Table 2.

Table 2: Predicted y and error using existing FPT-DS for <bd> sequence

8. Proposed method: In proposed FPT-DS, a non-linear regression method is applied to frequent stream data. Predicted supports and errors are calculated for the stream data sequence using the non-linear regression technique. Proposed FPT-DS consists of the following steps:

Step 1: Apply the same preprocessing to the stream data as in existing FPT-DS.

Step 2: The inputs are (xi, yi), i = 1, 2, 3, ..., n, where n is the number of sliding windows. Calculate the following variables Y1, Y2, Y3, X1, X2, X3 and X4:
Y1 = sum of all dependent variables divided by the number of sliding windows.
Y2 = sum of the products of the dependent and independent variables divided by the number of sliding windows.
Y3 = sum of the products of the dependent variables and the squared independent variables divided by the number of sliding windows.
X1 = sum of all independent variables divided by the number of sliding windows.
X2 = sum of the squares of all independent variables divided by the number of sliding windows.
X3 = sum of the cubes of all independent variables divided by the number of sliding windows.
X4 = sum of the fourth powers of all independent variables divided by the number of sliding windows.

$$Y_1 = \frac{1}{n}\sum y_i, \qquad Y_2 = \frac{1}{n}\sum x_i y_i, \qquad Y_3 = \frac{1}{n}\sum x_i^2 y_i$$

$$X_1 = \frac{1}{n}\sum x_i, \qquad X_2 = \frac{1}{n}\sum x_i^2, \qquad X_3 = \frac{1}{n}\sum x_i^3, \qquad X_4 = \frac{1}{n}\sum x_i^4$$

Step 3: Use the variables Y1, Y2, Y3, X1, X2, X3 and X4 to calculate the coefficients b0, b1 and b2 of the non-linear regression model:

$$b_2 = \frac{(Y_2 - X_1 Y_1)(X_3 - X_1 X_2) - (Y_3 - X_2 Y_1)(X_2 - X_1^2)}{(X_3 - X_1 X_2)^2 - (X_4 - X_2^2)(X_2 - X_1^2)} \qquad (7)$$

$$b_1 = \frac{(Y_2 - X_1 Y_1) - b_2 (X_3 - X_1 X_2)}{X_2 - X_1^2} \qquad (8)$$

$$b_0 = Y_1 - b_1 X_1 - b_2 X_2 \qquad (9)$$

The non-linear model is

$$y_i = b_0 + b_1 x_i + b_2 x_i^2, \qquad i = 1, \dots, n \qquad (10)$$

Apply the non-linear model to the dependent and independent variables shown in Table 3. After calculating the predicted values, we calculate the mean values for the unique dependent and independent variables.
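The steps of the existing and the proposed method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`support`, `fit_linear`, `fit_quadratic`, `residuals`) are hypothetical, the linear fit uses the textbook least-squares formulas because equations (1) to (4) are not available here, and the quadratic fit follows Step 2 and Step 3 of the proposed method, taking the second numerator term of equation (7) as Y3 - X2*Y1, which is what the least-squares normal equations give. The sample supports at the bottom are made up for demonstration only.

```python
from typing import List, Sequence, Tuple


def support(ids_present: Sequence[int], total_ids: int) -> float:
    """Support of a sequence in one sliding window: ids containing it / total ids."""
    return len(set(ids_present)) / total_ids


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y = b0 + b1*x (standard formulas, assumed here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum(xi * xi for xi in x) / n - mx * mx
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n - mx * my
    b1 = sxy / sxx  # raises ZeroDivisionError if all x are identical
    return my - b1 * mx, b1


def fit_quadratic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares parabola y = b0 + b1*x + b2*x^2 from the moments Y1..Y3, X1..X4."""
    n = len(x)
    Y1 = sum(y) / n
    Y2 = sum(xi * yi for xi, yi in zip(x, y)) / n
    Y3 = sum(xi * xi * yi for xi, yi in zip(x, y)) / n
    X1 = sum(x) / n
    X2 = sum(xi ** 2 for xi in x) / n
    X3 = sum(xi ** 3 for xi in x) / n
    X4 = sum(xi ** 4 for xi in x) / n
    # Equation (7); needs at least three distinct x values to be well defined.
    b2 = ((Y2 - X1 * Y1) * (X3 - X1 * X2) - (Y3 - X2 * Y1) * (X2 - X1 ** 2)) / (
        (X3 - X1 * X2) ** 2 - (X4 - X2 ** 2) * (X2 - X1 ** 2)
    )
    b1 = ((Y2 - X1 * Y1) - b2 * (X3 - X1 * X2)) / (X2 - X1 ** 2)  # equation (8)
    b0 = Y1 - b1 * X1 - b2 * X2  # equation (9)
    return b0, b1, b2


def residuals(y: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Error = y - predicted y, one value per sliding window."""
    return [yi - pi for yi, pi in zip(y, predicted)]


if __name__ == "__main__":
    # Hypothetical data: window ending times and made-up supports.
    t = [3, 4, 5, 6, 7]
    f = [0.4, 0.4, 0.2, 0.2, 0.4]
    a0, a1 = fit_linear(t, f)
    q0, q1, q2 = fit_quadratic(t, f)
    lin_err = residuals(f, [a0 + a1 * ti for ti in t])
    quad_err = residuals(f, [q0 + q1 * ti + q2 * ti * ti for ti in t])
    print("linear errors:   ", lin_err)
    print("quadratic errors:", quad_err)
```

Working from the averaged moments keeps the quadratic fit to a handful of running sums per sequence, so the sums can be updated as each new sliding window arrives instead of storing the whole stream.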
Table 3: Predicted y and error using proposed FPT-DS for <bd> sequence

9. Experimental Results: The experiments show that the errors made when predicting stream data with the non-linear FPT-DS are smaller than the errors made with the linear-model FPT-DS. When predicting stream data, the errors should be as small as possible; a prediction technique with smaller errors is better than one with larger errors.

Figure 1: Analysis of errors during the prediction of existing FPT-DS and proposed FPT-DS

10. Conclusion: The FPT-DS algorithm is used for mining stream data and is based on the linear regression technique. Our analysis found that FPT-DS using linear regression gives predictions with high error. To reduce the errors in FPT-DS, we proposed FPT-DS using non-linear regression, which gives improved results. In FPT-DS, each sequence and every time is considered using sliding windows. Proposed FPT-DS gives predictions with lower errors than existing FPT-DS.

11. References:
[1] M. Datar, A. Gionis, P. Indyk et al.: Maintaining Stream Statistics over Sliding Window, in ACM-SIAM Symposium on Discrete Algorithms (SODA), Chicago, pp. 406-417, June 2002.
[2] Y. Chen, G. Dong, J. Han et al.: Multi-Dimensional Regression Analysis of Time-Series Data Stream, Proceedings of the 28th International Conference on Very Large Data Base, Berlin, pp. 323-334, August 2002.
[3] S. Guha and N. Koudas: Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation, in Proceedings of the 16th ICDE Conference, Florida, pp. 3-14, March 2002.
[4] Feng Zhao, Qing-Hua A Li: A Plane Regression-Based Sequence Forecast Algorithm for Stream Data, in Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, pp. 1559-1562, angzhou, 18-21 August 2005.
[5] Wei-Guang Teng, Ming-Syan Chen, Philip S. Yu: A Regression-Based Temporal Pattern Mining Scheme for Data Stream, Proceedings of the 29th International Conference on Very Large Data Base, Berlin, pp. 607-617, August 2003.
[6] M. Joshi, G. Karypis: A universal formulation of sequential patterns, Research report No. 99-021, Department of Computer Science, University of Minnesota, Minnesota, 1999.
[7] Duck Jin Chai, Eun Hee Kim and Long Jin: Prediction of Frequent Items to One Dimensional Stream Data, Proceedings of the Fifth International Conference on Computational Science and Applications, pp. 353-360, 2007.
[8] Gordon K. Smyth: Nonlinear Regression, Volume 3, pp. 1405-1411, Chichester, 2002.
[9] Xi Chen: Recursive Least Square Method with Member Functions, Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004.
[10] R. Hayward: A Basic Approach to Linear Regression, RWJ Clinical Scholars Program, pp. 1-3, University of Michigan, 2005.
