


International Conference on Emerging Trends in Electrical, Electronics and Communication
Technologies-ICECIT, 2012

Privacy Preserving Data Stream Classification Using Data Perturbation Techniques
¹Hitesh Chhinkaniwala*, ¹Kiran Patel, ²Sanjay Garg
¹U V Patel College of Engineering, Kherva, Mehsana, 382711, India
²Institute of Technology, Nirma University, Ahmedabad, 382481, India
*Corresponding author. Tel.: +0-942-690-5105. E-mail address: [email protected]

Abstract

A data stream can be conceived of as a continuous and changing sequence of data that continuously arrives at a system to be stored or processed. Examples of data streams include computer network traffic, phone conversations, web searches, and sensor data. These data sets need to be analyzed to identify trends and patterns, which help in isolating anomalies and predicting future behavior. However, data owners or publishers may not be willing to reveal the true values of their data exactly, most notably for privacy reasons. Hence, some amount of privacy preservation needs to be applied to the data before it can be made publicly available. To preserve data privacy during data mining, the issue of privacy-preserving data mining has been widely studied and many techniques have been proposed. However, existing techniques for privacy-preserving data mining are designed for traditional static data sets and are not suitable for data streams, so privacy preservation for data stream mining is a pressing need. This paper focuses on describing methods that extend data perturbation to data streams in order to achieve privacy preservation. The classification characteristics of the original and perturbed data streams produced by the proposed algorithms have been evaluated in terms of information loss, response time, and privacy gain.

Keywords: Data Perturbation, Sliding Window, Orthogonal Matrix, Decision Tree, Hoeffding Tree;

1. Introduction

In the field of information processing, data mining refers to the process of extracting useful knowledge from large volumes of data. Data mining is widely applied in many areas, such as healthcare (including medical diagnostics, insurance claims analysis, and drug development), business, finance, education, sports and gambling, the stock market, retail, and telecommunications. Widely used data mining techniques in these application areas include clustering, classification, regression analysis, and association rule / pattern mining.
The data stream paradigm has recently emerged in response to the issues and challenges associated with continuous data [2]. Mining data streams is concerned with extracting knowledge structures, represented as models and patterns, from non-stopping, continuous streams (flows) of information. Algorithms written for data streams can naturally cope with data sizes many times greater than memory and can be extended to real-time applications not previously tackled by machine learning or data mining. The assumption of data stream processing is that training examples can be briefly inspected during a single scan of the input data stream, that is, they arrive in a high-speed stream and must then be discarded to make room for subsequent examples. The algorithm processing the stream has no control over the order of the examples seen and must update its model incrementally as each example is inspected. An additional desirable property, the so-called anytime property, requires that the model is ready to be applied at any point between training examples. Traditional data mining approaches have been used in applications where persistent data are available and the generated learning models are static in nature. Statistical information about the data distribution can be known in advance because the entire data set is available before it is passed to the machine learning algorithm.
The task performed by the mining process is centralized and produces a static learning model. Nowadays, however, applications are emerging in the field of information processing that do not fit this data model [3]. Instead, information naturally occurs in the form of a sequence (stream) of data values. A data stream is a real-time, continuous, and ordered sequence of items. It is not possible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety. Likewise, queries over streams run continuously over a period of time and incrementally return new results as new data arrive.

2. Privacy Concern for Data Stream

Mining data streams is concerned with extracting knowledge structures represented in models and patterns in
non-stopping streams of information. The general process of data stream mining is depicted in Figure 1.

Figure 1. General Process of data stream mining

Motivated by the privacy concerns raised by data mining tools, a research area called privacy-preserving data mining has emerged. Verykios et al. [4] classified privacy-preserving data mining techniques along five dimensions: data distribution, data modification, data mining algorithms, data or rule hiding, and privacy preservation. In the dimension of data distribution, some approaches have been proposed for centralized data and some for distributed data. Du and Zhan [5] utilized the secure union, secure sum, and secure scalar product to prevent the original data of each site from being revealed during the mining process. At the end of the mining process, every site obtains the final result of mining the whole data. The disadvantage is that the approach requires multiple scans of the database and hence is not suitable for data streams, which flow in fast and require immediate responses. In the dimension of data modification, the confidential values of a database to be released to the public are modified to preserve data privacy. Adopted approaches include perturbation, blocking, aggregation or merging, swapping, and sampling. Agrawal and Srikant [6] used the random data perturbation technique to protect customer data and then constructed the decision tree. For data streams, because data are produced at different times, not only does the data distribution change over time, but the mining accuracy also decreases on perturbed data. Privacy preservation techniques can be classified into three categories, namely heuristic-based techniques, cryptography-based techniques, and reconstruction-based techniques. From the review of previous research, it can be seen that
existing techniques for privacy-preserving data mining are designed for static databases with an emphasis on data
security. These existing techniques are not suitable for data streams.
Perturbation techniques are often evaluated with two basic metrics: the level of privacy guarantee and the level of model-specific data utility preserved, which is often measured by the loss of accuracy for data classification and data clustering. An ultimate goal for all data perturbation algorithms is to optimize the data transformation process by maximizing both the data privacy and the data utility achieved. Data privacy is commonly measured by the difficulty of estimating the original data from the perturbed data: given a data perturbation technique, the higher the difficulty of estimating the original values from the perturbed data, the higher the level of data privacy the technique supports. Data utility typically refers to the amount of mining-task- or model-specific critical information preserved about the data set after perturbation. Different data mining tasks, such as classification versus association rule mining, or different models for the same task, such as a decision tree versus a k-Nearest-Neighbor (kNN) classifier for classification, typically utilize different sets of properties of the data set.
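
As an illustration only (the privacy-gain metric reported later in this paper is not defined here), data privacy can be proxied by how far the perturbed values deviate from the originals, and utility loss by the drop in classification accuracy. The short Java sketch below uses these two simple proxies; the class and method names are our own.

    // Illustrative proxies only: privacy as the mean relative deviation of the
    // perturbed values from the originals, utility loss as the accuracy drop.
    public final class PerturbationMetrics {

        // Larger value => the original values are harder to estimate from the
        // perturbed ones, i.e., a higher level of data privacy.
        public static double meanRelativeDeviation(double[] original, double[] perturbed) {
            double sum = 0.0;
            for (int i = 0; i < original.length; i++) {
                sum += Math.abs(perturbed[i] - original[i]) / (Math.abs(original[i]) + 1e-9);
            }
            return sum / original.length;
        }

        // Model-specific utility loss: accuracy of a model built on the original
        // data minus accuracy of the same kind of model built on perturbed data.
        public static double utilityLoss(double accuracyOnOriginal, double accuracyOnPerturbed) {
            return accuracyOnOriginal - accuracyOnPerturbed;
        }
    }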

3. Privacy Preserving Data Stream Classification

3.1. Data stream classification cycle


A classification algorithm must meet several requirements in order to work under these assumptions and be suitable for learning from data streams. Figure 2 illustrates the typical use of a data stream classification algorithm and how the requirements fit in. The general model of data stream classification follows three steps in a repeating cycle [11]:
1. The algorithm is passed the next available example from the stream (requirement 1).
2. The algorithm processes the example, updating its data structures. It does so without exceeding the memory bounds set on it (requirement 2), and as quickly as possible (requirement 3).
3. The algorithm is ready to accept the next example. On request it is able to supply a model that can be used to predict the class of unseen examples (requirement 4).

Figure 2. Data stream classification cycle
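
The requirements of this cycle can be summarized by a minimal learner interface. The Java sketch below is illustrative only; the interface and method names are our own and do not correspond to any particular framework.

    // Illustrative interface for the classification cycle above; the names are
    // our own and do not belong to MOA or WEKA.
    public interface StreamClassifier<E> {

        // Requirement 1: accept the next available example from the stream.
        // Requirements 2 and 3: update the internal data structures within a
        // fixed memory bound and in a small, constant time per example.
        void trainOnExample(E example);

        // Requirement 4 (anytime property): a model usable for prediction must
        // be available between any two training examples.
        int predictClass(E example);
    }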

Yabo Xu et al. [7] considered the classification problem in which the training data are several private data streams. Joining all streams violates the privacy constraint of such applications and suffers from the blow-up of the join. With their technique, one can build exactly the same Naive Bayesian classifier as on the joined stream without exchanging private information. The processing cost is linear in the size of the input streams and independent of the join size; having a much lower processing time per input tuple, the method is able to handle much higher data arrival rates and deal with the general many-to-many join relationships of data streams, although limitations remain with respect to processing time and data arrival rate. Ching-Ming Chao et al. [8] proposed the Privacy-Preserving Classification of Data Streams (PCDS) method, claiming more accurate results and overcoming the drawbacks cited for [7]. They proposed a two-stage process: data stream preprocessing for data perturbation, and data stream mining. The Data Streams Preprocessing (DSP) algorithm offers higher security with less information loss. In the data stream mining stage, the Weighted Average Sliding Window (WASW) algorithm is used to mine the perturbed data streams. Experimental accuracy measurements showed that the error rate of the Very Fast Decision Tree learner (VFDT) increases steadily with the continuous arrival of the data stream, whereas the error rate of the WASW algorithm stays under the predetermined threshold value; the WASW algorithm therefore has higher accuracy. In conclusion, the PCDS method not only preserves data privacy but also mines data streams accurately.
Multiplicative perturbation has two basic forms of multiplicative noise, which have been studied by the statistics community [1]. One multiplies each data element by a random number that has a truncated Gaussian distribution with mean one and small variance. The other takes a logarithmic transformation of the data first, adds multivariate Gaussian noise, and then takes the exponential function exp(.) of the noise-added data. Neither of these perturbations preserves pairwise distances among data records.
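
For illustration, the two classical forms of multiplicative noise can be sketched in Java as follows; the variance parameter and the clamping used to truncate the Gaussian are our own simplifications, and positive data values are assumed for the logarithmic form.

    import java.util.Random;

    // Sketch of the two classical multiplicative-noise schemes described in [1].
    public final class MultiplicativeNoise {
        private static final Random RNG = new Random();

        // Form 1: multiply each value by a Gaussian factor with mean 1 and a
        // small standard deviation, truncated here by simple clamping.
        public static double gaussianFactor(double x, double sigma) {
            double factor = 1.0 + sigma * RNG.nextGaussian();
            factor = Math.max(factor, 0.01);   // crude truncation keeps the factor positive
            return x * factor;
        }

        // Form 2: take the logarithm, add Gaussian noise (one independent
        // component per value here), then exponentiate. Assumes x > 0.
        public static double logAddExp(double x, double sigma) {
            return Math.exp(Math.log(x) + sigma * RNG.nextGaussian());
        }
    }
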
To facilitate large-scale data mining applications, Liu et al. [13] proposed an approach in which the data is multiplied by a randomly generated matrix; in effect, the data is projected into a lower-dimensional random space. This technique preserves distances in expectation. Oliveira and Zaiane [14] and Chen and Liu [15] discussed the use of
random rotation for privacy preserving clustering and classification. These authors observed that the distance
preserving nature of random rotation enables a third party to produce exactly the same data mining results on the
perturbed data as if on the original data. However, they did not analyze the privacy limitations of random rotation.
Liu et al. [16] addressed the privacy issues of distance preserving perturbation (including rotation) by studying how
well an attacker can recover the original data from the transformed data and prior information. They proposed two
attack techniques: the first is based on basic properties of linear algebra and the second on principal component
analysis. Their analysis explicitly illuminated scenarios where privacy can be breached.

4. Problem Description

The initial idea is to extend traditional data mining techniques to work with perturbed stream data so that sensitive information is masked. The key issue is to obtain accurate stream mining results using the perturbed data. The solutions are often tightly coupled with the data stream mining algorithms under consideration.

Figure 3. Framework for privacy preserving in data stream classification

The goal is to transform a given data set D into a perturbed version D' that satisfies a given privacy requirement while losing minimal information for the intended data analysis task. In this paper, data perturbation algorithms have been proposed for perturbing data sets, and classification characteristics on the Hoeffding tree algorithm have been analyzed in terms of accuracy, response time, and privacy gain.

5. Data Perturbation Algorithms

5.1. Data Perturbation Using Sliding Window Concept


The data stream pre-processing stage uses a perturbation algorithm to perturb confidential data. Users can flexibly adjust the data attributes to be perturbed according to their privacy needs. Therefore, the threats and risks from releasing data can be reduced effectively.

Algorithm: Sliding Window based Data Perturbation


Input: Data set S (.ARFF or .CSV file)
Output: Perturbed Data set S’ (.ARFF or .CSV file)
Algorithm Steps:
1. Read the original data set S from file.
2. Display the set of attributes, with data types, and the total number of attributes in the data set S.
3. Display the total number of tuples in data set S.
4. Select a sensitive attribute (numerical only) from the set of available attributes.
5. Suppose the selected attribute is F*; then
a) Assign a window to attribute F* (the window stores received tuples in order of arrival).
b) If the window size is w (selected at run time), the window contains only w tuples of the selected attribute F*.
c) Find the mean of the w tuples in the window.
d) Replace the first tuple value of the window by the mean computed in step 5c.
e) The values of the remaining tuples in the window remain unchanged.
f) Pop off the perturbed value and append the next tuple to the window. The sliding window size remains the same.
6. Repeat steps 5a to 5f until all values of attribute F* are perturbed.
7. Store the perturbed data set S' in a new file.
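
A minimal Java sketch of the algorithm above is given below. It perturbs a single numeric attribute held in an array; reading and writing the .ARFF/.CSV file (steps 1-4 and 7) is omitted, and the handling of the last few tuples, which the algorithm leaves unspecified, is our own choice of simply letting the window shrink.

    // Sketch of steps 5a-5f: each value is replaced by the mean of the window
    // of size w that starts at that value.
    public final class SlidingWindowPerturbation {

        public static double[] perturb(double[] attribute, int w) {
            int n = attribute.length;
            double[] perturbed = new double[n];
            for (int i = 0; i < n; i++) {
                int end = Math.min(i + w, n);    // window covers positions i .. end-1
                double sum = 0.0;
                for (int j = i; j < end; j++) {
                    sum += attribute[j];         // remaining window values stay unchanged (5e)
                }
                perturbed[i] = sum / (end - i);  // mean replaces the first tuple, which is then
            }                                    // popped off as output (5c, 5d, 5f)
            return perturbed;
        }
    }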

5.2. Multiplicative Data Perturbation Using Rotation Perturbation

Multiplicative data perturbations include three types of perturbation techniques: Rotation Perturbation, Projection
Perturbation, and Geometric Perturbation. They all preserve (or approximately preserve) distance or inner product,
which are important to many classification and clustering models. As a result, the classification and clustering
models based on the perturbed data through multiplicative data perturbation show very similar performance to those
based on the original data. The main challenge for multiplicative data perturbations is thus how to maximize the desired data privacy, whereas many other data perturbation techniques focus on seeking a better trade-off between the level of data utility and accuracy preserved and the level of data privacy guaranteed.
Rotation perturbation is an orthogonal transformation-based data perturbation. Suppose the data owner has a private data set X, with each column of X being an attribute and each row a record. The data owner generates an m × m random orthogonal matrix R and computes G(X) = R*X. The perturbed data set G(X) is then released for future use.
Where
X - Original data set
R - Random orthogonal matrix (R^T R = R R^T = I)
G(X) - Perturbed data set
Because R is orthogonal, ||Ru - Rv|| = ||u - v|| and (Ru)·(Rv) = u·v for any vectors u and v, so the transformation preserves the norms, inner products, and pairwise Euclidean distances of the vectors it rotates.

Algorithm: Multiplicative Data Perturbation (Rotation perturbation)


Input: Data set X (.ARFF or .CSV file)
Output: Perturbed Data set X’ (.ARFF or .CSV file)
Algorithm Steps:
1) Read the original data set X from file.
2) Consider only the numeric attributes of data set X and call the result Xnum.
3) Select a slot of m rows (Xm) from Xnum for perturbation and treat it as a matrix. Perform the following steps:
a. If the matrix size is m×n, create an orthogonal matrix Rm×m.
b. Multiply the two matrices: G(X) = Rm×m * Xm×n. The resulting matrix G(X) has the same number of rows and columns as the original slot.
c. Replace Xm×n with G(X)m×n.
d. Select the next m rows (Xm) and repeat step 3.
e. Store the perturbed data set X' in a new file.
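
A compact Java sketch of this algorithm is shown below. The random orthogonal matrix R is obtained by Gram-Schmidt orthonormalization of a random Gaussian matrix (any construction satisfying R^T R = I would do); file handling and the treatment of a final slot with fewer than m rows are omitted.

    import java.util.Random;

    // Sketch of rotation perturbation: each slot of m rows of the numeric part
    // of the data set is replaced by G(X) = R * X, with R a random m x m
    // orthogonal matrix.
    public final class RotationPerturbation {
        private static final Random RNG = new Random();

        // Random orthogonal matrix via Gram-Schmidt on random Gaussian rows.
        static double[][] randomOrthogonal(int m) {
            double[][] r = new double[m][m];
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < m; j++) r[i][j] = RNG.nextGaussian();
                for (int k = 0; k < i; k++) {              // subtract projections on earlier rows
                    double dot = 0.0;
                    for (int j = 0; j < m; j++) dot += r[i][j] * r[k][j];
                    for (int j = 0; j < m; j++) r[i][j] -= dot * r[k][j];
                }
                double norm = 0.0;                         // normalize the row to unit length
                for (int j = 0; j < m; j++) norm += r[i][j] * r[i][j];
                norm = Math.sqrt(norm);
                for (int j = 0; j < m; j++) r[i][j] /= norm;
            }
            return r;
        }

        // Perturb one slot X of m rows and n numeric columns: G(X) = R * X.
        static double[][] perturbSlot(double[][] x, double[][] r) {
            int m = x.length, n = x[0].length;
            double[][] g = new double[m][n];
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++)
                    for (int k = 0; k < m; k++)
                        g[i][j] += r[i][k] * x[k][j];
            return g;
        }
    }
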
For model creation, the Hoeffding tree induction algorithm [9] has been used. Decision tree learning is one of the most effective classification methods. A decision tree is learned by recursively replacing leaves with test nodes, starting at the root. Each internal node contains a test on an attribute, each branch from a node corresponds to a possible outcome of the test, and each leaf contains a class prediction. In the traditional batch setting, all training data are stored in main memory, and repeatedly reading them from disk when learning complex trees is expensive; the goal here is therefore a decision tree learner that reads each example at most once and uses a small constant time to process it.
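
For completeness, the split decisions of the Hoeffding tree rest on the Hoeffding bound: after n independent observations of a random variable with range R (here the range of the split criterion, not the rotation matrix), the observed mean is, with probability 1 - delta, within epsilon = sqrt(R^2 * ln(1/delta) / (2n)) of the true mean. A leaf is split once the best attribute outscores the second best by more than epsilon, or once epsilon falls below the tie threshold. The small Java helper below is a sketch of this rule; delta corresponds to the split confidence and tau to the tie threshold used in the experiments.

    // Hoeffding bound used by the Hoeffding tree to decide when a leaf has
    // seen enough examples to split on its best attribute.
    public final class HoeffdingBound {

        // epsilon = sqrt(R^2 * ln(1/delta) / (2n))
        public static double epsilon(double range, double delta, long n) {
            return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
        }

        // Split when the best attribute beats the second best by more than
        // epsilon, or when epsilon has shrunk below the tie threshold tau.
        public static boolean shouldSplit(double bestGain, double secondBestGain,
                                          double epsilon, double tau) {
            return (bestGain - secondBestGain) > epsilon || epsilon < tau;
        }
    }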

6. Experimental setup and Results

We have conducted experiments to evaluate the performance of the data perturbation algorithms. For the experiments we use two data sets. The Agrawal data set is generated using Massive Online Analysis (MOA), an open-source framework for data stream mining [11], with 5K instances and 10 attributes. The Bank Marketing data set [12] is taken from the UCI repository; it relates to the direct marketing campaigns of a Portuguese banking institution and contains 45K instances and 17 attributes. We applied the sliding window perturbation algorithm with window sizes w = 2, w = 3, and w = 4, and the rotation perturbation algorithm with R = 5, R = 10, and R = 20. The WEKA (Waikato Environment for Knowledge Analysis) [10] data mining tool has been integrated with MOA to test the accuracy of the Hoeffding tree algorithm, with the split criterion set to InfoGain, the tie threshold set to 0.05, and the split confidence set to 0. The data perturbation algorithms have been implemented in Java and integrated within the MOA framework. The results in Table 2 and Table 3 show that privacy has been achieved with a little over 2% loss of information.
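
For reference, the classifier configuration described above can be outlined roughly as follows. This is only a sketch: it assumes MOA's moa.classifiers.trees.HoeffdingTree class and its splitCriterionOption, tieThresholdOption, and splitConfidenceOption fields, whose names and packages may differ between MOA releases, so it should not be read as the exact code used for the experiments.

    import moa.classifiers.trees.HoeffdingTree;

    // Rough outline of the Hoeffding tree configuration stated above; class and
    // option names are assumptions about the MOA release in use.
    public final class ExperimentSetup {

        public static HoeffdingTree configuredTree() {
            HoeffdingTree tree = new HoeffdingTree();
            tree.splitCriterionOption.setValueViaCLIString("InfoGainSplitCriterion"); // split criterion = InfoGain
            tree.tieThresholdOption.setValue(0.05);    // tie threshold = 0.05
            tree.splitConfidenceOption.setValue(0.0);  // split confidence = 0
            tree.prepareForUse();
            // The tree is then trained and evaluated on the original and the
            // perturbed streams within the MOA framework.
            return tree;
        }
    }
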
Table 2. SLIDING WINDOW BASED DATA PERTURBATION

  Data set                  Analysis                          Original data set   Perturbed data set (window size)
                                                                                  w = 2     w = 3     w = 4
  ADULT DATA SET            Time taken to build model (sec)   0.09                0.09      0.10      0.11
                            Correctly classified (%)          82.52               80.46     80.49     80.50
  BANK MARKETING DATA SET   Time taken to build model (sec)   0.20                0.17      0.13      0.13
                            Correctly classified (%)          89.04               88.48     82.32     86.12

Table 3. MULTIPLICATIVE DATA PERTURBATION USING ROTATION PERTURBATION

  Data set                  Analysis                          Original data set   Perturbed data set (rotation perturbation)
                                                                                  R = 5     R = 10    R = 20
  ADULT DATA SET            Time taken to build model (sec)   0.09                0.09      0.09      0.11
                            Correctly classified (%)          82.52               81.41     81.42     81.42
  BANK MARKETING DATA SET   Time taken to build model (sec)   0.20                0.14      0.13      0.14
                            Correctly classified (%)          89.04               87.23     87.23     88.40

7. Conclusion

An approach has been discussed for privacy-preserving classification of data streams which consists of two steps: data stream pre-processing and data stream mining. In the data stream pre-processing step, we proposed two data perturbation algorithms: data perturbation using the sliding window concept, and multiplicative data perturbation using rotation perturbation. Perturbation techniques are often evaluated with two basic metrics: the level of privacy guarantee and the level of model-specific data utility preserved, which is often measured by the loss of accuracy for data classification. Using the proposed data perturbation algorithms, we generate different perturbed data sets, and in the second step we apply the Hoeffding tree algorithm to the perturbed data sets. We carried out a set of experiments to generate classification models for the original and perturbed data sets, and the classification results have been evaluated in terms of accuracy. The proposed algorithms can perturb sensitive attributes with numerical values. Two standard data sets have been perturbed and tested against the original classification results. The classification results on the perturbed data sets show that data privacy is achieved with minimal information loss.

References

1. J. J. Kim and W. E. Winkler, Multiplicative noise for masking continuous data, Statistical Research Division, U.S. Bureau of the
Census, 2003.
2. A. Bifet, G. Holmes, R. Kirkby and B. Pfahringer, Data Stream Mining-A Practical approach, 2011.
3. L. Golab and M. T. Ozsu, Data Stream Management Issues -A Survey Technical Report, 2003.
4. V. S. Verykios, E. Bertino, I. N. Fovino, L. P. Provenza, Y. Saygin and Y. Theodoridis, State-of-the-Art in Privacy Preserving Data Mining, ACM SIGMOD Record, Vol. 33, pp. 50-57, 2004.
5. W. Du and Z. Zhan, Building Decision Tree Classifier on Private Data, Proceedings of IEEE International Conference on Privacy
Security and Data Mining, pp. 1-8, 2002.
6. R. Agrawal and R. Srikant, Privacy-Preserving Data Mining, Proceedings of ACM SIGMOD International Conference on
Management of Data, pp. 439-450, 2000.
7. Y. Xu, K. Wang, A.W.Ch. Fu, R. She and J. Pei, Privacy-Preserving Data Stream Classification, pp. 489-510, Springer, 2008.
8. C. Chao, P. Chen and C. Sun, Privacy-Preserving Classification of Data Streams, Tamkang Journal of Science and Engineering, Vol.
12, No. 3, pp. 321-330, 2009.
9. A. Bifet, G. Holmes, R. Kirkby and B. Pfahringer, Data Stream Mining A Practical Approach, 2011.

10. The Weka Machine Learning Workbench. http://www.cs.waikato.ac.nz/ml/weka.


11. A. Bifet, R. Kirkby, P. Kranen, P. Reutemann, MOA: Massive Online Analysis Manual, Journal of Machine Learning Research
(JMLR), 2010.
12. S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology,
In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Portugal,
October, 2011.
13. K. Liu, H. Kargupta, and J. Ryan, Random projection-based multiplicative data perturbation for privacy preserving distributed data
mining, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 18, No. 1, pp. 92–106, 2006.
14. S. R. M. Oliveira and O. R. Zaïane, Privacy preservation when sharing data for clustering, In Proceedings of the International Workshop on Secure Data Management in a Connected World, Toronto, Canada, pp. 67-82, 2004.
15. K. Chen and L. Liu, “Privacy preserving data classification with rotation perturbation”, In Proceedings of the 5th IEEE International
Conference on Data Mining (ICDM’05), Houston, Texas, pp. 589–592, 2005.
16. K. Liu, C. Giannella, and H. Kargupta, An attacker’s view of distance preserving maps for privacy preserving data mining, In
Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’06), Berlin,
Germany, pp. 297–308, 2006.
