0% found this document useful (0 votes)

95 views8 pages

Unit 6 - Information Privacy and Data Mining

The document discusses information privacy and data mining. It covers data privacy principles, applications of data mining such as in financial data analysis and retail industries, and issues around ensuring privacy when performing data mining. Data mining can help discover patterns but also risks privacy if personal information is not properly protected.

Uploaded by

eskpg066

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

95 views8 pages

Unit 6 - Information Privacy and Data Mining

Uploaded by

eskpg066

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

Information Privacy and Data Mining

Data privacy
Data privacy, also called information privacy, is the aspect of information technology (IT) that
deals with the ability an organization or individual has to determine what data in a computer system
can be shared with third parties. Information privacy is considered an important aspect of
information sharing. With the advancement of the digital age, personal information vulnerabilities
have increased.

Information privacy may be applied in numerous ways, including encryption, authentication and
data masking - each attempting to ensure that information is available only to those with authorized
access. These protective measures are geared toward preventing data mining and the unauthorized
use of personal information, which are illegal in many parts of the world.

Information Protection Principles (IPPs)

1. Collection
 Lawful: An agency must only collect personal information for a lawful purpose. It must be
directly related to the agency’s function or activities and necessary for that purpose.
 Direct: An agency must only collect personal information directly from you, unless you have
authorized collection from someone else, or if you are under the age of 16 and the information
has been provided by a parent or guardian.
 Open: An agency must inform you that the information is being collected, why it is being
collected, and who will be storing and using it.
 Relevant: An agency must ensure that your personal information is relevant, accurate,
complete, up-to-date and not excessive. The collection should not unreasonably intrude into
your personal affairs.
2. Storage
 Secure: An agency must store personal information securely, keep it no longer than
necessary and dispose of it appropriately. It should also be protected from unauthorized
access, use, modification or disclosure.
3. Access and accuracy

Arjun Lamichhane 1
Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

 Transparent: An agency must provide you with details regarding the personal information
they are storing, why they are storing it and what rights you have to access it.
 Accessible: An agency must allow you to access your personal information without
excessive delay or expense.
4. Use
 Accurate: An agency must ensure that your personal information is relevant, accurate, up to
date and complete before using it.
 Limited: An agency can only use your personal information for the purpose for which it was
collected.
5. Disclosure
 Restricted: An agency can only disclose your information in limited circumstances if you
have consented or if you were told at the time they collected it that they would do so. An
agency cannot disclose your sensitive personal information without your consent, for
example, information about political opinions, religious or philosophical beliefs, medical
conditions or trade union membership.

Applications of Data Mining

The main purpose of data mining process is to discover the records of information and summarize
it in a simpler format for the purpose of others.

Figure 1 Common data mining application domains.

Arjun Lamichhane 2
Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

i. Data Mining for Financial Data Analysis

Most banks and financial institutions offer a wide variety of banking, investment, and credit
services (the latter include business, mortgage, and automobile loans and credit cards). Some also
offer insurance and stock investment services. Financial data collected in the banking and financial
industry are often relatively complete, reliable, and of high quality, which facilitates systematic
data analysis and data mining. Following are a few typical cases.

 Loan payment prediction and customer credit policy analysis

Loan payment prediction and customer credit analysis are critical to the business of a bank. Many
factors can strongly or weakly influence loan payment performance and customer credit rating.
Data mining methods, such as attribute selection and attribute relevance ranking, may help identify
important factors and eliminate irrelevant ones.
 Classification and clustering of customers for targeted marketing
Classification and clustering methods can be used for customer group identification and targeted
marketing. For example, we can use classification to identify the most crucial factors that may
influence a customer’s decision regarding banking. Customers with similar behaviors regarding
loan payments may be identified by multidimensional clustering techniques. These can help
identify customer groups, associate a new customer with an appropriate customer group, and
facilitate targeted marketing.
 Detection of money laundering and other financial crimes
To detect money laundering and other financial crimes, it is important to integrate information
from multiple, heterogeneous databases (e.g., bank transaction databases and federal or state crime
history databases), as long as they are potentially related to the study. Multiple data analysis tools
can then be used to detect unusual patterns, such as large amounts of cash flow at certain periods,
by certain groups of customers.

ii. Data Mining for Retail and Telecommunication Industries

The retail industry is a well-fit application area for data mining, since it collects huge amounts of
data on sales, customer shopping history, goods transportation, consumption, and service. The
quantity of data collected continues to expand rapidly, especially due to the increasing availability,
ease, and popularity of business conducted on the Web, or e-commerce.

Arjun Lamichhane 3
Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

Retail data mining can help identify customer buying behaviors, discover customer shopping
patterns and trends, improve the quality of customer service, achieve better customer retention and
satisfaction, enhance goods consumption ratios, design more effective goods transportation and
distribution policies, and reduce the cost of business.

A few examples of data mining in the retail industry are outlined as follows:

 Multidimensional analysis of sales, customers, products, time, and region

The retail industry requires timely information regarding customer needs, product sales, trends,
and fashions, as well as the quality, cost, profit, and service of commodities. It is therefore
important to provide powerful multidimensional analysis and visualization tools, including the
construction of sophisticated data cubes according to the needs of data analysis.
 Analysis of the effectiveness of sales campaigns
The retail industry conducts sales campaigns using advertisements, coupons, and various kinds of
discounts and bonuses to promote products and attract customers. Careful analysis of the
effectiveness of sales campaigns can help improve company profits. Multidimensional analysis
can be used for this purpose by comparing the amount of sales and the number of transactions
containing the sales items during the sales period versus those containing the same items before or
after the sales campaign. Moreover, association analysis may disclose which items are likely to be
purchased together with the items on sale, especially in comparison with the sales before or after
the campaign.
 Customer retention—analysis of customer loyalty
We can use customer loyalty card information to register sequences of purchases of particular
customers. Customer loyalty and purchase trends can be analyzed systematically. Goods purchased
at different periods by the same customers can be grouped into sequences. Sequential pattern
mining can then be used to investigate changes in customer consumption or loyalty and suggest
adjustments on the pricing and variety of goods to help retain customers and attract new ones.
 Product recommendation and cross-referencing of items
By mining associations from sales records, we may discover that a customer who buys a digital
camera is likely to buy another set of items. Such information can be used to form product
recommendations. Product recommendations can also be advertised on sales receipts, in weekly
flyers, or on the Web to help improve customer service, aid customers in selecting items, and

Arjun Lamichhane 4
Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

increase sales. Similarly, information, such as “hot items this week” or attractive deals, can be
displayed together with the associative information to promote sales.
 Fraudulent analysis and the identification of unusual patterns
Fraudulent activity costs the retail industry millions of dollars per year. It is important to (1)
identify potentially fraudulent users and their atypical usage patterns; (2) detect attempts to gain
fraudulent entry or unauthorized access to individual and organizational accounts; and (3) discover
unusual patterns that may need special attention. Many of these patterns can be discovered by
multidimensional analysis, cluster analysis, and outlier analysis.

iii. Data Mining in Science and Engineering

In the past, many scientific data analysis tasks tended to handle relatively small and homogeneous
data sets. Such data were typically analyzed using a “formulate hypothesis, build model, and
evaluate results” paradigm. In these cases, statistical techniques were typically employed for their
analysis. Massive data collection and storage technologies have recently changed the landscape of
scientific data analysis.

Today, scientific data can be amassed at much higher speeds and lower costs. This has resulted in
the accumulation of huge volumes of high-dimensional data, stream data, and heterogenous data,
containing rich spatial and temporal information. Consequently, scientific applications are shifting
from the “hypothesize-and-test” paradigm toward a “collect and store data, mine for new
hypotheses, confirm with data or experimentation” process. This shift brings about new challenges
for data mining.

Following are examples of data mining in science and engineering:

 Mining complex data types

Scientific data sets are heterogeneous in nature. They typically involve semi-structured and
unstructured data, such as multimedia data and georeferenced stream data, as well as data with
sophisticated, deeply hidden semantics (e.g., genomic and proteomic data). Robust and dedicated
analysis methods are needed for handling spatiotemporal data, biological data, related concept
hierarchies, and complex semantic relationships.

Arjun Lamichhane 5
Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

 Graph-based and network-based mining

It is often difficult or impossible to model several physical phenomena and processes due to
limitations of existing modeling approaches. Alternatively, labeled graphs and networks may be
used to capture many of the spatial, topological, geometric, biological, and other relational
characteristics present in scientific data sets.
Data mining in engineering shares many similarities with data mining in science. Both practices
often collect massive amounts of data, and require data preprocessing, data warehousing, and
scalable mining of complex types of data. Both typically use visualization and make good use of
graphs and networks. Moreover, many engineering processes need real-time responses, and so
mining data streams in real time often becomes a critical component.

iv. Data Mining for Intrusion Detection and Prevention

The security of our computer systems and data is at continual risk. The extensive growth of the
Internet and the increasing availability of tools and tricks for intruding and attacking networks
have prompted intrusion detection and prevention to become a critical component of networked
systems. An intrusion can be defined as any set of actions that threaten the integrity,
confidentiality, or availability of a network resource (e.g., user accounts, file systems, system
kernels, and so on). Intrusion detection systems and intrusion prevention systems both monitor
network traffic and/or system executions for malicious activities.

v. Data Mining and Recommender Systems

Today’s consumers are faced with millions of goods and services when shopping online.
Recommender systems help consumers by making product recommendations that are likely to be
of interest to the user such as books, CDs, movies, restaurants, online news articles, and other
services. An advantage of recommender systems is that they provide personalization for customers
of e-commerce, promoting one-to-one marketing.

Primary Aims of Data Mining

In practice, the two primary goals of data mining tend to be prediction and description. Prediction
involves using some variables or fields in the data set to predict unknown or future values of other
variables of interest. Description, on the other hand, focuses on finding patterns describing the data
that can be interpreted by humans.

Arjun Lamichhane 6
Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

Following are primary data mining tasks:

1. Classification: Discovery of a predictive learning function that classifies a data item into one of
several predefined classes.

2. Regression: Discovery of a predictive learning function that maps a data item to a real value
prediction variable.

3. Clustering: A common descriptive task in which one seeks to identify a finite set of categories
or clusters to describe the data.

4. Optimization: enhance the use limited resources such as time space or material

Disadvantages of data mining

i. Privacy Issues
The concerns about the personal privacy have been increasing enormously recently especially
when the internet is booming with social networks, e-commerce, forums, blogs etc. Because of
privacy issues, people are afraid of their personal information is collected and used in an unethical
way that potentially causing them a lot of troubles. Businesses collect information about their
customers in many ways for understanding their purchasing behaviors trends. However businesses
don’t last forever, some days they may be acquired by other or gone. At this time, the personal
information they own probably is sold to other or leak.

ii. Security issues

Security is a big issue. Businesses own information about their employees and customers including
social security number, birthday, payroll and etc. However how properly this information is taken
care is still in questions. There have been a lot of cases that hackers accessed and stole big data of
customers from the big corporation such as Ford Motor Credit Company, Sony etc with so much
personal and financial information available, the credit card stolen and identity theft has also
become a big problem.

iii. Misuse of information/inaccurate information

Information is collected through data mining intended for the ethical purposes can be misused.
This information may be exploited by unethical people or businesses to take benefits of vulnerable
people or discriminate against a group of people.
Arjun Lamichhane 7
Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

iv. Other Disadvantages:

 Excessive work intensity may require investment in high performance teams and staff
training.
 The difficulty of collecting the data. Depending on the type of data that you want to collect
can be a lot of work.
 Although less and less, the requirement of a large investment can also be considered an
inconvenience. Sometimes, the necessary technologies to carry out the data collection, is not
an easy task and consumes many resources that could suppose a high cost.
 Data mining is not a perfect process, if the information is inaccurate, it would affect the
outcome of the decision making process.

Pitfalls of Data Mining

1. The amount of available and relevant data may be much less than initially supposed
2. Data mining can occasionally take place with no clear goals and no idea of how results will
be used. So early planning and definition of business goal is required
3. Insufficient business and data knowledge
4. The more factors in a data set the system considers, the more likely the program is to find
the relation, valid or not.
5. Faulty assumptions such as no customer can hold different kinds of bank account.

References

[1] J. Han and K. Micheline, Data Mining: Concepts and Techniques, San Francisco: Elsevier Inc., 2006.

Arjun Lamichhane 8

D Mart
No ratings yet
D Mart
35 pages
Nescafe
73% (11)
Nescafe
49 pages
Datawarehousing and Data Mining
No ratings yet
Datawarehousing and Data Mining
46 pages
Business Plan Beauty Products
100% (6)
Business Plan Beauty Products
31 pages
Jack Wills Analysis
No ratings yet
Jack Wills Analysis
11 pages
Magazine 5
No ratings yet
Magazine 5
7 pages
Lecture 8 Applications of Data Mining
No ratings yet
Lecture 8 Applications of Data Mining
16 pages
DM Material
No ratings yet
DM Material
98 pages
Final Document
No ratings yet
Final Document
25 pages
WK9 Data Mining
No ratings yet
WK9 Data Mining
7 pages
Cs655 Unit V
No ratings yet
Cs655 Unit V
15 pages
5.1 Applications of Data Mining: Unit V - Data Warehousing and Data Mining - Ca5010 1
No ratings yet
5.1 Applications of Data Mining: Unit V - Data Warehousing and Data Mining - Ca5010 1
16 pages
KM Notes Unit-3
No ratings yet
KM Notes Unit-3
20 pages
Data Mining First Draft
No ratings yet
Data Mining First Draft
84 pages
Annotating Full Document
No ratings yet
Annotating Full Document
48 pages
Unit 5 DWDM
No ratings yet
Unit 5 DWDM
42 pages
Data Mining
No ratings yet
Data Mining
9 pages
L - 1 Data Mining
No ratings yet
L - 1 Data Mining
17 pages
Applications and Trends in Data Mining
100% (1)
Applications and Trends in Data Mining
20 pages
Data Mining Tools and Applications
No ratings yet
Data Mining Tools and Applications
5 pages
Understanding Data Warehousing and Data Mining
No ratings yet
Understanding Data Warehousing and Data Mining
7 pages
Data Mining Tutorial
No ratings yet
Data Mining Tutorial
30 pages
Unit 2 (DWDM)
No ratings yet
Unit 2 (DWDM)
40 pages
Totolan Data Mining
No ratings yet
Totolan Data Mining
25 pages
Applications of Data Mining
No ratings yet
Applications of Data Mining
2 pages
Absract:: Data, Information, and Knowledge
No ratings yet
Absract:: Data, Information, and Knowledge
7 pages
Data Mining and Warehousing Concepts: Hapter
No ratings yet
Data Mining and Warehousing Concepts: Hapter
7 pages
Data Mining1
No ratings yet
Data Mining1
37 pages
Data Mining Unit 1 (MSC Ds 3 Sem)
No ratings yet
Data Mining Unit 1 (MSC Ds 3 Sem)
119 pages
Synopsis Data Warehouse and Data Mining
No ratings yet
Synopsis Data Warehouse and Data Mining
4 pages
Data Mining: A Competitive Tool in The Banking and Retail Industries
No ratings yet
Data Mining: A Competitive Tool in The Banking and Retail Industries
7 pages
Data Mining M1
No ratings yet
Data Mining M1
64 pages
Unit IV DWDM
No ratings yet
Unit IV DWDM
40 pages
Data Mining Notes
No ratings yet
Data Mining Notes
25 pages
Notes DATA MINING MBA III
No ratings yet
Notes DATA MINING MBA III
8 pages
Lec 1
No ratings yet
Lec 1
7 pages
Data Mining
No ratings yet
Data Mining
89 pages
Bit 4101 Business Data Minning and Warehousing
No ratings yet
Bit 4101 Business Data Minning and Warehousing
11 pages
Explain How Files and Databases Are Used in Organizations
No ratings yet
Explain How Files and Databases Are Used in Organizations
5 pages
Data Mining and Data Warehousing Unit 3 Part 1
No ratings yet
Data Mining and Data Warehousing Unit 3 Part 1
13 pages
Data Mining-Introduction
No ratings yet
Data Mining-Introduction
8 pages
Data Warehousing&Dat Mining
No ratings yet
Data Warehousing&Dat Mining
12 pages
Data Mining
No ratings yet
Data Mining
130 pages
Introduction To Data Mining - 125604
No ratings yet
Introduction To Data Mining - 125604
7 pages
Chapter 4 Predictive Analytics I Data Mining Process J Methods J and Algorithms
No ratings yet
Chapter 4 Predictive Analytics I Data Mining Process J Methods J and Algorithms
10 pages
Unit 4 New Database Applications and Environments: by Bhupendra Singh Saud
No ratings yet
Unit 4 New Database Applications and Environments: by Bhupendra Singh Saud
14 pages
Data Mining
No ratings yet
Data Mining
14 pages
Mca-Ii Year-Mcas4110Data Warehousing and Mining
No ratings yet
Mca-Ii Year-Mcas4110Data Warehousing and Mining
63 pages
Unit 1
No ratings yet
Unit 1
27 pages
Data Mining
No ratings yet
Data Mining
69 pages
Data Mining Seminar
50% (2)
Data Mining Seminar
21 pages
Data Warehousing & Data Mining Slides
No ratings yet
Data Warehousing & Data Mining Slides
23 pages
1 ST Review Document
No ratings yet
1 ST Review Document
37 pages
Aispre 3 Lesson 3 Msword
No ratings yet
Aispre 3 Lesson 3 Msword
3 pages
UNIT 1 - Lecture 1 - Introduction To Data Mining
No ratings yet
UNIT 1 - Lecture 1 - Introduction To Data Mining
62 pages
Group 4
No ratings yet
Group 4
16 pages
Unit 1 Data Warehouse and Data Mining
No ratings yet
Unit 1 Data Warehouse and Data Mining
13 pages
Sri Venkateswara Engineering College
No ratings yet
Sri Venkateswara Engineering College
15 pages
Motivation of Data Mining
No ratings yet
Motivation of Data Mining
4 pages
08 Data Mining Application
No ratings yet
08 Data Mining Application
19 pages
Lps Week 16 Iatb
No ratings yet
Lps Week 16 Iatb
5 pages
Data Mining: What Is Data Mining?: Correlations or Patterns Among Fields in Large Relational Databases
No ratings yet
Data Mining: What Is Data Mining?: Correlations or Patterns Among Fields in Large Relational Databases
6 pages
Data Mining AND Warehousing: Abstract
No ratings yet
Data Mining AND Warehousing: Abstract
12 pages
Unit 11-Entrepreneurial Supply Chain
No ratings yet
Unit 11-Entrepreneurial Supply Chain
16 pages
Unit 4-Coordination in A Supply Chain
No ratings yet
Unit 4-Coordination in A Supply Chain
26 pages
Unit 7 - Advanced Application
No ratings yet
Unit 7 - Advanced Application
5 pages
Unit 4 - Association Analysis
100% (1)
Unit 4 - Association Analysis
12 pages
Marketing Project On Magnum and Cornetto
50% (2)
Marketing Project On Magnum and Cornetto
37 pages
Group 4 Amazon 45k01.1
No ratings yet
Group 4 Amazon 45k01.1
73 pages
Consys International - Company Profile
No ratings yet
Consys International - Company Profile
8 pages
Unit 1 E-Commerce: Structure
100% (1)
Unit 1 E-Commerce: Structure
60 pages
Consumer Psychology
100% (2)
Consumer Psychology
18 pages
Case Study: Tea Tablets: Submitted To Dr. Harishchandra Singh Rathod Subject: Integrated Marketing Communication
100% (1)
Case Study: Tea Tablets: Submitted To Dr. Harishchandra Singh Rathod Subject: Integrated Marketing Communication
8 pages
Unit 14
No ratings yet
Unit 14
7 pages
Cash Credit System of HDFC Bank
No ratings yet
Cash Credit System of HDFC Bank
38 pages
5 Force Model
No ratings yet
5 Force Model
2 pages
Project Title
No ratings yet
Project Title
2 pages
Ecommerce Report
No ratings yet
Ecommerce Report
18 pages
Retailing Management Practices of Giordano International
No ratings yet
Retailing Management Practices of Giordano International
12 pages
Sensory Marketing Influences Comestic Industry
No ratings yet
Sensory Marketing Influences Comestic Industry
351 pages
BD Food-2
100% (1)
BD Food-2
56 pages
AVM Farms Business Plan
No ratings yet
AVM Farms Business Plan
3 pages
Brand Packaging Jun2013
No ratings yet
Brand Packaging Jun2013
40 pages
Final
No ratings yet
Final
45 pages
Theme Park Case Study PDF
No ratings yet
Theme Park Case Study PDF
8 pages
Impact of Social Media Marketing On Online Impulse Buying Behavior Bangladesh 2020
No ratings yet
Impact of Social Media Marketing On Online Impulse Buying Behavior Bangladesh 2020
10 pages
Literature Review On Online Shopping in India
100% (1)
Literature Review On Online Shopping in India
4 pages
Rudra Sen Report
No ratings yet
Rudra Sen Report
50 pages
Customer Service: Mcgraw-Hill/Irwin Retailing Management, 6/E
No ratings yet
Customer Service: Mcgraw-Hill/Irwin Retailing Management, 6/E
26 pages
Beepxtra Presentation Sales Version
No ratings yet
Beepxtra Presentation Sales Version
24 pages
Walmart
No ratings yet
Walmart
16 pages
Ctoe Case Study: New Balance Athletic Shoe Inc
100% (1)
Ctoe Case Study: New Balance Athletic Shoe Inc
12 pages
Cambodia Real Estate Highlights h2 2024 11928
No ratings yet
Cambodia Real Estate Highlights h2 2024 11928
19 pages

Unit 6 - Information Privacy and Data Mining

Uploaded by

Unit 6 - Information Privacy and Data Mining

Uploaded by

Data Mining and Data Warehousing Unit 6: Information Privacy and Data Mining

Information Privacy and Data Mining

Information Protection Principles (IPPs)

Applications of Data Mining

Figure 1 Common data mining application domains.

i. Data Mining for Financial Data Analysis

 Loan payment prediction and customer credit policy analysis

ii. Data Mining for Retail and Telecommunication Industries

 Multidimensional analysis of sales, customers, products, time, and region

iii. Data Mining in Science and Engineering

Following are examples of data mining in science and engineering:

 Mining complex data types

 Graph-based and network-based mining

iv. Data Mining for Intrusion Detection and Prevention

v. Data Mining and Recommender Systems

Primary Aims of Data Mining

Following are primary data mining tasks:

Disadvantages of data mining

ii. Security issues

iii. Misuse of information/inaccurate information

iv. Other Disadvantages:

Pitfalls of Data Mining

You might also like