
Understanding Organizations and Its Data

The document discusses different topics related to data mining including describing data mining, differentiating between a database, data warehouse and dataset, limitations and addressing limitations of data mining, differences between operational and organizational data, ethical issues in data mining and how to address them, out-of-synch data and how to remedy it, and normalization in OLTP and OLAP systems.

Uploaded by

Allence Dy

1. Describe what data mining is in general terms.

Data mining is the process of finding anomalies, patterns and correlations within
large data sets to predict outcomes.
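
As a rough illustration of "finding anomalies and correlations", the sketch below flags outliers with a simple z-score rule and computes a Pearson correlation. The function names, the threshold of 2.0, and the sample figures are all assumptions made up for this example; real data mining tools use far more robust methods.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical sample data: one day with unusually high sales and ad spend.
daily_sales = [102, 98, 101, 97, 103, 99, 100, 250]
ad_spend = [10, 9, 10, 9, 11, 10, 10, 30]

anomalies = find_anomalies(daily_sales)
correlation = pearson(daily_sales, ad_spend)
```

Here `anomalies` contains only the unusual day, and `correlation` is close to 1, which hints that sales and ad spend move together in this made-up sample.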

2. Differentiate between a database, a data warehouse, and a data set.

➔ Database - a database is an organized collection of structured information,
or data, typically stored electronically in a computer system. The data can
then be easily accessed, managed, modified, updated, controlled, and
organized.
➔ Data Warehouse - a data warehouse is a type of data management
system that is designed to enable and support business intelligence (BI)
activities, especially analytics. Data warehouses are solely intended to
perform queries and analysis and often contain large amounts of historical
data.
➔ Data Set - a data set is a collection of related, discrete items of
data that may be accessed individually or in combination or managed as a
whole entity. A data set is organized into some type of data structure.

3. Describe the limitations of data mining. How can we address those limitations?

Along with the benefits that data mining offers, it also has limitations or
disadvantages. Some of the limitations experienced with data mining
include: (1) violation of user privacy, (2) additional irrelevant information,
(3) misuse of information, and (4) inaccurate data.
The best way to address the limitations of data mining is to anticipate these
disadvantages and adapt your process around them while still taking
advantage of all of the benefits.

4. Describe the difference between operational and organizational data. What
are the pros and cons of each?

Operational data is the data that is produced by an organization's day-to-day
operations. Data on customers, inventory, and purchases all fall
under the operational data category. Organizational data, by contrast, is data
related to the fundamental characteristics of an organization, such as
segmentation of sales and marketing prospects.

Operational Data

Pros                                   Cons
● Versatile systems                    ● Learning curve & required training
● Security                             ● Timely installation
● Economical

Organizational Data

Pros                                   Cons
● Ability to detect anomalies          ● Requires significant investment of
                                         money
● Improvement of operations in         ● Requires data scientist to process
  customer service                       the information
● Obtain better understanding of       ● Threat of privacy
  customer behavior

5. Describe the ethical issues we face in data mining? How can they be addressed?

One of the more common ethical issues with data mining is that, if an individual is
not aware that the information is being collected or how it will be used, that
individual has no opportunity to give consent for its collection and use. This can
be addressed by sharing a disclaimer beforehand.

6. Explain what is meant by out-of-synch data. How can this situation be remedied?

Out-of-synch data refers to any supply of data that is inconsistent with, and
breaks the data integrity of, an existing database. The best way to remedy this
is to prioritize data integrity: sort through and sync the data within the
existing database, and update it to maintain consistency across systems.
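
A minimal sketch of that remedy, assuming two copies of the same records held as Python dictionaries keyed by record ID. The names `find_out_of_sync` and `resync`, and the rule that the source copy is authoritative, are assumptions for illustration only.

```python
def find_out_of_sync(source, replica):
    """Compare two {key: record} mappings and report how they differ."""
    missing = sorted(k for k in source if k not in replica)
    stale = sorted(
        k for k in source if k in replica and source[k] != replica[k]
    )
    orphaned = sorted(k for k in replica if k not in source)
    return {"missing": missing, "stale": stale, "orphaned": orphaned}


def resync(source, replica):
    """Make the replica match the source (the source is assumed authoritative)."""
    replica.clear()
    for key, record in source.items():
        replica[key] = dict(record)  # copy so later edits do not leak across


# Hypothetical customer records held in two systems.
crm = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}}
billing = {2: {"email": "old@example.com"}, 3: {"email": "c@example.com"}}

report_before = find_out_of_sync(crm, billing)
resync(crm, billing)
report_after = find_out_of_sync(crm, billing)
```

Before the resync, the report lists record 1 as missing, record 2 as stale, and record 3 as orphaned; afterwards all three lists are empty.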

7. What is normalization? What are some reasons why it is a good thing in OLTP
systems, but not so good in OLAP systems?

Data normalization is a basic element of preparing data. It means
transforming the data, namely restructuring the source data into a
format that allows for processing it effectively. In a relational database,
this means splitting data into related tables so that each fact is stored only
once. The main purpose of normalization is to minimize or even eliminate
duplicated data.
For a transactional database in OLTP, the data is typically stored in a
normalized manner, which keeps inserts and updates small and consistent.
A highly normalized database is a poor fit for OLAP, however, because
analytical queries must join many tables, and those joins can quickly lead to
issues with performance.
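
The trade-off can be sketched with Python's built-in `sqlite3` module. The table names, columns, and sample rows below are hypothetical; the point is only that a normalized schema makes a transactional update touch a single row, while an analytical total needs a join unless the data is first flattened into a denormalized table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized (OLTP-style) schema: each customer's city is stored once.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ana', 'Cebu'), (2, 'Ben', 'Manila');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 40.0);
""")

# Transactional write: changing a customer's city updates exactly one row.
conn.execute("UPDATE customers SET city = 'Davao' WHERE id = 1")

# Analytical read on the normalized schema: requires a join.
totals = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# Denormalized (OLAP-style) copy: the join is paid once, up front.
conn.execute("""
    CREATE TABLE sales_flat AS
    SELECT o.id AS order_id, c.name, c.city, o.amount
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
""")
flat_totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales_flat GROUP BY city ORDER BY city"
).fetchall()
```

Both queries return the same totals per city. The flat table repeats each customer's name and city on every order row, which is the duplication that normalization avoids and that analytical systems accept in exchange for simpler, faster reads.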
