Course Textbook: "Getting Started with Data Science." Publisher: IBM Press; 1st edition (Dec. 13, 2015). Print.
Author: Murtaza Haider
Prescribed Reading: Chapter 12 Pg. 529-531
Establishing Data Mining Goals
The first step in data mining requires you to set up goals for the exercise. Obviously, you must identify the
key questions that need to be answered. Beyond identifying the key questions, however, you must also weigh
the costs and benefits of the exercise. Furthermore, you must determine, in advance, the expected level of
accuracy and usefulness of the results obtained from data mining. If money were no object, you could commit
as much funding as necessary to get the answers required. However, the cost-benefit trade-off is always
instrumental in determining the goals and scope of the data mining exercise. The level of accuracy expected
from the results also influences the costs: higher accuracy costs more, and vice versa. Furthermore, beyond a
certain level of accuracy, you gain little from the exercise, given diminishing returns. Thus, the cost-benefit
trade-offs for the desired level of accuracy are important considerations when setting data mining goals.
Selecting Data
The output of a data-mining exercise largely depends upon the quality of data being used. At times, data are
readily available for further processing. For instance, retailers often possess large databases of customer
purchases and demographics. On the other hand, data may not be readily available for data mining. In such
cases, you must identify other sources of data or even plan new data collection initiatives, including surveys.
The type of data, its size, and the frequency of collection have a direct bearing on the cost of the data mining
exercise. Therefore, it is critical to identify the right kind of data for data mining, data that can answer the
questions at a reasonable cost.
Preprocessing Data
Preprocessing data is an important step in data mining. Raw data are often messy, containing erroneous or
irrelevant values. In addition, even with relevant data, information is sometimes missing. In the preprocessing
stage, you identify the irrelevant attributes of the data and expunge them from further consideration. At the
same time, it is necessary to identify the erroneous aspects of the data set and flag them as such. For
instance, human error might lead to inadvertent merging or incorrect parsing of information between
columns. Data should be subjected to checks to ensure integrity. Lastly, you must develop a formal method of
dealing with missing data and determine whether the data are missing randomly or systematically.
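The flagging step described above can be sketched in a few lines of Python. The column names and the range checks below are illustrative assumptions, not from the text:

```python
# Hypothetical integrity checks on a small customer table,
# assuming columns: customer_id, age, income (names are assumed).
records = [
    {"customer_id": 1, "age": 34, "income": 52000},
    {"customer_id": 2, "age": -5, "income": 48000},   # erroneous age
    {"customer_id": 3, "age": 41, "income": None},    # missing income
]

def flag_integrity_issues(rows):
    """Split rows into clean and flagged lists using simple range checks."""
    clean, flagged = [], []
    for row in rows:
        problems = []
        if row["age"] is None or not (0 <= row["age"] <= 120):
            problems.append("age out of range")
        if row["income"] is None:
            problems.append("income missing")
        (flagged if problems else clean).append({**row, "problems": problems})
    return clean, flagged

clean, flagged = flag_integrity_issues(records)
```

Flagging rather than silently dropping keeps the erroneous records available for later inspection, which matters when deciding whether errors are random or systematic.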
If the data were missing randomly, a simple set of solutions would suffice. However, when data are missing
in a systematic way, you must determine the impact of missing data on the results. For instance, a particular
subset of individuals in a large data set may have refused to disclose their income. Findings relying on an
individual's income as input would exclude details of those individuals whose income was not reported. This
would lead to systematic biases in the analysis. Therefore, you must consider in advance whether observations
or variables containing missing data should be excluded from the entire analysis or from parts of it.
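A minimal sketch of the two simplest treatments, listwise deletion and mean imputation, both of which are reasonable only when data are missing at random (the systematic case discussed above requires more care); the numbers are illustrative:

```python
# Illustrative income values; None marks a missing observation.
incomes = [52000, None, 48000, None, 61000]

# Listwise deletion: exclude records with missing income entirely.
observed = [x for x in incomes if x is not None]

# Mean imputation: fill gaps with the mean of observed values
# (defensible only if the values are missing at random).
mean_income = sum(observed) / len(observed)
imputed = [x if x is not None else mean_income for x in incomes]
```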
Transforming Data
After the relevant attributes of data have been retained, the next step is to determine the appropriate format in
which data must be stored. An important consideration in data mining is to reduce the number of attributes
needed to explain the phenomena. This may require transforming the data. Data reduction algorithms, such as
Principal Component Analysis (demonstrated and explained later in the chapter), can reduce the number of
attributes without a significant loss in information. In addition, variables may need to be transformed to help
explain the phenomenon being studied. For instance, an individual's income may be recorded in the data set
as wage income, income from other sources such as rental properties, support payments from the
government, and the like. Aggregating income from all sources yields a representative indicator of the
individual's total income.
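The attribute-reduction idea can be sketched with a small PCA computed via the singular value decomposition. This is a minimal illustration with made-up data, not the chapter's own demonstration, which appears later:

```python
import numpy as np

# Three attributes, two of which are strongly correlated, so most of the
# variance is captured by fewer components than there are attributes.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([
    base,                                     # attribute 1
    2 * base + 0.01 * rng.normal(size=(100, 1)),  # nearly redundant attribute
    rng.normal(size=(100, 1)),                # independent attribute
])

Xc = X - X.mean(axis=0)            # center each attribute
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
scores = Xc @ Vt[:k].T             # project onto the top-k components
explained = (S**2) / (S**2).sum()  # variance share per component
```

Here the 100-by-3 data set is replaced by 100-by-2 component scores with little loss of information, which is exactly the reduction in attributes the text describes.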
Often you need to transform variables from one type to another. It may be prudent to transform the
continuous variable for income into a categorical variable, where each record in the database is identified as
a low-, medium-, or high-income individual. This could help capture the non-linearities in the underlying
behaviors.
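The continuous-to-categorical transformation above can be sketched as a simple binning function; the cut-points are illustrative assumptions, not values from the text:

```python
def income_band(income, low_cutoff=30000, high_cutoff=80000):
    """Map a continuous income to a low/medium/high category.

    The cut-points are hypothetical; in practice they might come from
    quantiles of the observed income distribution.
    """
    if income < low_cutoff:
        return "low"
    if income < high_cutoff:
        return "medium"
    return "high"

bands = [income_band(x) for x in (25000, 52000, 95000)]
```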
Storing Data
The transformed data must be stored in a format conducive to data mining, one that gives the data scientist
unrestricted and immediate read/write privileges. During data mining, new variables are created and written
back to the original database, which is why the data storage scheme should support efficient reading from
and writing to the database. It is also important to store data on servers or storage media that keep the data
secure and prevent the data mining algorithm from unnecessarily searching for pieces of data scattered across
different servers or storage media. Data safety and privacy should be a prime concern when storing data.
Mining Data
After the data are appropriately processed, transformed, and stored, they are subjected to data mining. This
step covers data analysis methods, including parametric and non-parametric methods, and machine-learning
algorithms.
A good starting point for data mining is data visualization. Multidimensional views of the data using the
advanced graphing capabilities of data mining software are very helpful in developing a preliminary
understanding of the trends hidden in the data set.
Later sections in this chapter detail data mining algorithms and methods.
Evaluating Mining Results
After results have been extracted from data mining, you do a formal evaluation of the results. Formal
evaluation could include testing the predictive capabilities of the models on observed data to see how
effective and efficient the algorithms have been in reproducing the data. This is known as an "in-sample
forecast." In addition, the results are shared with the key stakeholders for feedback, which is then
incorporated in later iterations of data mining to improve the process.
Data mining and evaluating the results thus becomes an iterative process: in light of the feedback received
from the key stakeholders, the analysts apply better and improved algorithms to raise the quality of the
results generated.
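The in-sample forecast described above can be sketched as fitting a simple model and scoring how well it reproduces the same observed data it was fit on. The data and the choice of a linear model are illustrative assumptions:

```python
import numpy as np

# Illustrative observed data: roughly y = 2x + 1 with small noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0 + np.array([0.1, -0.2, 0.05, 0.0, -0.1])

# Fit a line by least squares, then predict on the SAME observations.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept            # in-sample predictions

# R-squared measures how well the model reproduces the observed data.
ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1.0 - ss_res / ss_tot
```

An R-squared near 1 means the model reproduces the data it was fit on almost exactly; note that a strong in-sample fit alone says nothing about performance on new data, which is why feedback and further iterations follow.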