DATA MINING MYTHS AND MISTAKES

-1 Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall


Data Mining Myths
• Data mining …
  • provides instant solutions/predictions
  • is not yet viable for business applications
  • requires a separate, dedicated database
  • can only be done by those with advanced degrees
  • is only for large firms that have lots of customer data
  • is another name for good old statistics



Data mining misconceptions

• A critical point to note is that data mining is a business process: a way of finding patterns in your data that provide insight you can use to conduct your business more effectively.
• Data mining also makes predictions to guide customer interactions and other business decisions.



Myth #1: Data mining is all about algorithms
• People often form the impression that data mining is all about advanced data analysis algorithms.
• This misconception might be summarized as follows: "All you need for data mining is good algorithms. The better your algorithms, the better your data mining."



Myth #1: Data mining is all about algorithms. The reality:
• Data mining is a process consisting of many elements, such as formulating business goals, mapping business goals to data mining goals, acquiring, understanding, and preprocessing the data, evaluating and presenting the results of analysis, and deploying these results to achieve business benefits.
• This is not to minimize the importance of new or improved data mining algorithms. The problem occurs when data miners focus too much on the algorithms and ignore the other 90–95 percent of the data mining process.
Myth #2: Data mining is all about predictive accuracy

• While data mining is not all about data analysis algorithms, part of it is. This raises the question: "How can you judge the quality of an algorithm?"
• You might think that the main criterion would be the predictive accuracy of the models it generates.
• This view, however, misrepresents the role of algorithms in the data mining process.



Myth #2: Data mining is all about predictive accuracy. The reality:
• It is true that a predictive model should have some degree of accuracy, because this demonstrates that it has truly discovered patterns in the data.
• However, the usefulness of an algorithm or model is also determined by a number of other properties, one of which is whether the resulting model can be understood by a typical analyst or requires deep technical knowledge.
• Data miners who believe that predictive accuracy is the primary criterion for evaluating algorithms might choose algorithms that only technology experts can use.
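To make the point about understandability concrete, here is a minimal Python sketch of a one-rule (decision-stump style) classifier whose entire model is a single readable IF-THEN rule. The churn records, the field name months_as_customer, and the function learn_threshold_rule are hypothetical, invented for illustration; this is not a method taken from the slides.

```python
# Hypothetical toy data: (months_as_customer, churned) pairs, invented for illustration.
records = [
    (2, True), (3, True), (5, True), (7, False), (8, True),
    (14, False), (20, False), (25, False), (30, True), (36, False),
]

def learn_threshold_rule(rows):
    """Find the single cut-off t such that predicting churn for tenure < t
    makes the fewest errors on the given rows (a one-rule, stump-style model)."""
    best_t, best_correct = None, -1
    # Candidate cut-offs: every distinct tenure value, plus one above the maximum.
    candidates = sorted({tenure for tenure, _ in rows}) + [max(t for t, _ in rows) + 1]
    for t in candidates:
        correct = sum((tenure < t) == churned for tenure, churned in rows)
        if correct > best_correct:  # keep the first cut-off that achieves the best score
            best_t, best_correct = t, correct
    return best_t, best_correct / len(rows)

threshold, accuracy = learn_threshold_rule(records)
print(f"IF months_as_customer < {threshold} THEN predict churn")
print(f"Training accuracy: {accuracy:.0%}")
```

On this toy data the sketch prints the rule "IF months_as_customer < 7 THEN predict churn" with 80% training accuracy. A more complex model might score higher, but an analyst could not read and sanity-check it as directly as this one rule.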
Myth #3: Data mining requires a data warehouse
• Business people often think that a data warehouse is a prerequisite for data mining.
• This is a subtle misconception about the relationship between the two technologies.



Myth #3: Data mining requires a data warehouse. The reality:
• It is true that data mining can benefit from warehoused data that is well organized, relatively clean, and easy to access.
• This is particularly true if the warehouse has been constructed with data mining specifically in mind.
• However, warehoused data may be less useful for data mining than the source or operational data.
• Warehoused data may even be completely useless for data mining if only summary data are stored.
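Why summary-only data can be useless is easy to sketch in Python, assuming a warehouse table that keeps only units sold per product. The two transaction logs below are hypothetical: they produce identical summaries, yet only one contains customers who buy bread and butter together, so no analysis of the summary alone can tell them apart. The functions summary and bought_together are illustrative names, not part of any particular tool.

```python
from collections import Counter

# Two hypothetical transaction logs of (customer, product) pairs, invented for illustration.
log_a = [("c1", "bread"), ("c1", "butter"), ("c2", "bread"), ("c2", "butter"),
         ("c3", "milk"), ("c4", "milk")]
log_b = [("c1", "bread"), ("c1", "milk"), ("c2", "bread"), ("c2", "milk"),
         ("c3", "butter"), ("c4", "butter")]

def summary(log):
    """What a summary-only warehouse table might keep: units sold per product."""
    return Counter(product for _, product in log)

def bought_together(log, p, q):
    """Record-level question: how many customers bought both p and q?"""
    baskets = {}
    for customer, product in log:
        baskets.setdefault(customer, set()).add(product)
    return sum(1 for items in baskets.values() if {p, q} <= items)

print(summary(log_a) == summary(log_b))            # identical summaries
print(bought_together(log_a, "bread", "butter"))   # pattern present in log A
print(bought_together(log_b, "bread", "butter"))   # ...and absent in log B
```

This prints True, then 2, then 0: the bread-and-butter pattern exists only at the record level.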



Myth #4: Data mining is all about vast quantities of data
• Early explanations of data mining often began with statements like, "We now collect more data than ever, yet how are we to benefit from these vast data stores?"
• Focusing on the size of data stores provided a convenient introduction to the topic of data mining, but subtly misrepresented its nature.



Myth #4: Data mining is all about vast quantities of data. The reality:
• While organizations can benefit from mining many large datasets, it would be a mistake to believe that these should be the sole focus of data mining.
• Many useful data mining projects are performed on small or medium-sized datasets, some containing only a few hundred or a few thousand records.



Myth #5: Data mining should be done by a technology expert
• Data mining uses advanced technology, and its workings, particularly those of modeling techniques, are unlikely to be understood by the wider IT community.
• Does this mean that data mining should be conducted only by those who understand every nuance of the technology that is involved?



Myth #5: Data mining should be done by a technology expert. The reality:
• Quite the opposite is true, because business knowledge is of paramount importance in data mining.
• When performed without business knowledge, data mining can produce nonsensical or useless results.
• It is essential that data mining be performed by someone with extensive knowledge of the business problem.
Common Data Mining Mistakes
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it really can and cannot do
3. Leaving insufficient time for data acquisition, selection, and preparation
4. Looking only at aggregated results and not at individual records/predictions
5. Being sloppy about keeping track of the data mining procedure and results
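Mistake 4 can be sketched in Python with hypothetical scored records from a model that always predicts "no churn". The aggregate accuracy looks strong, but listing the individual predictions shows that the model never flags anyone. The customer IDs and labels are invented for illustration.

```python
# Hypothetical scored records: (customer_id, actual_churn, predicted_churn).
# 19 loyal customers and 1 churner, all scored "no churn" by a model that never predicts churn.
predictions = [(f"cust{i:02d}", False, False) for i in range(19)]
predictions.append(("cust19", True, False))

accuracy = sum(actual == predicted for _, actual, predicted in predictions) / len(predictions)
print(f"Aggregate accuracy: {accuracy:.0%}")   # looks impressive on its own

# Looking at individual records tells a different story.
flagged = [cid for cid, _, predicted in predictions if predicted]
missed = [cid for cid, actual, predicted in predictions if actual and not predicted]
print("Customers flagged as churners:", flagged)
print("Churners the model missed:", missed)
```

The sketch reports 95% accuracy, yet the list of flagged customers is empty and the only churner, cust19, is missed.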



Common Data Mining Mistakes
6. Ignoring suspicious (good or bad) findings and quickly moving on
7. Running mining algorithms repeatedly and blindly, without thinking about the next stage
8. Naively believing everything you are told about the data
9. Naively believing everything you are told about your own data mining analysis
10. Measuring your results differently from the way your sponsor measures them
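Mistake 10 can be sketched in Python under a hypothetical setup: the analyst reports accuracy at a 0.5 cut-off, while the sponsor judges a model by how many actual responders land among the three customers it would mail. The labels, the scores, and the functions accuracy_at and responders_in_top_k are all invented for illustration.

```python
# Hypothetical example: 10 customers, 1 = responded to a past campaign.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Scores from two hypothetical models (higher = more likely to respond).
scores = {
    "model_A": [0.30, 0.10, 0.20, 0.40, 0.35, 0.05, 0.15, 0.25, 0.12, 0.08],
    "model_B": [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.20, 0.10],
}

def accuracy_at(score_list, labels, threshold=0.5):
    """Analyst's measure: share of customers classified correctly at a cut-off."""
    return sum((s >= threshold) == bool(y) for s, y in zip(score_list, labels)) / len(labels)

def responders_in_top_k(score_list, labels, k=3):
    """Assumed sponsor's measure: responders reached when mailing only the k highest-scored customers."""
    ranked = sorted(zip(score_list, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k])

for name, s in scores.items():
    print(name, "accuracy:", accuracy_at(s, actual),
          "responders in top 3:", responders_in_top_k(s, actual))
```

Model A wins on accuracy (0.7 vs. 0.5), while model B reaches all three responders in its top three versus one for model A. The two measures rank the models in opposite order, which is why results should be reported in the sponsor's terms.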