
ISSUES & TECHNIQUES

IN
DATA MINING

A
Minor Project

SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENT


FOR THE AWARD OF DEGREE OF

MASTER OF TECHNOLOGY
IN
COMPUTER ENGINEERING
Under the Guidance of:
S. Gianetan Singh Sekhon
Lecturer, CE
YCoE, Talwandi Sabo

Submitted by:
Hans Raj
Roll No. 07-MCP-004

DEPARTMENT OF COMPUTER ENGINEERING,


YADAVINDRA COLLEGE OF ENGINEERING
PUNJABI UNIVERSITY GURU KASHI CAMPUS,
TALWANDI SABO

Table of Contents

Chapter 1 Introduction
Chapter 2 Literature Survey
Chapter 3 Tools/Technologies to be used
Chapter 4 Various Data Mining Techniques
Chapter 5 Applications of Data Mining
Future Scope and Work
List of References

List of figures

Figure 1.1 Histogram shows the number of customers with various eye colors

Figure 1.2 Histogram shows the number of customers of different ages and quickly tells
the viewer that the majority of customers are over the age of 50.

Figure 1.3 Linear regression is similar to the task of finding the line that minimizes the
total distance to a set of data.

Figure 2.1 A decision tree is a predictive model that makes a prediction on the basis of a
series of decisions, much like the game of 20 questions.

Figure 2.2 Neural networks can be used for data compression and feature extraction

List of Tables

Table 1.1. An Example Database of Customers with Different Predictor Types

Table 1.2 Some Commercially Available Cluster Tags

Table 1.3 A Simple Example Database

Table 2.1 Decision tree algorithm segment. This segment cannot be split further except
by using the predictor "name".

ACKNOWLEDGEMENT

A journey is easier when you travel together. Interdependence is certainly more valuable
than independence. It is a pleasure that I now have the opportunity to express my
gratitude to all those who helped me along the way.

I would like to express my deep and sincere gratitude to my supervisor, Er. Gianetan
Singh Sekhon, Lecturer, Computer Science & Engineering Department, YCoE, Talwandi
Sabo. His wide knowledge and logical way of thinking have been of great value to me.
His understanding, encouragement and personal guidance, even beyond duty hours, have
provided a good basis for the present project work. I would have been lost without him.

I am also thankful to Punjabi University, Patiala for providing a platform to undertake a
postgraduate programme for ‘in-service candidates’ under highly experienced and enlightened
faculty in a splendid way. The faculty and whole staff of the Department of Computer Science
& Engineering at Yadavindra College of Engineering, Talwandi Sabo also deserve a
special mention for their wholehearted support.

(Hans Raj)

Chapter 1

Introduction

Data mining is primarily used today by companies with a strong consumer focus - retail,
financial, communication, and marketing organizations. It enables these companies to determine
relationships among "internal" factors such as price, product positioning, or staff skills, and
"external" factors such as economic indicators, competition, and customer demographics. It also
enables them to determine the impact on sales, customer satisfaction, and corporate profits, and
to "drill down" into summary information to view detailed transactional data. With data mining,
a retailer could use point-of-sale records of customer purchases to send targeted promotions
based on an individual's purchase history. By mining demographic data from comment or warranty
cards, the retailer could develop products and promotions that appeal to specific customer
segments.

Data mining is the process of extracting patterns from data. As more data are gathered, with the
amount of data doubling every three years [1], data mining is becoming an increasingly important
tool to transform these data into information. It is commonly used in a wide range of profiling
practices, such as marketing, surveillance, fraud detection and scientific discovery.

While data mining can be used to uncover patterns in data samples, it is important to be aware
that the use of non-representative samples of data may produce results that are not indicative
of the domain. Similarly, data mining will not find patterns that may be present in the domain
if those patterns are not present in the sample being "mined". There is a tendency for
insufficiently knowledgeable "consumers" of the results to attribute "magical abilities" to data
mining, treating the technique as a sort of all-seeing crystal ball. Like any other tool, it
only functions in conjunction with the appropriate raw material: in this case, indicative and
representative data that the user must first collect. Further, the discovery of a particular
pattern in a particular set of data does not necessarily mean that the pattern is representative
of the whole population from which that data was drawn. Hence, an important part of the process
is the verification and validation of patterns on other samples of data.

Chapter 2

Literature Survey

Data mining, the extraction of hidden predictive information from large databases, is a
powerful new technology with great potential to help companies focus on the most
important information in their data warehouses. Data mining tools predict future trends and
behaviors, allowing businesses to make proactive, knowledge-driven decisions. The
automated, prospective analyses offered by data mining move beyond the analyses of past
events provided by retrospective tools typical of decision support systems. Data mining
tools can answer business questions that traditionally were too time consuming to resolve.
They scour databases for hidden patterns, finding predictive information that experts may
miss because it lies outside their expectations.

Most companies already collect and refine
massive quantities of data. Data mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of existing information resources,
and can be integrated with new products and systems as they are brought on-line. When
implemented on high performance client/server or parallel processing computers, data
mining tools can analyze massive databases to deliver answers to questions such as,
"Which clients are most likely to respond to my next promotional mailing, and why?"

Chapter 3
Tools/Technologies to be used

1. Windows XP
2. .NET Framework 2.0
3. Crystal Reports
4. MySQL Server
5. VB.NET

CHAPTER 4
VARIOUS DATA MINING TECHNIQUES
1.2 DATA MINING TECHNIQUES
This chapter describes some of the most common data mining algorithms in use today, grouped
into two broad families:

1.2.1 Classical Techniques: Statistics, Neighborhoods and Clustering

1.2.2 Next Generation Techniques: Trees, Networks and Rules

Each section will describe a number of data mining algorithms at a high level, focusing on
the "big picture" so that the reader will be able to understand how each algorithm fits into
the landscape of data mining techniques. Overall, six broad classes of data mining
algorithms are covered. Although there are a number of other algorithms and many
variations of the techniques described, one of the algorithms from this group of six is
almost always used in real world deployments of data mining systems.

1.2.1 Classical Techniques: Statistics, Neighborhoods and Clustering

1.2.1.1 The Classics


These two sections have been broken up based on when the data mining technique was
developed and when it became technically mature enough to be used for business,
especially for aiding in the optimization of customer relationship management systems.
Thus this section contains descriptions of techniques that have classically been used for
decades, while the next section covers techniques that have only been widely used since the
early 1980s. This section should help the user understand the rough differences between the
techniques, and give at least enough information to be dangerous - and well armed enough
not to be baffled by the vendors of different data mining tools.
The main techniques that we will discuss here are the ones that are used 99.9% of the time
on existing business problems. There are certainly many other ones as well as proprietary
techniques from particular vendors - but in general the industry is converging to those
techniques that work consistently and are understandable and explainable.

1.2.1.2 Statistics
By strict definition "statistics" or statistical techniques are not data mining. They were
being used long before the term data mining was coined to apply to business applications.
However, statistical techniques are driven by the data and are used to discover patterns and
build predictive models. And from the user's perspective you will be faced with a
conscious choice when solving a "data mining" problem as to whether you wish to attack it
with statistical methods or other data mining techniques. For this reason it is important to
have some idea of how statistical techniques work and how they can be applied.
What is the difference between statistics and data mining?
I flew the Boston to Newark shuttle recently and sat next to a professor from one of the
Boston area universities. He was going to present the genetic makeup of drosophila (the
fruit fly) to a pharmaceutical company in New Jersey. He had compiled the world's largest
database on the genetic makeup of the fruit fly and had made it available to other
researchers on the internet through Java applications accessing a larger relational database.
He explained to me that they not only were now storing the information on the flies but
were also doing "data mining", adding as an aside "which seems to be very important these
days, whatever that is". I mentioned that I had written a book on the subject and he was
interested in knowing what the difference was between "data mining" and statistics. There
was no easy answer. The techniques used in data mining, when successful, are successful
for precisely the same reasons that statistical techniques are successful (e.g. clean data, a
well defined target to predict and good validation to avoid overfitting). And for the most
part the techniques are used in the same places for the same types of problems (prediction,
classification, discovery). In fact some of the techniques that are classically defined as "data
mining", such as CART and CHAID, arose from statisticians.
So what is the difference? Why aren't we as excited about "statistics" as we are about data
mining? There are several reasons. The first is that the classical data mining techniques
such as CART, neural networks and nearest neighbor techniques tend to be more robust to
both messier real world data and also more robust to being used by less expert users. But
that is not the only reason. The other reason is that the time is right. Because of the use of
computers for closed loop business data storage and generation there now exist large
quantities of data that are available to users. If there were no data, there would be no
interest in mining it. Likewise the fact that computer hardware has dramatically upped the
ante by several orders of magnitude in storing and processing the data makes some of the
most powerful data mining techniques feasible today.
The bottom line though, from an academic standpoint at least, is that there is little practical
difference between a statistical technique and a classical data mining technique. Hence we
have included a description of some of the most useful techniques in this section.

What is statistics?
Statistics is a branch of mathematics concerning the collection and the description of data.
Usually statistics is considered to be one of those scary topics in college right up there with
chemistry and physics. However, statistics is probably a much friendlier branch of
mathematics because it really can be used every day. Statistics was in fact born from very
humble beginnings of real world problems from business, biology, and gambling!
Knowing statistics in your everyday life will help the average business person make better
decisions by allowing them to figure out risk and uncertainty when all the facts either
aren’t known or can’t be collected. Even with all the data stored in the largest of data
warehouses business decisions still just become more informed guesses. The more and
better the data and the better the understanding of statistics the better the decision that can
be made.
Statistics has been around for a long time, easily a century and arguably many centuries
when the ideas of probability began to gel. It could even be argued that the data collected
by the ancient Egyptians, Babylonians, and Greeks were all statistics long before the field
was officially recognized. Today data mining has been defined independently of statistics
though “mining data” for patterns and predictions is really what statistics is all about.
Some of the techniques that are classified under data mining such as CHAID and CART
really grew out of the statistical profession more than anywhere else, and the basic ideas of
probability, independence and causality and overfitting are the foundation on which both
data mining and statistics are built.

Data, counting and probability


One thing that is always true about statistics is that there is always data involved, and
usually enough data so that the average person cannot keep track of all the data in their
heads. This is certainly more true today than it was when the basic ideas of probability
and statistics were being formulated and refined early this century. Today people have to
deal with up to terabytes of data and have to make sense of it and glean the important
patterns from it. Statistics can help greatly in this process by helping to answer several
important questions about your data:

• What patterns are there in my database?


• What is the chance that an event will occur?
• Which patterns are significant?
• What is a high level summary of the data that gives me some idea of what is
contained in my database?

Certainly statistics can do more than answer these questions but for most people today
these are the questions that statistics can help answer. Consider for example that a large
part of statistics is concerned with summarizing data, and more often than not, this
summarization has to do with counting. One of the great values of statistics is in
presenting a high level view of the database that provides some useful information without
requiring every record to be understood in detail. This aspect of statistics is the part that
people run into every day when they read the daily newspaper and see, for example, a pie
chart reporting the number of US citizens of different eye colors, or the average number of
annual doctor visits for people of different ages. Statistics at this level is used in the
reporting of important information from which people may be able to make useful
decisions. There are many different parts of statistics but the idea of collecting data and
counting it is often at the base of even these more sophisticated techniques. The first step
then in understanding statistics is to understand how the data is collected into a higher
level form - one of the most notable ways of doing this is with the histogram.

Histograms
One of the best ways to summarize data is to provide a histogram of the data. In the
simple example database shown in Table 1.1 we can create a histogram of eye color by
counting the number of occurrences of different colors of eyes in our database. For this
example database of 10 records this is fairly easy to do and the results are only slightly
more interesting than the database itself. However, for a database of many more records
this is a very useful way of getting a high level understanding of the database.
ID Name Prediction Age Balance Income Eyes Gender
1 Amy No 62 $0 Medium Brown F
2 Al No 53 $1,800 Medium Green M
3 Betty No 47 $16,543 High Brown F
4 Bob Yes 32 $45 Medium Green M
5 Carla Yes 21 $2,300 High Blue F
6 Carl No 27 $5,400 High Brown M
7 Donna Yes 50 $165 Low Blue F
8 Don Yes 46 $0 High Blue M
9 Edna Yes 27 $500 Low Blue F
10 Ed No 68 $1,200 Low Blue M

Table 1.1. An Example Database of Customers with Different Predictor Types
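
As a rough illustration (Python is used here purely for sketching; it is not one of the project
tools listed in Chapter 3), the counting behind such a histogram takes only a few lines. The
values below are the Eyes column of Table 1.1.

```python
from collections import Counter

# The "Eyes" column of Table 1.1, one value per customer record.
eye_colors = ["Brown", "Green", "Brown", "Green", "Blue",
              "Brown", "Blue", "Blue", "Blue", "Blue"]

# A histogram of a categorical predictor is simply a count of each distinct value.
histogram = Counter(eye_colors)
for color, count in histogram.most_common():
    print(f"{color:>5}: {'*' * count} ({count})")
# Blue is the most frequent value, which is exactly what Figure 1.1 highlights.
```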

This histogram shown in figure 1.1 depicts a simple predictor (eye color) which will have
only a few different values no matter if there are 100 customer records in the database or
100 million. There are, however, other predictors that have many more distinct values and
can create a much more complex histogram. Consider, for instance, the histogram of ages
of the customers in the population. In this case the histogram can be more complex but
can also be enlightening. Consider if you found that the histogram of your customer data
looked as it does in figure 1.2.

Figure 1.1 This histogram shows the number of customers with various eye colors. This
summary can quickly show important information about the database such as that blue
eyes are the most frequent.

Figure 1.2 This histogram shows the number of customers of different ages and quickly
tells the viewer that the majority of customers are over the age of 50.

By looking at this second histogram the viewer is in many ways looking at all of the data
in the database for a particular predictor or data column. By looking at this histogram it is
also possible to build an intuition about other important factors, such as the average age of
the population and the maximum and minimum age, all of which are important. These
values are called summary statistics. Some of the most frequently used summary statistics
include:

• Max - the maximum value for a given predictor.


• Min - the minimum value for a given predictor.
• Mean - the average value for a given predictor.
• Median - the value for a given predictor that divides the database as nearly as
possible into two databases of equal numbers of records.
• Mode - the most common value for the predictor.
• Variance - the measure of how spread out the values are from the average value.
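
To make these definitions concrete, the sketch below (illustrative only, using Python's
standard library) computes each summary statistic for the Age column of Table 1.1.

```python
import statistics

ages = [62, 53, 47, 32, 21, 27, 50, 46, 27, 68]   # the "Age" column of Table 1.1

print("Max     :", max(ages))                     # 68
print("Min     :", min(ages))                     # 21
print("Mean    :", statistics.mean(ages))         # 43.3
print("Median  :", statistics.median(ages))       # 46.5 - splits the 10 records 5/5
print("Mode    :", statistics.mode(ages))         # 27 - the only repeated value
# Sample variance: squared distances from the mean, divided by one less than
# the number of records, as described later in this section.
print("Variance:", statistics.variance(ages))
```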

When there are many values for a given predictor the histogram begins to look smoother
and smoother (compare the difference between the two histograms above). Sometimes the
shape of the distribution of data can be calculated by an equation rather than just
represented by the histogram. This is what is called a data distribution. Like a histogram a
data distribution can be described by a variety of statistics. In classical statistics the belief
is that there is some “true” underlying shape to the data distribution that would be formed
if all possible data was collected. The shape of the data distribution can be calculated for
some simple examples. The statistician’s job then is to take the limited data that may have
been collected and from that make their best guess at what the “true” or at least most likely
underlying data distribution might be.
Many data distributions are well described by just two numbers, the mean and the
variance. The mean is something most people are familiar with, the variance, however,
can be problematic. The easiest way to think about it is that it measures the average
distance of each predictor value from the mean value over all the records in the database.
If the variance is high it implies that the values are all over the place and very different. If
the variance is low most of the data values are fairly close to the mean. To be precise the
actual definition of the variance uses the square of the distance rather than the actual
distance from the mean and the average is taken by dividing the squared sum by one less
than the total number of records. In terms of prediction a user could make some guess at
the value of a predictor without knowing anything else just by knowing the mean and also
gain some basic sense of how variable the guess might be based on the variance.
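
Written as a formula (added here only for clarity), the sample variance of records
x_1, ..., x_n with mean \bar{x} is

    s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

and its square root is the familiar standard deviation, which is on the same scale as the
original predictor.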

Statistics for Prediction


In this report the term “prediction” is used for a variety of types of analysis that may
elsewhere be more precisely called regression. We have done so in order to simplify some
of the concepts and to emphasize the common and most important aspects of predictive
modeling. Nonetheless regression is a powerful and commonly used tool in statistics and it
will be discussed here.

Linear regression
In statistics prediction is usually synonymous with regression of some form. There are a
variety of different types of regression in statistics but the basic idea is that a model is
created that maps values from predictors in such a way that the lowest error occurs in
making a prediction. The simplest form of regression is simple linear regression that just
contains one predictor and a prediction. The relationship between the two can be mapped
on a two dimensional space and the records plotted for the prediction values along the Y
axis and the predictor values along the X axis. The simple linear regression model then
could be viewed as the line that minimized the error rate between the actual prediction
value and the point on the line (the prediction from the model). Graphically this would
look as it does in Figure 1.3. The simplest form of regression seeks to build a predictive
model that is a line that maps between each predictor value to a prediction value. Of the
many possible lines that could be drawn through the data the one that minimizes the
distance between the line and the data points is the one that is chosen for the predictive
model.
On average if you guess the value on the line it should represent an acceptable compromise
amongst all the data at that point giving conflicting answers. Likewise if there is no data
available for a particular input value the line will provide the best guess at a reasonable
answer based on similar data.

Figure 1.3 Linear regression is similar to the task of finding the line that minimizes the
total distance to a set of data.

The predictive model is the line shown in Figure 1.3. The line will take a given value for a
predictor and map it into a given value for a prediction. The actual equation would look
something like: Prediction = a + b * Predictor, which is just the equation for a line, Y = a
+ bX. As an example, for a bank the predicted average consumer bank balance might equal
$1,000 + 0.01 * customer’s annual income. The trick, as always with predictive modeling,
is to find the model that best minimizes the error. The most common way to calculate the
error is the square of the difference between the predicted value and the actual value.
Calculated this way points that are very far from the line will have a great effect on moving
the choice of line towards themselves in order to reduce the error. The values of a and b in
the regression equation that minimize this error can be calculated directly from the data
relatively quickly.
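
A minimal sketch of this direct calculation (the slope is the covariance of predictor and
prediction divided by the variance of the predictor; the income and balance figures are
invented for illustration):

```python
def fit_simple_linear_regression(x, y):
    """Return (a, b) for the model: prediction = a + b * predictor,
    chosen so that the sum of squared errors is as small as possible."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

# Invented example: annual income (predictor) vs. average bank balance (prediction).
incomes  = [20000, 40000, 60000, 80000, 100000]
balances = [1150, 1420, 1610, 1790, 2050]
a, b = fit_simple_linear_regression(incomes, balances)
print(f"balance = {a:.2f} + {b:.4f} * income")
```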

1.2.1.3. Nearest Neighbor


Clustering and the Nearest Neighbor prediction technique are among the oldest techniques
used in data mining. Most people have an intuition that they understand what clustering is
- namely that like records are grouped or clustered together. Nearest neighbor is a
prediction technique that is quite similar to clustering - its essence is that in order to predict
what a prediction value is in one record, look for records with similar predictor values in
the historical database and use the prediction value from the record that is "nearest" to the
unclassified record.

A simple example of clustering


A simple example of clustering would be the clustering that most people perform when
they do the laundry - grouping the permanent press, dry cleaning, whites and brightly
colored clothes is important because they have similar characteristics. And it turns out
they have important attributes in common about the way they behave (and can be ruined)
in the wash. To “cluster” your laundry most of your decisions are relatively
straightforward. There are of course difficult decisions to be made about which cluster
your white shirt with red stripes goes into (since it is mostly white but has some color and
is permanent press). When clustering is used in business the clusters are often much more
dynamic - even changing weekly to monthly and many more of the decisions concerning
which cluster a record falls into can be difficult.

A simple example of nearest neighbor
A simple example of the nearest neighbor prediction algorithm is to look at the
people in your neighborhood (in this case those people that are in fact geographically near
to you). You may notice that, in general, you all have somewhat similar incomes. Thus if
your neighbor has an income greater than $100,000 chances are good that you too have a
high income. Certainly the chances that you have a high income are greater when all of
your neighbors have incomes over $100,000 than if all of your neighbors have incomes of
$20,000. Within your neighborhood there may still be a wide variety of incomes possible
among even your “closest” neighbors but if you had to predict someone’s income based
on only knowing their neighbors, your best chance of being right would be to predict the
incomes of the neighbors who live closest to the unknown person.
The nearest neighbor prediction algorithm works in very much the same way except that
“nearness” in a database may consist of a variety of factors not just where the person
lives. It may, for instance, be far more important to know which school someone attended
and what degree they attained when predicting income. The better definition of “near”
might in fact be other people that you graduated from college with rather than the people
that you live next to.
Nearest Neighbor techniques are among the easiest to use and understand because they
work in a way similar to the way that people think - by detecting closely matching
examples. They also perform quite well in terms of automation, as many of the algorithms
are robust with respect to dirty data and missing data. Lastly they are particularly adept at
performing complex ROI calculations because the predictions are made at a local level
where business simulations could be performed in order to optimize ROI. As they enjoy
similar levels of accuracy compared to other techniques the measures of accuracy such as
lift are as good as from any other.

How to use Nearest Neighbor for Prediction


One of the essential elements underlying the concept of clustering is that one particular
object (whether they be cars, food or customers) can be closer to another object than can
some third object. It is interesting that most people have an innate sense of ordering placed
on a variety of different objects. Most people would agree that an apple is closer to an
orange than it is to a tomato and that a Toyota Corolla is closer to a Honda Civic than to a
Porsche. This sense of ordering on many different objects helps us place them in time and
space and to make sense of the world. It is what allows us to build clusters - both in
databases on computers as well as in our daily lives. This definition of nearness that seems
to be ubiquitous also allows us to make predictions.
The nearest neighbor prediction algorithm simply stated is:
Objects that are “near” to each other will have similar prediction values as well. Thus if
you know the prediction value of one of the objects you can predict it for its nearest
neighbors.
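
Stated as a sketch (an illustration only, assuming numeric predictors and using the single
nearest record):

```python
import math

def nearest_neighbor_predict(new_record, history):
    """history is a list of (predictor_vector, prediction_value) pairs.
    The prediction for new_record is copied from the closest historical record."""
    def distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    _, prediction = min(history, key=lambda rec: distance(rec[0], new_record))
    return prediction

# Invented records: (age, balance) -> income band
history = [((62, 0), "Medium"), ((21, 2300), "High"), ((50, 165), "Low")]
print(nearest_neighbor_predict((48, 200), history))  # closest record is Donna's, so "Low"
```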

Where has the nearest neighbor technique been used in business?


One of the classical places that nearest neighbor has been used for prediction has been in
text retrieval. The problem to be solved in text retrieval is one where the end user defines
a document (e.g. Wall Street Journal article, technical conference paper etc.) that is
interesting to them and they solicit the system to “find more documents like this one”.
Effectively defining a target of: “this is the interesting document” or “this is not
interesting”. The prediction problem is that only a very few of the documents in the
database actually have values for this prediction field (namely only the documents that the
reader has had a chance to look at so far). The nearest neighbor technique is used to find
other documents that share important characteristics with those documents that have been
marked as interesting.

Using nearest neighbor for stock market data


As with almost all prediction algorithms, nearest neighbor can be used in a variety of
places. Its successful use is mostly dependent on the pre-formatting of the data so that
nearness can be calculated and where individual records can be defined. In the text
retrieval example this was not too difficult - the objects were documents. This is not
always as easy as it is for text retrieval. Consider what it might be like in a time series
problem - say for predicting the stock market. In this case the input data is just a long
series of stock prices over time without any particular record that could be considered to be
an object. The value to be predicted is just the next value of the stock price.

The way that this problem is solved for both nearest neighbor techniques and for some
other types of prediction algorithms is to create training records by taking, for instance, 10
consecutive stock prices and using the first 9 as predictor values and the 10th as the
prediction value. Doing things this way, if you had 100 data points in your time series you
could create 10 different training records.
You could create even more training records than 10 by creating a new record starting at
every data point. For instance you could take the first 10 data points and create a
record, then the 10 consecutive data points starting at the second data point, then the 10
consecutive data points starting at the third data point, and so on. Even though
some of the data points would overlap from one record to the next, the prediction value
would always be different. In our example of 100 initial data points, 91 different training
records could be created this way as opposed to the 10 training records created via the
other method.
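
Both record-creation schemes can be sketched as a small windowing routine (illustrative only):

```python
def make_training_records(prices, window=10, sliding=True):
    """Turn a flat price series into (predictors, prediction) training records.
    With sliding=True a new record starts at every data point; otherwise
    the windows do not overlap."""
    step = 1 if sliding else window
    records = []
    for start in range(0, len(prices) - window + 1, step):
        chunk = prices[start:start + window]
        records.append((chunk[:-1], chunk[-1]))   # first 9 values predict the 10th
    return records

prices = list(range(100))                                  # stand-in for 100 stock prices
print(len(make_training_records(prices, sliding=False)))   # 10 non-overlapping records
print(len(make_training_records(prices, sliding=True)))    # 91 overlapping records
```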

1.2.1.4. Clustering

Clustering for Clarity


Clustering is the method by which like records are grouped together. Usually this is done
to give the end user a high level view of what is going on in the database. Clustering is
sometimes used to mean segmentation - which most marketing people will tell you is
useful for coming up with a bird's eye view of the business. Two of these clustering
systems are the PRIZM™ system from Claritas corporation and MicroVision™ from
Equifax corporation. These companies have grouped the population by demographic
information into segments that they believe are useful for direct marketing and sales. To
build these groupings they use information such as income, age, occupation, housing and
race collected in the US Census. Then they assign memorable “nicknames” to the clusters.
Some examples are shown in Table 1.2.
Name                  Income       Age                  Education       Vendor
Blue Blood Estates    Wealthy      35-54                College         Claritas Prizm™
Shotguns and Pickups  Middle       35-64                High School     Claritas Prizm™
Southside City        Poor         Mix                  Grade School    Claritas Prizm™
Living Off the Land   Middle-Poor  School Age Families  Low             Equifax MicroVision™
University USA        Very low     Young - Mix          Medium to High  Equifax MicroVision™
Sunset Years          Medium       Seniors              Medium          Equifax MicroVision™

Table 1.2 Some Commercially Available Cluster Tags

This clustering information is then used by the end user to tag the customers in their
database. Once this is done the business user can get a quick high level view of what is
happening within the cluster. Once the business user has worked with these codes for some
time they also begin to build intuitions about how these different customer clusters will
react to the marketing offers particular to their business. For instance some of these
clusters may relate to their business and some of them may not. But given that their
competition may well be using these same clusters to structure their business and
marketing offers it is important to be aware of how your customer base behaves in regard to
these clusters.

Finding the ones that don't fit in - Clustering for Outliers


Sometimes clustering is performed not so much to keep records together as to make it
easier to see when one record sticks out from the rest. For instance:
Most wine distributors selling inexpensive wine in Missouri that ship a certain volume
of product produce a certain level of profit. There is a cluster of stores that can be formed
with these characteristics. One store stands out, however, as producing significantly lower
profit. On closer examination it turns out that the distributor was delivering product to but
not collecting payment from one of their customers.
A sale on men’s suits is being held in all branches of a department store for southern
California. All stores with these characteristics have seen at least a 100% jump in
revenue since the start of the sale except one. It turns out that this store had, unlike the
others, advertised via radio rather than television.

How is clustering like the nearest neighbor technique?
The nearest neighbor algorithm is basically a refinement of clustering in the sense that they
both use distance in some feature space to create either structure in the data or predictions.
The nearest neighbor algorithm is a refinement since part of the algorithm usually is a way
of automatically determining the weighting of the importance of the predictors and how the
distance will be measured within the feature space. Clustering is one special case of this
where the importance of each predictor is considered to be equivalent.
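
The point can be made with the distance function itself: in the sketch below (an illustration,
not a specific product's formula) clustering corresponds to equal weights, while a nearest
neighbor algorithm would tune the weights automatically.

```python
def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two records in feature space."""
    return sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b)) ** 0.5

betty, donna = (47, 16543), (50, 165)        # (age, balance) from Table 1.3

equal_weights   = [1.0, 1.0]     # clustering: every predictor counts the same
learned_weights = [1.0, 1e-8]    # nearest neighbor: balance heavily down-weighted

print(weighted_distance(betty, donna, equal_weights))     # ~16378: balance dominates
print(weighted_distance(betty, donna, learned_weights))   # ~3.4: age differences now matter
```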

How to put clustering and nearest neighbor to work for prediction


To see clustering and nearest neighbor prediction in use let’s go back to our example
database and now look at it in two ways. First let’s try to create our own clusters - which if
useful we could use internally to help to simplify and clarify large quantities of data (and
maybe if we did a very good job sell these new codes to other business users). Secondly
let’s try to create predictions based on the nearest neighbor.
First take a look at the data. How would you cluster the data in Table 1.3?
ID Name Prediction Age Balance Income Eyes Gender
1 Amy No 62 $0 Medium Brown F
2 Al No 53 $1,800 Medium Green M
3 Betty No 47 $16,543 High Brown F
4 Bob Yes 32 $45 Medium Green M
5 Carla Yes 21 $2,300 High Blue F
6 Carl No 27 $5,400 High Brown M
7 Donna Yes 50 $165 Low Blue F
8 Don Yes 46 $0 High Blue M
9 Edna Yes 27 $500 Low Blue F
10 Ed No 68 $1,200 Low Blue M

Table 1.3 A Simple Example Database

If these were your friends rather than your customers (hopefully they could be both) and
they were single, you might cluster them based on their compatibility with each other,
creating your own mini dating service. If you were a pragmatic person you might cluster
your database as follows because you think that marital happiness is mostly dependent on
financial compatibility and create three clusters as shown in Table 1.4.

ID  Name   Prediction  Age  Balance  Income  Eyes   Gender

3   Betty  No          47   $16,543  High    Brown  F
5   Carla  Yes         21   $2,300   High    Blue   F
6   Carl   No          27   $5,400   High    Brown  M
8   Don    Yes         46   $0       High    Blue   M

1   Amy    No          62   $0       Medium  Brown  F
2   Al     No          53   $1,800   Medium  Green  M
4   Bob    Yes         32   $45      Medium  Green  M

7   Donna  Yes         50   $165     Low     Blue   F
9   Edna   Yes         27   $500     Low     Blue   F
10  Ed     No          68   $1,200   Low     Blue   M

Table 1.4. A Simple Clustering of the Example Database
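
That grouping is easy to reproduce mechanically; the sketch below (illustrative only) forms the
same three clusters by grouping the records of Table 1.3 on the Income column.

```python
from collections import defaultdict

# (ID, Name, Income) for the ten customers of Table 1.3
customers = [(1, "Amy", "Medium"), (2, "Al", "Medium"), (3, "Betty", "High"),
             (4, "Bob", "Medium"), (5, "Carla", "High"), (6, "Carl", "High"),
             (7, "Donna", "Low"), (8, "Don", "High"), (9, "Edna", "Low"),
             (10, "Ed", "Low")]

clusters = defaultdict(list)
for cid, name, income in customers:
    clusters[income].append(name)          # one cluster per income level

for income, members in clusters.items():
    print(income, "->", members)
# High -> ['Betty', 'Carla', 'Carl', 'Don'], and so on, matching Table 1.4.
```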

1.2.2 Next Generation Techniques: Trees, Networks and Rules

1.2.2.1. The Next Generation


The data mining techniques in this section represent the most often used techniques that
have been developed over the last two decades of research. They also represent the vast
majority of the techniques that are being spoken about when data mining is mentioned in
the popular press. These techniques can be used for either discovering new information
within large databases or for building predictive models. Though the older decision tree
techniques such as CHAID are currently highly used, the newer techniques such as CART are
gaining wider acceptance.

1.2.2.2. Decision Trees

What is a Decision Tree?


A decision tree is a predictive model that, as its name implies, can be viewed as a tree.
Specifically each branch of the tree is a classification question and the leaves of the tree are
partitions of the dataset with their classification. For instance if we were going to classify
customers who churn (don’t renew their phone contracts) in the Cellular Telephone
Industry a decision tree might look something like that found in Figure 2.1.

Figure 2.1 A decision tree is a predictive model that makes a prediction on the basis of a
series of decisions, much like the game of 20 questions.

You may notice some interesting things about the tree:

• It divides up the data on each branch point without losing any of the data (the
number of total records in a given parent node is equal to the sum of the records
contained in its two children).
• The number of churners and non-churners is conserved as you move up or down
the tree
• It is pretty easy to understand how the model is being built (in contrast to the
models from neural networks or from standard statistics).
• It would also be pretty easy to use this model if you actually had to target those
customers that are likely to churn with a targeted marketing offer.

You may also build some intuitions about your customer base. E.g. “customers who have
been with you for a couple of years and have up to date cellular phones are pretty loyal”.

Viewing decision trees as segmentation with a purpose
From a business perspective decision trees can be viewed as creating a segmentation of the
original dataset (each segment would be one of the leaves of the tree). Segmentation of
customers, products, and sales regions is something that marketing managers have been
doing for many years. In the past this segmentation has been performed in order to get a
high level view of a large amount of data - with no particular reason for creating the
segmentation except that the records within each segmentation were somewhat similar to
each other.
In this case the segmentation is done for a particular reason - namely for the prediction of
some important piece of information. The records that fall within each segment fall there
because they have similarity with respect to the information being predicted - not just that
they are similar - without similarity being well defined. These predictive segments that are
derived from the decision tree also come with a description of the characteristics that
define the predictive segment. Thus the decision trees and the algorithms that create them
may be complex, the results can be presented in an easy to understand way that can be
quite useful to the business user.

Applying decision trees to Business


Because of their tree structure and ability to easily generate rules decision trees are the
favored technique for building understandable models. Because of this clarity they also
allow for more complex profit and ROI models to be added easily on top of the
predictive model. For instance once a customer population is found with high predicted
likelihood to attrite a variety of cost models can be used to see if an expensive marketing
intervention should be used because the customers are highly valuable or a less expensive
intervention should be used because the revenue from this sub-population of customers is
marginal.
Because of their high level of automation and the ease of translating decision tree models
into SQL for deployment in relational databases the technology has also proven to be easy
to integrate with existing IT processes, requiring little preprocessing and cleansing of the
data, or extraction of a special purpose file specifically for data mining.

Where can decision trees be used?
Decision trees are a data mining technology that has been around in a form very similar to
the technology of today for almost twenty years now, and early versions of the algorithms
date back to the 1960s. Oftentimes these techniques were originally developed for
statisticians to automate the process of determining which fields in their database were
actually useful or correlated with the particular problem that they were trying to
understand. Partially because of this history, decision tree algorithms tend to automate the
entire process of hypothesis generation and then validation much more completely and in a
much more integrated way than any other data mining techniques. They are also
particularly adept at handling raw data with little or no pre-processing. Perhaps also
because they were originally developed to mimic the way an analyst interactively performs
data mining they provide a simple to understand predictive model based on rules (such as
“90% of the time credit card customers of less than 3 months who max out their credit
limit are going to default on their credit card loan.”).
Because decision trees score so highly on so many of the critical features of data mining
they can be used in a wide variety of business problems for both exploration and for
prediction. They have been used for problems ranging from credit card attrition prediction
to time series prediction of the exchange rate of different international currencies. There
are also some problems where decision trees will not do as well. Some very simple
problems where the prediction is just a simple multiple of the predictor can be solved much
more quickly and easily by linear regression. Usually the models to be built and the
interactions to be detected are much more complex in real world problems and this is
where decision trees excel.

Using decision trees for Exploration


The decision tree technology can be used for exploration of the dataset and business
problem. This is often done by looking at the predictors and values that are chosen for
each split of the tree. Often times these predictors provide usable insights or propose
questions that need to be answered. For instance if you ran across the following in your
database for cellular phone churn you might seriously wonder about the way your telesales
operators were making their calls and maybe change the way that they are compensated:

“IF customer lifetime < 1.1 years AND sales channel = telesales THEN chance of churn is
65%."
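
A rule like this is trivial to deploy once extracted from the tree; a hypothetical sketch (the
field names are invented for illustration):

```python
def in_high_churn_segment(customer):
    """True if the customer falls in the segment the rule above describes
    (a segment in which roughly 65% of customers churn)."""
    return (customer["lifetime_years"] < 1.1
            and customer["sales_channel"] == "telesales")

print(in_high_churn_segment({"lifetime_years": 0.8, "sales_channel": "telesales"}))  # True
```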

Using decision trees for Data Preprocessing


Another way that the decision tree technology has been used is for preprocessing data for
other prediction algorithms. Because the algorithm is fairly robust with respect to a variety
of predictor types (e.g. number, categorical etc.) and because it can be run relatively
quickly decision trees can be used on the first pass of a data mining run to create a subset
of possibly useful predictors that can then be fed into neural networks, nearest neighbor
and normal statistical routines - which can take a considerable amount of time to run if
there are large numbers of possible predictors to be used in the model.

Decision trees for Prediction


Although some forms of decision trees were initially developed as exploratory tools to
refine and preprocess data for more standard statistical techniques like logistic regression,
they are also increasingly being used for prediction. This is
interesting because many statisticians will still use decision trees for exploratory analysis
effectively building a predictive model as a by product but then ignore the predictive
model in favor of techniques that they are most comfortable with. Sometimes veteran
analysts will do this even excluding the predictive model when it is superior to that
produced by other techniques. With a host of new products and skilled users now
appearing this tendency to use decision trees only for exploration now seems to be
changing.

The first step is Growing the Tree


The first step in the process is that of growing the tree. Specifically the algorithm seeks to
create a tree that works as perfectly as possible on all the data that is available. Most of the
time it is not possible to have the algorithm work perfectly. There is always noise in the
database to some degree (there are variables that are not being collected that have an
impact on the target you are trying to predict). The name of the game in growing the tree is
in finding the best possible question to ask at each branch point of the tree. At the bottom
of the tree you will come up with nodes that you would like to be all of one type or the
other. Thus the question: “Are you over 40?” probably does not sufficiently distinguish
between those who are churners and those who are not - let’s say it is 40%/60%. On the
other hand there may be a series of questions that do quite a nice job in distinguishing
those cellular phone customers who will churn and those who won’t. Maybe the series of
questions would be something like: “Have you been a customer for less than a year, do you
have a telephone that is more than two years old and were you originally landed as a
customer via telesales rather than direct sales?” This series of questions defines a segment
of the customer population in which 90% churn. These are then relevant questions to be
asking in relation to predicting churn.

The process in decision tree algorithms is very similar when they build trees. These
algorithms look at all possible distinguishing questions that could possibly break up the
original training dataset into segments that are nearly homogeneous with respect to the
different classes being predicted. Some decision tree algorithms may use heuristics in order
to pick the questions or even pick them at random. CART picks the questions in a very
unsophisticated way: it tries them all. After it has tried them all, CART picks the best one,
uses it to split the data into two more organized segments, and then again asks all possible
questions on each of those new segments individually.
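
A minimal sketch of that exhaustive search (Gini impurity is used here to score how "organized"
a segment is; the text does not commit to a particular measure, so this is only one reasonable
choice):

```python
def gini(labels):
    """Gini impurity of a segment: 0.0 means the segment is all one class."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(records, labels):
    """Try every (predictor, threshold) question and return the one whose two
    child segments are, on average, the most organized (lowest impurity)."""
    best = None
    for col in range(len(records[0])):
        for threshold in set(r[col] for r in records):
            left  = [lab for r, lab in zip(records, labels) if r[col] <= threshold]
            right = [lab for r, lab in zip(records, labels) if r[col] >  threshold]
            if not left or not right:
                continue              # this question does not actually split the segment
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, col, threshold)
    return best    # (weighted impurity, predictor index, threshold), or None

# Invented churn data: (customer age, phone age in years) -> churned?
records = [(25, 3), (31, 1), (44, 4), (52, 1)]
labels  = ["yes", "no", "yes", "no"]
print(best_split(records, labels))  # (0.0, 1, 1): splitting on phone age separates the classes
```
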
If the decision tree algorithm just continued growing the tree like this it could conceivably
create more and more questions and branches in the tree so that eventually there was only
one record in the segment. To let the tree grow to this size is both computationally
expensive and unnecessary. Most decision tree algorithms stop growing the tree when
one of three criteria are met:

• The segment contains only one record. (There is no further question that you could
ask which could further refine a segment of just one.)
• All the records in the segment have identical characteristics. (There is no reason to
continue asking further questions since all the remaining records are
the same.)
• The improvement is not substantial enough to warrant making the split.

Consider the following example shown in Table 2.1 of a segment that we might want to
split further which has just two examples. It has been created out of a much larger
customer database by selecting only those customers aged 27 with blue eyes and salaries
between $80,000 and $81,000.
Name Age Eyes Salary Churned?
Steve 27 Blue $80,000 Yes
Alex 27 Blue $80,000 No

Table 2.1 Decision tree algorithm segment. This segment cannot be split further except
by using the predictor "name".

In this case all of the possible questions that could be asked about the two customers turn
out to have the same value (age, eyes, salary) except for name. It would then be possible
to ask a question like: “Is the customer’s name Steve?” and create the segments which
would be very good at breaking apart those who churned from those who did not.
The problem is that we all have an intuition that the name of the customer is not going to
be a very good indicator of whether that customer churns or not. It might work well for
this particular 2 record segment but it is unlikely that it will work for other customer
databases or even the same customer database at a different time. This particular example
has to do with overfitting the model - in this case fitting the model too closely to the
idiosyncrasies of the training data. This can be fixed later on but clearly stopping the
building of the tree short of either one record segments or very small segments in general is
a good idea.
After the tree has been grown to a certain size (depending on the particular stopping
criteria used in the algorithm) the CART algorithm has still more work to do. The
algorithm then checks to see if the model has been overfit to the data. It does this in
several ways, using a cross validation approach or a test set validation approach - basically
using the same mind-numbingly simple approach it used to find the best questions in the
first place - namely trying many different simpler versions of the tree on a held aside test
set. The tree that does the best on the held aside data is selected by the algorithm as the
best model. The nice thing about CART is that this testing and selection is all an integral
part of the algorithm as opposed to the after the fact approach that other techniques use.

1.2.2.3. Neural Networks

Neural Network
When data mining algorithms are talked about these days most of the time people are
talking about either decision trees or neural networks. Of the two neural networks have
probably been of greater interest through the formative stages of data mining technology.
As we will see neural networks do have disadvantages that can be limiting in their ease of
use and ease of deployment, but they do also have some significant advantages. Foremost
among these advantages is their highly accurate predictive models that can be applied
across a large number of different types of problems.
To be more precise with the term “neural network” one might better speak of an “artificial
neural network”. True neural networks are biological systems (a.k.a. brains) that detect
patterns, make predictions and learn. The artificial ones are computer programs
implementing sophisticated pattern detection and machine learning algorithms on a
computer to build predictive models from large historical databases. Artificial neural
networks derive their name from their historical development which started off with the
premise that machines could be made to “think” if scientists found ways to mimic the
structure and functioning of the human brain on the computer. Thus historically neural
networks grew out of the community of Artificial Intelligence rather than from the
discipline of statistics. Despite the fact that scientists are still far from understanding the
human brain let alone mimicking it, neural networks that run on computers can do some of
the things that people can do.
It is difficult to say exactly when the first “neural network” on a computer was built.
During World War II a seminal paper was published by McCulloch and Pitts which first
outlined the idea that simple processing units (like the individual neurons in the human
brain) could be connected together in large networks to create a system that could solve
difficult problems and display behavior that was much more complex than the simple
pieces that made it up. Since that time much progress has been made in finding ways to
apply artificial neural networks to real world prediction problems and in improving the
performance of the algorithm in general. In many respects the greatest breakthroughs in
neural networks in recent years have been in their application to more mundane real world
problems like customer response prediction or fraud detection rather than the loftier goals
that were originally set out for the techniques such as overall human learning and computer
speech and image understanding.
Because of the origins of the techniques and because of some of their early successes the
techniques have enjoyed a great deal of interest. To understand how neural networks can
detect patterns in a database an analogy is often made that they “learn” to detect these
patterns and make better predictions in a similar way to the way that human beings do.
This view is encouraged by the way the historical training data is often supplied to the
network - one record (example) at a time. Neural networks do “learn” in a very real sense
but under the hood the algorithms and techniques that are being deployed are not truly
different from the techniques found in statistics or other data mining algorithms. It is for
instance, unfair to assume that neural networks could outperform other techniques because
they “learn” and improve over time while the other techniques are static. The other
techniques in fact “learn” from historical examples in exactly the same way, but oftentimes
the examples (historical records) to learn from are processed all at once in a more efficient
manner than in neural networks, which often modify their model one record at a time.
A common claim for neural networks is that they are automated to a degree where the user
does not need to know that much about how they work, or predictive modeling or even the
database in order to use them. The implicit claim is also that most neural networks can be
unleashed on your data straight out of the box without having to rearrange or modify the
data very much to begin with.
Just the opposite is often true. There are many important design decisions that need to be
made in order to effectively use a neural network such as:

• How should the nodes in the network be connected?


• How many neuron like processing units should be used?
• When should “training” be stopped in order to avoid overfitting?

There are also many important steps required for preprocessing the data that goes into a
neural network - most often there is a requirement to normalize numeric data between 0.0
and 1.0 and categorical predictors may need to be broken up into virtual predictors that are
0 or 1 for each value of the original categorical predictor. And, as always, understanding
what the data in your database means and a clear definition of the business problem to be
solved are essential to ensuring eventual success. The bottom line is that neural networks
provide no short cuts.
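
The two preprocessing steps just mentioned are simple in isolation; a sketch (illustrative only)
of min-max scaling to the 0.0-1.0 range and of breaking a categorical predictor into 0/1
"virtual" predictors:

```python
def min_max_scale(values):
    """Rescale a numeric predictor so its values lie between 0.0 and 1.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]      # assumes hi > lo

def one_hot(values):
    """Break a categorical predictor into one 0/1 virtual predictor per value."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

print(min_max_scale([21, 27, 46, 62]))                  # ages squeezed into [0, 1]
vectors, cats = one_hot(["blue", "white", "black", "blue"])
print(cats)       # ['black', 'blue', 'white']
print(vectors)    # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```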

Applying Neural Networks to Business


Neural networks are very powerful predictive modeling techniques but some of the power
comes at the expense of ease of use and ease of deployment. As we will see in this section,
neural networks create very complex models that are almost always impossible to fully
understand even by experts. The model itself is represented by numeric values in a
complex calculation that requires all of the predictor values to be in the form of a number.
The output of the neural network is also numeric and needs to be translated if the actual
prediction value is categorical (e.g. predicting the demand for blue, white or black jeans for
a clothing manufacturer requires that the predictor values blue, black and white for the
predictor color be converted to numbers).
Because of the complexity of these techniques much effort has been expended in trying to
increase the clarity with which the model can be understood by the end user. These efforts
are still in their infancy but are of tremendous importance since most data mining
techniques including neural networks are being deployed against real business problems
where significant investments are made based on the predictions from the models (e.g.
consider trusting the predictive model from a neural network that dictates which one
million customers will receive a $1 mailing).
There are two ways that these shortcomings in understanding the meaning of the neural
network model have been successfully addressed:

• The neural network is packaged up into a complete solution such as fraud


prediction. This allows the neural network to be carefully crafted for one particular
application and once it has been proven successful it can be used over and over
again without requiring a deep understanding of how it works.
• The neural network is packaged up with expert consulting services. Here the neural
network is deployed by trusted experts who have a track record of success. Either
the experts are able to explain the models or they are trusted that the models do
work.

The first tactic seems to have worked quite well, because when the technique is used for a
well-defined problem many of the difficulties in preprocessing the data can be automated
(the data structures have been seen before), and interpretation of the model is less
of an issue once entire industries begin to use the technology successfully and a level of
trust is created. Several vendors have deployed this strategy (e.g. HNC’s
Falcon system for credit card fraud prediction and Advanced Software Applications’
ModelMAX package for direct marketing).
Packaging up neural networks with expert consultants is also a viable strategy that avoids
many of the pitfalls of using neural networks, but it can be quite expensive because it is
human-intensive. One of the great promises of data mining is, after all, the automation of
the predictive modeling process. These neural network consulting teams are little different
from the analytical departments many companies already have in house. Since there is not
a great difference in overall predictive accuracy between neural networks and standard
statistical techniques, the main difference becomes the replacement of the statistical expert
with the neural network expert. Whether the experts come from statistics or from neural
networks, the goal of putting easy-to-use tools into the hands of the business end user is
still not achieved.

Where to Use Neural Networks


Neural networks are used in a wide variety of applications. They have been used in all
facets of business, from detecting the fraudulent use of credit cards and credit risk
prediction to increasing the hit rate of targeted mailings. They also have a long history of
application in other areas, such as the military, for the automated driving of an unmanned
vehicle at 30 miles per hour on paved roads, and biological simulations such as learning the
correct pronunciation of English words from written text.

Neural Networks for Clustering


Neural networks of various kinds can be used for clustering and prototype creation. The
Kohonen network described in this section is probably the most common network used for
clustering and segmentation of the database. Typically the networks are used in an
unsupervised learning mode to create the clusters. The clusters are created by forcing the
system to compress the data by creating prototypes, or by algorithms that steer the system
toward creating clusters that compete against each other for the records that they contain,
thus ensuring that the clusters overlap as little as possible.
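
As an illustration of the competitive learning at the heart of a Kohonen-style network, the
sketch below is a deliberately stripped-down version in Python: prototypes compete for each
record, and the winner is nudged toward that record. A full Kohonen map also arranges the
prototypes on a grid and updates the winner's neighbours; the data here is invented and assumed
to be already normalized to the 0-1 range.

import random

def train_prototypes(records, n_prototypes=3, epochs=50, lr=0.3):
    """Competitive ("winner takes all") learning: each record pulls its nearest prototype toward it."""
    dim = len(records[0])
    prototypes = [[random.random() for _ in range(dim)] for _ in range(n_prototypes)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays as training proceeds
        for rec in records:
            # competition: find the prototype closest to this record
            winner = min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, rec)))
            # adaptation: move the winning prototype a little toward the record
            for i in range(dim):
                winner[i] += rate * (rec[i] - winner[i])
    return prototypes

def cluster_of(record, prototypes):
    return min(range(len(prototypes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(prototypes[i], record)))

# Hypothetical, already-normalized records (say, age and spend per customer).
data = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.85, 0.80], [0.50, 0.10]]
protos = train_prototypes(data)
print([cluster_of(r, protos) for r in data])  # cluster label assigned to each record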

Neural Networks for Outlier Analysis


Sometimes clustering is performed not so much to keep records together as to make it
easier to see when one record sticks out from the rest. For instance:
Most wine distributors selling inexpensive wine in Missouri that ship a certain volume
of product produce a certain level of profit. A cluster of stores can be formed
with these characteristics. One store stands out, however, as producing significantly lower
profit. On closer examination it turns out that the distributor was delivering product to, but
not collecting payment from, one of its customers.
A sale on men’s suits is being held in all branches of a department store for southern
California. All stores with these characteristics have seen at least a 100% jump in
revenue since the start of the sale except one. It turns out that this store had, unlike the
others, advertised via radio rather than television.
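
A minimal sketch of the same idea in Python (the profit figures are invented): once a cluster
of comparable stores has been formed, the record that lies furthest from the cluster average is
the one worth a closer look.

# Hypothetical profits for stores that fall in the same cluster.
profits = {"store_a": 41000, "store_b": 39500, "store_c": 40200, "store_d": 12700}

mean = sum(profits.values()) / len(profits)
deviation = {store: abs(p - mean) for store, p in profits.items()}  # distance from cluster average
outlier = max(deviation, key=deviation.get)
print(outlier, deviation[outlier])  # store_d sticks out from the rest of the cluster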

Neural Networks for Feature Extraction


One of the important problems in all of data mining is that of determining which predictors
are the most relevant and the most important in building models that are most accurate at
prediction. These predictors may be used by themselves or they may be used in
conjunction with other predictors to form “features”. A simple example of a feature in
problems that neural networks are working on is the feature of a vertical line in a computer
image. The predictors, or raw input data, are just the colored pixels that make up the
picture. Recognizing that the predictors (pixels) can be organized in such a way as to
create lines, and then using the line as the input predictor, can dramatically
improve the accuracy of the model and decrease the time to create it.
Some features, like lines in computer images, are things that humans are already quite good
at detecting; in other problem domains it is more difficult to recognize the features. One
novel way that neural networks have been used to detect features is the idea that features
are a sort of compression of the training database. For instance, you could describe an
image to a friend by rattling off the color and intensity of each pixel at every point in the
picture, or you could describe it at a higher level in terms of lines and circles, or maybe even
at a higher level of features such as trees, mountains, etc. In either case your friend
eventually gets all the information needed to know what the picture looks
like, but describing it in terms of high-level features requires much less
communication of information than the “paint by numbers” approach of describing the
color of each square millimeter of the image.
If we think of features in this way, as an efficient way to communicate our data, then
neural networks can be used to extract them automatically. The neural network shown in
Figure 2.2 is used to extract features by requiring the network to learn to recreate the input
data at the output nodes by using just 5 hidden nodes. Consider that if you were allowed
100 hidden nodes, recreating the data for the network would be rather trivial - simply
pass each input node value directly through a corresponding hidden node and on to the
output node. But as there are fewer and fewer hidden nodes, the information has to be
passed through the hidden layer in a more and more efficient manner, since there are fewer
hidden nodes to help pass along the information.

Figure 2.2 Neural networks can be used for data compression and feature extraction.

In order to accomplish this, the neural network tries to have the hidden nodes extract
features from the input nodes that efficiently describe the record presented at the input
layer. This forced “squeezing” of the data through the narrow hidden layer forces the
neural network to extract only those predictors and combinations of predictors that are best
at recreating the input record. The hidden nodes are effectively creating features that are
combinations of the input nodes’ values.
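
The sketch below illustrates this bottleneck idea in Python with NumPy: a small network is
trained to reproduce its own inputs through a hidden layer that is much narrower than the input
layer, so the hidden activations become compressed features. The data is random and the sizes
are invented purely for illustration; this is the general auto-associative idea rather than any
specific product's implementation.

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 8))        # 200 hypothetical records, 8 numeric predictors scaled to 0-1

n_in, n_hidden = X.shape[1], 3  # 8 inputs squeezed through only 3 hidden nodes
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(2000):
    hidden = sigmoid(X @ W1)       # compressed representation (the extracted "features")
    output = sigmoid(hidden @ W2)  # attempted reconstruction of the original inputs
    err = output - X
    # backpropagate the reconstruction error through both layers
    grad_out = err * output * (1 - output)
    grad_hid = (grad_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * (hidden.T @ grad_out) / len(X)
    W1 -= lr * (X.T @ grad_hid) / len(X)

features = sigmoid(X @ W1)  # three learned features per record
print(features.shape)       # (200, 3)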

Chapter 5
APPLICATIONS OF DATA MINING

A wide range of companies have deployed successful applications of data mining. While
early adopters of this technology have tended to be in information-intensive industries such
as financial services and direct mail marketing, the technology is applicable to any
company looking to leverage a large data warehouse to better manage their customer
relationships. Two critical factors for success with data mining are: a large, well-integrated
data warehouse and a well-defined understanding of the business process within which
data mining is to be applied (such as customer prospecting, retention, campaign
management, and so on).

Some successful application areas include:

• A pharmaceutical company can analyze its recent sales force activity and their
results to improve targeting of high-value physicians and determine which
marketing activities will have the greatest impact in the next few months. The data
needs to include competitor market activity as well as information about the local
health care systems. The results can be distributed to the sales force via a wide-area
network that enables the representatives to review the recommendations from the
perspective of the key attributes in the decision process. The ongoing, dynamic
analysis of the data warehouse allows best practices from throughout the
organization to be applied in specific sales situations.
• A credit card company can leverage its vast warehouse of customer transaction data
to identify customers most likely to be interested in a new credit product. Using a
small test mailing, the attributes of customers with an affinity for the product can
be identified. Recent projects have indicated more than a 20-fold decrease in costs
for targeted mailing campaigns over conventional approaches.
• A diversified transportation company with a large direct sales force can apply data
mining to identify the best prospects for its services. Using data mining to analyze
its own customer experience, this company can build a unique segmentation

identifying the attributes of high-value prospects. Applying this segmentation to a
general business database such as those provided by Dun & Bradstreet can yield a
prioritized list of prospects by region.
• A large consumer packaged goods company can apply data mining to improve its
sales process to retailers. Data from consumer panels, shipments, and competitor
activity can be applied to understand the reasons for brand and store switching.
Through this analysis, the manufacturer can select promotional strategies that best
reach their target customer segments.

Each of these examples has a clear common ground. They leverage the knowledge about
customers implicit in a data warehouse to reduce costs and improve the value of customer
relationships. These organizations can now focus their efforts on the most important
(profitable) customers and prospects, and design targeted marketing strategies to best reach
them.

Games

Since the early 1960s, the availability of oracles for certain combinatorial games, also
called tablebases (e.g. for 3x3 chess with any beginning configuration, small-board dots-
and-boxes, small-board hex, and certain endgames in chess, dots-and-boxes, and hex), has
opened up a new area for data mining: the extraction of human-usable strategies from these
oracles. Current pattern recognition approaches do not seem to reach the high level of
abstraction required to be applied successfully here. Instead, extensive experimentation with
the tablebases, combined with an intensive study of tablebase answers to well-designed
problems and with knowledge of prior art (i.e. pre-tablebase knowledge), is used to yield
insightful patterns. Berlekamp in dots-and-boxes and John Nunn in chess endgames are
notable examples of researchers doing this work, though they were not and are not involved
in tablebase generation.

Business

Data mining in customer relationship management applications can contribute significantly
to the bottom line. Rather than randomly contacting a prospect or customer
through a call center or sending mail, a company can concentrate its efforts on prospects
that are predicted to have a high likelihood of responding to an offer. More sophisticated
methods may be used to optimize resources across campaigns so that one may predict
which channel and which offer an individual is most likely to respond to — across all
potential offers. Additionally, sophisticated applications could be used to automate the
mailing. Once the results from data mining (potential prospect/customer and channel/offer)
are determined, this "sophisticated application" can either automatically send an e-mail or
regular mail. Finally, in cases where many people will take an action without an offer,
uplift modeling can be used to determine which people will have the greatest increase in
responding if given an offer. Data clustering can also be used to automatically discover the
segments or groups within a customer data set.

Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a
rich history of customer transactions on millions of customers dating back several years.
Data mining tools can identify patterns among customers and help identify the most likely
customers to respond to upcoming mailing campaigns.

Related to an integrated-circuit production line, an example of data mining is described in
the paper "Mining IC Test Data to Optimize VLSI Testing".[13] In this paper the application
of data mining and decision analysis to the problem of die-level functional test is
described. Experiments mentioned in the paper demonstrate the ability of a system that
mines historical die-test data to create a probabilistic model of patterns of die failure,
which is then used to decide in real time which die to test next and when to stop
testing. This system has been shown, based on experiments with historical test data, to
have the potential to improve profits on mature IC products.

Science and engineering

In recent years, data mining has been widely used in areas of science and engineering such
as bioinformatics, genetics, medicine, education and electrical power engineering.

In the study of human genetics, an important goal is to understand the mapping
relationship between inter-individual variation in human DNA sequences and
variability in disease susceptibility. In lay terms, it is to find out how changes in an
individual's DNA sequence affect the risk of developing common diseases such as cancer.
This is very important in helping to improve the diagnosis, prevention and treatment of these
diseases. The data mining technique that is used to perform this task is known as
multifactor dimensionality reduction.[14]

In the area of electrical power engineering, data mining techniques have been widely used
for condition monitoring of high-voltage electrical equipment. The purpose of condition
monitoring is to obtain valuable information on the health status of the equipment's
insulation. Data clustering techniques such as the self-organizing map (SOM) have been
applied to the vibration monitoring and analysis of transformer on-load tap changers (OLTCs).
Using vibration monitoring, it can be observed that each tap change operation generates a
signal that contains information about the condition of the tap changer contacts and the
drive mechanisms. Different tap positions will, of course, generate different signals. However,
there was considerable variability amongst normal-condition signals for the exact same tap
position. SOM has been applied to detect abnormal conditions and to estimate the nature of
the abnormalities.[15]

Data mining techniques have also been applied to dissolved gas analysis (DGA) on power
transformers. DGA, as a diagnostic technique for power transformers, has been available for
many years. Data mining techniques such as SOM have been applied to analyse the data and to
determine trends which are not obvious to standard DGA ratio techniques such as the
Duval Triangle.[15]

Future Scope and Work

Data mining derives its name from the similarities between searching for valuable business
information in a large database — for example, finding linked products in gigabytes of
store scanner data — and mining a mountain for a vein of valuable ore. Both processes
require either sifting through an immense amount of material, or intelligently probing it to
find exactly where the value resides. Given databases of sufficient size and quality, data
mining technology can generate new business opportunities by providing these
capabilities:

• Automated prediction of trends and behaviors. Data mining automates the
process of finding predictive information in large databases. Questions that
traditionally required extensive hands-on analysis can now be answered directly
from the data — quickly. A typical example of a predictive problem is targeted
marketing. Data mining uses data on past promotional mailings to identify the
targets most likely to maximize return on investment in future mailings. Other
predictive problems include forecasting bankruptcy and other forms of default, and
identifying segments of a population likely to respond similarly to given events.

• Automated discovery of previously unknown patterns. Data mining tools sweep
through databases and identify previously hidden patterns in one step. An example
of pattern discovery is the analysis of retail sales data to identify seemingly
unrelated products that are often purchased together. Other pattern discovery
problems include detecting fraudulent credit card transactions and identifying
anomalous data that could represent data entry keying errors.
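
As a small, hedged illustration of the "purchased together" idea mentioned in the second
point above, the sketch below simply counts how often each pair of products appears in the
same basket and keeps the frequent pairs. The transactions are invented; real association rule
tools (e.g. the Apriori algorithm) extend this counting to larger itemsets and to confidence
measures.

from itertools import combinations
from collections import Counter

# Hypothetical market baskets.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"beer", "crisps"},
    {"bread", "jam"},
    {"beer", "crisps", "bread"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs that occur in at least 40% of the baskets ("frequent" item pairs).
min_support = 0.4 * len(baskets)
print([pair for pair, count in pair_counts.items() if count >= min_support])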

References

1. Lyman, Peter; Hal R. Varian (2003). "How Much Information".
http://www.sims.berkeley.edu/how-much-info-2003. Retrieved 2008-12-17.
2. Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and
Algorithms. John Wiley & Sons. ISBN 0471228524. OCLC 50055336.
3. The Data Mining Group (DMG). The DMG is an independent, vendor-led group
which develops data mining standards, such as the Predictive Model Markup
Language (PMML).
4. PMML Project Page
5. Alex Guazzelli, Michael Zeller, Wen-Ching Lin, Graham Williams. PMML: An
Open Standard for Sharing Models. The R Journal, vol 1/1, May 2009.
6. Y. Peng, G. Kou, Y. Shi, Z. Chen (2008). "A Descriptive Framework for the Field
of Data Mining and Knowledge Discovery". International Journal of Information
Technology and Decision Making 7 (4): 639–682.
doi:10.1142/S0219622008003204.
7. Proceedings, International Conferences on Knowledge Discovery and Data Mining,
ACM, New York.
8. SIGKDD Explorations, ACM, New York.
9. International Conference on Data Mining: 5th (2009); 4th (2008); 3rd (2007); 2nd
(2006); 1st (2005)
10. IEEE International Conference on Data Mining: ICDM09, Miami, FL; ICDM08,
Pisa (Italy); ICDM07, Omaha, NE; ICDM06, Hong Kong; ICDM05, Houston, TX;
ICDM04, Brighton (UK); ICDM03, Melbourne, FL; ICDM02, Maebashi City
(Japan); ICDM01, San Jose, CA.
11. Fayyad, Usama; Gregory Piatetsky-Shapiro, and Padhraic Smyth (1996). "From
Data Mining to Knowledge Discovery in Databases".
http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf.
Retrieved 2008-12-17.

12. Ellen Monk, Bret Wagner (2006). Concepts in Enterprise Resource Planning,
Second Edition. Thomson Course Technology, Boston, MA. ISBN 0-619-21663-8.
OCLC 224465825.
13. Tony Fountain, Thomas Dietterich & Bill Sudyka (2000) Mining IC Test Data to
Optimize VLSI Testing, in Proceedings of the Sixth ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining. (pp. 18-25). ACM Press.
14. Xingquan Zhu, Ian Davidson (2007). Knowledge Discovery and Data Mining:
Challenges and Realities. Hershey, New York. pp. 18. ISBN 978-159904252-7.
15. A.J. McGrail, E. Gulski et al. "Data Mining Techniques to Assess the Condition of
High Voltage Electrical Plant". CIGRE WG 15.11 of Study Committee 15.
16. R. Baker. "Is Gaming the System State-or-Trait? Educational Data Mining Through
the Multi-Contextual Application of a Validated Behavioral Model". Workshop on
Data Mining for User Modeling 2007.
17. J.F. Superby, J-P. Vandamme, N. Meskens. "Determination of factors influencing
the achievement of the first-year university students using data mining methods".
Workshop on Educational Data Mining 2006.
18. Xingquan Zhu, Ian Davidson (2007). Knowledge Discovery and Data Mining:
Challenges and Realities. Hershey, New York. pp. 163–189. ISBN 978-
159904252-7.
19. ibid. pp. 31–48.
20. Yudong Chen, Yi Zhang, Jianming Hu, Xiang Li. "Traffic Data Analysis Using
Kernel PCA and Self-Organizing Map". Intelligent Vehicles Symposium, 2006
IEEE.
21. Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, De Freitas RM. A
Bayesian neural network method for adverse drug reaction signal generation. Eur J
Clin Pharmacol. 1998 Jun;54(4):315-21.
22. Norén GN, Bate A, Hopstadius J, Star K, Edwards IR. Temporal Pattern Discovery
for Trends and Transient Effects: Its Application to Patient Records. Proceedings
of the Fourteenth International Conference on Knowledge Discovery and Data
Mining SIGKDD 2008, pages 963-971. Las Vegas NV, 2008.

23. Healey, R., 1991, Database Management Systems. In Maguire, D., Goodchild,
M.F., and Rhind, D., (eds.), Geographic Information Systems: Principles and
Applications (London: Longman).
24. Câmara, A. S. and Raper, J., (eds.), 1999, Spatial Multimedia and Virtual Reality,
(London: Taylor and Francis).
25. Miller, H. and Han, J., (eds.), 2001, Geographic Data Mining and Knowledge
Discovery, (London: Taylor & Francis).
26. Government Accountability Office, Data Mining: Early Attention to Privacy in
Developing a Key DHS Program Could Reduce Risks, GAO-07-293, Washington,
D.C.: February 2007.
27. Secure Flight Program report, MSNBC.
28. "Total/Terrorism Information Awareness (TIA): Is It Truly Dead?". Electronic
Frontier Foundation (official website). 2003.
http://w2.eff.org/Privacy/TIA/20031003_comments.php. Retrieved 2009-03-15.
