Predictive Analytics and Data Mining: Segmentation Using Clustering
Automatic cluster detection (ACD) is a tool for undirected data mining:
• No preclassified training data set
• No distinction between independent and dependent variables
• Marketing clusters are referred to as “segments”
• Customer segmentation is a popular application of clustering
However, business data are not points in space. They are records of
purchases, phone calls, airplane trips, car registrations, etc., which
have no obvious connection to the dots in a cluster diagram
Similarity & Difference
If two points are close in a geometric sense, then they represent similar
records in the database
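As a minimal sketch of the geometric idea above, the snippet below treats each record as a point and measures closeness with Euclidean distance. The slides do not name a distance measure, so Euclidean distance is an assumption here, and the customer fields (purchases per month, calls per month) and their values are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric records."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of fields")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers: (purchases per month, calls per month)
customer_a = (3.0, 4.0)
customer_b = (0.0, 0.0)
customer_c = (3.0, 5.0)

dist_ab = euclidean_distance(customer_a, customer_b)
dist_ac = euclidean_distance(customer_a, customer_c)

# A smaller distance is read as "more similar"
print(dist_ab, dist_ac)
```

Under this measure, customer_c is closer to customer_a than customer_b is, so it would count as the more similar record. In practice the fields usually need to be put on comparable scales first, otherwise the field with the largest numeric range dominates the distance.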
- “If you’ve got terabytes of data, and you’re relying on data mining to
find interesting things in there for you, you’ve lost before you’ve
even begun. You really need people who understand what it is they
are looking for – and what they can do with it once they find it.”
- Herb Edelstein, President of Two Crows Corporation
Inputs (Desirable Characteristics)
Relatively independent
Limited in number
* The sum of the squared differences of each element from the mean
** The total variance divided by the size of the cluster
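The two footnote definitions can be sketched as follows. This is one reading of them, assuming that "the mean" is the cluster centroid and that "each element" is a record in the cluster; the function names and the sample cluster are hypothetical.

```python
def centroid(points):
    """Field-by-field mean of the records in a cluster."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def total_squared_difference(points):
    """Sum of the squared differences of each record from the centroid (*)."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def average_squared_difference(points):
    """The total above divided by the size of the cluster (**)."""
    return total_squared_difference(points) / len(points)

# Hypothetical cluster of four two-field records
cluster = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]

total = total_squared_difference(cluster)
average = average_squared_difference(cluster)
print(total, average)
```

Dividing by the cluster size makes clusters with different numbers of members comparable: a lower average suggests a tighter cluster.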
Evaluating Clusters
Caveats:
(1) Random chance can often produce apparent clusters
(2) Different cluster methods produce different results
Solutions: