Data Mining Cluster

This document discusses using data mining techniques to analyze data on Football Bowl Subdivision schools and their characteristics. It describes applying hierarchical and k-means clustering to group schools based on attributes like stadium size, location, enrollment, and endowment. The analysis aims to form balanced conferences but faces challenges with uneven cluster sizes. Market basket analysis on web browsing data finds strong associations between Facebook, Twitter, and YouTube that can guide targeted advertising.

Uploaded by

api-315994488


BUS 443: Business Analytics

Data Mining Case

PART 1: DATA MINING TECHNIQUES TO FIND PATTERNS (UNSUPERVISED LEARNING)
Problem 1: Hierarchical Cluster Analysis with the Football Bowl Subdivision (FBS)
We started this example in class and will now do some further analysis. The Football Bowl Subdivision
(FBS) of the National Collegiate Athletic Association (NCAA) consists of over 100 schools. Most of
these schools belong to one of several conferences, or collections of schools, that compete with each other
on a regular basis in collegiate sports. Suppose the NCAA has commissioned a study that will propose the
formation of conferences based on the similarities of the constituent schools.
1. Open the FBS file (found in the Chapter 6 textbook files) that contains rows of information on
constituent FBS schools. Apply hierarchical clustering with 10 clusters using football stadium
capacity, latitude, longitude, endowment, and enrollment as variables. Use Ward's method as the
clustering algorithm. Be sure to normalize the data. Copy the assigned cluster column to the data
sheet.
2. Use a Pivot Table on the data in the HC_Clusters sheet to identify the cluster with the largest
average football stadium capacity. Which cluster and school have the highest?
a. Cluster 2 has the largest average stadium capacity
b. Tennessee has the largest stadium capacity
3. How would you characterize the universities in this cluster?
a. The schools in this cluster are in the Southeast and have high-capacity stadiums as
well as large enrollment numbers.
4. What is the smallest cluster (the one with the fewest observations) and what makes it unique?
a. The smallest cluster was cluster 4 (Stanford)
b. Stanford has a large endowment and it is the only school in its cluster
5. Examine the dendrogram on the HC_Dendrogram worksheet (as well as the sequence of clustering
stages in the HC_Output sheet). What number of clusters seems to be the most natural fit based on
the distance?
a. After examining the dendrogram, we found that somewhere between 9 and 11 clusters would
be ideal.
6. Create another pivot table and count the number of schools per cluster. Analyze the results. Why
aren't these cluster results appropriate, or (restated) why should we rerun the cluster analysis using
different variables or a different number of clusters?
a. We had one cluster with 30 schools and another with only 1. This is unacceptable because
clusters are supposed to group similar schools together, and the cluster sizes here are far
from uniform.
b. This is included in our large pivot table and is highlighted in red.
7. Apply hierarchical clustering again with 10 clusters using just latitude and longitude as the
variables. Be sure to normalize the data and specify single linkage as the clustering method. Use a
Pivot Table on the data in HC_Clusters. You can also visualize the clusters with a scatter plot with
longitude as the x-variable and latitude as the y-variable. Compare the clusters to the previous
method. Which is the better method?
a. We found that Ward's method was the superior clustering technique. With single
linkage, the schools were poorly distributed: one large cluster contained 98 schools,
and several clusters had only one school. Ultimately, longitude and latitude alone are
not good variables for clustering colleges, and single linkage clustering yielded a
poor result.
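
The contrast between the two linkage methods can be reproduced outside XLMiner. The following is a minimal sketch in plain Python, not the XLMiner procedure itself: the positions list is hypothetical one-feature data (a chain of points with two stragglers), and the function names are illustrative. It normalizes the data, then merges the cheapest pair of clusters until three remain.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_cost(a, b, method):
    """Cost of merging clusters a and b, each given as a list of points."""
    if method == "single":
        # Single linkage: distance between the two closest members.
        return min(sqdist(p, q) for p in a for q in b)
    # Ward: increase in within-cluster sum of squares caused by the merge.
    ca = [mean(c) for c in zip(*a)]
    cb = [mean(c) for c in zip(*b)]
    return len(a) * len(b) / (len(a) + len(b)) * sqdist(ca, cb)


def agglomerate(rows, k, method="ward"):
    """Repeatedly merge the cheapest pair of clusters until k remain."""
    members = [[i] for i in range(len(rows))]  # start with one cluster per row
    while len(members) > k:
        pairs = ((i, j) for i in range(len(members))
                 for j in range(i + 1, len(members)))
        i, j = min(pairs, key=lambda ij: merge_cost(
            [rows[m] for m in members[ij[0]]],
            [rows[m] for m in members[ij[1]]],
            method))
        members[i].extend(members.pop(j))
    labels = [0] * len(rows)
    for cluster_id, idxs in enumerate(members, start=1):
        for m in idxs:
            labels[m] = cluster_id
    return labels


# Hypothetical data: a chain of ten points plus two stragglers.
positions = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0, 7.5, 9.1, 10.8, 12.6, 14.6, 17.0]
X = zscore([[p] for p in positions])

for method in ("single", "ward"):
    sizes = sorted(Counter(agglomerate(X, 3, method)).values())
    print(method, sizes)
```

On this toy data single linkage produces cluster sizes of 1, 1, and 10, while Ward's method produces 4, 4, and 4. This is the same chaining pattern seen above, where single linkage left 98 schools in one cluster.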
Problem 2: k-Means Cluster Analysis with the Football Bowl Subdivision (FBS)
1. Open the FBS file used in Problem 1 and copy the data to a new workbook. Delete the cluster
column from the hierarchical clustering in Problem 1.
2. Apply k-Means clustering with k=10 using football stadium capacity, latitude, longitude,
endowment, and enrollment as variables. Specify 50 iterations and 10 random starts and
normalize the data.
3. Analyze the resultant clusters. What is the smallest cluster (the one with the fewest observations)?
a. The smallest cluster is cluster 5
4. What is the least dense (aka most diverse) cluster, as measured by the largest average distance in
the cluster? What makes the least dense cluster so diverse?
a. Cluster 1 is the least dense
b. It is so diverse because its observations are more spread out than those of a tightly
concentrated cluster. The density is low because of the distance between members and
the relatively small number of observations: only 5 universities are grouped together.
5. What problems do you see with the plan of defining the school membership of the 10 conferences
directly with these 10 clusters?
a. Cluster 2 has only 3 schools, which would be awful for an FBS conference.
b. Cluster 5 is also too small, with only 1 school in that cluster.
c. Cluster 7 is an outlier, with 27 schools in the cluster.
d. Overall, the sizes of these clusters span a wide range, from 1 to 27,
which makes for a lot of variance.
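
The k-means settings used above (normalized data, a cap of 50 iterations, 10 random starts) and the two diagnostics (cluster size and average distance within the cluster) can be sketched in plain Python. This is an illustrative sketch, not XLMiner's implementation: the rows in raw are hypothetical (stadium capacity, enrollment) pairs, k is set to 3 to suit the tiny sample, and "average distance" is assumed to mean the mean distance from each member to its cluster centroid.

```python
import random
from statistics import mean, pstdev


def zscore(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def lloyd(rows, k, iters, rng):
    """One k-means run from a random start; returns (sse, labels, centers)."""
    centers = rng.sample(rows, k)  # start from k randomly chosen rows
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist(r, centers[c])) for r in rows]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(k):
            pts = [r for r, lab in zip(rows, labels) if lab == c]
            if pts:  # keep the old center if a cluster empties out
                centers[c] = [mean(col) for col in zip(*pts)]
    sse = sum(dist(r, centers[lab]) ** 2 for r, lab in zip(rows, labels))
    return sse, labels, centers


def kmeans(rows, k, iters=50, starts=10, seed=0):
    """Best of several random starts, judged by within-cluster sum of squares."""
    rng = random.Random(seed)
    runs = (lloyd(rows, k, iters, rng) for _ in range(starts))
    return min(runs, key=lambda run: run[0])


def summarize(rows, labels, centers):
    """Per cluster: (number of members, average distance to the centroid)."""
    out = {}
    for c in sorted(set(labels)):
        d = [dist(r, centers[c]) for r, lab in zip(rows, labels) if lab == c]
        out[c] = (len(d), mean(d))
    return out


# Hypothetical (stadium capacity, enrollment) rows, for illustration only.
raw = [[100000, 45000], [95000, 50000], [90000, 40000], [30000, 20000],
       [25000, 15000], [35000, 18000], [60000, 30000], [55000, 28000],
       [65000, 33000], [50000, 8000]]
X = zscore(raw)

sse, labels, centers = kmeans(X, k=3, iters=50, starts=10, seed=0)
for cluster, (size, avg_dist) in summarize(X, labels, centers).items():
    print(cluster, size, round(avg_dist, 3))
```

The cluster with the fewest members answers the "smallest cluster" question, and the one with the largest average distance is the least dense. Nothing in the algorithm forces the sizes to be equal, which is why the 10 clusters above range from 1 to 27 schools.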
Problem 3: Both Types of Cluster Analysis with the Football Bowl Subdivision (FBS)
The NCAA has a preference for conferences consisting of similar schools with respect to their
endowment, enrollment, and football stadium size, but these conferences must be in the same geographic
region to reduce traveling costs. Take the following steps to address this desire.
1. Apply k-means clustering again (in a new worksheet) using latitude and longitude as variables
with k=3. Be sure to normalize and specify 50 iterations and 10 random starts. Then create one
distinct data set (one spreadsheet) for each of the three regional clusters (east, west, and south).
2. For the west cluster, apply hierarchical clustering with Ward's method and use normalized data to
form two sub-clusters using football stadium capacity, endowment, and enrollment as variables.
Use a PivotTable on the data in HC_Clusters to report the characteristics of each cluster.
Row Labels    Average of Enrollment    Average of StadiumCapacity    Average of Endowment ($000)    Count of SubCluster
1             26589.2381               49088.71429                   842519.4762                    21
2             19945                    50000                         16502606                       1
Grand Total   26287.22727              49130.13636                   1554341.591                    22

Cluster 1 has 21 schools while cluster 2 has only 1 school. Cluster 2 has a significantly higher
endowment.

3. Do the same for the east cluster, using three sub-clusters.

Row Labels    Average of Stadium Capacity    Average of Endowment ($000)    Average of Enrollment    Count of Sub-Cluster
1             63568.4                        1336091.8                      32963.4                  25
2             63347.66667                    5866583.5                      21313                    6
3             34350.73077                    193019.3462                    24231.80769              26
Grand Total   50217.80702                    1291584.193                    27754.21053              57

Clusters 1 and 3 have a similar number of schools, while cluster 2 is made up of only 6
schools.
a. Cluster 1
4. Do the same for the south cluster, using four sub-clusters.
Row Labels    Count of SubCluster    Average of StadiumCapacity    Average of Endowment ($000)    Average of Enrollment
1             17                     39736.11765                   113253.5882                    25873.17647
2             2                      85812                         3652205.5                      22330.5
3             21                     66754.7619                    547584.5238                    29637.04762
4             8                      66461.125                     1191190.375                    22726
Grand Total   48                     57930.77083                   630385.8333                    26847.72917

Cluster 2 has only 2 schools. The number of schools in each cluster isn't balanced.

5. What problems do you see with this plan? How could this approach be tweaked to solve the
problem?
a. Latitude and longitude don't necessarily capture the proximity of the schools to
each other. For example, Hawaii was placed in the south division when logically it should
be in the west. It might be necessary to manually alter some of the clusters because of this.
b. Within each region there is still an uneven number of schools within each sub-cluster.
This problem could be reduced by adding a North region. Creating more geographical
regions besides East, South, West, and North could expand this solution further. Getting
more data on each school, such as rankings, would also help cluster them better.
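
One possible tweak for the Hawaii issue is to measure travel distance directly instead of clustering on normalized latitude and longitude, where a rescaled degree of latitude counts the same as a rescaled degree of longitude. The sketch below uses the haversine formula for great-circle distance. The city coordinates are approximate and stand in for campus locations; they are not taken from the FBS file.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Approximate city coordinates, used only for illustration.
honolulu = (21.31, -157.86)
los_angeles = (34.05, -118.24)
miami = (25.76, -80.19)

print(round(haversine_km(honolulu, los_angeles)))  # roughly 4,100 km
print(round(haversine_km(honolulu, miami)))        # roughly 7,800 km
```

A matrix of such pairwise distances could feed a clustering method that accepts custom distances, such as hierarchical clustering with average linkage. Whether that alone would move Hawaii into the west region depends on the full data set.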
Problem 4: Market Basket Analysis on Cookie Monster, Inc. (Problem 8 in our Textbook)
Cookie Monster Inc. is a company that specializes in the development of software that tracks Web
browsing history of individuals.
1. Open the CookieMonster file and review the binary matrix format. The entry in each row and column
indicates whether the column website was visited by the row user. Using a minimum support of
800 transactions and a minimum confidence of 50%, use XLMiner to generate a list of
association rules.
2. Review the top 14 rules. What information does this analysis provide Cookie Monster regarding
the online behavior of individuals? Be sure to address the lift ratios (and the meaning of the lift
ratios) in common terms that a business user would immediately understand.
a. The lift ratio is a measure of the usefulness of a rule. It is the rule's confidence (the
support of the antecedent and consequent together divided by the support of the
antecedent) divided by the share of all users who visited the consequent site. In plain
terms, it tells how many times more likely a user who visited the antecedent sites is to
visit the consequent site than a randomly chosen user. This information regarding
online behavior indicates that there is a strong association between Facebook, Twitter,
and YouTube. The highest lift ratios come from rules in which any two of these sites
lead to the third. The analysis also identifies the rules with low lift ratios, which are
less effective at predicting customers' click patterns. If you know customers are going
to visit all three of these sites, you could save money by advertising on only one, or
flood the market by advertising on all three.
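
The support, confidence, and lift of a single rule can be computed directly from a binary matrix like the one in the CookieMonster file. The sketch below is illustrative only: the ten rows are made-up users, the News column is a hypothetical fourth site, and rule_metrics is not an XLMiner function.

```python
def rule_metrics(rows, columns, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    rows is a binary matrix: rows[u][s] is 1 if user u visited site s.
    """
    idx = {name: i for i, name in enumerate(columns)}

    def count(sites):
        # Number of users who visited every site in the list.
        return sum(all(r[idx[s]] for s in sites) for r in rows)

    support = count(antecedent + consequent)
    confidence = support / count(antecedent)
    benchmark = count(consequent) / len(rows)  # share of all users
    return support, confidence, confidence / benchmark


columns = ["Facebook", "Twitter", "YouTube", "News"]
# Hypothetical browsing histories, one row per user.
visits = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

support, confidence, lift = rule_metrics(
    visits, columns, ["Facebook", "Twitter"], ["YouTube"])
print(support, confidence, round(lift, 3))
```

In this toy matrix, 3 of the 4 users who visited both Facebook and Twitter also visited YouTube (confidence 0.75), while 4 of all 10 users visited YouTube. The lift is therefore 0.75 / 0.4, about 1.9: a Facebook and Twitter visitor is almost twice as likely to visit YouTube as a randomly chosen user.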
