
Sas#23-Acc 117

The document provides instructions for performing cluster analysis in Microsoft Excel, including identifying starting points, calculating the distance between data points and cluster centers, assigning cases to initial clusters, calculating the mean of each cluster, and repeating the distance calculation using the cluster means as the new centers. The goal is to group similar data points into clusters in a way that minimizes the distance between points and their assigned cluster center.

Uploaded by

crpa.lina.coc

ACC 117: Statistical Analysis with Software Application

Students Activity Sheet #23

Name: ____________________________________________________________ Class number: _______


Section: ____________ Schedule:______________________________________ Date: ______________

Lesson Title: Microsoft Excel Application: Cluster Analysis and Cell References

Materials: FLM Student Activity Sheets

Lesson Objectives:
1. Identify the cluster analysis in Microsoft Excel
2. Identify and describe the cell references in MS Excel

References:
https://www.clusteranalysis4marketing.com/technical-aspects-cluster-analysis/how-to-run-cluster-analysis-in-excel/
cell reference

Write Everything Down. An easy memory trick is to write everything
down in class. Our brain tends to remember the things we write down
much more than the things we hear. Taking notes makes the words
more visual and helps store them in your long-term memory.

A. LESSON PREVIEW/REVIEW
INTRODUCTION (2 minutes)
Are you an expert in using MS Excel? You already know the basic menus and functions, but
in this lesson you will learn more about cluster analysis and cell references in MS Excel.
Get ready to learn!

Activity 1: What I Know Chart Part 1 (3 minutes)


What do you know about the Microsoft Excel Application? Try answering the questions below by
writing your ideas under the What I Know column. You may use key words or phrases that you
think are related to the questions.

What I Know | Questions | What I Learned (Activity 4)
            | What is cluster analysis in MS Excel? |
            | What is a cell reference in MS Excel? |

B. MAIN LESSON
Activity 2: Content Notes (13 minutes)
Below are the notes about the Microsoft Excel Application. You may underline or highlight words or
phrases that you think are the main focus of the lesson.

This document is the property of PHINMA EDUCATION
CLUSTER ANALYSIS IN MS EXCEL


This is a step-by-step guide on how to run k-means cluster
analysis on an Excel spreadsheet from start to finish. Please
note that there is an Excel template that automatically runs
cluster analysis available for free download on this website. But
if you want to know how to run k-means clustering in Excel
yourself, then this article is for you.

Step One – Start with your data set


For this example I am using 15 cases (or respondents), where
we have the data for three variables – generically labeled X, Y
and Z.
You should notice that the data is scaled 1-5 in this example.
Your data can be in any form except for a nominal data scale
(please see article of what data to use).
Step Two – If just two variables, use a scatter graph on Excel
In this cluster analysis example we are using three variables – but if you have just two variables to
cluster, then a scatter chart is an excellent way to start. And, at times, you can cluster the data via
visual means.
As you can see in this scatter graph, each individual case (what I'm calling a consumer for this
example) has been mapped, along with the average
(mean) for all cases (the red circle).
Depending upon how you view the data/graph – there
appears to be a number of clusters. In this case, you
could identify three or four relatively distinct clusters – as
shown in this next chart.

With this next graph, I have visually identified the probable
clusters and circled them. As I have suggested, this is a good
approach when there are only two variables to
consider – but in this case we have three variables
(and you could have more), so this visual approach
will only work for basic data sets – so now let's look at how to do the Excel calculation for k-means
clustering.
Step Three – Calculate the distance from each data point to the center of a cluster
For this walk-through example, let's assume that we
want to identify three segments/clusters only. Yes,
there are four clusters evident in the diagram above,
but that only looks at two of the variables. Please note
that you can use this Excel approach to identify as
many clusters as you like – just follow the same
concept as explained below.
For k-means clustering you typically pick some random
cases (starting points or seeds) to get the analysis
started.
In this example – as I want to create three clusters,
then I will need three starting points. For these start
points I have selected cases 6, 9 and 15 – but any
random points could also be suitable.
I selected these cases because – when
looking at variable X only – case 6 was the median,
case 9 was the maximum and case 15 was the minimum. This suggests that these three cases are
somewhat different from each other, which makes them good starting points as they are spread out.
How does the calculation work?
Let's look at the first number in the table – case 1,
start 1 = 10.54.
Remember that we have arbitrarily designated
Case 6 to be our random start point for Cluster 1.
We want to calculate the distance, and we use the
sum of squares method – as shown here. For each of the three variables, we
calculate the difference between the case and the start point, then square the
differences, and then sum them. We can do it "mechanically" as shown here – but Excel has a built-in
formula to use: SUMXMY2 – this is far more efficient to
use.
Referring back to Figure 4, we then find the minimum
distance for each case from each of the three start
points – this tells us which cluster (1, 2 or 3) the
case is closest to – which is shown in the 'initial choice'
column.
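The distance calculation can also be sketched outside Excel. The short Python example below is only an illustration, not part of the worksheet: the function name sumxmy2 simply mirrors Excel's SUMXMY2, closest_start is a made-up helper name, and the case and start-point values are hypothetical (they are not the 15 cases used in this lesson).

```python
def sumxmy2(xs, ys):
    """Sum of squared differences, the same quantity Excel's SUMXMY2 computes."""
    if len(xs) != len(ys):
        raise ValueError("both ranges must have the same length")
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


def closest_start(case, starts):
    """Return (cluster number, distances) for one case; clusters count from 1."""
    distances = [sumxmy2(case, start) for start in starts]
    return distances.index(min(distances)) + 1, distances


# Hypothetical (X, Y, Z) values on a 1-5 scale; NOT the worksheet's data.
starts = [(3.0, 3.0, 3.0), (5.0, 4.0, 5.0), (1.0, 2.0, 1.0)]
case = (4.0, 4.0, 5.0)

cluster, distances = closest_start(case, starts)
print(distances)  # one squared distance per start point
print(cluster)    # the "initial choice" for this case
```

With these made-up numbers the case is nearest to the second start point, so its initial choice would be cluster 2.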
Step Four – Calculate the mean (average) of each
cluster set

We have now allocated each case to its initial cluster – and we can lay that out using an IF statement in
a table (as shown in Figure 6).
At the bottom of the table, we have the mean (average) of the cases in each cluster. Now – instead of
relying on just one "representative" data point – we have a set of cases representing each cluster.
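As a rough illustration of this step, the Python sketch below averages each variable over the cases assigned to each cluster. The function name cluster_means, the data and the cluster assignments are all hypothetical and are not taken from Figure 6.

```python
def cluster_means(cases, labels, k):
    """Average each variable over the cases assigned to clusters 1..k."""
    means = []
    for cluster in range(1, k + 1):
        members = [c for c, lab in zip(cases, labels) if lab == cluster]
        if not members:
            means.append(None)  # an empty cluster has no mean
            continue
        # zip(*members) groups the values variable by variable (X, Y, Z)
        means.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return means


# Hypothetical cases (X, Y, Z) and initial cluster choices; NOT the worksheet's data.
cases = [(1.0, 2.0, 1.0), (2.0, 2.0, 1.0), (5.0, 4.0, 5.0),
         (4.0, 4.0, 5.0), (3.0, 3.0, 3.0)]
labels = [3, 3, 2, 2, 1]

means = cluster_means(cases, labels, 3)
for number, mean in enumerate(means, start=1):
    print(number, mean)
```

In Excel, the same result would typically come from averaging only the rows that belong to each cluster, which is what the IF-based table layout makes possible.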

Step Five – Repeat Step 3 – the Distance from the revised mean
The cluster analysis process now becomes a matter of
repeating Steps 4 and 5 (iterations) until the clusters
stabilize.
Each time we use the revised mean for each cluster.
Therefore, Figure 7 shows our second iteration – but
this time we are using the means generated at the
bottom of Figure 6 (instead of the start points from
Figure 1).
You can now see that there has been a slight change
in cluster allocation, with case 9 – one of our starting
points – being reallocated.
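The whole loop (assign each case to its nearest center, recompute the means, repeat until nothing changes) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Excel template's method: the function names are made up, the six cases are hypothetical, clusters are numbered from 0 here, and an empty cluster simply keeps its previous center.

```python
def squared_distance(a, b):
    """Sum of squared differences between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(cases, centers):
    """Index (0-based) of the nearest center for every case."""
    labels = []
    for case in cases:
        distances = [squared_distance(case, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels


def update(cases, labels, centers):
    """Mean of each cluster; an empty cluster keeps its old center (assumption)."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [c for c, lab in zip(cases, labels) if lab == i]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def k_means(cases, start_indices, max_iterations=100):
    """Iterate assignment and mean steps until the clusters stabilize."""
    centers = [cases[i] for i in start_indices]
    labels = None
    for iteration in range(1, max_iterations + 1):
        new_labels = assign(cases, centers)
        if new_labels == labels:  # no case changed cluster: stable
            break
        labels = new_labels
        centers = update(cases, labels, centers)
    return labels, centers, iteration


# Hypothetical data; cases 0, 2 and 4 play the role of the start points.
cases = [(1, 1, 1), (1, 2, 1), (5, 5, 5), (5, 4, 5), (3, 3, 3), (3, 2, 3)]
labels, centers, iterations = k_means(cases, start_indices=[0, 2, 4])
print(labels, centers, iterations)
```

On this tiny data set the assignments stop changing on the second pass, which is the "clusters stabilize" condition described above.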

Final Step – Graph and Summarize the Clusters


After running multiple iterations, we now have the
output to graph and summarize the data.
Here is the output graph for this cluster analysis Excel
example.
As you can see, there are three distinct clusters
shown, along with the centroids (average) of each
cluster – the larger symbols.
We can also present this data in a table form if
required, as we have worked it out in Excel.
Please have a look at the case in Cluster 3 – the small
red square right next to the black dot in the top middle
of the graph. That case sits there because of the
influence of the third variable, which is not shown on
this two variable chart.

CELL REFERENCES IN MS EXCEL


Cell references in Excel are very important. Understand the difference between relative, absolute and
mixed reference, and you are on your way to success.

• Relative Reference
By default, Excel uses relative references. See the formula in cell D2 below. Cell D2
references (points to) cell B2 and cell C2. Both references are relative.
1. Select cell D2, click on the lower right corner of cell D2 and drag it down to cell D5.

Cell D3 references cell B3 and cell C3. Cell D4 references cell B4 and cell C4. Cell D5 references cell
B5 and cell C5. In other words: each cell references its two neighbors on the left.
• Absolute Reference
See the formula in cell E3 below.
1. To create an absolute reference to cell H3, place a $ symbol in front of the column letter and row
number ($H$3) in the formula of cell E3.

2. Now we can quickly drag this formula to the other cells.

The reference to cell H3 is fixed (when we drag the formula down and across). As a result, the correct
lengths and widths in inches are calculated.
• Mixed Reference
Sometimes we need a combination of relative and absolute reference (mixed reference).
1. See the formula in cell F2 below.

2. We want to copy this formula to the other cells quickly. Drag cell F2 across one cell, and look at the
formula in cell G2.

Do you see what happens? The reference to the price should be a fixed reference to column B.
Solution: place a $ symbol in front of the column letter ($B2) in the formula of cell F2. In a similar way,
when we drag cell F2 down, the reference to the reduction should be a fixed reference to row 6.
Solution: place a $ symbol in front of the row number (B$6) in the formula of cell F2.

Result:

Note: we don't place a $ symbol in front of the row number of $B2 (this way we allow the reference to
change from $B2 (Jeans) to $B3 (Shirts) when we drag the formula down). In a similar way, we don't
place a $ symbol in front of the column letter of B$6 (this way we allow the reference to change from
B$6 (Jan) to C$6 (Feb) and D$6 (Mar) when we drag the formula across).
3. Now we can quickly drag this formula to the other cells.

The references to column B and row 6 are fixed.
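To see the three reference types side by side, the Python sketch below imitates what happens to a single cell reference when a formula is dragged down or across. It is a simplified model for illustration only, not how Excel is implemented: the function names are made up, and it handles one A1-style reference at a time (no ranges or sheet names).

```python
import re

# Optional $ before the column letters and before the row number.
_REF = re.compile(r"^(\$?)([A-Z]+)(\$?)([0-9]+)$")


def col_to_number(letters):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def number_to_col(n):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def copy_reference(ref, rows_down, cols_right):
    """Shift a reference as if its formula were copied; $ parts stay fixed."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    col_lock, col, row_lock, row = match.groups()
    new_col = col_to_number(col) + (0 if col_lock else cols_right)
    new_row = int(row) + (0 if row_lock else rows_down)
    if new_col < 1 or new_row < 1:
        raise ValueError("reference would fall outside the sheet")
    return f"{col_lock}{number_to_col(new_col)}{row_lock}{new_row}"


print(copy_reference("B2", 1, 0))    # relative: the row moves with the formula
print(copy_reference("$H$3", 2, 1))  # absolute: nothing moves
print(copy_reference("$B2", 1, 1))   # mixed: column fixed, row moves
print(copy_reference("B$6", 1, 1))   # mixed: row fixed, column moves
```

The last two calls match the note above: $B2 becomes $B3 when dragged down, and B$6 becomes C$6 when dragged across.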

Activity 3: Skill Building Activities (20 minutes)


Directions: Have you studied the content notes? That's great! Now, give an
example of each type of cell reference, following the steps in the notes.

TYPE OF REFERENCES | EXAMPLES/STEPS
1. Relative Reference |
2. Absolute Reference |

3. Mixed Reference |

Activity 4: What I Know Chart Part 2 (2 minutes)


Now let's check your understanding about the lesson for today. I hope that everything about the
topic is clear to you. This time you have to fill out the What I Learned column in Activity 1 Part 1.

Activity 5: Check for Understanding (5 minutes)


Directions: You already know the lesson for today's session. This time, let us check your
understanding by enumerating the steps in Cluster Analysis.

C. LESSON WRAP-UP
Activity 6: Thinking about Learning (5 minutes)
A. Work Tracker
You are done with this session! Let's track your progress. Shade the session number you just
completed.

B. Think About Learning


1. Please read again the learning targets for the day. Were you able to achieve those learning
targets? If yes, what helped you achieve them? If no, what is the reason for not achieving
them?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

2. What question(s) do you have as we end this lesson?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

FAQs

1. Any tips on how to avoid errors in MS Excel?

Avoid errors by using named ranges


Excel allows you to give names to cells and cell ranges. Just select the cell or range, type a name into the
Name Box (the small text box at the top left corner of the screen, next to the formula bar) and press Enter.
In your formulas, you can then use the name instead of the reference.

Job well done! You’ve finished today’s activity.

