Sas#23-Acc 117
A. LESSON PREVIEW/REVIEW
INTRODUCTION (2 minutes)
Are you an expert in using MS Excel? You probably already know its basic menus and functions, but in
this lesson you will learn more about cluster analysis and cell references in MS Excel.
Get ready to learn!
B. MAIN LESSON
Activity 2: Content Notes (13 minutes)
Below are the notes about the Microsoft Excel application. You may underline or highlight words or
phrases that you think are the main focus of the lesson.
This document is the property of PHINMA EDUCATION
ACC 117: Statistical Analysis with Software Application
Students Activity Sheet #23
will only work for basic data sets, so now let's look at how to do the Excel calculation for k-means
clustering.
Step Three – Calculate the distance from each data point to the center of a cluster
For this walk-through example, let's assume that we want to identify only three segments/clusters.
Yes, there are four clusters evident in the diagram above, but that diagram looks at only two of the
variables. Please note that you can use this Excel approach to identify as many clusters as you like;
just follow the same concept as explained below.
For k-means clustering you typically pick some random
cases (starting points or seeds) to get the analysis
started.
In this example, because I want to create three clusters, I need three starting points. For these start
points I have selected cases 6, 9 and 15, but any random points could also be suitable.
I selected these cases because, when looking at variable X only, case 6 is the median, case 9 is the
maximum and case 15 is the minimum. This suggests that these three cases are somewhat different from
each other and spread out, which makes them good starting points.
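To make the seed rule concrete, here is a minimal Python sketch (Python is used only for illustration; the lesson itself works in Excel). The function name `pick_seeds` and the sample X values are hypothetical. It uses `statistics.median_low` so that the median is always one of the actual cases, even when the number of cases is even.

```python
import statistics


def pick_seeds(x_values):
    """Return the 1-based case numbers whose X value is the median,
    the maximum and the minimum, in that order."""
    cases = range(1, len(x_values) + 1)
    # median_low always returns a value that actually occurs in the data.
    median = statistics.median_low(x_values)
    seed_median = next(c for c in cases if x_values[c - 1] == median)
    seed_max = max(cases, key=lambda c: x_values[c - 1])
    seed_min = min(cases, key=lambda c: x_values[c - 1])
    return seed_median, seed_max, seed_min


# Hypothetical X values for cases 1 to 5.
print(pick_seeds([4, 9, 2, 7, 5]))  # (5, 2, 3)
```

When there are ties, the first matching case is chosen, which is one reasonable convention among several.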
How does the calculation work?
Let's look at the first number in the table: case 1, start 1 = 10.54.
Remember that we have arbitrarily designated case 6 as the start point for Cluster 1.
To calculate the distance we use the sum of squares method, as shown here: we calculate the difference
between case 1 and case 6 for each of the three data points (variables) in the set, square the
differences, and then sum them. We can do this “mechanically”, as shown here, but Excel has a built-in
function, SUMXMY2, that is far more efficient to use.
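Excel's SUMXMY2(array_x, array_y) returns the sum of the squares of the differences between corresponding values in two ranges. The Python sketch below mirrors that calculation; in the workbook you would write something like =SUMXMY2(B2:D2, B7:D7), where those cell ranges are hypothetical and depend on your own layout. Note that the result is a squared distance with no square root; because skipping the square root does not change which distance is smallest, the nearest start point is the same either way.

```python
def sumxmy2(xs, ys):
    """Sum of squared differences of paired values, like Excel's SUMXMY2."""
    if len(xs) != len(ys):
        # Excel returns #N/A when the two ranges differ in size.
        raise ValueError("both ranges must have the same number of values")
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical case and start point, three variables each.
print(sumxmy2([1, 2, 3], [4, 6, 3]))  # 25  (9 + 16 + 0)
```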
Referring back to Figure 4, we then find, for each case, the minimum of its distances to the three start
points. This tells us which cluster (1, 2 or 3) the case is closest to, which is shown in the 'initial
choice' column.
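The 'initial choice' step can be sketched the same way: compute the squared distance from a case to every start point and keep the position of the smallest. In Excel one possible formula for this is =MATCH(MIN(E2:G2), E2:G2, 0), assuming (hypothetically) that a case's three distances sit in E2:G2. The function name `initial_choice` and the sample data below are also illustrative.

```python
def initial_choice(cases, seeds):
    """For each case, return the 1-based number of the nearest seed.

    Distance is the sum of squared differences (what SUMXMY2 computes).
    On a tie the first seed wins, as MATCH(..., 0) returns the first match.
    """
    choices = []
    for case in cases:
        distances = [sum((a - b) ** 2 for a, b in zip(case, seed)) for seed in seeds]
        # Position of the smallest distance, +1 so clusters are numbered 1, 2, 3.
        choices.append(distances.index(min(distances)) + 1)
    return choices


# Hypothetical two-variable cases and start points.
print(initial_choice([(1, 1), (9, 9), (5, 4)], [(0, 0), (10, 10), (5, 5)]))  # [1, 2, 3]
```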
Step Four – Calculate the mean (average) of each cluster set
We have now allocated each case to its initial cluster, and we can lay that out in a table using an IF
statement (as shown in Figure 6).
At the bottom of the table, we have the mean (average) of the cases in each cluster. Now, instead of
relying on just one “representative” data point, we have a set of cases representing each cluster.
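Here is a minimal sketch of the Step Four calculation, assuming each case is a tuple of variable values and `choices` holds the initial cluster numbers (both hypothetical inputs). It follows the IF-table idea: a case contributes to a cluster only when its choice matches (in Excel, something like =IF($H2=1, B2, ""), with hypothetical cell addresses), and the blanks are skipped when averaging, just as AVERAGE ignores text such as "" in a range.

```python
def cluster_means(cases, choices, k):
    """Mean of each variable over the cases assigned to each cluster 1..k."""
    n_vars = len(cases[0])
    means = []
    for j in range(1, k + 1):
        # Keep only the cases whose initial choice is cluster j (the IF test).
        members = [case for case, c in zip(cases, choices) if c == j]
        if not members:
            # Empty cluster: Excel's AVERAGE would return #DIV/0! here.
            means.append(None)
            continue
        means.append(tuple(sum(case[v] for case in members) / len(members)
                           for v in range(n_vars)))
    return means


# Hypothetical cases and their initial choices.
print(cluster_means([(1, 2), (3, 4), (10, 10)], [1, 1, 2], 2))  # [(2.0, 3.0), (10.0, 10.0)]
```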
Relative Reference
By default, Excel uses relative references. See the formula in cell D2 below. Cell D2 references
(points to) cell B2 and cell C2. Both references are relative.
1. Select cell D2, click on the lower right corner of cell D2 and drag it down to cell D5.
Cell D3 references cell B3 and cell C3. Cell D4 references cell B4 and cell C4. Cell D5 references cell
B5 and cell C5. In other words: each cell references its two neighbors on the left.
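To see why the references move, here is a small Python sketch that rewrites a formula the way Excel does when you fill it down, assuming every reference in it is relative. The formula =B2*C2 is hypothetical (the actual formula in D2 is shown in the figure), and the regular expression only handles plain references such as B2, not $ signs or sheet names.

```python
import re

# A plain A1-style reference such as B2 or C10 (no $ signs).
REF = re.compile(r"\b([A-Z]+)(\d+)\b")


def fill_down(formula: str, rows: int) -> str:
    """Rewrite the formula as it would read `rows` rows lower,
    assuming every reference in it is relative."""
    return REF.sub(lambda m: f"{m.group(1)}{int(m.group(2)) + rows}", formula)


for offset in range(4):  # cells D2, D3, D4, D5
    print(f"D{2 + offset}: {fill_down('=B2*C2', offset)}")
```

This prints =B2*C2, =B3*C3, =B4*C4 and =B5*C5: each copy keeps pointing at the two cells to its left.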
Absolute Reference
See the formula in cell E3 below.
1. To create an absolute reference to cell H3, place a $ symbol in front of the column letter and row
number ($H$3) in the formula of cell E3.
The reference to cell H3 stays fixed when we drag the formula down and across. As a result, the correct
lengths and widths in inches are calculated.
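The sketch below extends the idea to $ signs: when a formula is copied, a column letter or row number preceded by $ stays put and everything else shifts. The formula =B3*$H$3 in E3 is a hypothetical stand-in for the one in the figure, and `copy_formula` is an illustrative helper, not an Excel feature.

```python
import re

# A1 reference with optional $ anchors: H3, $H3, H$3 or $H$3.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col: str) -> int:
    """Column letters to a number: 'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    """Inverse of col_to_num: 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def copy_formula(formula: str, rows: int, cols: int) -> str:
    """Shift references as when copying `rows` down and `cols` across;
    any part preceded by $ is left unchanged."""
    def shift(m):
        col_abs, col, row_abs, row = m.groups()
        if not col_abs:
            col = num_to_col(col_to_num(col) + cols)
        if not row_abs:
            row = str(int(row) + rows)
        return f"{col_abs}{col}{row_abs}{row}"
    return REF.sub(shift, formula)


print(copy_formula("=B3*$H$3", 1, 0))  # =B4*$H$3  (one row down)
print(copy_formula("=B3*$H$3", 0, 1))  # =C3*$H$3  (one column across)
```

The same rule covers mixed references such as $B2 or B$6, where only one part is pinned.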
Mixed Reference
Sometimes we need a combination of relative and absolute references (mixed reference).
1. See the formula in cell F2 below.
2. We want to copy this formula to the other cells quickly. Drag cell F2 across one cell, and look at the
formula in cell G2.
Do you see what happens? The reference to the price should be a fixed reference to column B.
Solution: place a $ symbol in front of the column letter ($B2) in the formula of cell F2. In a similar way,
when we drag cell F2 down, the reference to the reduction should be a fixed reference to row 6.
Solution: place a $ symbol in front of the row number (B$6) in the formula of cell F2.
Result:
Note: we don't place a $ symbol in front of the row number of $B2 (this way we allow the reference to
change from $B2 (Jeans) to $B3 (Shirts) when we drag the formula down). In a similar way, we don't
place a $ symbol in front of the column letter of B$6 (this way we allow the reference to change from
B$6 (Jan) to C$6 (Feb) and D$6 (Mar) when we drag the formula across).
3. Now we can quickly drag this formula to the other cells.
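Filling a mixed-reference formula over a block pairs every row value with every column value. The Python sketch below assumes F2 contains =$B2*B$6 (multiplication is an assumption; the actual formula is shown only in the figure), with prices down column B and reductions along row 6. The hypothetical helper `mixed_formula` shows the text in each copied cell, and `fill_mixed` the resulting values for made-up numbers.

```python
def mixed_formula(rows_down: int, cols_across: int) -> str:
    """Formula text in the cell rows_down below and cols_across right of F2,
    after copying the assumed formula =$B2*B$6 there."""
    col = chr(ord("B") + cols_across)  # B$6 -> C$6 -> D$6 (single letters only)
    return f"=$B{2 + rows_down}*{col}$6"


def fill_mixed(prices, reductions):
    """Values of the filled block: row r always uses price r (column B pinned),
    column c always uses reduction c (row 6 pinned)."""
    return [[price * reduction for reduction in reductions] for price in prices]


print(mixed_formula(0, 0), mixed_formula(1, 2))  # =$B2*B$6 =$B3*D$6
# Hypothetical prices (Jeans, Shirts) and reductions (Jan, Feb, Mar).
for row in fill_mixed([20, 10], [0.5, 0.25, 0.75]):
    print(row)  # [10.0, 5.0, 15.0] then [5.0, 2.5, 7.5]
```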
TYPE OF REFERENCES        EXAMPLES/STEPS
1. Relative Reference
2. Absolute Reference
3. Mixed Reference
C. LESSON WRAP-UP
Activity 6: Thinking about Learning (5 minutes)
A. Work Tracker
You are done with this session! Let's track your progress. Shade the session number you just
completed.
FAQs