Lesson 6 Advanced Statistics

The document discusses measures of central tendency, including mean, median, and mode, explaining their definitions, properties, and appropriate usage in different data distributions. It also covers measures of variability, such as range and variance, detailing how to calculate them and their significance in understanding data dispersion. Additionally, the document provides instructions on using Microsoft Excel's Data Analysis ToolPak for statistical calculations.

UNIVERSITY OF CAGAYAN VALLEY

(Formerly Cagayan Colleges Tuguegarao)


Tuguegarao City, Cagayan, Philippines
SCHOOL OF LIBERAL ARTS AND TEACHER EDUCATION
MEASURES OF CENTRAL TENDENCY
WEIGHTED MEAN
MEASURES OF VARIABILITY OR DISPERSION

DISCUSSION:

A measure of central tendency is a value used to represent the typical or “average” value in a
data set.

Three Common Measures of Central Tendency

• Mean – the sum of all data values divided by the number of values in the data set. The
mean of a sample data set is denoted by $\bar{x}$ and the mean of a population data set by the
Greek letter $\mu$.

$$\bar{x} = \frac{\sum x}{n} \qquad \mu = \frac{\sum x}{N}$$
Exercise: Find the mean of the following data set:

Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8

• Median – the value which separates the highest 50% of data values from the lowest 50%.
To calculate the median, place the data values in numerical order. If n is odd, the middle value
is the median. If n is even, the mean of the two middle values is the median.

Exercise: Find the median value for the set of quiz scores.
Find the median if the low score of 1 is dropped.

• Mode – the data value (or values) that appears the largest number of times in the set.
If no data value is repeated, we say that there is no mode.

Exercise: Find the mode(s) of the quiz score data set.
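
The three definitions above can be sketched in Python with the standard `statistics` module. The scores below are hypothetical and chosen only for illustration (they are not the quiz scores from the exercises), so this is a sketch rather than a worked solution.

```python
import statistics

# Hypothetical scores, chosen only for illustration (not from the lesson).
scores = [3, 5, 5, 8, 9]

mean_value = statistics.mean(scores)      # sum of values / number of values
median_value = statistics.median(scores)  # middle value of the sorted data
modes = statistics.multimode(scores)      # list of the most frequent value(s)

print(mean_value, median_value, modes)
```

Note that `statistics.multimode` returns every value when none is repeated, whereas this lesson's convention is to say that such a data set has no mode.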

Properties of Mean, Median, and Mode

• The mean is the most commonly used measure of central tendency.

• One drawback of the mean is that it is heavily influenced by a few very high or very low
data values. In these cases it is more common to use the median.

Example: Median household income in the U.S.

• The mode has the advantage that it can be used to describe data sets even if they contain
only qualitative data. A disadvantage is that a data set may not have a mode.

Example: Modal college major.
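
The influence of extreme values on the mean, noted in the properties above, can be seen in a short Python sketch. The income figures are hypothetical (in thousands) and serve only to show the effect of a single very high value.

```python
import statistics

# Hypothetical household incomes in thousands (illustrative only).
incomes = [30, 35, 40, 45, 50]
with_outlier = incomes + [1000]  # add one very high income

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)  # mean and median agree without the outlier
print(mean_after, median_after)    # the mean jumps, the median barely moves
```

With these numbers the mean rises from 40 to 200 while the median only moves from 40 to 42.5, which is why the median is usually reported for income data.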

Weighted Means

A weighted mean is used when we want some data values in a set to count more heavily in the
calculation of the mean than others.
In this case, we attach a numerical weight to each value and calculate the mean as follows:

$$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$$
Note: This is equivalent to counting each data value the number of times given by its weight.

Examples:

• Grade point average. We assign the letter grades the number values A=4, B=3, C=2,
D=1, F=0, and then each grade value is counted into the GPA according to the number of
credits earned with that grade.

• Course grade. The final grade in this course is calculated according to the following
scale: Homework counts for 15%, 3 exams count 20% each, and the final exam is worth
25%. We can weight the score for each component of the final grade with its percentage
to calculate the final grade.
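
As a sketch of the course-grade example, the weighted mean formula can be written as a small Python function. The function name `weighted_mean` and the component scores are hypothetical; only the weights (15%, three exams at 20% each, and 25%) come from the example above.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical component scores: homework, exam 1, exam 2, exam 3, final exam.
component_scores = [90, 80, 70, 85, 88]
# Percentages from the course-grade example.
component_weights = [15, 20, 20, 20, 25]

final_grade = weighted_mean(component_scores, component_weights)
print(final_grade)
```

Because the weights sum to 100, dividing by the total weight simply converts the weighted sum back to the original score scale.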

Summary

The Mean is used in computing other statistics (such as the variance) and does not exist for
open-ended grouped frequency distributions (1). It is often not appropriate for skewed
distributions such as salary information.

The Median is the center number and is good for skewed distributions because it is resistant to
extreme values.

The Mode is used to describe the most typical case. The mode can be used with nominal data
whereas the others can't. The mode may or may not exist and there may be more than one value
for the mode (2).

The Midrange is not used very often. It is a very rough estimate of the average and is greatly
affected by extreme values (even more so than the mean).

Property                      Mean      Median    Mode      Midrange
Always exists                 No (1)    Yes       No (2)    Yes
Uses all data values          Yes       No        No        No
Affected by extreme values    Yes       No        No        Yes
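
The sensitivity of the midrange to extreme values can be sketched in Python. The midrange is taken here to be the mean of the smallest and largest values, which is its usual definition but is not spelled out in this lesson; the helper name `midrange` is hypothetical. The data are the quiz scores from the exercises above.

```python
def midrange(values):
    """Midrange: halfway between the smallest and largest value."""
    return (min(values) + max(values)) / 2


quiz_scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]

full_midrange = midrange(quiz_scores)          # uses only the extremes 1 and 10
trimmed_midrange = midrange(quiz_scores[1:])   # same data with the low score of 1 dropped

print(full_midrange, trimmed_midrange)
```

Dropping a single low score moves the midrange from 5.5 to 7.5, because the statistic depends on nothing but the two most extreme values.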

Benefits of the different measures of central tendency

Now that we understand the three different ways to calculate the center of the
distribution, the natural question is which should be used to describe the center of a particular
distribution. For normal distributions, it turns out that the mean, median and mode are all the
same, so it does not really matter which you use. However, for a normal distribution, we usually
use the mean to represent the center of the distribution.
For distributions that are close to normal, the mean is still the appropriate value because it
will best represent the center of the distribution. Because most of the distributions in the social
sciences are relatively normal, the mean is by far the most common measure of the center of a
distribution.

For skewed distributions, however, the mean is not the best indicator of the center of the
distribution. Instead, for a skewed distribution, either the median or the mode better represents
the center of the distribution. Typically, the mode is used to indicate the center of strongly
skewed distributions. The median is appropriate when the skew is less severe.

MEASURES OF VARIABILITY OR DISPERSION

Another type of information we need to effectively describe a distribution of scores is how
spread out (or dispersed) the scores are within the data set. If the scores in one data set are
much more spread out than those in another data set, the distributions will be very different
even if they have the same mean. Statistics that indicate how spread out the scores are in a
distribution are called measures of dispersion. We will discuss three primary measures of
dispersion: the range, the variance, and the standard deviation.

Range

The range represents one method for describing the dispersion of the data. It is calculated by
subtracting the smallest value from the largest value in the data set. The range provides some
useful information, but it tells us only about the two most extreme scores in the data set. We
would prefer to consider all the values in the data set when determining how spread out the
scores are. Therefore, the range is infrequently used to describe the dispersion of data sets.

The range is the simplest measure of variation to find. It is simply the highest value minus the
lowest value.

RANGE = MAXIMUM - MINIMUM

Since the range uses only the largest and smallest values, it is greatly affected by extreme
values; that is, it is not resistant to change.
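
A minimal Python sketch of the range formula follows. The helper name `data_range` is hypothetical, the data set is the one used in the Excel example later in this lesson, and the added value of 95 is a made-up extreme score used only to show the lack of resistance.

```python
def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


# Data set from the Excel example later in this lesson.
data = [11, 14, 32, 12, 15, 23, 19, 16, 21, 28]
original_range = data_range(data)          # 32 - 11

# One hypothetical extreme value changes the range sharply.
extended_range = data_range(data + [95])   # 95 - 11

print(original_range, extended_range)
```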

Variance

Much more commonly, the dispersion of a set of data is represented using the variance
because this descriptive statistic considers all the scores in the data set. The variance examines
how far, on average, each score is away from the mean.

The variance is symbolized $s^2$. There are two methods for calculating the variance. The
first method uses the conceptual/definitional formula which clearly demonstrates the underlying
logic of calculating the variance.
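
The conceptual formula itself is not reproduced in this text. As a sketch, and assuming it divides by $N$ in the same way as the worked example later in this lesson, it can be written as the average squared deviation from the mean:

```latex
s^2 = \frac{\sum (x - \bar{x})^2}{N}
```

Here $x$ is an individual score, $\bar{x}$ is the mean of the scores, and $N$ is the number of scores.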

With this formula, however, there are many more calculations and a dramatic increase in
the possibility that we make a simple calculation error and end up with an incorrect variance.
Because of this, we do not actually use the conceptual formula on data sets. This should help
clarify the conceptual process of identifying the variance. After this example, though, we will
only use the second formula. The second formula is the calculation formula and should be used
whenever the variance needs to be calculated for a set of data.

We will always use the calculation formula to compute the variance. This
formula requires fewer calculations, is faster, and is less influenced by rounding. While the
benefits of the calculation formula are significant, they come with a price: with the calculation
formula it is not as clear why each step is necessary. Be assured, however, that both formulas
will produce exactly the same variance values. The calculation formula is:

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$$

In this formula, $\sum x^2$ is the sum of the squared scores, $(\sum x)^2$ is the squared sum
of the scores and $N$ represents the total number of scores in the data set. There are four steps
for calculating the variance using the calculation formula.
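
The claim that both formulas give the same variance can be checked with a short Python sketch. It assumes the calculation formula has the form $(\sum x^2 - (\sum x)^2 / N) / N$, the same form used in the worked example, and that the definitional formula averages the squared deviations over $N$. The function names and the scores are hypothetical.

```python
def variance_definitional(scores):
    """Mean squared deviation from the mean (divides by N)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def variance_calculation(scores):
    """Calculation formula: (sum of x^2 - (sum of x)^2 / N) / N."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x ** 2 for x in scores)
    return (sum_x_squared - sum_x ** 2 / n) / n


# Hypothetical scores, chosen so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_definitional(scores), variance_calculation(scores))
```

For these scores the sum is 40 and the sum of squares is 232, so both functions return 4.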

Unbiased Estimate of the Population Variance

One would expect the sample variance to simply be the population variance with the
population mean replaced by the sample mean. However, one of the major uses of a statistic is to
estimate the corresponding population parameter, and this formula has the problem that, on
average, it underestimates the population variance. To counteract this, the sum of the squares of
the deviations is divided by one less than the sample size:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
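
The difference between the two divisors can be sketched with Python's `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1. The sample values are hypothetical.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

divide_by_n = statistics.pvariance(sample)         # sum of squared deviations / n
divide_by_n_minus_1 = statistics.variance(sample)  # sum of squared deviations / (n - 1)

print(divide_by_n, divide_by_n_minus_1)
```

The sum of squared deviations here is 32, so the two results are 32/8 = 4 and 32/7, or about 4.57. The gap between them shrinks as the sample size grows.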

Variance Example: Depression Level

Let's try an example of calculating the variance and standard deviation using the calculation
formula. If you were interested in determining the depression level of the eight people you
interact with each day, you could have each of them complete the Goldberg Depression Inventory
(GDI). This measure yields scores that range between 0 (not depressed) and +7 (extremely
depressed). Using the scores of our eight participants, we can calculate the variance and standard
deviation of our sample.

Step 1: Organize the Data

First, we should organize the data into a table. In this step, we should also note that N = 8.

Scores (x): 7, 6, 5, 4, 4, 3, 1, 0

Step 2: Add the $x^2$ Column

The next step is to calculate the $x^2$ column by squaring each score.

x     x^2
7     49
6     36
5     25
4     16
4     16
3     9
1     1
0     0

Step 3: Sum the $x$ and $x^2$ Columns

The next step is to sum the score ($x$) and squared score ($x^2$) columns. The sum of the
scores is 30 and the sum of the squared scores is 152. These are shown at the bottom of the
following table:

       x     x^2
       7     49
       6     36
       5     25
       4     16
       4     16
       3     9
       1     1
       0     0
Σ      30    152

Step 4: Calculate the Variance

The next step is to use the calculation formula to compute the variance. To do this, we
will need the three values we have calculated: $N = 8$, $\sum x = 30$, and $\sum x^2 = 152$. The
resulting variance, 4.94, is shown in the following calculation:

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{152 - \frac{30^2}{8}}{8} = \frac{152 - 112.5}{8} = \frac{39.5}{8} \approx 4.94$$

This tells us that, on average, each score is 4.94 squared units away from the mean.
However, in most cases the squared units make the variance difficult to understand. Therefore,
we usually calculate the standard deviation, which is a measure of dispersion expressed in the
original units of measurement.

Step 5: Calculate the Standard Deviation

The final step is to take the square root of the variance to calculate the standard deviation,
2.22, as shown in the calculation below:

$$s = \sqrt{s^2} = \sqrt{4.94} \approx 2.22$$

The primary reason for calculating the standard deviation is that our measure of dispersion is
now expressed in the same units of measurement as the original data, whereas the variance is
expressed in squared units. Therefore, the standard deviation is commonly reported as the
measure of dispersion for a data set.

So, now we know that, based on the GDI, the average person in our data set was 2.22
units away from the mean score of 3.75. This is a fairly spread out data set because the possible
range of scores is 0 to +7.

Interpreting the Standard Deviation

The standard deviation indicates the spread of scores away from the mean. If SD is large,
it means the scores have a wide scatter away from the mean. It indicates that there is a wide
variation of scores among the group, suggesting heterogeneity of group composition.

If the SD is small, it indicates that there is a narrow spread of scores from the mean. It
means there is little scatter of scores from the mean, suggesting that group members are
homogeneous, that is, they have similar abilities.
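
The contrast between a small and a large SD can be illustrated with two hypothetical groups that share the same mean. The sketch uses `statistics.pstdev`, which divides by N as in the worked example above; the scores are invented for illustration.

```python
import statistics

# Two hypothetical groups with the same mean but different spread.
group_a = [48, 49, 50, 51, 52]   # homogeneous group
group_b = [10, 30, 50, 70, 90]   # heterogeneous group

for name, scores in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # divides by N, as in the worked example
    print(f"Group {name}: mean = {mean}, SD = {sd:.2f}")
```

Both groups average 50, but the SD of group B is many times that of group A, which is what distinguishes a heterogeneous group from a homogeneous one.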

HOW TO USE MICROSOFT EXCEL’S DATA ANALYSIS TOOLPAK FOR DESCRIPTIVE STATISTICS

Excel makes calculating statistics much easier than it used to be. It takes only a few
keystrokes and clicks to get most common statistical measurements or graphs from a data set.
Excel is preloaded with statistical functions that can help you find the mean, median, mode,
variance and many more statistical measurements. Aside from Excel's functions, the program
also allows users to install a Data Analysis ToolPak add-in that performs many types of
calculations at once. This tutorial shows an Excel user how to use the Data Analysis tool to
find descriptive statistics and explains the results.

Activate the Data Analysis ToolPak

If you have never used the Data Analysis ToolPak, it is probably inactive in your copy of
Excel. You can check whether you have it by first clicking on the Data tab. Next, look for the
Analysis group on the far-right side of your screen. If the Data Analysis option does not exist,
use the following steps to activate this add-in.

1. Click on the File tab, then click on Options. Next, click on “Add-Ins.”
2. Next, click on the “Go” button in the Manage add-ins section.
3. Lastly, check the “Analysis ToolPak” box and click “OK.”
4. You should now be ready to use the Data Analysis ToolPak from the Data tab in the
Analysis group.

Data Analysis Example

If you are following along with this example in an Excel worksheet, type this data set into
Excel vertically in individual cells.

11, 14, 32, 12, 15, 23, 19, 16, 21, 28

Click on “Data Analysis” in the Data tab and then click on Descriptive Statistics in the dialog
box. Click the OK button.

Next, type the cell range that contains the data in the Input Range section of the dialog box.
Choose the Output Range option and choose a cell for the output to display by typing that cell
location in the blank field. Lastly, check the Summary statistics box and click OK to display the
results.

The Results
The results print in two columns. The first column names each descriptive statistic and the
second column shows the result for that statistic. In the following sections I will describe
what these descriptive statistics represent.
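
As a cross-check outside Excel, a similar two-column summary can be sketched in Python for the example data set. The row labels are assumed to mirror those in the ToolPak output, only a subset of the statistics is shown, and the variance and standard deviation use the sample (n - 1) form, which is what the ToolPak's "Sample Variance" label suggests.

```python
import statistics

# Data set from the Data Analysis example above.
data = [11, 14, 32, 12, 15, 23, 19, 16, 21, 28]

summary = {
    "Mean": statistics.mean(data),
    "Median": statistics.median(data),
    "Standard Deviation": statistics.stdev(data),  # sample form, divides by n - 1
    "Sample Variance": statistics.variance(data),  # sample form, divides by n - 1
    "Range": max(data) - min(data),
    "Minimum": min(data),
    "Maximum": max(data),
    "Sum": sum(data),
    "Count": len(data),
}

# Print the statistic name in one column and its value in the next.
for label, value in summary.items():
    print(f"{label:20s}{value:10.2f}")
```

No value in this data set is repeated, so the mode row is left out of the sketch.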

WEIGHTED AVERAGE

To calculate a weighted average in Excel, simply use SUMPRODUCT and SUM.


1. First, the AVERAGE function calculates the normal (unweighted) average of three scores: 20, 40, and 90.

2. The corresponding weights of the scores are 1, 2, and 3.



The formula below calculates the weighted average of these scores.

$$\text{Weighted Average} = \frac{20 + 40 + 40 + 90 + 90 + 90}{6} = \frac{370}{6} \approx 61.67$$
3. We can use the SUMPRODUCT function in Excel to calculate the number above the fraction line (370).

Note: the SUMPRODUCT function performs this calculation: (20 * 1) + (40 * 2) + (90 * 3) = 370.

4. We can use the SUM function in Excel to calculate the number below the fraction line (6).

5. Use the functions at step 3 and step 4 to calculate the weighted average of these scores in Excel.
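
The same SUMPRODUCT-over-SUM calculation can be sketched in Python for readers who want to check the Excel result. The variable names are hypothetical; the scores and weights are those from the note in step 3.

```python
scores = [20, 40, 90]
weights = [1, 2, 3]

# Numerator: the same calculation SUMPRODUCT performs, (20 * 1) + (40 * 2) + (90 * 3).
numerator = sum(score * weight for score, weight in zip(scores, weights))

# Denominator: the same calculation SUM performs on the weights.
denominator = sum(weights)

weighted_average = numerator / denominator
print(numerator, denominator, round(weighted_average, 2))
```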

EXERCISE 5

1. 87, 23, 22, 35, 25, 12, 24, 55, 34, 62, 88, 80, 79, 60, 62
a) Find the mean, median and mode using Excel.
b) Find the range, variance and standard deviation using Excel.

2. A student's grade in a psychology course is comprised of tests (40%), quizzes (20%),
assignments (20%), and a final project (20%). His scores for each of the categories are 85
(tests), 100 (quizzes), 92 (assignments) and 84 (final project). Calculate his overall grade.

3. Sarah has a supermarket and she earns a profit of P7,000 from groceries, P12,000
from vegetables, P5,000 from dairy products and P3,000 from fruits.

She wants to predict her profit for the next month. She assigns weights of 8 to groceries,
5 to vegetables, 8 to dairy products and 6 to fruits.

Calculate the weighted average of her profits.
