
Lessons 3 and 4

This document discusses the importance of sample size in research, emphasizing its impact on statistical analysis, confidence intervals, and confidence levels. It also explains measures of central tendency, including mean, median, and mode, detailing their advantages and limitations in representing data. Additionally, it describes how the shape of a distribution affects these measures, particularly in symmetrical and skewed distributions.

Uploaded by

Joelyn Capa
Copyright
© All Rights Reserved

Lesson 3: Sample size, Confidence Interval and Confidence Level

Sample size is a research term for the number of individuals included in a research study to
represent a population.
Determining the appropriate sample size is one of the most important factors in statistical analysis. If the sample
size is too small, it will not yield valid results or adequately represent the realities of the population being
studied. On the other hand, while larger sample sizes yield smaller margins of error and are more representative,
a sample size that is too large may significantly increase the cost and time taken to conduct the research.
When selecting a sample, multiple factors can affect the reliability and validity of the results. The
two measures of error most closely associated with sample size are the confidence interval and the
confidence level.

Confidence Interval (Margin of Error)


A confidence interval expresses how much uncertainty there is in a particular statistic obtained from a
sample. In simple terms, it tells you how closely the results from a study are likely to reflect what you
would find if it were possible to survey the entire population being studied. The confidence interval is
usually reported as a plus or minus (±) figure, also called the margin of error. For example, if your
confidence interval is ±6 percentage points and 60% of your sample picks an answer, you can be confident
that if you had asked the entire population, between 54% (60-6) and 66% (60+6) would have picked that answer.
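As a rough sketch of where a figure like ±6 can come from, the snippet below uses the common normal-approximation formula for a sample proportion, margin = z * sqrt(p(1-p)/n). The 95% confidence level and the sample size of 256 are assumptions chosen for illustration; the lesson does not state them.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Approximate margin of error for a sample proportion p from n responses.

    Assumes a simple random sample and the normal approximation.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


# 60% of a hypothetical sample of 256 people pick an answer.
p, n = 0.60, 256
margin = margin_of_error(p, n)
lower, upper = p - margin, p + margin
print(f"{p:.0%} ± {margin:.1%} -> {lower:.0%} to {upper:.0%}")
```

Under these assumptions the margin works out to roughly 6 percentage points, which reproduces the 54% to 66% range in the example above.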

Confidence Level
The confidence level is the probability, expressed as a percentage, that the confidence interval
would contain the true population parameter when you draw a random sample many times. It represents
how often the true percentage of the population who would pick an answer lies within the confidence
interval. For example, a 99% confidence level means that if you repeated an experiment or survey over
and over again, about 99 percent of the confidence intervals you computed would contain the true
population value.
The larger your sample size, the more confident you can be that the answers truly reflect the
population. In other words, for a given confidence level, the larger your sample, the smaller your
confidence interval.
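The link between sample size and interval width can also be turned around to ask how many respondents are needed for a target margin of error. The sketch below rearranges the normal-approximation formula for a proportion, n = z² p(1-p) / margin². The function name, the default p = 0.5 (the most conservative choice) and the assumption of a large population are illustrative choices, not part of the lesson.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n whose approximate margin of error is at most `margin`.

    Assumes a simple random sample from a large population and the
    normal approximation for a proportion.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Halving the margin of error roughly quadruples the sample needed.
for margin in (0.06, 0.03, 0.015):
    n = required_sample_size(margin)
    print(f"±{margin:.1%} at 95% confidence needs about {n} respondents")

# A higher confidence level needs a larger sample for the same margin.
print(required_sample_size(0.03, confidence=0.99))
```

This also illustrates the cost trade-off mentioned in Lesson 3: tightening the interval or raising the confidence level quickly increases the number of people to survey.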
Lesson 4: Measures of Central Tendency
A measure of central tendency is a single value that attempts to describe a set of data by identifying the
central position within that set of data.

Mean
- The mean (or average) is the most popular and well-known measure of central tendency. It can be used
with both discrete and continuous data, although it is most often used with continuous data.

- The mean is the sum of the value of each observation in a dataset divided by the number of
observations. This is also known as the arithmetic average.
Consider the retirement ages of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623) and
dividing by the number of observations (11) which equals 56.6 years.
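The same calculation can be written in a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(retirement_ages)   # 623
count = len(retirement_ages)   # 11
average = total / count

print(total, count, round(average, 1))   # 623 11 56.6

# The standard library's mean() performs the same sum-and-divide.
print(round(mean(retirement_ages), 1))   # 56.6
```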

Advantage of the mean


- The mean can be used for both continuous and discrete numeric data.

Limitations of the mean


- The mean cannot be calculated for categorical data, as the values cannot be summed.
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values
that are unusual compared to the rest of the data set by being especially small or large in numerical value. For
example, consider the wages of staff at a factory below:

Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value
might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in
the $12k to $18k range. The mean is being skewed by the two large salaries. In this situation, the median
would be a better measure of central tendency.
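A short sketch makes the effect of the two large salaries concrete. It uses the ten salaries from the table, in thousands of dollars; the cutoff of 50 used to set the outliers aside is an arbitrary choice for illustration.

```python
from statistics import mean, median

salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(mean(salaries_k))     # 30.7
print(median(salaries_k))   # 15.5

# Set aside the two large salaries (illustrative cutoff of 50k).
typical = [s for s in salaries_k if s < 50]
print(mean(typical))        # 15.25
print(median(typical))      # 15.0
```

Removing the two outliers halves the mean, while the median moves only from 15.5 to 15.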

Median
The median is the middle score for a set of data that has been arranged in order of magnitude. The median is
less affected by outliers and skewed data. In order to calculate the median, suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, in this case the sixth value, 56. It is the middle mark because there
are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what
happens when you have an even number of scores? What if you had only 10 scores? In that case, you simply
take the middle two scores and average them. So, if we look at the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

This time we take the 5th and 6th scores in our data set (55 and 56) and average them to get a median of 55.5.
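The two cases above (odd and even number of scores) can be sketched as a small function. The function name is illustrative; Python's standard library offers `statistics.median`, which behaves the same way for these inputs.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle two


marks_11 = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
marks_10 = marks_11[:-1]              # the same marks without the last one (92)

print(median_of(marks_11))   # 56
print(median_of(marks_10))   # 55.5
```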

Advantage of the median


The median is less affected by outliers and skewed data than the mean and is usually the preferred measure of
central tendency when the distribution is not symmetrical.
Limitation of the median
The median cannot be identified for categorical nominal data, as such data cannot be logically ordered.

Mode
The mode is the most commonly occurring value in a distribution.
Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

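The table below shows a simple frequency distribution of the retirement age data.

Age         54  55  56  57  58  60
Frequency    3   1   1   2   2   2

The most commonly occurring value is 54, so the mode is 54 years.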
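The frequency distribution of the retirement ages can also be tallied programmatically. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
frequency = Counter(retirement_ages)

# Print each age with the number of times it occurs.
for age in sorted(frequency):
    print(age, frequency[age])

# The mode is the age with the highest count.
mode_age, mode_count = frequency.most_common(1)[0]
print(mode_age, mode_count)   # 54 3
```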

Advantage of the mode


The mode has an advantage over the median and the mean as it can be found for both numerical and categorical
(non-numerical) data.
Limitations of the mode
There are some limitations to using the mode. In some distributions, the mode may not reflect the centre of the
distribution very well. When the distribution of retirement age is ordered from lowest to highest value, it is easy
to see that the centre of the distribution is 57 years, but the mode is lower, at 54 years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
It is also possible for there to be more than one mode for the same distribution of data (bi-modal or
multi-modal). The presence of more than one mode can limit the ability of the mode to describe the centre or
typical value of the distribution, because a single value to describe the centre cannot be identified.
In some cases, particularly where the data are continuous, the distribution may have no mode at all (i.e. if all
values are different).
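The three situations described above (one mode, several modes, no mode) can be told apart with a short sketch. The helper name `find_modes` and the second and third data sets are hypothetical, made up for illustration. Note that `statistics.multimode` returns every value when all values occur once, so the helper treats that case as "no mode" to match the convention used here.

```python
from collections import Counter
from statistics import multimode


def find_modes(values):
    """Return the list of modes, or an empty list when no value repeats."""
    counts = Counter(values)
    if len(values) > 1 and max(counts.values()) == 1:
        return []                     # all values differ: no mode
    return multimode(values)          # one or more most common values


one_mode = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]   # data from the lesson
two_modes = [54, 54, 55, 58, 58, 60]                       # hypothetical bi-modal data
no_mode = [54.2, 55.7, 57.1, 58.9]                         # hypothetical continuous data

print(find_modes(one_mode))    # [54]
print(find_modes(two_modes))   # [54, 58]
print(find_modes(no_mode))     # []
```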

Impact of shape of distribution on measures of central tendency


Symmetrical distributions
When a distribution is symmetrical, the mode, median and mean are all in the middle of the distribution. The
following graph shows a larger retirement age dataset with a distribution which is symmetrical. The mode,
median and mean all equal 58 years.

Skewed distributions
When a distribution is skewed, the mode remains the most commonly occurring value and the median remains the
middle value in the distribution, but the mean is generally ‘pulled’ in the direction of the longer tail. In a
skewed distribution, the median is often a preferred measure of central tendency, as the mean is not usually in
the middle of the distribution.
A distribution is said to be positively or right skewed when the tail on the right side of the distribution is longer
than the left side. In a positively skewed distribution it is common for the mean to be ‘pulled’ toward the right
tail of the distribution. Although there are exceptions to this rule, generally, most of the values, including the
median value, tend to be less than the mean value.
The following graph shows a larger retirement age data set with a distribution which is right skewed. The data
has been grouped into classes, as the variable being measured (retirement age) is continuous. The mode is 54
years, the modal class is 54-56 years, the median is 56 years, and the mean is 57.2 years.
Figure: Retirement age, positive (right) skew

A distribution is said to be negatively or left skewed when the tail on the left side of the distribution is longer
than the right side. In a negatively skewed distribution, it is common for the mean to be ‘pulled’ toward the left
tail of the distribution. Although there are exceptions to this rule, generally, most of the values, including the
median value, tend to be greater than the mean value.
The following graph shows a larger retirement age dataset with a distribution which is left skewed. The mode is
65 years, the modal class is 63-65 years, the median is 63 years and the mean is 61.8 years.
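The pull of the mean toward the longer tail can be checked on small data sets. The three lists below are hypothetical retirement ages invented to illustrate each shape; they are not the data behind the graphs described in this lesson, and the helper name is illustrative.

```python
from statistics import mean, median

# Hypothetical data, one list per distribution shape.
symmetric = [56, 57, 57, 58, 58, 58, 59, 59, 60]
right_skewed = [54, 54, 54, 55, 55, 56, 56, 57, 58, 60, 65, 70]
left_skewed = [50, 55, 60, 62, 63, 64, 64, 65, 65, 66, 66, 66]


def mean_vs_median(values):
    """Return (mean, median, note) describing where the mean sits."""
    m, med = mean(values), median(values)
    if m > med:
        note = "mean above median, typical of a right skew"
    elif m < med:
        note = "mean below median, typical of a left skew"
    else:
        note = "mean equals median, typical of a symmetrical shape"
    return m, med, note


for name, data in [("symmetrical", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    m, med, note = mean_vs_median(data)
    print(f"{name}: mean={m:.1f}, median={med:.1f} ({note})")
```

As the text notes, there are exceptions to this pattern, so comparing the mean and median is a quick indication of skew rather than a definitive test.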
