Statistics

This document discusses various statistical concepts, including:
• Descriptive statistics, which collects and describes data to yield meaningful information.
• Statistical inference, which analyzes a subset of data to make predictions about the entire dataset.
• Sampling methods such as simple random sampling, stratified sampling, and cluster sampling.
• Concepts such as population, sample, parameter, and statistic.
• Measures of central tendency: mean, median, and mode.
• Measures of variation: range, variance, and standard deviation.

Uploaded by

Anthony Caputol

Statistics

INTRODUCTION
DESCRIPTIVE STATISTICS
• Comprises those methods concerned with collecting
and describing a set of data so as to yield meaningful
information.
STATISTICAL INFERENCE
• Comprises those methods concerned with analysis of
a subset of data leading to predictions or inferences
about the entire set of data
POPULATION
• Consists of the totality of the observations with which we
are concerned
SAMPLE
• Subset of a population
SIMPLE RANDOM SAMPLE
• A simple random sample of n observations is a
sample that is chosen in such a way that every
subset of n observations of the population has
the same probability of being selected.
SYSTEMATIC SAMPLING
• Individuals are selected at regular intervals from
a list of the whole population. The intervals are
chosen to ensure an adequate sample size. For
example, every 10th member of the population
is included. This is often convenient and easy to
use, although it may also lead to bias.
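The two methods above can be sketched in Python. This is only an illustration under stated assumptions: the population of members numbered 1 to 100 is hypothetical, the fixed seed is there only to make the run repeatable, and starting the systematic sample at a random offset is a common convention rather than something the slide specifies.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n items so that every subset of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    """Take every `step`-th item, starting at a random offset in the first interval."""
    start = rng.randrange(step)
    return population[start::step]


rng = random.Random(0)              # fixed seed: repeatable illustration only
population = list(range(1, 101))    # hypothetical list of 100 members

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)

print(srs)
print(systematic)
```

If the list has a periodic pattern that matches the interval, the systematic sample can be biased, which is the risk the slide mentions.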
STRATIFIED SAMPLING
• In this method, the population is first divided into sub-groups (or
strata) who all share a similar characteristic. It is used when we might
reasonably expect the measurement of interest to vary between the
different sub-groups. Gender or smoking habits would be examples of
strata. The study sample is then obtained by taking samples from
each stratum.
• In a stratified sample, the probability of an individual being included
varies according to known characteristics, such as gender, and the aim
is to ensure that all sub-groups of the population that might be of
relevance to the study are adequately represented.
CLUSTERED SAMPLING
• In a clustered sample, sub-groups of the population are used
as the sampling unit, rather than individuals. The population
is divided into sub-groups, known as clusters, and a selection
of these are randomly selected to be included in the study.
All members of the cluster are then included in the study.
Clustering should be taken into account in the analysis.
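A minimal Python sketch of stratified and clustered sampling follows. The records, stratum labels, and cluster names are hypothetical, and taking the same number of units from every stratum is just one possible allocation.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units into strata, then draw a simple random sample from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(1)  # fixed seed: repeatable illustration only

# Hypothetical population: (id, gender) records.
people = [(i, "F" if i % 2 else "M") for i in range(1, 41)]
by_gender = stratified_sample(people, lambda p: p[1], 5, rng)

# Hypothetical clusters: four clinics with ten patient ids each.
clinics = {f"clinic_{c}": list(range(c * 10, c * 10 + 10)) for c in range(4)}
patients = cluster_sample(clinics, 2, rng)

print(by_gender)
print(patients)
```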
QUOTA SAMPLING
• This method of sampling is often used by market
researchers. Interviewers are given a quota of subjects of a
specified type to attempt to recruit. For example, an
interviewer might be told to go out and select 20 adult men
and 20 adult women, 10 teenage girls and 10 teenage boys
so that they could interview them about their television
viewing. There are several flaws with this method, but most
importantly it is not truly random.
CONVENIENCE SAMPLING
• Convenience sampling is perhaps the easiest method of
sampling, because participants are selected in the most
convenient way, and are often allowed to choose or volunteer
to take part. Good results can be obtained, but the data set
may be seriously biased, because those who volunteer to
take part may be different from those who choose not to.
SNOWBALL SAMPLING
• This method is commonly used in social sciences when
investigating hard to reach groups. Existing subjects are
asked to nominate further subjects known to them, so the
sample increases in size like a rolling snowball. For example,
when carrying out a survey of risk behaviors amongst
intravenous drug users, participants may be asked to
nominate other users to be interviewed.
STATISTICAL MEASURES OF DATA
PARAMETER
• Any numerical value describing a characteristic of a
population is called a parameter.
STATISTIC
• Any numerical value describing a characteristic of a
sample is called a statistic.
MEASURES OF CENTRAL LOCATION

• Measures of central tendency
• Mean
• Median
• Mode
POPULATION MEAN

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example
• The number of employees at 5 different drugstores
are 3, 5, 6, 4, and 6. Treating the data as a population,
find the mean number of employees for the 5 stores.

$$\mu = \frac{3 + 5 + 6 + 4 + 6}{5} = 4.8$$
SAMPLE MEAN

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example
• A food inspector examined a random sample of 7 cans of
a certain brand of tuna to determine the percent of
foreign impurities. The following data were recorded:
1.8, 2.1, 1.7, 1.6, 0.9, 2.7, and 1.8. Compute the
sample mean.

$$\bar{x} = \frac{1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8}{7} = 1.8\%$$
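The two mean examples can be checked with Python's standard library. This is a small sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import fmean

employees = [3, 5, 6, 4, 6]                        # population: 5 drugstores
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]   # sample: 7 cans, in percent

population_mean = fmean(employees)
sample_mean = fmean(impurities)

print(population_mean)
print(round(sample_mean, 2))
```

The arithmetic is the same in both cases. Only the interpretation differs: one value is a parameter and the other is a statistic.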
MEDIAN
• The median of a set of observations arranged in
an increasing or decreasing order of magnitude
is the middle value when the number of
observations is odd or the arithmetic mean of
two middle values when the number of
observations is even.
• On 5 term tests in sociology a student has made
grades of 82, 93, 86, 92, and 79. Find the median for
this population of grades.

Arranging the grades in increasing order of magnitude, we get

79 82 86 92 93

and hence median = 86.
Example
• The nicotine contents for a random sample of 6
cigarettes of a certain brand are found to be 2.3, 2.7,
2.5, 2.9, 3.1, and 1.9 milligrams. Find the median.

If we arrange these nicotine contents in increasing order of magnitude, we get

1.9 2.3 2.5 2.7 2.9 3.1

and the median is then the mean of 2.5 and 2.7. Therefore,
$$\text{median} = \frac{2.5 + 2.7}{2} = 2.6 \text{ milligrams}$$
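Both median examples, one with an odd and one with an even number of observations, can be reproduced with `statistics.median`, which sorts the data and averages the two middle values when the count is even. A brief sketch:

```python
from statistics import median

grades = [82, 93, 86, 92, 79]              # odd count: middle value
nicotine = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]  # even count: mean of the two middle values

median_grade = median(grades)
median_nicotine = median(nicotine)

print(median_grade)
print(round(median_nicotine, 2))
```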
MODE
• The mode of a set of observations is that value which
occurs most often or with the greatest frequency.
• The mode does not always exist. This is certainly true
when all observations occur with the same frequency.
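A short Python sketch of this definition is given below. The helper `modes` is hypothetical. It returns an empty list when every value occurs equally often, which is one way to encode "the mode does not exist".

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s); an empty list means no mode exists."""
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # all observations occur with the same frequency
    return sorted(value for value, c in counts.items() if c == highest)


employee_modes = modes([3, 5, 6, 4, 6])      # drugstore data from the mean example
grade_modes = modes([82, 93, 86, 92, 79])    # every grade occurs exactly once

print(employee_modes)
print(grade_modes)
```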
MIDRANGE
• Defined as the mean of the largest and smallest values
in a set of data.
WEIGHTED MEAN
• Often, we wish to average the k quantities 𝑥1 , 𝑥2 , … , 𝑥𝑘 by
attaching more significance to some of the numbers than to
others. We accomplish this by assigning weights
𝑤1 , 𝑤2 , … , 𝑤𝑘 to the k quantities, where the weights
represent measures of their relative importance. The
corresponding weighted mean, $\mu_w$ or $\bar{x}_w$, is

$$\mu_w \text{ or } \bar{x}_w = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$
COMBINED MEAN
• Suppose that k finite populations having 𝑁1 , 𝑁2 , . . . , 𝑁𝑘
measurements, respectively, have means 𝜇1 , 𝜇2 , … , 𝜇𝑘 . The combined
population mean, 𝜇𝑐 , for all the populations is
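$$\mu_c = \frac{\sum_{i=1}^{k} N_i \mu_i}{\sum_{i=1}^{k} N_i}$$

The combined sample mean is computed with the same formula, using the sample sizes and sample means.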


Example
• Three sections of a statistics class containing 28, 32, and 35 students
averaged 83, 80, and 76, respectively, on the same final examination.
What is the combined population mean for all 3 sections?

$$\mu_c = \frac{28(83) + 32(80) + 35(76)}{28 + 32 + 35} = 79.41$$
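The combined mean is a weighted mean in which each group mean is weighted by its group size. A minimal sketch, with `weighted_mean` as an illustrative helper name:

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


section_sizes = [28, 32, 35]    # number of students per section
section_means = [83, 80, 76]    # average score per section

combined_mean = weighted_mean(section_means, section_sizes)
print(round(combined_mean, 2))
```

Averaging the three section means without weights would give 79.67, which overstates the result because the largest section has the lowest average.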
GEOMETRIC MEAN
• The geometric mean, G, of k positive numbers 𝑥1 , 𝑥2 , . . . , 𝑥𝑘 is the kth
root of their product; that is,
$$G = \sqrt[k]{x_1 x_2 \cdots x_k}$$
Example
• Find the geometric mean of 1, 4, and 128.

$$G = \sqrt[3]{(1)(4)(128)} = 8$$
HARMONIC MEAN
• The harmonic mean, H, of k numbers 𝑥1 , 𝑥2 , . . . , 𝑥𝑘 is
the number k divided by the sum of the reciprocals of
the k numbers; that is,
$$H = \frac{k}{\sum_{i=1}^{k} \frac{1}{x_i}}$$
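Python's `statistics` module (3.8 or later) provides both means. The sketch below reuses the numbers 1, 4, and 128 from the geometric mean example. Applying the harmonic mean to the same numbers is an illustration added here, not an example from the slides.

```python
from statistics import geometric_mean, harmonic_mean

numbers = [1, 4, 128]

g = geometric_mean(numbers)   # cube root of 1 * 4 * 128 = 512
h = harmonic_mean(numbers)    # 3 / (1/1 + 1/4 + 1/128)

print(round(g, 6))
print(round(h, 4))
```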
MEASURES OF VARIATION
• Range
• Variance
• Standard deviation
RANGE
• The range of a set of data is the difference
between the largest and smallest number in the
set.
• Range = Highest - Lowest
Example
• The IQs of 5 members of a family are 108, 112, 127,
118, and 113. Find the range.

Range = 127 – 108 = 19


• The range is a poor measure of variation, particularly
if the size of the sample or population is large. It
considers only extreme values and tells us nothing
about the distribution of members in between.
POPULATION VARIANCE
• Given the finite population 𝑥1 , 𝑥2 , . . . , 𝑥𝑁 , the
population variance is
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
POPULATION STANDARD DEVIATION
• Square root of the variance
Example
• The following scores were given by 6 judges for a
gymnast’s performance in the vault of an international
meet: 7, 5, 9, 7, 8, and 6. Find the standard deviation
and variance of this population.

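$$\mu = \frac{7 + 5 + 9 + 7 + 8 + 6}{6} = 7$$

$$\sigma^2 = \frac{\sum_{i=1}^{6} (x_i - 7)^2}{6} = \frac{(0)^2 + (-2)^2 + (2)^2 + (0)^2 + (1)^2 + (-1)^2}{6} = \frac{5}{3}$$

$$\sigma = \sqrt{\frac{5}{3}} = \frac{\sqrt{15}}{3} = 1.29$$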
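The judges' scores can be checked with `statistics.pvariance` and `statistics.pstdev`, which divide by N as in the population formula. A brief sketch:

```python
from statistics import pstdev, pvariance

scores = [7, 5, 9, 7, 8, 6]   # treated as a complete population

population_variance = pvariance(scores)   # divides by N = 6
population_sd = pstdev(scores)            # square root of the variance

print(round(population_variance, 4))
print(round(population_sd, 2))
```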
SAMPLE VARIANCE
• Given a random sample 𝑥1 , 𝑥2 , . . . , 𝑥𝑛 , the sample
variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Example
• A comparison of coffee prices at 4 randomly selected
grocery stores in San Diego showed increases from
the previous month of 12, 15, 17, and 20 cents for a
200-gram jar. Find the variance of this random sample
of price increases.
• Sample mean:
$$\bar{x} = \frac{12 + 15 + 17 + 20}{4} = 16 \text{ cents}$$
• Sample variance:
$$s^2 = \frac{\sum_{i=1}^{4} (x_i - 16)^2}{3} = \frac{(12 - 16)^2 + (15 - 16)^2 + (17 - 16)^2 + (20 - 16)^2}{3}$$

$$s^2 = \frac{(-4)^2 + (-1)^2 + (1)^2 + (4)^2}{3} = \frac{34}{3}$$
SAMPLE STANDARD DEVIATION
• Square root of the sample variance
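For a sample, `statistics.variance` and `statistics.stdev` divide by n − 1. The sketch below applies them to the coffee price increases. The rounded standard deviation is computed here and does not appear on the slides.

```python
from statistics import stdev, variance

price_increases = [12, 15, 17, 20]   # cents, random sample of 4 stores

sample_variance = variance(price_increases)   # divides by n - 1 = 3
sample_sd = stdev(price_increases)            # square root of the sample variance

print(round(sample_variance, 2))
print(round(sample_sd, 2))
```

Using `pvariance` on the same data would divide by 4 instead of 3 and give a smaller value, 8.5.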
CHEBYSHEV’S THEOREM
• At least the fraction $1 - \frac{1}{k^2}$ of the measurements of any set of data
must lie within k standard deviations of the mean.
Example
• For k = 2, the theorem states that at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the
measurements must lie within 2 standard deviations on either side
of the mean. That is, ¾ or more of the observations of a population must lie
in the interval $\mu \pm 2\sigma$.
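The bound can be compared with an actual data set. The sketch below uses hypothetical helper names and reuses the six judges' scores from the population variance example as the data. It checks that the observed fraction within 2 standard deviations is at least the guaranteed 75%.

```python
from statistics import fmean, pstdev


def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def fraction_within(data, k):
    """Observed fraction of the data within k population standard deviations."""
    mu = fmean(data)
    sigma = pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)


scores = [7, 5, 9, 7, 8, 6]

bound = chebyshev_lower_bound(2)
observed = fraction_within(scores, 2)

print(bound, observed)
```

The theorem gives only a guaranteed minimum, so the observed fraction is often well above it.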
z SCORES
• Any observation, x, from a population with mean 𝜇 and standard
deviation 𝜎, has z score or z value defined by
$$z = \frac{x - \mu}{\sigma}$$
Example
• Find the z scores corresponding to a student's grades in chemistry and
economics.

Subject      Grade   Mean   Standard deviation
Chemistry    82      68     8
Economics    89      80     6

For chemistry:
$$z = \frac{82 - 68}{8} = 1.75$$
For economics:
$$z = \frac{89 - 80}{6} = 1.50$$
Interpretation:
• We see that the student had a grade in chemistry that
was 1.75 standard deviations above the mean of the
chemistry grades, whereas in economics she was only
1.50 standard deviations above the mean of the
economics grades. Comparing these two z scores, we
can now say that the student’s relative performance in
chemistry was higher than her performance in
economics.
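The comparison can be written as a few lines of Python. The function name `z_score` and the dictionary layout are illustrative choices.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# (grade, class mean, class standard deviation) from the table in the example
subjects = {
    "Chemistry": (82, 68, 8),
    "Economics": (89, 80, 6),
}

z_scores = {name: z_score(*values) for name, values in subjects.items()}
best = max(z_scores, key=z_scores.get)

print(z_scores)
print(best)
```

Although the raw economics grade is higher (89 against 82), the chemistry grade has the larger z score.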
MEAN DEVIATION
• The mean deviation of a sample of n observations is defined to be

$$\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
• Find the mean deviation of the sample 2, 3, 5, 7, and 8
For the mean:
$$\bar{x} = \frac{2 + 3 + 5 + 7 + 8}{5} = 5$$

For the mean deviation:
$$\frac{|2 - 5| + |3 - 5| + |5 - 5| + |7 - 5| + |8 - 5|}{5} = \frac{3 + 2 + 0 + 2 + 3}{5} = 2$$
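The same calculation as a small Python sketch, with `mean_deviation` as an illustrative helper name:

```python
from statistics import fmean


def mean_deviation(sample):
    """Average absolute distance of the observations from their mean."""
    center = fmean(sample)
    return fmean(abs(x - center) for x in sample)


md = mean_deviation([2, 3, 5, 7, 8])
print(md)
```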
COEFFICIENT OF VARIATION
• The standard deviation does not by itself tell us much about the
variability of a single set of data. Perhaps a more appropriate
measure is the coefficient of variation, defined by
$$V = \frac{s}{\bar{x}} \times 100\% \quad \text{or} \quad V = \frac{\sigma}{\mu} \times 100\%$$
which expresses the standard deviation as a percentage of the mean.
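As an illustration, the sketch below applies the formula to the class means and standard deviations from the z score example. Using those figures here is an assumption made for the illustration, since the slides give no worked example for V.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return sd / mean * 100


cv_chemistry = coefficient_of_variation(8, 68)   # sigma = 8, mu = 68
cv_economics = coefficient_of_variation(6, 80)   # sigma = 6, mu = 80

print(round(cv_chemistry, 2), round(cv_economics, 2))
```

Because V has no units, it can be used to compare the variability of data sets measured on different scales.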
CHAPTER 3: STATISTICAL DESCRIPTION OF DATA
FREQUENCY DISTRIBUTION
• Important characteristics of a large mass of data can be readily
assessed by grouping the data into different classes and then
determining the number of observations that fall in each of the
classes. Such an arrangement, in tabular form, is called a frequency
distribution.
• Data that are represented in the form of frequency distribution are
called grouped data.
Frequency Distribution for the Weights of 50 Pieces of Luggage

Weight (kilograms)   Number of pieces
7 – 9                2
10 – 12              8
13 – 15              14
16 – 18              19
19 – 21              7

Class limits: for the interval 10 – 12, the smaller number, 10, is the lower class limit and the larger number, 12, is the upper class limit.
Class boundaries: for the interval 10 – 12, 9.5 is the lower class boundary and 12.5 is the upper class boundary.
Class frequency: the number of observations falling in a particular class.
Class width: the difference between the upper and lower class boundaries of a class interval.
Class mark or class midpoint: the midpoint between the upper and lower class boundaries.
Class interval   Class boundaries   Class mark, x   Frequency, f   Cumulative frequency, cf
7 – 9            6.5 – 9.5          8               2              2
10 – 12          9.5 – 12.5         11              8              10
13 – 15          12.5 – 15.5        14              14             24
16 – 18          15.5 – 18.5        17              19             43
19 – 21          18.5 – 21.5        20              7              50
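The derived columns of this table can be computed from the class limits and frequencies. The sketch below assumes the weights were recorded to the nearest kilogram, so each boundary lies half a unit outside the class limits. The variable names are illustrative.

```python
from itertools import accumulate

class_limits = [(7, 9), (10, 12), (13, 15), (16, 18), (19, 21)]
frequencies = [2, 8, 14, 19, 7]

# Assumption: data recorded to the nearest whole kilogram.
half_unit = 0.5

boundaries = [(lo - half_unit, hi + half_unit) for lo, hi in class_limits]
class_marks = [(lo + hi) / 2 for lo, hi in boundaries]
class_widths = [hi - lo for lo, hi in boundaries]
cumulative = list(accumulate(frequencies))

for row in zip(class_limits, boundaries, class_marks, frequencies, cumulative):
    print(row)
```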
Graphical Representations
• Bar chart • Histogram
• Although the bar chart provides immediate
information about a set of data in a condensed form,
we are usually more interested in a related pictorial
representation called a histogram. A histogram differs
from a bar chart in that the bases of each bar are the
class boundaries rather than the class limits. The use
of class boundaries for the bases eliminates the
spaces between the bars to give a solid appearance.
Frequency polygon
• Constructed by plotting the class frequencies against
class marks and connecting the consecutive points by
straight lines.
Cumulative Frequency Polygon or Ogive
• Obtained by plotting the cumulative frequency less
than any upper class boundary against the upper class
boundary and joining all the consecutive points by
straight lines
SYMMETRY AND SKEWNESS
• A distribution is said to be symmetric if it can be folded
along a vertical axis so that the two sides coincide.
• A distribution that lacks symmetry with respect to a
vertical axis is said to be skewed.
• Positively Skewed –skewed to right; it has a long right
tail compared to much shorter left tail.
• Negatively Skewed – skewed to left
• For a perfectly symmetrical distribution the
mean and median are identical and the value of
SK is zero (bell-shaped)
• Skewed to the left, the mean is less than the
median and the value of SK will be negative
• Skewed to the right, the mean is greater than
the median and the value of SK will be positive
EMPIRICAL RULE
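• Given a bell-shaped distribution of measurements, approximately

68% of the measurements lie within 1 SD of the mean
95% of the measurements lie within 2 SD of the mean
99.7% of the measurements lie within 3 SD of the mean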
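These percentages can be checked against an exact normal curve. The sketch assumes that "bell-shaped" means normally distributed and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, probability in coverage.items():
    print(f"within {k} SD: {probability:.1%}")
```

For real data that are only roughly bell-shaped, the observed percentages will differ somewhat from these values.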
FRACTILES OR QUANTILES

• Percentiles, Deciles, and Quartiles
PERCENTILES
• Are values that divide a set of observations into 100 equal parts.
DECILES
• Are values that divide a set of observations into 10 equal parts.
QUARTILES
• Are values that divide a set of observations into 4 equal parts.
Ungrouped Data
• First rank the given data in increasing order of magnitude.
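• Percentile rank of $P_n = \frac{n}{100} \times N$, where N is the number of observations.
• Seek the value at that rank: if the rank is a whole number, $P_n$ is the mean of the observation at that rank and the next one; otherwise, round the rank up to the next integer and take the observation at that position.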
Example
1.6 2.6 3.1 3.2 3.4 3.7 3.9 4.3
1.9 2.9 3.1 3.3 3.4 3.7 3.9 4.4
2.2 3.0 3.1 3.3 3.5 3.7 4.1 4.5
2.5 3.0 3.2 3.3 3.5 3.8 4.1 4.7
2.6 3.1 3.2 3.4 3.6 3.8 4.2 4.7

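• Let us find $P_{85}$.

$$\frac{85}{100} \times 40 = 34 \text{ observations fall below } P_{85}$$

Since 34 is a whole number, $P_{85}$ is the mean of the 34th and 35th observations; hence,
$$P_{85} = \frac{4.1 + 4.2}{2} = 4.15$$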
• Let us find $P_{48}$ for the same 40 observations.

$$\frac{48}{100} \times 40 = 19.2$$

Always round up to the next integer, so 20 observations fall at or below $P_{48}$, and $P_{48}$ is the 20th observation; hence, $P_{48} = 3.4$.
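The procedure used in the two examples can be sketched in Python. The function implements the rule as the examples apply it: average two neighbours when the rank is a whole number, otherwise round up. Library routines such as `statistics.quantiles` use different interpolation rules and may give slightly different answers. The data list reads the table column by column, and the entry printed as "2,5" is assumed to be 2.5.

```python
def percentile(data, n):
    """n-th percentile (integer n, 0 < n < 100) using the rank rule from the examples."""
    if not 0 < n < 100:
        raise ValueError("n must be between 1 and 99")
    ordered = sorted(data)
    whole, remainder = divmod(n * len(ordered), 100)
    if remainder == 0:
        # Whole-number rank: mean of that observation and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Fractional rank: round up, i.e. the (whole + 1)-th observation.
    return ordered[whole]


observations = [
    1.6, 1.9, 2.2, 2.5, 2.6, 2.6, 2.9, 3.0, 3.0, 3.1,
    3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.4,
    3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.7, 3.7, 3.8, 3.8,
    3.9, 3.9, 4.1, 4.1, 4.2, 4.3, 4.4, 4.5, 4.7, 4.7,
]

p85 = percentile(observations, 85)
p48 = percentile(observations, 48)

print(round(p85, 2), p48)
```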
Grouped Data
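$$P_n = L + \left( \frac{\frac{n}{100} N - cf_a}{f_n} \right) i$$

where L is the lower class boundary of the class containing $P_n$, $cf_a$ is the cumulative frequency of the classes below that class, $f_n$ is the frequency of that class, and i is the class width.

Class interval   Class boundaries   Class midpoint   Frequency, f   Cumulative frequency, cf
1.5 – 1.9        1.45 – 1.95        1.7              2              2
2.0 – 2.4        1.95 – 2.45        2.2              1              3
2.5 – 2.9        2.45 – 2.95        2.7              4              7
3.0 – 3.4        2.95 – 3.45        3.2              15             22
3.5 – 3.9        3.45 – 3.95        3.7              10             32
4.0 – 4.4        3.95 – 4.45        4.2              5              37
4.5 – 4.9        4.45 – 4.95        4.7              3              40

• Let us find $P_{48}$.

$$\frac{48}{100} \times 40 = 19.2 \text{ observations fall below } P_{48}$$

so $P_{48}$ lies in the class 3.0 – 3.4, and

$$P_{48} = 2.95 + \left( \frac{19.2 - 7}{15} \right)(0.5) = 3.36$$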
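The grouped-data calculation interpolates linearly inside the class that contains the required rank. A minimal sketch follows, with `grouped_percentile` as a hypothetical helper that takes (lower boundary, upper boundary, frequency) triples.

```python
def grouped_percentile(classes, n):
    """n-th percentile of grouped data by linear interpolation within a class.

    `classes` is a list of (lower_boundary, upper_boundary, frequency) tuples
    in increasing order.
    """
    total = sum(freq for _, _, freq in classes)
    target = n * total / 100          # number of observations below the percentile
    cumulative_below = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative_below + freq >= target:
            width = upper - lower
            return lower + (target - cumulative_below) / freq * width
        cumulative_below += freq
    raise ValueError("n must be between 0 and 100")


classes = [
    (1.45, 1.95, 2), (1.95, 2.45, 1), (2.45, 2.95, 4), (2.95, 3.45, 15),
    (3.45, 3.95, 10), (3.95, 4.45, 5), (4.45, 4.95, 3),
]

p48_grouped = grouped_percentile(classes, 48)
print(round(p48_grouped, 2))
```

The grouped result (3.36) differs slightly from the ungrouped value (3.4) because grouping discards the individual observations.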
Mean of Grouped Data
$$\text{Mean} = \frac{\sum_{i=1}^{k} f_i x_i}{N}$$
$f_i$ - class frequency
$x_i$ - class mark
$N$ - number of observations
Class interval   Class boundaries   Class midpoint   Frequency, f   Cumulative frequency, cf
1.5 – 1.9        1.45 – 1.95        1.7              2              2
2.0 – 2.4        1.95 – 2.45        2.2              1              3
2.5 – 2.9        2.45 – 2.95        2.7              4              7
3.0 – 3.4        2.95 – 3.45        3.2              15             22
3.5 – 3.9        3.45 – 3.95        3.7              10             32
4.0 – 4.4        3.95 – 4.45        4.2              5              37
4.5 – 4.9        4.45 – 4.95        4.7              3              40

$$\text{Mean} = \frac{2(1.7) + 1(2.2) + 4(2.7) + 15(3.2) + 10(3.7) + 5(4.2) + 3(4.7)}{40}$$

$$\text{Mean} = 3.41$$
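The grouped mean treats every observation in a class as if it sat at the class mark. A short sketch of the same computation:

```python
class_marks = [1.7, 2.2, 2.7, 3.2, 3.7, 4.2, 4.7]
frequencies = [2, 1, 4, 15, 10, 5, 3]

total = sum(frequencies)   # N = 40
grouped_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / total

print(round(grouped_mean, 2))
```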
NORMAL DISTRIBUTION
• NORMAL CURVE
• GAUSSIAN DISTRIBUTION
STANDARD NORMAL DISTRIBUTION
• MEAN = 0
• SD = 1
STATISTICAL HYPOTHESIS
• An assertion or conjecture concerning one or more
populations
• Hypotheses that were formulated with the hope
that they be rejected led to the use of the term
null hypothesis (𝐻0 ).
• The rejection of 𝐻0 leads to the acceptance of
an alternative hypothesis (𝐻1 ).
Type I Error

• Rejection of the null hypothesis when it is true
Type II Error
• Acceptance of the null hypothesis when it is false
