Basic Statistical Test

Uploaded by Audrey Dump
Copyright © All Rights Reserved

Learning Competency:
1. Differentiate descriptive from inferential statistics.
2. Determine the appropriate statistical tool for organizing and describing the numerical data gathered.
Objectives:
1. Differentiate descriptive from inferential statistics.
2. Determine the appropriate statistical tool for organizing and describing the numerical data gathered.
3. Calculate the mean, median, mode, and range from the numerical data gathered.
Students will use a KWL chart in a Think-Pair-Share activity. Each group must fill in what they have learned about basic statistics (5 minutes only!).

What I Know | What I Want to Know | What I Learned


What are the following levels of measurement?

⚫ Nominal Scale - classifies data into mutually exclusive (nonoverlapping) categories in which no order or ranking can be imposed on the data.
⚫ Ordinal Scale - classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
⚫ Interval Scale - ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
⚫ Ratio Scale - possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
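Temperature is a quick way to see the difference between the interval and ratio scales. The minimal sketch below assumes the usual textbook illustration: Celsius is an interval scale, because 0 °C does not mean "no temperature", while Kelvin is a ratio scale with a true zero. The readings 20 °C and 10 °C are chosen only for illustration.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to Kelvin (a scale with a true zero)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # illustrative readings in degrees Celsius

# Differences are meaningful on both scales: the readings are 10 degrees apart either way.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the ratio (Kelvin) scale.
ratio_c = warm_c / cool_c                                        # 2.0, but misleading
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # close to 1

print(f"Difference: {diff_c:.1f} C vs {diff_k:.1f} K")
print(f"Ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The sketch shows that 20 °C is not "twice as warm" as 10 °C. That ratio claim holds only once the values are expressed on a scale with a true zero.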
Video presentation on measurement scale
Level of Measurement Pop Quiz
I. Identify the following in the picture as nominal level, ordinal level,
interval level, or ratio level data.

II. Answer the following questions on levels of measurement.


1. What level of measurement would be used to measure each variable?
What are descriptive and inferential statistics?

How do they differ in terms of statistical data?


What is descriptive statistics?
Descriptive statistics is the term given to the analysis of data that helps
describe, show or summarize data in a meaningful way such that, for
example, patterns might emerge from the data. Descriptive statistics do
not, however, allow us to make conclusions beyond the data we have
analyzed or reach conclusions regarding any hypotheses we might have
made. They are simply a way to describe our data.
Descriptive statistics are very important because if we simply presented
our raw data it would be hard to visualize what the data was showing,
especially if there was a lot of it.
Descriptive statistics therefore enable us to present the data in a more
meaningful way, which allows simpler interpretation of the data. For
example, if we had the results of 100 pieces of students' coursework, we
may be interested in the overall performance of those students. We would
also be interested in the distribution or spread of the grades.
Suppose junior high school students were asked how many hours they spent on
the computer and for which subject they most often used it. The results of
the survey could indicate that, on average, the students spent two (2) or
more hours on the computer, with a range of one (1) to four (4) hours.
Typically, there are two general types of statistics that are used to describe
data:
1. Measures of central tendency: these are ways of describing the central
position of a frequency distribution for a group of data. In this case, the
frequency distribution is simply the distribution and pattern of grades
scored by the 100 students from the lowest to the highest. We can describe
this central position using a number of statistics, including the mode,
median, and mean.

Measures of average are also called measures of central tendency and include the mean,
median, mode, and midrange.
Comparison of Ungrouped and Grouped Data

Ungrouped data is not classified or organized into different classes, whereas grouped data is organized into a number of classes.
Ungrouped data is presented in the form of lists, whereas grouped data is expressed in frequency tables.
The Mean
The mean, also known as the arithmetic average, is found by adding the values of the data
and dividing by the total number of values.
Example: Police Incidents
The number of calls that a local police department responded
to for a sample of 9 months is shown. Find the mean. (Data
were obtained by the author.) 475, 447, 440, 761, 993, 1052,
783, 671, 621
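The mean of the police-incident data can be checked with a short Python sketch. It adds the nine monthly counts, divides by n, and cross-checks the result against the standard library's `statistics.mean`.

```python
from statistics import mean

# Calls the police department responded to in each of the 9 sampled months
calls = [475, 447, 440, 761, 993, 1052, 783, 671, 621]

total = sum(calls)   # add the values of the data
n = len(calls)       # total number of values
x_bar = total / n    # mean = sum / n

print(f"Sum = {total}, n = {n}, mean = {x_bar:.1f}")
print(f"statistics.mean gives {mean(calls):.1f}")
```

Here the sum is 6243 and n = 9, so the mean is about 693.7 calls.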
The Median
An article recently reported that the median income for college professors
was $43,250. This measure of central tendency means that one-half of all the
professors surveyed earned more than $43,250, and one-half earned less
than $43,250. The median is the halfway point in a data set. Before you can
find this point, the data must be arranged in ascending or increasing order.
When the data set is ordered, it is called a data array. The median either will
be a specific value in the data set or will fall between two values, as shown in
the next examples. The median is the midpoint of the data array. The symbol
for the median is MD.
Example 2: Tornadoes in the United States
The number of tornadoes that have occurred in the United States over an 8-year period
follows. Find the median.
684, 764, 656, 702, 856, 1133, 1132, 1303
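The two steps described above (first arrange the data in ascending order, then take the middle) can be sketched in Python. With an even number of values, the median falls between the two middle values and is their average.

```python
from statistics import median

# Tornadoes in the United States over the 8-year period
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]

data_array = sorted(tornadoes)  # step 1: arrange in ascending order
n = len(data_array)
mid = n // 2

if n % 2 == 1:
    md = data_array[mid]                              # odd n: the middle value
else:
    md = (data_array[mid - 1] + data_array[mid]) / 2  # even n: average of the two middle values

print("Data array:", data_array)
print(f"MD = {md}")                                   # MD = 810.0
print(f"statistics.median gives {median(tornadoes)}")
```

The two middle values of the data array are 764 and 856, so MD = (764 + 856) / 2 = 810.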
The Mode
The third measure of average is called the mode. The mode is the value that
occurs most often in the data set. It is sometimes said to be the most typical
case. A data
set that has only one value that occurs with the greatest frequency is said to
be unimodal. If a data set has two values that occur with the same greatest
frequency, both values are considered to be the mode and the data set is said
to be bimodal. If a data set has more than two values that occur with the
same greatest frequency, each value is used as the mode, and the data set is
said to be multimodal. When no data value occurs more than once, the data
set is said to have no mode.
Note: Do not say that the mode is zero. That would be incorrect, because in
some data, such as temperature, zero can be an actual value. A data set can
have more than one mode or no mode at all.
Ex. Licensed Nuclear Reactors
The data show the number of licensed nuclear reactors in the United States
for a recent 15-year period. Find the mode. Source: The World Almanac and
Book of Facts.
104 104 104 104 104
107 109 109 109 110
109 111 112 111 109
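Finding the mode comes down to counting how often each value occurs. The sketch below uses `collections.Counter` and then applies the unimodal, bimodal, multimodal and no-mode definitions given above.

```python
from collections import Counter

# Licensed nuclear reactors in the United States over the 15-year period
reactors = [104, 104, 104, 104, 104,
            107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]

counts = Counter(reactors)   # value -> how many times it occurs
top = max(counts.values())   # the greatest frequency
modes = sorted(value for value, freq in counts.items() if freq == top)

# Classify the data set using the definitions in the text
if top == 1:
    label, modes = "no mode", []   # no value occurs more than once
elif len(modes) == 1:
    label = "unimodal"
elif len(modes) == 2:
    label = "bimodal"
else:
    label = "multimodal"

print(f"Mode(s): {modes}, each occurring {top} times -> {label}")
```

Both 104 and 109 occur 5 times, so this data set is bimodal. In Python 3.8+, `statistics.multimode(reactors)` returns the same two values.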
The Midrange
The midrange is a rough estimate of the middle. It is found by adding the
lowest and highest values in the data set and dividing by 2. It is a very rough
estimate of the average and can be affected by one extremely high or low
value.
Ex. Licensed Nuclear Reactors
The data show the number of licensed nuclear reactors in the United
States for a recent 15-year period. Find the midrange.
Source: The World Almanac and Book of Facts.

104 104 104 104 104
107 109 109 109 110
109 111 112 111 109
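The midrange formula, MR = (lowest value + highest value) / 2, can be sketched as follows. The second part adds a single hypothetical outlier (200) purely to illustrate the point that one extreme value can shift the midrange considerably.

```python
reactors = [104, 104, 104, 104, 104,
            107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]

lowest, highest = min(reactors), max(reactors)
midrange = (lowest + highest) / 2   # MR = (lowest + highest) / 2

print(f"MR = ({lowest} + {highest}) / 2 = {midrange}")   # MR = (104 + 112) / 2 = 108.0

# Hypothetical outlier, for illustration only: one extreme value moves the midrange a lot
with_outlier = reactors + [200]
print((min(with_outlier) + max(with_outlier)) / 2)     # 152.0
```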
Grouped Data

Grouping of data plays a significant role when we have to deal with large data. This information can also be displayed using a pictograph or a bar graph. Data formed by arranging individual observations of a variable into groups, so that a frequency distribution table of these groups provides a convenient way of summarizing or analyzing the data, is termed grouped data.
The Weighted Mean
Sometimes, you must find the mean of a data set in which not all values are equally
represented.

$$\bar{X}_w = \frac{\sum f x}{n}$$

where
f = frequency
x = numerical value or item in a set of data
n = number of observations in the data set (n = Σf)
Mean of Grouped Data

Grade | f | Midpoint (x) | cf | f·x
40 - 49 | 3 | | |
50 - 59 | 5 | | |
60 - 69 | 6 | | |
70 - 79 | 9 | | |
80 - 89 | 8 | | |
90 - 100 | 7 | | |
Total | Σf = | | | Σfx =
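The grouped mean applies the weighted-mean formula, using each class midpoint as x. The sketch below fills in the Midpoint, cf and f·x columns for the grade table above. It assumes the midpoint is (lower limit + upper limit) / 2, so the wider 90 - 100 class gets a midpoint of 95.

```python
# Grade classes (lower limit, upper limit) and frequencies from the table above
classes = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 100)]
freqs = [3, 5, 6, 9, 8, 7]

cf = 0        # running cumulative frequency
sum_f = 0     # Σf
sum_fx = 0.0  # Σfx
print("Grade     f   x      cf  f·x")
for (lower, upper), f in zip(classes, freqs):
    x = (lower + upper) / 2   # class midpoint
    cf += f
    fx = f * x
    sum_f += f
    sum_fx += fx
    print(f"{lower}-{upper:<5} {f:<3} {x:<6} {cf:<3} {fx}")

grouped_mean = sum_fx / sum_f   # weighted mean: Σfx / n, with n = Σf
print(f"Σf = {sum_f}, Σfx = {sum_fx}, mean = {grouped_mean:.2f}")
```

With Σf = 38 and Σfx = 2804.5, the grouped mean is about 73.80.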
Solve for the mean, median, and mode of the data below.

Class | Frequency (f) | Class Midpoint (x) | cf | fx
76-80 | | | |
71-75 | | | |
66-70 | | | |
61-65 | | | |
56-60 | | | |
51-55 | | | |
46-50 | | | |
41-45 | | | |
36-40 | | | |
Solve for the mean, median, and mode of the data below.

Class | Frequency (f) | Class Midpoint (x) | cf | fx
40-49 | | | |
50-59 | | | |
60-69 | | | |
70-79 | | | |
80-89 | | | |
90-99 | | | |
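The exercises ask for the median and mode of grouped data but do not state the formulas. The sketch below assumes the formulas commonly taught for this purpose:

Md = LB + ((n/2 - cf_below) / f_m) × i
Mo = LB + (d1 / (d1 + d2)) × i

Here LB is the lower class boundary, taken as the lower limit minus 0.5 on the assumption that the data are whole numbers, and i is the class size. d1 and d2 are the differences between the modal class frequency and the frequencies of the classes just before and just after it. Because the exercise tables above have no frequencies, the sketch borrows the frequencies of the grade table (40-49 through 90-100) for illustration.

```python
# Frequencies borrowed from the grade table (40-49 ... 90-100) for illustration only
classes = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 100)]
freqs = [3, 5, 6, 9, 8, 7]


def grouped_median(classes, freqs):
    """Md = LB + ((n/2 - cf_below) / f_m) * i"""
    half = sum(freqs) / 2
    cf_below = 0
    for (lower, upper), f in zip(classes, freqs):
        if cf_below + f >= half:     # median class: first class whose cf reaches n/2
            lb = lower - 0.5         # lower class boundary (assumes whole-number data)
            i = upper - lower + 1    # class size
            return lb + (half - cf_below) / f * i
        cf_below += f
    raise ValueError("frequency table is empty")


def grouped_mode(classes, freqs):
    """Mo = LB + (d1 / (d1 + d2)) * i, using the class with the highest frequency"""
    k = freqs.index(max(freqs))      # modal class (the first one if tied)
    lower, upper = classes[k]
    lb = lower - 0.5
    i = upper - lower + 1
    d1 = freqs[k] - (freqs[k - 1] if k > 0 else 0)
    d2 = freqs[k] - (freqs[k + 1] if k + 1 < len(freqs) else 0)
    if d1 + d2 == 0:
        raise ValueError("neighbouring classes tie with the modal class")
    return lb + d1 / (d1 + d2) * i


print(f"Median = {grouped_median(classes, freqs):.2f}")  # 75.06
print(f"Mode   = {grouped_mode(classes, freqs):.2f}")    # 77.00
```

For this table, n/2 = 19 falls in the 70 - 79 class (cf_below = 14, f = 9), giving Md = 69.5 + (5/9)(10) ≈ 75.06. The modal class is also 70 - 79, with d1 = 3 and d2 = 1, giving Mo = 69.5 + (3/4)(10) = 77.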
2. Measures of spread: these are ways of summarizing a group of
data by describing how spread out the scores are. For example,
the mean score of our 100 students may be 65 out of 100.
However, not all students will have scored 65 marks. Rather,
their scores will be spread out. Some will be
lower and others higher. Measures of spread help us to
summarize how spread out these scores are. To describe this
spread, a number of statistics are available to us, including the
range, quartiles, absolute deviation, variance and standard
deviation.
What Is Variance?
The term variance refers to a statistical measurement of the spread
between numbers in a data set. More specifically, variance
measures how far each number in the set is from
the mean (average), and thus from every other number in the set.
Variance is often depicted by this symbol: σ2. It is used by both
analysts and traders to determine volatility and market security.
The square root of the variance is the standard deviation (SD or σ),
which helps determine the consistency of an
investment’s returns over a period of time.
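In symbols, the population variance is σ² = Σ(x - μ)² / N, and the standard deviation is σ = √σ². The sketch below reuses the police-incident data from the mean example. It also shows the sample versions (dividing by n - 1), which appear later in the t-tests, and cross-checks everything against Python's `statistics` module.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

calls = [475, 447, 440, 761, 993, 1052, 783, 671, 621]  # police-incident data

n = len(calls)
mu = sum(calls) / n
squared_deviations = [(x - mu) ** 2 for x in calls]   # how far each value is from the mean

pop_variance = sum(squared_deviations) / n            # σ²: divide by N
pop_sd = math.sqrt(pop_variance)                      # σ: square root of the variance
sample_variance = sum(squared_deviations) / (n - 1)   # s²: divide by n - 1
sample_sd = math.sqrt(sample_variance)                # s

print(f"σ² = {pop_variance:.2f}, σ = {pop_sd:.2f}")
print(f"s² = {sample_variance:.2f}, s = {sample_sd:.2f}")
# Cross-check against the standard library (each line should print True True)
print(math.isclose(pop_variance, pvariance(calls)), math.isclose(pop_sd, pstdev(calls)))
print(math.isclose(sample_variance, variance(calls)), math.isclose(sample_sd, stdev(calls)))
```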
Grouping Data - Definition, Frequency distribution table and example (byjus.com)
Inferential Statistics
We have seen that descriptive statistics provide
information about our immediate group of data. For
example, we could calculate the mean and standard
deviation of the exam grades for 100 students and
this could provide valuable information about this
group of 100 students. Any group of data like this,
which includes all the data you are interested in, is
called a population.
A population can be small or large, as long as it
includes all the data you are interested in. For
example, if you were only interested in the exam
grades of 100 students, the 100 students would
represent your population. Descriptive statistics are
applied to populations, and the properties of
populations, like the mean or standard deviation,
are called parameters as they represent the whole
population (i.e., everybody you are interested in).
Often, however, you do not have access to the whole population you are
interested in investigating, but only a limited number of data instead. For
example, you might be interested in the exam grades of all students in the
UK. It is not feasible to measure all exam grades of all students in the
whole of the UK, so you have to measure a smaller sample of students (e.g.,
100 students), which is used to represent the larger population of all UK
students. Properties of samples, such as the mean or standard deviation,
are not called parameters, but statistics.
Inferential statistics are techniques that allow us to use these samples to
make generalizations about the populations from which the samples were
drawn. It is, therefore, important that the sample accurately represents the
population. The process of achieving this is called sampling.

Inferential statistics arise out of the fact that sampling naturally incurs
sampling error and thus a sample is not expected to perfectly represent the
population. The methods of inferential statistics are (1) the estimation of
parameter(s) and (2) testing of statistical hypotheses.
Example:
Types of Statistical Tests
1. t-Tests
• One-sample t-test
• Measures: Mean of a single variable
• When to use: Comparing a sample mean against a known or hypothesized population value
• Assumptions: Variable should be normally distributed
• Interpretation: If the p-value is less than .05, the results are significant
• What to use if assumptions are not met:
• Normality violated, use the one-sample Wilcoxon Signed Rank Test
• Independent t-test
• Measures:
• Dependent variable (continuous)
• Independent variable (binary)
• When to use: Compare the means of 2 independent groups
• Assumptions:
• Dependent variable should be normally distributed
• Homogeneity of variance (Levene’s Test)
• Interpretation: If the p-value is less than .05, the results are significant
• What to use if assumptions are not met:
• Normality violated, use the Mann-Whitney U test (also called the Wilcoxon Rank Sum test)
• Homogeneity violated, use the second row of results ("equal variances not assumed", i.e., Welch's t-test) in the t-test output table
Ex. Life satisfaction of students who undergo counseling vs. students who do not.

Undergo counseling (X1) | Do not undergo counseling (X2)
7
8
10
8
7
6
4
7
8
9
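A minimal sketch of the independent t statistic is shown below. It computes Student's t with pooled variance (equal variances assumed) and Welch's t (equal variances not assumed). The function name `independent_t` is hypothetical and not taken from any library. It returns t and the degrees of freedom, and the critical value still has to come from a t table, because the Python standard library has no t distribution. Each group needs at least two values.

```python
import math
from statistics import mean, variance


def independent_t(x1, x2, equal_var=True):
    """Return (t, df) for two independent samples.

    equal_var=True  -> Student's t with pooled variance
    equal_var=False -> Welch's t ("equal variances not assumed")
    """
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1), variance(x2)   # sample variances (divide by n - 1)
    diff = mean(x1) - mean(x2)
    if equal_var:
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Welch-Satterthwaite df
    return diff / se, df
```

To use it, enter the X1 and X2 columns of the counseling table as two lists, call `independent_t(x1, x2)`, and compare |t| with the two-tailed critical value for the returned df.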
One-Sample t-Test
Problem 1: In the population, the average IQ is 100. A team of scientists wants to
test a new medication to see if it has a positive effect, a negative effect, or no
effect at all on intelligence. A sample of 30 participants who have taken the
medication has a mean of 140 with a standard deviation of 20. Did the medication
affect intelligence? α = 0.05
1. Define the null and alternative hypotheses
2. State the alpha
3. Calculate the degrees of freedom
4. State the decision rule
5. Calculate the test statistic
6. State the conclusion
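Steps 3 to 6 can be sketched in Python for Problem 1. The test is two-tailed (H0: μ = 100, H1: μ ≠ 100), because the effect could be positive or negative. The critical value 2.045 (α = 0.05, df = 29) is hard-coded from a standard t table, on the assumption that no statistics library is available.

```python
import math

mu0 = 100     # H0: μ = 100 (population mean IQ)
x_bar = 140   # sample mean
s = 20        # sample standard deviation
n = 30

df = n - 1       # degrees of freedom
t_crit = 2.045   # two-tailed, α = 0.05, df = 29 (t table)
t = (x_bar - mu0) / (s / math.sqrt(n))   # t = (x̄ - μ) / (s / √n)

print(f"df = {df}, t = {t:.2f}, critical values = ±{t_crit}")
print("Reject H0" if abs(t) > t_crit else "Fail to reject H0")
```

Since t ≈ 10.95 is far beyond ±2.045, H0 is rejected: the medication appears to have affected intelligence.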
Problem 2: Hospital Infections
A medical investigator claims that the average number of infections per week at a
hospital in southwestern Pennsylvania is 16.3. A random sample of 10 weeks had a
mean of 17.7 infections. The sample standard deviation is 1.8. Is there
enough evidence to reject the investigator’s claim at α = 0.05? Assume the variable is
normally distributed.
Source: Based on information obtained from Pennsylvania Health Care Cost Containment Council.
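The same one-sample t procedure applies to the hospital-infection problem. The claim is H0 (μ = 16.3), so the test is two-tailed. The critical value ±2.262 for α = 0.05 and df = 9 is hard-coded from a t table.

```python
import math

mu0 = 16.3      # claimed mean infections per week (H0: μ = 16.3)
x_bar = 17.7    # sample mean
s = 1.8         # sample standard deviation
n = 10

df = n - 1
t_crit = 2.262  # two-tailed, α = 0.05, df = 9 (t table)
t = (x_bar - mu0) / (s / math.sqrt(n))

print(f"df = {df}, t = {t:.2f}")
print("Reject H0" if abs(t) > t_crit else "Fail to reject H0")
```

Here t ≈ 2.46 exceeds 2.262, so there is enough evidence to reject the investigator's claim.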
Problem 3: The manager of a rental agency claims the average mileage
of cars rented is less than 8000. A sample of five automobiles has an
average mileage of 7723, with a standard deviation of 500 miles. At α = 0.01,
is there enough evidence to reject the manager’s claim?
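For the rental-agency problem, the claim "less than 8000" is the alternative hypothesis (H1: μ < 8000), so this is a left-tailed test. The sketch hard-codes the critical value -3.747 for α = 0.01 and df = 4 from a t table.

```python
import math

mu0 = 8000       # H0: μ = 8000; claim (H1): μ < 8000
x_bar = 7723     # sample mean mileage
s = 500          # sample standard deviation
n = 5

df = n - 1
t_crit = -3.747  # left-tailed, α = 0.01, df = 4 (t table)
t = (x_bar - mu0) / (s / math.sqrt(n))

print(f"df = {df}, t = {t:.2f}, critical value = {t_crit}")
print("Reject H0" if t < t_crit else "Fail to reject H0")
```

Since t ≈ -1.24 does not fall below -3.747, H0 is not rejected. The sample does not provide enough evidence to support the manager's claim that the mean mileage is less than 8000.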
Problem 4: Sample of Newborn Birth Weights in Kilograms
X
2.6 kg
2.7 kg
3.2 kg
2.9 kg
3.0 kg
3.3 kg
5.2 kg
2.3 kg
4.1 kg
3.7 kg
Problem 5: Does the statistics ability of KSU graduates differ from the
national average? The average on the national statistics exam is µ = 5.8.
KSU scores
10
15
13
5
8
7
9
9
10
13
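For Problem 5, the sample mean and standard deviation must first be computed from the raw KSU scores. The problem does not state α, so the sketch assumes a two-tailed test at α = 0.05, with the critical value ±2.262 (df = 9) taken from a t table.

```python
import math
from statistics import mean, stdev

ksu_scores = [10, 15, 13, 5, 8, 7, 9, 9, 10, 13]
mu0 = 5.8        # national average (H0: μ = 5.8)
n = len(ksu_scores)
df = n - 1
t_crit = 2.262   # assumed two-tailed α = 0.05, df = 9 (t table)

x_bar = mean(ksu_scores)   # sample mean
s = stdev(ksu_scores)      # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / math.sqrt(n))

print(f"mean = {x_bar:.2f}, s = {s:.2f}, df = {df}, t = {t:.2f}")
print("Reject H0" if abs(t) > t_crit else "Fail to reject H0")
```

With a sample mean of 9.9, t is about 4.27, so under these assumptions the KSU scores differ significantly from the national average.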
Problem #:
Test the claim that µ = 100 against µ > 100 given a sample of n = 81 for
which x̄ = 100.8. Assume that σ (SD) = 5, and test at the α = 0.05
significance level.

Problem #:
Test the claim that µ = 15.5 against µ < 15.5 given a sample of n = 45 for which
x̄ = 14.3. Assume that σ (SD) = 5.5, and test at the α = 0.05
significance level.
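Both problems give the population standard deviation σ, so a sketch can use the z statistic, z = (x̄ - μ) / (σ / √n), which is the usual textbook choice when σ is known. The one-tailed critical value 1.645 for α = 0.05 comes from the standard normal table.

```python
import math


def z_statistic(x_bar, mu0, sigma, n):
    """z = (x̄ - μ0) / (σ / √n), used when σ is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


z_crit = 1.645   # one-tailed critical value at α = 0.05

# Right-tailed: H0: μ = 100 vs H1: μ > 100
z1 = z_statistic(100.8, 100, 5, 81)
print(f"z = {z1:.2f}:", "reject H0" if z1 > z_crit else "fail to reject H0")

# Left-tailed: H0: μ = 15.5 vs H1: μ < 15.5
z2 = z_statistic(14.3, 15.5, 5.5, 45)
print(f"z = {z2:.2f}:", "reject H0" if z2 < -z_crit else "fail to reject H0")
```

In both cases the statistic (z = 1.44 and z ≈ -1.46) stays inside the non-rejection region, so H0 is not rejected.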
• Paired sample t-test
• Measures:
• Dependent variable (continuous)
• Independent variable (2 points in time or 2 conditions with the same group)
• When to use: Compare the means of a single group at 2 points in time
(pretest/posttest)
• Assumptions:
• Paired differences should be normally distributed (check with histogram)
• Interpretation: If the p-value is less than .05, the results are significant
• What to use if assumptions are not met: Wilcoxon Signed Rank Test
What is the paired t-test?

The paired t-test is a method used to test whether the mean difference
between pairs of measurements is zero or not.

When can I use the test?

You can use the test when your data values are paired measurements. For
example, you might have before-and-after measurements for a group of
people. Also, the differences between the paired measurements should be
normally distributed.

What are some other names for the paired t-test?

The paired t-test is also known as the dependent samples t-test, the paired-difference t-test, the matched pairs t-test, and the repeated-samples t-test.

What if my data isn’t nearly normally distributed?

If the paired differences are clearly not normal, use a nonparametric alternative such as the Wilcoxon Signed Rank Test.
Ex. Researcher
Example: Religiosity of a person after a spiritual retreat
Before retreat (Pre-test) | After retreat (Post-test) | D | D²
5 | 7 | |
6 | 6 | |
7 | 9 | |
4 | 7 | |
3 | 5 | |
5 | 9 | |
6 | 8 | |
4 | 6 | |
M1 = | M2 = | ΣD = | ΣD² =
1. State the hypotheses:
2. Compute the difference scores.

3. Compute the mean of the sample difference scores and SS_D:

$$\bar{D} = \frac{\sum D}{N}$$

$$SS_D = \sum D^2 - \frac{\left(\sum D\right)^2}{N}$$

4. Compute the t-statistic:

$$t = \frac{\bar{D} - \mu_D}{\sqrt{\frac{SS_D}{N(N-1)}}}$$

5. Compute the critical value:
d.f. = N - 1

6. Conclusion:
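The six steps can be sketched in Python for the retreat example. The sketch makes three assumptions: D = after - before; H0: μ_D = 0 against a two-tailed alternative at α = 0.05, since the example does not state α; and the critical value 2.365 for df = 7, taken from a t table.

```python
import math

before = [5, 6, 7, 4, 3, 5, 6, 4]   # pre-test
after = [7, 6, 9, 7, 5, 9, 8, 6]    # post-test

d = [a - b for a, b in zip(after, before)]   # step 2: difference scores D = after - before
N = len(d)
sum_d = sum(d)                               # ΣD
sum_d2 = sum(x * x for x in d)               # ΣD²

d_bar = sum_d / N                            # step 3: mean difference
ss_d = sum_d2 - sum_d ** 2 / N               # SS_D = ΣD² - (ΣD)²/N
t = (d_bar - 0) / math.sqrt(ss_d / (N * (N - 1)))   # step 4, with μ_D = 0 under H0

df = N - 1                                   # step 5
t_crit = 2.365                               # two-tailed, α = 0.05, df = 7 (t table)

print(f"M1 = {sum(before) / N}, M2 = {sum(after) / N}")
print(f"D = {d}, ΣD = {sum_d}, ΣD² = {sum_d2}")
print(f"D̄ = {d_bar}, SS_D = {ss_d}, t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > t_crit else "Fail to reject H0")   # step 6
```

With ΣD = 17, ΣD² = 45, D̄ = 2.125 and SS_D = 8.875, t is about 5.34. This exceeds 2.365, so under these assumptions the post-retreat scores differ significantly from the pre-retreat scores.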
2. One-Way ANOVA (Analysis of Variance)
• Measures:
• Dependent (continuous)
• Independent (categorical, at least 3 categories)
• When to use: When comparing means between 3 or more groups
• Assumptions:
• Normal distribution of residuals (check with histogram)
• Homogeneity of Variance (Levene’s Test)
• Interpretation: The null hypothesis states that all means are equal; if it is
rejected, conduct a post hoc test to see where the actual differences occur
• What to use if assumptions are not met:
• Normality violated, use the Kruskal-Wallis test
• Homogeneity violated, use the Welch test and the Games-Howell post hoc
test
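The logic behind the one-way ANOVA can be sketched as a ratio of two variance estimates: F = (SS_between / df_between) / (SS_within / df_within). The function name `one_way_anova_f` is hypothetical. The sketch only computes F and its degrees of freedom; the result still has to be compared with a critical F value from an F table.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for 3 or more independent groups."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-groups sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of the values inside each group
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

A large F means the group means differ more than the within-group spread would suggest. If F exceeds the critical value, H0 (all means equal) is rejected, and a post hoc test then locates the differences.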
One-way ANOVA (with repeated measures)
• Measures:
• Dependent (continuous)
• Independent (categorical, with levels as a within-subject factor)
• When to use: When assessing means between 3 or more groups with the dependent
variable repeated
• Assumptions:
• Normal distribution of residuals (check with histogram)
• Sphericity (Mauchly’s Test)
• Interpretation: If the main ANOVA is significant, there is a difference between at
least two time points (check where the differences occur with a Bonferroni post hoc test).
• What to use if assumptions are not met:
• Normality violated, use the Friedman test
• Sphericity violated, use the Greenhouse-Geisser correction
Two-way ANOVA
• Measures:
• Dependent (continuous)
• Independent (categorical, with 2+ levels within each)
• When to use: There are three sets of hypotheses with a two-way
ANOVA. H0 for each set is as follows:
• The population means of the first factor are equal (equivalent
to a one-way ANOVA for the row factor).
• The population means of the second factor are equal (equivalent to a
one-way ANOVA for the column factor).
• There is no interaction between the two factors (equivalent to
performing a test for independence with contingency tables, i.e., a chi-squared test for independence).
• Assumptions:
• Normal distribution of residuals (check with histogram)
• Homogeneity of variance (Levene’s Test)
• Interpretation: When interpreting the results, you need to return to the hypotheses and
address each one in turn. If the interaction is significant, the main effects cannot be
interpreted from the ANOVA table. Use the means plot to explain the effects or carry out
separate ANOVAs by group.
• What to use if assumptions are not met:
• Normality violated, use the Friedman test
• Homogeneity violated, compare p-values with a smaller significance level, e.g., .01
