
Chapter 5 Descriptive Inferential Statistics


Chapter V

DESCRIPTIVE STATISTICS & INFERENTIAL STATISTICS
Descriptive vs Inferential statistics
• Descriptive statistics describe a collection of data. E.g., mean, median, mode, and standard deviation.
• Inferential statistics draw conclusions about characteristics of a population from the examination of a sample. E.g., confidence intervals, t-test, ANOVA.
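As a small illustration of descriptive statistics, the sketch below summarizes a hypothetical sample with Python's standard-library `statistics` module. The data are invented; `stdev()` returns the sample standard deviation (dividing by n − 1), while `pstdev()` would give the population version.

```python
import statistics

# Hypothetical sample: ages of 9 survey respondents (illustrative data only)
ages = [23, 25, 25, 27, 30, 31, 35, 38, 40]

summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    # stdev() is the *sample* standard deviation (divides by n - 1)
    "stdev": statistics.stdev(ages),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```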
Inferential statistics – PARAMETRIC
• Parametric statistics: the population is assumed to fit a parameterized distribution (most typically the normal distribution).
• Parametric tests are used when the test variable is measured on an interval or ratio scale.
Inferential statistics – NON-PARAMETRIC
• Sometimes the data do not fit the assumed distribution. In that case a non-parametric alternative is more reliable for detecting a difference (or the lack of one).
• Non-parametric statistics do not assume that the population fits any parameterized distribution.
• Non-parametric statistics are typically applied to data that can be put in ranked order.
What is a Parametric test?
• Parametric tests apply to QuanTitative data (discrete or continuous).
• Parametric tests are built on population parameters such as the mean and standard deviation; examples include the one-sample t-test, two-sample t-test, Z-test, and ANOVA.
• In Parametric tests, the measure of central tendency is the Mean.
• To check whether the data are normally distributed, two common tests are the Kolmogorov–Smirnov test and the Shapiro–Wilk test.
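A rough sketch of the Kolmogorov–Smirnov idea, using only Python's standard library: the D statistic is the largest gap between the sample's empirical cumulative distribution and a normal CDF. In this sketch the normal's mean and SD are estimated from the same sample (an assumption), so the classical K–S p-value tables do not strictly apply (that case calls for the Lilliefors variant). A statistics package would normally report the p-value, and Shapiro–Wilk is usually run from such a package (e.g., SciPy's `scipy.stats.shapiro`) rather than by hand. The function name and both data sets are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(data):
    """K-S D statistic comparing `data` with a normal distribution whose
    mean and SD are estimated from `data` (p-value not computed here)."""
    xs = sorted(data)
    n = len(xs)
    ref = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = ref.cdf(x)
        # Gap between the normal CDF and the empirical CDF just after / just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical samples: one roughly symmetric, one strongly skewed
symmetric = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.8]
skewed = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 9.0, 15.0]
print("D (symmetric):", round(ks_statistic_normal(symmetric), 3))
print("D (skewed):   ", round(ks_statistic_normal(skewed), 3))
```

A larger D means a worse fit to the normal curve; the skewed sample should produce the larger value.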
Hypothesis testing: PARAMETRIC testing
• An inferential statistical analysis compares samples to see whether they are likely to have been drawn from the same population.
• For parametric comparison of two groups: Student’s t-test.
• For parametric comparison of three or more groups: Analysis of Variance (ANOVA).
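A minimal sketch of the one-way ANOVA F statistic on invented data for three groups: F compares the variation between group means with the variation within groups. The function name `one_way_anova_f` is hypothetical; the resulting F would then be compared with an F-table critical value for (k − 1, n − k) degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups
f_stat, df_b, df_w = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```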
What is a t-test?
• Use the t-test when the sample size is small or the population standard deviation is unknown. The t statistic is computed like a Z-score, but with the sample standard deviation in place of the population one.
• A t-test can compare the mean of a sample with a given value. This type is called a “one-sample t-test.”
• t-tests can also compare the means of two samples. These “two-sample t-tests” include the “independent samples” and the “paired samples” t-tests.
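To make the one-sample case concrete, here is a minimal sketch of the t statistic, t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. The sample and the target value μ0 = 12 are invented, and `one_sample_t` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    # Standard error uses the sample SD, which is what separates t from Z
    se = stdev(sample) / math.sqrt(n)
    return (mean(sample) - mu0) / se, n - 1

# Hypothetical sample tested against an assumed target value of 12
t, df = one_sample_t([10, 12, 14, 16, 18], mu0=12)
print(f"t = {t:.3f}, df = {df}")
```

Here t ≈ 1.414 with 4 degrees of freedom, below the two-tailed 5% critical value of about 2.776, so H0 would not be rejected.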
Types of t-test
• Independent samples t-tests compare two distinct groups, such as males vs females.
• Paired samples t-tests compare two measurements on the same set of respondents, i.e., when respondents are surveyed more than once. This occurs in pre-test/post-test studies designed to measure a variable before and after a treatment.
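A minimal sketch of both types of t statistic, on invented data. The independent version assumes equal variances in the two groups (the pooled-variance Student's t; Welch's t is the usual alternative when that assumption fails). The paired version is simply a one-sample t-test of the before/after differences against 0. Function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def independent_t(a, b):
    """Student's t for two independent groups, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2

def paired_t(before, after):
    """Paired t: a one-sample t-test of the differences against 0."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical data: two separate groups, and pre-test/post-test scores of 3 respondents
t_ind, df_ind = independent_t([1, 2, 3], [4, 5, 6])
t_pair, df_pair = paired_t([10, 12, 14], [12, 15, 17])
print(f"independent: t = {t_ind:.3f}, df = {df_ind}")
print(f"paired:      t = {t_pair:.3f}, df = {df_pair}")
```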
What is a Non-parametric test (also called a Non-metric test)?
• Non-parametric tests (also called distribution-free tests) don’t assume that your data follow a normal distribution.
• Non-parametric tests apply mainly to QuaLitative data: nominal or ordinal data (also called non-metric data).
• In Non-parametric tests, the measure of central tendency is the Median.
Two cases of Standard Deviation [figure]
Distribution of Sample Means [figure]
Examples of Confidence level [figure]
Example of Significance level
• A contractor says it will take 9 months to build a house. The house is finished in 9 months and 1 week. The completion time is not exactly 9 months; however, it is not significantly different from the estimate.

• Local authorities estimate that there are 20,000 people at a concert. Ticket receipts indicate there are 42,000 attendees. The figure of 42,000 is significantly different from 20,000.
Significance level - Confidence level - p-value
• The confidence level (1 − α) corresponds to the region where we fail to reject (retain) the null hypothesis H0.
• The significance level α corresponds to the area(s) in the tail(s) outside that region, where we reject H0.
• The significance level α determines the critical value for the test.
• The p-value is a quantitative measure of significance: the probability, if H0 is true, of obtaining a result at least as extreme as the one observed. H0 is rejected when p ≤ α.
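A minimal sketch, assuming a test statistic that follows the standard normal distribution under H0, of how α fixes both the confidence level (1 − α) and the critical value, and how a p-value is compared with α. The helper names `z_critical` and `decide` are hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, tails=2):
    """Critical |z| for significance level alpha under a standard normal H0."""
    # In a two-tailed test alpha is split equally between the two tails
    return NormalDist().inv_cdf(1 - alpha / tails)

def decide(p_value, alpha=0.05):
    """Compare a p-value with the chosen significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

for alpha in (0.10, 0.05, 0.01):
    confidence = 1 - alpha  # confidence level is the complement of alpha
    print(f"alpha = {alpha:.2f}  confidence = {confidence:.0%}  "
          f"two-tailed z_crit = {z_critical(alpha):.3f}")
```

For α = 0.05 this gives the familiar two-tailed critical value of about 1.96 (about 1.645 for a one-tailed test).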
Illustration of confidence & significance [figure]
Two-tailed hypothesis test [figure]
One-tailed hypothesis test (right tail) [figure]
One-tailed hypothesis test (left tail) [figure]
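To show how the choice between a two-tailed, right-tailed, and left-tailed test changes the result, here is a small sketch computing the p-value of one observed z statistic under each alternative. It assumes a standard normal null distribution; `z_p_value` and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """p-value of an observed z statistic under H0 (standard normal)."""
    phi = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - phi(abs(z)))   # area in both tails beyond |z|
    if tail == "right":
        return 1 - phi(z)              # area to the right of z
    if tail == "left":
        return phi(z)                  # area to the left of z
    raise ValueError("tail must be 'two', 'right' or 'left'")

z = 1.8  # hypothetical observed statistic
for tail in ("two", "right", "left"):
    print(tail, round(z_p_value(z, tail), 4))
```

With z = 1.8 the two-tailed p-value is about 0.072 (not significant at α = 0.05), while the right-tailed p-value is about 0.036 (significant), so the tail must be chosen before looking at the data.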
SELECTING METHODS OF DATA ANALYSIS
Analyzing QuaLitative data (Non-metric)
• Nominal:
– Calculating frequency
– Chi-square (frequency test)

• Ordinal:
– Calculating frequency & median
– Kolmogorov–Smirnov test, Wilcoxon test
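For nominal data, frequency counting and a chi-square goodness-of-fit test can be sketched as follows, assuming (hypothetically) that H0 says all answer codes are equally likely. The answers are invented, and the critical values are the standard published chi-square table values at α = 0.05.

```python
from collections import Counter

# Hypothetical answers to a nominal question (codes 1, 2, 3)
answers = [1] * 30 + [2] * 50 + [3] * 20

freq = Counter(answers)                        # frequency of each answer
observed = [freq[c] for c in sorted(freq)]
# H0 (assumed here): every answer is equally likely
expected = [len(answers) / len(observed)] * len(observed)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Standard chi-square critical values at alpha = 0.05 (published tables)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}
print(dict(freq), f"chi2 = {chi2:.2f}", f"df = {df}",
      "reject H0" if chi2 > CRITICAL_05[df] else "fail to reject H0")
```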
Analyzing QuanTitative data (metric)

Applies to Interval & Ratio scales:

– Calculating mean
– Z-test, t-test
Depending on Parametric vs. Non-parametric
• Parametric → Z-test, t-test

• Non-parametric → Chi-square, Wilcoxon
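"Wilcoxon" covers both a rank-sum test (two independent groups) and a signed-rank test (paired data). As an illustration of the rank-based approach, here is a sketch of the signed-rank version on invented before/after data. It uses average ranks for ties and a normal approximation for the p-value without tie correction; with samples this small, exact tables or a statistics package are more appropriate. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """W+ and a normal-approximation two-sided p-value (no tie correction)."""
    diffs = [y - x for x, y in zip(before, after) if y != x]  # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, n, p

# Hypothetical pre-test/post-test scores
before = [10, 12, 9, 14, 11, 13, 8, 15]
after = [12, 15, 9, 17, 12, 16, 11, 14]
w_plus, n, p = wilcoxon_signed_rank(before, after)
print(w_plus, n, round(p, 4))
```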
Depending on number of samples
• Independent samples
e.g., comparing the means of two separate groups → use the t-test for independent samples

• Dependent samples
e.g., testing the difference between 2 means measured on the same respondents → use the t-test for paired samples
Depending on Number of Variables
• Univariate data analysis (one variable)
• Bivariate (two variables)
• Multivariate (more than two variables)
Depending on the relationships between Variables
• Dependence method
– Variables are split into independent variables & dependent variables;
– e.g., Linear regression, Multiple regression, Discriminant analysis, Multivariate analysis of variance.
• Interdependence method
– No variable is designated as independent or dependent; the variables are analyzed as mutually related;
– e.g., Exploratory factor analysis, Cluster analysis, Multidimensional scaling.
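As an example of a dependence method, a minimal ordinary-least-squares fit of a simple linear regression y = a + b·x is sketched below, with one independent variable (x) and one dependent variable (y). The variable names and data are hypothetical.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope: change in y per unit of x
    a = my - b * mx        # intercept: line passes through (mean x, mean y)
    return a, b

# Hypothetical data: advertising spend (independent) vs sales (dependent)
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
a, b = simple_linear_regression(spend, sales)
print(f"sales = {a:.2f} + {b:.2f} * spend")
```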
DATA ENTRY
Data Coding
• Precoding: coding before the interview [with closed questions]
• Postcoding: coding after the interview [with open questions]
• Codes have two digits:
– 1st digit: for the variable
– 2nd digit: for the answer
• Data and codes are described in a codebook
Data Matrix

• Data Coding → Data Matrix → Software

• Columns of the matrix represent the coded variables
• Rows of the matrix represent the sample elements
• Each intersection of a column and a row holds an answer
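The path from coding to data matrix can be sketched as follows, assuming a hypothetical dictionary-based codebook (`CODEBOOK`) and invented questionnaire answers: each column is one coded variable and each row is one sample element.

```python
# Hypothetical codebook: variable -> {answer text: code}
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "satisfaction": {"low": 1, "medium": 2, "high": 3},
}

# Hypothetical raw answers from three questionnaires (one dict per respondent)
questionnaires = [
    {"gender": "female", "satisfaction": "high"},
    {"gender": "male", "satisfaction": "medium"},
    {"gender": "female", "satisfaction": "low"},
]

columns = list(CODEBOOK)  # one column per variable
# One row per sample element; each cell holds the coded answer
matrix = [[CODEBOOK[var][q[var]] for var in columns] for q in questionnaires]

print(columns)
for row in matrix:
    print(row)
```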
Clean missing cells
• Mistake from data entry
• Missing field data
• To determine the cause: count the number of answers to each variable (each column)
• If from data entry → fill in from the questionnaire
• If from a missing interview answer →
– Amend the interview
– Bypass that element (reduces the sample size)
– Replace it with the average of some or all answers (SPSS can do this quickly)
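The cleaning steps above can be sketched on a small hypothetical data matrix, with `None` standing for a missing cell (an assumption of this sketch): count the answers per column, then either drop incomplete rows or replace each missing cell with the column mean. The slide mentions SPSS for the last step; this is only a plain-Python illustration of the same idea.

```python
from statistics import mean

# Hypothetical data matrix: rows = respondents, columns = variables; None = missing cell
matrix = [
    [2, 3, 5],
    [1, None, 4],
    [2, 1, None],
    [1, 2, 3],
]
n_rows, n_cols = len(matrix), len(matrix[0])

# Step 1: count answers per column to locate missing cells
answered = [sum(row[c] is not None for row in matrix) for c in range(n_cols)]
missing = [n_rows - a for a in answered]

# Option A: bypass incomplete elements (reduces the sample size)
complete_rows = [row for row in matrix if None not in row]

# Option B: replace each missing cell with the mean of that column's answers
col_means = [mean(row[c] for row in matrix if row[c] is not None) for c in range(n_cols)]
imputed = [[col_means[c] if row[c] is None else row[c] for c in range(n_cols)]
           for row in matrix]

print("missing per column:", missing)
print("complete rows kept:", len(complete_rows))
print("imputed matrix:", imputed)
```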
Clean unreasonable data
• Mainly due to data-entry mistakes
• Identify by calculating the frequency of each value in each column (variable)
• Resolve by checking the original questionnaire
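A frequency check for unreasonable values might look like this sketch, which assumes a hypothetical column whose valid codes are 1 to 3 according to the codebook. Any value outside that set is flagged together with its row, so the original questionnaire can be checked.

```python
from collections import Counter

# Hypothetical column "satisfaction"; valid codes are 1-3 per the codebook
valid_codes = {1, 2, 3}
column = [1, 3, 2, 2, 33, 1, 3, 0, 2]

freq = Counter(column)  # frequency of each value in the column
suspicious = sorted(code for code in freq if code not in valid_codes)
rows_to_check = [i for i, v in enumerate(column) if v not in valid_codes]

print(dict(freq))
print("codes outside the codebook:", suspicious,
      "-> check questionnaires at rows", rows_to_check)
```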
SUMMARIZING DATA
• Statistical summarization: mean, median, mode, variance, standard deviation, range.
• Summarizing by table: simple & cross tables
• Summarizing by chart:
– Bar chart
– Pie chart
– Line graph
– Scatter plot
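Simple and cross tables can be sketched with `collections.Counter`, using invented coded data for two variables. The simple table counts one variable; the cross table counts each combination of two variables (a missing combination counts as 0).

```python
from collections import Counter

# Hypothetical coded data: (gender, satisfaction) for each respondent
gender = ["M", "F", "F", "M", "F", "M", "F"]
satisfaction = ["high", "low", "high", "high", "medium", "low", "high"]

simple_table = Counter(satisfaction)               # one-way frequency table
cross_table = Counter(zip(gender, satisfaction))   # two-way (cross) table

levels = ["low", "medium", "high"]
print("gender " + " ".join(f"{lvl:>6}" for lvl in levels))
for g in ("M", "F"):
    print(f"{g:<6} " + " ".join(f"{cross_table[(g, lvl)]:>6}" for lvl in levels))
```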
