
EDU 411: EDUCATIONAL RESEARCH

Course Lecturer: Dr. Stephen Kipkorir Rotich; Mobile: +254 724 941 908; e-mail: rotichkip-[email protected]

5. DATA ANALYSIS
Measurement Scales

Measurement scales are used to classify and quantify variables in research. They determine the
nature of the data and the types of statistical analyses that can be applied. There are four primary
types of measurement scales:
1. Nominal Scale
o Definition: The nominal scale classifies data into distinct categories that do not
have any inherent order or ranking.
o Characteristics:
- Categories are mutually exclusive and exhaustive.
- No numerical value or order is associated with the categories.
o Examples: Gender (male, female), nationality (American, Canadian), color (red,
blue).
o Descriptive Statistics: Frequency counts and mode (the most common
category).

2. Ordinal Scale
o Definition: The ordinal scale ranks data in a meaningful order, but the intervals
between ranks are not necessarily equal.
o Characteristics:
- Data can be ordered or ranked.
- Differences between ranks are not uniform.
o Examples: Satisfaction ratings (very satisfied, satisfied, neutral, dissatisfied, very
dissatisfied), education level (high school, bachelor’s, master’s, doctoral).
o Descriptive Statistics: Median, mode, and range. Percentiles and quartiles
are also used.
3. Interval Scale
o Definition: The interval scale measures variables where the intervals between
values are equal, but there is no true zero point.
o Characteristics:
- Equal intervals represent equal differences in the variable being measured.
- Lacks a true zero point (zero does not mean the absence of the variable).
o Examples: Temperature in Celsius or Fahrenheit, IQ scores.
o Descriptive Statistics: Mean, median, mode, standard deviation, and
variance.
4. Ratio Scale
o Definition: The ratio scale has all the properties of the interval scale, with the
added feature of a true zero point.
o Characteristics:
- Allows for the comparison of absolute magnitudes.
- Zero indicates the absence of the variable.
o Examples: Weight, height, age, income.
o Descriptive Statistics: Mean, median, mode, range, standard deviation,
variance. Ratios and percentages are also meaningful.
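
To make the link between scale and statistic concrete, here is a minimal Python sketch; the variable names and sample values are invented purely for illustration.

from statistics import mean, median, mode

# Nominal: categories only, so frequency counts and the mode are meaningful.
nationality = ["American", "Canadian", "American", "American"]
print(mode(nationality))                  # most common category: 'American'

# Ordinal: ranked categories, so the median (and mode) are meaningful.
satisfaction = [1, 2, 2, 3, 5]            # 1 = very dissatisfied ... 5 = very satisfied
print(median(satisfaction))               # middle rank: 2

# Interval and ratio: equal intervals, so the mean and dispersion are meaningful.
temperature_c = [18.5, 21.0, 19.5, 22.0]  # interval scale (no true zero)
weight_kg = [54.0, 61.2, 70.5]            # ratio scale (true zero, ratios meaningful)
print(mean(temperature_c), mean(weight_kg))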

1. Descriptive Statistics
- Descriptive statistics summarize and describe the features of a data set by generating summaries about data samples, helping to present the data in a meaningful way.
- Descriptive statistics provide high-level summaries of a set of information, such as the mean, median, mode, variance, range, and count.
- The main purposes of descriptive statistics are to:
  - provide information about the actual characteristics of a data set
  - help understand data attributes

Types of Descriptive Statistics

- The three main types of descriptive statistics are frequency distribution, central tendency, and variability of a data set.
- The frequency distribution records how often data occur, central tendency records the data's center point of distribution, and variability of a data set records its degree of dispersion.
1. Measures of Central Tendency
o Mean: The arithmetic average of the data set. Calculated as the sum of all values
divided by the number of values.
o Median: The middle value when data is ordered from least to greatest. If there is an even number of observations, the median is the average of the two middle values.
o Mode: The most frequently occurring value in the data set. A data set may have
one mode, more than one mode, or no mode.
2. Measures of Dispersion/Variability
o Range: The difference between the maximum and minimum values in the data
set.
o Variance: The average of the squared differences from the mean. It measures
how much the data points deviate from the mean.
o Standard Deviation: The square root of the variance. It provides a measure of the
average distance of each data point from the mean.
o Interquartile Range (IQR): The range between the first quartile (25th percentile)
and the third quartile (75th percentile). It measures the spread of the middle 50%
of the data.
3. Measures of Shape
o Skewness: Measures the asymmetry of the data distribution. Positive skew indicates a long tail on the right, while negative skew indicates a long tail on the left.
o Kurtosis: Measures the "tailedness" of the data distribution. High kurtosis indicates heavy tails, while low kurtosis indicates light tails.
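
As a concrete illustration, here is a minimal Python sketch computing the measures above on an invented sample of exam scores (SciPy is assumed to be available for the shape measures):

from collections import Counter
import numpy as np
from scipy import stats

scores = np.array([55, 60, 60, 65, 70, 72, 75, 80, 95])

print("mean:", scores.mean())
print("median:", np.median(scores))
print("mode:", Counter(scores.tolist()).most_common(1))  # from the frequency distribution
print("range:", scores.max() - scores.min())
print("variance:", scores.var(ddof=1))     # ddof=1 gives the sample variance (divide by n - 1)
print("std dev:", scores.std(ddof=1))
q1, q3 = np.percentile(scores, [25, 75])
print("IQR:", q3 - q1)                     # spread of the middle 50% of the data
print("skewness:", stats.skew(scores))     # asymmetry of the distribution
print("kurtosis:", stats.kurtosis(scores)) # "tailedness" (excess kurtosis; 0 for a normal curve)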

2. Inferential Statistics
- Inferential statistics are another broad category of techniques that go beyond describing a data set.
- Inferential statistics can help researchers draw conclusions from a sample to a population.
- Inferential statistics are used to examine differences among groups and the relationships among variables.
- Inferential statistics are used to make generalizations or inferences about a population based on a sample. They help to draw conclusions and make predictions.

Types of Inferential Statistics


1. Chi-Square test: Tests whether there is an association between two categorical variables.
2. T-test
3. Analysis of Variance (ANOVA)
Both the t-test and ANOVA compare the group means of a sample population:
- T-test: The t-test is used to compare two group means by determining whether group differences are likely to have occurred randomly by chance or systematically, indicating a real difference.
- Analysis of Variance (ANOVA): The t-test is limited to two groups, but ANOVA is applicable to more than two groups.
4. Correlation: A linear model that determines the association/relationship between two variables, giving its significance, strength, and direction. The association can be directly proportional, inversely proportional, or absent; the direction can be positive or negative. The strength (r) ranges from -1 to +1.
5. Regression: A linear or multiple model that determines the association/relationship of one or more independent variables with the dependent variable and predicts the dependent variable's outcome, which correlation cannot do.

Inferential statistics use hypothesis testing to determine the significance of relationships among the variables.

o Null Hypothesis (H₀): The default assumption that there is no effect or no difference.
o Alternative Hypothesis (H₁): The hypothesis that there is an effect or a difference.
o Test Statistics: Values calculated from the sample data and used to determine whether to reject the null hypothesis (e.g., t-test, chi-square test).
o P-Value: The probability of obtaining the observed results, or more extreme results, assuming the null hypothesis is true. A small p-value indicates strong evidence against the null hypothesis.
o Significance Level (α): The threshold for deciding whether to reject the null hypothesis, commonly set at 0.05.
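
A minimal Python sketch of this decision logic, using SciPy and two invented samples:

from scipy import stats

group_a = [72, 75, 78, 80, 69, 74]
group_b = [65, 70, 68, 72, 66, 71]

# Test statistic and p-value for the difference between the two group means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05                               # significance level
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0 (evidence of a difference)")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")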

1. Chi-Square Tests
o Definition: Tests the association between categorical variables.
o Types:
- Chi-Square Test of Independence: Determines if there is an association between two categorical variables.
- Chi-Square Goodness of Fit Test:

The chi-square goodness of fit test is a statistical test used to determine how well observed data fit a particular theoretical distribution. Its main purposes are:

- Assessing Fit to a Distribution: It evaluates whether the observed frequencies of events in categorical data align with the frequencies expected under a specific hypothesis or theoretical distribution (e.g., uniform distribution, normal distribution).
- Testing Hypotheses: It helps test hypotheses about categorical data, such as whether the distribution of categories differs from what is expected. For example, it can assess if the distribution of colors in a bag of M&Ms follows the proportions advertised by the manufacturer.
- Determining Model Accuracy: It measures the discrepancy between observed and expected frequencies, helping to determine how accurately a statistical model or theoretical distribution represents real-world data.
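
Here is a minimal Python sketch of both chi-square tests using SciPy; the color counts, the "advertised" proportions, and the contingency table are all invented for illustration:

from scipy import stats

# Goodness of fit: do observed color counts match hypothesized proportions?
observed = [18, 22, 30, 30]                # counts of four colors in a bag of 100
expected = [20, 20, 30, 30]                # counts expected under the hypothesized proportions
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print("goodness of fit:", chi2, p)

# Test of independence: is teaching method associated with pass/fail?
table = [[30, 10],                         # rows: method A/B; columns: pass/fail
         [20, 20]]
chi2, p, dof, expected_freq = stats.chi2_contingency(table)
print("independence:", chi2, p)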

2. T-test

A t-test is a statistical test used to determine if there is a significant difference between the means
of two groups. It helps in assessing whether the differences observed are likely due to chance or
if they are statistically significant.

Here's a breakdown of the key aspects:

Types of t-tests

- One-Sample t-test: Compares the mean of a single sample to a known value or population mean. For example, testing if the average height of a sample of students is different from the known average height of the population.
- Independent Two-Sample t-test: Compares the means of two independent groups. For example, testing if the average test scores of students from two different teaching methods are different.
- Paired Sample t-test: Compares means from the same group at different times or under different conditions. For example, comparing test scores of the same students before and after a training program.
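
A minimal Python sketch of the three variants, using SciPy and invented data:

from scipy import stats

# One-sample: is the sample mean different from a known population mean of 170 cm?
heights = [168, 172, 171, 169, 174, 170]
print(stats.ttest_1samp(heights, popmean=170))

# Independent two-sample: do two teaching methods yield different mean scores?
method_a = [78, 82, 75, 80, 79]
method_b = [72, 70, 74, 71, 73]
print(stats.ttest_ind(method_a, method_b))

# Paired: the same students before and after a training program.
before = [60, 65, 58, 70, 62]
after = [66, 70, 63, 75, 65]
print(stats.ttest_rel(before, after))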

Assumptions of the t-test

- Normality: The data in each group should be approximately normally distributed, especially for small sample sizes. For larger samples, the Central Limit Theorem often helps in relaxing this assumption.
- Homogeneity of Variance: The variance within each group should be approximately equal. This is more critical for the independent two-sample t-test.
- Independence: Observations should be independent of each other.

Hypotheses

- Null Hypothesis (H₀): Assumes there is no effect or difference. For example, in an independent t-test, it might state that the means of the two groups are equal.
- Alternative Hypothesis (H₁): Assumes there is an effect or difference. In the independent t-test, it might state that the means of the two groups are not equal.

3. ANOVA (Analysis of Variance)


o Definition: Tests for differences between more than two group means by comparing variance within groups to variance between groups.
o Types:
- One-Way ANOVA: Tests differences between means of three or more groups based on one factor.
- Two-Way ANOVA: Tests differences based on two factors and can assess interaction effects.
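
A minimal Python sketch of a one-way ANOVA with SciPy on three invented groups:

from scipy import stats

group1 = [23, 25, 27, 22, 26]
group2 = [30, 31, 29, 32, 28]
group3 = [24, 26, 25, 27, 23]

# One-way ANOVA: one factor, three groups.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant result says at least one group mean differs; a post-hoc test
# would be needed to identify which pairs differ.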
4. Correlation Analysis
o Definition: Measures the strength and direction of the relationship between two variables.
o Types:
- Pearson Correlation Coefficient (r): Measures the linear relationship between two continuous variables.
- Spearman's Rank Correlation Coefficient: Measures the monotonic relationship between two ranked variables.
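
A minimal Python sketch of both coefficients, using SciPy and invented study-time data:

from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score = [52, 55, 61, 64, 70, 72, 79, 83]

r, p = stats.pearsonr(hours_studied, exam_score)     # linear relationship
print(f"Pearson r = {r:.2f}, p = {p:.4f}")

rho, p = stats.spearmanr(hours_studied, exam_score)  # monotonic, rank-based
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")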
5. Regression Analysis
o Simple Linear Regression: Examines the relationship between two variables,
one dependent and one independent.
o Multiple Regression: Examines the relationship between one dependent variable
and two or more independent variables.
o Logistic Regression: Used when the dependent variable is categorical (e.g., binary outcomes).

In regression analysis, the fundamental purpose is to model the relationship between a dependent
variable and one or more independent variables. Here’s a breakdown of the equations used in
different types of regression models:

i. Simple Linear Regression

For simple linear regression, which models the relationship between a dependent variable (Y) and a single independent variable (X), the equation is:

Y = β0 + β1X + ε

where:

- Y is the dependent variable (response).
- X is the independent variable (predictor).
- β0 is the y-intercept of the regression line.
- β1 is the slope of the regression line, indicating the change in Y for a one-unit change in X.
- ε is the error term, representing the difference between the observed and predicted values of Y.
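
As an illustration, this line can be fitted by least squares; the following minimal Python sketch uses SciPy's linregress on invented data:

from scipy import stats

x = [1, 2, 3, 4, 5, 6]                     # independent variable (predictor)
y = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1]        # dependent variable (response)

fit = stats.linregress(x, y)
print("intercept (β0):", fit.intercept)
print("slope (β1):", fit.slope)            # change in Y per one-unit change in X
print("p-value for the slope:", fit.pvalue)

# Predict Y for a new X; the error term ε is whatever the line leaves unexplained.
x_new = 7
print("predicted Y:", fit.intercept + fit.slope * x_new)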

ii. Multiple Linear Regression

For multiple linear regression, which models the relationship between a dependent variable (Y) and multiple independent variables (X1, X2, …, Xp), the equation extends to:

Y = β0 + β1X1 + β2X2 + … + βpXp + ε

where:

- Y is the dependent variable.
- X1, X2, … are the independent variables.
- β0 is the intercept.
- β1, β2, … are the coefficients of the independent variables.
- ε is the error term.
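
A minimal Python sketch of multiple linear regression with two predictors, solved by ordinary least squares with NumPy; all values are invented:

import numpy as np

# Two invented predictors (e.g., hours studied, hours slept) and one outcome.
X = np.array([[2, 6], [3, 7], [5, 5], [7, 8], [8, 6], [9, 7]], dtype=float)
y = np.array([55, 60, 62, 75, 74, 80], dtype=float)

# Prepend a column of ones so the first fitted coefficient is the intercept β0.
X_design = np.column_stack([np.ones(len(X)), X])
coefs, residuals, rank, sv = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coefs
print(f"Y = {b0:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2 + error")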

Summary
- Measurement Scales: Nominal, Ordinal, Interval, and Ratio scales determine the level of measurement and the types of statistical analysis that can be applied.
- Descriptive Statistics: Include measures of central tendency, dispersion, and shape to summarize and describe data.
- Inferential Statistics: Include estimation, hypothesis testing, regression analysis, ANOVA, chi-square tests, and correlation analysis to make inferences and predictions about populations based on sample data.
Understanding these concepts helps researchers design studies, analyze data accurately, and draw
valid conclusions.

Table 1
Descriptive statistics

Measures of central tendency
- Mean. Calculation: total of values divided by the number of values. Intent: describe all responses with the average value.
- Median. Calculation: arrange all values in order and determine the halfway point. Intent: determine the middle value among all values, which is important when dealing with extreme outliers.
- Mode. Calculation: examine all values and determine which one appears most frequently. Intent: describe the most common value.

Measures of variability
- Variance. Calculation: calculate the difference of each value from the mean, square this difference score, sum all of the squared difference scores, and divide by the number of values minus 1. Intent: provide an indicator of spread.
- Standard deviation. Calculation: square root of variance. Intent: give an indicator of spread by reporting on average how much values differ from the mean.
- Range. Calculation: the difference between the maximum and minimum value. Intent: give a very general indicator of spread.
- Frequencies. Calculation: count the number of occurrences of each value. Intent: provide a distribution of how many times each value occurs.

Inferential statistics:
Comparing groups with t-tests and ANOVA

Table 2 presents a menu of common, fundamental inferential tests. Remember that even more
complex statistics rely on these as a foundation.
Table 2
Inferential statistics

- t-tests. Intent: compare groups to examine whether the difference in means between two groups is statistically significant.
- Analysis of variance (ANOVA). Intent: compare groups to examine whether differences in means among two or more groups are statistically significant.
- Correlation (Pearson/Spearman). Intent: examine whether there is a relationship or association between two or more variables; provides the degree/strength of the association and whether it is significant.
- Regression. Intent: examine how one or more variables predict another variable and provide the strength/degree of each predictor.

Examining relationships using correlation and regression
The general linear model contains two other major methods of analysis, correlation and regression.

Correlation reveals whether values between two variables tend to systematically change together. Correlation analysis has three general outcomes: (1) the two variables rise and fall together; (2) as values in one variable rise, the other falls; and (3) the two variables do not appear to be systematically related. To make those determinations, we use the correlation coefficient (r) and the related p value or CI. First, use the p value or CI, as compared with established significance criteria (eg, p<0.05), to determine whether a relationship is even statistically significant. If it is not, stop, as there is no point in looking at the coefficients. If it is, move to the correlation coefficient.

A correlation coefficient provides two very important pieces of information: the strength and direction of the relationship. An r statistic can range from −1.0 to +1.0. Strength is determined by how close the value is to −1.0 or +1.0. Either extreme indicates a perfect relationship, while a value of 0 indicates no relationship. Cohen provides guidance for interpretation: 0.1 is a weak correlation, 0.3 is a medium correlation and 0.5 is a large correlation.1 2 These interpretations must be considered in the context of the study and relative to the literature. The valence (+ or −) of the coefficient reveals the direction of the relationship. A negative coefficient means that as one value rises, the other tends to fall; a positive coefficient means that the values of the two variables tend to rise and fall together.
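
The interpretation procedure just described (check significance first, then judge strength and direction) can be sketched in Python; the data and the 0.05 criterion below are illustrative:

from scipy import stats

x = [4, 8, 2, 9, 5, 7, 3, 6]
y = [11, 20, 7, 21, 13, 18, 9, 16]

r, p = stats.pearsonr(x, y)
if p >= 0.05:
    # Not significant: stop; do not interpret the coefficient.
    print(f"p = {p:.3f}: no statistically significant relationship")
else:
    # Strength per Cohen's guidance, direction from the sign of r.
    strength = "large" if abs(r) >= 0.5 else "medium" if abs(r) >= 0.3 else "weak"
    direction = "positive" if r > 0 else "negative"
    print(f"r = {r:.2f}: a {strength}, {direction} relationship (p = {p:.3f})")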
Regression adds an additional layer beyond correlation that allows predicting one value from another. Assume we are trying to predict a dependent variable (Y) from an independent variable (X). Simple linear regression gives an equation (Y = b0 + b1X) for a line that we can use to predict one value from another. The three major components of that prediction are the constant (ie, the intercept represented by b0), the systematic explanation of variation (b1), and the error, which is a residual value not accounted for in the equation3 but available as part of our regression output. To assess a regression model (ie, model fit), examine key pieces of the regression output: (1) the F statistic and its significance, to determine whether the model systematically accounts for variance in the dependent variable; (2) the r square value, for a measure of how much variance in the dependent variable is accounted for by the model; (3) the significance of coefficients for each independent variable in the model; and (4) residuals, to examine random error in the model. Other factors, such as outliers, are potentially important (see Field4).
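
As a sketch of reading those four pieces of output, the following uses the statsmodels OLS interface on invented data:

import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.3, 4.1, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8])

X = sm.add_constant(x)                     # adds the intercept term b0
model = sm.OLS(y, X).fit()

print("(1) F statistic:", model.fvalue, "significance:", model.f_pvalue)
print("(2) r square:", model.rsquared)
print("(3) coefficient p-values:", model.pvalues)
print("(4) residuals:", model.resid)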
The aforementioned inferential tests are foundational to many other advanced statistics that are beyond the scope of this article. Inferential tests rely on foundational assumptions, including that data are normally distributed, observations are independent, and generally that our dependent or outcome variable is continuous. When data do not meet these assumptions, we turn to non-parametric statistics (see Field4).5
Statistical software
While the aforementioned statistics can be calculated manually, researchers typically use statistical software that processes data, calculates statistics and p values, and supplies a summary output from the analysis. However, the programs still require an informed researcher to run the correct analysis and interpret the output. Several available programs include SAS, Stata, SPSS and R. Try using the programs through a demonstration or trial period before deciding which one to use. It also helps to know or have access to others using the program should you have questions.
