Chapter Seven Data Analysis
Zelalem T. (PhD)
Data Analysis
Coding
2/11/2023
Data Processing
❖ Editing
❖ Coding
❖ Classification
Editing
Editing is the process of examining the collected raw data to detect errors (such as extreme values) and omissions, and to correct them where possible.
It involves careful scrutiny of completed questionnaires or schedules.
It is done to ensure that the data are:
▪ Accurate
▪ Consistent with other data gathered
▪ Uniformly entered
▪ As complete as possible
▪ Well organized to facilitate coding and tabulation
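The editing checks listed above can be sketched in Python. This is a minimal illustration and not the chapter's own procedure. The record fields, the plausible age range (15 to 99) and the set of valid travel modes are all assumptions made for the example.

```python
# Hypothetical survey records; field names and valid ranges are assumptions.
records = [
    {"id": 1, "age": 34, "mode": "car"},
    {"id": 2, "age": None, "mode": "bus"},   # omission: age not recorded
    {"id": 3, "age": 240, "mode": "walk"},   # extreme (implausible) value
    {"id": 4, "age": 29, "mode": "Car "},    # not uniformly entered
]

VALID_AGES = range(15, 100)   # assumed plausible ages for employed adults
VALID_MODES = {"car", "bus", "walk", "cycle", "train"}


def edit_record(rec):
    """Return a list of problems found in one completed questionnaire."""
    problems = []
    if rec["age"] is None:
        problems.append("missing age")
    elif rec["age"] not in VALID_AGES:
        problems.append(f"age out of range: {rec['age']}")

    cleaned = rec["mode"].strip().lower()
    if cleaned not in VALID_MODES:
        problems.append(f"unknown mode: {rec['mode']!r}")
    elif cleaned != rec["mode"]:
        problems.append(f"non-uniform entry: {rec['mode']!r} -> {cleaned!r}")
    return problems


report = {rec["id"]: edit_record(rec) for rec in records}
for rid, problems in report.items():
    print(rid, problems or "ok")
```

The function only flags problems. Whether a flagged value can be corrected, for example by returning to the original questionnaire, remains the researcher's decision.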
Cont’d
Coding: Refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes. Such classes should be appropriate to the research problem under consideration.
There must be a class for every data item (the classes are exhaustive). The classes must also be mutually exclusive (a specific answer can be placed in one and only one cell in a given category set).
Classification
Most research studies result in a large volume of raw data, which must be reduced into homogeneous groups.
Classification means arranging the raw data into groups or classes on the basis of common characteristics.
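Classification by a common characteristic can be sketched as grouping records on one attribute. The respondents and their attributes below are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records: (respondent id, sex, employment sector)
raw = [
    (1, "F", "public"), (2, "M", "private"), (3, "F", "private"),
    (4, "M", "public"), (5, "F", "public"),
]


def classify(rows, key):
    """Arrange rows into classes that share the value returned by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[0])  # keep only the respondent id
    return dict(groups)


by_sector = classify(raw, key=lambda r: r[2])
by_sex = classify(raw, key=lambda r: r[1])
print(by_sector)  # {'public': [1, 4, 5], 'private': [2, 3]}
```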
Data Analysis
Cont’d
❖ Inferential Analysis
Descriptive ….
Tabulation
Tabulation refers to the orderly arrangement of data in a table or other summary format. It:
❖ presents responses or observations on a question-by-question or item-by-item basis and provides the most basic form of information;
❖ tells the researcher how frequently each response occurs;
❖ requires counting the responses or observations for each category (e.g., frequency tables).
Percentage
Whether the data are tabulated by computer or by hand, it is useful to compute percentages and cumulative percentages.
❖ A table containing both the frequency distribution and percentages is easier to interpret.
❖ Percentages are useful for comparing trends over time or among categories.
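Percentages and cumulative percentages can be added to a frequency table as shown below. Cumulative percentages are most meaningful when the categories have a natural order, so this sketch uses a hypothetical five-point agreement item with made-up frequencies.

```python
from itertools import accumulate

# Hypothetical frequencies for an ordered (Likert-type) item.
categories = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
freq = [5, 10, 15, 12, 8]

total = sum(freq)                               # 50 respondents
percent = [100 * f / total for f in freq]       # share of each category
cumulative = list(accumulate(percent))          # running total of the percentages

for c, f, p, cp in zip(categories, freq, percent, cumulative):
    print(f"{c:<18} {f:>3} {p:6.1f} {cp:7.1f}")
```

The last cumulative percentage is always 100, which gives a quick check on the tabulation.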
Data Presentation
Data in raw form:
❖ are usually not easy to use for decision making.
Data presentation:
❖ is the process of transforming a mass of raw data into tables and charts as part of making sense of the data.
Data Presentation
Variables are
categorical
Tables
Example for tabulation of data: a small survey was carried out into the mode of travel to work by taking a random sample of 20 employed adults.

Person   Mode of travel   Person   Mode of travel
1        car              11       car
2        car              12       bus
3        bus              13       walk
4        car              14       car
5        walk             15       train
6        cycle            16       bus
7        car              17       car
8        cycle            18       cycle
9        bus              19       car
10       train            20       car

How would you classify this data?
Tables
These data are categorical (nominal), since mode of travel does not have a numerical value. This information would be better displayed as a frequency table:

Mode of travel   f    Relative frequency (%)
Car              9    45
Bus              4    20
Cycle            3    15
Walk             2    10
Train            2    10
Total            20   100

❑ Frequency (f): the number of times each category appeared
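The frequency table can be reproduced from the 20 raw observations. This sketch uses Python's `collections.Counter` to do the counting and derives each relative frequency as f / n × 100.

```python
from collections import Counter

# Mode of travel for persons 1-20, transcribed from the survey table above.
modes = ["car", "car", "bus", "car", "walk", "cycle", "car", "cycle", "bus", "train",
         "car", "bus", "walk", "car", "train", "bus", "car", "cycle", "car", "car"]

freq = Counter(modes)   # frequency of each category
n = len(modes)          # 20 respondents

for mode, f in freq.most_common():
    print(f"{mode:<6} {f:>2} {100 * f / n:5.1f}%")
print(f"Total  {n:>2} 100.0%")
```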
Categorical Data
Tabulating: frequency distribution table
Graphing: bar chart, pie chart
It includes:
• Bar charts
• Histograms
• Line graphs
• Frequency polygons
Pie Chart
A pie chart presents data as segments of a whole pie.
Each category is represented by a segment of a circle.
The segments are expressed as percentages; the size of each segment reflects the frequency of that category and can be represented as an angle.
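The angle of each segment equals its share of the total multiplied by 360 degrees. Here is a short sketch using the mode-of-travel frequencies.

```python
# Frequencies from the mode-of-travel frequency table.
freq = {"Car": 9, "Bus": 4, "Cycle": 3, "Walk": 2, "Train": 2}
total = sum(freq.values())  # 20

# percentage = f / total * 100; angle = f / total * 360 degrees
# (multiplying before dividing keeps these particular values exact)
percent = {k: f * 100 / total for k, f in freq.items()}
angles = {k: f * 360 / total for k, f in freq.items()}

for k in freq:
    print(f"{k:<6} {percent[k]:5.1f}% {angles[k]:6.1f} deg")
```

The angles always add up to 360 degrees, just as the percentages add up to 100.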
Pie Chart ….
Sales by quarter for three territories:
Data Presentation
Histogram
A graph of the data in a frequency distribution is called a histogram.
The interval endpoints are shown on the horizontal axis, and the vertical axis shows frequency, relative frequency, or percentage.
Bars of the appropriate heights are used to represent the number of observations within each class.
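Before a histogram can be drawn, each observation must be assigned to a class interval and the observations in each class counted. The sketch below does this for a hypothetical set of 12 ages. It assumes classes that include the lower limit and exclude the upper limit, and it prints a rough text version of the bars.

```python
# Hypothetical raw observations (ages of 12 respondents).
data = [21, 23, 25, 28, 31, 33, 34, 36, 38, 41, 45, 52]

# Class intervals [20, 30), [30, 40), [40, 50), [50, 60)
edges = [20, 30, 40, 50, 60]


def class_frequencies(values, edges):
    """Count observations per class; values outside all classes are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


counts = class_frequencies(data, edges)
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo}-{hi:<3} {'#' * c} ({c})")  # bar length = class frequency
```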
Histogram: An Example
Activity
Sales by department
Create:
1. A pie chart for sales in 2009
2. A simple bar chart of total sales for each of the three years
3. A multiple bar chart for sales by year and department
4. A component bar chart
Measures of Central Tendency
Mean
Also known as the arithmetic average
The most common measure of central tendency
The average of all values in a set of data
Calculated by adding all the values in the group and then dividing by the number of values
Helps to summarize the essential features of the data and enables comparison
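[Figure: the mean as a balance point. Scores are shown on a number line (X) from 8 to 18 with the mean at 14; the deviations from the mean are labelled -6, -2, -1, +2, +3, +4.]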
The deviations from the mean always sum to zero: the positive and negative deviations cancel out.
• The mean therefore acts as a balance point.
• Since the mean is determined by the value of every score, it is the preferred measure of central tendency.
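Here is a short check of both the calculation of the mean and the balance-point property. It uses the ten scores from the standard-deviation example in this chapter.

```python
# The ten scores from the standard-deviation example in this chapter.
scores = [8, 11, 6, 7, 5, 9, 5, 9, 9, 1]

mean = sum(scores) / len(scores)          # add all values, divide by their number: 70 / 10
deviations = [x - mean for x in scores]   # positive and negative deviations from the mean

print(mean)             # 7.0
print(sum(deviations))  # 0.0: the mean is the balance point
```

When the mean is not a whole number, floating-point rounding can leave a tiny non-zero sum. In that case, compare the sum against zero with a tolerance, for example `math.isclose(total, 0, abs_tol=1e-9)`.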
Cont’d
Median
• The value of the middle item of a series when it is arranged in ascending or descending order.
• It divides the series into two halves.
• It is a positional average.
Mode
• The most frequently occurring value in a series (the value with the maximum frequency).
• The mode of a distribution is the item around which there is maximum concentration.
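The positional definition of the median and the frequency definition of the mode can be sketched as follows, again using the ten example scores. With an even number of items, this sketch follows the usual convention of averaging the two middle items. The result is cross-checked against Python's `statistics` module.

```python
import statistics

scores = [8, 11, 6, 7, 5, 9, 5, 9, 9, 1]


def median_by_position(values):
    """Middle item of the ordered series; mean of the two middle items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median = median_by_position(scores)   # ordered: 1 5 5 6 7 8 9 9 9 11 -> (7 + 8) / 2
mode = statistics.mode(scores)        # 9 occurs three times, more than any other value

print(median, mode)                   # 7.5 9
assert median == statistics.median(scores)  # agrees with the standard library
```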
The arithmetic mean is the average of all the values under consideration.
[Table: example values; only row 4 (60,000) and the total (300,000) survive.]
Measures of Dispersion
After identifying the typical value of a variable, the researcher can measure how the values of the items are scattered around the mean.
a. Measures of Dispersion
An average can represent a series only as well as a single figure can; it certainly cannot reveal the entire story of the phenomenon under study.
a. Measures of dispersion
In order to measure the degree of scatter, statistical devices called measures of dispersion are calculated.
❖ Range: the difference between the highest and the lowest value. Because it depends only on these two extreme values, it is the weakest measure of dispersion.
cont’d
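Mean deviation
❖ First calculate the mean, then subtract the mean from each value in the group, take the absolute value of each deviation (ignoring its sign), add these up, and divide the total by the number of values.
Variance
❖ First calculate the mean, then subtract the mean from each value in the group, square each result, add the squares, and divide the total by the number of values (n). When a population variance is estimated from a sample, n - 1 is used instead of n.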
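Here are the range, mean deviation and variance for the ten example scores, following the steps described above. This sketch uses the population formula, which divides by n.

```python
scores = [8, 11, 6, 7, 5, 9, 5, 9, 9, 1]
n = len(scores)
mean = sum(scores) / n                                     # 7.0

value_range = max(scores) - min(scores)                    # 11 - 1 = 10
# Absolute values are needed: the signed deviations always sum to zero.
mean_deviation = sum(abs(x - mean) for x in scores) / n    # 22 / 10
# Population variance; divide by n - 1 instead for a sample estimate.
variance = sum((x - mean) ** 2 for x in scores) / n        # 74 / 10

print(value_range, mean_deviation, variance)  # 10 2.2 7.4
```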
cont’d
Standard Deviation
❖ The square root of the variance; it expresses dispersion in the same units as the original data.
Example
Scores   Score - mean    (Score - mean)²
8        8 - 7 = 1       (1)² = 1
11       11 - 7 = 4      (4)² = 16
6        6 - 7 = -1      (-1)² = 1
7        7 - 7 = 0       (0)² = 0
5        5 - 7 = -2      (-2)² = 4
9        9 - 7 = 2       (2)² = 4
5        5 - 7 = -2      (-2)² = 4
9        9 - 7 = 2       (2)² = 4
9        9 - 7 = 2       (2)² = 4
1        1 - 7 = -6      (-6)² = 36
Mean = 70/10 = 7
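The table above can be completed in code. The squared deviations sum to 74. Dividing by n gives the population standard deviation, √7.4 ≈ 2.72. Dividing by n - 1 gives the sample estimate, √(74/9) ≈ 2.87. Both results are cross-checked against Python's `statistics` module.

```python
import math
import statistics

scores = [8, 11, 6, 7, 5, 9, 5, 9, 9, 1]
mean = statistics.fmean(scores)                     # 7.0
ss = sum((x - mean) ** 2 for x in scores)           # sum of squared deviations
sd_population = math.sqrt(ss / len(scores))         # divide by n
sd_sample = math.sqrt(ss / (len(scores) - 1))       # divide by n - 1

assert math.isclose(sd_population, statistics.pstdev(scores))
assert math.isclose(sd_sample, statistics.stdev(scores))
print(f"SS = {ss:.0f}, SD (n) = {sd_population:.2f}, SD (n - 1) = {sd_sample:.2f}")
```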
Justification for Dispersion measures
[Figure: frequency distribution chart; frequency on the vertical axis, values 1 to 9 on the horizontal axis.]
Cont’d
The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.
A positively skewed distribution (skewed to the right) has a tail that extends to the right, in the direction of positive values. In this case, mean > median > mode.
A negatively skewed distribution (skewed to the left) has a tail that extends to the left, in the direction of negative values. In this case, mean < median < mode.
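The ordering of mean, median and mode can be checked on a small hypothetical dataset with a long right tail, and on its mirror image. This ordering is a common pattern and a useful rule of thumb, but it does not hold for every skewed distribution.

```python
import statistics


def summarize(values):
    return statistics.mean(values), statistics.median(values), statistics.mode(values)


right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]    # long tail toward high values (hypothetical)
left_skewed = [10 - x for x in right_skewed]  # mirror image: long tail toward low values

mean_r, median_r, mode_r = summarize(right_skewed)   # about 3.44, 3, 2
mean_l, median_l, mode_l = summarize(left_skewed)    # about 6.56, 7, 8

print(mean_r > median_r > mode_r)  # True
print(mean_l < median_l < mode_l)  # True
```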
[Figure: two frequency charts, "Positively Skewed Distribution" and "Negatively Skewed Distribution"; frequency on the vertical axis, values 1 to 9 on the horizontal axis.]
Inferential Statistics
Inferential statistics involves using sample statistics to estimate the corresponding population parameters.
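Here is a minimal sketch of point estimation. For illustration only, it treats the ten scores from the standard-deviation example as a sample from a larger, hypothetical population. The sample mean estimates the population mean. The sample standard deviation, computed with the n - 1 divisor, estimates the population standard deviation. The standard error indicates how much the sample mean would typically vary from sample to sample.

```python
import math
import statistics

# Treating the ten example scores as a sample drawn from a larger population.
sample = [8, 11, 6, 7, 5, 9, 5, 9, 9, 1]
n = len(sample)

x_bar = statistics.mean(sample)   # sample mean: point estimate of the population mean
s = statistics.stdev(sample)      # sample SD (n - 1 divisor): estimate of the population SD
se = s / math.sqrt(n)             # standard error of the mean

print(f"estimate of mean = {x_bar}, estimate of SD = {s:.2f}, standard error = {se:.2f}")
```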
Measures of relationship
Measures of relationship
Magnitude and direction
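A common measure of the magnitude and direction of a linear relationship between two numeric variables is Pearson's correlation coefficient, r. The chapter does not name a specific coefficient, so r is used here as an illustration. Its sign gives the direction, and its absolute value (from 0 to 1) gives the magnitude. The paired data below are hypothetical. A correlation describes association only; on its own it does not establish cause and effect.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sign = direction, absolute value = magnitude (0 to 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)                             # about 0.77: fairly strong, positive
r_neg = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])    # -1.0: perfect, negative
print(round(r, 2), r_neg)
```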
Measures of relationship
Measures of relationship
Is there any cause and effect (causal
relationship) between two variables or between
one variable on one side and two or more
variables on the other side?
Inferential
• t-tests (two groups)
• ANOVA (F-ratio; typically three or more groups)
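The F-ratio in a one-way ANOVA compares the variation between group means with the variation within groups. This sketch implements the standard one-way formula on hypothetical scores for three groups. With exactly two groups, F equals the square of the pooled two-sample t statistic.

```python
def one_way_anova_f(*groups):
    """Return the F-ratio (MS between / MS within) and its degrees of freedom."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Hypothetical scores for three groups (e.g. three teaching methods)
a = [5, 6, 7, 8, 9]
b = [3, 4, 5, 6, 7]
c = [1, 2, 3, 4, 5]
f, df = one_way_anova_f(a, b, c)
print(f, df)  # 8.0 (2, 12)
```

The F value is then compared with a tabled F value for the same degrees of freedom and level of significance.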
Statistical Significance
Following the sampling-theory approach, we accept or
reject a hypothesis on the basis of sampling information
alone.
Statistical Significance
The level of significance relates to the area (or probability) in the tail of the distribution being used for the test.
This area is called the critical region. If the test statistic lies in the critical region, you would infer that the result is unlikely to have occurred by chance, and you would then reject the null hypothesis.
For example, if the 5% level of significance was used and the null hypothesis was rejected, you would say that H0 had been rejected at the 5% (or 0.05) significance level, and that the result was significant.
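Here is a minimal illustration of the critical region for a two-tailed test at the 5% level. It assumes a standard normal (z) test statistic and a made-up observed value of 2.3. Python's `statistics.NormalDist` supplies the tail areas.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()                          # mean 0, standard deviation 1
critical = std_normal.inv_cdf(1 - alpha / 2)       # two-tailed cut-off, about 1.96

z_stat = 2.3                                       # hypothetical test statistic
p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))    # probability in both tails beyond |z|

in_critical_region = abs(z_stat) > critical        # equivalent to p_value < alpha
print(f"critical = {critical:.2f}, p = {p_value:.4f}, reject H0: {in_critical_region}")
```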
Statistical Significance cont’d
Hypothesis
Hypothesis-Testing Procedures
❖ Select an appropriate test statistic.
❖ Establish the level of significance (e.g., α = .05).
❖ Compute test statistic with actual data
❖ Determine degrees of freedom for the test
statistic.
❖ Obtain a tabled value for the statistical test.
❖ Compare the test statistic to the tabled value.
❖ Make a decision to reject or not reject (accept) the null hypothesis.
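The procedure above can be walked through step by step for a pooled two-sample t-test. The two groups are hypothetical. The tabled value 2.306 is the two-tailed critical t for α = .05 and 8 degrees of freedom, read from a standard t table. It applies only to this particular df.

```python
import math
import statistics

# Hypothetical scores for two independent groups (illustrative data only).
group1 = [5, 6, 7, 8, 9]
group2 = [3, 4, 5, 6, 7]

# 1. Test statistic: pooled two-sample t (assumes roughly equal group variances).
# 2. Level of significance.
alpha = 0.05

# 3. Compute the test statistic from the data.
n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 divisor
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# 4. Degrees of freedom.
df = n1 + n2 - 2

# 5. Tabled (critical) value: two-tailed, alpha = .05, df = 8.
t_table = 2.306

# 6-7. Compare and decide.
reject_h0 = abs(t) > t_table
print(f"t = {t:.2f}, df = {df}, critical = {t_table}, reject H0: {reject_h0}")
```

Here t = 2.00 is smaller than 2.306, so the null hypothesis is not rejected at the .05 level.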
❖ Chi-square test
t-Test
Response   Frequency   Percent
Aware      60          60
Unaware    40          40
Total      100         100
This frequency distribution (a one-way frequency table, or one-dimensional table) from a sample of 100 suggests that the majority of the population (60 percent) is aware of [the brand]. Is the observed difference the result of chance variation, or is it statistically significant?
χ² = 4
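Here is a sketch of the chi-square goodness-of-fit calculation for this table. It assumes the null hypothesis is an even 50/50 split (50 expected in each category), which reproduces χ² = 4. The value 3.841 is the tabled chi-square for 1 degree of freedom at α = .05.

```python
# Observed counts from the awareness table (sample of 100).
observed = {"Aware": 60, "Unaware": 40}
n = sum(observed.values())

# Assumed H0: awareness is split evenly, so each category expects n / 2 = 50.
expected = {k: n / len(observed) for k in observed}

# Chi-square = sum over categories of (O - E)^2 / E
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1       # 2 categories -> 1 degree of freedom
critical = 3.841             # tabled chi-square value for df = 1, alpha = .05

significant = chi_square > critical
print(chi_square, df, significant)  # 4.0 1 True
```

Because 4 exceeds 3.841, the difference is unlikely to be due to chance alone at the .05 level, under the assumed 50/50 null.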
Activity
Aware 60
Unaware 40
Total 100