Unit-5: Data Analysis
Tabular representation
Frequency Table:
Construction of frequency distribution
24-Apr-19
Frequency Table (obtained on a NOMINAL or ORDINAL scale):
Categories        Frequency
Poor              2
Below Average     3
Average           5
Above Average     9
Excellent         1
Total             20
Graphical representation
BAR CHART
• BAR CHART:
– A graph in which class frequencies are drawn as bars, so that the size of different classes can be compared.
Graphical representation
HISTOGRAM
• HISTOGRAM:
– A graph in which
• the classes, with no gaps between them, are reported on the horizontal axis and
• the class frequencies on the vertical axis.
Auto Parts Price
Class          Frequency
50 up to 59    2
60 up to 69    13
70 up to 79    16
80 up to 89    7
Graphical representation
PIE CHART
• PIE CHART
– A chart that shows the proportion or percentage of each class out of the total number of frequencies as segments of a circle.
– The size of each slice reflects the percentage of its class.
Account              Amount (in Rs.)    Angle
Food (A)             3000               150°
Rent (B)             800                40°
Education (C)        1200               60°
Savings (D)          1500               75°
Miscellaneous        700                35°
Total                7200               360°
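The Angle column follows from each account's share of the total: angle = amount / total × 360°. The short Python sketch below reproduces the column from the amounts in the table; the function name `pie_angles` is illustrative and not part of the slides.

```python
# Monthly amounts from the pie-chart table (in Rs.)
budget = {
    "Food": 3000,
    "Rent": 800,
    "Education": 1200,
    "Savings": 1500,
    "Miscellaneous": 700,
}


def pie_angles(amounts):
    """Return the central angle (in degrees) of each slice."""
    total = sum(amounts.values())
    return {name: value / total * 360 for name, value in amounts.items()}


angles = pie_angles(budget)
total_amount = sum(budget.values())
for name, angle in angles.items():
    share = budget[name] / total_amount * 100  # percentage of the total
    print(f"{name:<14}{budget[name]:>6}{angle:>8.1f} deg{share:>7.1f}%")
```

The angles add up to 360°, which is a quick check that no category has been left out.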
Testing Hypothesis
What is a Hypothesis?
“An unproved statement suggesting an assumption about a parameter of a target population, made with the objective of evaluating a claim or speculation.”
Null Hypothesis
Example
A researcher is interested in knowing the proportion of consumers who purchased a brand before and after an advertisement. State the null hypothesis in this case.
H0: There is no difference in the proportion of consumers who purchased the brand before and after the advertisement.
H0: p1 = p2
This is the same as saying that the proportion of consumers purchasing the brand is the same before and after the advertisement.
ALTERNATIVE HYPOTHESIS
• The conclusion that is accepted when the null hypothesis is not accepted is termed the ALTERNATIVE HYPOTHESIS
• Designated as H1
• Suppose we want to see whether the mean age of a class is 21 years
• H0: The mean age of the class is not different from 21 years
H0: μ = 21
• H1: The mean age of the class is different from 21 years
H1: μ ≠ 21
NON-DIRECTIONAL H1 (TWO-TAILED TEST): H1: μ ≠ 21
DIRECTIONAL H1 (ONE-TAILED TEST): H1: μ > 21 OR H1: μ < 21
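The practical difference between a non-directional and a directional H1 is where the rejection region sits. As a rough illustration, the sketch below computes critical values at α = 0.05, assuming the test statistic follows the standard normal distribution (a z-test); the slides do not say which statistic is used for the mean-age example, so this is an assumption.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Non-directional H1 (mu != 21): alpha is split across both tails.
two_tailed = std_normal.inv_cdf(1 - alpha / 2)

# Directional H1 (mu > 21 or mu < 21): all of alpha sits in one tail.
one_tailed = std_normal.inv_cdf(1 - alpha)

print(f"two-tailed: reject H0 if |z| > {two_tailed:.3f}")
print(f"one-tailed (H1: mu > 21): reject H0 if z > {one_tailed:.3f}")
print(f"one-tailed (H1: mu < 21): reject H0 if z < {-one_tailed:.3f}")
```

Because the one-tailed test puts all of α in a single tail, its cut-off is closer to zero than the two-tailed cut-off.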
Pankaj Chauhan on Business Research Methods 29
Level of Significance
Points of Difference      Parametric                 Non-Parametric
Assumed distribution      Bell-shaped (normal)       Distribution-free
Type of data scale        Ratio & interval scaled    Nominal & ordinal scaled
Formula
α = 5% or 5/100 or 0.05. It is a two-tailed test, so divide α into two parts:
α/2 = 0.025
Degrees of freedom = (n-1)
= (14-1) = 13
Critical value: at α/2 = 0.025 and df = 13, the critical value is 2.16
Calculation of t-statistic
= (17.85-18.5)/(1.955/sqrt 14)
= -0.65/0.522
= -1.24
Conclusion:
The null hypothesis is not rejected because the absolute calculated value (1.24) is less than the critical value (2.16).
There is no significant deviation in the breaking strength of the rods.
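The arithmetic of this one-sample t-test can be checked with a few lines of Python. This is a sketch built only from the summary figures on the slide. The hypothesized mean of 18.5 is an assumption: it is the value implied by the numerator -0.65 and the result -1.24. The critical value 2.16 is taken from the slide rather than computed.

```python
from math import sqrt

# Summary figures quoted in the worked example.
n = 14                      # sample size
sample_mean = 17.85
sample_sd = 1.955
hypothesized_mean = 18.5    # assumption: implied by the numerator -0.65
t_critical = 2.16           # table value for alpha/2 = 0.025, df = 13

standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - hypothesized_mean) / standard_error

# Two-tailed rule: reject H0 only if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_critical

print(f"df = {n - 1}")
print(f"standard error = {standard_error:.3f}")
print(f"t = {t_stat:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The comparison uses the absolute value of t because the test is two-tailed; the sign only says on which side of the hypothesized mean the sample mean lies.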
Population average income is $24000. Sample size = 200, sample mean = $17000 and sample standard deviation = $5000. Statistical test: z-test
Standard error
Standard deviation of the sample divided by the square root of (n-1)
Decision Rule:
Since it is a two-tailed test, we take half of α (5%): α/2 = 0.025, which is the area in each tail.
To find the appropriate value from the table, subtract 0.025 from 0.5 (0.5 - 0.025 = 0.475). Look up 0.475 in the Z-table and add the column's decimal value to the row's Z-value: 1.9 + 0.06 = 1.96, so the critical values are ±1.96.
The decision rule is to reject H0 in favour of the alternative if the computed value of Z does not fall between -1.96 and +1.96.
Now calculate the Z-value
Std. error = $5000/sqrt(200-1) = $354.44
Z = ($17000 - $24000)/$354.44
= -19.75
Statistical Conclusion:
Since the absolute value of the calculated Z (19.75) is larger than 1.96, the null hypothesis is rejected.
It means the sample mean lies more than 1.96 standard errors from the hypothesized population mean.
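The same steps can be reproduced in Python. This sketch follows the slide in dividing by sqrt(n-1) (many textbooks divide by sqrt(n) instead) and gets the two-tailed critical value from the standard library rather than the printed Z-table. Any small difference from hand-worked figures comes from rounding sqrt(199).

```python
from math import sqrt
from statistics import NormalDist

population_mean = 24000   # hypothesized population average income ($)
sample_mean = 17000
sample_sd = 5000
n = 200
alpha = 0.05

# The slide divides by sqrt(n - 1); many textbooks use sqrt(n) instead.
standard_error = sample_sd / sqrt(n - 1)
z = (sample_mean - population_mean) / standard_error

# Two-tailed critical value: the z that leaves alpha/2 in the upper tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_critical

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}, critical value = +/-{z_critical:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

With a Z this far beyond ±1.96, the conclusion does not depend on whether n or n-1 is used in the standard error.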
State Conclusion
Reject the null hypothesis. The claim that 9 out of 10 doctors recommend aspirin
for their patients is not accurate.
Chi-Square Test
• The chi-square test is one of the most widely used non-parametric tests
• Makes no assumption about the distribution of the population
• Uses data measured on a nominal scale
• Chi-square describes the magnitude of the discrepancy between the actually observed frequencies and the expected frequencies.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where ‘O’ is the observed frequency and ‘E’ is the expected frequency
IBM 47
Apple 36
Other 17
Since the calculated value (13.820) is greater than the critical value 5.991, we reject the null hypothesis that there is no significant difference between the observed and expected frequencies.
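The reported statistic can be reproduced from the three observed counts. The sketch below assumes that the null hypothesis gives every category the same expected frequency (total / 3). That assumption is not stated in the surviving slide text, but it yields exactly the reported 13.820. The critical value 5.991 (α = 0.05, df = 2) is taken from the slide.

```python
observed = {"IBM": 47, "Apple": 36, "Other": 17}

# Assumption: H0 says the three categories are equally likely,
# so each expected frequency is total / 3.
total = sum(observed.values())
expected = {brand: total / len(observed) for brand in observed}

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum(
    (observed[b] - expected[b]) ** 2 / expected[b] for b in observed
)
df = len(observed) - 1
critical_value = 5.991  # chi-square table, alpha = 0.05, df = 2

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if chi_square > critical_value else "fail to reject H0")
```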
Since the calculated value 7.58 is greater than the critical value 5.991, we reject the null hypothesis that there is no significant difference between the observed and expected frequencies.
Null hypothesis
Alternative hypothesis
The alpha level = 5% or .05
Observed frequencies:
          Small    Medium    Large    Totals
Female    10       14        6        30
Male      4        1         1        6
Totals    14       15        7        36
Observed (O), expected (E) and (O-E)²/E for each cell:
          Small                       Medium                      Large
          O     E         (O-E)²/E    O     E         (O-E)²/E    O    E        (O-E)²/E    Total
Female    10    11.667    0.238       14    12.500    0.180       6    5.833    0.005       30
Male      4     2.333     1.190       1     2.500     0.900       1    1.167    0.024       6
Total     14    14                    15    15                    7    7                    36
Since the calculated value 2.537 is not greater than 5.991, we fail to reject the null hypothesis
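For a contingency table, each expected frequency is (row total × column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The sketch below recomputes the expected counts and the statistic from the observed table; unrounded, the statistic comes out at about 2.537, which agrees with the slide's figure up to rounding. The critical value 5.991 is taken from the slide.

```python
observed = [
    [10, 14, 6],   # Female: Small, Medium, Large
    [4, 1, 1],     # Male:   Small, Medium, Large
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991  # chi-square table, alpha = 0.05, df = 2

for exp_row in expected:
    print([round(e, 3) for e in exp_row])
print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if chi_square > critical_value else "fail to reject H0")
```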
One-Way ANOVA
The one-way analysis of variance is used to test the
claim that three or more population means are
equal.
Conditions or Assumptions
• The data are randomly sampled
• The variances of each sample are assumed equal
• The residuals are normally distributed
One-Way ANOVA
• A classroom is divided into three rows: front, middle, and
back. The instructor noticed that the farther the students
were from him, the less likely they were to engage in
class. He wanted to see if the students farther away did worse
on the exams. A random sample of the students in each row
was obtained. The score for those students on the second
exam was recorded:
– Front: 82, 83, 97, 93, 55, 67, 53
– Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
– Back: 38, 59, 55, 66, 45, 52, 52, 61
The summary statistics for the grades of each row are shown in the table below
Row Front Middle Back
Sample size 7 9 8
Mean 75.71 67.11 53.50
St. Dev 17.63 10.95 8.96
Variance 310.90 119.86 80.29
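The summary table can be recomputed from the raw scores with Python's standard `statistics` module. Note that `stdev` and `variance` use the sample (n - 1) denominator, which is what the table's figures correspond to. The dictionary layout is just one convenient way to hold the data.

```python
from statistics import mean, stdev, variance

scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}

# Sample size, mean, sample standard deviation and sample variance per row.
summary = {
    row: {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "var": variance(values),
    }
    for row, values in scores.items()
}

for row, s in summary.items():
    print(f"{row:<7} n={s['n']}  mean={s['mean']:.2f}  "
          f"sd={s['sd']:.2f}  var={s['var']:.2f}")
```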
The null hypothesis is that the population means are all equal
μ1 = μ2= μ3
The alternative hypothesis is that at least one of the means is different
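Calculate the mean of each sample: $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$
Calculate the mean of the sample means: $\bar{X} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3}{k}$
Critical Value Calculation
df MSB (for numerator) = (k-1)
df MSW (for denominator) = (N-k)
dfMSB = 3-1 = 2
dfMSW = 24-3 = 21
@5%, Critical value = 3.47
Measure the Variation in ANOVA
Variation between samples: $SSB = n_1(\bar{x}_1 - \bar{X})^2 + n_2(\bar{x}_2 - \bar{X})^2 + n_3(\bar{x}_3 - \bar{X})^2$
Variation within samples: $SSW = \sum_i (x_{1i} - \bar{x}_1)^2 + \sum_i (x_{2i} - \bar{x}_2)^2 + \sum_i (x_{3i} - \bar{x}_3)^2$
MSB = SSB/(k-1) | MSW = SSW/(N-k)
F Ratio = MSB/MSW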
One-Way ANOVA
Source SS df MS F Critical Value
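The entries of the ANOVA table can be worked out from the classroom scores. The sketch below follows the SSB, SSW, MSB, MSW and F definitions, with one choice to note: the grand mean is computed as the mean of all N = 24 observations, which equals the simple mean of the sample means only when the samples are the same size (here they are 7, 9 and 8). The critical value 3.47 is taken from the slide.

```python
scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}

k = len(scores)                                   # number of samples
N = sum(len(v) for v in scores.values())          # total observations
grand_mean = sum(sum(v) for v in scores.values()) / N
group_means = {row: sum(v) / len(v) for row, v in scores.items()}

# Variation between samples
ssb = sum(len(v) * (group_means[row] - grand_mean) ** 2
          for row, v in scores.items())
# Variation within samples
ssw = sum((x - group_means[row]) ** 2
          for row, v in scores.items() for x in v)

msb = ssb / (k - 1)
msw = ssw / (N - k)
f_ratio = msb / msw
critical_value = 3.47   # F table, alpha = 0.05, df = (2, 21)

print(f"{'Source':<9}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<9}{ssb:>10.2f}{k - 1:>5}{msb:>10.2f}{f_ratio:>8.2f}")
print(f"{'Within':<9}{ssw:>10.2f}{N - k:>5}{msw:>10.2f}")
print(f"{'Total':<9}{ssb + ssw:>10.2f}{N - 1:>5}")
print("reject H0" if f_ratio > critical_value else "fail to reject H0")
```

On these data F comes out at roughly 5.9, above the critical value of 3.47, so the hypothesis of equal row means would be rejected at the 5% level.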
Two-Way ANOVA
Two-way ANOVA is also known as a completely randomized block design. It
focuses on:
• One independent variable
• A blocking variable
• Block: a variable that the researcher wants to control in order to reduce error is called the blocking variable
• Complete: every sampling unit is distributed within the total block
• Randomized: subjects are randomly assigned within each block
There will be two F-ratios, and each will be compared with its corresponding critical
value
$SSC = n \sum_{j=1}^{C} (\bar{x}_j - \bar{X})^2$
$SSR = C \sum_{i=1}^{n} (\bar{x}_i - \bar{X})^2$
$SSE = \sum_{i=1}^{n} \sum_{j=1}^{C} (x_{ij} - \bar{x}_j - \bar{x}_i + \bar{X})^2$
where n = no. of rows, C = no. of treatment levels (columns), $x_{ij}$ = single value, $\bar{x}_i$ = row mean, $\bar{x}_j$ = column mean, $\bar{X}$ = grand mean
MSC = SSC/(C-1)
MSR = SSR/(n-1)
MSE = SSE/((C-1)(n-1))
F-Treatment = MSC/MSE
F-Block = MSR/MSE
df numerator (block) = (n-1)
df denominator (for both) = (n-1)(C-1)
Total 0.06330 29
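The two-way formulas can be exercised with a small Python sketch. The data below are HYPOTHETICAL, invented only to show the mechanics, and are unrelated to the worked example on the slides, whose data did not survive. Rows are treated as blocks and columns as treatment levels, matching F-Block = MSR/MSE and F-Treatment = MSC/MSE.

```python
# HYPOTHETICAL data, invented only to exercise the formulas:
# rows are blocks (n = 4), columns are treatment levels (C = 3).
data = [
    [10, 12, 14],
    [11, 14, 15],
    [9, 11, 15],
    [12, 15, 18],
]

n = len(data)        # number of rows (blocks)
C = len(data[0])     # number of treatment levels (columns)

grand_mean = sum(sum(row) for row in data) / (n * C)
row_means = [sum(row) / C for row in data]
col_means = [sum(col) / n for col in zip(*data)]

ssc = n * sum((m - grand_mean) ** 2 for m in col_means)   # treatments
ssr = C * sum((m - grand_mean) ** 2 for m in row_means)   # blocks
sse = sum(
    (data[i][j] - col_means[j] - row_means[i] + grand_mean) ** 2
    for i in range(n)
    for j in range(C)
)
sst = sum((x - grand_mean) ** 2 for row in data for x in row)

msc = ssc / (C - 1)
msr = ssr / (n - 1)
mse = sse / ((C - 1) * (n - 1))

f_treatment = msc / mse
f_block = msr / mse

print(f"SSC={ssc:.3f}  SSR={ssr:.3f}  SSE={sse:.3f}  SST={sst:.3f}")
print(f"F-Treatment={f_treatment:.3f}  F-Block={f_block:.3f}")
```

A useful check is that SSC + SSR + SSE equals the total sum of squares SST. Each F-ratio is then compared with the F-table value for its own numerator degrees of freedom and the shared denominator (n-1)(C-1).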
Article
Last Name, first name. Year. “Article title.” Journal Name Volume: 1st Page- Last Page.
Lee, James Daniel. 2005. “Do Girls Change More than Boys? Gender Differences and
Similarities in the Impact of New Relationships on Identities and Behaviors.”
Self and Identity 4:131-47.
Chapter
Last Name, first name. Year. “Chapter Name.” Pages in the book in Book Name, edited
by first name last name. City of Publisher: Publisher.
Book
Last name, first name. Year. Book Name. City of Publisher: Publisher.