
Chapter 10

Data Analysis-Quantitative

Quantitative Analysis

In this chapter, students will be introduced to the techniques of quantitative data analysis and the interpretation of results.
Descriptive Statistics
• Numeric data collected in a research project can be analyzed quantitatively using statistical tools in two different ways.
• Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs.
• Inferential analysis refers to the statistical testing of hypotheses (theory testing).
• Much of today's quantitative data analysis is conducted using statistical software packages such as SPSS, SAS, or WEKA.
• Readers are advised to familiarize themselves with one of these programs to understand the concepts described in this discussion.
Data Preparation
• In research projects, data may be collected from a variety of sources:
  • mail-in surveys,
  • interviews,
  • pretest or posttest experimental data,
  • observational data, and so forth.
• These data must be converted into a machine-readable, numeric format, such as a spreadsheet or a text file, so that they can be analyzed by computer programs like MS Excel, SPSS, SAS, or WEKA.
Data preparation usually follows these steps.

Data Coding
• Coding is the process of converting data into numeric format.
• A codebook should be created to guide the coding process.
• A codebook is a comprehensive document containing:
  • a detailed description of each variable in a research study,
  • the items or measures for that variable,
  • the format of each item (numeric, text, etc.),
  • the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale, and whether the scale is a five-point, seven-point, or some other type), and
  • how to code each value into a numeric format.
For instance,
• If we have a measurement item on a seven-point Likert scale with anchors ranging from "strongly disagree" to "strongly agree", we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree. A coding sketch follows below.
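As a minimal sketch (the item labels and responses below are hypothetical, not from the chapter), such a codebook rule can be applied in Python as follows:

```python
# Hypothetical codebook rule for a seven-point Likert item.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "somewhat disagree": 3,
    "neutral": 4,
    "somewhat agree": 5,
    "agree": 6,
    "strongly agree": 7,
}

responses = ["strongly agree", "neutral", "disagree"]  # hypothetical raw data
coded = [LIKERT_CODES[r] for r in responses]
print(coded)  # [7, 4, 2]
```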
Data Entry
• Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS.
• Most statistical programs provide a data editor for entering data.
• However, these programs store data in their own native format (e.g., SPSS stores data as .sav files), which makes it difficult to share that data with other statistical programs.
• Hence, it is often better to enter data into a spreadsheet or database, where the data can be reorganized as needed, shared across programs, and subsets can be extracted for analysis. A small sketch of this workflow appears below.
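A minimal sketch of this workflow, assuming the pandas library is available and using hypothetical file and column names:

```python
# Enter coded data as a plain CSV (a program-neutral format), then load a
# subset for analysis with pandas. File and column names are hypothetical.
import pandas as pd

df = pd.DataFrame(
    {"respondent_id": [1, 2, 3], "age": [25, 34, 41], "self_esteem": [5, 6, 4]}
)
df.to_csv("survey_data.csv", index=False)  # shareable across programs

subset = pd.read_csv("survey_data.csv")[["age", "self_esteem"]]
print(subset.describe())
```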
Univariate Analysis
• Univariate analysis, or analysis of a single variable, refers to a set of statistical techniques that can describe the general properties of one variable.
• Univariate statistics include:
  1. frequency distribution,
  2. central tendency, and
  3. dispersion.
i) Frequency Distribution
• The frequency distribution of a variable is a summary of the frequency (or percentages) of individual values or ranges of values for that variable.
For instance,
• We can measure how many times a sample of respondents attend religious services (as a measure of their "religiosity") using a categorical scale:
  • never,
  • once per year,
  • several times per year,
  • about once a month,
  • several times per month,
  • several times per week, and
  • an optional category for "did not answer."
• The counts in each category can be tabulated as in Figure 10.1; a computational sketch follows the figure.

[Figure 10.1: Frequency Distribution]
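A minimal sketch of tallying such a frequency distribution in Python (the responses below are hypothetical):

```python
# Build a frequency distribution for a categorical item with Counter.
from collections import Counter

responses = [
    "Never", "Once per year", "Several times per year", "Never",
    "About once a month", "Several times per month", "Never",
]  # hypothetical sample
freq = Counter(responses)
n = len(responses)
for category, count in freq.most_common():
    print(f"{category}: {count} ({100 * count / n:.1f}%)")
```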
• With very large samples, where observations are independent and random, the frequency distribution tends to follow a plot that looks like a bell-shaped curve (a smoothed bar chart of the frequency distribution).
• Most observations are clustered toward the center of the range of values, with fewer and fewer observations toward the extreme ends of the range.
• Such a curve is called a normal distribution.

[Figure: the bell-shaped curve of a normal distribution]
ii) Central Tendency
• Central tendency is an estimate of the center of a distribution of values.
• There are three major estimates of central tendency: mean, median, and mode.
• The arithmetic mean (often simply called the "mean") is the simple average of all values in a given distribution.
• Consider a set of eight test scores: 15, 22, 21, 18, 36, 15, 25, 15.
• The arithmetic mean of these values is (15 + 22 + 21 + 18 + 36 + 15 + 25 + 15)/8 = 20.875.
• Two other means are the geometric mean (the nth root of the product of n numbers in a distribution) and the harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of each value in a distribution):

  GM = (X1 · X2 · ... · Xn)^(1/n)

  HM = N / Σ(1/xi)

• However, these types of means are not popular in the statistical analysis of social research data.
The Median
• The second measure of central tendency, the median, is the middle value within a range of values in a distribution.
• It is computed by sorting all values in a distribution in increasing order and selecting the middle value.
• If there are two middle values (i.e., there is an even number of values in the distribution), the average of the two middle values represents the median.
• In the above example, the sorted values are: 15, 15, 15, 18, 21, 22, 25, 36.
• The two middle values are 18 and 21, and hence the median is (18 + 21)/2 = 19.5.
The Mode
• Lastly, the mode is the most frequently occurring value in a distribution of values.
• In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores.
• Note that any value estimated from a sample, such as the mean, median, mode, or any of the later estimates, is called a statistic. A quick computational check of these three measures appears below.
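A minimal sketch verifying the three measures with Python's standard-library statistics module:

```python
# Mean, median, and mode of the eight test scores above.
import statistics

scores = [15, 22, 21, 18, 36, 15, 25, 15]
print(statistics.mean(scores))    # 20.875
print(statistics.median(scores))  # 19.5 (average of the middle values 18 and 21)
print(statistics.mode(scores))    # 15
```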
iii) Dispersion
• Dispersion refers to the way values are spread around the central tendency, for example, how tightly or how widely the values are clustered around the mean.
• Two common measures of dispersion are:
  • the range, and
  • the standard deviation.
a) Range
• The range is the difference between the highest and lowest values in a distribution.
• The range in our previous example is 36 - 15 = 21.
• If the maximum value were raised to 85 while the other values remained the same, the range would be 85 - 15 = 70.
b) Standard Deviation
• The standard deviation, the second measure of dispersion, corrects for such outliers by using a formula that takes into account how close or how far each value is from the distribution mean:

  σ = √[ Σ (xi - μ)² / n ]

  where:
  • σ is the standard deviation,
  • xi is the ith observation (or value),
  • μ is the arithmetic mean,
  • n is the total number of observations, and
  • Σ means summation across all observations.

A sketch of this computation appears below.
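A minimal sketch of the formula above in Python; statistics.pstdev uses the same population definition:

```python
# Population standard deviation of the test scores, computed manually and
# checked against the standard library.
import math
import statistics

scores = [15, 22, 21, 18, 36, 15, 25, 15]
mu = sum(scores) / len(scores)
sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / len(scores))
print(round(sigma, 3))
print(round(statistics.pstdev(scores), 3))  # same value
```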
c) Variance
• The square of the standard deviation is called the variance of a distribution.
• In a normally distributed frequency distribution:
  • 68% of the observations lie within one standard deviation of the mean (μ ± 1σ),
  • 95% of the observations lie within two standard deviations (μ ± 2σ), and
  • 99.7% of the observations lie within three standard deviations (μ ± 3σ).
A quick numerical check of these percentages appears below.
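A minimal sketch checking the 68-95-99.7 rule, assuming scipy is available:

```python
# Coverage of the standard normal distribution within k standard deviations.
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage:.3%}")
# within 1: 68.269%, within 2: 95.450%, within 3: 99.730%
```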
Bivariate Analysis
• Bivariate analysis examines how two variables are related to each other.
• The most common bivariate statistic is the bivariate correlation (often simply called "correlation"), which is a number between -1 and +1 denoting the strength of the relationship between two variables.
• Let's say that we wish to study how age is related to self-esteem in a sample of 20 respondents, i.e., as age increases, does self-esteem increase, decrease, or remain unchanged?
• If self-esteem increases, then we have a positive correlation between the two variables,
• if self-esteem decreases, we have a negative correlation, and
• if it remains the same, we have a zero correlation.
• To calculate the value of this correlation, consider the hypothetical dataset shown below.

[Table: Hypothetical Data on Age and Self-Esteem]
• The two variables in this dataset are age (x) and self-esteem (y).
• Age is a ratio-scale variable, while self-esteem is an average score computed from a multi-item self-esteem scale measured using a 7-point Likert scale, ranging from "strongly disagree" to "strongly agree."
• The histogram of each variable is shown on the left side of the figure below.

The formula for calculating the bivariate correlation is:

  rxy = Σ (xi - x̄)(yi - ȳ) / [(n - 1) · sx · sy]
• where rxy is the correlation, x̄ and ȳ are the sample means of x and y, and sx and sy are the standard deviations of x and y.
• The manually computed value of the correlation between age and self-esteem, using the above formula, is 0.79.
• This figure indicates that age has a strong positive correlation with self-esteem, i.e., self-esteem tends to increase with increasing age and decrease with decreasing age. A sketch of this computation appears below.

[Figure: histograms of age and self-esteem (left) and their bivariate scatter plot (right)]
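A minimal sketch of the formula above with numpy; the age and self-esteem values are hypothetical, not the chapter's 20-respondent dataset:

```python
# Manual bivariate (Pearson) correlation, checked against numpy.corrcoef.
import numpy as np

age = np.array([21, 25, 30, 34, 38, 42, 46, 50])            # hypothetical
self_esteem = np.array([3.2, 3.8, 3.5, 4.1, 4.6, 4.4, 5.0, 5.3])

n = len(age)
sx, sy = age.std(ddof=1), self_esteem.std(ddof=1)
r = ((age - age.mean()) * (self_esteem - self_esteem.mean())).sum() / ((n - 1) * sx * sy)
print(round(r, 2))
print(round(np.corrcoef(age, self_esteem)[0, 1], 2))  # same value
```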
• The bivariate scatter plot in the right panel of the figure is essentially a plot of self-esteem on the vertical axis against age on the horizontal axis.
• This plot roughly resembles an upward sloping line (i.e., positive slope), which is also indicative of a positive correlation.
• If the two variables were negatively correlated, the scatter plot would slope down (negative slope), implying that an increase in age would be related to a decrease in self-esteem, and vice versa.
• If the two variables were uncorrelated, the scatter plot would approximate a horizontal line (zero slope), implying that an increase in age would have no systematic bearing on self-esteem.
• After computing the bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance.
• Answering such a question requires testing the following hypotheses:
  • H0: r = 0
  • H1: r ≠ 0
• H0 is called the null hypothesis, and H1 is called the alternative hypothesis (sometimes also represented as Ha).
• H1 is the hypothesis that we actually want to test (i.e., whether the correlation is different from zero).
• Although they may seem like two hypotheses, H0 and H1 jointly represent a single hypothesis, since they are opposites of each other.
• Also note that H1 is a non-directional hypothesis, since it does not specify whether r is greater than or less than zero.
• A directional hypothesis would be specified as H0: r ≤ 0; H1: r > 0 (if we are testing for a positive correlation).
• Significance testing of a directional hypothesis is done using a one-tailed t-test, while that of a non-directional hypothesis is done using a two-tailed t-test.
• In statistical testing, the alternative hypothesis cannot be proven directly or conclusively.
• Rather, it is indirectly supported by rejecting the null hypothesis with a certain level of probability.
• Statistical testing is always probabilistic, because we are never sure whether our inferences, based on sample data, apply to the population, since our sample never equals the population.
• The probability that a statistical inference is caused by pure chance is called the p-value.
• The p-value is compared with the significance level (α), which represents the maximum level of risk that we are willing to take that our inference is incorrect.
• For most statistical analyses, α is set to 0.05.
• A p-value less than α = 0.05 indicates that we have enough statistical evidence to reject the null hypothesis and thereby indirectly accept the alternative hypothesis.
• If p > 0.05, then we do not have adequate statistical evidence to reject the null hypothesis or accept the alternative hypothesis. A sketch of such a significance test appears below.
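A minimal sketch of testing H0: r = 0 with a two-tailed test, assuming scipy is available and reusing the hypothetical data from the earlier sketch:

```python
# pearsonr returns the correlation and its two-tailed p-value.
from scipy.stats import pearsonr

age = [21, 25, 30, 34, 38, 42, 46, 50]                 # hypothetical
self_esteem = [3.2, 3.8, 3.5, 4.1, 4.6, 4.4, 5.0, 5.3]

r, p = pearsonr(age, self_esteem)
print(f"r = {r:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the correlation is statistically significant.")
else:
    print("Fail to reject H0.")
```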
• A correlation matrix is a matrix that lists the variable names along the first row and the first column, and depicts the bivariate correlation between each pair of variables in the appropriate cell of the matrix.
• The values along the principal diagonal (from the top left to the bottom right corner) of this matrix are always 1, because any variable is always perfectly correlated with itself.
Cross-Tab/Contingency Table
• Another useful way of presenting bivariate data is cross-tabulation (often abbreviated to cross-tab, and sometimes referred to more formally as a contingency table).
• A cross-tab is a table that describes the frequency (or percentage) of all combinations of two or more nominal or categorical variables.
• As an example, let us assume that we have the following observations of gender and grade for a sample of 20 students, as shown in the figure below.
• Gender is a nominal variable (male/female, or M/F), and grade is a categorical variable with three levels (A, B, and C).
• A simple cross-tabulation of the data may display the joint distribution of gender and grades (i.e., how many students of each gender are in each grade category, as a raw frequency count or as a percentage) in a 2 x 3 matrix.

[Figure: 2 x 3 cross-tabulation of gender and grades]
• Is this pattern real or "statistically significant"?
• In other words, do the above frequency counts differ from what may be expected from pure chance?
• To answer this question, we should compute the expected count of observations in each cell of the 2 x 3 cross-tab matrix.
• This is done by multiplying the marginal column total and the marginal row total for each cell and dividing the product by the total number of observations.
For example,
• For the male/A-grade cell, the expected count = 5 * 10 / 20 = 2.5.
• In other words, we were expecting 2.5 male students to receive an A grade, but in reality, only one male student received the A grade.
• Whether this difference between the expected and actual counts is significant can be tested using a chi-square test.
• The chi-square statistic is computed by summing, across all cells, the squared difference between the observed and expected counts divided by the expected count.
• We can then compare this number to the critical value associated with a desired probability level (p < 0.05) and the degrees of freedom, which is simply (m - 1)*(n - 1), where m and n are the number of rows and columns, respectively.
• In this example, df = (2 - 1) * (3 - 1) = 2.
• From the standard chi-square tables in any statistics book, the critical chi-square value for p = 0.05 and df = 2 is 5.99 (the table value).
• The computed chi-square value, based on our observed data, is 1.00, which is less than the critical value.
• Hence, we must conclude that the observed grade pattern is not statistically different from the pattern that can be expected by pure chance. A sketch of this test appears below.
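A minimal sketch of a chi-square test of independence, assuming scipy is available. The chapter's original observed counts are not reproduced here, so the counts below are hypothetical (chosen only to match the marginals implied above: 10 males, 5 A grades, n = 20); the resulting statistic therefore differs from the 1.00 in the text:

```python
# Chi-square test of independence on a 2 x 3 gender-by-grade table.
from scipy.stats import chi2_contingency

# Rows: male, female; columns: grades A, B, C (hypothetical counts).
observed = [
    [1, 4, 5],
    [4, 3, 3],
]
chi2, p, df, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3f}")
print(expected)  # expected counts; male/A is 2.5, as computed above
```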
Inferential Statistics
• Inferential statistics are the statistical procedures that are used to reach conclusions about associations between variables.
• They differ from descriptive statistics in that they are explicitly designed to test hypotheses.
• Numerous statistical procedures fall in this category, most of which are supported by modern statistical software such as SPSS and SAS.
• Readers are advised to consult a formal text on statistics, or take a course on statistics, for more advanced procedures.
Testing for Significance
• Having formulated the hypothesis, the next step is to test its validity at a certain level of significance.
• The confidence with which a null hypothesis is accepted or rejected depends upon the significance level.
• A significance level of, say, 5% means that the risk of making a wrong decision is 5%: the researcher is likely to be wrong in accepting a false hypothesis or rejecting a true hypothesis on 5 out of 100 occasions.
• A significance level of, say, 1% means that the researcher runs the risk of being wrong in accepting or rejecting the hypothesis on one of every 100 occasions.
• Therefore, a 1% significance level provides greater confidence in the decision than a 5% significance level.
One-tailed and two-tailed tests
• A hypothesis test may be one-tailed or two-tailed.
a) One-Tailed Test
• In a one-tailed test, the test statistic leading to rejection of the null hypothesis falls in only one tail of the sampling distribution curve.
Example:
• A tyre company claims that the mean life of its new tyre is 15,000 km.
• The researcher then formulates the hypothesis that the tyre life = 15,000 km.
b) Two-Tailed Test
• A two-tailed test is one in which the test statistic leading to rejection of the null hypothesis falls on both tails of the sampling distribution curve.
• Whether a one-tailed or a two-tailed hypothesis test should be applied depends on the nature of the problem.
• A one-tailed test is used when the researcher's interest is primarily on one side of the issue.
• A two-tailed test is appropriate when the researcher has no reason to focus on one side of the issue.
Example:
• "Is the current advertisement less effective than the proposed new advertisement?"
a) Degrees of Freedom
• The degrees of freedom tell the researcher the number of elements that can be chosen freely.
Example:
• Suppose (a + b)/2 = 5.
• Fix a = 3; then b has to be 7.
• Therefore, the degrees of freedom equal 1.
b) Select Test Criteria
• If the hypothesis pertains to a large sample (30 or more), the Z-test is used.
• When the sample is small (less than 30), the t-test is used.
c) Carry Out Computation
• Compute the value of the test statistic from the sample data.
d) Make Decisions
• Accepting or rejecting the null hypothesis depends on whether the computed value falls in the region of acceptance or the region of rejection at the given level of significance.
Assumptions of Parametric and Non-Parametric Tests
1) Observations in the population are normally distributed.
2) Observations in the population are independent of each other.
3) The population should possess homogeneous characteristics.
4) Samples should be drawn using simple random sampling techniques.
5) To use a t-test, the sample size should be less than 30.
6) To use an F-test, the sample size should be less than 30.
7) To use a Z-test, the sample size should be more than 30.
8) To use a chi-square test, the minimum number of observations (per cell) should be 5.
a) Parametric Tests
• Parametric tests are more powerful.
• The data in these tests are derived from interval and ratio measurement.
• In parametric tests, it is assumed that the data follow a normal distribution. Examples of parametric tests are (a) the Z-test, (b) the t-test, and (c) the F-test.
• Observations must be independent, i.e., the selection of any one item should not affect the chances of any other item being included in the sample.
b) Non-Parametric Tests
• Non-parametric tests are used to test hypotheses with nominal and ordinal data.
• We do not make assumptions about the shape of the population distribution.
• These are distribution-free tests.
• The hypothesis of a non-parametric test is concerned with something other than the value of a population parameter.
• They are easy to compute. There are certain situations, particularly in marketing research, where the assumptions of parametric tests are not valid, for example, when the data cannot be assumed to follow a normal distribution. In such cases, non-parametric tests are used.
• Examples of non-parametric tests are:
  a) the binomial test,
  b) the chi-square test,
  c) the Mann-Whitney U test, and
  d) the sign test.
Binomial Test
• A binomial test is used when the population has only two classes, such as male/female, buyers/non-buyers, or success/failure.
• All observations made about the population must fall into one of the two classes.
• The binomial test is used when the sample size is small. A sketch of this test appears below.
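A minimal sketch, assuming scipy (version 1.7 or later) is available and using hypothetical counts:

```python
# Binomial test: do buyers and non-buyers occur with equal probability?
from scipy.stats import binomtest

result = binomtest(13, n=20, p=0.5)  # 13 buyers out of 20 (hypothetical)
print(f"p-value = {result.pvalue:.3f}")  # two-sided by default
```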
Advantages of non-parametric tests
i) They are quick and easy to use.
ii) When data are not very accurate, these tests produce fairly good results.
Disadvantages of non-parametric tests
• Non-parametric tests involve a greater risk of accepting a false hypothesis and thus committing a Type II error.
Examples of Parametric Tests
T-test
• The t-test is used when the sample size is small (n < 30) and the population standard deviation is not known.
Example:
• A certain pesticide is packed into bags by a machine. A random sample of 10 bags is drawn, and their contents are found to weigh (in kg): 50, 49, 52, 44, 45, 48, 46, 45, 49, 45. Test whether the average packaging can be taken to be 50 kg.
• The t-test can also be used to find out whether there is a significant difference between two means, i.e., whether two population means are equal. A sketch of the one-sample test appears below.
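A minimal sketch of the one-sample t-test for the pesticide-bag example, assuming scipy is available:

```python
# One-sample t-test of H0: mean fill weight = 50 kg.
from scipy.stats import ttest_1samp

weights = [50, 49, 52, 44, 45, 48, 46, 45, 49, 45]
t_stat, p_value = ttest_1samp(weights, popmean=50)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# If p < 0.05, reject H0 and conclude the average fill is not 50 kg.
```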
Illustration
• There are two nourishment programmes, 'A' and 'B'. Two groups of children are subjected to these programmes, and their weight is measured after six months.
• The first group of children, subjected to programme 'A', weighed 44, 37, 48, 60, and 41 kg at the end of the programme.
• The second group of children, subjected to nourishment programme 'B', weighed 42, 42, 58, 64, 64, 67, and 62 kg at the end of the programme.
• From the above, can we conclude that nourishment programme 'B' increased the weight of the children significantly, at a 5% level of significance?
• H0 (null hypothesis): there is no significant difference between nourishment programmes 'A' and 'B'.
• HA (alternative hypothesis): nourishment programme 'B' is better than 'A', i.e., programme 'B' increases the children's weight significantly.
Solution

Programme A                          Programme B
X      (x - x̄)    (x - x̄)²          Y      (y - ȳ)    (y - ȳ)²
       (x̄ = 46)                            (ȳ = 57)
44       -2           4              42      -15         225
37       -9          81              42      -15         225
48        2           4              58        1           1
60       14         196              64        7          49
41       -5          25              64        7          49
                                     67       10         100
                                     62        5          25
----------------------------------------------------------------
230                 310              399                 674
Here, n1 = 5 and n2 = 7.

x̄ = ΣX / n1 = 230/5 = 46
ȳ = ΣY / n2 = 399/7 = 57
Σ(x - x̄)² = 310
Σ(y - ȳ)² = 674

The pooled variance is:

S² = [Σ(x - x̄)² + Σ(y - ȳ)²] / (n1 + n2 - 2) = (310 + 674)/10 = 98.4, so S = 9.92

d.f. = (n1 + n2 - 2) = (5 + 7 - 2) = 10

t = (x̄ - ȳ) / [S · √(1/n1 + 1/n2)] = (46 - 57) / (9.92 × 0.586) = -1.89

The table value of t at 10 d.f. at the 5% level is 1.81.

Since the calculated |t| = 1.89 is greater than 1.81, the result is significant. Hence HA is accepted: the two nourishment programmes differ significantly with respect to weight increase. A sketch of this computation appears below.
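A minimal sketch reproducing the pooled two-sample t-test, assuming scipy is available:

```python
# Pooled (equal-variance) two-sample t-test for the two programmes.
from scipy.stats import ttest_ind

programme_a = [44, 37, 48, 60, 41]
programme_b = [42, 42, 58, 64, 64, 67, 62]

t_stat, p_value = ttest_ind(programme_a, programme_b, equal_var=True)
print(f"t = {t_stat:.2f}")  # about -1.89, matching the manual computation
# For the one-tailed test (is B better than A?), halve the two-sided p-value.
print(f"one-tailed p = {p_value / 2:.3f}")
```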
Analysis of Variance (ANOVA)
a) ANOVA
• ANOVA is a statistical technique used to test the equality of three or more sample means.
• Based on the means, an inference is drawn as to whether the samples belong to the same population or not.
b) Conditions for using ANOVA
1. Data should be quantitative in nature.
2. Data should be normally distributed.
3. Samples should be drawn from the population at random.
c) ANOVA can be discussed in two parts:
1. One-way classification
2. Two- and three-way classification
One-way ANOVA
The following steps are followed in ANOVA:
(a) Calculate the variance between samples.
(b) Calculate the variance within samples.
(c) Calculate the F ratio using the formula:
    F = variance between the samples / variance within the samples
(d) Compare the value of F obtained above in (c) with the critical value of F, at, say, the 5% level of significance, for the applicable degrees of freedom.
(e) When the calculated value of F is less than the table value of F, the difference in sample means is not significant and the null hypothesis is accepted. Otherwise, the difference in sample means is considered significant and the null hypothesis is rejected.
Example: ANOVA is useful:
• to compare the mileage achieved by different brands of automotive fuel, or
• to compare the first-year earnings of graduates of half a dozen top business schools.
Two-way ANOVA
• The procedure followed to calculate the variance is the same as for the one-way classification. An example of a two-way classification of ANOVA is as follows:
Example:
• A firm has four types of machines: A, B, C, and D. It has put four of its workers on each machine for a specified period, say one week. At the end of the week, the average output of each worker on each type of machine was calculated. These data are given below:
            Average production by type of machine
            A      B      C      D
Worker 1    25     26     23     28
Worker 2    23     22     24     27
Worker 3    27     30     26     32
Worker 4    29     34     27     33
The firm is interested in knowing:
• whether the mean productivity of the workers is significantly different, and
• whether there is a significant difference in the mean productivity of the different types of machines.
Illustration
Company 'X' wants its employees to undergo three different types of training programme with a view to obtaining improved productivity from them. After completion of the training programme, 16 new employees are assigned at random to the three training methods, and their production performance is recorded. The training manager's problem is to find out whether there are any differences in the effectiveness of the training methods. The data recorded are as follows:
Daily output of new employees
Method 1: 15, 18, 19, 22, 11
Method 2: 22, 27, 18, 21, 17
Method 3: 18, 24, 19, 16, 22, 15
The steps are as follows:
1) Calculate the sample mean of each method, x̄j.
2) Calculate the grand mean, x̿, of all observations.
3) Calculate the variance between columns using the formula:
   s²(between) = Σ nj(x̄j - x̿)² / (k - 1)
   where k is the number of samples.
4) Calculate each sample variance using the formula:
   sj² = Σ(x - x̄j)² / (nj - 1)
   where nj is the number of observations under each method.
5) Calculate the variance within columns using the formula:
   s²(within) = Σ [(nj - 1)/(nT - k)] · sj²
   where nT is the total number of observations.
6) Calculate F using the ratio F = s²(between) / s²(within).
7) Calculate the number of degrees of freedom in the numerator of the F ratio: d.f. = (number of samples - 1).
8) Calculate the number of degrees of freedom in the denominator of the F ratio: d.f. = Σ(nj - 1) = nT - k.
9) Refer to the F table and find the critical value.
10) Draw conclusions.
Solution

          Method 1    Method 2    Method 3
             15          22          18
             18          27          24
             19          18          19
             22          21          16
             11          17          22
                                     15
Total        85         105         114
1. The sample means are: x̄1 = 85/5 = 17, x̄2 = 105/5 = 21, and x̄3 = 114/6 = 19.

2. The grand mean is: x̿ = (85 + 105 + 114)/16 = 304/16 = 19.

3. Calculate the variance between columns:
nj     x̄j     x̿     (x̄j - x̿)    (x̄j - x̿)²    nj(x̄j - x̿)²
5      17     19       -2            4          5 x 4 = 20
5      21     19        2            4          5 x 4 = 20
6      19     19        0            0          6 x 0 = 0
                                                Total = 40

The variance between columns is Σ nj(x̄j - x̿)² / (k - 1) = 40/2 = 20.
Training Method 1      Training Method 2      Training Method 3
(x - x̄1)²             (x - x̄2)²             (x - x̄3)²
(15 - 17)² = 4         (22 - 21)² = 1         (18 - 19)² = 1
(18 - 17)² = 1         (27 - 21)² = 36        (24 - 19)² = 25
(19 - 17)² = 4         (18 - 21)² = 9         (19 - 19)² = 0
(22 - 17)² = 25        (21 - 21)² = 0         (16 - 19)² = 9
(11 - 17)² = 36        (17 - 21)² = 16        (22 - 19)² = 9
                                              (15 - 19)² = 16
Total = 70             Total = 62             Total = 60

4. The sample variances are:
s1² = 70/(5 - 1) = 17.5, s2² = 62/(5 - 1) = 15.5, s3² = 60/(6 - 1) = 12
5. The within-column variance is:
s²(within) = (4/13) x 17.5 + (4/13) x 15.5 + (5/13) x 12 = 14.77

6. F = s²(between) / s²(within) = 20/14.77 = 1.354
7. The d.f. of the numerator = (3 - 1) = 2.

8. The d.f. of the denominator = Σnj - k = (5 - 1) + (5 - 1) + (6 - 1) = 16 - 3 = 13.

9. Refer to the F table using d.f. = 2 and d.f. = 13.

10. The table value is 3.81. This is the upper limit of the acceptance region. Since the calculated value 1.354 lies within it, we accept H0, the null hypothesis.

Conclusion:
• There is no significant difference in the effect of the three training methods. A sketch of this computation appears below.
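A minimal sketch reproducing the one-way ANOVA above, assuming scipy is available:

```python
# One-way ANOVA across the three training methods.
from scipy.stats import f_oneway

method_1 = [15, 18, 19, 22, 11]
method_2 = [22, 27, 18, 21, 17]
method_3 = [18, 24, 19, 16, 22, 15]

f_stat, p_value = f_oneway(method_1, method_2, method_3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")  # F is about 1.354
# Since p > 0.05 (equivalently, F < 3.81 at d.f. 2 and 13), we accept H0.
```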
Statistical approaches
• Regression analysis is a statistical procedure for analyzing associative relationships between a metric dependent variable and one or more independent variables.
• It can be used in the following ways:
  • to determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists;
  • to determine how much of the variation in the dependent variable can be explained by the independent variables: the strength of the relationship;
  • to determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables;
  • to predict the values of the dependent variable; and
  • to control for other independent variables when evaluating the contribution of a specific variable or set of variables.
Bivariate regression is a procedure for deriving a mathematical relationship, in the form of an equation, between a single metric dependent variable and a single metric independent variable.
• This analysis is similar in many ways to determining the simple correlation between two variables.
• However, because an equation has to be derived, one variable must be identified as the dependent variable and the other as the independent variable.
• For example,
  • Can variation in sales be explained in terms of variation in advertising expenditures? What is the structure and form of this relationship, and can it be modeled mathematically by an equation describing a straight line?
  • Can the variation in market share be accounted for by the size of the sales force?
  • Are consumers' perceptions of quality determined by their perceptions of price?
Bivariate regression model: the basic regression equation is
  Yi = β0 + β1Xi + ei
where Y = the dependent (or criterion) variable, X = the independent (or predictor) variable, β0 = the intercept of the line, β1 = the slope of the line, and ei = the error term associated with the ith observation. A sketch of fitting this model appears below.
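A minimal sketch (with hypothetical advertising and sales figures) of fitting the bivariate model by ordinary least squares with numpy:

```python
# Fit Y = b0 + b1*X by least squares; polyfit returns [slope, intercept].
import numpy as np

advertising = np.array([10, 15, 20, 25, 30, 35])  # hypothetical X
sales = np.array([110, 135, 148, 170, 188, 205])  # hypothetical Y

b1, b0 = np.polyfit(advertising, sales, deg=1)
print(f"sales = {b0:.1f} + {b1:.2f} * advertising")

residuals = sales - (b0 + b1 * advertising)  # observed minus predicted
print(residuals.round(2))
```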
• Multiple regression is a technique that simultaneously develops a mathematical relationship between two or more independent variables and an interval-scaled dependent variable.
  • Can variation in sales be explained in terms of variation in advertising expenditure, prices, and level of distribution?
  • Can variation in market share be accounted for by the size of the sales force, advertising expenditure, and the sales promotion budget?
  • Are consumers' perceptions of quality determined by their perceptions of prices, brand image, and brand attributes?
• Multiple regression model: the equation used to express the results of a multiple regression analysis is
  Yi = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + e
• The residual is the difference between the observed value of Yi and the value predicted by the regression equation, Ŷi.
• Multicollinearity is a state of very high intercorrelation among the independent variables. A sketch of a multiple regression fit appears below.
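A minimal sketch (hypothetical data) of a multiple regression fit with numpy's least-squares solver:

```python
# Sales explained by advertising and price: Y = b0 + b1*X1 + b2*X2 + e.
import numpy as np

advertising = np.array([10, 15, 20, 25, 30, 35], dtype=float)  # hypothetical
price = np.array([5.0, 4.8, 5.2, 4.5, 4.9, 4.4])               # hypothetical
sales = np.array([110, 135, 148, 170, 188, 205], dtype=float)  # hypothetical

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(len(sales)), advertising, price])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = coef
print(f"sales = {b0:.1f} + {b1:.2f}*advertising + {b2:.2f}*price")
```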
END OF CHAPTER 10-Quantitative
