Topic 3: Data Processing and Analysis (BUS 221)

DATA PROCESSING & ANALYSIS

DATA PROCESSING
• EDITING QUESTIONNAIRES
• CODING
• CREATING A DATA FILE ON A COMPUTER
DATA ANALYSIS TECHNIQUES
• DESCRIPTIVE ANALYSIS
– UNIVARIATE ANALYSIS (ONE VARIABLE)
• BIVARIATE ANALYSIS (TWO VARIABLES)
• MULTIVARIATE ANALYSIS (MORE THAN TWO
VARIABLES)
Assignment..
• Download a questionnaire from the e-learning platform
• Prepare a codebook for the questionnaire
• Use the questionnaire to collect data from at
least ten respondents
• Download and install SPSS on your computer
• Download the book SPSS Survival Manual
• Use the manual to learn about data processing
using SPSS
Editing Data
• Checking and adjusting responses in the completed
questionnaires
• Purposes of editing:
– For consistency among responses
– For completeness of responses
– To facilitate the coding process
Editing - Checking Questionnaire
A questionnaire returned from the field may be unacceptable
for several reasons.
– Parts of the questionnaire may be incomplete.
– The pattern of responses may indicate that the respondent
did not understand or follow the instructions.
– The responses show little variance.
– One or more pages are missing.
– The questionnaire is answered by someone who does not
qualify for participation.
Treatment of Unsatisfactory Results
•Returning to the Field – The questionnaires with
unsatisfactory responses may be returned to the field, where
the interviewers re-contact the respondents.
•Assigning Missing Values – If returning the questionnaires to
the field is not feasible, the editor may assign missing values
to unsatisfactory responses. (During data analysis)
•Discarding Unsatisfactory Respondents – In this approach,
the respondents with unsatisfactory responses are simply
discarded.
Coding
• Coding means assigning a code, usually a number,
to each possible response to each question.
• Codebook: a summary of the instructions you will
use to convert the information obtained from each
subject or case into a format that a computer program (SPSS
or other software) can understand.
• Preparing the codebook involves:
– defining and labelling each of the variables; and
– assigning numbers to each of the possible responses
CODING – EXAMPLE
Question for respondents:
• Which programme do you study? (SELECT ONE)
– BAF-BS
– BAF-PS
– BBA-MM
– BBA-EIM
– BPSCM
Coding:
• Variable name: Programme Type
• Coding instructions: 1 = BAF-BS, 2 = BAF-PS, 3 = BBA-MM, 4 = BBA-EIM, 5 = BPSCM
Coding
• Code the following Questions:
• Indicate your sex (MALE/FEMALE)
• Have you ever worked before? (YES/NO)
• Indicate your age…………….
• How do you rate yourself in terms of self-esteem?
(HIGH/MEDIUM/LOW)
• What is your education level?
(University/College/Secondary/Primary)
Examples of Coding…
• Age: recorded as given (1 = 1, 2 = 2, 3 = 3, 4 = 4, 5 = 5)
• Sex: Male = 1, Female = 2
• Political Affiliation: CCM = 1, Chadema = 2, CUF = 3
• Self-esteem: Low = 1, Medium = 2, High = 3
Coding of open-ended questions
• E.g. What is the major source of your start-up capital for
your business?
• You might notice most of the respondents listing their
source of financing as related to:
– loans from friends/family, assistance from family, loans from
SACCOs, UPATU
• Group the major responses under the variable name
STARTCAPIT, and assign a number to each (loan from
friends = 1, assistance = 2, SACCOS = 3, UPATU = 4)
• You also need to add another numerical code for
responses that do not fall into these listed categories
(OTHERS = …)
Example of a codebook

Description of variable        Variable name (SPSS)   Coding instructions
Identification number          ID
Sex                            GENDER                 1 = female, 2 = male
Age                            AGE                    number of years
Education level attained       EDLEVEL                1 = primary, 2 = secondary, 3 = college, 4 = university
Marital status                 MARITAL                1 = single, 2 = married, 3 = divorced, 4 = widowed
Source of start-up capital     STARTCAPITAL           1 = loan from friends, 2 = SACCOS, 3 = assistance
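As an illustration, a codebook can also be expressed directly in software. The following is a minimal sketch in Python with pandas, as an alternative to typing codes by hand; the raw responses and column names are illustrative assumptions, not part of the course materials:

```python
import pandas as pd

# Coding instructions taken from the example codebook above
codebook = {
    "GENDER":  {"female": 1, "male": 2},
    "EDLEVEL": {"primary": 1, "secondary": 2, "college": 3, "university": 4},
    "MARITAL": {"single": 1, "married": 2, "divorced": 3, "widowed": 4},
}

# Hypothetical raw responses as collected in the field
raw = pd.DataFrame({
    "ID": [1, 2, 3],
    "GENDER": ["female", "male", "female"],
    "EDLEVEL": ["college", "primary", "university"],
    "MARITAL": ["single", "married", "single"],
    "AGE": [21, 34, 27],  # age is kept as number of years
})

coded = raw.copy()
for var, codes in codebook.items():
    coded[var] = coded[var].map(codes)  # text response -> numeric code

print(coded)
```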
Creating a data file
• Define variables
• Variable Name
• Type of Variable: Numerical or String
• ………………………
• Enter data in your file based on your codebook
• You may create a data file using your favourite
software package (STATA, Excel, SPSS, or PSPP)
Data Cleaning
• Check the number of valid cases and missing cases
– if there are a lot of missing cases you need to ask why.
• Check for errors in your data file
– For categorical variables (e.g. sex, marital status,
education level):
• check for values that are out of range (outside the valid
minimum and maximum codes)
– For continuous variables (e.g. age):
• check the mean, variance, minimum and maximum
• does the mean score make sense?
• Correct the errors
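A minimal sketch of these checks in Python with pandas; the file and variable names are illustrative assumptions:

```python
import pandas as pd

df = pd.read_csv("survey_coded.csv")  # hypothetical data file

# Number of valid and missing cases per variable
print(df.notna().sum())
print(df.isna().sum())

# Categorical variable: values outside the codebook range are errors
valid_gender = {1, 2}
bad_gender = df[df["GENDER"].notna() & ~df["GENDER"].isin(valid_gender)]
print(bad_gender)  # rows that need correcting

# Continuous variable: does the mean make sense? Check min/max too.
print(df["AGE"].agg(["count", "mean", "std", "min", "max"]))
```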
DATA ANALYSIS TECHNIQUES
• DESCRIPTIVE ANALYSIS
– UNIVARIATE ANALYSIS (ONE VARIABLE)
• BIVARIATE ANALYSIS (TWO VARIABLES)
• MULTIVARIATE ANALYSIS (MORE THAN TWO
VARIABLES)
Descriptive Analysis
• For Continuous variables (eg. Age & Income);
– Descriptive statistics include the mean, standard
deviation, range of scores, skewness and kurtosis.
• For categorical variables you should use
Frequencies.
– This will tell you how many people are in each category (e.g.
for sex: how many males, how many females).
Descriptive Analysis
• Descriptive Statistics
– The Range: Min/Max
– Average/Mean
– Median
– Mode
– Variance/Standard Deviation
– Histograms and Normal Distributions
• The most important thing is your interpretation of
these statistics
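A minimal sketch of these descriptive statistics in Python with pandas; the file and variable names are illustrative assumptions:

```python
import pandas as pd

df = pd.read_csv("survey_coded.csv")  # hypothetical data file

# Continuous variable: mean, standard deviation, range, skewness, kurtosis
print(df["AGE"].describe())           # count, mean, std, min, max, quartiles
print("skewness:", df["AGE"].skew())
print("kurtosis:", df["AGE"].kurt())

# Categorical variable: frequencies (how many in each category)
print(df["GENDER"].value_counts())
```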
Choosing the right Statistical Test
Overview of statistical techniques
• Techniques used to explore the relationship among
variables (used when you have continuous
variables)
– Pearson correlation
– Partial Correlation
– Multiple regression
– Logistic Regression
– Factor analysis
– ………
Statistical techniques…
• Techniques you can use when you want to
explore the differences between groups
– Chi-Square test (for categorical variables)
– Independent Samples T-tests
– One-way analysis of variance (One way ANOVA)
– Two-way analysis of variance (Two way-ANOVA)
– Multivariate analysis of variance (MANOVA)
– Analysis of covariance (ANCOVA)
Choosing the right Statistical Test
• In choosing the right statistic you will need to
consider a number of different factors:
• 1. What questions do you want to address?
– Based on your research questions/objectives
• E.g. Effect of Age on Optimism
– Is there a relationship between age and level of
optimism?
– Are older people more optimistic than younger people?
Factors to be considered….
• 2. Type of the data collected (questionnaire items
and scales)
• E.g., the way you collected information about
respondents’ age
• If you have categories of age group
• If you have age in years
Factors………..
• 3. The nature of each of your variables
• It is also important that you know the level of
measurement for each of your variables. Are your
variables:
• Categorical (also referred to as nominal level data,
e.g. sex: males/females);
• Ordinal (rankings: 1st, 2nd, 3rd); and
• Continuous (also referred to as interval level data,
e.g. age in years, or scores on the Optimism scale).
Factors…..
• 4. Normal distribution – for continuous variables:
• You need to check whether the distribution of scores is
normally distributed or badly skewed.
• What is the range of scores?
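A minimal sketch of such a normality check in Python; the Shapiro-Wilk test in scipy is one common option, and the variable name is an illustrative assumption:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("survey_coded.csv")  # hypothetical data file
scores = df["OPTIMISM"].dropna()      # a continuous variable

w, p = stats.shapiro(scores)          # p > .05: no evidence of non-normality
print(f"W = {w:.3f}, p = {p:.3f}")
print("skewness:", scores.skew())
print("range:", scores.min(), "to", scores.max())
```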
REVISION
• Descriptive Analysis
• Factors for choosing a particular data analysis
technique
TYPES OF DATA NEEDED
Exploring relationships among variables….
• Pearson Correlation Analysis
– It tests the relationship between two variables
• Example of a research question:
– Is there a relationship between age and optimism
scores?
– Does optimism increase with age?
• Types of data needed: two continuous variables
(e.g., age, optimism scores)
Exploring relationships among variables…
• Partial correlation Analysis
– It tests the relationship between two variables, while
controlling for the effects of a third variable
• Example of a research question:
– After controlling for the effects of education, is there
still a significant relationship between household
income and number of children?
• Types of data needed: three continuous
variables (e.g. education, income, number of
children)
Exploring relationship……
• Multiple regression
– It tests cause-and-effect relationships (prediction)
• Example of research questions:
– How well a set of variables is able to predict a particular
outcome;
– Which variable in a set of variables is the best predictor of an
outcome; and
– Whether a particular predictor variable is still able to predict an
outcome when the effects of another variable are controlled for
• Types of data needed: one continuous dependent
variable (e.g. life satisfaction); and two or more
continuous independent variables (e.g. self-esteem).
Types of data needed for Exploring
differences between groups
Exploring differences between groups
• Independent-samples t-test
– It tests whether two groups differ significantly in
relation to a particular variable
• Example of research question:
– Are males more optimistic than females?
• What you need: one categorical independent
variable with only two groups (e.g. sex:
males/females); and one continuous dependent
variable (e.g. optimism score).
• Subjects can belong to only one group.
Exploring differences between groups
• Paired-samples t-test (repeated measures)
• Example of research question:
– Does ten weeks of meditation training result in a decrease in
participants’ level of anxiety?
– Is there a change in anxiety levels from Time 1 (pre-
intervention) to Time 2 (post-intervention)?
• Type of data you need:
– one categorical independent variable (e.g. Time 1/Time 2);
and
– one continuous dependent variable (e.g. anxiety score).
• Same subjects tested on two separate occasions: Time
1 (before intervention) and Time 2 (after intervention).
Exploring differences between groups
• Chi-square test for independence
– It tests whether groups differ significantly
• Example of research question:
– What is the relationship between gender and dropout
rates from school?
• Types of data you need;
– one categorical independent variable (e.g. sex:
males/females); and
– one categorical dependent variable (e.g. dropout rate:
Yes/No).
• You are interested in the number of people in each category.
Exploring differences between groups…
• One-way between-groups analysis of variance
(ONE-WAY ANOVA)
– It tests whether three or more groups differ significantly
• Example of research question:
– Is there a difference in optimism scores for people who
are under 30, between 31–49 and 50 years and over?
• What you need:
– one categorical independent variable with three groups
(e.g. age: under 30/31–49/50+); and
– one continuous dependent variable (e.g. optimism
score).
Exploring differences between groups
• Two-way between-groups analysis of variance
(TWO-WAY ANOVA)
• Example of research question:
– What is the effect of age on optimism scores for males
and females?
• What you need:
– two categorical independent variables (e.g. sex:
males/females; age group: under 30/31–49/50+); and
– one continuous dependent variable (e.g. optimism
score).
Exploring Differences….
• Multivariate analysis of variance (MANOVA)
• Example of research question:
– Are males better adjusted than females in terms of their
general physical and psychological health (in terms of
anxiety and depression levels and perceived stress)?
• What you need:
– one categorical independent variable (e.g. sex:
males/females); and
– two or more continuous dependent variables (e.g.
anxiety, depression, perceived stress).
QUESTIONS FOR DISCUSSION
• When do you use the following statistical
techniques, and what does the technique
test? Give Examples
• Independent T-test
• One-way ANOVA
• Correlation Analysis
• Partial Correlation Analysis
• Multiple Regression Analysis
• Two-way ANOVA
REVISION
• A Lecturer wants to know who performed better on a
research methods examination, among the following groups:
BBA students or BAF students
• Juma wants to assess the effect of both sex and ethnic group
on examination performance
• One of the research questions that Juma is interested to
answer is: Are there differences between cars imported from
Japan, from Europe and from USA in relation to fuel
consumption?
• He was interested to examine whether there is a significant
relationship between student performance and parents'
economic status after removing the effects of parents' education
HYPOTHESIS TESTING
Hypotheses
• A hypothesis is a statement that something is true.
• It is a claim or statement about a property of a
population
• Two types in Statistics
– The null hypothesis (H0) is a claim that "there are no
differences in the population groups"
– The alternative hypothesis (Ha) claims "H0 is false" (i.e.
the groups do differ)
Non-statistical example…
• A criminal trial is an example of hypothesis testing
without the statistics.
• In a trial, a jury must decide between two hypotheses.
The null hypothesis is
• H0: The defendant is innocent
• The alternative hypothesis (or research hypothesis) is
• H1: The defendant is guilty
• The jury does not know which hypothesis is true.
They must make a decision on the basis of the evidence
presented.
Non-statistical example.…
• In the language of statistics, convicting the defendant is
called rejecting the null hypothesis in favor of the
alternative hypothesis.
• That is, the jury is saying that there is enough evidence to
conclude that the defendant is guilty (i.e., there is enough
evidence to support the alternative hypothesis).
• If the jury acquits, it is stating that there is not enough
evidence to support the alternative hypothesis.
• Notice that the jury is not saying that the defendant is
innocent, only that there is not enough evidence to support
the alternative hypothesis.
Non-statistical Example…
• There are two possible errors that can be
committed:
• A Type I error occurs when we reject a true null
hypothesis, i.e. when the jury convicts an innocent
person.
• In research, we want the probability of this Type I
error to be very small, especially for a criminal trial
where a conviction results in the death penalty
• The most widely accepted cutoff point is 5%
• P(Type I error) = α (alpha) [usually 0.05 or 0.01]
Non-statistical Example…
• A Type II error occurs when we don't reject a false
null hypothesis [i.e. we accept the null hypothesis].
• This occurs when a guilty defendant is acquitted.
• In some practical settings, this type of error is the
more serious mistake.
• The probability of a Type II error is β (the Greek letter
beta).
• The two probabilities are inversely related:
decreasing one increases the other, for a fixed
sample size.
Type I and Type II Errors (Decision vs. True State of Nature)
• We decide to reject the null hypothesis:
– If the null hypothesis is true: Type I error (rejecting a true null hypothesis)
– If the null hypothesis is false: correct decision
• We fail to reject the null hypothesis:
– If the null hypothesis is true: correct decision
– If the null hypothesis is false: Type II error (failing to reject a false null hypothesis)
Hypothesis Testing…
• The critical concepts are these:
• There are two hypotheses, the null and the alternative hypotheses.
• The procedure begins with the assumption that the null hypothesis
is true.
• The goal is to determine whether there is enough evidence to infer
that the alternative hypothesis is true, or the null is not likely to be
true.
• There are two possible decisions:
– Conclude that there is enough evidence to support the alternative
hypothesis: reject the null.
– Conclude that there is not enough evidence to support the alternative
hypothesis: fail to reject the null.
• Therefore, the smaller the p-value, the stronger the evidence
against the null hypothesis H0
Interpreting the P-Value
• The smaller the p-value, the more statistical evidence
exists to support the alternative hypothesis.
• If the p-value is less than 1%, there is overwhelming
evidence that supports the alternative hypothesis.
• If the p-value is between 1% and 5%, there is strong
evidence that supports the alternative hypothesis.
• If the p-value is between 5% and 10%, there is weak
evidence that supports the alternative hypothesis.
• If the p-value exceeds 10%, there is no evidence that
supports the alternative hypothesis.
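These bands can be written as a simple decision rule. A small sketch in Python; the band labels follow the slide, and the function name is an illustrative assumption:

```python
def evidence_against_h0(p_value: float) -> str:
    """Translate a p-value into the evidence bands described above."""
    if p_value < 0.01:
        return "overwhelming evidence for the alternative"
    elif p_value < 0.05:
        return "strong evidence for the alternative"
    elif p_value < 0.10:
        return "weak evidence for the alternative"
    return "no evidence against the null"

print(evidence_against_h0(0.002))  # overwhelming evidence for the alternative
```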
Correlation analysis
Correlation Analysis
• Correlation:
– determines whether and to what degree a relationship exists
between two or more quantifiable variables
– the degree of the relationship is expressed as a coefficient of
correlation, r
• Linear relationships implying straight line association are
visualized with scatter plots
• Strong linear relationship
– When the points lie close to a straight line, and weak if they are
widely scattered
• The presence of a correlation does not indicate a cause-
effect relationship, primarily because of the possibility of
other variables influencing both.
Scatter Plot Examples
• (Figures) Scatter plots of y against x illustrating linear versus curvilinear relationships.
• (Figures) Scatter plots illustrating strong versus weak relationships.
• (Figure) A scatter plot illustrating no relationship.
Correlation Coefficient
• The sample correlation coefficient r is an estimate
of the population correlation coefficient and is used to
measure the strength of the linear relationship in
the sample observations
• Correlation Coefficients:
– Range between -1 and 1
– The closer to -1, the stronger the negative linear
relationship
– The closer to 1, the stronger the positive linear
relationship
– The closer to 0, the weaker the linear relationship
Strength and Direction….
• Graded interpretation (Cohen's guidelines):
• r = .10 to .29 = small (weak); r = .30 to .49 = medium;
r = .50 to 1.0 = large (strong) correlation
• These guidelines apply whether or not there is a
negative sign in front of your r value.
• Remember, the negative sign refers only to the
direction of the relationship, not the strength.
• The strength of correlation of r=.5 and r=–.5 is the
same. It is only in a different direction.
Testing hypotheses – Correlation Coefficient
• For correlation analysis, we are testing
whether or not there is a significant linear
relationship between the variables, or whether the
observed r is due to chance.
• Null hypothesis: H0: ρ = 0
• No correlation between the x and y variables
• Alternative: H1: ρ ≠ 0
• Significant correlation between the variables
Basic Assumptions – Correlation
• There must be a linear relationship between the two
variables
• Both variables must be quantitative (continuous)
• Both variables must be normally distributed.
Example
• Research Question:
– What is the relationship between Age and Blood Pressure?
• Variables:
– Age
– Blood Pressure
Procedure for Correlation Analysis
• From the menu click on: Analyze, then click on
Correlate, then on Bivariate.
• Select your two variables and move them into the
box marked Variables
• You can list a whole range of variables here, not just
two.
• In the resulting matrix, the correlation between all
possible pairs of variables will be listed.
• You are interested in the strength and direction of
the relationship
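For comparison, the same analysis can be run outside SPSS. A minimal sketch in Python with scipy; the file and variable names are illustrative assumptions:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("survey_coded.csv")   # hypothetical data file
clean = df[["AGE", "BP"]].dropna()     # pearsonr cannot handle missing values

r, p = stats.pearsonr(clean["AGE"], clean["BP"])
print(f"N = {len(clean)}, r = {r:.2f}, p = {p:.3f}")
# r gives strength and direction; p tells you whether it is significant
```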
SPSS Output
• (Screenshot of the Descriptive Statistics and Correlations tables)
Interpretation of SPSS Output
• Descriptive Statistics
– Sample size used in the analysis (N)
– Mean and Std Deviation
• Correlation Matrix
– Direction of the relationship (positive or negative)
– Strength of the relationship (check correlation
coefficients, r)
– P-value or significance value: when the p-value is 5% or
less, it indicates that there is a significant relationship
between the two variables
Presenting the Correlation Results (Briefly)
• The relationship between age and blood pressure was
investigated using the Pearson correlation coefficient.
• Preliminary analyses were performed to ensure no
violation of the assumptions of normality and linearity.
• The results indicate that, in a sample of 66 respondents,
there is no significant relationship between the age of the
respondents and BP, r = 0.02, p = 0.90.
Multiple Regression Analysis
Multiple Regression Analysis
• Multiple Regression is a statistical method for estimating the
relationship between a dependent variable (DV) and two or more
independent (or predictor) variables (IVs)
• You should have a sound theoretical or conceptual reason for the
analysis and, in particular, the expected relationship between IVs
and DV
• Multiple regression can be used to address a variety of research
questions.
• How well a set of variables is able to predict a particular outcome;
• Which variable in a set of variables is the best predictor of an
outcome; and
• Whether a particular predictor variable is still able to predict an
outcome when the effects of another variable are controlled for
Conceptual Framework
• (Diagram) Four independent variables (X1, X2, X3, X4), each
with an arrow pointing to the Dependent Variable.
Multiple Regression Equation
Y = a + b1X1 + b2X2 + b3X3 + b4X4
Notation:
• Y is the dependent variable
• The Xs are independent variables
• a is the Y intercept, where the regression line crosses the Y axis
• b1 is the partial slope for X1 on Y
• b1 indicates the change in Y for a one-unit change in X1,
controlling for X2, X3, X4
• b2 is the partial slope for X2 on Y
• b2 indicates the change in Y for a one-unit change in X2,
controlling for X1, X3, X4
Assumptions of Multiple regression
• Sample size: the issue is generalisability, which requires a
sufficiently large sample size.
• If your results do not generalise to other samples, then they are of
little scientific value.
• Various guidelines concerning the number of cases required for
multiple regression have been given:
• Stevens (1996) recommends that ‘for social science research, about
15 subjects per predictor are needed for a reliable equation’.
• Tabachnick and Fidell (2001) gave a formula for calculating sample
size, taking into account the number of independent variables: n >
50 + 8m (where m = number of independent variables).
• If you have five independent variables you will need 90 cases.
• More cases are needed if the dependent variable is skewed.
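The Tabachnick and Fidell rule is easy to check mechanically. A one-line sketch in Python; the function name is an illustrative assumption:

```python
def required_sample_size(m: int) -> int:
    """Tabachnick & Fidell (2001): n > 50 + 8m, where m = number of IVs."""
    return 50 + 8 * m

print(required_sample_size(5))  # 90 cases for five IVs, as in the slide
```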
Assumptions……
• Multicollinearity and singularity
• This refers to the relationship among the independent
variables.
• Multiple regression doesn’t like multicollinearity or singularity
• Multicollinearity exists when the independent variables are
highly correlated (r=.9 and above).
• Singularity occurs when one independent variable is actually a
combination of other independent variables (e.g. when both
subscale scores and the total score of a scale are included).
• These certainly don't contribute to a good regression model.
• Always check for these problems before you start the analysis
Assumptions….
• Outliers
• Multiple regression is very sensitive to outliers (very
high or very low scores)
• Check for extreme scores before you start the analysis
• Do this for all the variables, both dependent and
independent variables.
• Outliers can either be deleted from the data set or,
replaced by a standardized score for that variable
Assumptions….
• Normality & linearity
• These all refer to various aspects of the distribution
of scores and the nature of the underlying
relationship between the variables
• Your variables need to be normally distributed,
especially the dependent variable
• There should be a linear relationship between the
independent variables and the dependent variable
Checking for Multi-collinearity
• Two tests are available: Variance Inflation Factor (VIF) and
Tolerance (Available in SPSS)
• Variance Inflation Factor (VIF) – measures how much the
variance of the regression coefficients is inflated by
multicollinearity problems.
• If VIF equals 1, there is no correlation between the independent
measures.
• A VIF slightly above 1 indicates some association between
predictor variables, but generally not enough to cause problems.
• A maximum acceptable VIF value would be 5.0; anything higher
would indicate a problem with multicollinearity.
• Some books recommend a cut-off point of 10 for VIF
Checking for MultiCollinearity…..
• Tolerance – the amount of variance in an independent
variable that is not explained by the other independent
variables.
• If the other variables explain a lot of the variance of a
particular independent variable we have a problem with
multicollinearity.
• Thus, small values for tolerance indicate problems of
multicollinearity.
• The minimum cutoff value for tolerance is typically .20
• That is, the tolerance value must be smaller than .20 to
indicate a problem of multicollinearity.
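Outside SPSS, both diagnostics can be computed with statsmodels; tolerance is simply 1/VIF. A minimal sketch in Python, where the file and variable names are illustrative assumptions:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("hotel_survey.csv")  # hypothetical data file
X = sm.add_constant(df[["MENU", "QUALITY", "TASTE"]].dropna())

for i, name in enumerate(X.columns):
    if name == "const":
        continue  # the intercept has no meaningful VIF
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```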
Multicollinearity tests………
• You can also check the Pearson correlation
coefficients between the IVs
• Correlation coefficients between IVs should not be
very high: r = .9 or more
• If two IVs are highly correlated, you may
need to remove one of them.
Example of Multiple regression
• Research on Factors influencing Repeat
Purchase in a Hotel
• Dependent Variable
– Intention to return to the hotel in future (repeat purchase)
• Independent variables
– Wide variety of Menu items
– Excellent Food quality
– Excellent Food taste
Procedure for standard multiple regression
• From the menu click on: Analyze, then click on Regression,
then on Linear.
• Click on your continuous dependent variable, and move it
into the Dependent box.
• Click on your independent variables and move them into the
Independent box.
• For Method, make sure Enter is selected (this will give you
standard multiple regression).
• Click on the Statistics button. Tick the boxes marked Estimates,
Confidence Intervals, Model fit, Descriptives, and Collinearity
diagnostics.
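The equivalent standard ("Enter") regression can be run with statsmodels. A minimal sketch in Python; the file and variable names are illustrative assumptions:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("hotel_survey.csv")                   # hypothetical data file
y = df["REPEAT"]                                       # continuous DV
X = sm.add_constant(df[["MENU", "QUALITY", "TASTE"]])  # IVs plus intercept

model = sm.OLS(y, X, missing="drop").fit()
print(model.summary())  # R-square, overall F-test, and B, t, Sig. for each IV
```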
Interpretation from SPSS output for MRA
• Evaluating the model
• Look in the Model Summary box and check the value given
under the heading R-Square.
• This tells you how much of the variance in the dependent
variable (repeat purchase) is explained by the IVs (which
includes Wide variety of Menu items, Excellent Food quality,
Excellent Food taste)
• In this example the value for R-square is .262; it is usually
expressed as a percentage (multiply by 100)
• This means that the model (which includes all the IVs) explains
26.2% of the variance in Repeat Purchase
• The higher the R-square, the better the model
Evaluating the model
• SPSS also provides an Adjusted R Square value in the
output.
• When a small sample is involved, the R square value in
the sample tends to be a rather optimistic
overestimation of the true value in the population
• The Adjusted R square statistic ‘corrects’ this value to
provide a better estimate of the true population value.
• If you have a small sample you may wish to consider
reporting this value, rather than the normal R Square
value.
Evaluating the model
• ANOVA Table
• The ANOVA summary table tells us whether our model is
statistically adequate
– You use the F-value and p-value:
– When the p-value (Sig. value) is less than or equal to 5%, it
indicates that R-square is significantly different from zero
– The regression equation is then a useful predictor of population
values (i.e. the model can be generalised to the population)
• In the example, the F-value is 11.9…, p-value < 0.001
Evaluating each of the independent
variables
• Which of the variables included in the model contributed to the
prediction of the dependent variable
• We find this information in the output box labelled Coefficients.
• Look in the column labelled Beta under Standardised
Coefficients.
• To compare the different variables it is important that you look
at the standardised coefficients, not the unstandardised ones.
• ‘Standardised’ means that these values for each of the different
variables have been converted to the same scale so that you can
compare them.
• If you were interested in constructing a regression equation,
you would use the unstandardised coefficient values listed as B.
Evaluating each of the IVs
• In this case we are interested in comparing the contribution
of each independent variable;
• Look down the Beta column and find which beta value is the
largest (ignoring any negative signs in front).
• In this case the largest beta coefficient is .324, which is for
food quality.
• This means that this variable makes the strongest unique
contribution to explaining the dependent variable, when the
variance explained by all other variables in the model is
controlled for.
• The Beta value for Food taste variable is 0.291, and for wide
menu items is 0.094.
Evaluating of IVs….
• For each of these variables, check the value in the column
marked Sig.
• This tells you whether this variable is making a statistically
significant unique contribution to the equation.
• This is very dependent on which variables are included in the
equation, and how much overlap there is among the
independent variables.
• If the Sig. value is less than .05 (.01, .001, etc.), then the
variable is making a significant unique contribution to the
prediction of the dependent variable.
• If it is greater than .05, then you can conclude that the variable is
not making a significant unique contribution to the prediction of
your dependent variable.
Presenting Results from MRA….
• The data were analysed by multiple regression, using the IVs
(Quality, Taste and Menu) and the DV (Repeat Purchase).
• The regression model was relatively weak (adjusted R-square =
26.9%), but the overall relationship was significant (F = 11.382,
p < 0.01).
• With other variables held constant, repeat purchase scores
were positively related to quality and taste, but negatively
related to menu items.
• The results show that repeat purchase increased by
0.324 for every unit improvement in quality, and by 0.291 for
every unit improvement in taste, and was reduced by 0.094 for
any unit change in menu items.
• Only the effect of quality was significant (t = 4.985, p < 0.01).
What Can We Do With Multiple Regression?
1. Determine the statistical significance of the
attempted prediction.
2. Determine the strength of association between
the single dependent variable and the multiple
independent variables.
3. Identify the relative importance of each of the
multiple independent variables in predicting the
single metric dependent variable.
4. Predict the values of the dependent variable from
the values of the multiple independent variables.
Chi-square test
Chi-square test for independence
• This test is used when you wish to explore the
relationship between two categorical variables.
• Each of these variables can have two or more categories.
• Research Questions:
– Are males more likely to be smokers than females?
– Is the proportion of males that smoke the same as the
proportion of females?
– Is there a relationship between gender and smoking
behaviour?
• What you need: Two categorical variables, with two or
more categories in each, for example:
• Gender (Male/Female); and Smoker (Yes/No).
Example: Chi-square test
• Variable 1: Gender:
– 1- male; 2-female
• Variable 2:
– Marital status: 1-married, 2-widowed, 3-divorced, 4-
separated, 5-single
Procedure for chi-square
• From the menu click on: Analyze, then click on Descriptive
Statistics, then on Crosstabs.
• Click on one of your variables, to be your row variable, move it
into the box marked Row(s).
• Click on the other variable to be your column variable; move it
into the box marked Column(s).
• Click on the Statistics button. Choose Chi-square. Click on
continue.
• Click on the Cells button. In the Counts box, click on the
Observed and Expected boxes.
• In the Percentage section click on the Row, Column and Total
boxes.
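The same test can be run outside SPSS: pd.crosstab mirrors the Crosstabs step, and scipy returns the expected counts needed for the assumption check discussed below. A minimal sketch in Python; the file and variable names are illustrative assumptions:

```python
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("survey_coded.csv")              # hypothetical data file
table = pd.crosstab(df["MARITAL"], df["GENDER"])  # observed counts

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")
print(expected)  # check the minimum expected cell frequency assumption
```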
Chi-square Test
• (Screenshot of the SPSS chi-square output)
Interpretation of Output- chi-square
• The first thing you should check is whether you have
violated one of the assumptions of chi-square
concerning the ‘minimum expected cell frequency’,
which should be 5 or greater (or at least 80 per cent of
cells have expected frequencies of 5 or more).
• This information is given in a footnote below the final
table (labelled Chi-Square Tests).
• Footnote b in the example provided indicates that '0
cells (.0%) have expected count less than 5'.
• This means that we have not violated the assumption, as
all our expected cell sizes are greater than 5.
Interpretation - Chi-square tests
• The main value that we are interested in from the
output is the Pearson chi-square value, which is
presented in the final table, headed Chi-Square Tests.
• You need to check on the column labelled Asymp. Sig
• To be significant the Sig. value needs to be .05 or
smaller
• In this case the value is .001, which is smaller than the
alpha value of .05, so we can conclude that there is a
significant association between the two variables
(marital status and gender)
Summary information
• To find the percentage of each sex in relation to marital
status, we check the summary information provided in the
table labelled MARITAL * SEX Cross-tabulation.
• The results show that 50.9% of males were married, while
42.7% of females were married.
• Researchers often try to identify which cell(s) are the
major contributors to the significant chi-square test by
examining the pattern of column percentages.
• Based on the column percentages, we can identify the cells on the
married row and the widowed row as the ones producing the
significant result, because they show the largest differences:
8.2% on the married row (50.9% − 42.7%) and 9.0% on the
widowed row (13.1% − 4.1%)
Summary Information
• (Screenshot of the MARITAL * SEX Cross-tabulation table)
Independent samples T-test
Independent-samples t-test
• An independent-samples t-test is used when you want to compare
the mean score, on some continuous variable, for two different
groups of subjects.
• Research Question: Is there a significant difference in the mean self-
esteem scores for males and females?
• What you need: Two variables: one categorical, independent
variable (e.g. males/females);
• And one continuous, dependent variable (e.g. self-esteem scores).
• An independent-samples t-test will tell you whether there is a
statistically significant difference in the mean scores for the two
groups (that is, whether males and females differ significantly in
terms of their self-esteem levels).
• In statistical terms, you are testing the probability that the two sets
of scores (for males and females) came from the same population.
Assumptions
• Level of measurement: it is assumed that the dependent
variable is measured at the interval or ratio level, that is,
using a continuous scale rather than discrete categories.
• Random sampling: The technique assumes that the scores
are obtained using a random sample from the population.
• Normal distribution: It is assumed that the populations
from which the samples are taken are normally distributed.
• Homogeneity of variance: Techniques in this section make
the assumption that samples are obtained from populations
of equal variances.
• This means that the variability of scores for each of the
groups is similar.
Procedure for t-test
• From the menu, click on: Analyze, then click on
Compare means, then on Independent Samples T-test.
• Move the dependent (continuous) variable (e.g. total
self-esteem) into the area labelled Test variable.
• Move the independent variable (categorical) variable
(e.g. sex) into the section labelled Grouping variable.
• Click on Define groups and type in the numbers used in
the data set to code each group.
• In the current data file 1=males, 2=females; therefore,
in the Group 1 box, type 1; and in the Group 2 box, type
2.
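The same test can be run with scipy; Levene's test decides whether to assume equal variances, mirroring the SPSS output discussed below. A minimal sketch in Python; the file and variable names are illustrative assumptions:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("survey_coded.csv")  # hypothetical file; 1=males, 2=females
males = df.loc[df["GENDER"] == 1, "INCOME"].dropna()
females = df.loc[df["GENDER"] == 2, "INCOME"].dropna()

lev_stat, lev_p = stats.levene(males, females)  # equality of variances
equal_var = lev_p > 0.05                        # same decision rule as in SPSS

t, p = stats.ttest_ind(males, females, equal_var=equal_var)
print(f"t = {t:.3f}, p = {p:.3f} (equal variances assumed: {equal_var})")
```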
Example for t-test….
• Research Question:
– Is there a significant difference in the mean household
income scores for males and females?
• Variables:
– one categorical, independent variable: Gender-
(males/females);
– one continuous, dependent variable (e.g. household
Income)
SPSS Output (t-test)
• (Screenshot of the Group Statistics and Independent Samples Test tables)
Interpretation of output from independent-
samples t-test
• Checking the information about the groups
• In the Group Statistics box SPSS gives you the mean and
standard deviation for each of your groups (in this case:
male/female).
• It also gives you the number of people in each group (N).
• Always check these values first. Do they seem right?
• Are the N values for males and females correct? Or are
there a lot of missing data?
• If so, find out why. Perhaps you have entered the wrong
code for males and females (0 and 1, rather than 1 and
2). Check with your codebook
Interpretation of output for t-test
• Checking the assumption of equal variances
• The first section of the Independent Samples Test output box gives
you the results of Levene’s test for equality of variances.
• This tests whether the variance (variation) of scores for the two
groups (males and females) is the same.
• If your Sig. value is larger than .05 (e.g. .07, .10), you should use the
first line in the table, which refers to Equal variances assumed.
• If the significance level of Levene’s test is p=.05 or less
(e.g. .01, .001), this means that the variances for the two groups
(males/females) are not the same.
• Therefore your data violate the assumption of equal variance.
• But SPSS provides an alternative t-value which compensates
for the fact that your variances are not the same.
• You should use the information in the second line of the t-test table,
which refers to Equal variances not assumed.
Interpretation………….
• In the example given in the output above, the
significance level for Levene’s test is .171.
• This is larger than the cut-off of .05.
• This means that the assumption of equal variances
has not been violated;
• This means that the variances for the two groups
(males/females) are the same.
• Therefore, when you report your t-value, you will
use the one provided in the first line of the table.
Interpretation……
• Assessing differences between the groups
• To find out whether there is a significant difference between your
two groups, refer to the column labelled Sig. (2-tailed), which
appears under the section labelled t-test for equality of means.
• Two values are given. One for equal variance, the other for unequal
variance.
• Choose whichever your Levene’s test result says you should use.
• If the value in the Sig. (2-tailed) column is equal to or less than .05
(e.g. .03, .01, .001), then there is a significant difference in the mean
scores on your dependent variable for each of the two groups.
• In the example presented in the output above the Sig. (2-tailed) value
is .483.
• As this value is above the required cut-off of .05, you conclude that
there is not a statistically significant difference in the mean household
income for males and females
Presenting the results for independent-samples t-test
• The results of the analysis could be presented as
follows:
• An independent-samples t-test was conducted to
compare the household income scores for males
and females.
• There was no significant difference in scores for
females (M=68.78, SD=75.73) and males [M=70.16,
SD=81.56; t=-0.702, p=.483].
• The magnitude of the differences in the means was
very small.
One-way ANOVA
One way ANOVA
• When we are interested in comparing the mean scores of
more than two groups.
• In this situation we would use analysis of variance (ANOVA).
• One-way analysis of variance involves one independent
variable (referred to as a factor), which has a number of
different levels, and
• It has one dependent continuous variable
• It is called Analysis of Variance because it compares the
variance (variability in scores) between the different groups
(believed to be due to the independent variable) with the
variability within each of the groups (believed to be due to
chance)
One-way ANOVA
• It tests whether there are significant
differences among groups for a particular
variable.
• Independent Categorical Variable: has more
than two groups
• Dependent variable has a continuous scale
Example..ANOVA
• Dependent Variable
– Online Purchase
• Independent Variable: Age in three categories
– 24 and younger
– 25-40
– 41 and older
Procedure for one-way between-groups
ANOVA
• Analyze, then click on Compare Means, then on One-
way ANOVA.
• Click on your dependent (continuous) variable.
• Move this into the box marked Dependent List
• Click on your independent, categorical variable;
• Move this into the box labelled Factor.
• Click the Options button and click on Descriptive,
Homogeneity of variance test,
• Click on the button marked Post Hoc. Click on Tukey.
• Click on Continue and then OK.
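The equivalent analysis with scipy/statsmodels, including a Tukey post-hoc test, which matches the Post Hoc step above when equal variances can be assumed. A minimal sketch in Python; the file and variable names are illustrative assumptions:

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("survey_coded.csv").dropna(subset=["PURCHASE", "AGEGROUP"])
groups = [g["PURCHASE"].values for _, g in df.groupby("AGEGROUP")]

f, p = stats.f_oneway(*groups)  # overall F-test across the groups
print(f"F = {f:.3f}, p = {p:.3f}")

if p <= 0.05:  # only inspect pairwise differences if the overall test is significant
    print(pairwise_tukeyhsd(df["PURCHASE"], df["AGEGROUP"]))
```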
One way- ANOVA
• An F ratio is calculated which represents the
variance between the groups, divided by the
variance within the groups.
• A large F ratio indicates that there is more variability
between the groups (caused by the independent
variable) than there is within each group.
• F-test does not, however, tell us which of the groups
differ.
• Post-hoc tests help us to compare the groups.
SPSS OUTPUT – ANOVA

Test of Homogeneity of Variances (Purchased a product over the internet)
Levene Statistic = 4.044, df1 = 2, df2 = 190, Sig. = .019

ANOVA (Purchased a product over the internet)
Source           Sum of Squares   df    Mean Square   F       Sig.
Between Groups   2.943            2     1.471         6.185   .002
Within Groups    45.202           190   .238
Total            48.145           192
Interpretation: one-way ANOVA
• Test of homogeneity of variances
• The homogeneity of variance option gives you Levene’s test for
homogeneity of variances, which tests whether the variance in
scores is the same for each of the three groups.
• Check the significance value (Sig.) for Levene’s test.
• If this number is greater than .05, then you have not violated the
assumption of homogeneity of variance.
• In this example the Sig. value is .019.
• As this is smaller than .05, we have violated the homogeneity of
variance assumption.
• Since the variances are not equal, we need to choose a post-hoc
test for equal variances not assumed:
• Analyze > Compare Means > One-Way ANOVA > Post Hoc (e.g. Tamhane's T2)
Interpretation of output from one-way
ANOVA
• Assessing differences
• The ANOVA table gives both between-groups and within-
groups results
• The main thing you are interested in is the column
marked Sig.
• If the Sig. value is less than or equal to .05 then there
is a significant difference somewhere among the mean
scores on your dependent variable for the three
groups.
• This does not tell us which groups differ from one another.
Assessing differences - ANOVA
• The statistical significance of the differences between each pair of
groups is provided in the table labelled Multiple Comparisons,
which gives the results of the post-hoc tests
• You should look at this table only if you found a significant
difference in your overall ANOVA. (i.e. if the Sig. value was equal
to or less than .05)
• The post-hoc tests in this table will tell you exactly where the
differences among the groups occur.
• Look down the column labelled Mean Difference. Look for any
asterisks (*) next to the values listed.
• If you find an asterisk, this means that the two groups being
compared are significantly different from one another at the
p < .05 level.
SPSS OUTPUT
Analyze > Compare Means > One-way ANOVA > Post Hoc > Tamhane's T2

Multiple Comparisons (Tamhane)
Dependent Variable: Purchased a product over the internet

(I) Age          (J) Age          Mean Diff. (I-J)   Std. Error   Sig.   95% CI
24 and younger   25-40            -.22139*           .09014       .046   (-.4402, -.0026)
24 and younger   41 and older     -.31974*           .08929       .002   (-.5366, -.1029)
25-40            24 and younger   .22139*            .09014       .046   (.0026, .4402)
25-40            41 and older     -.09835            .08156       .543   (-.2954, .0987)
41 and older     24 and younger   .31974*            .08929       .002   (.1029, .5366)
41 and older     25-40            .09835             .08156       .543   (-.0987, .2954)
*. The mean difference is significant at the .05 level.
Interpreting the Post Hoc Results
• A statistically significant difference (at the .05 level) exists in
the proportion of Internet users who have made an
on-line purchase between the following age groups:
• 24 or younger vs. 25-40: the proportion of the 24
or younger age group is .22139 smaller than the
proportion in the 25-40 age group.
• 24 or younger vs. 41 or older: the proportion of the
24 or younger age group is .31974 smaller than the
proportion in the 41 or older age group.
Presenting the results
• A one-way ANOVA was conducted to determine whether the
proportion of Internet users who made on-line
purchases was influenced by the users' age.
• The test found a highly statistically significant difference
among the age groups (p = .002).
• Post-hoc analysis showed that the proportions of users
making a purchase in the middle and older age
groups were higher than that for the younger age group
(p < .05).
• Therefore, there is highly significant statistical evidence
to support the hypothesis that age influences on-line purchasing.
Qualitative data analysis
Data Analysis for Qualitative data
• The first difference between qualitative and
quantitative data analysis is that the data to be
analyzed are text, rather than numbers
– No hypotheses to be tested
– No variables
Quantitative vs. Qualitative

Quantitative:
• Explanation through numbers
• Objective
• Deductive reasoning
• Predefined variables and measurement
• Data collection before analysis
• Cause-and-effect relationships

Qualitative:
• Explanation through words
• Subjective
• Inductive reasoning
• Creativity, extraneous variables
• Data collection and analysis intertwined
• Description, meaning
Qualitative Research Goals
• Meaning:
– how people see the world
• Context:
– the world in which people act
• Process:
– what actions and activities people do
• Reasoning:
– why people act and behave the way they do
Qualitative Data
• Written field notes
• Audio recordings of conversations
• Video recordings of activities
• Diary recordings of activities / thoughts
Qualitative Data
• Depth information on:
– thoughts, views, interpretations
– priorities, importance
– processes, practices
– intended effects of actions
– feelings and experiences
Qualitative Data analysis
• Data analysis
– An attempt by the researcher to summarize
collected data.
• Data Interpretation
– Attempt to find meaning
Data Analysis During Collection
• Analysis is not left until the end
• To avoid collecting data that are not important the
researcher must ask:
– How am I going to make sense of this data?
• As they collect data the researcher must ask
– Why do the participants act as they do?
– What does this focus mean?
– What else do I want to know?
– What new ideas have emerged?
– Is this new information?
Data Analysis After Collection
• One way is to follow three iterative steps:
1. Become familiar with the data through reading and memoing.
2. Examine the data in depth to provide detailed
descriptions of the setting, participants, and
activities.
3. Categorize and code pieces of data and group
them into themes.
Data Analysis After Collection – Summarizing
• Reading and memoing
– Read and write memos about field notes.
• Describing
– Develop comprehensive descriptions of the
setting, participants, etc.
• Classifying
– Breaking data into analytic units:
– Categories
– Themes
Data Interpretation
• Answer these four questions
– What is important in the data?
– Why is it important?
– What can be learned from it?
– So what?
• Remember
– Interpretation depends on the perspective of the
researcher.
Reporting Results
• Find the main themes
• Use quotes / scenarios to represent them
• Include counts for codes (optional)