Research Methods in BRM
Session 2
Creating a data file and entering data
Step 1. Set up the structure of the data file by
‘defining’ the variables.
Step 2. Enter the data, that is, the values obtained from each
participant or respondent for each variable.
Screening and Cleaning the data
Step 1: Checking for errors. First, you need to check each of your
variables for scores that are out of range (i.e. not within the range of
possible scores).
Step 2: Finding and correcting the error in the data file. Second, you
need to find where in the data file this error occurred (i.e. which case is
involved) and correct or delete the value.
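Outside SPSS, the same two screening steps can be sketched in a few lines of Python. This is only an illustration: the codebook ranges, the case IDs and the function name find_out_of_range are assumptions made for the example, not part of the course data file.

```python
# Hypothetical codebook: allowed (minimum, maximum) score for each variable.
VALID_RANGES = {"sex": (1, 2), "marital": (1, 8), "educ": (1, 6)}


def find_out_of_range(cases, valid_ranges):
    """Return (case_id, variable, value) for every score outside its allowed range."""
    errors = []
    for case in cases:
        for variable, (low, high) in valid_ranges.items():
            value = case.get(variable)
            if value is None:
                continue  # missing values are skipped here, not flagged
            if not low <= value <= high:
                errors.append((case["id"], variable, value))
    return errors


# Made-up cases for illustration only.
cases = [
    {"id": 1, "sex": 1, "marital": 4, "educ": 3},
    {"id": 2, "sex": 3, "marital": 2, "educ": 6},     # sex = 3 is not a possible score
    {"id": 3, "sex": 2, "marital": 9, "educ": None},  # marital = 9 is out of range
]

for case_id, variable, value in find_out_of_range(cases, VALID_RANGES):
    print(f"case {case_id}: {variable} = {value} is out of range")
```

Step 1 corresponds to the range check; Step 2 corresponds to reading off the case ID so the value can be corrected or deleted.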
Procedure for checking categorical variables
From the main menu at the top of the screen, click on Analyze, then click on Descriptive Statistics,
then Frequencies.
Choose the variables that you wish to check (e.g. sex, marital, educ).
To help find the variables you want, right-click on the list of variables and select Sort
Alphabetically, or choose to display Variable Names or Variable Labels.
Click on the arrow button to move these into the Variable box.
Click on the Statistics button. Tick Minimum and Maximum in the Dispersion section.
Procedure for checking continuous variables
From the menu at the top of the screen, click on Analyze, then click on Descriptive
Statistics, then Descriptives.
Click on the variables that you wish to check. Click on the arrow button to move them
into the Variables box (e.g. age).
Click on the Options button. You can ask for a range of statistics. The main ones at this
stage are mean, standard deviation, minimum and maximum. Click on the statistics you
wish to generate.
2. In the dialogue box that pops up, click on the variable that you
know has an error (e.g. sex) and then on the arrow to move it into
the Sort By box. Click on either ascending or descending (depending
on whether you want the higher values at the top or the bottom).
3. Click on OK.
Preliminary analyses
1. From the menu click on Analyze, then click on Descriptive Statistics, then Frequencies.
2. Choose and highlight the categorical variables you are interested in (e.g. sex). Move
these into the Variables box.
3. Click on OK
Note: If you have very unequal group sizes, particularly if the group sizes are small, it
may be inappropriate to run some of the parametric analyses (e.g. ANOVA).
CONTINUOUS VARIABLES
1. From the menu click on Analyze, then select Descriptive Statistics, then Descriptives.
2. Click on all the continuous variables that you wish to obtain descriptive statistics for.
Click on the arrow button to move them into the Variables box (e.g. age, Total perceived
stress: tpstress).
3. Click on the Options button. Make sure mean, standard deviation, minimum,
maximum are ticked and then click on skewness, kurtosis.
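To see what the ticked statistics measure, here is a minimal Python sketch. It assumes the bias-adjusted sample formulas for skewness and kurtosis (the form most statistics packages report); the function names and the scores are made up for illustration.

```python
from statistics import mean, stdev


def skewness(scores):
    """Bias-adjusted sample skewness (0 for a perfectly symmetric distribution)."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in scores)


def kurtosis(scores):
    """Bias-adjusted excess kurtosis (0 for a normal distribution)."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)
    fourth = sum(((x - m) / s) ** 4 for x in scores)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def describe(scores):
    """Collect the statistics requested in the Options dialogue."""
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "minimum": min(scores),
        "maximum": max(scores),
        "skewness": skewness(scores),
        "kurtosis": kurtosis(scores),
    }


# Hypothetical total perceived stress scores.
tpstress = [12, 18, 21, 24, 25, 26, 27, 29, 31, 40]
for name, value in describe(tpstress).items():
    print(f"{name}: {value:.3f}")
```

Positive skewness means scores cluster at the low end with a tail to the right; negative skewness means the reverse.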
2. Click on the variable(s) you are interested in (e.g. Total perceived stress: tpstress). Click on the arrow button to move
5. Click on the Statistics button and click on Descriptives and Outliers. Click on Continue.
6. Click on the Plots button. Under Descriptive, click on Histogram to select it. Click on Stem-and-leaf to unselect it. Click
7. Click on the Options button. In the Missing Values section, click on Exclude cases pairwise. Click on Continue and then
OK
In the table labelled Tests of Normality, you are given the results of the
Kolmogorov-Smirnov statistic. This assesses the normality of the
distribution of scores. A non-significant result (Sig. value of more
than .05) indicates normality.
CHECKING FOR OUTLIERS
First, have a look at the Histogram. Look at the tails of the distribution.
Are there data points sitting on their own, out on the extremes?
Second, inspect the Boxplot. Any scores that IBM SPSS considers
outliers appear as little circles or asterisks with a number attached (this
is the ID number of the case).
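The boxplot rule can be sketched in Python as follows. It assumes the usual convention that circles mark scores more than 1.5 box lengths from the edge of the box and asterisks mark scores more than 3 box lengths away. The quartile method used here may differ slightly from the hinges SPSS computes, and the case IDs and scores are made up.

```python
from statistics import quantiles


def classify_outliers(scores_by_id):
    """Label each case as 'outlier' (circle) or 'extreme' (asterisk) by the boxplot rule."""
    values = list(scores_by_id.values())
    # Quartiles via linear interpolation; an assumption, not SPSS's exact hinge method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1
    flagged = {}
    for case_id, score in scores_by_id.items():
        # Distance beyond the nearer edge of the box (negative if inside the box).
        distance = max(q1 - score, score - q3)
        if distance > 3 * box_length:
            flagged[case_id] = "extreme"
        elif distance > 1.5 * box_length:
            flagged[case_id] = "outlier"
    return flagged


# Hypothetical scores keyed by case ID.
scores_by_id = {1: 10, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16,
                7: 17, 8: 18, 9: 19, 10: 28, 11: 40}
print(classify_outliers(scores_by_id))
```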
Using graphs to describe and explore the data
HISTOGRAMS
2. To choose the type of graph that you want, click on the Gallery tab, and choose Histogram.
3. Click on the first image shown (Simple Histogram) and drag it up to the Chart Preview area, holding your left mouse button
down.
4. Choose your continuous variable from the list of Variables (tpstress) and drag it across to the area on the Chart preview screen
labelled X-Axis holding your left mouse button down. This will only work if you have identified your variable as Scale in the
Data Editor window (the icon next to the variable should be a ruler).
5. If you would like to generate separate graphs for different groups (e.g. male/female) you can click on the Groups/Point ID tab
and choose Column Panels variable option. This will produce separate graphs next to each other; if you would prefer them to
6. Choose your categorical grouping variable (e.g. sex) and drag it across to the section labelled Panel in the Chart Preview area.
7. Click on OK
BAR GRAPHS
The bar graph can show the number of cases in particular categories, or
it can show the score on some continuous variable for different
categories.
1. From the menu at the top of the screen, click on Graphs, then select Chart Builder and click OK.
2. Click on the Gallery tab and click on the second graph displayed (Clustered Bar). Holding your
left mouse button down drag this graph to the Chart Preview area.
3. From the Element Properties window click on Display error bars, and then on the Apply button at
the bottom of the window. If for some reason you don’t have the Element Properties window click
on the Element Properties box on the far right of the main screen.
4. From the list of Variables drag one of your grouping variables (e.g. sex) to the section on the Chart
Preview screen labelled Cluster on X: set colour. Click and drag your other categorical variable (e.g.
agegp3) to the section labelled X-Axis at the bottom of the graph. Click and drag your continuous
variable (Total Perceived Stress: tpstress) to the remaining blue section, the Y-axis.
5. Click on OK
LINE GRAPHS
4. Click on OK
SCATTERPLOTS
2. Click on the Gallery tab and select Scatter/Dot. Click on the second graph (Grouped
Scatter) and drag this to the Chart Preview area by holding your left mouse button down.
3. Click and drag your continuous independent variable (Total PCOISS: tpcoiss) to the X-Axis,
and click and drag your dependent variable (Total perceived stress: tpstress) to the Y-Axis.
Both of these variables need to be nominated as Scale variables. If you want to show groups
(e.g. males, females) separately choose your categorical grouping variable (e.g. sex) and drag
to the Set Colour box.
4. Click on OK
BOXPLOTS
2. Click on the Gallery tab and choose the first graph displayed (Simple boxplot).
Drag it up to the Chart Preview area, holding your left mouse button down.
3. From the Variables box choose your categorical variable (e.g. sex) and drag it to
the X-Axis box on the Chart Preview area. Drag your continuous variable (Total
Positive Affect: tposaff) to the Y-Axis.
5. Click on OK
Manipulating the data
2. Select the items you want to reverse (op2, op4, op6). Move these into the Input Variable— Output Variable
box.
3. Click on the first variable (op2) and type a new name in the Output Variable section on the right-hand side of
In the New Value section, type 5 in the Value box (this will change all scores that were originally scored as 1 to a
5).
5. Click on Add. This will place the instruction (1 → 5) in the box labelled Old > New.
6. Repeat the same procedure for the remaining scores. For example: Old Value—type in
Always double-check the item numbers that you specify for recoding and the old and
2. In the Target Variable box, type in the new name you wish to give to the total scale scores.
3. Click on the Type and Label button. Click in the Label box and type in a description of the scale
(e.g. total optimism). Click on Continue.
4. From the list of variables on the left-hand side, click on the first item in the scale (op1). Click on
the arrow button to move it into the Numeric Expression box.
6. Click OK
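The two manipulations above (reversing negatively worded items, then adding the items up into a total score) can be sketched in Python. The sketch assumes a six-item scale (op1 to op6) scored from 1 to 5, with op2, op4 and op6 reversed; the respondent's answers are made up.

```python
# Assumed scale properties: items scored 1 to 5, three of six items reversed.
SCALE_MIN, SCALE_MAX = 1, 5
ALL_ITEMS = ("op1", "op2", "op3", "op4", "op5", "op6")
REVERSED_ITEMS = ("op2", "op4", "op6")


def reverse_score(value):
    """Map 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1."""
    return SCALE_MIN + SCALE_MAX - value


def total_scale_score(case):
    """Sum all items, using the reversed value for the negatively worded ones."""
    total = 0
    for item in ALL_ITEMS:
        value = case[item]
        if item in REVERSED_ITEMS:
            value = reverse_score(value)
        total += value
    return total


# Hypothetical respondent.
respondent = {"op1": 4, "op2": 2, "op3": 5, "op4": 1, "op5": 3, "op6": 2}
print(total_scale_score(respondent))
```

Keeping the list of reversed items in one place makes it easier to double-check the item numbers before the total is computed.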
Collapsing continuous variables (e.g. age) into categorical variables
1. From the menu at the top of the screen, click on Transform and choose Visual Binning.
2. Select the continuous variable that you want to use (e.g. age). Transfer it into the Variables to Bin box. Click
on the Continue button.
3. In the Visual Binning screen, a histogram showing the distribution of age scores should appear.
4. In the section at the top labelled Binned Variable, type the name for the new categorical variable that you will
create (e.g. Agegp3).
5. Click on the button labelled Make Cutpoints. In the dialogue box that appears, click on the option Equal
Percentiles Based on Scanned Cases. In the box Number of Cutpoints, specify a number one less than the
number of groups that you want (e.g. if you want three groups, type in 2 for cutpoints). In the Width (%)
section below, you will then see 33.33 appear. This means that IBM SPSS will try to put 33.3 per cent of the
sample in each group. Click on the Apply button.
6. Click on the Make Labels button back in the main dialogue box. This will automatically generate value labels
for each of the new groups created.
7. Click on OK
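The idea behind equal-percentile cutpoints can be sketched in Python. This is an approximation: the percentile method in the standard library may not give exactly the same cutpoints as Visual Binning, the ages are made up, and treating each cutpoint as the upper end of its group is an assumption.

```python
from bisect import bisect_left
from statistics import quantiles


def make_cutpoints(values, number_of_groups):
    """Return (number_of_groups - 1) cutpoints that split the sample into equal shares."""
    return quantiles(values, n=number_of_groups)


def assign_group(value, cutpoints):
    """Return the group number (1, 2, ...); a value equal to a cutpoint goes in the lower group."""
    return bisect_left(cutpoints, value) + 1


# Hypothetical ages.
ages = [18, 21, 24, 29, 33, 38, 42, 47, 55, 61, 70, 82]
cutpoints = make_cutpoints(ages, 3)          # two cutpoints give three groups
agegp3 = [assign_group(age, cutpoints) for age in ages]

print("cutpoints:", cutpoints)
print("group sizes:", [agegp3.count(group) for group in (1, 2, 3)])
```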
Reducing or collapsing the number of categories of a categorical variable
1. From the menu at the top of the screen, click on Transform, then on
Recode into Different Variables.
2. Select the variable you wish to recode (e.g. educ). In the Name box, type a
name for the new variable that will be created (e.g. educrec). Type in an
extended label if you wish in the Label section. Click on the button labelled
Change.
5. For the second value, I would type 2 in the Old Value but in the New Value I would type 1. This will
recode all the values of both 1 and 2 from the original coding into one group in the new variable to be
created with a value of 1.
6. For the third value of the original variable, I would type 3 in the Old Value and 2 in the New Value. This
is just to keep the values in the new variable in sequence. Click on Add. Repeat for all the remaining
values of the original values. In the table Old > New, you should see the following codes for this example:
1→1; 2→1; 3→2; 4→3; 5→4; 6→5.
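The Old > New table for this example amounts to a simple lookup. A minimal Python sketch, with made-up educ values and the assumption that missing values stay missing:

```python
# Old value -> new value, exactly as listed in the Old > New table.
EDUC_RECODE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}


def recode(values, mapping):
    """Recode into a new list; missing values (None) stay missing."""
    return [None if value is None else mapping[value] for value in values]


# Hypothetical original scores on educ.
educ = [1, 2, 3, 4, 5, 6, None, 2]
educrec = recode(educ, EDUC_RECODE)
print(educrec)
```

Because the result goes into a new list, the original variable is left untouched, which mirrors Recode into Different Variables.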
2. Select your text variable (e.g. Sex) and move this into the Variable-New Name
box.
3. Type the name you would like to give the converted variable in the New name
box (e.g. SexNum). Click on the Add New Name button.
4. Click on OK.
Reliability and Validity
              Reliability                        Validity
Definition    The consistency and stability of   The accuracy and relevance of a
              a measurement.                     measurement.
Purpose       To ensure that a measurement       To ensure that a measurement
              produces consistent results        accurately measures what it is
              over time and across different     intended to measure.
              situations.
              Do all the items of the scale      Does the scale have a
              correlate highly?                  theoretical underpinning?
Checking the reliability of a scale
One of the most commonly used indicators of internal consistency is Cronbach’s alpha coefficient. Ideally, the Cronbach
alpha coefficient of a scale should be above .7 (DeVellis 2012).
1. From the menu at the top of the screen, click on Analyze, select Scale, then Reliability Analysis.
2. Click on all of the individual items that make up the scale (e.g. lifsat1, lifsat2, lifsat3, lifsat4, lifsat5). Move these into the
box marked Items.
4. In the Scale label box, type in the name of the scale or subscale (Life Satisfaction).
5. Click on the Statistics button. In the Descriptives for section, select Item, Scale, and Scale if item deleted. In the Inter-
Item section, click on Correlations. In the Summaries section, click on Correlations.
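To see what the coefficient is built from, here is a minimal Python sketch of the standard Cronbach's alpha formula: the number of items, the variance of each item, and the variance of the total score. The responses are made up and the function name is an assumption for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                                   # number of items
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers to lifsat1 ... lifsat5 from six respondents.
life_satisfaction = [
    [5, 6, 5, 6, 5],
    [3, 3, 4, 3, 2],
    [6, 7, 6, 6, 7],
    [2, 3, 2, 2, 3],
    [4, 4, 5, 4, 4],
    [5, 5, 6, 5, 6],
]
print(f"alpha = {cronbach_alpha(life_satisfaction):.3f}")
```

Alpha rises as the items vary together: when every respondent gives the same score to all items, the formula returns exactly 1.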
Exploring relationships
What you need: Two variables: both continuous, or one continuous and the other
dichotomous (two values).
What it does: Correlation describes the relationship between two continuous variables, in
terms of both the strength of the relationship and the direction.
Non-parametric alternative: Spearman Rank Order Correlation (rho).
Assumptions
• Both variables should be continuous
• Normality
• Outliers
• Linearity
1. From the menu at the top of the screen, click on Analyze, then select Correlate, then Bivariate.
2. Select your two variables and move them into the box marked Variables (e.g. Total perceived
stress: tpstress, Total PCOISS: tpcoiss). If you wish you can list a whole range of variables here,
not just two.
3. In the Correlation Coefficients section, the Pearson box is the default option. If you wish to
request the Spearman rho (the non-parametric alternative), tick the Spearman box instead.
4. Click on the Options button. For Missing Values, click on the Exclude cases pairwise box. Under
Options, you can also obtain means and standard deviations if you wish.
A negative sign refers only to the direction of the relationship, not its
strength.
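A minimal Python sketch of the Pearson correlation shows how direction and strength are read from one number. Cases with a missing score on either variable are dropped first, which is one reading of pairwise exclusion. The scores and function names are made up for illustration.

```python
import math


def complete_pairs(x, y):
    """Keep only cases that have a score on both variables."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    pairs = complete_pairs(x, y)
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: higher perceived control tends to go with lower stress.
tpcoiss = [40, 45, 52, 58, 63, None, 70]
tpstress = [35, 33, 30, 26, 27, 22, 18]

r = pearson_r(tpcoiss, tpstress)
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.2f} ({direction} direction, strength {abs(r):.2f})")
```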
Presenting the results
Research questions:
Multicollinearity and singularity: independent variables are highly correlated (r=.9 and above).
Outliers
• normality: the residuals should be normally distributed about the predicted DV scores
• linearity: the residuals should have a straight-line relationship with predicted DV scores
• homoscedasticity: the variance of the residuals about predicted DV scores should be the same for all
predicted scores.
1. From the menu at the top of the screen, click on Analyze, then select Regression, then
Linear.
2. Click on your continuous dependent variable (e.g. Total perceived stress: tpstress) and
move it into the Dependent box.
3. Click on your independent variables (Total Mastery: tmast; Total PCOISS: tpcoiss) and click
on the arrow to move them into the Independent(s) box.
4. For Method, make sure Enter is selected. (This will give you standard multiple regression.)
• Select the following: Estimates, Confidence Intervals, Model fit, Descriptives, Part and
partial correlations and Collinearity diagnostics.
• In the Residuals section, select Casewise diagnostics and Outliers outside 3 standard deviations. Click on
Continue.
6. Click on the Options button. In the Missing Values section, select Exclude cases pairwise. Click on
Continue.
Click on Continue.
Multicollinearity
In the table labelled Correlations, look for correlations (r) of more than 0.3 and less than 0.8.
In the Normal P-P Plot, you are hoping that your points will lie in a reasonably straight diagonal line from bottom left to top
right.
In the Scatterplot of the standardised residuals (the second plot displayed) you are hoping that the residuals will be roughly
rectangularly distributed, with most of the scores concentrated in the centre.
Multiple regression was used to assess the ability of two control measures (Mastery
Scale, Perceived Control of Internal States Scale: PCOISS) to predict levels of stress
(Perceived Stress Scale). Preliminary analyses were conducted to ensure no violation of
the assumptions of normality, linearity, multicollinearity and homoscedasticity. The
model was significant (p < .001) and the value of adjusted R square was .457.
In the final model, both variables were statistically significant, with the Mastery
Scale recording a higher beta value (beta = –.44, p < .001) than the PCOISS Scale (beta =
–.33, p < .001).
Research Problem
Example of Broad Problem areas:
5 Whys
Because they do not have a lot of influence over planning, executing, and evaluating the work they do.
Why?
          Exploratory                  Descriptive                    Causal
          Often the front end of       Preplanned and structured      Measure the effect on
          total research design        design                         dependent variable(s)
                                                                      Control of other mediating
                                                                      variables
Methods:  Expert surveys               Secondary data: quantitative   Experiments
          Pilot surveys                analysis
          Case studies                 Surveys
          Secondary data: qualitative  Panels
          analysis                     Observation and other data
          Qualitative research
Cross-Sectional Designs
• Involve the collection of information from any given sample of
population elements only once.
• In single cross-sectional designs, there is only one sample of
respondents and information is obtained from this sample only once.
• In multiple cross-sectional designs, there are two or more samples of
respondents, and information from each sample is obtained only once.
Often, information from different samples is obtained at different times.
• Cohort analysis consists of a series of surveys conducted at appropriate
time intervals, where the cohort serves as the basic unit of analysis. A
cohort is a group of respondents who experience the same event within
the same time interval.
Longitudinal Designs
• An element is the object about which or from which the information is desired, e.g., the respondent.
• A sampling unit is an element, or a unit containing the element, that is available for selection at some
stage of the sampling process.
• Extent refers to the geographical boundaries.
• Time is the time period under consideration.
Classification of Sampling Techniques
Convenience Sampling
Convenience sampling attempts to obtain a sample of convenient elements.
Often, respondents are selected because they happen to be in the right place at
the right time.
• Use of students, and members of social organizations
• Test markets
• In the second stage, sample elements are selected based on convenience or judgment.
Male 48 48 480
Female 52 52 520
Each possible sample of a given size (n) has a known and equal
probability of being the sample actually selected.
The sample is chosen by selecting a random starting point and then picking
every ith element in succession from the sampling frame.
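Systematic selection can be sketched in Python as follows. The sketch assumes the sampling interval i is the frame size divided by the sample size, rounded down; the sampling frame of 100 numbered elements is made up.

```python
import random


def systematic_sample(frame, sample_size, rng):
    """Pick a random start within the first interval, then every ith element."""
    interval = len(frame) // sample_size          # the sampling interval i
    if interval < 1:
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    start = rng.randrange(interval)               # random starting point: 0 .. i - 1
    return [frame[start + k * interval] for k in range(sample_size)]


# Hypothetical sampling frame: elements numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, random.Random())
print(sample)
```

Only the starting point is random; once it is chosen, every other element in the sample is fixed by the interval.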
The strata should be mutually exclusive and collectively exhaustive in that every
population element should be assigned to one and only one stratum and no
population elements should be omitted.
Next, elements are selected from each stratum by a random procedure, usually SRS.
Finally, the variables should decrease the cost of the stratification process
by being easy to measure and apply.
Cluster Sampling
The target population is first divided into mutually exclusive and collectively
exhaustive subpopulations, or clusters.
For each selected cluster, either all the elements are included in the sample
(one-stage) or a sample of elements is drawn probabilistically (two-stage).
• n = sample size
• N = number of people in the population
• p = estimated variance in the population of the area
• A = desired precision, expressed as a decimal in the formula
• z = z-score for the required confidence level
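The formula that uses these symbols is not shown here. As an assumption only, the sketch below uses one common finite-population formula for a proportion, n = p(1 - p) / (A²/z² + p(1 - p)/N), which may not be the formula intended on the slide. The example values (N = 1000, p = 0.5, A = 0.05, z = 1.96 for 95 per cent confidence) are made up.

```python
import math


def sample_size(N, p, A, z):
    """Assumed finite-population formula: n = p(1 - p) / (A^2 / z^2 + p(1 - p) / N)."""
    spread = p * (1 - p)
    n = spread / (A ** 2 / z ** 2 + spread / N)
    return math.ceil(n)   # round up so the desired precision is still met


# Hypothetical inputs: population of 1000, maximum variability, 5% precision, 95% confidence.
print(sample_size(N=1000, p=0.5, A=0.05, z=1.96))
```

Under this assumed formula, the required sample grows as the population grows but levels off for very large populations.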
Calculating Sample Size (Small to moderate)
• Respondents evaluate only one object at a time, and for this reason
non-comparative scales are often referred to as monadic scales.
• The respondents are provided with a scale that has a number or brief
description associated with each category.
• The categories are ordered in terms of scale position, and the respondents
are required to select the specified category that best describes the object
being rated.
• The commonly used itemized rating scales are the Likert, semantic
differential, and Stapel scales.
Likert Scale
The Likert scale requires the respondents to indicate a degree of
agreement or disagreement with each of a series of statements about
the stimulus objects.
                                              Strongly   Disagree   Neither agree   Agree   Strongly
                                              disagree              nor disagree            agree
1. Wal-Mart sells high-quality merchandise.      1          2X           3            4        5
2. Wal-Mart has poor in-store service.           1          2X           3            4        5
3. I like to shop at Wal-Mart.                   1          2            3X           4        5
The data obtained by using a Stapel scale can be analyzed in the same
way as semantic differential data.
Basic Noncomparative Scales
Table 9.1 Basic Noncomparative Scales
Scale               Basic Characteristics       Examples               Advantages            Disadvantages
Continuous rating   Place a mark on a           Reaction to TV         Easy to construct     Scoring can be
scale               continuous line             commercials                                  cumbersome unless
                                                                                             computerized
Likert scale        Degree of agreement on a    Measurement of         Easy to construct,    More time consuming
                    1 (strongly disagree) to 5  attitudes              administer, and
                    (strongly agree) scale                             understand
Stapel scale        Unipolar ten-point scale,   Measurement of         Easy to construct;    Confusing and difficult
                    -5 to +5, without a neutral attitudes and images   administered over     to apply
                    point (zero)                                       telephone
Summary of Itemized Scale Decisions
Table 9.2 Summary of Itemized Rating Scale Decisions
1. Number of categories Although there is no single, optimal number, traditional guidelines
suggest that there should be between five and nine categories.
2. Balanced versus unbalanced In general, the scale should be balanced to obtain objective data.
3. Odd or even number of categories If a neutral or indifferent scale response is possible from at least
some of the respondents, an odd number of categories should be
used.
4. Forced versus nonforced In situations where the respondents are expected to have no
opinion, the accuracy of data may be improved by a nonforced
scale.
5. Verbal description An argument can be made for labeling all or many scale categories.
The category descriptions should be located as close to the
response categories as possible.
6. Physical form A number of options should be tried and the best one selected.
Balanced and Unbalanced Scales
Figure 9.1 Balanced and Unbalanced Scales
Rating Scale Configurations
Figure 9.2 Rating Scale Configurations
Some Unique Rating Scale Configurations
Figure 9.3 Some Unique Rating Chart Configurations
Some Commonly Used Scales in Marketing
Table 9.3 Some Commonly Used Scales in Marketing
Attitude Very bad Bad Neither bad nor good Good Very good
• Random samples.
• Independent observations.
Chi-square test for independence
• Explore the relationship between two categorical variables.
• For a 2 by 2 table (two categories in each variable), the output from chi-square includes an
additional correction value (Yates’ Correction for Continuity).
• Additional assumptions: the lowest expected frequency in any cell should be 5 or more.
Some authors suggest less stringent criteria: at least 80 per cent of cells should have expected
frequencies of 5 or more. If you have a 2 by 2 table, it is recommended that the expected
frequency be at least 10. If you have a 2 by 2 table that violates this assumption, you should
consider reporting Fisher’s Exact Probability Test instead.
1. From the menu at the top of the screen, click on Analyze, then Descriptive Statistics, and then Crosstabs.
2. Click on one of your variables (e.g. sex) to be your row variable and click on the arrow to move it into the
box marked Row(s).
3. Click on the other variable to be your column variable (e.g. smoker) and click on the arrow to move it into
the box marked Column(s).
4. Click on the Statistics button. Tick Chi-square and Phi and Cramer’s V. Click on Continue.
For the phi coefficient (2 by 2 tables): .10 for small effect, .30 for medium effect and .50 for large effect.
Cramer’s V
For either R–1 or C–1 equal to 2 (three categories): small=.07, medium=.21, large=.35
For either R–1 or C–1 equal to 3 (four categories): small=.06, medium=.17, large=.29
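The quantities behind the Crosstabs output can be sketched in Python: expected frequencies, the chi-square statistic, and Cramer's V (which equals phi for a 2 by 2 table). The sketch does not apply Yates' correction and does not compute a p value. The counts are made up and the function names are assumptions for illustration.

```python
import math


def expected_frequencies(table):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in column_totals] for rt in row_totals]


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    expected = expected_frequencies(table)
    return sum(
        (observed - exp) ** 2 / exp
        for observed_row, expected_row in zip(table, expected)
        for observed, exp in zip(observed_row, expected_row)
    )


def cramers_v(table):
    """Effect size; for a 2 by 2 table this is the phi coefficient."""
    n = sum(sum(row) for row in table)
    smaller_dimension = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * smaller_dimension))


# Hypothetical counts: rows = sex (male, female), columns = smoker (yes, no).
observed = [[10, 20], [20, 10]]
lowest_expected = min(min(row) for row in expected_frequencies(observed))

print(f"lowest expected frequency: {lowest_expected}")
print(f"chi-square = {chi_square(observed):.2f}, Cramer's V = {cramers_v(observed):.2f}")
```

Checking the lowest expected frequency first shows whether the minimum-frequency assumption is met before the statistic is interpreted.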
Reporting
2. In the Objective tab click on Customize analysis in the section “What is your objective?”.
4. Click on your categorical (independent) variable (e.g. Sex) and move it into the Groups box.
5. Click on your continuous (dependent) variable (e.g. Total Self esteem: tslfest) and move it into the Test
Fields box.
6. Click on the Settings tab and select Customize tests. Click on Mann-Whitney U (2 samples).
2. Click on your continuous variable (e.g. total self-esteem: tslfest) and move it into the Dependent List box.
3. Click on your categorical variable (e.g. sex) and move it into the Independent List box.
4. Click on the Options button. Click on Median in the Statistics section and move it into the Cell
Statistics box. Click on Mean and Standard Deviation and remove them from the Cell Statistics box.
5. Click on Continue.
6. Click on OK
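The Mann-Whitney U statistic and the group medians reported alongside it can be sketched in Python. The sketch computes U from ranks, giving tied scores their average rank, and returns the smaller of the two U values, which is an assumption about the reporting convention. It does not compute a p value, and the self-esteem scores are made up.

```python
from statistics import median


def average_ranks(values):
    """Map each distinct score to its rank; tied scores share the mean of their ranks."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i + 1 .. j
        i = j
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U values for two independent groups."""
    ranks = average_ranks(group_a + group_b)
    rank_sum_a = sum(ranks[value] for value in group_a)
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical total self-esteem scores for two groups.
males = [28, 31, 33, 35, 36, 38]
females = [25, 27, 30, 31, 34, 37]

print("U =", mann_whitney_u(males, females))
print("median (males) =", median(males), "median (females) =", median(females))
```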
WILCOXON SIGNED RANK TEST
1. From the menu at the top of the screen, click on Analyze, then select Nonparametric Tests,
2. Click on the variables that represent the scores at Time 1 and at Time 2 (e.g. fear of stats time1: fost1,
fear of stats time2: fost2). Click on the arrow to move these into the Test Pairs box.
3. Make sure that the Wilcoxon box is ticked in the Test Type section.
4. Click on the Options button. Choose Quartiles (this will provide the median scores for each
time point).