Marketing Research Final Notes
CHAPTER 6 – SAMPLING
Sampling involves selecting a relatively small number of elements from a larger defined
group of elements and expecting that the information gathered from the small group
will enable accurate judgments about the larger group.
Sampling decisions influence the type of research design, the survey instrument, and the
actual questionnaire.
With a general idea of the target population and the key characteristics that will be used to draw the
sample of respondents, researchers can customize the questionnaire to ensure that it is of
interest to respondents and provides high-quality data.
Population is the identifiable set of elements of interest to the researcher and pertinent
to the information problem.
Sampling units are target population elements actually available to be used during the
sampling process. A sampling frame is the list of eligible sampling units.
One of the major goals of researching small, yet representative, samples of members of a
defined target population is that the results of the research will help to predict or estimate
what the true population parameters are within a certain degree of confidence.
Factors Underlying Sampling Theory
With an understanding of the basics of the central limit theorem (CLT), the researcher can draw
representative samples, estimate population parameters from sample statistics, and judge the
reliability of those estimates.
There are numerous opportunities to make mistakes that result in some type of bias in any
research study. This bias can be classified as either sampling error or nonsampling error.
Unlike nonsampling error, sampling error can be determined only after the sample is drawn and
data collection is completed.
Sampling error is any bias that results from mistakes in either the selection process for
prospective sampling units or in determining the sample size. Moreover, random
sampling error tends to occur because of chance variations in the selection of sampling
units.
Sampling designs are classified into two categories: (1) probability and (2) nonprobability.
In probability sampling, each sampling unit in the defined target population has a known
probability of being selected for the sample. The actual probability of selection for
each sampling unit may or may not be equal depending on the type of probability
sampling design used. Specific rules for selecting members from the population for
inclusion in the sample are determined at the beginning of a study to ensure unbiased
selection of the sampling units.
Probability sampling enables the researcher to judge the reliability and validity of data
collected by calculating the probability that the sample findings are different from the
defined target population.
In nonprobability sampling, the probability of selecting each sampling unit is not known.
Therefore, sampling error is not known. Selection of sampling units is based on intuitive
judgment or researcher knowledge. The degree to which the sample is representative of the
defined target population depends on the sampling approach and how well the researcher
executes the selection activities.
Types of Probability and Nonprobability Sampling Methods
Probability Sampling
Simple random sampling – every sampling unit has a known and equal chance of being
selected.
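As a minimal sketch, simple random sampling can be done with Python's standard library; the frame of 1,000 customer IDs and the seed are hypothetical:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n sampling units; every unit has a known and equal chance (n / len(frame))."""
    rng = random.Random(seed)      # seeded only so the sketch is reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame of 1,000 customer IDs
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=42)
```

Here each unit's selection probability is 50/1,000, i.e., a known and equal 5 percent.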
Systematic random sampling requires that the defined target population be ordered in
some way; it creates a sample of objects or prospective respondents that is very similar in
quality to a sample drawn using simple random sampling.
Researchers must be able to secure a complete listing of the potential sampling units that
make up the defined target population.
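Systematic random sampling can be sketched as taking every k-th unit from the ordered frame. The frame and sample size below are hypothetical, and the fixed starting point stands in for the random start normally used:

```python
def systematic_sample(frame, n):
    """Select every k-th unit from an ordered frame, where k is the skip interval."""
    k = len(frame) // n      # skip interval
    start = k // 2           # fixed start for a deterministic sketch; normally random in [0, k)
    return frame[start::k][:n]

frame = list(range(1, 101))              # ordered frame of 100 units
sample = systematic_sample(frame, 10)    # k = 10: units 6, 16, 26, ..., 96
```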
Stratified random sampling involves the separation of the target population into different
groups, called strata, and the selection of samples from each stratum. Stratified random
sampling is similar to segmentation of the defined target population into smaller, more
homogeneous sets of elements.
To ensure that the sample maintains the required precision, representative samples must be
drawn from each of the smaller population groups (strata). Drawing a stratified random
sample involves three basic steps: (1) dividing the target population into homogeneous
subgroups (strata); (2) drawing random samples from each stratum; and (3) combining the
stratum samples into a single sample of the target population.
Two common methods are used to derive samples from the strata:
In proportionately stratified sampling, the sample size from each stratum is dependent on
that stratum’s size relative to the defined target population. Therefore, the larger strata are
sampled more heavily because they make up a larger percentage of the target population.
In disproportionately stratified sampling, the sample size selected from each stratum is
independent of that stratum’s proportion of the total defined target population. This approach
is used when stratification of the target population produces sample sizes for subgroups
that differ from their relative importance to the study.
Optimal allocation sampling. In this method, consideration is given to the relative size of
the stratum as well as the variability within the stratum to determine the necessary sample
size of each stratum.
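The proportionate method can be sketched in Python; the strata (light/medium/heavy users) and their sizes are hypothetical:

```python
def proportionate_allocation(strata_sizes, total_n):
    """Allocate a total sample across strata in proportion to each stratum's size."""
    N = sum(strata_sizes.values())
    return {name: round(total_n * size / N) for name, size in strata_sizes.items()}

# Hypothetical target population: 6,000 light, 3,000 medium, 1,000 heavy users
strata = {"light": 6000, "medium": 3000, "heavy": 1000}
alloc = proportionate_allocation(strata, 400)
# light: 240, medium: 120, heavy: 40 -- larger strata are sampled more heavily
```

Under disproportionate or optimal allocation, a small but highly variable stratum (e.g., heavy users) might instead be oversampled relative to its share of the population.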
Multisource sampling is used when no single source can generate a sufficiently large or
low-incidence sample. While researchers have traditionally shied away from using multiple
sources, mainly because sampling theory dictates the use of a single defined population,
changing respondent behaviors (e.g., less frequent use of e-mail and more frequent use of
social media) are supporting multisource sampling.
Cluster sampling is similar to stratified random sampling, but is different in that the
sampling units are divided into mutually exclusive and collectively exhaustive subpopulations
called clusters. Each cluster is assumed to be representative of the heterogeneity of the
target population.
A popular form of cluster sampling is area sampling. In area sampling, the clusters are
formed by geographic designations.
Nonprobability Sampling
Common nonprobability sampling methods include convenience sampling, judgment
sampling, quota sampling, and snowball sampling.
Determining Sample Sizes
Consider how precise the estimates must be and how much time and money are available to
collect the required data, since data collection is generally one of the most expensive
components of a study. Sample size determination differs between probability and
nonprobability designs.
Three factors play a role in determining sample sizes with probability designs:
1. The population variance, which is a measure of the dispersion of the population, and its
square root, referred to as the population standard deviation. The greater the variability in
the data being estimated, the larger the sample size needed.
2. The level of confidence desired in the estimate. Confidence is the certainty that the true
value of what we are estimating falls within the precision range we have selected. For
example, marketing researchers typically select a 90 or 95 percent confidence level for their
projects. The higher the level of confidence desired, the larger the sample size needed.
3. The degree of precision desired in the estimate. Precision is the acceptable amount of
error in the sample estimate. The smaller the acceptable error, the larger the sample size
needed.
In the previously described formulas, the size of the population has no impact on the
determination of the sample size. This is always true for “large” populations. When working
with small populations, however, use of the earlier formulas may lead to an unnecessarily
large sample size. If, for example, the sample size is larger than 5 percent of the population,
then the calculated sample size should be multiplied by the following correction factor:
N/(N + n − 1)
where
N = Population size
n = Calculated sample size determined by the original formula
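A quick sketch of applying the correction factor; the starting n of 384 is just a common large-population example (95 percent confidence, ±5 percent precision), not a value from these notes:

```python
def corrected_sample_size(n, N):
    """Adjust a calculated sample size n for a small population of size N."""
    return n * N / (N + n - 1)   # multiply n by the correction factor N / (N + n - 1)

n = 384      # hypothetical sample size from the original large-population formula
N = 2000     # small population: n exceeds 5% of N, so the correction applies
adjusted = corrected_sample_size(n, N)   # about 322, noticeably smaller than 384
```

For very large N the factor approaches 1, so the correction has no practical effect, matching the statement that population size does not matter for large populations.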
Sample size formulas cannot be used for nonprobability samples. Determining the sample
size for nonprobability samples is usually a subjective, intuitive judgment made by the
researcher based on either past studies, industry standards, or the amount of resources
available.
A sampling plan is the blueprint to ensure the data collected are representative of the
population. A good sampling plan includes the following steps:
Step 1: Define the Target Population The target population is defined in terms of the
elements of interest, the sampling units, and the time frame of the study.
Step 2: Select the Data Collection Method Using the problem definition, the data
requirements, and the research objectives, the researcher chooses a method for collecting
the data from the population. Choices include some type of interviewing approach (e.g.,
personal or telephone), a self-administered survey, or perhaps observation. The method of
data collection guides the researcher in selecting the sampling frame(s).
Step 3: Identify the Sampling Frame(s) Needed A list of eligible sampling units must be
obtained. The list includes information about prospective sampling units (individuals or
objects) so the researcher can contact them. An incomplete sampling frame decreases the
likelihood of drawing a representative sample. Sampling lists can be created from a number
of different sources (e.g., customer lists from a company’s internal database, random-digit
dialing, an organization’s membership roster, or purchased from a sampling vendor).
Step 4: Select the Appropriate Sampling Method The researcher chooses between
probability and nonprobability methods. If the findings will be generalized, a probability
sampling method will provide more accurate information than nonprobability sampling
methods. As noted previously, in determining the sampling method, the researcher must
consider seven factors: (1) research objectives; (2) desired accuracy; (3) availability of
resources; (4) time frame; (5) knowledge of the target population; (6) scope of the
research; and (7) statistical analysis needs.
Step 5: Determine Necessary Sample Sizes and Overall Contact Rates In this step of a
sampling plan, the researcher decides how precise the sample estimates must be and how
much time and money are available to collect the data. To determine the appropriate sample
size, decisions have to be made concerning (1) the variability of the population
characteristic under investigation, (2) the level of confidence desired in the estimates,
and (3) the precision required. The researcher also must decide how many completed
surveys are needed for data analysis.
At this point the researcher must consider what impact having fewer surveys than initially
desired would have on the accuracy of the sample statistics. An important question is “How
many prospective sampling units will have to be contacted to ensure the estimated sample
size is obtained, and at what additional costs?”
Step 6: Create an Operating Plan for Selecting Sampling Units The researcher must
decide how to contact the prospective respondents in the sample. Instructions should be
written so that interviewers know what to do and how to handle problems contacting
prospective respondents. For example, if the study data will be collected using mall-intercept
interviews, then interviewers must be given instructions on how to select respondents and
conduct the interviews.
Step 7: Execute the Operational Plan This step involves actually collecting the data from
respondents. The important consideration in this step is to maintain consistency and control.
CHAPTER 7 – MEASUREMENT AND SCALING
Structured data are organized and stored in a specific, defined format and are categorized,
organized, and managed in fixed fields; unstructured data do not have fixed values and
cannot be easily categorized, organized, and processed.
Measurement Process
Measurement is the process of determining the intensity (or amount) of information about
constructs, concepts, or objects.
The measurement process consists of two tasks.
● Construct selection/development with the goal to precisely identify and define what is
to be measured.
● Scale measurement determines how to precisely measure each construct.
What Is a Construct?
● A construct is an abstract idea/concept formed in a person’s mind through a
combination of characteristics; the characteristics are the variables that
collectively define the concept and make measurement of the concept possible.
● For example, to measure “customer interaction from the seller’s perspective,”
researchers may use Agree–Disagree Scales.
● The resultant score is called a scale, an index, or a summated rating.
● So to identify restaurant satisfaction, researchers may conduct a literature review,
perform interviews, and use personal experience to form the framework for
measuring the construct.
Construct Development
● A construct is an unobservable concept measured indirectly by a group of related
variables.
● The construct is measured with scale measurements of each individual indicator
variable.
● Construct development begins with an accurate definition of the purpose of the study
and the research problem.
● Researchers identify characteristics that define the concept and then develop
methods of indirectly measuring the concept.
● At the heart of construct development is the need to determine exactly what is to be
measured.
● If an object’s features can be directly measured, then the feature is a concrete
variable and not an abstract construct.
Scale Measurement
● Involves assigning a set of scale descriptors to represent the range of possible
responses.
● The scale descriptors are a combination of labels and numbers which are assigned
using a set of rules.
● Scale measurement assigns degrees of intensity to the responses.
● The degrees of intensity are commonly referred to as scale points.
Please circle the number of children under 18 years of age currently living in your household.
0 1 2 3 4 5 6 7 If more than 7, please specify: ____
Example 2:
In the past seven days, how many times did you go online to shop at Amazon.com?
____ # of times
Example 3:
I am never influenced
by advertisements.
Expertise:
Knowledgeable 7 6 5 4 3 2 1 Unknowledgeable
Skilled 7 6 5 4 3 2 1 Unskilled
Qualified 7 6 5 4 3 2 1 Unqualified
Experienced 7 6 5 4 3 2 1 Inexperienced
Trustworthiness:
Reliable 7 6 5 4 3 2 1 Unreliable
Sincere 7 6 5 4 3 2 1 Insincere
Trustworthy 7 6 5 4 3 2 1 Untrustworthy
Dependable 7 6 5 4 3 2 1 Undependable
Honest 7 6 5 4 3 2 1 Dishonest
● Descriptive questionnaires collect data that can be turned into knowledge about a
person, object, or issue.
● Predictive survey questionnaires predict changes in attitudes and behaviors, and
test hypotheses.
● A questionnaire is a formal set of questions designed to gather primary data.
● A pilot study is a small-scale version of the intended main research study, including
all subcomponents, such as the data collection and analysis, from 50 to 200
respondents who are representative of the main study’s defined target population.
● A pretest is a descriptive research activity representing a small-scale investigation of
10 to 30 subjects representative of the main study’s defined target population, but it
focuses on a specific subcomponent of the main study.
Questionnaire Design
Step 1: Confirm research information objectives and data requirements.
Step 2: Select appropriate data collection method.
Step 3: Develop questions and scaling.
Step 4: Determine layout and evaluate questionnaire.
Step 5: Obtain initial client approval.
Step 6: Pretest, revise, and finalize questionnaire.
Step 7: Implement the survey.
Open-ended questions: responses are unaided and unlimited, but the questions are often
skipped; responses are more difficult to code for analysis; they also require more thought
from respondents.
Closed-ended questions: require a choice from a set of responses or scale points; reduce
the respondent’s effort; easy to answer and easy to code.
Bad questions prevent or distort communications between the researcher and the
respondent.
● Unanswerable: either because the respondent does not have access to the needed
information or none of the answer choices apply.
● Leading (or loaded): when the respondent is directed to a response that would not
ordinarily be given.
● Double-barreled: when the respondent is asked to address more than one issue at a
time.
Step 4: Determine Layout and Evaluate Questionnaire
● An introductory section gives respondents an overview, and screening questions
identify qualified respondents.
● The second section is the research questions section based on the objectives and
arranged from general to specific.
● The last section includes lifestyle, social media usage, and demographics, then
ends with a thank-you statement.
Review questionnaire
● Focus on whether each question is necessary and if length is acceptable.
● Check that the survey meets the research information objectives.
● Make sure the scale format and instructions work well.
● Check that questions move from general to specific.
● With online surveys, view it on the screen as a respondent would.
● Physically inspect mail, drop-off, and self-reporting surveys.
● Self-administered questionnaires should look professional and visually appealing.
● Frequency distributions can be useful for examining the different values for a
variable.
● Researchers use descriptive statistics to summarize information.
● The mean, median, and mode are measures of central tendency.
● These measures locate the center of the distribution.
● The mean, median, and mode are sometimes also called measures of location.
● MEAN is the average value within the distribution and is the most commonly
used measure of central tendency; it is used for interval and ratio data, but when
outliers are present, the median or mode is preferred.
● MEDIAN is the middle value of the distribution when ordered in either ascending
or descending sequence; it is usually used for ordinal data.
● MODE is the value that appears in the distribution most often; it is usually used
for nominal data.
Measures of Dispersion
● The range defines the spread of the data, the endpoints of the distribution of values.
● Standard deviation is the average distance of the distribution values from the mean.
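These measures can be computed with Python's standard `statistics` module; the data below are hypothetical and include one outlier (30) to show why the median is preferred in that case:

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]   # hypothetical ratings with an outlier (30)

mean = statistics.mean(data)              # 7.0 -- pulled upward by the outlier
median = statistics.median(data)          # 5.0 -- robust to the outlier
mode = statistics.mode(data)              # 5   -- most frequent value
value_range = max(data) - min(data)       # 28  -- spread between the endpoints
sd = statistics.stdev(data)               # sample standard deviation
```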
(Table: selection of statistical test by relationship examined (hypothesis), independent
variable (IV), and dependent variable (DV).)
5) Cross-Tabulation
● Cross-tabulation is useful for examining relationships and reporting the findings for
two variables.
● Its purpose is to determine whether differences exist between subgroups.
● It is one of the simplest methods for describing sets of relationships.
● It is a frequency distribution of responses on two or more variables.
● To conduct cross tabulation, the responses for each of the groups are tabulated and
compared.
● Chi-square (X²) analysis enables us to test whether there are statistical
differences between the responses of groups measured with nominally scaled variables.
● The chi-square (X²) statistic answers questions that cannot be analyzed with
other types of analysis, such as ANOVA or t tests.
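A minimal sketch of the chi-square statistic for a cross-tabulation, implemented in pure Python; the 2 × 2 gender-by-brand table is hypothetical:

```python
def chi_square(table):
    """Chi-square statistic for a cross-tabulation given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two nominal variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical cross-tab: rows = male/female, columns = prefer Brand A / Brand B
table = [[30, 20],
         [20, 30]]
stat = chi_square(table)   # 4.0
```

The statistic would then be compared with the chi-square critical value for the table's degrees of freedom, (rows − 1) × (columns − 1) = 1 here.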
6) Comparing Means—Independent Versus Related Samples
● A frequently examined question is whether the means of two groups of respondents
are significantly different.
● An independent samples comparison would be the results of interviews with male
and female coffee drinkers.
● A related-samples comparison would compare the average cups of coffee per day
consumed by male students with the average soft drinks per day consumed by the
same male students.
● Although the questions are independent, the respondents are the same.
● Sometimes marketing researchers want to test for differences in two means for
variables in the same sample.
● To examine this, researchers use the paired sample test for the difference in two
means.
● Examines whether two means from two different questions using the same scaling
and answered by the same respondents are different.
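A sketch of the paired-samples t statistic in pure Python; the coffee and soft-drink counts for the same six respondents are hypothetical:

```python
import math

def paired_t(x, y):
    """t statistic for the mean difference between two paired measurements."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)   # sample variance of differences
    return mean_d / math.sqrt(var_d / n)                      # t = mean difference / std. error

# Hypothetical: the same six respondents report coffee and soft drinks per day
coffee = [3, 4, 2, 5, 4, 3]
soft_drinks = [1, 2, 2, 3, 2, 1]
t_stat = paired_t(coffee, soft_drinks)   # 5.0, with n - 1 = 5 degrees of freedom
```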
Perceptual Mapping
● Perceptual mapping is a process that is used to develop maps that show the
perceptions of respondents.
● You already know how to construct these.
● Maps can be based on rankings, medians, and mean ratings.
● Identifies gaps in perceptions, helping position new product development.
● Identifies a company’s image, helping position it against competitors.
● Can assess how effectively advertising positions the brand.
● Can be used to assess similarities of brands and channel outlets in distribution.
CHAPTER 12 – EXAMINING RELATIONSHIPS IN QUANTITATIVE RESEARCH
Strength of Association: The strength of the relationship indicates how closely the
variables are related and is typically quantified by the correlation coefficient.
For example, a strong positive correlation coefficient between time spent in the store and
amount spent indicates that customers who spend more time also tend to spend more
money.
Type of Relationship: The nature of the relationship can significantly affect how it is
analyzed:
Covariation The amount of change in one variable that is consistently related to the change
in another variable of interest.
Scatter diagram A graphic plot of the relative position of two variables using a horizontal
and a vertical axis to represent the values of the respective variables.
The scatter plot might show a random distribution of dots, resembling a circle with no
discernible pattern.
The relationship shown is more complex, where the pattern of dots changes direction;
initially, small increases in Y correspond with increases in X, but as Y increases further, X
begins to decrease.
Interpretation: This curvilinear relationship indicates that the association between the
variables varies depending on their levels. Such patterns are challenging because the
direction of the relationship changes, making linear statistical methods inadequate for a full
understanding.
Similar to linearity but with an opposite pattern; increases in Y correspond with decreases in
X.
Interpretation: This scenario reflects a negative linear relationship, still indicating high
covariation but in opposite directions. This relationship is crucial for scenarios where one
variable inversely affects another.
The dots align in a pattern that could be depicted as a straight line or an elongated ellipse,
indicating that as values of Y increase, values of X also increase.
Interpretation: This type of diagram illustrates a positive linear relationship where the
variables change in the same direction. The covariation is considered high, suggesting a
strong and direct association between the two variables.
Correlation Analysis
Scatter Diagrams: Scatter diagrams are graphical tools used to visualize the relationship
between two quantitative variables. They help identify patterns in data such as trends,
clusters, and potential outliers, which indicate how one variable changes in relation to
another.
The sign of the PCC indicates the direction of the relationship (positive or negative), while its
magnitude shows how strong the association is. Higher absolute values of the coefficient
(closer to 1) signify a stronger relationship.
For example, a calculated PCC of 0.61 between Starbucks coffee consumption and income
levels suggests a moderately strong positive relationship, implying that higher income is
associated with greater coffee consumption.
● Null Hypothesis (H0): There is no association between the variables (PCC is zero in
the population).
● Alternative Hypothesis (H1): There is an association between the variables (PCC is
not zero in the population).
Statistical significance is typically assessed using a p-value. If the p-value is less than the
chosen level of significance (commonly 0.05), the null hypothesis is rejected, affirming that
the observed correlation is likely not due to random chance.
For instance, if the p-value for the correlation between coffee consumption and income is
0.05, it suggests that there's only a 5% probability that such a correlation would occur if there
was actually no association in the population, thus providing grounds to claim a real
relationship exists.
Rule of Thumb for Interpreting the Pearson Correlation Coefficient:
When evaluating the strength of association between two variables using the Pearson
correlation coefficient, it’s helpful to apply these general guidelines: coefficients of ±.81 to
±1.00 indicate a very strong association; ±.61 to ±.80 strong; ±.41 to ±.60 moderate;
±.21 to ±.40 weak; and ±.00 to ±.20 little or none.
These thresholds provide a quick way to assess the strength of the correlation
measured, aiding in swift and effective decision-making or further statistical analysis.
Coefficient of determination (r²): A number measuring the proportion of variation in one
variable accounted for by another. The r² measure can be thought of as a percentage and
varies from 0.0 to 1.00.
Pearson correlation coefficient is a primary tool used to quantify the strength and direction of
a linear relationship between two variables. This coefficient can range from -1 (perfect
negative correlation) to +1 (perfect positive correlation), with 0 indicating no correlation.
A strong correlation coefficient (e.g., 0.776 in the case of Santa Fe Grill relating customer
satisfaction to the likelihood of recommending the restaurant) suggests a robust linear
association between the variables.
When this correlation is squared, it yields the coefficient of determination (r²), which in this
example is 0.602. This indicates that approximately 60.2% of the variability in the likelihood
of recommending the restaurant can be explained by variations in customer satisfaction.
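The Pearson correlation coefficient and r² can be computed in pure Python. The seven pairs of satisfaction and recommendation ratings below are hypothetical, not the Santa Fe Grill data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two interval- or ratio-scaled variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))   # covariation
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (ss_x * ss_y)

# Hypothetical satisfaction and likelihood-to-recommend ratings (1-7 scales)
satisfaction = [3, 4, 5, 6, 7, 5, 6]
recommend = [4, 4, 6, 6, 7, 5, 7]
r = pearson_r(satisfaction, recommend)
r_squared = r ** 2   # share of variance in one variable accounted for by the other
```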
Statistical vs. Substantive Significance:
Statistical Significance: Refers to the likelihood that the correlation observed is not due to
random chance. This is often determined through hypothesis testing where a p-value less
than a threshold (commonly 0.05) suggests significant results.
In cases where correlations are weak or the data scales are not interval or ratio (e.g., ordinal
or nominal scales), alternative methods like the Spearman rank order correlation coefficient
are recommended. This measure is more appropriate for data that do not meet the
assumptions required for the Pearson correlation coefficient.
Correlation and regression are fundamental statistical tools used in analytics to describe
relationships between variables. The correlation coefficient quantifies the direction and
strength of a linear relationship between two variables, while regression analysis builds upon
this to predict values and test causal theories.
Correlation Coefficient:
Regression Analysis:
Bivariate Regression: A statistical technique that analyzes the linear relationship between
two variables by estimating coefficients for an equation for a straight line. One variable is
designated as a dependent variable and the other is called an independent or predictor
variable.
Involves a simple relationship between two variables (one independent and one dependent).
For example, predicting sales volume based on price per unit can be modeled by the linear
equation: Sales Volume (Y) = a + b × Price per Unit (X). Here, a represents the intercept
(the predicted sales volume when the predictor is zero), and b is the slope, indicating how
sales volume changes with price adjustments.
Evaluating Regression Models:
Least Squares Method: This common estimation technique minimizes the sum of the
squared differences (errors) between observed and predicted values, ensuring the best fit
line through the data points.
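A least-squares fit for the bivariate case can be sketched in a few lines of Python; the price and sales figures are hypothetical and deliberately lie on a straight line:

```python
def ols_fit(x, y):
    """Least-squares intercept (a) and slope (b) for the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: covariation of x and y divided by the variation in x
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x   # the fitted line passes through the point of means
    return a, b

# Hypothetical price-per-unit and sales-volume observations
price = [1.0, 1.5, 2.0, 2.5, 3.0]
sales = [100, 90, 80, 70, 60]
a, b = ols_fit(price, sales)   # a = 120.0, b = -20.0: each $1 price increase cuts 20 units
```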
Multiple Regression:
Example: In a business setting, a manager might use multiple regression to understand how
various factors like age, income, employee satisfaction, and service speed collectively
impact customer satisfaction.
OLS Estimation: Ordinary Least Squares (OLS) is employed to estimate the regression
coefficients that will predict the dependent variable with the least error. These coefficients
(denoted as b) represent the individual impact of each independent variable on the
dependent variable.
Error in Regression:
Error Term (ei): Represents the discrepancy between actual and predicted values of the
dependent variable for each observation. The sum of these squared errors across all
observations gives a measure of the overall model error, reflecting the accuracy of the
regression model.
Practical Implications:
Decision Making: By understanding and applying both correlation and regression analysis,
managers can make more informed decisions, predict future trends, and allocate resources
more effectively.
Model Limitations: It's crucial to recognize that statistical significance does not imply
causality. Regression models provide predictive insights and should be interpreted within the
context of sound theoretical reasoning and business logic.
Significance
Statistical Significance:
T-tests and F-statistics: These tests assess the validity of the coefficients and the overall
regression model, respectively. The t-test evaluates if individual coefficients significantly
differ from zero, while the F-statistic tests the overall model fit, comparing the variance
explained by the model to the unexplained variance.
Substantive Significance:
Coefficient of Determination (r²): This measure reflects the proportion of variance in the
dependent variable explained by the independent variables. For example, an r² of .230
indicates that 23% of the variability in customer satisfaction can be explained by variations in
perceptions of price reasonableness.
Strength of Relationship: Examining r² and the size of the regression coefficients helps
gauge how changes in independent variables affect the dependent variable. Substantive
significance considers whether the size and impact of these coefficients are meaningful in
practical, real-world terms.
Multiple Regression Analysis
A statistical technique which analyzes the linear relationship between a dependent variable
and multiple independent variables by estimating coefficients for the equation for a straight
line.
Extension of Bivariate Regression: Multiple regression allows for the inclusion of various
independent variables to better understand their collective impact on a dependent variable.
This is crucial in complex scenarios where multiple factors influence outcomes such as
business, economics, and social sciences.
Normality: The model assumes that the distribution of residuals (errors) is normal. This
underpins the validity of many statistical tests, including the computation of confidence
intervals and hypothesis tests.
Linearity: The relationship between the independent and dependent variables is assumed
to be linear. Non-linear relationships require different analytical approaches, such as
transformations or non-linear modeling.
Interpreting r² and Adjusted r²: These statistics are crucial for understanding the
effectiveness of the model. While r² indicates the percentage of variance explained by the
model, adjusted r² accounts for the number of predictors and the sample size, providing a
more accurate assessment in the context of multiple regression.
Comparative Influence: Beta coefficients and their statistical significance help in comparing
the relative influence of different independent variables on the dependent variable.
Structural Equation Modeling (SEM):
Complex Relationships:
SEM allows for modeling of complex relationships where variables can be both dependent
and independent. This is useful in layered structures where, for example, an employee's
commitment might influence their performance but is itself influenced by factors like pay,
teamwork, and work environment.
Path Models:
SEM utilizes path models to illustrate the relationships among variables. These models are
comprised of multiple stages and can include intervening or mediating variables that help
elucidate indirect effects.
For cases involving complex models with multiple stages, PLS-SEM is used. It is similar to
ordinary least squares but is designed to handle path models with multiple constructs
measured by multiple variables. This method focuses on maximizing the variance explained
in the dependent variables.
Software Tools:
Tools like SmartPLS facilitate the use of PLS-SEM by providing a user-friendly interface
where researchers can draw path models, import data, and execute the analysis with ease.
These tools also provide various outputs such as Cronbach's Alpha for reliability, validity
testing, and explained variance (R²).
Explained Variance (R²): Reflects the amount of variance in the dependent variables
explained by the independent variables. For instance, in the Santa Fe Grill example, the R²
values indicated strong predictive power of the model regarding employee commitment and
performance.
SEM assesses both the statistical significance of paths (using t-tests for individual paths and
F-statistics for overall model fit) and the substantive significance, which considers the
practical importance of the relationships identified.
Advantages of PLS-SEM in Research:
Predictive Capability: The ultimate test of a structural model's utility is its ability to predict
outcomes of interest effectively, as demonstrated by the predictive R² values in the models.