Research Design: Defining Variables
(Module 6)
• In research, some variables are fairly easy to define, manipulate, and measure.
• For example, if a researcher is studying the effects of exercise on blood pressure, she can examine the relation by:
• Manipulating the amount of exercise (the length of each session) and observing the results, or
• Manipulating the intensity of the exercise (while monitoring target heart rates).
• She can also periodically measure blood pressure during the course of the study with a machine built to measure in a consistent and accurate manner.
• Does the fact that a machine exists to take this measurement mean that the measurement is always
accurate? The answer is Yes and No. (We shall discuss this issue in Module 6 when we address
measurement error.)
• Now let us suppose that a researcher wants to study hunger, depression, or aggression in a group of patients.
• One researcher’s definition of what it means to be hungry may be vastly different from another researcher’s. Even patients may define hunger in different ways.
• The solution to this problem is for the researcher to define hunger operationally. This operational definition states the criteria the researcher uses to measure or manipulate the variable.
• In other words, the investigator might define hunger in terms of specific activities such as not having
eaten for 12 hours.
• Thus one operational definition of hunger could be that simple: hunger occurs when 12 hours have passed with no food intake.
• Researchers must operationally define all variables: those measured (dependent variables) and those manipulated (independent variables).
• Suppose a researcher is studying anxiety. How might anxiety be operationally defined? There are several possibilities:
• Anxiety can be defined as the number of nervous actions displayed in a 1-hour time period, or
• As a person’s score on a GSR (Galvanic Skin Response) machine or on the Taylor Manifest Anxiety
Scale. Some measures are better than others, better meaning more reliable and valid (concepts we
discuss in Module 6).
• Once other investigators understand how a researcher has operationally defined a variable, they can
replicate the study if they so desire.
• They can better understand the study and whether it has problems. They can also better design their own studies based on how the variables were operationally defined.
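The hunger example above can be written as a testable rule. A toy sketch in Python (the function name is illustrative; the 12-hour threshold comes from the text):

```python
def is_hungry(hours_since_last_meal):
    """Operational definition: hunger = 12 or more hours with no food intake."""
    return hours_since_last_meal >= 12

print(is_hungry(14))  # True: meets the 12-hour criterion
print(is_hungry(5))   # False: criterion not met
```

Because the criterion is explicit, any other researcher applying the same rule to the same data would reach the same classification, which is exactly what replicability requires.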
Exercises
1. You are a teacher in a school and want to send a team of 3 students to participate in an interschool poetry competition. You have to select the best 3 students in the school. Define BEST THREE STUDENTS operationally.
2. You are observing how many people abide by road rules by counting the number of people who stop at a traffic signal. How will you define “stopping at a traffic signal” (full stop, stopping for a set time, car rolling slowly, etc.)?
Properties of Measurement
• After operationally defining the independent and dependent variables, the next step is to consider the level of measurement of the dependent variable.
• There are four levels of measurement, each based on the characteristics, or properties, of the data, which can be:
• Identity,
• Magnitude,
• Equal intervals,
• Absolute/true zero.
• Identity – In this case numbers are allocated just for the sake of identification.
• The numbers cannot be used in mathematical operations; they are assigned only to convey a particular meaning. For example, if participants in a study have different political affiliations, they receive different scores (numbers).
• Variables having only the identity property are measured on a Nominal Scale.
• Magnitude – This means that numbers have an inherent order from smaller to larger. For example, position in class, level of education, or rank in an organization.
• Variables having identity and magnitude are measured on an Ordinal Scale.
• Equal Intervals – Also called equal unit size; this means that the difference between numbers anywhere on the scale is the same.
• In most business research, variables are treated as having equal intervals: the difference between any two adjacent units is the same as between any other two adjacent units. For instance, the difference between 4 and 5 is the same as the difference between 76 and 77, i.e., 1.
• Variables with the identity, magnitude, and equal-interval properties are measured on an Interval Scale.
• Absolute/true zero – means that the zero as a response represents the absence of the property being
measured (e.g., no money, no behavior, none correct)
• In other words, A property of measurement in which assigning a score of zero indicates an absence of
the variable being measured
• However, a temperature of 0 (Celsius or Fahrenheit) is not an absolute zero: it still has an effect, and we cannot say there is no temperature.
SCALES OF MEASUREMENT
• The level, or scale, of measurement depends on the properties of the data. There are four scales of
measurement:
• nominal,
• ordinal,
• interval, and
• ratio.
• Each of these scales has one or more of the properties described in the previous section.
• We discuss the scales in order, from the one with the fewest properties to the one with the most, that
is, from the least to the most sophisticated.
• As we see in later modules, it is important to establish the scale of data measurement in order to
determine the appropriate statistical test to use when analyzing the data.
Nominal Scale
• We can have 5 colors, such as Red, Blue, Orange, Green, and Yellow, and could number them 1 to 5, 5 to 1, or in any mixed order; here the numbers are assigned to the colors purely for identification, not for ordering them.
• The only mathematical operation we can perform with nominal data is to count.
Ordinal Scale
• An ordinal scale ranks responses, for instance ranking cyclists at the end of a race in positions 1, 2, and 3.
• Note that these are ranks: the time gap between positions 1 and 2 may well not be the same as between positions 2 and 3, so the distance between points is not the same, but an order is present.
• When responses have an order but the distances between them are not necessarily the same, the items are placed on the Ordinal Scale.
• Therefore an ordinal scale lets the researcher interpret gross order and not the relative positional
distances.
• This is similar to three positions in a class – difference is there, but not equal.
• The numbers represent a quality being measured (identity) and can tell us whether a case has more of
the quality measured or less of the quality measured than another case (magnitude). The distance
between scale points is not equal. Ranked preferences are presented as an example of ordinal scales
encountered in everyday life.
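The cyclist example can be sketched in Python (hypothetical finishing times, in seconds): the ranks carry order, but the unequal time gaps between finishers are lost.

```python
# Finishing times in seconds (hypothetical): note the unequal gaps
times = {"A": 3600, "B": 3605, "C": 3720}  # gaps of 5 s and 115 s

# Sort by finishing time and assign ranks 1, 2, 3
ranks = {name: rank
         for rank, (name, _) in enumerate(sorted(times.items(),
                                                 key=lambda kv: kv[1]),
                                          start=1)}

print(ranks)  # {'A': 1, 'B': 2, 'C': 3} -- equal rank steps, unequal time gaps
```

The rank numbers have identity and magnitude, but the one-unit step from rank 1 to rank 2 does not mean the same amount of time as the step from rank 2 to rank 3.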
Interval Scale
• A typical survey rating scale is treated as an interval scale: for instance, when respondents rate satisfaction with a training on a 5-point scale from Strongly Agree, Agree, Neutral, Disagree to Strongly Disagree, an interval scale is being used.
• It is an interval scale because it is assumed to have equal distance between each of the scale elements, i.e., the distance between Strongly Agree and Agree is assumed to be the same as between Agree and Neutral.
• This means that we can interpret differences in the distance along the scale.
• We contrast this to an ordinal scale where we can only talk about differences in order, not differences
in the degree of order i.e. the distance between responses.
• Identity
• Magnitude
• Equal distance
• Variables which fulfill the above mentioned properties are put in this scale. The equal distance
between scale points helps in knowing how many units greater than, or less than, one case is from
another. The meaning of the distance between 25 and 35 is the same as the distance between 65 and
75.
Ratio Scale
• The factor which clearly defines a ratio scale is that it has a true zero point.
• The simplest example of a ratio scale is the measurement of length (disregarding any philosophical
points about defining how we can identify zero length) or money.
• Having zero length or zero money means that there is no length and no money but zero temperature
is not an absolute zero, as it certainly has its effect.
• Ratio scales of measurement have all of the properties of the abstract number system.
• Identity
• Magnitude
• Equal distance
• Absolute/true zero
• These properties allow us to apply all possible mathematical operations: addition, subtraction, multiplication, and division. The absolute/true zero allows us to know how many times greater one case is than another. Variables having all the above-mentioned numerical properties fall on a ratio scale.
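The property-to-scale mapping described in this section can be expressed as a simple decision rule. This is an illustrative sketch (the function name is ours, not standard), with identity assumed for every scale:

```python
def scale_of(magnitude=False, equal_intervals=False, absolute_zero=False):
    """Map measurement properties to a scale (identity is assumed for all)."""
    if magnitude and equal_intervals and absolute_zero:
        return "ratio"
    if magnitude and equal_intervals:
        return "interval"
    if magnitude:
        return "ordinal"
    return "nominal"

print(scale_of())                                      # nominal
print(scale_of(magnitude=True))                        # ordinal
print(scale_of(magnitude=True, equal_intervals=True))  # interval
print(scale_of(magnitude=True, equal_intervals=True,
               absolute_zero=True))                    # ratio
```

The ordering of the checks mirrors the text: each scale adds one property to the previous one, from the least to the most sophisticated.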
http://www.mnestudies.com/research/types-measurement-scales
Types of Measures
• Self-Report Measures
• Tests
• Behavioral Measures
• Physical Measures – including physiological measures such as blood pressure, respiration rate, etc.
• Behavioral self-report measures typically ask people to report how often they do something such as
how often they eat a certain food, eat out at a restaurant, go to the gym, etc. – Just statement of facts
about one-self
• Cognitive self-report measures ask individuals to report what they think about something, such as
what do you think about canteen service – Involves judgement
• Affective self-report measures ask individuals to report how they feel about something. Questions
concerning emotional reactions such as happiness, depression, anxiety, or stress lie in Affective domain.
Many psychological tests are affective self- report measures. These tests also fit into the category of
measurement tests described in the next section.
• Tests: Tests are measurement instruments used to assess individual differences in various content
areas.
• Psychologists frequently use two types of tests: 1. Personality tests and 2. Ability tests.
• Many personality tests are also affective self-report measures; they are designed to measure aspects of an individual's personality and feelings about certain things.
• Ability tests, however, are not self-report measures and generally fall into two different categories: aptitude tests and achievement tests. Aptitude tests measure an individual's potential to do something, whereas achievement tests measure an individual's competence in an area.
• Behavioral measures are often referred to as observational measures because they involve observing
what a participant does.
• Behavioral measures can be applied to anything a person or an animal does: for example, the way men and women carry their bags, or how many people follow road signs.
• Indirect Observation – In direct observations, participants may become cautious and react in an unnatural way, which affects the results. This response of participants is called “reactivity”.
• Observers may hide themselves, or use a more indirect means of collecting the data (such as
videotape).
• Using an unobtrusive means of collecting data reduces reactivity, that is, participants reacting in an
unnatural way to being observed.
• Weight is measured with a scale, and blood pressure and temperature are each measured with a dedicated apparatus.
• A physical measure is not simply an observation. Instead, it is a measure of a physical activity that
takes place in the brain or body.
• Keep in mind that humans are still responsible for running the equipment that takes the measures and
ultimately for interpreting the data provided by the measuring instrument. Thus even when using
physical measures, a researcher needs to be concerned with the accuracy of the data.
• In other words, the measuring instrument must measure exactly the same way every time it is used.
• This consistency means that individuals should receive a similar output each time they use the
measuring instrument.
• For example, a bathroom scale needs to be reliable, that is, it needs to measure the same way every
time an individual uses it, otherwise it is useless as a measuring instrument.
Error in Measurement
• Consider some of the problems with the four types of measures discussed in the previous module (i.e.,
self-report, tests, behavioral, and physical).
• Some problems, known as method errors, stem from the experimenter and the testing situation. For example,
• Does the individual taking the measures know how to use the measuring instrument properly?
• Other problems, known as trait errors, stem from the participant, for example, motivation or fatigue.
• Both types of problems can lead to measurement error. In fact, a measurement is a combination of the true score and an error score.
• The true score is what the score on the measuring instrument would be if there were no error.
• The error score is any measurement error (method or trait) (Leary, 2001; Salkind, 1997).
• The following formula represents the observed score on a measure, that is, the score recorded for a participant on the measuring instrument used.
• The observed score is the sum of the true score and the measurement error:

Observed score = True score + Measurement error
• The observed score becomes increasingly reliable (more consistent) as we minimize error and thus
have a more accurate true score.
• True scores should not vary much over time, but error scores can vary tremendously from one testing
session to another.
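A small simulation illustrates the formula (the true score and error magnitudes are hypothetical; only Python's standard library is used): the smaller the measurement error, the more tightly the observed scores cluster around the true score.

```python
import random
import statistics

random.seed(0)

true_score = 100  # the score the instrument would record with no error

def observe(error_sd, n=200):
    """Observed score = true score + random measurement error."""
    return [true_score + random.gauss(0, error_sd) for _ in range(n)]

precise = observe(error_sd=1)   # small measurement error
noisy = observe(error_sd=15)    # large measurement error

# Smaller error -> smaller spread of observed scores around the true score
print(round(statistics.pstdev(precise), 1))
print(round(statistics.pstdev(noisy), 1))
```

The true score stays fixed across the whole simulation, while the error component varies from observation to observation, which is exactly the distinction the text draws between true scores and error scores.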
• We should make sure that the problems related to the four types of measures are minimized.
• These problems include those in recording or scoring data and in the testing environment (method error), and those in understanding instructions, motivation, and fatigue (trait error).
• A reduction in error leads to an increase in reliability, i.e. if there is no error, reliability is equal to 1.00,
the highest possible reliability score.
• As error increases, reliability drops – The greater the error, the lower the reliability of a measure
• A correlation coefficient measures the degree of relationship between two sets of scores (readings) and can vary between −1.00 and +1.00.
• The stronger the relationship between the variables, the closer the coefficient is to −1.00 or +1.00.
• Similarly, the weaker the relationship between the variables, the closer the coefficient is to 0.
• Suppose then that of individuals measured on two variables, the top-scoring individual on variable 1
was also top scoring on variable 2, the second-highest-scoring person on variable 1 was also the second
highest on variable 2, and so on down to the lowest- scoring person.
• In this case there would be a perfect positive correlation (+1.00) between variables 1 and 2.
• In the case of a perfect negative correlation (−1.00), the person having the highest score on variable 1 would have the lowest score on variable 2, the person with the second-highest score on variable 1 would have the second-lowest score on variable 2, and so on.
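These two perfect cases can be checked with a hand-rolled Pearson correlation (the scores are made up for illustration; no external libraries are assumed):

```python
# Pearson correlation between two sets of scores (standard library only)
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

var1 = [10, 8, 6, 4, 2]                # variable 1, highest to lowest scorer
same_order = [50, 40, 30, 20, 10]      # same ordering on variable 2
reversed_order = [10, 20, 30, 40, 50]  # reversed ordering on variable 2

print(round(pearson(var1, same_order), 2))      # 1.0  (perfect positive)
print(round(pearson(var1, reversed_order), 2))  # -1.0 (perfect negative)
```

Real data almost never line up this perfectly, which is why observed reliability coefficients fall between these extremes.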
Types of Reliability
• Test/retest reliability,
• Alternate-forms reliability,
• Interrater reliability.
• Each type provides a measure of consistency, but they are used in different situations.
• Test/Retest Reliability: Administering the same test to the same individuals on a second occasion and correlating the two sets of scores is known as test/retest reliability.
• If the test is reliable, we expect the results for each individual to be similar. That is, the resulting
correlation coefficient will be high (close to 1.00).
• However, that is the ideal case. Some error will be present in each measurement, so the correlation coefficient will not be 1.00 in most cases, but we expect it to be 0.80 or higher.
• Alternate-Forms Reliability – Using alternate forms of the testing instrument and correlating the
performance of individuals on the two different forms.
• In this case the tests taken at times 1 and 2 are different but equivalent or parallel (hence the terms
equivalent-forms reliability and parallel-forms reliability are also used)
• For example:
1. We want to find the reliability of a test of mathematics comprehension, so we create a set of 100 questions that measure that construct. We then randomly split the questions into two sets of 50 (set A and set B) and administer both sets to the same group of students about a week apart, say 50 questions on Monday and the other 50 to the same students on Friday or the next Monday; the two sets of results are then correlated.
2. We have made 100 different samples of the same material. We test 50 samples, then test the other 50 on the same machine after some time. (This way we also confirm calibration.)
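The item-splitting step in example 1 can be sketched as follows (hypothetical question IDs; because the split is random, the two forms are equivalent in expectation):

```python
import random

random.seed(1)

questions = list(range(1, 101))  # IDs of the 100 questions
random.shuffle(questions)

# Two equivalent 50-item forms with no shared questions
form_a, form_b = questions[:50], questions[50:]

print(len(form_a), len(form_b))   # 50 50
print(set(form_a) & set(form_b))  # set() -- the forms share no items
```

Scores on form A and form B would then be correlated, just as test and retest scores are in test/retest reliability.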
• Split-Half Reliability - a third means of establishing reliability is by splitting the items on the test into
equivalent halves and correlating scores on one half of the items with scores on the other half.
• This split-half reliability gives a measure of the equivalence of the content of the test, but not of its stability over time as test/retest and alternate-forms reliability do.
• The biggest problem with split-half reliability is determining how to divide the items so that the two
halves are in fact equivalent.
• For example, it would not be advisable to correlate scores on multiple choice questions with scores on
short-answer or essay questions.
• What is typically recommended is to correlate scores on even-numbered items with scores on odd-numbered items.
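A minimal sketch of the odd/even split (made-up answer data for a hypothetical 6-item test, with 1 = correct and 0 = incorrect):

```python
# Pearson correlation between two sets of scores (standard library only)
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Rows: students; columns: items 1..6 (1 = correct, 0 = incorrect)
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
]
odd_scores = [sum(row[0::2]) for row in answers]   # items 1, 3, 5
even_scores = [sum(row[1::2]) for row in answers]  # items 2, 4, 6

print(odd_scores, even_scores)                     # [3, 2, 1, 1] [3, 2, 2, 0]
print(round(pearson(odd_scores, even_scores), 2))  # 0.76
```

The correlation between the two half-scores estimates how equivalent the two halves are in content, which is what split-half reliability measures.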
• Interrater Reliability – This measures the reliability of observers rather than tests.
• It is a measure of consistency that assesses the agreement of observations made by two or more raters or judges.
• Let us say that you are observing play behavior in children. Rather than simply making observations
on your own, it is advisable to have several independent observers collect data. • The observers all
watch the children playing but independently count the number and types of play behaviors they
observe.
• Once the data are collected, interrater reliability needs to be established by examining the percentage
of agreement among the raters.
• If the raters' data are reliable, then the percentage of agreement should be high.
• If the raters are not paying close attention to what they are doing or if the measuring scale devised for
the various play behaviors is unclear, the percentage of agreement among observers will not be high.
• Although interrater reliability is measured using a correlation coefficient, the following formula offers a quick means of estimating interrater reliability:

Interrater reliability = (Number of agreements / Number of possible agreements) × 100

• Thus, if your observers agree 45 times out of a possible 50, the interrater reliability is 90%, which is fairly high.
• However, if they agree only 20 times out of 50, then the interrater reliability is 40%, which is low.
• Such a low level of agreement indicates a problem with the measuring instrument or with the
individuals using the instrument and should be of great concern to a researcher.
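The quick estimate of percentage agreement translates directly into code (the function name is illustrative), using the two worked cases from the text:

```python
def interrater_reliability(agreements, possible_agreements):
    """Percentage of agreement between raters."""
    return agreements * 100 / possible_agreements

print(interrater_reliability(45, 50))  # 90.0 -- fairly high
print(interrater_reliability(20, 50))  # 40.0 -- low
```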
• Alternate-forms reliability tells us whether the questions measure the same concepts (equivalency).
• Whether individuals perform similarly on equivalent tests at different times indicates the stability of a test.
VALIDITY
• Validity refers to whether a (statistical or scientific) study is able to draw conclusions that are in
agreement with statistical and scientific laws.
• This means if a conclusion is drawn from a given data set after experimentation, it is said to be
scientifically valid if the conclusion drawn from the experiment is scientific and relies on mathematical
and statistical laws.
• For instance, if researchers developed a new test to measure any parameter, (such as depression),
they might establish the validity of the test by correlating scores on the new test with scores on an
already established measure of depression, and as with reliability we would expect the correlation to be
positive.
• Coefficients as low as 0.20 or 0.30 may establish the validity of a measure (Anastasi & Urbina, 1997).
• In brief it means that the results are most likely not due to chance
Content validity:
• A systematic examination of the test content to determine whether it covers a representative sample
of the domain of behaviors to be measured assesses content validity.
• This type of validity is important to make sure that the test or questionnaire that is prepared actually
covers all aspects of the variable that is being studied. If the test is too narrow, then it will not predict
what it claims.
• In other words, a test with content validity has items that satisfactorily assess the content being
examined.
• To determine whether a test has content validity, researchers consult experts in the area being tested. (In fact, this is a challenge for EM students who prepare questionnaires.)
Face Validity
• Face validity simply addresses whether or not a test looks valid on its surface. Does it appear to be an adequate measure of the conceptual variable? Face validity is often confused with content validity.
Criterion validity:
• The extent to which a measuring instrument accurately predicts behavior or ability in a given area
establishes criterion validity.
• Two types of criterion validity may be used, depending on whether the test is used to estimate
present performance (concurrent validity) or to predict future performance (predictive validity).
• The SAT and GRE are examples of tests that have predictive validity because performance on the tests
correlates with later performance in college and graduate school, respectively.
• The tests can be used with some degree of accuracy to predict future behavior.
• A test used to determine whether someone qualifies as a pilot is a measure of concurrent validity. The
test is estimating the person’s ability at the present time, not attempting to predict future outcomes.
• Thus concurrent validation is used for the diagnosis of existing status rather than the prediction of
future outcomes.
Construct Validity
• The construct validity of a test assesses the extent to which a measuring instrument accurately
measures a theoretical construct or trait that it is designed to measure.
• Some examples of theoretical constructs or traits are verbal fluency, neuroticism, depression, anxiety, intelligence, and scholastic aptitude.
• One means of establishing construct validity is by correlating performance on the test with performance on a test for which construct validity has already been determined. Thus performance on a newly developed intelligence test might be correlated with performance on an existing intelligence test for which construct validity has been previously established.
• Another means of establishing construct validity is to show that scores on the new test differ across people with different levels of the trait being measured. For example, if a new test is designed to measure depression, you can compare scores on the test for those known to be suffering from depression with scores for those not suffering from depression.
• The new measure has construct validity if it measures the construct of depression accurately.
1. Content and construct validity
2. Face validity
3. A test that measures something other than what it claims to measure – establish validity through experiments on other things.
4. It is a concern of the validity of the test, because it does not measure what it is supposed to measure.