Research Methods-Lectures
Whether descriptive or explanatory, research needs a frame of reference within which to interpret
results, not just report them.
Descriptive research
The need for a frame of reference in descriptive research is fairly obvious. For example, if the
unemployment rate of an area is 15%, is this high or low? Is it increasing or decreasing over
time? How does it compare with other areas or with other years?
Data need a context to make sense, and methods must be designed to ensure that the data needed to
provide this context are collected.
How to do this? Collect data about other groups and about the same group over time.
Explanatory research
Explanatory research deals with causal processes. The goal is to develop an explanation of the
patterns in the data and to eliminate as many alternative explanations of those patterns as possible.
An experimental design to measure the effect of distributing flyers on support for building towers in
residential areas in Amman:
Group          Before              Intervention    After               Difference
Experimental   E1: 40% support     Read flyers     E2: 33% support     E diff = 33 - 40 = -7%
Control        C1: 40% support     -               C2: 39% support     C diff = 39 - 40 = -1%
This design uses only the top two cells of the experimental design; it looks at the same group over a
period of time.
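As a rough illustration of how the figures in the experimental design above might be read, the sketch below subtracts the change in the control group from the change in the experimental group. This comparison is the usual reading of such a design but is not spelled out in these notes, and the function name net_effect is an assumption made for illustration.

```python
def net_effect(e1, e2, c1, c2):
    """Return (E diff, C diff, E diff minus C diff) in percentage points."""
    e_diff = e2 - e1          # change in the experimental group
    c_diff = c2 - c1          # change in the control group
    return e_diff, c_diff, e_diff - c_diff


# Percentages of support taken from the Amman flyers example.
e_diff, c_diff, effect = net_effect(e1=40, e2=33, c1=40, c2=39)
print(e_diff, c_diff, effect)  # -7 -1 -6
```

On this reading, about 6 of the 7 percentage points lost in the experimental group could be attributed to the flyers, since the control group also drifted by 1 point without them.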
Sampling
Sampling provides an efficient and accurate way of obtaining information about a large number of
cases. How efficient and accurate it is depends on the type of sample used, the size of the sample and
the method of collecting data from the sample. In the end the decision about samples will be a
compromise between cost, accuracy, the nature of the research problem and the art of the possible.
Probability sampling
Simple random sampling, systematic sampling, stratified sampling, multistage sampling.
Sample size
Sample size depends on two key factors:
the degree of accuracy required for the sample, and
the extent to which there is variation in the population.
We need to decide how much error we are prepared to tolerate.
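Non-response
Non-response creates two problems:
Sample size reduction (e.g. add 20% to the initial sample to compensate), and
Bias (non-respondents often differ from respondents, e.g. in old age, migrant status or lower
education). Establish what the bias is and its extent: gather information on non-respondents from
observation, define their characteristics, compare these with the characteristics of the sample,
and adjust accordingly.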
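The two factors behind sample size, and the allowance for non-response, can be sketched as follows. The formula used here (the standard sample size formula for estimating a proportion) is not given in these notes and is an assumption, as are the function names and the default values; the 20% allowance is the figure mentioned above.

```python
import math


def sample_size_for_proportion(p=0.5, margin=0.05, z=1.96):
    """Cases needed to estimate a proportion p to within +/- margin.

    p      -- assumed population proportion (0.5 = maximum variation)
    margin -- tolerated error as a proportion (0.05 = 5 percentage points)
    z      -- 1.96 corresponds to a 95% confidence level
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def inflate_for_non_response(n, extra_percent=20):
    """Add a percentage to the sample to allow for non-response."""
    return math.ceil(n * (100 + extra_percent) / 100)


n = sample_size_for_proportion()
print(n)                            # 385
print(inflate_for_non_response(n))  # 462
```

A smaller tolerated error or more variation in the population (p closer to 0.5) pushes the required sample size up.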
Analysing Data
Levels of measurement
There are three main levels of measurement: nominal, ordinal and interval/ratio.
1. Nominal variable: categories can be distinguished but not ranked (e.g. country, religion, sex,
marital status).
2. Ordinal variable: it is meaningful to rank the categories, but the differences between categories
cannot be quantified in numerical terms (e.g. paid and unpaid job, part time or full time).
3. Interval/ratio variable: categories can be ranked and the differences between categories can be
quantified precisely (e.g. income, if measured in money, is an interval variable because, as well as
ranking respondents according to their income, the precise difference between incomes can be
quantified).
Level of measurement
                                                          Nominal   Ordinal   Interval
Are there different categories?                           yes       yes       yes
Can categories be ranked?                                 no        yes       yes
Can differences between categories be specified
numerically?                                              no        no        yes
1. Univariate analysis
Descriptive statistics
1. Nominal variables:
Frequency distributions
Histograms (charts)
e.g. Frequency table for age groups (population over 15 years old), N = 200
Age group   Per cent
16-20       20%
21-29       55%
30-39       5%
40-49       12%
50-60       7%
Total       100%
The mode is the age group 21-29.
The mode is not necessarily typical: it might be typical for a group of 100 but not for the sample as a
whole. Some distributions have more than one mode, and the mode is vulnerable to how categories are
combined, so it is unstable and open to manipulation.
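A frequency distribution and its mode can be produced along these lines. This is a minimal sketch: the responses are hypothetical and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical responses to a nominal variable (marital status).
responses = ["married", "single", "married", "divorced", "single",
             "married", "widowed", "married", "single", "married"]


def frequency_table(values):
    """Return {category: (count, percentage)}, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return {cat: (k, 100 * k / n) for cat, k in counts.most_common()}


def modes(values):
    """Return every category tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [cat for cat, k in counts.items() if k == top]


table = frequency_table(responses)
for category, (count, percent) in table.items():
    print(f"{category:10s} {count:3d} {percent:5.1f}%")
print("Mode:", modes(responses))
```

Returning a list from modes makes it visible when a distribution has more than one mode, which is one of the weaknesses noted above.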
2. Ordinal variables
Frequency distributions
Categories of ordinal variables are ranked, so they should be put in rank order in a frequency table.
The cumulative percentage is a rolling addition of the percentages in each of the earlier categories of
the variable. Thus, a cumulative percentage of 65 per cent means that 65 per cent visit their
neighbours weekly or more often.
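The rolling addition behind a cumulative percentage can be sketched as below. The categories and percentages are invented for illustration, chosen only so that the second cumulative figure matches the 65 per cent quoted above.

```python
from itertools import accumulate

# Hypothetical ordinal variable: how often respondents visit neighbours.
# Categories must be listed in rank order for the running total to make sense.
categories = ["daily", "weekly", "monthly", "rarely or never"]
percentages = [25, 40, 20, 15]

# Running total of the percentages, category by category.
cumulative = list(accumulate(percentages))

for category, pct, cum in zip(categories, percentages, cumulative):
    print(f"{category:16s} {pct:3d}% {cum:4d}%")
```

Reading across the "weekly" row gives 65 per cent, i.e. the share visiting weekly or more often.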
Median (the middle case when cases are ranked in order):
Age     12    14    16    18    20    22    24
Case    1     2     3     4     5     6     7     8     9     10
Class   KG1   KG2   1st   2nd   3rd   4th   5th   6th   7th   8th
If most cases in a distribution are in ranked categories close to the median category, the median is a
good summary of the group; if many cases are a long way from the median category, it is not so good.
1.
Case         1   2   3   4   5   6   7   8   9   10
No. of men   0   1   2   2   2   2   2   2   3   10
Median = 2, Range = 10, Decile range = (3 - 1) = 2
Extreme score: the high range underestimates the usefulness of the median.
2.
No. of men   0   1   1   2   2   2   2   2   2   3
Median = 2, Range = 3, Decile range = (2 - 1) = 1
The low range reflects the adequacy of the median.
3.
No. of men   0   1   2   2   2   2   5   6   9   11
Median = 2, Range = 11, Decile range = (9 - 1) = 8
The high range accurately shows the inadequacy of the median.
To avoid the distorting effect of extreme cases, use the decile range: drop the bottom 10 per cent of
cases (first decile) and the top 10 per cent, and look at the range of the middle 80 per cent.
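The median, range and decile range for the three sets of scores above can be checked with a short sketch like the following. The function name summarize and its trim parameter are assumptions made for illustration.

```python
from statistics import median


def summarize(scores, trim=0.10):
    """Return (median, range, decile range) for a list of scores.

    The decile range drops the bottom and top `trim` share of cases
    (10 per cent each by default) and takes the range of the rest.
    """
    ordered = sorted(scores)
    k = int(len(ordered) * trim)            # cases to drop at each end
    middle = ordered[k:len(ordered) - k]
    full_range = ordered[-1] - ordered[0]
    decile_range = middle[-1] - middle[0]
    return median(ordered), full_range, decile_range


example_1 = [0, 1, 2, 2, 2, 2, 2, 2, 3, 10]
example_2 = [0, 1, 1, 2, 2, 2, 2, 2, 2, 3]
example_3 = [0, 1, 2, 2, 2, 2, 5, 6, 9, 11]

for scores in (example_1, example_2, example_3):
    print(summarize(scores))
```

All three sets share a median of 2, but only the decile range separates the first set (one extreme score) from the third (genuinely spread out).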
University of Jordan, Dept. of Architecture Dr Firas Sharaf 2020 © 9
Scientific Research methodology
3. Interval variables
Variables whose categories can be ranked and whose differences between categories can be quantified in
precise numerical amounts.
Frequency
Tables for interval variables are similar to those for ordinal variables. If an interval variable has a
large number of values (e.g. age, income in $, etc.) it is better to group values (e.g. 10-19, 20-29,
etc.). Note that it is desirable to make all categories of similar width and to have an open-ended
category (e.g. 10-19, 20-29, 30 plus).
Central tendency (the mean, X̄)
It is calculated by adding up the scores for each case in the sample and dividing by N, the number of
cases in that sample.
Case   Income ($)
1      12 000
2      13 000
3      15 000
4      16 000
5      18 000
6      20 000
7      21 000
8      22 000
9      25 000
10     27 000
Total income = 189 000
Total cases N = 10
Mean (X̄) = 18 900
The main problem with the mean is that it can be distorted by extreme cases. For example, if a person
earning $1 million were added to the table above, the mean would be $108,091, so the mean would no
longer adequately reflect the bulk of the group. Another problem is that it is possible to obtain the
same mean for two quite different distributions.
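The arithmetic for the income table, and the distorting effect of one extreme case, can be reproduced with a few lines. This is only a sketch; the helper name mean is an assumption, and the incomes are those listed in the table.

```python
incomes = [12000, 13000, 15000, 16000, 18000,
           20000, 21000, 22000, 25000, 27000]


def mean(values):
    """Sum of the scores divided by the number of cases N."""
    return sum(values) / len(values)


print(sum(incomes))                        # 189000
print(mean(incomes))                       # 18900.0
# Add one person earning $1 million and recompute.
print(round(mean(incomes + [1_000_000])))  # 108091
```

With the extra case the mean is higher than every income in the original group, which is why it stops being a useful summary.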
1. The standard deviation provides a measure of how well the mean summarizes a distribution and tells
us within what range of the mean a given percentage of cases lies. The idea is to see how far each case
is from the mean, square these deviations, add them up, and obtain an overall average of them to use as
the measure of dispersion.
\[ s^2 = \frac{\sum_i (X_i - \bar{X})^2}{N}, \qquad s = \sqrt{s^2} \]
s^2 = the variance, s = the standard deviation
X_i = an individual's score on the variable (e.g. age)
\bar{X} = the mean (e.g. 45)
N = total number of cases in the sample.
The lower s is, the better the mean is as a summary measure. The standard deviation for Group 1
(s = 5.5) is more satisfactory than that for Group 2 (s = 14). In other words, s shows that the mean is
a more accurate summary for Group 1 than for Group 2.
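The variance and standard deviation can be computed directly from their definitions (dividing by N). In the sketch below the two groups of ages are hypothetical, invented so that both have a mean of 45 but different spreads; they are not the Group 1 and Group 2 data referred to above.

```python
import math


def variance(scores):
    """Average squared deviation from the mean (dividing by N)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def std_dev(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))


# Hypothetical ages: same mean (45), different dispersion.
group_a = [40, 43, 45, 47, 50]
group_b = [25, 35, 45, 55, 65]

for name, group in (("A", group_a), ("B", group_b)):
    print(name, sum(group) / len(group), round(std_dev(group), 1))
```

Both groups report a mean of 45, but the much smaller s for group A indicates that 45 is a far better summary of that group.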
From probability theory, it is known that in a normal distribution about 68% of cases lie within one
standard deviation above or below the mean. In this case the standard deviation is 5.5 years. Thus,
68% of cases will be within a range of 45 years (the mean) plus or minus 5.5 years, i.e. 39.5 to 50.5
years. Probability theory also tells us that about 95% of cases lie within plus or minus two standard
deviations of the mean, in this case within 11 years of the mean (34 to 56 years).
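The 68% and 95% figures for a normal distribution can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module with the mean of 45 and standard deviation of 5.5 from the example; the helper share_within is an assumption made for illustration.

```python
from statistics import NormalDist

mean_age, s = 45, 5.5
ages = NormalDist(mu=mean_age, sigma=s)


def share_within(dist, k):
    """Proportion of a normal distribution within k standard deviations."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2):
    low, high = mean_age - k * s, mean_age + k * s
    print(f"within {k} s: {low} to {high} years, "
          f"{share_within(ages, k):.1%} of cases")
```

The exact shares are roughly 68.3% and 95.4%, which is why the figures in the text are best read as approximations.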
When a distribution is not normal, the percentage of cases that lie within a given number of
standard deviations of the mean cannot be predicted with as much precision, as summarized in the
table below (from probability theory):
2. The standard deviation has another use: any score on a variable can be converted into a standard
score (z-score), which is that score expressed in standard deviation units. E.g. if the mean age is
50 years and s = 5, the age 60 has a z-score of 2 (mean + 2s, i.e. 50 + 10).
To convert any score into a z-score use the formula
\[ z = \frac{X_i - \bar{X}}{s} \]
e.g. if age is 53, z = (53 - 50) / 5 = 0.6; for the age 45, z = (45 - 50) / 5 = -1 (mean minus 1
standard deviation).
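The z-score conversion is a one-line calculation. The sketch below applies it to the three ages used in the examples, with a mean of 50 and s = 5; the function name z_score is an assumption made for illustration.

```python
def z_score(x, mean, s):
    """Express a score as a number of standard deviations from the mean."""
    return (x - mean) / s


for age in (60, 53, 45):
    print(age, z_score(age, mean=50, s=5))
```

A positive z-score is above the mean and a negative one below it, so scores on different variables can be compared on the same scale.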
Inferential statistics
Used to help generalize from the patterns in a sample to the likely patterns in the population from
which the sample was drawn. There are two techniques:
1. Interval estimates, most commonly used in univariate analysis (using the mean, interval data)
2. Inference for non-interval variables (mean cannot be calculated, i.e. nominal or ordinal data)
1. Interval estimates
Samples are always liable to some error, so it is crucial when trying to generalize from sample
estimates to use the relevant inferential statistics. With univariate analysis the most widely used
technique in survey research is the standard error, which enables us to estimate the population
patterns within a range. This procedure is called interval estimation.
If in a sample the mean income is $18,000, what is the mean for the population? Since samples are
unlikely to be a perfect reflection of the population (sampling error), we cannot simply take the
sample mean (sample estimate) as the actual mean income for the population (population parameter).
We need a way of estimating how accurate our sample estimate is likely to be and how close it is
likely to be to the population mean. The formula for the standard error of the mean is
\[ S_m = \frac{s}{\sqrt{N}} \]
Sm = standard error of the mean, s = standard deviation, N = total number of cases in the sample.
Probability theory says that for 95% of samples the population mean will be within ± 2 standard
error units of the sample mean, i.e. for our sample there is a 95% chance that the population mean
is within ± 2 standard errors of the sample mean. We can then estimate a range within which the
population mean is likely to be. This range is called the confidence interval, and our degree of
certainty that the population mean falls within that range (here 95%) is called the confidence
level. The units of Sm are the same as those of the variable concerned, e.g. income in $ or age in years.
What does this mean in plain language? For example, if a sample mean is $18,000 and the standard
error (Sm) is $1,000, there is a 95% chance that the population mean is within the range
$16,000-$20,000 (i.e. ± 2 standard errors). The size of Sm is a function of sample size: the
larger the sample, the smaller Sm. Quadrupling the sample size halves Sm.
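The standard error and the resulting confidence interval can be sketched as follows. The sample mean of $18,000 and standard error of $1,000 come from the example above; the standard deviation of $10,000 and the sample sizes of 100 and 400 are hypothetical values chosen to reproduce that standard error, and the function names are assumptions.

```python
import math


def standard_error(s, n):
    """Standard error of the mean: s divided by the square root of N."""
    return s / math.sqrt(n)


def confidence_interval(sample_mean, se, units=2):
    """Range of +/- `units` standard errors around the sample mean."""
    return sample_mean - units * se, sample_mean + units * se


print(confidence_interval(18000, 1000))  # (16000, 20000)

# Hypothetical s = 10000: quadrupling N from 100 to 400 halves Sm.
print(standard_error(10000, 100))  # 1000.0
print(standard_error(10000, 400))  # 500.0
```

Because N sits under a square root, each halving of the standard error costs four times as many cases.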
2. Inference for non-interval variables
Look at the percentage of respondents answering a question in a particular way and ask how close
to it the percentage in the population is likely to be. E.g. in a sample of 1000, 52% intend to vote
Labor (thus 48% will vote the other way). How is the population likely to vote?
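One common way to answer the voting question is to compute a standard error for the percentage and apply the same ± 2 standard errors rule. The formula used below (the binomial standard error, the square root of p(1 - p)/N) is not given in these notes and is an assumption, as is the function name.

```python
import math


def proportion_interval(p, n, units=2):
    """Range of +/- `units` standard errors around a sample proportion.

    Uses the binomial standard error sqrt(p * (1 - p) / n).
    """
    se = math.sqrt(p * (1 - p) / n)
    return p - units * se, p + units * se


low, high = proportion_interval(p=0.52, n=1000)
print(f"{low:.1%} to {high:.1%}")  # 48.8% to 55.2%
```

Under these assumptions the interval runs from just below to just above 50%, so a sample of 1000 with 52% support would not be enough to say which way the population will vote.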
In summary, samples are always liable to some error, so it is crucial when trying to generalize from
sample estimates to use the relevant inferential statistics. With univariate analysis the most widely
used technique in survey research is the standard error, which enables us to estimate the
population patterns within a range. This procedure is called interval estimation.
Summary
A range of descriptive and inferential statistics for univariate analysis have been outlined. The
statistics chosen depend on the level of measurement of the variables and these are summarized in
the table below:
The main text of the social science research report or thesis consists of 5 chapters