Research Methods: Dr. Abeer Yasin
Outline:
Sampling
Why Sample?
Sample Size.
Probability vs. Nonprobability Sampling
Descriptive Statistics.
Measures of central tendency.
Measures of variability
Measures of relative standing.
Measures of Association
Displaying data: bar charts, pie charts, frequency
histogram, line chart, frequency curve, stem and leaf
plot, box and whisker plot.
Use of a graph.
Skewness
Outline:
Measures of central tendency: mean, median, mode,
weighted mean, geometric mean
Measures of dispersion: range, mean of absolute
deviation, variance, standard deviation, interquartile
range.
Frequency Distribution.
When to use each measure of central tendency
Analysis strategies of quantitative data
Correlation and regression
Scatterplots
Correlation coefficient
Correlation coefficient and scatter plot
Regression analysis
Linear and nonlinear relationships.
Sampling
When measuring every item in a population
is impossible, inconvenient, or too expensive,
we take a sample.
The process of sampling involves using a
portion of a population to make conclusions
about the whole population. A sample is a
subset or some part of a larger population.
The purpose of sampling is to estimate an
unknown characteristic of a population.
Why Sample?
Sampling cuts costs, reduces labor
requirements, and gathers vital information
quickly.
Most properly selected samples give
results that are reasonably accurate.
In a sample, increased accuracy may
sometimes be possible because the
fieldwork and tabulation of data can be
more closely supervised.
Why Sample?
In most cases, studying the entire
population would be a massive
undertaking. It can be avoided by selecting
a sample from a population of interest.
With proper sampling, we can use
information obtained from the participants
who were sampled to estimate
characteristics about the population as a
whole. Statistical theory allows us to infer
what the population is like, based on data
obtained from a sample.
Why Sample?
At the outset of the sampling process, the
target population must be carefully defined
so that the proper sources from which the
data are to be collected can be identified.
The usual technique for defining the
target population is to answer questions
about the characteristics of the population.
Sampling
Define the target population
Select the sampling frame
Determine if a probability or nonprobability
sampling method will be chosen.
Plan procedure for selecting sampling units
(sampling unit: a single element or group of
elements subject to selection in the sample).
Determine sample size
Select actual sampling units
Conduct fieldwork
Confidence Intervals
When researchers make inferences about
populations, they do so with a certain degree
of confidence.
Example: Results from the survey are
accurate within 3 percentage points using
a 95% level of confidence.
This is called a confidence interval. You can
have 95% confidence that the true
population value lies within this interval
around the obtained sample result.
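As a rough illustration of the survey example, the margin of error for a sample proportion can be sketched as below. This assumes a simple random sample, the normal approximation, and the worst-case proportion p = 0.5. The function name `margin_of_error` and the sample size of 1,067 are illustrative choices, not figures from the slides.

```python
import math

Z_95 = 1.96  # approximate z-value for a 95% level of confidence

def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a sample proportion.

    Assumes a simple random sample and the normal approximation.
    p = 0.5 is the worst case (widest interval).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical survey of 1,067 respondents
moe = margin_of_error(1067)
print(f"Margin of error: {moe * 100:.1f} percentage points")
```

Under these assumptions, roughly a thousand respondents are enough for a result that is accurate within about 3 percentage points.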
Sample Size
A larger sample size will reduce the size of
the confidence interval. Although the size
of the confidence interval is determined by
several factors, the most important is
sample size. Larger samples are more
likely to yield data that accurately reflect
the true population value.
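The effect of sample size on the width of the interval can be seen with a short calculation. This sketch assumes a simple random sample, a proportion of 0.5, and a 95% level of confidence. The sample sizes are hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion (assumed model)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: each one is four times the previous one
sizes = [100, 400, 1600]
margins = [margin_of_error(n) for n in sizes]

for n, m in zip(sizes, margins):
    print(f"n = {n:>5}: +/- {m * 100:.2f} percentage points")
```

Under these assumptions, quadrupling the sample size halves the margin of error, so the gain from each additional respondent shrinks as the sample grows.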
Nonprobability Sampling
Quota sampling: in quota sampling the
interviewer has a quota to achieve.
The interviewer is responsible for finding
enough people to meet the quota.
Nonprobability Sampling
Snowball sampling: involves using
probability methods for an initial selection
of respondents and then obtaining
additional respondents through
information provided by initial
respondents.
This method is used to locate members of
rare populations by referrals.
Probability Sampling
Simple random sampling:
a sampling procedure that ensures each
element in the population will have an
equal chance of being included in the
sample.
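A minimal sketch of simple random sampling, assuming the sampling frame is available as a list. The frame of 100 numbered elements and the seed are hypothetical and serve only to make the example repeatable.

```python
import random

# Hypothetical sampling frame: 100 numbered population elements
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the example is repeatable

# random.sample draws without replacement; every element has the
# same chance of being included in the sample
sample = rng.sample(population, k=10)
print(sample)
```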
Probability Sampling
Systematic sampling: A starting point is
selected by a random process, then every
nth number on the list is selected.
While systematic sampling is not actually a
random selection process, it does yield
random results if the arrangement of items
in the list is random in character.
The problem of periodicity occurs if a list
has a systematic pattern.
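The procedure can be sketched as follows, assuming the list length is a multiple of the desired sample size. The helper name `systematic_sample` and the frame of 100 elements are hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick a random start, then take every k-th element (k = len(frame) // n)."""
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random starting point within the first interval
    return frame[start::k][:n]

frame = list(range(1, 101))    # hypothetical list of 100 elements
picked = systematic_sample(frame, 10, random.Random(1))
print(picked)
```

If the list itself repeats with a period equal to the interval k, every selected element falls at the same point in the cycle, which is the periodicity problem described above.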
Probability Sampling
Stratified sampling: Strata are chosen on the
basis of existing information, and a subsample
is drawn using simple random sampling
within each stratum.
The reason for taking a stratified sample is
to obtain a more efficient sample than
would be possible with simple random
sampling.
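A minimal sketch of stratified sampling. It assumes proportional allocation (the same sampling fraction in every stratum), which the slides do not specify. The strata, their sizes, and the helper name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from each stratum."""
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata defined from existing information
strata = {
    "undergraduate": [f"U{i}" for i in range(80)],
    "graduate": [f"G{i}" for i in range(20)],
}
result = stratified_sample(strata, 0.10, random.Random(7))
print({name: len(chosen) for name, chosen in result.items()})
```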
Probability Sampling
Cluster sampling:
The purpose of cluster sampling is to
sample economically while retaining the
characteristics of a probability sample.
In cluster sampling the primary sampling
unit is no longer the individual element in
the population but a larger cluster of
elements located in proximity to one
another. The area sample is the most
popular type of cluster sampling.
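A one-stage cluster sample can be sketched as below. The clusters (city blocks and their households) are hypothetical, and taking every household in a chosen block is one possible design, not necessarily the one intended in the slides.

```python
import random

# Hypothetical clusters: city blocks, each listing its households
clusters = {
    "block_A": ["A1", "A2", "A3"],
    "block_B": ["B1", "B2"],
    "block_C": ["C1", "C2", "C3", "C4"],
    "block_D": ["D1", "D2", "D3"],
}

rng = random.Random(3)

# The primary sampling unit is the cluster, not the household
chosen_blocks = rng.sample(sorted(clusters), k=2)

# One-stage cluster sample: every household in a chosen block is included
households = [h for block in chosen_blocks for h in clusters[block]]
print(chosen_blocks, households)
```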
Descriptive Statistics
Measures of Variability
Definition:
Indicate the dispersion of scores within a data
set. They are based on deviation scores, which indicate each
score's distance from the mean.
Examples: measures built from deviation scores, such as
average deviation and variance, and the
standard deviation, which is the most
widely used indicator of the average
difference between the mean and
individual scores.
Measures of Association
Definition:
Single indicators of the degree of
relationship between two or more
variables.
Examples: Correlation coefficients, which
indicate the strength of relationships
between variables (Pearson's correlation
coefficient).
Displaying Data
Methods for displaying data for further
analysis are charts and graphs.
Graphs and charts provide a convenient
way to communicate information.
These include vertical bar charts, line
charts, pie charts, histograms, stem and
leaf displays, and box and whisker plots.
Bar Charts
Bar charts:
Consist of rectangles.
On the horizontal axis place the classes or
labels used.
On the vertical axis place the frequencies.
Bar Chart
[Figure: bar chart of VAR00002 values (vertical axis, 0 to 100) against VAR00001 (horizontal axis).]
Pie Charts
Pie charts:
circular drawing with each piece of the pie
representing a class with its frequency
represented by the area of the slice.
Pie Chart
Frequency Histogram
Frequency histogram:
Consists of adjacent rectangles
with no spaces in between.
Place the classes on the horizontal axis
and the frequencies on the vertical axis.
Note: the rectangles have equal width,
since each width represents the class interval (C.I.).
Histogram
Line Chart
Line chart:
Consists of connecting straight lines
through class midpoints.
To construct a line chart
Plot the midpoints of each class on the
horizontal axis.
Plot the frequencies on the vertical axis.
Connect the points with straight
lines.
Line Chart
Frequency Curve
Frequency curve:
Is a curve running through the midpoints
of each class.
To construct a frequency curve:
Plot the midpoints on the horizontal axis.
Plot the frequencies on the vertical axis.
Connect the points with a smooth curve.
Frequency Curve
[Figure: stem and leaf plot; leaves by row: 124678899, 122246899, 01234, 02; stems not shown.]
Frequency Distributions
The term frequency distribution has many appearances
and applications in statistics. Bar charts are often
referred to as frequency distributions. In general,
frequency distributions are graphs or plots that
show a trend in a variable under study.
Bar charts are a good example of frequency
distributions: they can be skewed, and the
direction of skewness indicates the direction of
the trend or change in the variable. Examining the
shape of a distribution also illustrates how the distribution
is centered about its mean, so the shape can
be viewed as a graphical measure of central tendency.
A Symmetric Distribution
A Negatively Skewed Distribution
Descriptive Statistics
Measures
The Mean
Mean: the average of the data
= sum of observations / number of observations.
For a population, the mean is given the symbol $\mu$ (mu):
$\mu = \frac{\sum x}{N}$
where x = the individual observations and N = the
number of observations (the number of
items in the population).
For a sample, the mean is given the symbol $\bar{x}$ (x bar):
$\bar{x} = \frac{\sum x}{n}$
where x = the individual observations and n = the
number of observations in the sample.
The Mean
Consider the following example representing
the scores of a student on five different tests
during a school year.
Ex.1
63 59 71 41 32
Mean = (63 + 59 + 71 + 41 + 32) / 5 = 266 / 5 = 53.2.
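The calculation for the five test scores can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
from statistics import mean

scores = [63, 59, 71, 41, 32]   # the five test scores from the example

# Sample mean: sum of the observations divided by their number
manual = sum(scores) / len(scores)
print(manual)        # 53.2
print(mean(scores))  # 53.2
```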
The Median
Median: the middle observation after the data have
been put into an ordered array, with half the data above and
half below the median.
For an odd number of observations, the median is found at:
Median position = (N + 1) / 2.
For an even number of observations, the median is found by taking
the average of the middle two observations.
Ex. The previous data set (Ex.1) put in descending order:
71 63 59 41 32
Since the number of observations is odd, the median is the
middle value, at position (5 + 1) / 2 = 3: median = 59.
The Median
Ex.2 Find the median of the set of data:
71 63 60 59 41 32
Since this is an even set of data, the median is the
average of the two middle values: (60 + 59) /
2 = 59.5
The Mode
Mode: the observation that occurs with
the greatest frequency.
Ex.3 Scores on a test for a student were:
63 61 59 59 59 20 59
Mode = 59.
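The median and mode examples above can be reproduced with Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import median, mode

odd_set = [71, 63, 59, 41, 32]               # data with an odd count
even_set = [71, 63, 60, 59, 41, 32]          # data with an even count
test_scores = [63, 61, 59, 59, 59, 20, 59]   # data for the mode example

print(median(odd_set))    # 59   (middle value after sorting)
print(median(even_set))   # 59.5 (average of the two middle values)
print(mode(test_scores))  # 59   (most frequent value)
```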
Exam          Score   Weight
First test    60      1
Second test   69      1
Final exam    75
Measures of Dispersion
Definition: A measure of dispersion indicates
to what degree the individual observations
are spread about their mean.
Measures of dispersion are listed below:
Range.
Mean absolute deviation.
Variance.
Standard deviation.
Interquartile range.
The Range
Range: the difference between the highest
observation and the lowest observation.
Ex6. Consider the following data:
120 49 25 90 20
Range = 120 - 20 = 100.
Standard Deviation
Ex9. A sample of 7 items has been selected
from a population.
87 120 54 92 73 80 63
Mean = 569 / 7 ≈ 81.3
We calculate the sum of the squared observations
to be 49,047.
Variance: $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{49047 - 7(569/7)^2}{6} \approx 465.9$
Standard deviation: $s = \sqrt{465.9} \approx 21.58$
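The figures in this example can be verified with a short script. It is a sketch that uses the shortcut formula (sum of squares minus n times the squared mean, divided by n - 1) alongside the standard library's `variance` and `stdev`, which also divide by n - 1.

```python
from statistics import mean, variance, stdev

sample = [87, 120, 54, 92, 73, 80, 63]   # the seven sampled items
n = len(sample)
x_bar = mean(sample)

# Shortcut formula: (sum of squares - n * mean^2) / (n - 1)
sum_sq = sum(x * x for x in sample)
s2 = (sum_sq - n * x_bar ** 2) / (n - 1)

print(sum_sq)                       # 49047
print(round(s2, 1))                 # 465.9
print(round(variance(sample), 1))   # 465.9
print(round(stdev(sample), 2))      # 21.58
```

The unrounded mean (569 / 7) is used in the formula, because rounding the mean before squaring it noticeably changes the variance.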
Frequency Distributions
Researchers often work with large sets of
data, where computing measures of central tendency or
dispersion by hand becomes tedious.
In this case researchers begin by summarizing
the data into grouped sets (a frequency
distribution) and use software
such as Excel to calculate central tendency
and dispersion measures and to produce graphical
representations of the data.
Frequency Distribution
In a frequency distribution table, data are
classified into categories or classes. Each
class has an upper and lower boundary, a
midpoint, an interval, and a frequency.
A frequency distribution provides order to
the data by dividing them into classes and
recording the number of observations in
each class.
Frequency Distribution
Ex10. Place the following set of data into a
frequency table:
25 38 58 71 83 22 44 88 62 65
Solution: The classes can be defined as
follows:
20-40, 41-60, 61-80, 81-100.
Frequency Distribution
Class     Tally   Frequency
20-40     |||     3
41-60     ||      2
61-80     |||     3
81-100    ||      2
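The tallying step can be sketched in Python as below. The class limits are treated as inclusive on both ends, which matches the classes defined in the example. The variable names are illustrative.

```python
data = [25, 38, 58, 71, 83, 22, 44, 88, 62, 65]
classes = [(20, 40), (41, 60), (61, 80), (81, 100)]  # inclusive class limits

# Count how many observations fall inside each class
frequency = {
    f"{low}-{high}": sum(low <= x <= high for x in data)
    for low, high in classes
}

# Print class, tally marks, and frequency
for label, count in frequency.items():
    print(f"{label:<8}{'|' * count:<6}{count}")
```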
Characteristic           Parametric          Nonparametric
Assumed distribution     Normal              Any
Assumed variance         Homogeneous         Any
Typical data             Ratio or interval   Ordinal or nominal
Data set relationships   Independent         Any
Usual central measure    Mean                Median
Benefits of nonparametric methods: simplicity; less affected by outliers.
Scatter Plots
Scatter diagrams are plots of the paired
observations of X and Y on a graph with the
independent variable X placed on the horizontal
axis and the dependent variable Y placed on the
vertical axis.
Scatter plots provide a picture of the data,
including the range of each variable, patterns of
values over the range, a suggestion of the
relationship between the variables (a linear
equation or a curve), and the number and location
of the outliers (points that stand apart from the other
data, for example because of an error in calculation).
Scatter Plots
Scatterplots are important and effective in
the visualization of data scatter around the
line of best fit or sometimes the curve of
best fit.
The scatterplot indicates the spread of
the data about the mean and hence gives a
visual indication of dispersion.
Scatter Plots
The more scattered the data about the line
of best fit the weaker the relationship
(correlation) between the two variables
under study.
The closer the points to the line of best fit
the stronger the correlation between the
two variables.
The regression line is the line that best fits
the data once plotted on the scatter
diagram.
Scatter Plot
Correlation Coefficient
Correlation is an index that measures the
strength of the relationship between two
variables under study.
When two variables are correlated, their values tend to
change together. This association can be positive
or negative.
If the variables are positively correlated,
then an increase in the value of one tends to be accompanied by an
increase in the value of the other.
A negative correlation means that an increase in the
value of one variable tends to be accompanied by a decrease in
the value of the other.
Correlation Coefficient
The correlation coefficient is an index with values ranging
from -1 to +1.
A value of +1 or -1 indicates a perfect linear correlation between the two
variables, shown in the scatterplot by all of the points
falling exactly on the line of best fit. Values close to 1 in
magnitude indicate a strong correlation, with the points lying
close to the line.
0 indicates no linear correlation between the two variables;
that is, a change in one is not associated with any consistent change
in the other.
The sign indicates the direction of the correlation:
positive means positive correlation and negative
means negative (inverse) correlation.
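As an illustration, Pearson's correlation coefficient can be computed directly from its definition. The helper name `pearson_r` and the paired observations (hours studied and test scores) are hypothetical and are used only to show a positive and a negative case.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired observations
hours = [1, 2, 3, 4, 5]
score_up = [52, 55, 61, 64, 70]     # tends to rise with hours
score_down = [70, 64, 61, 55, 52]   # tends to fall with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(round(r_up, 3), round(r_down, 3))
```

The two results have the same magnitude and opposite signs: the magnitude reflects how closely the points follow a line, and the sign reflects the direction.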
Degree of Correlation
Regression Analysis
Regression analysis is a tool for building
statistical models that characterize relationships
between a dependent variable and one or more
independent variables, all of which are numerical.
A regression model that involves a single
independent variable is called simple regression.
Simple regression fits the data to the
best linear equation that represents the
dependent variable Y as a linear function of the
independent variable X:
Y = aX + b, where a and b are real coefficients.
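A minimal sketch of fitting Y = aX + b, assuming the usual least-squares criterion for the "best" line (the slides do not name the criterion explicitly). The helper name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (slope) and b (intercept) in Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept: the line passes through the means
    return a, b

# Hypothetical data lying exactly on Y = 2X + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)   # 2.0 1.0
```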
Regression Line
Linear function.
Polynomial function.
Exponential function.
Nonlinear Correlation
Nonlinear Regression
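[Figure: regression output, partly preserved. Leading values: 2.5, 4.5, 5.25. Slope: 0.9333, 0.07698. Other values, labels lost: -0.03333, 0.2755, 0.03571, 1.071. Goodness of fit: 0.9800.]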