Research Methods: Dr. Abeer Yasin



Outline:

Sampling
Why Sample?
Sample Size
Probability vs. Nonprobability Sampling
Descriptive Statistics
Measures of central tendency
Measures of variability
Measures of relative standing
Measures of association
Displaying data: bar charts, pie charts, frequency histogram, line chart, frequency curve, stem-and-leaf display, box and whisker plot
Use of a graph
Skewness

Outline:
Measures of central tendency: mean, median, mode,
weighted mean, geometric mean
Measures of dispersion: range, mean absolute deviation, variance, standard deviation, interquartile range.
Frequency Distribution.
When to use each measure of central tendency
Analysis strategies for quantitative data
Correlation and regression
Scatterplots
Correlation coefficient
Correlation coefficient and scatter plot
Regression analysis
Linear and nonlinear relationships.

Sampling
When measuring every item in a population is impossible, inconvenient, or too expensive, we take a sample.
The process of sampling involves using a portion of a population to draw conclusions about the whole population. A sample is a subset, or some part, of a larger population.
The purpose of sampling is to estimate an unknown characteristic of a population.

Why Sample?
Sampling cuts costs, reduces labor requirements, and gathers vital information quickly.
Properly selected samples usually give results that are reasonably accurate.
With a sample, accuracy may sometimes even be higher, because the fieldwork and tabulation of data can be more closely supervised.

Why Sample?
In most cases, studying the entire population would be a massive undertaking. This can be avoided by selecting a sample from the population of interest.
With proper sampling, we can use information obtained from the participants who were sampled to estimate characteristics of the population as a whole. Statistical theory allows us to infer what the population is like, based on data obtained from a sample.

Why Sample?
At the outset of the sampling process, the target population must be carefully defined so that the proper sources from which the data are to be collected can be identified.
The usual technique for defining the target population is to answer questions about the characteristics of the population.

Sampling
Define the target population
Select the sampling frame
Determine whether a probability or a nonprobability sampling method will be used.
Plan the procedure for selecting sampling units (sampling unit: a single element or group of elements subject to selection in the sample)
Determine sample size
Select actual sampling units
Conduct fieldwork

Confidence Intervals
When researchers make inferences about populations, they do so with a certain degree of confidence.
Example: Results from the survey are accurate within 3 percentage points at a 95% level of confidence.
The range formed by the sample result plus or minus 3 percentage points is called a confidence interval. We can have 95% confidence that the true population value lies within this interval around the obtained sample result; that is, intervals constructed this way would contain the true value in about 95% of repeated samples.
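As a worked illustration of "within 3 percentage points at 95% confidence", the sketch below computes a confidence interval for a survey percentage. It assumes the usual normal approximation for a proportion, with z = 1.96 for 95% confidence; the survey counts are hypothetical.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat, n, z=1.96):
    """Sample result plus or minus the margin of error."""
    moe = margin_of_error(p_hat, n, z)
    return p_hat - moe, p_hat + moe

# Hypothetical survey: 540 of 1,067 respondents answer "yes".
n = 1067
p_hat = 540 / n
low, high = confidence_interval(p_hat, n)
print(f"estimate {p_hat:.1%}, 95% CI {low:.1%} to {high:.1%}")
print(f"margin of error: +/- {margin_of_error(p_hat, n) * 100:.1f} percentage points")
```

With about 1,067 respondents and a result near 50%, the margin of error comes out at roughly 3 percentage points.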

Sample Size
A larger sample size reduces the width of the confidence interval. Although the width of the confidence interval is determined by several factors (such as the confidence level and the variability in the population), the most important is sample size. Larger samples are more likely to yield data that accurately reflect the true population value.
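A minimal sketch of how sample size drives the width of the interval, under the same assumptions as a simple random sample and the normal approximation for a proportion. It uses p = 0.5, the value that gives the widest interval, so the sizes it returns are conservative.

```python
import math

def moe_for_n(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a proportion (p = 0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n whose margin of error does not exceed `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Quadrupling the sample size halves the margin of error.
for n in (100, 400, 1600):
    print(n, f"+/- {moe_for_n(n) * 100:.2f} points")

print(required_sample_size(0.03))   # n needed for +/- 3 points at 95% confidence
```

Because the margin shrinks with the square root of n, each halving of the interval costs four times as many respondents.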

Probability vs. Nonprobability Sampling
Several alternative ways to take a sample are available: probability techniques and nonprobability techniques.
Probability sampling: every element in the population has a known, nonzero probability of selection. The simple random sample, in which each member of the population has an equal probability of being selected, is the best-known probability sample.
Nonprobability sampling: the probability of any particular member of the population being chosen is unknown. The selection is arbitrary, and the researcher relies heavily on personal judgment.

Nonprobability Sampling
Convenience sampling: sampling by obtaining people or units that are conveniently available.
Researchers use convenience sampling to obtain a large number of completed questionnaires quickly and economically, or when obtaining a sample through other means is impractical.

Nonprobability Sampling
Judgment sampling: a technique in which an experienced individual selects the sample based on his or her judgment about some appropriate characteristics required of the sample members.
Researchers select samples that satisfy their specific purposes, even if they are not fully representative.

Nonprobability Sampling
Quota sampling: the interviewer has a quota to achieve, such as a set number of respondents in each category (for example, age or sex groups).
The interviewer is responsible for finding enough people to meet the quota.

Nonprobability Sampling
Snowball sampling: involves using
probability methods for an initial selection
of respondents and then obtaining
additional respondents through
information provided by initial
respondents.
This method is used to locate members of
rare populations by referrals.

Probability Sampling

All probability sampling techniques are based on chance selection procedures.
The term random refers to the procedure
and not the data in the sample.

Probability Sampling
Simple random sampling: a sampling procedure that ensures each element in the population has an equal chance of being included in the sample.
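A short sketch of drawing a simple random sample with Python's standard `random` module. The sampling frame of customer IDs is hypothetical; `sample` draws without replacement, so every ID has the same chance of selection and none appears twice.

```python
import random

# Hypothetical sampling frame: 500 numbered customer IDs.
frame = [f"C{i:03d}" for i in range(1, 501)]

rng = random.Random(42)              # fixed seed so the draw can be reproduced
sample = rng.sample(frame, k=25)     # 25 distinct IDs, each equally likely to be chosen
print(sample[:5])
```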

Probability Sampling
Systematic sampling: a starting point is selected by a random process, and then every nth number on the list is selected.
While systematic sampling is not actually a random selection process, it does yield random results if the arrangement of items in the list is random in character.
The problem of periodicity occurs if a list has a systematic pattern (for example, selecting every 7th entry from a list of daily records always lands on the same weekday).
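The procedure above can be sketched as follows, assuming the interval k is the list length divided by the desired sample size. The list of 1,000 numbered units is hypothetical. If the list repeated a pattern every k entries, every selected unit would fall at the same point in that pattern, which is the periodicity problem.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Random start, then every kth element of the list."""
    k = len(frame) // n                 # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the list")
    start = rng.randrange(k)            # random starting point within the first interval
    return frame[start::k][:n]          # every kth element, trimmed to exactly n

frame = list(range(1, 1001))            # hypothetical numbered list of 1,000 units
sample = systematic_sample(frame, 50, random.Random(7))
print(sample[:5])
```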

Probability Sampling
Stratified sampling: strata are chosen on the basis of existing information, and a subsample is drawn using simple random sampling within each stratum.
The reason for taking a stratified sample is to obtain a more efficient sample (a smaller sampling error for the same sample size) than would be possible with simple random sampling.
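A minimal sketch of stratified sampling with proportional allocation, meaning the same sampling fraction is taken from every stratum. Proportional allocation is one common choice, assumed here for illustration; the population of units labeled by region is hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng=random):
    """Proportional allocation: draw the same fraction from every stratum."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))   # simple random sample within the stratum
    return sample

# Hypothetical population: (id, region) pairs, 600 in the North and 400 in the South.
population = [(i, "North") for i in range(600)] + [(i, "South") for i in range(600, 1000)]
sample = stratified_sample(population, lambda u: u[1], 0.05, random.Random(1))
print(len(sample))
```

Each region is guaranteed its share of the sample (30 North, 20 South here), which a single simple random sample of 50 would not guarantee.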

Probability Sampling
Cluster sampling:
The purpose of cluster sampling is to
sample economically while retaining the
characteristics of a probability sample.
In cluster sampling, the primary sampling unit is no longer the individual element in the population but a larger cluster of elements located in proximity to one another. The area sample is the most popular type of cluster sampling.
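A sketch of a one-stage area (cluster) sample, assuming city blocks as the clusters. Whole blocks are chosen at random and every household in a chosen block is included. The blocks and households are hypothetical.

```python
import random

# Hypothetical frame: 40 city blocks (clusters), each listing its 12 households.
blocks = {f"block{b:02d}": [f"block{b:02d}-hh{h}" for h in range(12)] for b in range(40)}

rng = random.Random(3)
chosen = rng.sample(sorted(blocks), k=5)             # randomly select whole clusters
sample = [hh for b in chosen for hh in blocks[b]]    # every household in each chosen block
print(chosen)
```

Fieldwork is then confined to five blocks instead of households scattered across all forty, which is the economy that cluster sampling aims for.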

Frequency, Tables and Graphs

Definition: summary displays of variables and their frequency of occurrence, which may involve one variable or more than one variable at a time.
Examples: tables, graphs, and contingency tables containing different variables on their rows and columns.

Descriptive Statistics

Measures of Central Tendency


Definition:
A single score that summarizes a group of observations (scores).
Examples:
Mode, which is the most frequent score in a group
Mean, which is the average of the scores (their sum divided by the number of scores)
Median, which is the score at or below which 50% of the scores fall

Measures of Variability
Definition:
Indicators of the dispersion of scores within a data set, often based on deviation scores, which indicate each score's distance from the mean.
Examples: deviation-based measures such as the average deviation and the variance.
Standard deviation, which is the most widely used indicator of the average difference between the mean and individual scores.

Measures of Relative Standing


Definition:
Single indicators of the relative position of
a score in relation to others.
Examples: percentile rank, which is the percentage of scores that fall at or below a specific score; the z score; and other standard scores.
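Both measures of relative standing can be sketched in a few lines. The percentile rank follows the "at or below" definition above; the z score divides by the population standard deviation, which is an assumption (some texts use the sample standard deviation). The scores are the five test scores from the mean example later in these notes.

```python
import statistics

def percentile_rank(scores, x):
    """Percentage of scores at or below x."""
    return 100 * sum(s <= x for s in scores) / len(scores)

def z_score(scores, x):
    """Distance of x from the mean, in (population) standard-deviation units."""
    return (x - statistics.mean(scores)) / statistics.pstdev(scores)

scores = [63, 59, 71, 41, 32]
print(percentile_rank(scores, 63))     # 4 of the 5 scores are at or below 63
print(round(z_score(scores, 63), 2))
```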

Measures of Association
Definition:
Single indicators of the degree of
relationship between two or more
variables.
Examples: correlation coefficients (such as Pearson's correlation coefficient), which indicate the strength and direction of the relationship between variables.

Displaying Data
Charts and graphs are the main methods for displaying data for further analysis.
Graphs and charts provide a convenient way to communicate information.
These include vertical bar charts, line charts, pie charts, histograms, stem-and-leaf displays, and box and whisker plots.

Bar Charts
Bar charts:
Consist of rectangles (bars), usually separated by gaps.
Place the classes or category labels on the horizontal axis.
Place the frequencies on the vertical axis.

Bar Chart
[Figure: bar chart; vertical axis "Value VAR00002" (0 to 100), horizontal axis "VAR00001"]

Pie Charts
Pie charts:
A circular drawing in which each slice represents a class, with the class frequency represented by the area of the slice.

Pie Chart

Frequency Histogram
Frequency histogram:
Consists of rectangles adjacent to each other, with no spaces in between.
Place the classes on the horizontal axis and the frequencies on the vertical axis.
Note: the width of each rectangle represents the class interval (C.I.), so classes with equal intervals are drawn with equal widths.

Histogram

Line Chart
Line chart:
Consists of straight lines connecting points plotted at the class midpoints.
To construct a line chart:
Plot the midpoints of each class on the horizontal axis.
Plot the frequencies on the vertical axis.
Connect the points with straight lines.

Line Chart

Frequency Curve
Frequency curve:
A curve running through the points plotted at the class midpoints.
To construct a frequency curve:
Plot the midpoints on the horizontal axis.
Plot the frequencies on the vertical axis.
Connect the points with a smooth curve.

Frequency Curve

Stem and Leaf Display


Stem-and-leaf display: the data are grouped according to their leading digits (called stems, for example the tens or hundreds place), while the final digits (leaves) are listed separately for each member of a class.
The leaves are displayed in ascending order for each stem.

Stem and Leaf Display


Consider the following data and construct
the stem and leaf display.
16 21 28 22 31
40 14 18 22 29
30 32 18 11 34
42 12 24 19 29
17 33 26 19 22

Stem and Leaf Display


Stem | Leaves
1 | 1 2 4 6 7 8 8 9 9
2 | 1 2 2 2 4 6 8 9 9
3 | 0 1 2 3 4
4 | 0 2
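The construction can be sketched for the 25 values in the example above, assuming all values are two-digit numbers so the tens digit is the stem and the units digit is the leaf. Sorting the data first puts the leaves in ascending order.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group two-digit values by tens digit (stem); the units digits are the leaves."""
    stems = defaultdict(list)
    for x in sorted(data):              # sorting puts the leaves in ascending order
        stems[x // 10].append(x % 10)
    return dict(sorted(stems.items()))

data = [16, 21, 28, 22, 31, 40, 14, 18, 22, 29, 30, 32, 18, 11, 34,
        42, 12, 24, 19, 29, 17, 33, 26, 19, 22]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(map(str, leaves)))
```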

Box and Whisker Plot.


The box and whisker plot uses the five-number summary. It consists of:
An inner box that spans the range from Q1 to Q3.
A line drawn through the box at the median.
Whiskers, which are the lines from Q1 to the minimum value and from Q3 to the maximum value.

Box and Whisker Plot

Box and Whisker Plot


One of the graphs commonly used in scientific research to display data is the box and whisker plot.
To understand box and whisker plots, we need to understand the components used in sketching the plot and the interquartile range.
The interquartile range is a measure of dispersion: it measures the spread of the middle 50% of the data and equals the difference between the observations at the 75th percentile (third quartile, Q3) and at the 25th percentile (first quartile, Q1):
IQR = Q3 - Q1.

The Box and Whisker Plot


The first quartile: Q1
Q1 = 25th percentile.
Q1 divides the data such that 25% of the data are at or below this value.
Q1 is located at position 0.25(n + 1) = (n + 1)/4 in the ordered data.
The third quartile: Q3
Q3 = 75th percentile.
Q3 divides the data such that 75% of the data are at or below Q3.
Q3 is located at position 0.75(n + 1) = 3(n + 1)/4 in the ordered data.
The second quartile: Q2
Q2 = 50th percentile = median.
When a position is not a whole number, interpolate between the two neighboring ordered values.

The Box and Whisker Plot


The five-number summary used in sketching the box and whisker plot consists of the minimum, Q1, median = Q2, Q3, and maximum of the data set, such that
Min ≤ Q1 ≤ Q2 ≤ Q3 ≤ Max
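A sketch of the five-number summary using the (n + 1) quartile positions above, with linear interpolation at fractional positions. It is applied to the 25 values from the stem-and-leaf example. Python's `statistics.quantiles`, with its default "exclusive" method, uses the same (n + 1) positioning, so it serves as a cross-check.

```python
import statistics

def value_at_position(xs, pos):
    """Value at 1-based position pos in sorted xs, interpolating between neighbors."""
    lower = int(pos)                    # whole-number part of the position
    if lower < 1:
        return xs[0]
    if lower >= len(xs):
        return xs[-1]
    frac = pos - lower
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

def five_number_summary(data):
    xs = sorted(data)
    n = len(xs)
    q1 = value_at_position(xs, (n + 1) / 4)
    q2 = value_at_position(xs, (n + 1) / 2)
    q3 = value_at_position(xs, 3 * (n + 1) / 4)
    return xs[0], q1, q2, q3, xs[-1]

data = [16, 21, 28, 22, 31, 40, 14, 18, 22, 29, 30, 32, 18, 11, 34,
        42, 12, 24, 19, 29, 17, 33, 26, 19, 22]
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx, "IQR =", q3 - q1)
print(statistics.quantiles(data, n=4))   # same (n + 1) positioning: [Q1, Q2, Q3]
```

For n = 25, Q1 sits at position 6.5 and Q3 at position 19.5, so both are halfway between two ordered values.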

Use of the Graph


The choice of a particular graph depends on the information you want to display, in other words, the property you are most interested in showing.
Frequency distributions work well for comparing categories in terms of the frequency of each, for showing the distribution of the data and its skewness (the direction in which the data tend to concentrate), and for showing the different possible categories.
Pie charts display the different categories and the percentage of each as a slice of the pie; they therefore show the portion of the data that concentrates in each category as a

Frequency Distributions
The term frequency distribution has many appearances and applications in statistics. Bar charts are often referred to as frequency distributions. In general, frequency distributions are seen as graphs or plots that explain a certain trend in a variable under study. Bar charts are a good example of these frequency distributions, since bar charts can be skewed, and the direction of skewness indicates the direction of the trend or change in a variable. Also, examining the shape of a distribution shows how the distribution is centered about its mean, so it can also be viewed as a graphical measure of central tendency.

To understand the concept of skewness, we need to understand another concept called symmetry. The shape of a distribution is symmetric if the observations (data) are balanced, or evenly distributed, about the mean. In a symmetric distribution, the mean is equal to the median.
A distribution, on the other hand, is defined as skewed if it is not symmetric. In this case the distribution can be positively skewed, where the tail of the distribution extends to the right, or negatively skewed, where the tail of the distribution extends to the left. In a positively skewed distribution the mean is typically greater than the median; in a negatively skewed distribution it is typically smaller. The attached graphs show a symmetric distribution (figure 1), a positively skewed distribution (figure 2), and a negatively skewed distribution (figure 3). The extended tail illustrates how the data cluster and how they relate to the mean.

A Symmetric Distribution

A Positively Skewed Distribution

A Negatively Skewed Distribution

Descriptive Statistics
Measures of Central Tendency and Dispersion of Ungrouped Data:
Measures of central tendency locate the center value of a set of data. The measures of central tendency are listed below:
Mean
Median
Mode
Weighted mean
Geometric mean

The Mean
Mean: the average of the data
= sum of observations / number of observations.
For a population, the mean is given the symbol μ (mu):
$\mu = \frac{\sum x}{N}$
where x = the individual observations and N = the number of observations (the number of items in the population).
For a sample, the mean is given the symbol $\bar{x}$ (x-bar):
$\bar{x} = \frac{\sum x}{n}$
where x = the individual observations and n = the number of observations in the sample.

The Mean
Consider the following example representing the scores of a student on five different tests during a school year.
Ex. 1: 63 59 71 41 32
Mean = (63 + 59 + 71 + 41 + 32) / 5 = 266 / 5 = 53.2
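The Ex. 1 calculation in Python, as a small sketch. Note that the whole sum must be divided by 5; dividing only the last score would give a different answer.

```python
import statistics

scores = [63, 59, 71, 41, 32]             # Ex. 1: five test scores
mean = sum(scores) / len(scores)          # sum of observations / number of observations
print(mean)                               # 53.2
print(statistics.mean(scores))            # the standard-library equivalent
```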

The Median
Median: the middle observation after the data have been put into an ordered array; half the data lie above the median and half below it.
For an odd number of observations, the median is found as follows:
Median position = (N + 1) / 2.
For an even number of observations, the median is found by taking the average of the two middle observations.
Ex. The data from Ex. 1 put in descending order:
71 63 59 41 32
Since the number of observations is odd, the median is the middle value: 59.

The Median
Ex. 2: Find the median of the following set of data:
71 63 60 59 41 32
This is an even set of data, so the median is the average of the two middle values: (60 + 59) / 2 = 59.5
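Both median rules, the odd case and the even case, can be sketched in one function and checked against Ex. 1 and Ex. 2. The function sorts in ascending order; the median is the same whichever direction the data are ordered.

```python
def median(data):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:                          # odd n: the value at position (n + 1)/2
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2      # even n: average of the two middle values

print(median([71, 63, 59, 41, 32]))         # Ex. 1: 59
print(median([71, 63, 60, 59, 41, 32]))     # Ex. 2: 59.5
```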

The Mode
Mode: is the observation which occurs with
the greatest frequency.
Ex. 3: A student's scores on a series of tests were
63 61 59 59 59 20 59
Mode = 59.
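Finding the mode of Ex. 3 amounts to counting how often each value occurs. The sketch below uses `collections.Counter` and also `statistics.multimode`, which returns every most-frequent value in case of ties.

```python
from collections import Counter
import statistics

scores = [63, 61, 59, 59, 59, 20, 59]       # Ex. 3
value, freq = Counter(scores).most_common(1)[0]   # most frequent value and its count
print(value, freq)                          # 59 4
print(statistics.multimode(scores))         # lists every mode if there are ties
```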

The Weighted Mean


The weighted mean is calculated when certain observations carry more weight than others:
$\bar{X}_w = \frac{\sum XW}{\sum W}$
where W = the weight of each observation, X = the observation, and $\bar{X}_w$ = the weighted mean.

The Weighted Mean


Ex. 4: A student's grades on a set of exams are:
Exam | Score | Weight
First test | 60 | 1
Second test | 69 | 1
Final exam | 75 | 2
$\bar{X}_w = \frac{\sum XW}{\sum W} = \frac{60(1) + 69(1) + 75(2)}{1 + 1 + 2} = \frac{279}{4} = 69.75$
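A small sketch of the weighted-mean formula applied to Ex. 4. The final exam's weight of 2 means it counts twice as much as each test.

```python
def weighted_mean(values, weights):
    """Sum of (observation * weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

scores = [60, 69, 75]     # first test, second test, final exam (Ex. 4)
weights = [1, 1, 2]       # the final exam counts twice
print(weighted_mean(scores, weights))    # 69.75
```

With equal weights the function reduces to the ordinary mean, which is a quick way to sanity-check it.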

The Geometric Mean


Geometric mean: computed by taking the nth root of the product of the n observations making up the sample:
$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$
Ex. 5: The geometric mean of 5, 6, 8, and 12 is the fourth root of their product:
$GM = \sqrt[4]{5 \times 6 \times 8 \times 12} = \sqrt[4]{2880} \approx 7.33$
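Ex. 5 computed directly from the definition and with the standard-library function. The geometric mean is only defined here for positive observations.

```python
import math
import statistics

values = [5, 6, 8, 12]                               # Ex. 5 (all values must be positive)
gm = math.prod(values) ** (1 / len(values))          # nth root of the product
print(round(gm, 2))                                  # 7.33
print(round(statistics.geometric_mean(values), 2))   # standard-library version
```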

Measures of Dispersion
Definition: A measure of dispersion indicates
to what degree the individual observations
are spread about their mean.
Measures of dispersion are listed below:
Range
Mean absolute deviation
Variance
Standard deviation
Interquartile range

The Range
Range: the difference between the highest observation and the lowest observation.
Ex. 6: Consider the following data:
120 49 25 90 20
Range = 120 - 20 = 100.

Mean Absolute Deviation


The mean absolute deviation is given by the following formula:
$MAD = \frac{\sum |x_i - \bar{x}|}{n}$
where n = the number of observations in the sample, $x_i$ = the ith observation, and $\bar{x}$ = the mean of the data.
Ex. 7: The scores of eight students on a test are
73 82 64 61 63 68 52 73
Mean = 536 / 8 = 67
Sum of absolute deviations = 6 + 15 + 3 + 6 + 4 + 1 + 15 + 6 = 56
MAD = 56 / 8 = 7
Note: the greater the MAD, the greater the dispersion of the data around the mean, and the less we can rely on the mean to represent the data.
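The MAD formula as a short Python sketch, checked against Ex. 7. The absolute value is what keeps deviations above and below the mean from cancelling out.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of each observation from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

scores = [73, 82, 64, 61, 63, 68, 52, 73]   # Ex. 7
print(mean_absolute_deviation(scores))      # 7.0
```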

Variance and Standard Deviation for a Population
For a population:
Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Standard deviation: $\sigma = \sqrt{\sigma^2}$
Ex. 8: For the following set of data, calculate the variance and standard deviation.
110 145 125 95 150
$\mu = \frac{110 + 145 + 125 + 95 + 150}{5} = 125$
$\sigma^2 = \frac{(110 - 125)^2 + (145 - 125)^2 + (125 - 125)^2 + (95 - 125)^2 + (150 - 125)^2}{5} = \frac{2150}{5} = 430$
Standard deviation: $\sigma = \sqrt{430} \approx 20.74$
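Ex. 8 in Python, treating the five values as a whole population so the squared deviations are divided by N. The standard-library `pvariance` and `pstdev` are the population versions.

```python
import math
import statistics

data = [110, 145, 125, 95, 150]                              # Ex. 8, treated as a population
mu = sum(data) / len(data)                                    # population mean
variance = sum((x - mu) ** 2 for x in data) / len(data)       # divide by N
sd = math.sqrt(variance)
print(mu, variance, round(sd, 2))                             # 125.0 430.0 20.74
print(statistics.pvariance(data), round(statistics.pstdev(data), 2))
```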

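Variance and Standard Deviation for a Sample
For a sample:
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Standard deviation: $s = \sqrt{s^2}$
Ex. Treating the data of Ex. 8 as a sample:
$s^2 = \frac{(110 - 125)^2 + (145 - 125)^2 + (125 - 125)^2 + (95 - 125)^2 + (150 - 125)^2}{4} = \frac{2150}{4} = 537.5$
Standard deviation: $s = \sqrt{537.5} \approx 23.18$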

Standard Deviation
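Ex. 9: A sample of 7 items has been selected from a population:
87 120 54 92 73 80 63
Mean = 569 / 7 ≈ 81.29
The sum of the squared observations is $\sum x^2 = 49{,}047$.
Using the computational formula $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}$:
$s^2 = \frac{49{,}047 - 7(569/7)^2}{6} = \frac{49{,}047 - 46{,}251.57}{6} \approx 465.9$
(Use the unrounded mean here: rounding $\bar{x}$ to 81.2 would give about 482.2.)
Standard deviation: $s = \sqrt{465.9} \approx 21.58$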
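A sketch comparing the definitional sample-variance formula (divide by n - 1) with the computational shortcut used in Ex. 9. Both give the same answer as long as the mean is not rounded. The last line reworks Ex. 8 as a sample with `statistics.variance`, the standard library's n - 1 version.

```python
import math
import statistics

def sample_variance(data):
    """Definitional formula: squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Computational formula: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n                 # keep the mean unrounded
    return (sum(x * x for x in data) - n * mean ** 2) / (n - 1)

ex9 = [87, 120, 54, 92, 73, 80, 63]
s2 = sample_variance(ex9)
print(round(s2, 1), round(math.sqrt(s2), 2))                   # 465.9 21.58
print(round(sample_variance_shortcut(ex9), 1))                   # 465.9
print(round(statistics.variance([110, 145, 125, 95, 150]), 1))   # Ex. 8 as a sample: 537.5
```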

The Interquartile Range


Observations (data) are sometimes compared by their relative ranking.
Quartiles are another measure of dispersion; they separate large data sets into four quarters.
First quartile = Q1 = 25th percentile, located at position (n + 1)/4
Third quartile = Q3 = 75th percentile, located at position 3(n + 1)/4
Second quartile = Q2 = 50th percentile = median

The Interquartile Range


The interquartile range measures the spread of the middle 50% of the data and equals the difference between the observations at the 75th percentile and at the 25th percentile.
IQR = Q3 - Q1
The five-number summary consists of the minimum, Q1, median = Q2, Q3, and maximum of the data set, such that
min ≤ Q1 ≤ Q2 ≤ Q3 ≤ max

Frequency Distributions
Researchers more often work with large sets of data, where computing measures of central tendency or dispersion by hand becomes tedious.
In this case, researchers begin by summarizing the data into grouped sets (a frequency distribution) and use statistical software such as Excel to calculate measures of central tendency or dispersion and to represent the data graphically.

Frequency Distribution
In a frequency distribution table, the data are classified into categories or classes. Each class has an upper and a lower boundary, a midpoint, an interval, and a frequency.
A frequency distribution provides order to the data by dividing them into classes and recording the number of observations in each class.

Frequency Distribution
Ex. 10: Place the following set of data into a frequency table:
25 38 58 71 83 22 44 88 62 65
Solution: the classes can be defined as follows:
20-40, 41-60, 61-80, 81-100.

Frequency Distribution
Class     Tally   Frequency
20-40     |||     3
41-60     ||      2
61-80     |||     3
81-100    ||      2
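The tallying in Ex. 10 can be sketched as follows. Class limits are treated as inclusive at both ends, matching the classes above; the midpoint of each class is printed alongside its tally and frequency.

```python
from collections import Counter

# Classes from Ex. 10 as (lower, upper) limits, both inclusive.
classes = [(20, 40), (41, 60), (61, 80), (81, 100)]
data = [25, 38, 58, 71, 83, 22, 44, 88, 62, 65]

def class_of(x):
    for lower, upper in classes:
        if lower <= x <= upper:
            return (lower, upper)
    raise ValueError(f"{x} falls outside every class")

freq = Counter(class_of(x) for x in data)
for lower, upper in classes:
    f = freq[(lower, upper)]
    midpoint = (lower + upper) / 2
    print(f"{lower}-{upper}  midpoint {midpoint:5.1f}  {'|' * f:<5} {f}")
```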

When to Use Each of the Measures of Central Tendency?
Before we can discuss the cases in which each measure of central tendency is used, it is important to explain three levels of measurement of variables: the interval-ratio level, the ordinal level, and the nominal level.

When to Use Each of the Measures of Central Tendency?
The interval-ratio level: unlike ordinal variables, whose categories range on a scale from low to high but whose exact distances between categories are not defined, variables measured at the interval-ratio level allow for classification and ranking and are also measured in units that have equal intervals.
For example, the age categories 20-30, 30-40, and 40-50 have equal intervals.
Other examples include income, number of children, and number of bonds or stocks.

When to Use Each of the Measures of Central Tendency?
The nominal level: variables measured at the nominal level have scores or categories that are not numeric in nature.
Examples include sex, religion, race, zip code, and place of birth.
At this level, the only mathematical operation that can be used is comparing the relative sizes (frequencies) of the categories.

When to Use Each of the Measures of Central Tendency?
The ordinal level: variables measured at this level are more sophisticated than those measured at the nominal level, as they have scores or categories that can be ranked from low to high.
For example, socioeconomic status can be classified into upper class, middle class, working class, and lower class.
Numbers can be used to represent each rank and thus number manipulation can be

When to Use Each of the Measures of Central Tendency?
The Mean:
The mean is the most commonly used measure of central tendency. Its computation involves addition and division, so it should be used with variables measured at the interval-ratio level. However, researchers also sometimes calculate the mean of variables measured at the ordinal level, since it is more flexible than the median and is a central feature of many advanced statistical techniques.


When to Use Each of the Measures of Central Tendency?
Percentiles, Deciles, and Quartiles:
Measures that are used when one needs to locate the scores that split the distribution into fourths (quartiles) or tenths (deciles), or the point below which a given percentage of the cases fall (percentiles).
These measures can be found for any variable measured at the ordinal or interval-ratio level.

When to Use Each of the Measures of Central Tendency?
The Median:
The median represents the exact center of a distribution of scores.
It is the score that falls in the exact middle position of a distribution: half the scores fall above it and half fall below it.
It is calculated by ranking the scores from high to low (or low to high) and then choosing the middle value.
The median cannot be calculated for variables measured at the nominal level.
The median can be found for either ordinal or interval-ratio data.

When to Use Each of the Measures of Central Tendency?
The Mode:
The mode of any set of data is the value that occurs with the greatest frequency.
The mode is the appropriate measure of central tendency when the variable under consideration is nominal.
It can also be used for variables measured at the interval-ratio and ordinal levels.
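A hypothetical illustration of matching the measure to the level of measurement: the median for an ordinal variable (computed on rank codes, then mapped back to a category label) and the mode for a nominal variable, where only counting is meaningful. The responses and place names are made up; the class ordering follows the socioeconomic-status example in the ordinal-level slide.

```python
import statistics

# Ordinal variable: ranked categories, listed from low to high (hypothetical responses).
ses_order = ["lower", "working", "middle", "upper"]
ses = ["middle", "working", "upper", "middle", "lower", "working", "middle"]

# Nominal variable: unordered categories (hypothetical responses).
birthplace = ["Amman", "Cairo", "Amman", "Beirut", "Amman"]

# Ordinal: rank-code the categories, take the median rank, map it back to a label.
ranks = [ses_order.index(s) for s in ses]
median_ses = ses_order[statistics.median_low(ranks)]   # median_low avoids a half-rank

# Nominal: the mode is the only appropriate measure of central tendency.
mode_birthplace = statistics.mode(birthplace)
print(median_ses, mode_birthplace)
```

`median_low` is used so the result is always an actual category rather than an in-between value that has no label.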

Analysis Strategies for Quantitative Data

Quantitative data analysis is the analysis of numeric data using a variety of statistical techniques.
Data analysis techniques:
Descriptive vs. inferential statistics
Univariate vs. multivariate statistics
Parametric vs. nonparametric statistics

Analysis Strategies for Quantitative Data
Descriptive methods: procedures for summarizing data, with the intention of discovering trends and patterns and summarizing results for ease of understanding and communication.
The outcome of these strategies is usually called descriptive statistics and includes results such as frequency tables, means, and correlations.

Analysis Strategies for Quantitative Data
Inferential techniques: applied after the descriptive results have been examined.
They are normally used for testing hypotheses, or for confirming or disconfirming the results obtained from the descriptive statistics.
An example is the use of t-tests.
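As one concrete instance of a t-test, the sketch below computes Student's two-sample t statistic by hand. It assumes two independent samples with equal population variances (the pooled-variance version), which are among the parametric assumptions discussed in these notes. The scores are hypothetical, and only the statistic and its degrees of freedom are computed; turning them into a p-value needs the t distribution, which SciPy's `scipy.stats.ttest_ind` handles.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    s2_pooled = ((n1 - 1) * statistics.variance(a)
                 + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))   # standard error of the difference
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

# Hypothetical test scores under two teaching methods.
method_a = [78, 85, 82, 88, 75, 80]
method_b = [72, 70, 79, 74, 68, 71]
t, df = pooled_t_statistic(method_a, method_b)
print(round(t, 2), df)
```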

Analysis Strategies for Quantitative Data
Univariate statistics: involve linking one variable that is the focal point of the analysis (e.g., a predicted event, or the single dependent variable in an experiment) with one or more other variables (e.g., predictors, independent variables).

Analysis Strategies for Quantitative Data
Multivariate statistics: link two or more sets of variables to each other, such as the simultaneous relationship between multiple dependent (predicted) variables and independent (predictor) variables.
These multivariate analyses are followed by simpler univariate ones to determine the more important a) relationships between variables or b) differences between groups.

Analysis Strategies for Quantitative Data
Parametric statistics: very powerful techniques, but they require that the data meet certain assumptions (independence, normality, homogeneity of variance).
Examples:
t-test
ANOVA
Pearson correlation

Analysis Strategies for Quantitative Data
Nonparametric statistics: require few, if any, assumptions about the population under study; they can be used with ordinal or nominal data.
Examples:
the Mann-Whitney test
the Wilcoxon signed-rank test
the Kruskal-Wallis test
Spearman correlation

Parametric vs. nonparametric tests


There are two types of test data and, consequently, different types of analysis.
As the table shows, parametric data have an underlying normal distribution, which allows more conclusions to be drawn because the shape can be described mathematically.
Anything else is nonparametric.

Parametric vs. nonparametric tests


Feature | Parametric | Nonparametric
Assumed distribution | Normal | Any
Assumed variance | Homogeneous | Any
Typical data | Ratio or interval | Ordinal or nominal
Data set relationships | Independent | Any
Usual central measure | Mean | Median
Benefits | Can draw more conclusions | Simplicity; less affected by outliers

Correlation and Regression


Statisticians are often interested in the relationship and interaction between two variables and in the strength or weakness of this relationship.
For example, consider the relationship between a person's salary and spending over a period of time. Salary is a variable that takes on different values over time, and so is spending.

Scatter Plots
Scatter diagrams are plots of the paired observations of X and Y on a graph, with the independent variable X on the horizontal axis and the dependent variable Y on the vertical axis.
Scatter plots provide a picture of the data, including the range of each variable, patterns of values over the range, a suggestion of the relationship between the variables (a linear equation or a curve), and the number and location of outliers (observations that stand out from the rest of the data, sometimes because of an error in measurement or calculation).

Scatter Plots
Scatterplots are important and effective for visualizing how the data scatter around the line of best fit or, sometimes, the curve of best fit.
The scatterplot shows the spread of the data and hence gives an indication of dispersion.

Scatter Plots
The more scattered the data about the line of best fit, the weaker the relationship (correlation) between the two variables under study.
The closer the points to the line of best fit, the stronger the correlation between the two variables.
The regression line is the line that best fits the data once they are plotted on the scatter diagram.

Scatter Plot

Correlation Coefficient
Correlation is an index that measures the strength of the relationship between two variables under study.
When two variables are correlated, changes in one are associated with changes in the other (correlation alone does not show that one causes the other). The association can be positive or negative.
If the variables are positively correlated, an increase in the value of one is associated with an increase in the value of the other.
A negative correlation means that an increase in the value of one variable is associated with a decrease in the value of the other.

Correlation Coefficient
The correlation coefficient is an index whose values range from -1 to +1.
A value of +1 or -1 indicates a perfect correlation between two variables, shown in the scatterplot by all the points lying exactly on the line of best fit; values close to +1 or -1 indicate a strong correlation, with the points lying very close to the line.
0 indicates no linear correlation between the two variables; that is, changes in one are not linearly associated with changes in the other.
The sign indicates the direction of the correlation: positive means a positive correlation, and negative means a negative (inverse) correlation.
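A sketch of Pearson's correlation coefficient computed from deviation scores: the sum of cross-products divided by the square root of the product of the two sums of squares. The salary and spending figures are hypothetical, echoing the salary example earlier. Python 3.10 and later also provide `statistics.correlation` for the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products over the root of the sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly salary and spending (in thousands) for one person.
salary = [3.0, 3.2, 3.5, 3.9, 4.1, 4.4]
spending = [2.1, 2.4, 2.5, 2.9, 3.0, 3.4]
print(round(pearson_r(salary, spending), 3))
```

Points that lie exactly on a rising line give r = 1, and points on a falling line give r = -1.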

Correlation Coefficient and the Scatterplot
All of this is evident in the scatterplot.
The closer the points to the line of best fit, the stronger the correlation; the farther away the points, with more scatter around the line of best fit, the weaker the correlation.
The sign of the correlation is also evident in the scatterplot.
A positive correlation is indicated by points rising along an upward-sloping line: as x increases, y increases.
A negative correlation is indicated by points falling along a downward-sloping line: as x increases, y decreases.

Degree of Correlation

Regression Analysis
Regression analysis is a tool for building statistical models that characterize relationships between a dependent variable and one or more independent variables, all of which are numerical.
A regression model that involves a single independent variable is called simple regression.
Simple regression fits the data to the best linear equation (usually in the least-squares sense), representing the dependent variable Y as a linear function of the independent variable X: Y = aX + b, where a and b are real coefficients.
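A minimal sketch of simple linear regression by ordinary least squares, returning the slope a and intercept b of Y = aX + b together with r² (the coefficient of determination). Ordinary least squares is assumed as the meaning of "best"; the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = aX + b; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx                   # slope
    b = my - a * mx                 # intercept: the line passes through (mean x, mean y)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]      # hypothetical data
a, b, r2 = fit_line(xs, ys)
print(f"Y = {a:.3f}X + {b:.3f}, r^2 = {r2:.3f}")
```

r² close to 1 means the points lie close to the fitted line, matching the scatterplot reading described above.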

Regression Line

Linear and Nonlinear Relationships
What is the difference between a linear relationship and a nonlinear relationship between two variables?
A linear relationship means the effect of X on Y is the same at all values of X.
A nonlinear relationship means the effect of X on Y changes over the values of X.
A scatterplot shows the difference between these patterns of relationship.

Linear and Nonlinear Relationships
In a linear relationship, the slope of the line is constant throughout the line.
To obtain the slope of a straight line, we can use any two points on the line and calculate the change in y divided by the change in x.
Regardless of which points are taken on the line, the slope is a constant value, and hence the change in Y over the change in X is always the same.
The scatterplot of a linear relationship reveals a straight line modeling the behavior of both variables.
The points may follow a straight line with a negative slope or one with a positive slope.

Linear and Nonlinear Relationships
The scatterplot of a nonlinear relationship takes the shape of a curve that can be modeled using a number of nonlinear equations, such as exponential or polynomial functions.
The slope of a curve is not a constant value, as it is for a straight line.
In fact, the slope is different at each point on the curve, and we usually represent it by a function called the derivative.
If the points are concentrated about the line of best fit (regression line), this indicates a strong correlation between the two variables; if the points are spread farther from each other and from the line of best fit, the correlation is weak.
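The contrast between a constant and a changing slope can be checked numerically. The sketch below applies the "change in y over change in x" rule to pairs of points on a straight line and on a curve; the functions y = 2x + 1 and y = x² are hypothetical examples of a linear and a polynomial relationship.

```python
def slope(p, q):
    """Change in y divided by change in x between two points."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)

def linear(x):
    return 2 * x + 1          # hypothetical linear relationship

def quadratic(x):
    return x ** 2             # hypothetical nonlinear (polynomial) relationship

for x1, x2 in [(0, 1), (3, 4), (10, 11)]:
    print(slope((x1, linear(x1)), (x2, linear(x2))),
          slope((x1, quadratic(x1)), (x2, quadratic(x2))))
```

The straight line gives a slope of 2 for every pair of points, while the curve's slope grows as x increases.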

Linear function
Polynomial function
Exponential function

Nonlinear Correlation

Nonlinear Regression

Example of Linear Regression


As an example of creating a linear regression model, consider the following set of data:
Y
X
1

2.5

4.5

5.25

SPSS and the Regression Equation
The equation of the line of best fit is Y = mX + b, where m is the slope and b is the y-intercept. Using the results below, we therefore have Y = 0.93X - 0.03, with a coefficient of determination of r² = 0.98.
Best-fit values
Slope: 0.9333 ± 0.07698
Y-intercept when X = 0.0: -0.03333 ± 0.2755
X-intercept when Y = 0.0: 0.03571
1/slope: 1.071
Goodness of Fit
r²: 0.9800
