Psych Stat Reviewer Midterms

This document discusses linear regression analysis. It defines key terms like the regression line, slope, intercept, correlation coefficient, standard error, ANOVA, and t-value. It explains that regression analysis allows you to predict a criterion variable (y) based on a predictor variable (x) and assess the effect that x has on y. The goal is to find the line of best fit that has the minimum deviation from data points on a scatterplot and explains the most variation in y. SPSS can be used to perform regression, output important statistics, and test whether the regression model is a better predictor than chance. Psychologists use regression to discover relationships between variables and quantify how much one variable influences another.

PSYCHOLOGICAL STATISTICS

VICTORIANO, HAZEL T.
I. REGRESSION ANALYSIS
− Regression analysis is an extension of correlational analysis.
− Linear regression gives us a measure of the effect that x has on y; the technique allows us to predict y from x.
− Regression numerically describes important features of a scattergram relating two variables.
− Use: it allows the researcher to make predictions (for example, when choosing the best applicant for a job on the basis of an aptitude or ability test).

x variable
− The predictor or explanatory variable or independent variable.
− The horizontal dimension (X-axis) should always be used to represent the variable from which the prediction is being made.

y variable
− The criterion variable or the dependent variable.
− The vertical dimension (Y-axis) should always represent what is being predicted.

The line through a set of points on a scattergram is called the regression line.
− In order to establish an objective criterion, the regression line is chosen which gives the closest fit to the points on the scattergram.
− In other words, the procedure ensures that there is a minimum sum of distances of the regression line to the points in the scattergram.
− So, in theory, one could keep trying different possible regression lines until one is found which has the minimum deviation of the points from it.
− The sum of the deviations (∑d) of the scattergram points from the regression line should be minimal.
− Actually, the precise criterion is the sum of the squared deviations, known as the least squares solution.

How to draw the regression line?
− Click graphs
− Click legacy dialogs
− Click scatter/dot
− Click simple scatter
− Click define then transfer the y and x variables
− Click ok
− Double click the graph and click add fit line and click linear
− Click close

Theoretical background and regression equations
a (the intercept, or constant)
− is the expected mean value of Y when all X = 0.
b (the slope of the line)
− indicates the steepness of the line; the intercept indicates the location where the line intersects an axis.
− The slope and the intercept define the linear relationship between two variables, and can be used to estimate an average rate of change.

R
− is the correlation between the predicted values and the observed values of Y.
− The correlation between x and y is a simple Pearson’s r, and is represented on the output by R (also known as Multiple R).
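The least squares solution and the a and b definitions above can be sketched in a few lines of pure Python. The data points here are invented purely for illustration:

```python
# Least-squares slope (b) and intercept (a), as defined above.
# The (x, y) points are hypothetical, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
# a = mean_y - b * mean_x: the expected value of Y when X = 0
a = mean_y - b * mean_x

# For these points b = 0.6 and a = 2.2 (to floating-point precision),
# so the fitted regression line is Y' = 2.2 + 0.6X.
```

This is the same line-of-best-fit calculation that SPSS performs when you ask for a linear fit line on the scatterplot.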
R squared
− is the square of this coefficient and indicates the percentage of variation explained by your regression line out of the total variation.
− Is a statistical measure of how close the data are to the fitted regression line.
− It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
− 100% indicates that the model explains all the variability of the response data around its mean.

Standard Error
− also known as the standard error of the estimate, represents the average distance that the observed values fall from the regression line.
− is a measure of the accuracy of predictions.
− Conveniently, it tells you how wrong the regression model is on average, using the units of the response variable.
− The smaller the standard error, the less the spread and the more likely it is that any sample mean is close to the population mean.

ANOVA
− consists of calculations that provide information about levels of variability within a regression model and form a basis for tests of significance.
− is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.

T – value
− measures the size of the difference relative to the variation in your sample data.
− When you perform a t-test, you’re usually trying to find evidence of a significant difference between population means (2-sample t) or between the population mean and a hypothesized value (1-sample t).
− If the t-value is smaller than the hypothesized value, then the t-statistic will be negative; if it is larger, the t-statistic will be positive.

How to examine the predictors using SPSS?
− Click analyze
− Click regression
− Click linear
− Transfer the dependent and independent variable/s
− Click statistics, check estimates, confidence, model fit and R squared change, and click continue
− Click ok

Important parts of the output
− R of 0.97 denotes a strong correlation. This correlation is important because it shows how well the data points cluster around the line of best fit. The prediction will obviously be better when the correlation is high.

Adjusted R Square
− The R square is adjusted by SPSS to account for the number of participants and variables in the analysis. R square is too optimistic, as the line of best fit is based on a sample, not the population.
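R, R squared and the standard error of the estimate can all be computed directly from their definitions. A small sketch, reusing the same made-up data points as the earlier fitting example:

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # Pearson's r (shown as R in SPSS output)
r_squared = r ** 2               # proportion of variation in y explained

# Standard error of the estimate: the average distance of observed
# values from the regression line, sqrt(SS_residual / (n - 2)).
b = sxy / sxx
a = my - b * mx
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_est = math.sqrt(ss_res / (n - 2))

# For these points: r ≈ 0.775, r_squared = 0.6, se_est ≈ 0.894.
```

Note that r_squared also equals 1 − SS_residual / SS_total, which is why it reads as "the percentage of variation explained".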
We want our results to generalize to the population, so Adjusted R Square adjusts the figure to give a more realistic estimate. In our example, the variance explained is reported to be 93%.

Standard error
Standard errors are estimated. This figure is an estimate of the variance of y for each value of x.

The summary table, which you are used to, shows you whether your regression line (line of best fit) is significantly different from 0: that is, whether it predicts better than would be expected by chance. Remember, if the slope b = 0, then the line of best fit is horizontal. In this case, the F-value is 194.16, with an associated probability of < 0.001. This means that such a result is highly unlikely to have arisen by sampling error, assuming the null hypothesis to be true.

The slope, b
You can see that the value of b is –12.39. This means that, for every one pence rise in price, sales drop by 12.39 (thousands). While the standard error could be used as a measure of error of prediction, SPSS gives us the confidence limits, which are based on the standard error. The confidence limits show us that we are 95% confident that our population slope can be found within the interval –14.31 and –10.47.

The intercept, a
The value of the intercept (1513.96) is also given, along with the standard error (77.43). In this output, the value of a is given in the row labelled ‘Constant’. Confidence limits are also given.

The values of a and b allow us to state the regression equation. We translate the algebraic formula Y = bX + a. Formula: Y′ = a + bX.
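With the a and b values from this worked example, the regression equation can be applied directly. A small sketch (the function name is mine, not from the text):

```python
# Prediction from the worked example's coefficients:
# Y' = a + bX, with a = 1513.96 and b = -12.39
# (sales in thousands, price in pence).
a = 1513.96
b = -12.39

def predict_sales(price):
    """Predicted sales (thousands) at a given price (pence)."""
    return a + b * price

# At X = 0 the prediction is simply the intercept a = 1513.96, and
# each one-pence rise in price lowers predicted sales by 12.39 (thousands):
drop = predict_sales(10) - predict_sales(11)
```

This is exactly what the slope means as an "average rate of change": the drop per unit of x is the (absolute value of the) slope.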

Why are Psychologists interested in using linear regression?
− Using linear regression allows them to discover the effect of one variable on another.
− Linear regression will answer the question ‘By how much will y change, if x changes?’
− to assess the effect that x has on y
− to use linear regression to suggest that a score on one variable influenced the score on the other variable

Examples of how Linear Regression is used
− assess the effect of stress on symptoms of the common cold (e.g. runny nose, sore throat, cough)
− predict children’s mathematical ability from a measure of their reading ability.

II. CORRELATIONS
− Correlational analysis is used to assess the magnitude and direction of a relationship.
− We ask whether two variables covary (does Y get larger as X gets larger?).
− It is designed primarily to examine linear relationships between variables.
− Correlation is a LINEAR association between two random variables.
− Correlation analysis shows us how to determine both the nature and strength of the relationship between two variables.
− When variables are dependent on time, correlation is applied.

Magnitude and Direction
− Correlation lies between +1 and –1.
− A zero correlation indicates that there is no relationship between the variables.
− A correlation of –1 indicates a perfect negative correlation.
− A correlation of +1 indicates a perfect positive correlation.

Coefficient of Correlation
− A measure of the strength of the linear relationship between two variables, defined in terms of the (sample) covariance of the variables divided by their (sample) standard deviations.
− Represented by “r”.
− r lies between –1 and +1: –1 ≤ r ≤ +1.
− The + and – signs are used for positive linear correlations and negative linear correlations, respectively.

Positive correlation means that high scores on Y are associated with high scores on X; negative correlation means that higher scores on Y are associated with lower scores on X.
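The definition above (sample covariance divided by the product of the sample standard deviations) can be sketched directly, and it shows where the sign of r comes from. The three toy data sets are invented for illustration:

```python
import math

def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their
    standard deviations (the shared n-1 terms cancel out)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # 1.0: perfect positive correlation
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # -1.0: perfect negative correlation
print(pearson_r(xs, [3, 5, 2, 5, 3]))    # ~0.0: no linear relationship
```

High-with-high pairings make each product in the covariance positive (hence r > 0); high-with-low pairings make them negative (hence r < 0).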
Interpreting Coefficients
− strong correlation: r > .70 or r < –.70
− moderate correlation: r is between .30 and .70, or between –.30 and –.70
− weak correlation: r is between 0 and .30, or between 0 and –.30

Pearson Product Moment Correlation (PPMC), or Pearson r, shows the linear relationship between two sets of data.
− The results will be between –1 and 1.
− The closer the value of r gets to zero, the greater the variation of the data points around the line of best fit.

Is there a significant relationship between pretest and posttest scores?

Potential problems with Pearson correlation
− The PPMC is not able to tell the difference between dependent and independent variables.
− For example, if you are trying to find the correlation between a high calorie diet and diabetes, you might find a high correlation of .8.
− PPMC will not give you any information about the slope of the line; it only tells you whether there is a relationship.

In summary, correlation is used to assess the magnitude and direction of a relationship.

How to analyze your data using Pearson r in SPSS?
− Click analyze
− Click correlate
− Click bivariate
− Highlight and transfer the 2 variables to the variable area
− Click Pearson, two-tailed, and flag significant relationship
− Click ok

III. CORRELATION ANALYSIS FOR CATEGORICAL DATA

Categorical variable
− is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
− A categorical or discrete variable is one that has two or more categories (values).

Two types of categorical variable
Nominal - No intrinsic ordering to its categories. For example, gender (male and female), with no intrinsic ordering to the categories.
Ordinal - Has a clear ordering. For example, temperature as a variable with three ordered categories (low, medium and high).

Spearman Rho
− Pearson’s r and Spearman’s rho are very similar.
− They are both correlation coefficients, interpreted in the same way.
− Spearman’s rank correlation coefficient, or Spearman’s rho, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as rs,
− is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables).
− It assesses how well the relationship between two variables can be described using a monotonic function.

Pearson’s r: used when your data meet the assumptions for a parametric test.
Spearman’s rho: used when your data do not conform to these assumptions; it transforms the original scores into ranks before performing further calculations.

Monotonic Means / Monotone Function
− a function between ordered sets that preserves or reverses the given order.

Procedures on how to analyze using Spearman rho:
− Click analyze
− Click correlate
− Click bivariate
− Transfer the variables you wish to analyze
− Click spearman
− Click ok

Results
What do the results represent?
− rs = 0.921
− denotes a strong relationship
− p-value = 0.00
− significant
− The way we rate others according to their attractiveness is related to how attractive we believe ourselves to be.
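"Transforms the original scores into ranks before performing further calculations" is the whole trick: with no tied scores, Spearman's rho is simply Pearson's r computed on the ranks. A sketch with invented ratings (not the rs = 0.921 attractiveness data above):

```python
import math

def ranks(values):
    """Rank scores from 1 upward (no tie handling, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

# Spearman's rho = Pearson's r computed on the ranks.
# Hypothetical self- vs. other-ratings, with no tied scores.
self_rating  = [2, 7, 4, 9, 5]
other_rating = [3, 8, 1, 10, 6]
rho = pearson_r(ranks(self_rating), ranks(other_rating))
# rho = 0.9 for these made-up scores: a strong positive rank correlation.
```

Because only the order of the scores matters, rho is unchanged by any monotonic transformation of the raw data, which is why it only assumes a monotonic (not linear) relationship.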

IV. MEASURES OF ASSOCIATION

Chi-Square or X2
− measures the association between two categorical variables.
− is used in analyzing associations of categorical data, e.g. grouping people by the color of the blouse or shirt they are wearing, by ethnic group, religion, or the country in which they live (it does not make sense to order them numerically).
− a nonparametric statistical test often used for categorical data.
− only a test of significance, not a measure of a relationship between variables.

X2 versus rxy
− Chi square is about the test of significance; correlation is about the strength of the relationship.
− Chi square uses frequency data (by counting); correlation uses the actual data.

The analyses of the relationships between categorical variables include the following:
− Frequency counts shown in the form of a table
− Inferential tests, which show us whether the relationship between the variables is likely to have been due to sampling error, assuming the null hypothesis is true.

Effect size: X2 can be converted to a statistic called Cramer’s V
− this is interpreted in the same way as any other correlation coefficient.

The measures of association that we are going to discuss in this chapter are as follows:
− One-variable X2 (goodness-of-fit test): used when we have one variable only.
− X2 test for independence: 2 x 2: used when we are looking for an association between two variables, each with two levels (e.g. the association between drinking alcohol [drinks/does not drink] and smoking [smokes/does not smoke]).
− X2 test for independence: r x 2: used when we are looking for an association between two variables, where one has two levels (smokes/does not smoke) and the other has more than two levels (heavy drinker, moderate drinker, does not drink). This is called an r x 2 because there are several rows and two columns.

One-variable X2 (goodness-of-fit test)
Encode the data in SPSS, then weight the cases:
− Click Data
− Click weight cases
− Click weight cases by and insert frequency
− Click OK
Analyze the data:
− Click on Analyze
− Click Nonparametric tests
− Click Legacy Dialogs
− Click Chi-square
− Transfer the variables
− Click OK
Output for one-variable X2: it is important to report the value of the test statistic, the degrees of freedom, and the associated probability level. The first section of the X2 output confirms our calculations.
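The goodness-of-fit statistic is the sum of (observed − expected)² / expected across categories. A sketch with invented counts (these are not the chocolate data discussed below):

```python
# One-variable (goodness-of-fit) X2: compares observed frequencies
# against the frequencies expected if every category were equally
# popular. The counts are hypothetical, for illustration only.
observed = [20, 60, 10, 10]          # e.g. preferences for 4 brands
n = sum(observed)                    # 100 participants
expected = n / len(observed)         # 25 per brand under the null

chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1               # categories - 1 = 3

# chi_sq = 68.0 with df = 3 here: the categories are clearly not
# equally popular (the probability under the null would be tiny).
```

Note how the single dominant category (60 against an expected 25) contributes most of the statistic, which is exactly the pattern the chocolate-brand example below describes.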


Output for one-variable X2: sample interpretation
The X2 value of 53.6, DF = 3, was found to have an associated probability value of 0.0001. This means that, if the null hypothesis were true, such a value would rarely occur (once in ten thousand). Thus we can accept that there is a significant difference between the observed and expected frequencies, and can conclude that all brands of chocolate are not equally popular. The frequency table shows that more people prefer chocolate B (60) than the other bars of chocolate.

X2 test for independence: 2 x 2
X2 enables us to discover whether there is a relationship or association between two categorical variables (e.g. the association between smoking [smokes/does not smoke] and drinking [drinks/does not drink]).

The rationale for 2 x 2 chi-square
− The test calculates the expected frequencies in the cells. In other words, there are 110 students; the test calculates how many we can expect to find in each cell if there is really no relationship between smoking and drinking (i.e. the null hypothesis is true).
− The expected frequencies for each cell are computed in a way similar to the one-variable case of X2, except that, since we have different numbers of people who smoke and drink, the expected frequencies in the cells will be different.
− The resulting X2 value is compared with the value that would be expected to have arisen if the null hypothesis were true (i.e. there really is no association between the two variables).
− The trouble with this is that X2 has an assumption: you must not have more than 25% of cells (in this case, 25% of 4 = 1) with an expected frequency of less than 5.

Weight Cases
− Click Data
− Click weight cases
− Click weight cases by and insert frequency
− Click OK

Analyze the data
− Click Analyze
− Click Descriptive statistics
− Click crosstabs
− Move the smoke variable to the column and the drink variable to the row
− Check the statistics chi-square & Cramer’s V
− Click continue, then ok
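The rationale above can be sketched directly: expected cell counts come from the row and column totals, and Cramer's V follows from X2. The counts are hypothetical (not the 110-student table from the text):

```python
import math

# 2 x 2 chi-square from a contingency table, plus Cramer's V.
# Hypothetical counts, for illustration only.
table = [[30, 10],    # smokes:         drinks / does not drink
         [20, 40]]    # does not smoke: drinks / does not drink

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected count under independence (the null hypothesis):
        expected = row_totals[i] * col_totals[j] / n
        chi_sq += (observed - expected) ** 2 / expected

# For a 2 x 2 table, Cramer's V reduces to sqrt(X2 / n); squaring it
# gives the proportion of variation in one variable explained by the other.
cramers_v = math.sqrt(chi_sq / n)
# chi_sq ≈ 16.67 and Cramer's V ≈ 0.41 for these made-up counts.
```

All expected frequencies here are 20 or above, so the "no more than 25% of cells below 5" assumption is comfortably met.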
You need to read the Pearson Chi-Square row: the X2 value is 12.12, DF = 1, p < 0.001. So the probability of obtaining a X2 of this magnitude is very remote – less than a 1 in 1000 chance – and we therefore conclude that there is an association between smoking and drinking, in students anyway.

The output also gives Cramer’s V: this shows an effect size of 0.33. If you square 0.33, you will obtain a value of 0.1089. Thus nearly 11% of the variation in frequency counts of smoking can be explained by drinking.

Interpretation: A 2 x 2 X2 was carried out to discover whether there was a significant relationship between smoking and drinking. The X2 value of 12.12 had an associated probability value of < 0.001, DF = 1, showing that such an association is extremely unlikely to have arisen as a result of sampling error. Cramer’s V was found to be 0.33 – thus nearly 11% of the variation in frequencies of smoking can be explained by drinking. It can therefore be concluded that there is a significant association between smoking and drinking.

V. PHI COEFFICIENT CORRELATION

Phi Coefficient (rφ)
− When both variables are dichotomous, the resulting correlation is called a phi-coefficient.
− It was developed to measure the strength of association between two variables.

Steps in the analysis:
− Click Analyze
− Click Descriptive Statistics
− Click CrossTabs
− Click Statistics
− Select the Phi and Cramer's V option
− Click OK

Interpretation Guide of the Phi Coefficient
−1.0 to −0.71: strong negative association
−0.70 to −0.31: weak negative association
−0.30 to +0.30: little or no association
+0.31 to +0.70: weak positive association
+0.71 to +1.0: strong positive association
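For a 2 x 2 table the phi coefficient has a simple closed form in the four cell counts; its magnitude equals sqrt(X2 / n), and its sign gives the direction. A sketch with invented counts:

```python
import math

# Phi coefficient for two dichotomous variables, from the 2 x 2
# cell counts a, b, c, d. Counts are hypothetical, for illustration.
#
#            var2 = yes   var2 = no
# var1 = yes     a            b
# var1 = no      c            d
a, b, c, d = 25, 5, 10, 20

phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
# phi ≈ 0.51 here: a weak-to-moderate positive association on the
# interpretation guide above.
```

A positive phi means cases pile up on the agreeing diagonal (yes/yes and no/no); a negative phi means they pile up on the disagreeing one.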
VI. POINT-BISERIAL CORRELATION (rpb)

Point-Biserial Correlation
− In situations where one variable is dichotomous and the other consists of regular numerical scores (interval or ratio scale), the resulting correlation is called a point-biserial correlation.
− The point biserial correlation coefficient (rpb) is a correlation coefficient used when one variable (e.g. Y) is dichotomous.
− It is used to measure the strength and direction of the association that exists between one continuous variable and one dichotomous variable.
− It is a special case of the Pearson r correlation, which is applied when you have two continuous variables, whereas in this case one of the variables is measured on a dichotomous scale.

Assumptions of Point-Biserial
Assumption #1: One of your two variables should be measured on a continuous scale. Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), and weight (measured in kg).
Assumption #2: Your other variable should be dichotomous. Examples of dichotomous variables include gender (two groups: male or female), employment status (two groups: employed or unemployed), and smoker (two groups: yes or no).
Assumption #3: There should be no outliers for the continuous variable for each category of the dichotomous variable. You can test for outliers using boxplots.
Assumption #4: Your continuous variable should be approximately normally distributed for each category of the dichotomous variable.
Assumption #5: Your continuous variable should have equal variances for each category of the dichotomous variable.

Steps in the analysis:
− Click Analyze
− Click Correlate
− Click Bivariate
− Transfer the variables
− Make sure that the Pearson checkbox is checked in the Correlation Coefficients area (although it is selected by default in SPSS Statistics).
− Click continue
− Click OK

P value
− If the p value is greater than 0.05: not significant; fail to reject (accept) the null hypothesis.
− If the p value is less than 0.05: significant; reject the null hypothesis.
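Because the point-biserial is just Pearson's r with the dichotomous variable coded 0/1 (which is why SPSS computes it through the ordinary Pearson checkbox), it can be sketched with the same formula. The scores below are invented for illustration:

```python
import math

# Point-biserial correlation: Pearson's r with the dichotomous
# variable coded 0/1. Hypothetical data, for illustration only.
group = [0, 0, 0, 1, 1, 1]       # e.g. non-smoker = 0, smoker = 1
score = [4, 5, 6, 8, 9, 10]      # continuous variable (e.g. a test score)

n = len(group)
mg, ms = sum(group) / n, sum(score) / n
cov = sum((g - mg) * (s - ms) for g, s in zip(group, score))
r_pb = cov / math.sqrt(sum((g - mg) ** 2 for g in group)
                       * sum((s - ms) ** 2 for s in score))
# r_pb ≈ 0.93 here: higher scores go strongly with membership in group 1.
```

Swapping which category is coded 1 flips the sign of r_pb but not its magnitude, so only the magnitude should be interpreted as strength of association.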