Correlation and Regression

The document discusses correlation, regression, and analyzing relationships between variables using R-Studio commands. It covers topics like scatterplots, correlation coefficients, regression lines, residuals, and identifying influential cases.

Uploaded by

tanya
Copyright
© © All Rights Reserved

5. Correlation and regression


DATE LEARNED @October 25, 2023

RETENTION 🟧🟧🟧🟧🟧🟧
NEXT REP. October 29, 2023

subject data analysis

notes

R-STUDIO COMMANDS

1. Scatterplot

>plot(Y~X) where Y is the dependent variable and X the explanatory variable

2. Correlation coefficient (measures the possible linear association between the variables)

>cor(X,Y)
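
As a minimal sketch of the two commands above, assuming two short score vectors that are invented here purely for illustration (the names midterm and final are hypothetical, not from any real dataset):

```r
# Hypothetical scores, invented purely for illustration
midterm <- c(50, 60, 70, 80, 90)   # explanatory variable X
final   <- c(55, 62, 71, 78, 92)   # dependent variable Y

plot(final ~ midterm)              # scatterplot of Y against X

r <- cor(midterm, final)           # Pearson correlation, always between -1 and 1
print(r)
```

The order of the arguments in cor() does not matter: cor(midterm, final) and cor(final, midterm) give the same value.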

3. Regression line:

>fit<-lm(Y~X)
* printing fit shows two coefficients: (Intercept) is the intercept and the value under X is the slope of the line

To get the regression line on the scatter plot:

>abline(fit)

or you can just do:

>abline(lm(Y~X))

To predict the score of a student who got a 70 on the midterm, for example:

>predict(fit, data.frame(X=70)) (the column name must match the explanatory variable used in lm; you can also do this manually as intercept + slope*70)
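
The regression steps above can be sketched end to end as follows. The data frame exam and its columns midterm and final are assumptions made up for illustration; the point is only that predict() and the manual calculation from the coefficients agree.

```r
# Hypothetical scores, invented purely for illustration
exam <- data.frame(
  midterm = c(50, 60, 70, 80, 90),
  final   = c(55, 62, 71, 78, 92)
)

fit <- lm(final ~ midterm, data = exam)
b <- coef(fit)                     # b[1] = intercept, b[2] = slope

plot(final ~ midterm, data = exam)
abline(fit)                        # draws the fitted line on the scatterplot

# Prediction for a midterm score of 70.
# The column in the new data frame must be called "midterm",
# the same name used on the right-hand side of the lm() formula.
p_auto   <- predict(fit, data.frame(midterm = 70))
p_manual <- b[1] + b[2] * 70       # the same value computed by hand
print(c(p_auto, p_manual))
```

If the column name in data.frame() does not match the variable used in lm(), predict() cannot find the explanatory variable, which is a common source of errors.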

Residuals and more information:

>summary(lm(Y~X))

Coefficient of determination (R²) → the proportion of the variation in the y variable that is explained by the x variable.

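* you have to take the square root of this value, I don't remember what for (in a regression with one x variable, the square root of R² is the absolute value of the correlation coefficient)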
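
In a regression with a single explanatory variable, the square root of R² equals the absolute value of the correlation coefficient returned by cor(X,Y). A small check of that relationship, assuming invented scores (the names midterm and final are hypothetical):

```r
# Hypothetical scores, invented purely for illustration
midterm <- c(50, 60, 70, 80, 90)
final   <- c(55, 62, 71, 78, 92)

fit <- lm(final ~ midterm)

r2 <- summary(fit)$r.squared       # coefficient of determination
r  <- cor(midterm, final)          # correlation coefficient

print(sqrt(r2))                    # square root of R²
print(abs(r))                      # absolute value of r, the same number
```

The square root loses the sign, so the direction of the association has to be read from the slope or from cor() itself.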

RESIDUAL PLOT

>fit.res <- resid(fit)
>plot(fit.res~midterm,ylab="Residuals",main="Residual Plot")
(midterm = explanatory variable)

>abline(0,0) → to plot horizontal line
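
A runnable sketch of the residual plot, assuming the same kind of invented scores (midterm and final are hypothetical names). It also checks two facts about least-squares residuals: each one is the observed value minus the fitted value, and together they sum to zero (up to rounding).

```r
# Hypothetical scores, invented purely for illustration
midterm <- c(50, 60, 70, 80, 90)   # explanatory variable
final   <- c(55, 62, 71, 78, 92)   # dependent variable

fit <- lm(final ~ midterm)
fit.res <- resid(fit)              # observed final minus fitted final

plot(fit.res ~ midterm, ylab = "Residuals", main = "Residual Plot")
abline(0, 0)                       # intercept 0, slope 0: horizontal line at zero

print(fit.res)
print(sum(fit.res))                # zero up to floating-point rounding
```

Points scattered evenly above and below the zero line, with no visible pattern, suggest that a straight line is a reasonable description of the data.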

Analysing influential cases

FIRST WAY

>identify(Y~X) (then click with the mouse on the cases in the scatter plot that you want to identify as outliers). You will see clearly which are the outliers.

Press "esc" TWICE to leave the plot and it will show the row numbers of the points that you have clicked on.

SECOND WAY

>plot(Y~X, col="lightblue")
>text(Y~X, labels=rownames(dataset))

To remove those cases, assign the data without the outliers to a new data frame:

>exam_new <- exam[-c(2,18),]

(exam_new = the new dataset, exam = your previous dataset)

(2 and 18 are the outliers you identified previously)
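
The "second way" and the removal step can be sketched together as below. The data frame is invented for illustration, with row 6 made deliberately unusual, so the row number to drop is 6 here rather than 2 and 18. The labels are placed with explicit x and y coordinates, which does the same job as the formula form.

```r
# Hypothetical scores, invented purely for illustration;
# row 6 is a deliberately odd case that acts as the outlier
exam <- data.frame(
  midterm = c(50, 60, 70, 80, 90, 85),
  final   = c(55, 62, 71, 78, 92, 20)
)

# Label every point with its row name so the outlier can be read off the plot
plot(final ~ midterm, data = exam, col = "lightblue")
text(exam$midterm, exam$final, labels = rownames(exam))

# A negative index drops rows; the comma keeps all columns
exam_new <- exam[-c(6), ]

# Compare the fit with and without the outlier
fit_all <- lm(final ~ midterm, data = exam)
fit_new <- lm(final ~ midterm, data = exam_new)
print(coef(fit_all))
print(coef(fit_new))
```

Comparing the two sets of coefficients shows how much a single case can pull the regression line, which is what makes it an influential case.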
