
Linear Regression Analysis HUDM 5122: Introduction To R Johnny Wang

This document provides an outline for an introduction to linear regression analysis using R. It covers key R concepts like workspaces, data types, basic math operations, scripts vs console, plotting, loops and vectors, functions and packages. It also discusses eigenvalues and eigenvectors and provides examples to manually calculate them from sample datasets.

Linear Regression Analysis

HUDM 5122

Introduction to R
Johnny Wang

January 25, 2023


Intro to R

Outline for today:


▶ R workspace
▶ R data types and objects
▶ basic math
▶ scripts vs. console
▶ plotting
▶ loops and vector operations
▶ functions and packages
▶ Eigenvalues and Eigenvectors
R Workspace

Open R

A working directory is where you are reading data from/writing data to. Let’s make a working directory for this class.

A workspace is the collection of all the objects you have created.


We now have a blank workspace.
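A minimal sketch of setting up a working directory. The folder name hudm5122 is hypothetical, and it is created under R's temporary directory only so the sketch cannot overwrite anything; in practice you would choose a folder of your own.

```r
# Hypothetical class folder; tempdir() is used only to keep the sketch harmless
class_dir <- file.path(tempdir(), "hudm5122")
dir.create(class_dir, showWarnings = FALSE)

old_dir <- setwd(class_dir)   # setwd() returns the previous working directory
current_dir <- getwd()        # where R now reads from and writes to
print(current_dir)

print(ls())                   # objects currently in the workspace

setwd(old_dir)                # restore the original working directory
```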
R Objects

Types of data in R:
▶ numeric: real numbers (like doubles)
▶ integer: integers
▶ character: holds non-numeric values (like strings)
▶ logical: holds true/false values (like booleans)
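The four types can be checked with class(). A short sketch; the values are arbitrary:

```r
# class() reports the type of each value
print(class(3.14))    # "numeric"
print(class(2L))      # "integer" (the L suffix requests an integer)
print(class("mpg"))   # "character"
print(class(TRUE))    # "logical"

# A number typed without the L suffix is stored as a double
print(class(2))       # "numeric"
print(is.integer(2))  # FALSE
```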
R Objects

R allows you to create a different set of objects than most programming languages:
▶ vector: stores a one-dimensional, ordered set of objects of the same class
▶ matrix: stores a two-dimensional, indexed set of objects of the same class
▶ data frame: stores a two-dimensional, indexed, named set of objects that can be of multiple classes
▶ array: stores an n-dimensional, indexed set of objects of the same class
▶ list: stores a one-dimensional ordering of any collection of objects
Concatenation: c(), cbind(), rbind()
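A sketch that builds one object of each kind, using c(), cbind(), and rbind() along the way. All variable names and values are made up for illustration.

```r
v  <- c(1, 2, 3)                         # vector: one class throughout
m  <- cbind(v, v^2)                      # matrix built by binding columns
m2 <- rbind(v, v^2)                      # matrix built by binding rows
df <- data.frame(id = 1:3,               # data frame: columns may differ in class
                 label = c("a", "b", "c"))
a  <- array(1:24, dim = c(2, 3, 4))      # array: n-dimensional, one class
l  <- list(v, m, "text", TRUE)           # list: any collection of objects

print(dim(m))              # 3 2
print(dim(m2))             # 2 3
print(sapply(df, class))   # class of each data frame column
print(dim(a))              # 2 3 4
print(length(l))           # 4
```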
R Objects

You can see which objects are in your workspace:


▶ ls() lists all objects
▶ rm() removes an object
▶ attach() attaches a data frame (so columns are now objects under their names)
▶ detach() detaches an object
▶ search() lists all attached objects (and packages)
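These workspace functions can be tried on a small data frame. A sketch, assuming a hypothetical data frame called quiz_data:

```r
quiz_data <- data.frame(quiz = c(8, 9, 7), exam = c(85, 92, 78))  # hypothetical data

print(ls())                 # "quiz_data" is among the objects listed

attach(quiz_data)           # columns are now reachable by name
quiz_mean <- mean(quiz)     # no need to write quiz_data$quiz
attached <- "quiz_data" %in% search()
detach(quiz_data)           # undo the attach
still_attached <- "quiz_data" %in% search()

rm(quiz_data)               # remove the data frame from the workspace
print(exists("quiz_data"))  # FALSE
```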
R Basic Math

Computations in R:
▶ Simple algebraic: +, -, *, /
▶ Matrix computations: element-by-element uses the simple algebraic operators, matrix multiplication is %*%
▶ Exponential/Log: exp(), log()
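The difference between * and %*% is easiest to see on two small matrices. A sketch with arbitrary values:

```r
A <- matrix(1:4, nrow = 2)   # columns are filled first: rows are (1, 3) and (2, 4)
B <- matrix(5:8, nrow = 2)   # rows are (5, 7) and (6, 8)

print(A * B)     # element-by-element product
print(A %*% B)   # matrix product

print(exp(1))        # e
print(log(exp(2)))   # log() is the natural log, so this returns 2
```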
R Basic Math

What’s with <-? Why not just use = to assign values?

▶ they do similar things, but have a different scope
▶ = inside a function call binds an argument by name; it does not create an object in your workspace
▶ <- inside a function call assigns in the calling environment, so the object still exists after the call returns
▶ (some of you will have learned that such behavior is bad encapsulation)
▶ ex: mean(x = 1:10) vs. mean(x <- 1:10). Does x exist in the workspace?
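The mean() example can be checked directly with exists(). A short sketch:

```r
if (exists("x", inherits = FALSE)) rm(x)   # start without an x

mean(x = 1:10)                  # = names the argument x of mean()
after_equals <- exists("x", inherits = FALSE)
print(after_equals)             # FALSE: nothing was created

mean(x <- 1:10)                 # <- assigns first, then passes the value
after_arrow <- exists("x", inherits = FALSE)
print(after_arrow)              # TRUE: x now holds 1:10
```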
A few more R Warnings

R confuses some programmers:


▶ R uses $ in a manner analogous to the way other languages use dot.
▶ R has several one-letter reserved words: c, q, s, t, C, D, F, I, and T.
▶ (they are not really reserved, but pretend they are)
▶ advice: do not use T or F. Ever. Write TRUE and FALSE.
▶ python friends: beware x[-3] (it drops the third element; it does not count from the end)
▶ Careful about vectors. Think C arrays, not linear algebra: arithmetic is element-wise
▶ (try x*y; try again with different lengths!)

▶ Enjoy the bugs.
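A sketch of a few of these surprises. The list car and the vector x are made-up examples.

```r
# $ selects a named component, much like dot in other languages
car <- list(mpg = 18, cylinders = 8)   # hypothetical values
print(car$mpg)

# T and F are ordinary variables that can be overwritten; TRUE and FALSE cannot
T <- 0
print(isTRUE(T))   # FALSE
rm(T)              # remove our copy so the built-in T is visible again

# A negative index drops an element; it does not count from the end
x <- c(10, 20, 30, 40, 50)
print(x[-3])       # 10 20 40 50

# Arithmetic is element-wise, and the shorter vector is recycled
print(c(1, 2, 3, 4) * c(10, 20))                  # 10 40 30 80
print(suppressWarnings(c(1, 2, 3) * c(10, 20)))   # 10 40 30, normally with a warning
```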


R Scripts

The console executes a single command right away

Scripts allow you to save a set of commands


▶ save a set of executable commands
▶ write a function, which applies an action to a set of inputs
▶ to run a script, source("demoscript.R")
▶ to make a function available for use:
1. save latest version of function
2. run source file for that function

▶ Let’s write a function to calculate a mean

▶ Let’s modify it to exclude data above a particular threshold
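One possible version of the two functions. The names my_mean and my_mean_below, and the choice to keep values equal to the threshold, are assumptions made for this sketch.

```r
# Mean computed by hand: sum divided by count
my_mean <- function(x) {
  sum(x) / length(x)
}

# Variant that keeps only values at or below a threshold before averaging
my_mean_below <- function(x, threshold) {
  kept <- x[x <= threshold]
  sum(kept) / length(kept)
}

values <- c(2, 4, 6, 100)
print(my_mean(values))             # 28
print(my_mean_below(values, 10))   # 4
```

If no value survives the threshold, the second function divides 0 by 0 and returns NaN.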


Functions and Packages

One of the most useful parts of R is the package library


▶ R has lots of built-in functions, like mean(), min(), max(), etc.
▶ sometimes you want to do something fancy and R does not have a built-in function (ex: support vector machines)
▶ often, there will be a package to do what you want
▶ a package is a library of functions that you can call
▶ download a package by Packages & Data > Package Installer (install all dependencies!)
▶ attach a package by library(packagename)
▶ then use the functions in the package
▶ search() also displays all attached packages
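The same steps can be done from code. The sketch attaches splines, a package that ships with R but is not attached by default, so it runs without a download. The install line is commented out because it needs an internet connection, and e1071 is only one example of a CRAN package offering support vector machines.

```r
# Installing needs an internet connection, so it is left commented out.
# install.packages("e1071", dependencies = TRUE)

# splines ships with R but is not attached by default
library(splines)
after <- "package:splines" %in% search()
print(after)             # TRUE

x <- seq(0, 1, length.out = 5)
basis <- bs(x, df = 3)   # bs() is a function from the splines package
print(dim(basis))        # 5 3
```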
Plotting

Plotting in R works by layers:


▶ plot() plots the inputs on a new plot
▶ type controls type of plotting ("p", "l", "o", etc)
▶ pch controls point symbols
▶ lty controls line type
▶ col controls color
▶ lwd controls line width
▶ cex controls point size
▶ points() adds a set of points to your plot
▶ lines() adds a set of lines
▶ hist() creates a new histogram
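A sketch of the layering idea: plot() opens the plot, then points() and lines() draw on top of it. The data are arbitrary, and the output goes to a temporary PDF only so that the sketch also runs without a screen.

```r
x <- seq(0, 2 * pi, length.out = 50)

# Write to a temporary PDF so the sketch also runs without a screen
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

plot(x, sin(x), type = "l", lty = 1, lwd = 2, col = "blue")   # new plot, line layer
points(x, cos(x), pch = 16, cex = 0.6, col = "red")           # add points on top
lines(x, 0.5 * sin(x), lty = 2, col = "darkgreen")            # add a dashed line

hist(sin(x))   # hist() starts a new plot (a new page in the PDF)

dev.off()
```

In an interactive session, drop the pdf() and dev.off() lines to draw on screen.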
A Bit More R

We did not finish the R intro last time. Let’s load Auto.csv.
> Auto <- read.csv("Auto.csv",header=TRUE,na.strings="?")
> Auto <- na.omit(Auto)
> attach(Auto)
> names(Auto)
> class(mpg)
> class(cylinders)

What values can cylinders take? What about mpg?
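Auto.csv is not reproduced here, so the sketch below runs the same steps on a few made-up rows passed to read.csv() through its text argument. The object name toy and all values are hypothetical.

```r
# Hypothetical rows in the same spirit as Auto.csv; "?" marks a missing value
csv_text <- "mpg,cylinders,horsepower
18.0,8,130
25.5,4,?
32.1,4,65"

toy <- read.csv(text = csv_text, header = TRUE, na.strings = "?")
n_missing <- sum(is.na(toy$horsepower))
print(n_missing)                 # 1: the "?" became NA

toy <- na.omit(toy)              # drop rows containing NA
print(nrow(toy))                 # 2
print(class(toy$mpg))            # "numeric"
print(class(toy$cylinders))      # "integer"
print(unique(toy$cylinders))     # the distinct values cylinders takes
```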


A Bit More R

Quantitative vs. Categorical variables:


▶ Quantitative: real-valued variable for which arithmetic operations make sense
▶ Categorical: variables that are not quantitative
So what type of variable should mpg be? What about cylinders?

In R, categorical variables are called factors. We can coerce data into this format:
> cylinders <- as.factor(cylinders)
> class(cylinders)
> levels(cylinders)
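The same coercion on a small stand-alone vector, for readers without Auto.csv at hand. The vector cyl is made up for this sketch.

```r
cyl <- c(4, 6, 8, 4, 8, 4)       # hypothetical cylinder counts
print(mean(cyl))                  # arithmetic runs, but is it meaningful?

cyl_f <- as.factor(cyl)
print(class(cyl_f))               # "factor"
print(levels(cyl_f))              # "4" "6" "8"
print(table(cyl_f))               # counts per category

# To recover the numbers, go through the labels, not the internal codes
print(as.numeric(as.character(cyl_f)))
print(as.numeric(cyl_f))          # 1 2 3 1 3 1: codes, not cylinder counts
```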
A Bit More R

Matrix vs. data frames:


▶ Matrix: n × m array holding a single type of data
▶ Data Frame: n × m array whose columns can hold different types of data
We can coerce data frames into matrices (and vice versa):
> class(Auto)
> Auto.mat <- as.matrix(Auto)
> class(Auto.mat)
> Auto.mat[,2]
> class(Auto.mat[,2])
> Auto[1:2,2]
> Auto.mat[1,2]+Auto.mat[2,2]
> Auto[1,2]+Auto[2,2]
> detach(Auto) # For later

What happened?
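A likely explanation, sketched on a made-up two-row data frame: if the data frame contains a text column (this sketch assumes Auto has one, such as a car name), as.matrix() has to store every entry as text, and text cannot be added.

```r
# Hypothetical two-row data frame with one character column
toy <- data.frame(mpg = c(18, 15), cylinders = c(8, 8),
                  name = c("chevelle", "skylark"))
toy_mat <- as.matrix(toy)

print(class(toy))              # "data.frame"
print(typeof(toy_mat))         # "character": one text column forces every entry to text
print(class(toy_mat[, 2]))     # "character"

print(toy[1, 2] + toy[2, 2])   # 16: the data frame kept cylinders numeric

# Adding two character entries is an error, caught here so the script continues
result <- tryCatch(toy_mat[1, 2] + toy_mat[2, 2],
                   error = function(e) conditionMessage(e))
print(result)
```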
A Bit More R
The console executes a single command right away

Scripts allow you to save a set of commands


▶ save a set of executable commands
▶ write a function, which applies an action to a set of inputs
▶ to run a script, go to File > Source File and select script to run
▶ to make a function available for use:
1. save latest version of function
2. run source file for that function
Let’s make a script to load the cleaned version of Auto.csv,
load.auto.R:
rm(list=ls()) # Clears all objects
# Load Auto.csv, turn ?’s into NAs, read top row as names
Auto <- read.csv("Auto.csv",header=TRUE,na.strings="?")
Auto <- na.omit(Auto) # Remove NAs
attach(Auto) # Attach dataset
Eigenvalues and Eigenvectors

▶ Files example1.csv and example2.csv


In this problem we will manually go through all of the steps for finding
Eigenvalues and Eigenvectors. Basic computations like finding the
eigenvalues for a matrix may be done using R.
a. Load example1.csv. Find the column means and the row means
for the data. What do these values tell us about this data set?
b. Center the data and find the empirical covariance matrix, Σ̂.
This should be a 5-by-5 matrix. What do the diagonal values of the
covariance matrix tell us about this data set? What do the
off-diagonal elements tell us about this data set?
c. Give the eigenvalues and associated eigenvectors of Σ̂. Why does
this matrix have the same left eigenvectors as right eigenvectors?

$$x_{\text{left}}^{\top} \hat{\Sigma} = \lambda x_{\text{left}}^{\top}, \qquad \hat{\Sigma} x_{\text{right}} = \lambda x_{\text{right}}$$
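A sketch of the computational steps on simulated data, since example1.csv is not reproduced here. The 20-by-5 matrix X is a stand-in, and dividing by n - 1 (the convention used by R's cov()) is an assumption about which empirical covariance is intended.

```r
set.seed(1)                                  # simulated stand-in for example1.csv
X <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

col_means <- colMeans(X)
row_means <- rowMeans(X)

# Center each column, then form the empirical covariance matrix
Xc <- sweep(X, 2, col_means)                 # subtract column means
Sigma_hat <- crossprod(Xc) / (nrow(X) - 1)   # t(Xc) %*% Xc / (n - 1)

e <- eigen(Sigma_hat)
lambda <- e$values
V <- e$vectors                               # columns are eigenvectors

v1 <- V[, 1]
right_side <- Sigma_hat %*% v1               # Sigma_hat v = lambda v
left_side  <- t(v1) %*% Sigma_hat            # v' Sigma_hat = lambda v'
print(all.equal(as.numeric(right_side), lambda[1] * v1))
print(all.equal(as.numeric(left_side),  lambda[1] * v1))
```

Both checks hold for the same vector because a covariance matrix is symmetric: transposing Σ̂x = λx gives x⊤Σ̂ = λx⊤.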
