Lecture S2

Machine learning notes

Uploaded by

Melody Whelan
R and RStudio

Oliver Hinder

12/23/2020
R and RStudio

I R is a data analysis environment:
I the programming language
I the system of libraries and books around it.
I RStudio is an integrated development environment (IDE).
I If you are already a computer programmer, you can use your own editing
tools to work in R.
I RStudio includes an editor, console, environment viewer, and history.
I It gives access to help files.
I Goal of this lecture: a quick-start guide to R if either
I you have programmed before but not in R, or
I you are a little rusty in R.
R Markdown

I R Markdown is a format for writing documents and presentations.
I This is an R Markdown presentation.
I For more details see http://rmarkdown.rstudio.com.
I When you click the Knit button, a document is generated that includes
both the content and the output of any embedded R code.
I HWs and projects should be done in R Markdown.
Action item: Open this presentation
(IntroductionToSoftwareTools/LectureS2.Rmd) and try knitting it.
Cheatsheets

I Click Help in the menu bar, then Cheatsheets, then click the
cheatsheet you want to look at.
Action item: Explore the R Markdown Cheat Sheet and the RStudio IDE Cheat
Sheet.
Interactive R

R can be used as a calculator


3+5

## [1] 8
Run a function
x = rnorm(20)
y = rexp(20)
plot(x,y)
[Figure: scatter plot of y against x]
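Because rnorm and rexp draw random numbers, the plot looks different on every run. A minimal sketch, assuming you want reproducible output: fix the seed with set.seed before drawing. The seed value 42 and the name x_again are arbitrary choices for illustration, not part of the lecture.

```r
# Fix the random seed so the same numbers are drawn on every run.
set.seed(42)            # 42 is an arbitrary choice
x <- rnorm(20)          # 20 draws from a standard normal distribution
y <- rexp(20)           # 20 draws from an exponential distribution (rate 1)

# Re-seeding reproduces exactly the same draws.
set.seed(42)
x_again <- rnorm(20)
identical(x, x_again)
```

Passing these x and y to plot(x, y) then gives the same picture each time the document is knitted.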
What does a function do?

Type ? in front of a function name in the console to find out what the
function does
?rnorm
?rexp
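When you only need to recall a function's arguments, opening the full help page can be more than you need. A small sketch of two base R alternatives; the variable name rnorm_args is just for illustration.

```r
# Show a function's arguments and their defaults without opening the help page.
args(rnorm)

# formals() returns the same information as a list you can work with.
rnorm_args <- names(formals(rnorm))
print(rnorm_args)

# In the console, help("rnorm") is the long form of ?rnorm,
# and ??"some phrase" searches all installed help pages for a phrase.
```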
Creating your own function

add_two <- function(input_number) {
  output_number <- input_number + 2
  return(output_number)
}
add_two(3)

## [1] 5
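Functions can also take several arguments, some with default values. The sketch below generalizes add_two; the name add_n and its argument n are hypothetical and chosen only for this example.

```r
# add_n is a hypothetical generalization of add_two from the slide.
add_n <- function(input_number, n = 2) {
  # The value of the last evaluated expression is returned,
  # so an explicit return() is optional.
  input_number + n
}

add_n(3)           # uses the default n = 2
add_n(3, n = 10)   # overrides the default
add_n(c(1, 2, 3))  # works element-wise on vectors too
```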
Adding comments

# This is a comment in R.
# Remember use comments to document your code.
Some data types

Some standard data types


I Logical
TRUE == FALSE # True equals false?

## [1] FALSE
!FALSE # Not false?

## [1] TRUE
FALSE | TRUE # Either true or false?

## [1] TRUE
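Logical values usually come from comparisons rather than being typed directly. A short sketch, using made-up temperature values, of how comparisons and the logical operators work element-wise on vectors:

```r
temps <- c(12, 25, 31, 18)   # made-up example values

warm <- temps > 20           # element-wise comparison gives a logical vector
print(warm)

# & (and) and | (or) also work element-wise.
mild <- temps > 15 & temps < 30
print(mild)

any(warm)   # is at least one element TRUE?
all(warm)   # are all elements TRUE?
sum(warm)   # TRUE counts as 1, so this counts the TRUE elements
```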
More data types
I Floats
x <- 1.0
# Double means a floating-point number stored in 64 bits (double precision).
# https://en.wikipedia.org/wiki/Double-precision_floating-point_
typeof(x)

## [1] "double"
class(x)

## [1] "numeric"
y <- 1
class(y) # This is still a floating point number

## [1] "numeric"
I Integers
z <- 1L # Add L to specify that it is an integer
class(z)

## [1] "integer"
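The difference between integers and doubles matters mostly when the two are mixed or when floating-point results are compared. A brief sketch using only base R functions:

```r
x <- 1.0
z <- 1L

is.integer(z)      # z was created with the L suffix
is.integer(x)      # x is a double, not an integer
typeof(z + x)      # mixing integer and double gives a double
as.integer(3.9)    # as.integer() truncates toward zero

# Floating-point arithmetic is not exact, so compare with a tolerance.
0.1 + 0.2 == 0.3                    # exact comparison fails
isTRUE(all.equal(0.1 + 0.2, 0.3))   # comparison with tolerance succeeds
```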
More data types

I Characters
my_string <- "hello"
class(my_string)

## [1] "character"
print(my_string)

## [1] "hello"
I Dates
course_start <- as.Date("2020-01-18")
course_end <- as.Date("2020-05-01")
course_length <- course_end - course_start
print(course_length)

## Time difference of 104 days
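Subtracting two dates gives a "difftime" object rather than a plain number. The sketch below shows a few other common date operations; the names course_days, one_week_in and weekly are illustrative only.

```r
course_start <- as.Date("2020-01-18")
course_end   <- as.Date("2020-05-01")

# Convert the time difference to a plain number of days.
course_days <- as.numeric(course_end - course_start, units = "days")
print(course_days)

# Adding a number to a Date moves it forward by that many days.
one_week_in <- course_start + 7
print(one_week_in)

# Four weekly dates, beginning at the start of the course.
weekly <- seq(course_start, by = "week", length.out = 4)
print(weekly)
```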


Some data structures

I Vectors (numbers)
I You can apply functions over entire vectors (vectorization)
a <- c(3, 5, 3, 7, 10)
b = c(1, 2, 3, 4, 5)
1/b

## [1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000


sum((a-mean(a))^2)/(length(a)-1)

## [1] 8.8
var(a)

## [1] 8.8
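Besides applying functions to whole vectors, you will often need to pick out individual elements. A short sketch of vector indexing on the same vector a:

```r
a <- c(3, 5, 3, 7, 10)

a[1]        # R indexes from 1, so this is the first element
a[2:3]      # elements 2 and 3
a[a > 4]    # keep only the elements greater than 4
a[-1]       # drop the first element

# The standard deviation is the square root of the variance.
sqrt(var(a))
sd(a)
```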
More data structures

I Factors are a more efficient way of storing categorical data
I Each unique categorical value is replaced by an integer
I Less memory than storing each string
factored_data <- factor(c("rain", "sun",
"rain", "rain",
"sun"))
factored_data

## [1] rain sun rain rain sun


## Levels: rain sun
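To see the integer codes behind a factor, you can inspect it directly. The sketch below also shows one way to declare an explicit ordering; the sizes vector is made up for illustration.

```r
factored_data <- factor(c("rain", "sun", "rain", "rain", "sun"))

levels(factored_data)       # the unique categories, sorted alphabetically
as.integer(factored_data)   # the integer code stored for each element
table(factored_data)        # counts per category

# For categories with a natural order, state the order explicitly.
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes < "large"             # comparisons follow the stated order
```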
Sequences

I You can construct a sequence of evenly spaced numbers, like the values a for loop steps through


seq(from=-5, to=5, by=2)

## [1] -5 -3 -1 1 3 5
seq(-5, 5, 2)

## [1] -5 -3 -1 1 3 5
seq(-5, 5, by=2)

## [1] -5 -3 -1 1 3 5
?seq

I You can construct a sequence using :


-2:4

## [1] -2 -1 0 1 2 3 4
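Sequences are most often used as the index set of a for loop. A minimal sketch of some related helpers; the fruits vector is a made-up example.

```r
# length.out asks for a number of points instead of a step size.
seq(0, 1, length.out = 5)

# seq_len(n) gives 1, 2, ..., n and is empty when n is 0.
seq_len(3)

# seq_along(v) gives the indices of v, which is handy in a for loop.
fruits <- c("apple", "pear", "plum")
for (i in seq_along(fruits)) {
  print(paste(i, fruits[i]))
}

# Many loops can be replaced by a vectorized expression.
squares <- (1:5)^2
print(squares)
```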
Matrices

I Create a matrix from a single vector


mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
mat

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
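Note that matrix fills the values column by column. A short sketch of the byrow option together with some basic matrix operations; the name mat_by_row is illustrative.

```r
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

# byrow = TRUE fills row by row instead of the default column by column.
mat_by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
mat_by_row

dim(mat)         # number of rows, then number of columns
mat[2, 1]        # element in row 2, column 1
mat[, 2]         # the whole second column
t(mat)           # transpose: a 2 x 3 matrix
t(mat) %*% mat   # matrix multiplication: a 2 x 2 matrix
```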
More matrices

I Create matrix by binding columns together


mat2 <- cbind(1:4, c("dog", "cat", "bird", "dog"))
mat2

##      [,1] [,2]
## [1,] "1"  "dog"
## [2,] "2"  "cat"
## [3,] "3"  "bird"
## [4,] "4"  "dog"
I Create matrix by binding rows together
mat3 <- rbind(c(1,2,4,5), c(6,7,0,4))
mat3

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    4    5
## [2,]    6    7    0    4
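The quotes around the numbers in mat2 are a hint that a matrix can hold only one data type: binding numbers to strings converts everything to strings. A small sketch of what that means in practice; the name ids is illustrative.

```r
mat2 <- cbind(1:4, c("dog", "cat", "bird", "dog"))

typeof(mat2)        # "character": the numbers were converted to strings
# mat2[1, 1] + 1    # would fail, because "1" is a string, not a number

# Convert a column back to numbers when needed.
ids <- as.numeric(mat2[, 1])
ids + 1

# Binding vectors of the same type keeps that type.
mat3 <- rbind(c(1, 2, 4, 5), c(6, 7, 0, 4))
typeof(mat3)        # "double"
rowSums(mat3)       # sum of each row
```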
Data frames

I Like a matrix, but each column can hold a different data type.
I Columns can be named.
I Data frames are the primary data structure you will work with.
I Many widely used R packages are built around data frames:
I ggplot2, tidyr, reshape, dplyr
students <- data.frame(c("Cedric", "Fred", "George", "Cho",
                         "Draco", "Ginny"),
                       c(3, 2, 2, 1, 0, -1),
                       c("H", "G", "G", "R", "S", "G"))
Data frames
I Name the columns:
names(students) <- c("name", "year", "house")
students

##     name year house
## 1 Cedric    3     H
## 2   Fred    2     G
## 3 George    2     G
## 4    Cho    1     R
## 5  Draco    0     S
## 6  Ginny   -1     G
I Access each column by its name
students$name

## [1] "Cedric" "Fred" "George" "Cho" "Draco" "Ginny"


students$year

## [1] 3 2 2 1 0 -1
students$house

## [1] "H" "G" "G" "R" "S" "G"
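Beyond pulling out whole columns, you will usually want to inspect a data frame and select rows that meet a condition. A minimal sketch on the same data, built here with named columns so the block runs on its own:

```r
students <- data.frame(
  name  = c("Cedric", "Fred", "George", "Cho", "Draco", "Ginny"),
  year  = c(3, 2, 2, 1, 0, -1),
  house = c("H", "G", "G", "R", "S", "G")
)

str(students)     # compact overview: column names, types, first values
nrow(students)    # number of rows
ncol(students)    # number of columns

# Rows are selected before the comma, columns after it.
students[2, ]                          # second row, all columns
students[students$house == "G", ]      # only the students in house "G"
students[students$year >= 2, "name"]   # names of students with year >= 2

mean(students$year)   # summary of a single column
```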
Dataframes from vectors

I You can also create a dataframe from a set of vectors


hogwarts_students <- data.frame(name = students$name,
                                year = students$year,
                                house = students$house)
hogwarts_students

##     name year house
## 1 Cedric    3     H
## 2   Fred    2     G
## 3 George    2     G
## 4    Cho    1     R
## 5  Draco    0     S
## 6  Ginny   -1     G
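Once a data frame exists, columns can be added and rows reordered with base R alone. A short sketch; the column is_senior and the name by_year are hypothetical and exist only for this example.

```r
hogwarts_students <- data.frame(
  name  = c("Cedric", "Fred", "George", "Cho", "Draco", "Ginny"),
  year  = c(3, 2, 2, 1, 0, -1),
  house = c("H", "G", "G", "R", "S", "G")
)

# Assigning to a new name with $ adds a column (is_senior is hypothetical).
hogwarts_students$is_senior <- hogwarts_students$year >= 2

# order() returns the row order that sorts a column in increasing order.
by_year <- hogwarts_students[order(hogwarts_students$year), ]
by_year

# table() counts how often each value occurs.
table(hogwarts_students$house)
```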
Tips for coding

I Use descriptive variable names, e.g., hogwarts_students rather than x.
I Don’t be afraid to use Google or Stack Overflow to figure out issues.
I Play around in the console to understand how things work.
I Come to office hours!
Installing packages

What makes R powerful is the available packages.


install.packages("name_of_package_goes_here")

For this course:


# Paste this into the R console
install.packages(c("DMwR", "rattle"),
                 dependencies = c("Depends", "Suggests"))

This will install most of the packages we need.

RStudio will often detect when a package is not installed and suggest that
you install it.
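Installing a package only needs to happen once, but it must be loaded in every session that uses it. A sketch of one way to check whether a package is available before loading it; load_or_report is a hypothetical helper written for this example, not part of any package.

```r
# load_or_report is a hypothetical helper, not part of any package.
load_or_report <- function(pkg) {
  # requireNamespace() returns TRUE if the package can be loaded, else FALSE.
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
    message("Loaded ", pkg)
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; try install.packages()")
    FALSE
  }
}

load_or_report("stats")   # stats ships with R, so this succeeds
```

The same call with the name of a course package tells you whether that package still needs to be installed.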
ModernDive
As an introduction to the R packages we will find useful in this course, we
cover Chapters 1-4 of ModernDive before switching to the Applied
Predictive Modeling textbook for the core course content.
Learning checkpoint

I Read Chapter 1 of ModernDive
I Create an R Markdown document containing the answers to the
learning checks in Chapter 1 of ModernDive
I Submit the Rmd file with your answers via Canvas.
