Course 7 - R Programming
Programming Languages
Programming means giving instructions to a computer to perform an action or set of actions. Even if this is your first
time programming, you already have plenty of experience telling a computer what to do. For example, you've
probably used a spreadsheet function to sort your data or perform calculations, or you might have used SQL to tell a
computer how to pull data from a database or join two different data tables. Programming goes even further. It gives
you the highest level of control over your data. SQL can communicate with databases, but a general-purpose
programming language lets you create your own applications and build your own functions from scratch. To
program, you first need to know a programming language. In this video, we'll learn about the basics of programming
languages and how they can help you work with your data.
Programming languages are the words and symbols we use to write instruc�ons for computers to follow. You can
think of a programming language as a bridge that connects humans and computers, and allows them to
communicate. Programming languages have their own set of rules for how these words and symbols should be
used, called syntax. Syntax shows you how to arrange the words and symbols you enter so they make sense to a
computer. Coding is writing instructions to the computer in the syntax of a specific programming language. Just like
the variety of human languages around the world, there are lots of different programming languages available to
communicate with computers. There's a language for almost anything you want to do, from designing websites, to
developing video games, to working with data. For example, Python is a general-purpose language that can be used
for all sorts of things, from working with artificial intelligence to creating virtual reality experiences. JavaScript works
well for developing online apps and is an essential part of web browsers. Some other popular programming languages
for data analysis include SAS, Scala, and Julia. While programming languages can look different on the surface, they
all share similar structures and coding concepts. Once you learn your first language, you'll find it easier to learn
others.
Coming up, we'll explore R's many capabilities. Before that, let's talk about some benefits of using any programming
language to work with your data. I'll highlight three. Programming helps you clarify the steps of your analysis, saves
time, and lets you easily reproduce and share your work.
Let's start with clarity. Programming languages have specific rules and guidelines for giving instructions to the
computer. When you're telling a computer what to do, your instructions have to be very clear. There can't be any
inconsistency in the way you write code. If there is, the code won't work. Translating your thoughts into code forces
you to figure out exactly how to write each step of your analysis and how all the steps fit together. It gives your
analysis a level of precision that makes it really powerful.
Using a programming language for data analysis also saves you lots of time. For example, take the process of cleaning
and transforming your data. With one line of code, you can create a separate dataset without any missing values.
With another line, you can apply multiple filters on your data. This lets you spend less time preparing your data and
more time on the analysis itself.
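As a rough illustration of the time savings described above, the sketch below removes missing values in one line and applies two filters in another. It assumes the airquality dataset that ships with R; the dataset choice, the variable names, and the thresholds are picked only for the example and are not part of the course material.

```r
# One line: a separate dataset without any rows that contain missing values.
clean_air <- na.omit(airquality)

# Another line: apply multiple filters at once (thresholds are arbitrary).
hot_windy <- subset(clean_air, Temp > 80 & Wind > 10)

nrow(airquality)  # number of rows before cleaning
nrow(clean_air)   # number of rows after cleaning
nrow(hot_windy)   # number of rows that pass both filters
```

Because the steps live in code, rerunning them on an updated dataset only requires running the same two lines again.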
Finally, programming languages make it easy to reproduce your analysis. Data analysis is most useful when you can
reproduce your work and share it with other people. They can double-check it and help you solve problems. Code
automatically stores all of the steps of your analysis so you can reproduce and share your work at any time in the
future, weeks, months, or even years later. Here's an example. Let's say you're working on a project. You've collected
and cleaned your data and started your analysis, but the results don't add up. You suspect a mistake was made in the
process. You'd like to discuss the issue with a teammate and get their feedback. If you used a spreadsheet, you both
might have to redo the entire analysis to discover the error. There's no easy way to record and reproduce your steps
in a spreadsheet, but if you use a programming language, all your work can be reproduced and shared in a moment,
from loading the data, to creating visualizations, to reporting the results. Plus, you can easily update your analysis
and fix any errors simply by changing the code.
From spreadsheets to SQL to R
Although the programming language R might be new to you, it actually has a lot of similarities to the other tools you have explored
in this program. In this reading, you will compare spreadsheet programs, SQL, and R to have a better sense of how to use each
moving forward.
• They all use filters: for example, you can easily filter a dataset using any of these tools. In R, you can use the filter function.
This performs the same task as a basic SELECT-FROM-WHERE SQL query. In a spreadsheet, you can create a filter using the
menu options.
• They all use functions: In spreadsheets, you use functions in formulas, and in SQL, you include them in queries. In R, you
will use functions in the code that is part of your analysis.
The table below presents key questions to explore a few more ways that these tools compare to each other. You can use this as a
general guide as you begin to navigate R.
The lubridate package that you are about to install is part of the tidyverse. The tidyverse is a collection of packages
in R with a common design philosophy for data manipulation, exploration, and visualization. For a lot of data
analysts, the tidyverse is an essential tool. You will learn more about the tidyverse later on in this course.
Why RStudio?
One of your core tasks as an analyst will be converting raw data into insights that are accurate, useful, and interesting.
That can be tricky to do when the raw data is complex. R and RStudio are designed to handle large data sets, which
spreadsheets might not be able to handle as well. RStudio also makes it easy to reproduce your work on different
datasets. When you input your code, it's simple to just load a new dataset and run your scripts again. You can also
create more detailed visualizations using RStudio.
For example, imagine you are analyzing sales data for every city across an entire country. That is a lot of data from a lot
of different groups–in this case, each city has its own group of data.
• Using RStudio makes it easy to take a specific analysis step and perform it for each group using basic code. In
this example, you could calculate the yearly average sales data for every city.
• RStudio also allows for flexible data visualization. You can visualize differences across the cities effectively using
plotting features like facets–which you’ll learn more about later on.
• You can also use RStudio to automatically create an output of summary stats—or even your visualized plots—for
each group.
As you learn more about R and RStudio moving forward in this program, you’ll get a better understanding of when
RStudio should be your data analysis tool of choice.
MODULE 2
Programming Fundamentals
The basic concepts of R are:
1. Functions:
Functions are a body of reusable code used to perform specific tasks in R. Functions begin with function names like
print or paste, and are usually followed by one or more arguments in parentheses. An argument is information that
a function in R needs in order to run. Here's a simple function in action. (Here "Coding in R" is an argument.)
We'll start our function in the console with the function name print. This function returns whatever we
include in the parentheses. We'll type an open parenthesis followed by a quotation mark. Both the close
parenthesis and end quote automatically pop up because RStudio recognizes this syntax. Now we just have to add
the text string. We'll type "Coding in R". Then we'll press Enter.
Success! The code returns the words "Coding in R."
If you want to find out more about the print function or any function, all you have to do is type a question mark, the
function name, and a set of parentheses: ?print(). This
returns a page in the Help window, which helps you learn more about the functions you're working with.
Keep in mind that functions are case-sensitive, so typing Print with a capital P brings back an error message.
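The walkthrough above can be sketched in a few lines. The paste() call and the tryCatch() wrapper are additions for illustration: paste() is the other function named in the text, and tryCatch() is used here only so that the case-sensitivity error is captured as a message instead of stopping the script.

```r
# Call print() with one argument: the text string "Coding in R".
print("Coding in R")

# paste() takes several arguments and joins them with spaces.
joined <- paste("Coding", "in", "R")
print(joined)

# Function names are case-sensitive: Print() does not exist, so the call fails.
# tryCatch() captures the error message so the script can continue.
error_message <- tryCatch(
  Print("Coding in R"),
  error = function(e) conditionMessage(e)
)
print(error_message)
```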
Functions are great, but it can be pretty time-consuming to type out lots of values. To save time, we can use variables
to represent the values.
2, 3 and 4. Variables, Comments and Data Types:
Variables let us call out the values any time we need to. A variable is a representation of a value in R that can be
stored for use later during programming. Variables can also be called objects. As a data analyst, you'll find variables
are very useful when programming. For example, if you want to filter a dataset, just assign the result of the function
you used to filter the data to a variable. That way, all you have to do is use that variable to work with the filtered
data later. When naming a variable in R, you can use a short phrase. A variable name should start with a letter and can
also contain numbers and underscores. So, the variable 5penguin wouldn't work because it starts with a number. Also,
just like functions, variable names are case-sensitive. Using all lowercase letters is good practice whenever possible.
Now, before we get to coding a variable, let's add a comment.
Comments are helpful when you want to describe or explain what's going on in your code. Use them as much as
possible so that you and everyone can understand the reasoning behind it. Comments should be used to make an R
script more readable. A comment shouldn't be treated as code, so we'll put a # in front of it. Then we'll add our
comment (# Here's an example of a variable).
Every variable is associated with a data type. Common data types include:
Numeric (3.14, 1500)
Character ("Sam", "Bob", "My name is Maverick")
Logical (TRUE / FALSE)
Complex (7+5i, 3-9i; a complex value has two parts, a real part followed by an imaginary
part. In 7+5i, 7 is the real part and 5i is the imaginary part.)
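One way to check these data types is with the class() function, as in the short sketch below. The variable names are made up for the example; Re() and Im() are base R functions that return the real and imaginary parts of a complex value.

```r
number_value  <- 3.14
text_value    <- "Sam"
logical_value <- TRUE
complex_value <- 7+5i

class(number_value)   # "numeric"
class(text_value)     # "character"
class(logical_value)  # "logical"
class(complex_value)  # "complex"

Re(complex_value)     # real part: 7
Im(complex_value)     # imaginary part: 5
```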
Now let's go ahead with our example. It makes sense to use a variable name to connect to what the variable is
representing. So, we'll type the variable name first_variable. Then after the variable name, we'll type a < sign,
followed by a -. This is the assignment operator. It assigns the value to the variable. It looks like an arrow, which makes
sense, since it's pointing from the value to the variable. There are other assignment operators that work too, but it's
always good to stick with just one type in your code. Next, we'll add the value that our variable will represent. We'll
use the text, "This is my variable." If we type the variable and hit Run, it will return the value that the variable
represents. This is a very basic way of using a variable.
Now, let's create a variable with a different data type, numeric. We'll name this second_variable, and type our
assignment operator. We'll give it the numeric value 12.5. The Environment pane in the upper-right part of our work
space now shows both of our variables and their values.
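Put together, the steps described above look roughly like this. The names and values come from the walkthrough; the comment line is the one suggested earlier.

```r
# Here's an example of a variable
first_variable <- "This is my variable"
first_variable    # typing the name returns the stored value

second_variable <- 12.5
second_variable
```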
5. Vectors: A vector is a group of data elements of the same type stored in a sequence in R. You can make a vector
using the combine function. In R, this function is just the letter c followed by the values you want in your vector
inside parentheses, like c(x, y, z, …).
All right, let's create a vector. Imagine this vector holds measurement data that we need to analyze. We'll start our
code with the variable vec_1 to assign to the vector. Then we'll type c and the open parenthesis. Then we'll type our
list of numbers separated by commas. We'll then close our parentheses and press Enter. This time when we type our
variable and press Enter, it returns our vector. We can use this vector anywhere in our analysis with only its variable
name vec_1. The values in the vector will automatically be applied to our analysis.
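A minimal sketch of this step follows. The measurement values are invented for the example, since the original numbers are not shown in the text.

```r
# Hypothetical measurement data stored in a vector.
vec_1 <- c(13, 48.5, 71, 101.5, 2)
vec_1

# Using the variable name applies the stored values to the calculation.
total   <- sum(vec_1)
doubled <- vec_1 * 2   # arithmetic is applied to every element
total
doubled
```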
6. Pipes:
A pipe is a tool in R for expressing a sequence of multiple operations. A pipe is represented by a % sign, followed by
a > sign, and another % sign (%>%). It's used to pass the output of one function into another function. Pipes can
make your code easier to read and understand. For example, this pipe filters and sorts the data.
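A pipe that filters and then sorts might look like the sketch below. It assumes the dplyr package (part of the tidyverse) is installed, and it uses the built-in ToothGrowth dataset as a stand-in, since the data used in the original example is not shown here.

```r
library(dplyr)  # provides %>%, filter(), and arrange()

# Start with the data, keep only the rows with a dose of 0.5,
# then sort the remaining rows by tooth length.
filtered_toothgrowth <- ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

head(filtered_toothgrowth)
```

Reading the pipe as "and then" helps: take ToothGrowth, and then filter it, and then arrange it.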
Data structures in R include:
• Vectors
• Data frames
• Matrices
• Arrays
Think of a data structure like a house that contains your data.
This reading will focus on vectors. Later on, you’ll learn more about data
frames, matrices, and arrays.
There are two types of vectors: atomic vectors and lists. Coming up, you’ll
learn about the basic properties of atomic vectors and lists, and how to use
R code to create them.
Atomic vectors
First, we will go through the different types of atomic vectors. Then, you will learn how to use R code to create, identify,
and name the vectors.
Earlier, you learned that a vector is a group of data elements of the same type, stored in a sequence in R. You cannot
have a vector that contains both logicals and numerics.
There are six primary types of atomic vectors: logical, integer, double, character (which contains strings), complex,
and raw. The last two–complex and raw–aren’t as common in data analysis, so we will focus on the first four. Together,
integer and double vectors are known as numeric vectors because they both contain numbers. This table summarizes
the four primary types:
Creating vectors
One way to create a vector is by using the c() function (called the “combine” function). The c() function in R combines
multiple values into a vector. In R, this function is just the letter “c” followed by the values you want in your vector inside
the parentheses, separated by a comma: c(x, y, z, …).
For example, you can use the c() function to store numeric data in a vector.
To create a vector of integers using the c() function, you must place the letter "L" directly after each number.
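The sketch below creates one vector of each of the four common types with c(). The values and variable names are illustrative. The last line shows what happens if you try to mix types: R does not keep both types, it silently converts the values to a single common type.

```r
double_vec    <- c(2.5, 48.5, 101.5)          # double: numbers with decimals
integer_vec   <- c(1L, 5L, 15L)               # integer: note the L after each number
character_vec <- c("Sara", "Lisa", "Anna")    # character: strings in quotes
logical_vec   <- c(TRUE, FALSE, TRUE)         # logical: TRUE or FALSE

double_vec
integer_vec

# Mixing a logical and a number: TRUE is converted to 1.
mixed_vec <- c(TRUE, 2.5)
mixed_vec
```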
You can determine what type of vector you are working with by using the typeof() function. Place the code for the
vector inside the parentheses of the function. When you run the function, R will tell you the type. For example:
typeof(c("a", "b"))
#> [1] "character"
Notice that the output of the typeof function in this example is “character”. Similarly, if you use the typeof function on
a vector with integer values, then the output will include “integer” instead:
typeof(c(1L, 3L))
#> [1] "integer"
You can determine the length of an existing vector–meaning the number of elements it contains–by using the length()
function. In this example, we use an assignment operator to assign the vector to the variable x. Then, we apply the
length() function to the variable. When we run the function, R tells us the length is 3.
length(x)
#> [1] 3
You can also check if a vector is a specific type by using an is function: is.logical(), is.double(), is.integer(),
is.character(). In this example, R returns a value of TRUE because the vector contains integers.
is.integer(x)
In this example, R returns a value of FALSE because the vector does not contain characters, rather it contains logicals.
is.character(y)
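The checks above can be reproduced with the sketch below. The vectors assigned to x and y are assumptions chosen to match the described results, because the original assignments are not shown in the text.

```r
x <- c(33.5, 57.75, 120.05)   # assumed values: three doubles
length(x)                     # 3

x <- c(2L, 5L, 11L)           # assumed values: three integers
is.integer(x)                 # TRUE

y <- c(TRUE, TRUE, FALSE)     # assumed values: three logicals
is.character(y)               # FALSE
is.logical(y)                 # TRUE
```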
Naming vectors
All types of vectors can be named. Names are useful for writing readable code and describing objects in R. You can
name the elements of a vector with the names() function. As an example, let’s assign the variable x to a new vector with
three elements.
x <- c(1, 3, 5)
You can use the names() function to assign a different name to each element of the vector.
names(x) <- c("a", "b", "c")
Now, when you run the code, R shows that the first element of the vector is named a, the second b, and the third c.
#> a b c
#> 1 3 5
Remember that an atomic vector can only contain elements of the same type. If you want to store elements of different
types in the same data structure, you can use a list.
Creating lists
Lists are different from atomic vectors because their elements can be of any type—like dates, data frames, vectors,
matrices, and more. Lists can even contain other lists.
You can create a list with the list() function. Similar to the c() function, the list() function is just list followed by the
values you want in your list inside parentheses: list(x, y, z, …). In this example, we create a list that contains four different
kinds of elements: character ("a"), integer (1L), double (1.5), and logical (TRUE).
list("a", 1L, 1.5, TRUE)
Like we already mentioned, lists can contain other lists. If you want, you can even store a list inside a list inside a list—
and so on.
list(list(list(1 , 3, 5)))
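If you want to find out what types of elements a list contains, you can use the str() function. Place the code for the
list inside the parentheses of the function. For the first example, we run the function, then R tells us that the list
contains four elements, and that the elements consist of four different types: character (chr), integer (int), number
(num), and logical (logi).
str(list("a", 1L, 1.5, TRUE))
#> List of 4
#>  $ : chr "a"
#>  $ : int 1
#>  $ : num 1.5
#>  $ : logi TRUE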
Let’s use the str() function to discover the structure of our second example. First, let’s assign the list to the variable z to
make it easier to input in the str() function.
z <- list(list(list(1, 3, 5)))
str(z)
#> List of 1
#>  $ :List of 1
#>   ..$ :List of 3
#>   .. ..$ : num 1
#>   .. ..$ : num 3
#>   .. ..$ : num 5
The indentation of the $ symbols reflects the nested structure of this list. Here, there are three levels (so there is
a list within a list within a list).
Naming lists
Lists, like vectors, can be named. You can name the elements of a list when you first create it with the list() function:
$`Chicago`
[1] 1
$`New York`
[1] 2
$`Los Angeles`
[1] 3
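Code along the following lines would produce a named list like the one printed above. The variable name city_list is made up for the example, and the two access styles at the end are an addition to show why names are useful.

```r
# Name each element when the list is created: name = value.
city_list <- list("Chicago" = 1, "New York" = 2, "Los Angeles" = 3)
city_list

names(city_list)          # the three names, in order
city_list$Chicago         # access an element by name with $
city_list[["New York"]]   # names containing spaces need [[ ]] or backticks
```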
Additional resource
To learn more about vectors and lists, check out R for Data Science, Chapter 20: Vectors. R for Data Science is a classic
resource for learning how to use R for data science and data analysis. It covers everything from cleaning to visualizing to
communicating your data. If you want to get more details about the topic of vectors and lists, this chapter is a great place
to start.
Dates and times in R
In this reading, you will learn how to work with dates and times in R using the lubridate package. Coming up, you will
use tools in the lubridate package to convert different types of data in R into date and date-time formats.
If you haven't already installed tidyverse, you can use the install.packages() function to do so:
• install.packages("tidyverse")
Next, load the tidyverse and lubridate packages using the library() function. First, load the core tidyverse to make it
available in your current R session:
• library(tidyverse)
Then, load the lubridate package:
• library(lubridate)
Now you’re ready to be introduced to the tools in the lubridate package.
Types
In R, there are three types of data that refer to an instant in time:
• A date ("2016-08-16")
• A time within a day ("20:11:59 UTC")
• And a date-time. This is a date plus a time ("2018-03-31 18:15:48 UTC")
The time is given in UTC, which stands for Coordinated Universal Time. This is the primary standard by which the
world regulates clocks and time.
For example, to get the current date you can run the today() function. The date appears as year, month, and day.
today()
To get the current date-time you can run the now() function. Note that the time appears to the nearest second.
now()
When working with R, there are three ways you are likely to create date-time formats:
• From a string
• From an individual date
• From an existing date/time object
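R creates dates in the standard yyyy-mm-dd format by default. To convert a string into a date, use the lubridate function
whose name lists year (y), month (m), and day (d) in the same order as they appear in the string. For example, ymd()
parses a string written as year, month, day:
ymd("2021-01-20")
When you run the function, R returns the date in yyyy-mm-dd format.
It works the same way for any order. For example, use mdy() for month, day, and year. R still returns the date in
yyyy-mm-dd format. Or, use dmy() for day, month, and year. R still returns the date in yyyy-mm-dd format.
dmy("20-Jan-2021")
These functions also take unquoted numbers and convert them into the yyyy-mm-dd format.
ymd(20210120)
To create a date-time instead of a date, add an underscore and one or more of the letters h, m, and s (hours, minutes,
seconds) to the function name:
ymd_hms("2021-01-20 20:11:59")
mdy_hm("01/20/2021 08:01")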
You can use the function as_date() to convert a date-time to a date. For example, put the current date-time—now()—in
the parentheses of the function.
as_date(now())
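The conversions in this section can be tried together in the sketch below. It assumes the lubridate package is installed and that R is running in an English locale (needed to read month names such as "January"). The variable names are made up, and the mdy() input string is an illustrative example.

```r
library(lubridate)

# Four different inputs that all describe the same day.
from_ymd    <- ymd("2021-01-20")
from_mdy    <- mdy("January 20th, 2021")
from_dmy    <- dmy("20-Jan-2021")
from_number <- ymd(20210120)

from_ymd      # every one of these prints in yyyy-mm-dd format
from_mdy

# A date-time keeps the time of day; the default time zone is UTC.
date_time <- ymd_hms("2021-01-20 20:11:59")
date_time

# as_date() drops the time and keeps only the date.
date_only <- as_date(date_time)
date_only
```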
Data structures
Recall that a data structure is like a house that contains your data.
Data frames:
Data frames are the most common way of storing and analyzing data in R, so it’s important to understand what they are
and how to create them. A data frame is a collection of columns–similar to a spreadsheet or SQL table. Each column
has a name at the top that represents a variable, and includes one observation per row. Data frames help summarize
data and organize it into a format that is easy to read and use.
For example, the data frame below shows the “diamonds” dataset, which is included with the ggplot2 package (part of the tidyverse). Each
column contains a single variable that is related to diamonds: carat, cut, color, clarity, depth, and so on. Each row
represents a single observation.
There are a few key things to keep in mind when you are working with data frames:
• First, columns should be named.
• Second, data frames can include many different types of data, like numeric, logical, or character.
• Finally, elements in the same column should be of the same type.
You will learn more about data frames later on in the program, but this is a great starting point.
If you need to manually create a data frame in R, you can use the data.frame() function. The data.frame() function takes
vectors as input. In the parentheses, enter the name of the column, followed by an equals sign, and then the vector you
want to input for that column. In this example, the x column is a vector with elements 1, 2, 3, and the y column is a vector
with elements 1.5, 5.5, 7.5.
If you run the function, R displays the data frame in ordered rows and columns.
  x   y
1 1 1.5
2 2 5.5
3 3 7.5
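The call described above can be written as follows. The variable name example_df is made up; the column names and values are the ones given in the text. The str() and $ lines are additions that show two common ways to inspect the result.

```r
# Each argument is column_name = vector.
example_df <- data.frame(x = c(1, 2, 3), y = c(1.5, 5.5, 7.5))
example_df

str(example_df)   # column names, types, and first values
example_df$y      # pull out one column as a vector
```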
In most cases, you won’t need to manually create a data frame yourself, as you will typically import data from another
source, such as a .csv file, a relational database, or a software program.
Files
Let’s go over how to create, copy, and delete files in R. For more information on working with files in R, check out R
documentation: files. R documentation is a tool that helps you easily find and browse the documentation of almost all R
packages on CRAN. It’s a useful reference guide for functions in R code. Let’s go through a few of the most useful
functions for working with files.
Use the dir.create() function to create a new folder, or directory, to hold your files. Place the name of the folder in the
parentheses of the function.
dir.create("destination_folder")
Use the file.create() function to create a blank file. Place the name and the type of the file in the parentheses of the
function. Your file types will usually be something like .txt, .docx, or .csv.
file.create("new_text_file.txt")
file.create("new_word_file.docx")
file.create("new_csv_file.csv")
If the file is successfully created when you run the function, R will return a value of TRUE (if not, R will return FALSE).
file.create("new_csv_file.csv")
[1] TRUE
Copying a file can be done using the file.copy() function. In the parentheses, add the name of the file to be copied.
Then, type a comma, and add the name of the destination folder that you want to copy the file to.
file.copy("new_text_file.txt", "destination_folder")
If you check the Files pane in RStudio, a copy of the file appears in the relevant folder.
You can delete files using the unlink() function. Enter the file’s name in the parentheses of the function.
unlink("some_.file.csv")
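To try these functions without cluttering your project folder, one option is to work inside R's temporary directory, as in the sketch below. The use of tempdir() and file.path(), and all of the variable names, are choices made for this example; the file and folder names mirror the ones used above.

```r
# Build all paths inside a scratch folder under R's temporary directory.
work_dir    <- file.path(tempdir(), "files_demo")
dest_dir    <- file.path(work_dir, "destination_folder")
source_file <- file.path(work_dir, "new_text_file.txt")

dir.create(work_dir, showWarnings = FALSE)
dir.create(dest_dir, showWarnings = FALSE)

created <- file.create(source_file)           # TRUE if the blank file was created
copied  <- file.copy(source_file, dest_dir)   # TRUE if the copy succeeded

copy_exists <- file.exists(file.path(dest_dir, "new_text_file.txt"))

unlink(source_file)                           # delete the original file
source_exists_after <- file.exists(source_file)

unlink(work_dir, recursive = TRUE)            # clean up the scratch folder
```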
Additional resource
If you want to learn more about working with data frames, matrices, and arrays in R, check out the Data Wrangling
section of Stat Education's Introduction to R course. The section includes modules on data frames, matrices, and arrays
(and more), and each module contains helpful examples of key coding concepts.
--------------------------------------------------------------------------------------------------------------------------------------
Optional: Matrices
A matrix is a two-dimensional collection of data elements. This means it has both rows and columns. By contrast, a
vector is a one-dimensional sequence of data elements. But like vectors, matrices can only contain a single data type.
For example, you can’t have both logicals and numerics in a matrix.
To create a matrix in R, you can use the matrix() function. The matrix() function has two main arguments that you enter
in the parentheses. First, add a vector. The vector contains the values you want to place in the matrix. Next, add at least
one matrix dimension. You can choose to specify the number of rows or the number of columns by using the code nrow
= or ncol =.
For example, imagine you want to create a 2x3 (two rows by three columns) matrix containing the values 3-8. First, enter
a vector containing that series of numbers: c(3:8). Then, enter a comma. Finally, enter nrow = 2 to specify the
number of rows.
matrix(c(3:8), nrow = 2)
If you run the function, R displays a matrix with two rows and three columns (typically referred to as a “2x3”) that contains
the numeric values 3, 4, 5, 6, 7, 8. R places the first value (3) of the vector in the uppermost row and the leftmost column
of the matrix, then fills each column from top to bottom before moving to the next column on the right.
     [,1] [,2] [,3]
[1,]    3    5    7
[2,]    4    6    8
You can also choose to specify the number of columns (ncol = ) instead of the number of rows (nrow = ).
matrix(c(3:8), ncol = 2)
When you run the function, R infers the number of rows automatically.
     [,1] [,2]
[1,]    3    6
[2,]    4    7
[3,]    5    8
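The fill order is easy to check in code. In the sketch below, the first matrix uses the default column-by-column fill shown above. The second uses the byrow = TRUE argument of matrix(), which is not covered in this reading but is included to show how to fill row by row instead. The variable names are made up.

```r
filled_by_column <- matrix(c(3:8), nrow = 2)                 # default fill order
filled_by_row    <- matrix(c(3:8), nrow = 2, byrow = TRUE)   # fill across rows

dim(filled_by_column)     # 2 rows, 3 columns
filled_by_column[1, ]     # first row: 3 5 7
filled_by_row[1, ]        # first row: 3 4 5
```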
Operators & Calculations
An operator is a symbol that names the type of operation or calculation to be performed in a formula.
Imagine we have our hands on some e-commerce sales data that we need to analyze. Throughout our analysis we
will use variables that R will store so that we can reference them whenever we need to. We will work with assignment
operators. Assignment operators are used to assign values to variables and vectors.
So, if we have a bunch of sales figures that we want to include in a vector, we can use an assignment operator to assign
them to a variable.
Now, whenever we want to use the sales figures, we just type the variable we assigned.
Let’s check out arithmetic operators. Arithmetic operators are used to complete math calculations. The plus sign (+)
performs addition, the minus sign (-) performs subtraction, an asterisk (*) performs multiplication, and a slash (/)
performs division.
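Here is a small sketch of assignment and arithmetic operators working together. The sales figures and variable names are invented for the example.

```r
# Assign hypothetical sales figures to a vector.
sales <- c(100, 250, 400)
sales

# Arithmetic operators work on every element of the vector.
sales_plus_shipping  <- sales + 50    # addition
sales_after_discount <- sales - 100   # subtraction
sales_doubled        <- sales * 2     # multiplication
sales_per_quarter    <- sales / 4     # division

sales_plus_shipping
sales_per_quarter
```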
Logical operators
Logical operators return a logical data type such as TRUE or FALSE.
Let’s check out an example of how you might use logical operators to analyze data. Imagine you are working with the
airquality dataset that is preloaded in RStudio. It contains data on daily air quality measurements in New York from May
to September of 1973.
The data frame has six columns: Ozone (the ozone measurement), Solar.R (the solar measurement), Wind (the wind
measurement), Temp (the temperature in Fahrenheit), and the Month and Day of these measurements (each row
represents a specific month and day combination).
Let’s go through how the AND, OR, and NOT operators might be helpful in this situation.
AND example
Imagine you want to specify rows that are extremely sunny and windy, which you define as having a Solar measurement
of over 150 and a Wind measurement of over 10.
In R, you can express this logical statement as Solar.R > 150 & Wind > 10.
Only the rows where both of these conditions are true fulfill the criteria:
OR example
Next, imagine you want to specify rows where it’s extremely sunny or it’s extremely windy, which you define as having a
Solar measurement of over 150 or a Wind measurement of over 10.
In R, you can express this logical statement as Solar.R > 150 | Wind > 10.
All the rows where either of these conditions is true fulfill the criteria:
NOT example
Now, imagine you just want to focus on the weather measurements for days that aren't the first day of the month.
In R, you can express this logical statement as Day != 1.
Finally, imagine you want to focus on scenarios that aren't extremely sunny and not extremely windy, based on your
previous definitions of extremely sunny and extremely windy. In other words, the following statement should not be true:
either a Solar measurement greater than 150 or a Wind measurement greater than 10.
Notice that this statement is the opposite of the OR statement used above. To express this statement in R, you can put
an exclamation point (!) in front of the previous OR statement: !(Solar.R > 150 | Wind > 10). R will apply the NOT
operator to everything within the parentheses.
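The AND, OR, and NOT statements above can be applied to the airquality data with the sketch below. It uses the base R subset() function, which is one possible way to select rows and is an assumption of this example; note that subset() leaves out rows where the condition cannot be evaluated because of a missing value. The variable names are made up.

```r
# AND: both conditions must be TRUE.
sunny_and_windy <- subset(airquality, Solar.R > 150 & Wind > 10)

# OR: at least one condition must be TRUE.
sunny_or_windy <- subset(airquality, Solar.R > 150 | Wind > 10)

# NOT (single condition): every day except the first of the month.
not_first_day <- subset(airquality, Day != 1)

# NOT (whole statement): neither extremely sunny nor extremely windy.
not_sunny_or_windy <- subset(airquality, !(Solar.R > 150 | Wind > 10))

nrow(sunny_and_windy)
nrow(sunny_or_windy)
nrow(not_first_day)
nrow(not_sunny_or_windy)
```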
----------------------------------------------------------------------------------------------------------------------------------------
Let’s discuss how to create conditional statements in R using three related statements:
• if ()
• else
• else if ()
if statement
The if statement sets a condition, and if the condition evaluates to TRUE, the R code associated with the if statement is
executed.
In R, you place the code for the condition inside the parentheses of the if statement. The code that has to be executed if
the condition is TRUE follows in curly braces (expr). Note that in this case, the second curly brace is placed on its own
line of code and identifies the end of the code that you want to execute.
if (condition) {
  expr
}
For example, let’s create a variable x equal to 4.
x <- 4
Next, let’s create a conditional statement: if x is greater than 0, then R will print out the string “x is a positive
number”.
if (x > 0) {
  print("x is a positive number")
}
Since x = 4, the condition is true (4 > 0). Therefore, when you run the code, R prints out the string “x is a positive
number”.
But if you change x to a negative number, like -4, then the condition will be FALSE (-4 > 0). If you run the code, R will not
execute the print statement. Instead, nothing is printed.
else statement
The else statement is used in combination with an if statement. This is how the code is structured in R:
if (condition) {
  expr1
} else {
  expr2
}
The code associated with the else statement gets executed whenever the condition of the if statement is not
TRUE. In other words, if the condition is TRUE, then R will execute the code in the if statement (expr1); if the
condition is not TRUE, then R will execute the code in the else statement (expr2).
x <- 7
if (x > 0) {
  print("x is a positive number")
} else {
  print("x is either a negative number or zero")
}
Since 7 is greater than 0, the condition of the if statement is true. So, when you run the code, R prints out “x is a
positive number”.
But if you make x equal to -7, the condition of the if statement is not true (-7 is not greater than 0). Therefore, R will
execute the code in the else statement. When you run the code, R prints out “x is either a negative number or
zero”.
x <- -7
if (x > 0) {
  print("x is a positive number")
} else {
  print("x is either a negative number or zero")
}
else if statement
In some cases, you might want to customize your conditional statement even further by adding the else if statement. The
else if statement comes in between the if statement and the else statement. This is the code structure:
if (condition1) {
  expr1
} else if (condition2) {
  expr2
} else {
  expr3
}
If the if condition (condition1) is met, then R executes the code in the first expression (expr1). If the if condition is not met,
and the else if condition (condition2) is met, then R executes the code in the second expression (expr2). If neither of the
two conditions are met, R executes the code in the third expression (expr3).
In our previous example, using only the if and else statements, R can only print “x is either a negative number
or zero” if x equals 0 or x is less than zero. Imagine you want R to print the string “x is zero” if x equals 0. You
need to add another condition using the else if statement.
Let’s try an example. First, create a variable x equal to negative 1 (“-1”), and run the code to save the variable to
memory.
x <- -1
if (x < 0) {
  print("x is a negative number")
} else if (x == 0) {
  print("x is zero")
} else {
  print("x is a positive number")
}
Run the code. Since -1 is less than 0, the condition for the if statement evaluates to TRUE, and R prints “x is a
negative number”.
If you make x equal to 0, R will first check the if condition (x < 0), and determine that it is FALSE. Then, R will evaluate
the else if condition. This condition, x==0, is TRUE. So, in this case, R prints “x is zero”.
If you make x equal to 1, both the if condition and the else if condition evaluate to FALSE. So, R will execute the else
statement and print “x is a positive number”.
As soon as R discovers a condition that evaluates to TRUE, R executes the corresponding code and ignores the rest.
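One way to explore this first-match behavior is to wrap the conditional in a small function and call it with different values. The sketch below is only an illustration: the function names classify_number and first_match are hypothetical and are not part of the course material.

```r
# classify_number() is a hypothetical helper used only for illustration.
classify_number <- function(x) {
  if (x < 0) {
    "x is a negative number"
  } else if (x == 0) {
    "x is zero"
  } else {
    "x is a positive number"
  }
}

print(classify_number(-1))
print(classify_number(0))
print(classify_number(1))

# Conditions are checked from top to bottom, so order matters when they overlap.
first_match <- function(x) {
  if (x > 0) {
    "greater than 0"
  } else if (x > 5) {
    "greater than 5"   # never reached: any x > 5 has already matched x > 0
  } else {
    "0 or less"
  }
}

print(first_match(10))
```

In first_match(), the second branch can never run, because every value that satisfies x > 5 also satisfies the earlier condition x > 0. Putting the most specific condition first avoids this.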
Basic Concepts of R
Function: A body of reusable code for performing specific tasks in R.
Data Types: An attribute that describes a piece of data based on its values, its programming language, or
the operations it can perform.
Vector: A group of data elements of the same type stored in a one-dimensional sequence in R.
Pipe: A tool in R for expressing a sequence of multiple operations, represented with %>%.
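The four concepts above can be seen together in a few lines of code. This is a minimal sketch: the function to_fahrenheit and the variable names are hypothetical examples. The %>% pipe is provided by the magrittr package (which the tidyverse also loads), so the last part only runs if magrittr is installed.

```r
# A function: reusable code that performs a specific task.
# to_fahrenheit() is a hypothetical example, not part of the course material.
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# A vector: elements of the same type in a one-dimensional sequence.
temps_c <- c(0, 25, 100)

# A data type describes the kind of values that are stored.
print(class(temps_c))          # numeric vector
print(class(c("a", "b")))      # character vector
print(class(c(TRUE, FALSE)))   # logical vector

# Functions in R usually work on a whole vector at once.
temps_f <- to_fahrenheit(temps_c)
print(temps_f)

# A pipe passes the result on its left to the function on its right.
# Assumption: magrittr is installed; otherwise this part is skipped.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  temps_c %>% to_fahrenheit() %>% round() %>% print()
}
```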
Available R packages
To make the most of R for your data analysis, you will need to install packages.
Packages are units of reproducible R code that you can use to add more functionality to R.
The best part is that the R community creates and shares packages so that other users can access them! In this reading,
you will learn more about widely used packages and where to find them.
Package documentation
Packages will not only include the code itself, but also documentation that explains the package’s author, function, and
any other packages that you will need to download. When you are using CRAN, you can find the package documentation
in the DESCRIPTION file.
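You can also read the DESCRIPTION file of an installed package from inside R. The sketch below uses packageDescription() from the utils package that ships with R, and looks at the built-in stats package so that nothing needs to be downloaded. Which fields are present varies from package to package.

```r
# packageDescription() reads the DESCRIPTION file of an installed package.
desc <- utils::packageDescription("stats")

print(desc$Package)   # the package name
print(desc$Version)   # the installed version
print(desc$Title)     # a one-line summary of what the package does

# Ask for several fields at once. Fields that a package does not declare
# come back as NA.
info <- utils::packageDescription("stats", fields = c("Package", "Imports", "Depends"))
print(info)
```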
• Tidyverse: the tidyverse is a collection of R packages specifically designed for working with data. It’s a standard
library for most data analysts, but you can also download the packages individually.
• Quick list of useful R packages: this is RStudio Support’s list of useful packages with installation instructions and
functionality descriptions.
• CRAN Task Views: this is an index of CRAN packages sorted by task. You can search for the type of task you
need to perform and it will pull up a page with packages related to that task for you to explore.
You will discover more packages throughout this course and as you use R more often, but this is a great starting point for
building your own library.
Welcome to the Tidyverse
Packages are a big part of what makes R so great. Packages offer a helpful combination of code, reusable R functions,
descriptive documentation, tests for checking operability, and sample data sets. And for lots of data analysts, at the
top of the list of useful packages is tidyverse.
Tidyverse is actually a collection of packages in R with a common design philosophy for data manipulation,
exploration, and visualization. Using tidyverse can help you work your way through pretty much the entire data
analysis process. The packages in tidyverse work together naturally. Tidyverse is considered a key part of
programming for most R users. The principles associated with tidyverse, which you'll learn both here and at your job,
have been widely adopted by the R community.
Okay, let's install the tidyverse. Earlier, you learned how to find Base R packages using the function installed.packages().
To install packages like the tidyverse that aren't in Base R, we'll use the install.packages() function. As we discussed
earlier, this function downloads the tidyverse and other packages from CRAN.
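The two function names are easy to mix up: installed.packages() lists what you already have, while install.packages() downloads something new. Here is a minimal sketch of how they can work together. The helper is_installed is a hypothetical name used only for this example, and the install step is guarded so that it only runs in an interactive session with an internet connection.

```r
# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
installed <- rownames(installed.packages())
print(head(installed))

# is_installed() is a hypothetical helper for this sketch.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

print(is_installed("stats"))   # stats ships with R

# Install from CRAN only when the package is missing.
# Assumption: an internet connection is available in the interactive session.
if (interactive() && !is_installed("tidyverse")) {
  install.packages("tidyverse")
}
```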
Let's talk about why CRAN was created. Since packages not in Base R are mostly made by R users, people need a
reliable way to check and validate submitted code. CRAN makes sure any R content open to the public meets the
required quality standards. So, if it's sourced through CRAN, you can feel good that the package is authentic and
valid. Another major source of packages and other R content is GitHub.
Now, we'll get back to installing the tidyverse. We'll first type install.packages. Then, between the parentheses, we'll
type tidyverse in quotes. The install.packages() function expects the package name as a character string, so the quotes
are needed: install.packages("tidyverse"). We'll press Enter and wait for RStudio to install tidyverse. When we click
on our packages tab, we come across a lot of new packages on the list. That's tidyverse. You might have noticed that
none of the packages are checked off. We need to load them first before we can use them. But that's a mighty long
list. So, let's just load the package named tidyverse for now, using the library function: library(tidyverse). The return
shows that not only was tidyverse loaded, but eight other packages were too. It also shows a list of conflicts. Conflicts
happen when packages have functions with the same names as other functions. Basically, the last package loaded is
the one whose functions will be used, so we'll stick with the tidyverse functions. But it's important to note that these
messages only appear once, when the package is first loaded in a session. So, as you get more used to R, you'll be able
to figure out if you want to use certain functions over others.
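To get a feel for how conflicts work without installing anything, you can create one yourself. In this sketch, we define our own function named sd, which masks the sd function from the built-in stats package in the same way that a package loaded later masks same-named functions from packages loaded earlier. The :: operator then picks a specific package's version explicitly. The replacement function is a made-up example.

```r
values <- c(1, 2, 3)

# Before masking, sd() resolves to the version in the stats package.
print(sd(values))

# Defining our own sd() masks the stats version.
sd <- function(x) "my own sd"
masked_result <- sd(values)

# The :: operator names the package explicitly and bypasses the masking.
explicit_result <- stats::sd(values)

print(masked_result)
print(explicit_result)

# Remove our version so that sd() refers to stats::sd again.
rm(sd)
```

The same pattern applies to the conflicts reported when the tidyverse loads. For example, writing stats::filter() or dplyr::filter() makes it clear which version you mean.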
The loaded packages are ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, and forcats. These packages are the core
of the tidyverse because you'll use them in almost every analysis. All of them work together to make your data
analysis smooth and efficient. With these packages, tidyverse helps you do everything from importing and
transforming data to exploring and visualizing it.
The packages available in tidyverse change a lot, but you can always check for updates by running tidyverse_update()
in your console. You can then update the packages in a couple of ways. If you use the update.packages() function, it'll
update all of your packages. That might take a while. So, if you just want to update one package, you can use the
install.packages() function again with the package name as your argument in parentheses. You should update packages
regularly to make sure you've got the latest version in your code.
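Before updating, it can help to check which version you currently have. The sketch below uses packageVersion() on the built-in stats package so that it runs anywhere. The update calls need an internet connection and can take a while, so they are left as comments for you to run when you are ready. The version number "3.0.0" in the comparison is an arbitrary example.

```r
# packageVersion() reports the version of an installed package.
current <- packageVersion("stats")
print(current)

# Versions can be compared directly against a version string.
print(current >= "3.0.0")

# Update options (uncomment to run; each one contacts CRAN):
# tidyverse::tidyverse_update()   # lists tidyverse packages that are out of date
# update.packages()               # updates all installed packages
# install.packages("dplyr")       # reinstalls one package at its latest version
```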