Unit 3 Big Data
What is R
Why Use R?
• It is a great resource for data analysis, data visualization, data science and machine
learning
• It provides many statistical techniques (such as statistical tests, classification,
clustering and data reduction)
• It is easy to draw graphs in R, like pie charts, histograms, box plots, scatter plots, etc.
• It works on different platforms (Windows, Mac, Linux)
• It is open-source and free
• It has large community support
• It has many packages (libraries of functions) that can be used to solve different
problems
R Print Output
Print
Unlike many other programming languages, you can output code in R without using a print
function:
Example
"Hello World!"
However, R does have a print() function available if you want to use it. This might be useful
if you are familiar with other programming languages, such as Python, which often uses
the print() function to output code.
Example
print("Hello World!")
And there are times you must use the print() function to output code, for example when
working with for loops (which you will learn more about in a later chapter):
Example
for (x in 1:10) {
  print(x)
}
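The difference between automatic output and print() inside a loop can be checked with a small sketch. It uses capture.output() from base R to collect what each loop would write to the console; the names silent and shown are illustrative only.

```r
# Inside a loop, a bare expression is evaluated but not auto-printed.
silent <- capture.output(for (x in 1:3) x)

# With print(), each value is written to the console.
shown <- capture.output(for (x in 1:3) print(x))

length(silent)  # 0 lines of output
length(shown)   # 3 lines of output
```

This is why the loop example above needs print(x) rather than just x.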
R Comments
Comments
Comments can be used to explain R code, and to make it more readable. They can also be used to
prevent execution when testing alternative code.
Comments start with a #. When executing code, R will ignore anything that follows the #.
Example
# This is a comment
"Hello World!"
Example
"Hello World!" # This is a comment
R Variables
Creating Variables in R
R does not have a command for declaring a variable. A variable is created the moment you
first assign a value to it. To assign a value to a variable, use the <- sign. To output (or print)
the variable value, just type the variable name:
Example
name <- "John"
age <- 40
name # output "John"
age # output 40
Compared to many other programming languages, you do not have to use a function to
print/output variables in R. You can just type the name of the variable:
Example
name <- "John Doe"
name # auto-print the value of the name variable
You can also concatenate, or join, two or more elements, by using the paste() function.
Example
text <- "awesome"
paste("R is", text) # joins the two strings, separated by a space
Variable names follow these rules:
• A variable name must start with a letter or a period (.) and can be a combination of letters, digits,
periods (.) and underscores (_). If it starts with a period (.), it cannot be followed by a digit.
• A variable name cannot start with a number or underscore (_)
• Variable names are case-sensitive (age, Age and AGE are three different variables)
• Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)
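The naming rules above can be tried out with a short sketch. It relies on make.names() from base R, which rewrites a string into a syntactically valid name, so a candidate is valid when it comes back unchanged. The candidate names are made up for illustration.

```r
# Candidate names (illustrative); which ones follow the rules above?
candidates <- c("my_var", ".my_var", "myVar2", "2my_var",
                "_my_var", ".2var", "TRUE", "if")

# make.names() returns a valid name for each input, so a candidate
# is valid exactly when it is returned unchanged.
is_valid <- make.names(candidates) == candidates

names(is_valid) <- candidates
is_valid
```

Only the first three candidates pass: the others start with a digit or underscore, start with a period followed by a digit, or are reserved words.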
Variables can store data of different types, and different types can do different things.
In R, variables do not need to be declared with any particular type, and can even change type
after they have been set:
Example
my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (aka string)
R has a variety of data types and object classes. You will learn much more about these as you
continue to get to know R.
We can use the class() function to check the data type of a variable:
Example
# numeric
x <- 10.5
class(x)
# integer
x <- 1000L
class(x)
# complex
x <- 9i + 3
class(x)
# character/string
x <- "R is exciting"
class(x)
# logical/boolean
x <- TRUE
class(x)
***************************************************************************
• Environment tab: It shows the variables created during the
session; they are held in a temporary workspace.
• History tab: This tab lists all the commands that have been run
since you started using RStudio.
• At the bottom right there is another panel, which contains multiple tabs: Files,
Plots, Packages, Help, and Viewer.
• The Files tab shows the files and directories that are available within
the default workspace of R.
• The Plots tab shows the plots that are generated during the course of
programming.
• The Packages tab lists the packages that are
already installed and also provides a user interface to
install new packages.
• The Help tab is the most important one; there you can get help from
the R documentation on R's built-in functions.
• The last tab is the Viewer tab, which can be used to see
local web content generated using R.
Features of RStudio
• A friendly user interface
• Writing and storing reusable programs
• All imported data and newly created objects (such as variables, functions, etc.) are
easily accessible.
• Comprehensive assistance for any item
• Code autocompletion
• The capacity to organise and share your work with your partners more effectively
through the creation of projects.
• Plot snippets
• Simple terminal and console switching
• Tracking of operational history
• There are numerous articles from RStudio Support on using the IDE.
Set the working directory in RStudio
R is always pointed at a directory on your computer, called the working directory. You can find out
which directory it is by running the getwd() function, which takes no arguments. You can set the working
directory manually in two ways:
• The first way is to use the console and run the command
setwd("directorypath").
Give setwd() the path of the directory that you
want to be the working directory for RStudio, in double quotes.
• The second way is to set the working directory from the GUI.
To do this, click the three-dots (...) button in the Files tab.
This opens a file browser in which you can
choose your working directory.
• Once you have chosen your working directory, click the More (settings) button
in the Files tab and, in the popup menu, select
"Set As Working Directory".
This makes the directory you chose in the file browser your
working directory. Once the working directory is set, you are ready to program in RStudio.
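The console route can be sketched as follows. To keep the example runnable anywhere, tempdir() stands in for a real project folder (an assumption for this sketch), and the original directory is restored at the end.

```r
old_dir <- getwd()   # remember the current working directory

# tempdir() stands in for a real project folder (assumption for this sketch).
setwd(tempdir())
new_dir <- getwd()   # now points at the temporary folder

setwd(old_dir)       # restore the original working directory
```

In practice you would pass your own path, in double quotes, to setwd().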
Create an RStudio project
Step 1: Open the File menu.
Step 2: Select the New Project option.
In the example below, we use the + operator to add together two values:
Example
10 + 5
R divides its operators into the following groups:
• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
• Miscellaneous operators
R Arithmetic Operators
Arithmetic operators are used with numeric values to perform common mathematical
operations:
Operator   Name             Example
+          Addition         x + y
-          Subtraction      x - y
*          Multiplication   x * y
/          Division         x / y
^          Exponent         x ^ y
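A quick sketch of the five operators listed above, using two example values chosen only for illustration:

```r
x <- 7
y <- 2

x + y  # 9
x - y  # 5
x * y  # 14
x / y  # 3.5
x ^ y  # 49
```

Note that / performs ordinary (not integer) division, so 7 / 2 gives 3.5.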
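R Assignment Operators
Assignment operators are used to assign values to variables:
Example
my_var <- 3 # leftward assignment
my_var <<- 3 # leftward assignment in an enclosing (global) environment
3 -> my_var # rightward assignment
3 ->> my_var # rightward assignment in an enclosing (global) environment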
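At the top level the four forms above all end up setting my_var to 3; the difference between <- and <<- only shows inside a function. A minimal sketch, assuming two hypothetical helper functions, bump_local() and bump_global(), written just for this illustration:

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1    # creates a local copy; the outer counter is unchanged
  counter
}

bump_global <- function() {
  counter <<- counter + 1   # updates counter in the enclosing environment
  counter
}

local_result  <- bump_local()   # 1
after_local   <- counter        # still 0
global_result <- bump_global()  # 1
after_global  <- counter        # now 1
```

For everyday scripts, <- is the form you will use almost all of the time.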
R Comparison Operators
Operator   Name           Example
==         Equal          x == y
!=         Not equal      x != y
>          Greater than   x > y
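Each comparison returns a logical value, TRUE or FALSE. A short sketch with two example values (the variable names are illustrative):

```r
x <- 10
y <- 3

is_equal   <- x == y  # FALSE
not_equal  <- x != y  # TRUE
is_greater <- x > y   # TRUE
```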
R Logical Operators
Operator Description
&& Logical AND operator - Returns TRUE if both statements are TRUE
R Miscellaneous Operators
***************************************************************************
R Decision Making
R If ... Else
Conditions and If Statements
R supports the usual comparison conditions, for example:
==    Equal        x == y
!=    Not equal    x != y
These conditions can be used in several ways, most commonly in "if statements" and loops.
The if Statement
An "if statement" is written with the if keyword, and it is used to specify a block of code to
be executed if a condition is TRUE:
Example
a <- 33
b <- 200
if (b > a) {
  print("b is greater than a")
}
In this example we use two variables, a and b, which are used as a part of the if statement to
test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than 33,
and so we print to screen that "b is greater than a".
Else If
The else if keyword is R's way of saying "if the previous conditions were not true, then try
this condition":
Example
a <- 33
b <- 33
if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
}
In this example a is equal to b, so the first condition is not true, but the else if condition is
true, so we print to screen that "a and b are equal".
If Else
The else keyword catches anything which isn't caught by the preceding conditions:
Example
a <- 200
b <- 33
if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
} else {
  print("a is greater than b")
}
In this example, a is greater than b, so the first condition is not true, also the else if condition
is not true, so we go to the else condition and print to screen that "a is greater than b".
You can also use else without else if:
Example
a <- 200
b <- 33
if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}
Nested If Statements
You can also have if statements inside if statements; these are called nested if statements.
Example
x <- 41
if (x > 10) {
  print("Above ten")
  if (x > 20) {
    print("and also above 20!")
  } else {
    print("but not above 20.")
  }
} else {
  print("10 or below.")
}
AND
The & symbol (and) is a logical operator, and is used to combine conditional statements:
Example
a <- 200
b <- 33
c <- 500
if (a > b & c > a) {
  print("Both conditions are true")
}
OR
The | symbol (or) is a logical operator, and is used to combine conditional statements:
Example
a <- 200
b <- 33
c <- 500
if (a > b | a > c) {
  print("At least one of the conditions is true")
}
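The AND and OR examples above use single values. The sketch below shows how & and | behave on vectors, next to the && operator listed under R Logical Operators. The names d, left and right are illustrative; d is used instead of c only to avoid confusion with R's c() function.

```r
a <- 200
b <- 33
d <- 500

# For single values, & and && give the same answer.
both_single <- (a > b) & (d > a)    # TRUE
both_scalar <- (a > b) && (d > a)   # TRUE

# On vectors, & and | compare element by element.
left  <- c(TRUE, TRUE, FALSE)
right <- c(TRUE, FALSE, FALSE)
elementwise_and <- left & right     # TRUE FALSE FALSE
elementwise_or  <- left | right     # TRUE TRUE FALSE
```

Because an if condition needs a single TRUE or FALSE, && is a common choice there, while & is the usual choice when working with whole vectors.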
***************************************************************************
R Looping Statements
R While Loop
Loops
Loops are handy because they save time, reduce errors, and they make code more readable.
R has two loop commands:
• while loops
• for loops
R While Loops
With the while loop we can execute a set of statements as long as a condition is TRUE:
Example
i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
}
In the example above, the loop will continue to produce numbers ranging from 1 to 5. The
loop will stop at 6 because 6 < 6 is FALSE.
The while loop requires the relevant variables to be ready. In this example we need to define an
indexing variable, i, which we set to 1.
Break
With the break statement, we can stop the loop even if the while condition is TRUE:
Example
The loop will stop at 3 because we have chosen to finish the loop by using
the break statement when i is equal to 4 (i == 4).
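The code for this example is not shown above. A minimal sketch consistent with the description (print 1 to 3, then break once i is equal to 4) might look like this; the vector printed is an added name, used only to record what the loop prints.

```r
printed <- c()            # records what the loop prints (added for illustration)
i <- 1
while (i < 6) {
  print(i)
  printed <- c(printed, i)
  i <- i + 1
  if (i == 4) {
    break                 # leave the loop even though i < 6 is still TRUE
  }
}
```

The loop prints 1, 2 and 3, and i is 4 when the loop ends.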
Next
With the next statement, we can skip an iteration without terminating the loop:
Example
i <- 0
while (i < 6) {
  i <- i + 1
  if (i == 3) {
    next
  }
  print(i)
}
When the loop passes the value 3, it will skip it and continue to loop.
Yahtzee!
If ... Else Combined with a While Loop
Example
dice <- 1
while (dice <= 6) {
  if (dice < 6) {
    print("No Yahtzee")
  } else {
    print("Yahtzee!")
  }
  dice <- dice + 1
}
R For Loop
For Loops
Example
for (x in 1:10) {
  print(x)
}
A for loop in R is less like the for keyword in other programming languages, and works more like an
iterator method as found in other object-oriented programming languages.
With the for loop we can execute a set of statements, once for each item in a vector, array,
list, etc.
You will learn about lists, vectors, etc., in a later chapter.
Example
fruits <- list("apple", "banana", "cherry") # example values
for (x in fruits) {
  print(x)
}
Example
dice <- c(1, 2, 3, 4, 5, 6) # the six faces of a die
for (x in dice) {
  print(x)
}
The for loop does not require an indexing variable to be set beforehand, unlike the while loop.
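The contrast with while loops can be sketched by collecting the same items both ways. The fruits vector and the names from_while and from_for are assumptions made for this illustration.

```r
fruits <- c("apple", "banana", "cherry")  # example data (assumed values)

# while loop: the index i must be created and advanced by hand.
from_while <- character(0)
i <- 1
while (i <= length(fruits)) {
  from_while <- c(from_while, fruits[i])
  i <- i + 1
}

# for loop: x takes each value in turn, with no index bookkeeping.
from_for <- character(0)
for (x in fruits) {
  from_for <- c(from_for, x)
}

identical(from_while, from_for)  # TRUE
```

Both loops visit the same items in the same order; the for loop simply needs less setup.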
Break
With the break statement, we can stop the loop before it has looped through all the items:
Example
fruits <- list("apple", "banana", "cherry") # example values
for (x in fruits) {
  if (x == "cherry") {
    break
  }
  print(x)
}
The loop will stop at "cherry" because we have chosen to finish the loop by using
the break statement when x is equal to "cherry" (x == "cherry").
***************************************************************************