
Unit 3 Big Data

R is a programming language widely used for statistical computing and data visualization, offering a range of features for data analysis, machine learning, and graphical representation. It is open-source, platform-independent, and supported by a large community, with numerous packages available for various tasks. The document also covers R's syntax, variable creation, data types, operators, and the R Studio IDE, providing a comprehensive introduction for beginners.

Uploaded by

John

Introduction to R

What is R

R is a popular programming language used for statistical computing and graphical
presentation.

Its most common use is to analyze and visualize data.

Why Use R?

• It is a great resource for data analysis, data visualization, data science and machine
learning
• It provides many statistical techniques (such as statistical tests, classification,
clustering and data reduction)
• It is easy to draw graphs in R, such as pie charts, histograms, box plots, and scatter plots
• It works on different platforms (Windows, Mac, Linux)
• It is open-source and free
• It has large community support
• It has many packages (libraries of functions) that can be used to solve different
problems

R Print Output
Print

Unlike many other programming languages, R can output a value without a print
function. Typing an expression at the console prints its result:

Example
"Hello World!"

However, R does have a print() function available if you want to use it. This might be useful
if you are familiar with other programming languages, such as Python, which uses
the print() function to output values.

Example
print("Hello World!")

There are also times when you must use the print() function to produce output, for example
inside for loops, where values are not printed automatically (you will learn more about
loops in a later chapter):
Example
for (x in 1:10) {
  print(x)
}
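The difference between auto-printing and print() can be checked with a minimal sketch. It assumes capture.output() from base R to collect what each loop writes; the names silent and shown are illustrative only.

```r
# Inside a loop, a bare expression is evaluated but not auto-printed.
silent <- capture.output(for (x in 1:3) x)

# Wrapping the value in print() produces one output line per iteration.
shown <- capture.output(for (x in 1:3) print(x))

length(silent)  # 0 lines of output
length(shown)   # 3 lines of output
```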
R Comments
Comments

Comments can be used to explain R code and to make it more readable. They can also be used to
prevent execution when testing alternative code.

Comments start with a #. When executing code, R ignores everything from the # to the end of the line.

This example uses a comment before a line of code:

Example
# This is a comment
"Hello World!"

This example uses a comment at the end of a line of code:

Example
"Hello World!" # This is a comment
R Variables
Creating Variables in R

Variables are containers for storing data values.

R does not have a command for declaring a variable. A variable is created the moment you
first assign a value to it. To assign a value to a variable, use the <- sign. To output (or print)
the variable value, just type the variable name:

Example
name <- "John"
age <- 40

name # output "John"
age # output 40
Print / Output Variables

Compared to many other programming languages, you do not have to use a function to
print/output variables in R. You can just type the name of the variable:
Example
name <- "John Doe"

name # auto-print the value of the name variable


R Concatenate Elements
Concatenate Elements

You can concatenate, or join, two or more elements by using the paste() function.

To combine text and a variable, pass them to paste() as arguments separated by a comma (,):

Example
text <- "awesome"

paste("R is", text)
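As a small sketch of how the separator can be controlled: by default paste() puts a space between its arguments. The sep argument and the paste0() function are part of base R but are not covered in the text above, so treat their use here as an illustrative extension.

```r
text <- "awesome"

# By default paste() joins its arguments with a single space.
with_space <- paste("R is", text)

# The sep argument changes the separator; paste0() uses none at all.
with_dash <- paste("R", "is", text, sep = "-")
no_sep <- paste0("R", "is", text)

with_space  # "R is awesome"
with_dash   # "R-is-awesome"
no_sep      # "Risawesome"
```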


R Variable Names (Identifiers)
Variable Names
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for R variables are:

• A variable name must start with a letter or a period (.) and can be a combination of
letters, digits, periods (.) and underscores (_). If it starts with a period (.), the
period cannot be followed by a digit.
• A variable name cannot start with a number or underscore (_)
• Variable names are case-sensitive (age, Age and AGE are three different variables)
• Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)

# Legal variable names:


myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"

# Illegal variable names:


2myvar <- "John"
my-var <- "John"
my var <- "John"
_my_var <- "John"
my_v@ar <- "John"
TRUE <- "John"
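One way to test a candidate name without triggering a syntax error is sketched below. It relies on base R's make.names(), which rewrites a string into a syntactically valid name; the helper is_valid_name is hypothetical and simply checks whether make.names() had to change anything.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# leaves it unchanged (make.names() repairs invalid or reserved names).
is_valid_name <- function(name) {
  make.names(name) == name
}

candidates <- c("myvar", "my_var", ".myvar", "2myvar", "my-var", "_my_var", "TRUE")
valid <- is_valid_name(candidates)
names(valid) <- candidates
valid  # TRUE for the first three names, FALSE for the rest
```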
R Data Types
Data Types

In programming, data type is an important concept.

Variables can store data of different types, and different types can do different things.

In R, variables do not need to be declared with any particular type, and can even change type
after they have been set:

Example
my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (aka string)

R has a variety of data types and object classes. You will learn much more about these as you
continue to get to know R.

Basic Data Types

Basic data types in R can be divided into the following types:

• numeric - (10.5, 55, 787)
• integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
• complex - (9 + 3i, where "i" is the imaginary part)
• character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
• logical (a.k.a. boolean) - (TRUE or FALSE)

We can use the class() function to check the data type of a variable:

Example
# numeric
x <- 10.5
class(x)

# integer
x <- 1000L
class(x)

# complex
x <- 9i + 3
class(x)

# character/string
x <- "R is exciting"
class(x)

# logical/boolean
x <- TRUE
class(x)
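The sketch below shows how a value's type can be inspected and converted. The functions typeof(), is.numeric(), as.integer() and as.character() are base R but are not introduced in the text above, so this is an illustrative extension rather than part of the original examples.

```r
x <- 10.5
class(x)        # "numeric"
typeof(x)       # "double": the storage mode behind class "numeric"

# Conversion functions return a new value of the requested type.
y <- as.integer(x)   # drops the fractional part, giving 10L
class(y)        # "integer"

z <- as.character(x)
class(z)        # "character"

is.numeric(y)   # TRUE: integers also count as numeric
```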

***************************************************************************

R Graphical User Interface


Introduction to R Studio
R Studio is an integrated development environment (IDE) for R. An IDE is a GUI where you can
write your code, see the results, and inspect the variables that are created during the
course of programming.
• R Studio is available as both Open source and Commercial software.
• R Studio is also available as both Desktop and Server versions.
• R Studio is also available for various platforms such as Windows, Linux, and
macOS.
Introduction to R studio for beginners:
R Studio is an open-source tool that provides an IDE for the R language. It is also offered as
enterprise-ready professional software that lets data science teams develop and share their work.
R Studio can be downloaded from its official website (https://rstudio.com/), and instructions
for installation are available in "How to Install RStudio for R programming in Windows?".
After the installation process is over, the R Studio interface is laid out as follows:

• The console panel (left panel) is where R waits for you to tell it what to do, and
where you see the results of the commands you type.
• To the top right, you have the Environment/History panel. It contains 2 tabs:

    • Environment tab: It shows the variables that are created during the
    course of programming, held in a temporary workspace.
    • History tab: It shows all the commands that have been used since
    you started using R Studio.
• To the bottom right, you have another panel, which contains multiple tabs, such as
Files, Plots, Packages, Help, and Viewer.

    • The Files tab shows the files and directories that are available within
    the default workspace of R.
    • The Plots tab shows the plots that are generated during the course of
    programming.
    • The Packages tab shows the packages that are already installed and
    provides a user interface to install new packages.
    • The Help tab is where you can get help from the R documentation on
    the functions that are built into R.
    • The Viewer tab can be used to see local web content that is
    generated using R.
Features of R Studio
• A friendly user interface
• Writing and storing reusable programmes
• All imported data and newly created objects (such as variables, functions, etc.) are
easily accessible.
• Comprehensive help for any object
• Code autocompletion
• The capacity to organise and share your work with your partners more effectively
through the creation of projects.
• Plot snippets
• Simple terminal and console switching
• Tracking of operational history
• There are numerous articles from RStudio Support on using the IDE.
Set the working directory in R Studio
R is always pointed at a directory on our computer. We can find out which directory by
running the getwd() function. Note: this function takes no arguments. We can set the working
directory manually in two ways:
• The first way is to use the console and run the command setwd("directorypath").
Pass setwd() the path of the directory that you want to be the working directory
for R Studio, in double quotes.
• The second way is to set the working directory from the GUI.
To do so, click the three-dots button. This opens a file browser, which lets you
choose your working directory.

• Once you choose your working directory, click the settings button in the More menu.
In the popup menu that appears, select "Set as working directory".
This sets the directory you chose in the file browser as your working directory.
Once you set the working directory, you are ready to program in R Studio.
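The console route can be sketched as follows. The use of tempdir() is an assumption made only so that the example points at a directory that is sure to exist; in practice you would pass your own path to setwd().

```r
# Remember the current working directory so it can be restored.
old_dir <- getwd()

# tempdir() is used here only as a directory that is sure to exist;
# replace it with your own path, e.g. setwd("directorypath").
setwd(tempdir())
new_dir <- getwd()

# Go back to where we started.
setwd(old_dir)
```

Saving the old directory first is a design choice, not a requirement: it keeps the example from leaving the session pointed somewhere unexpected.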
Create an RStudio project
Step 1: Open the File menu.
Step 2: Then select the New Project option.

Step 3: Then choose the path and directory name.


Finally, the project is created in the specified location.
Navigating directories in R studio
• getwd(): Returns the current working directory.
• setwd(): Sets the working directory.
• dir(): Returns the list of files in a directory.
• sessionInfo(): Returns information about the current R session (R version, platform, loaded packages).
• date(): Returns the current date and time.
Creating your first R script
Here we are adding two numbers in R studio.
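A first script along these lines could look like the sketch below. The variable names and the values 10 and 5 are illustrative assumptions; any two numbers would do.

```r
# Store two numbers in variables (values chosen for illustration).
num1 <- 10
num2 <- 5

# Add them and print the result.
total <- num1 + num2
print(total)
```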
How to Perform Various Operations in RStudio
Here are some common tasks and the code used to perform them in R Studio.
Installing R packages
Syntax:
install.packages('package_name')
Loading R package
Syntax:
library(package_name)
Help on an R package
Syntax:
help(package = 'package_name')
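A common pattern is to install a package only when it is missing and then load it. The sketch below assumes the package name "stats", chosen because it ships with R and so nothing is downloaded; requireNamespace() and the character.only argument are base R features not mentioned above.

```r
# "stats" ships with R, so this sketch runs without downloading anything;
# substitute the package you actually need.
pkg <- "stats"

# requireNamespace() reports whether a package is installed without attaching it.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() take the name from a variable.
library(pkg, character.only = TRUE)
loaded <- pkg %in% loadedNamespaces()
```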
***************************************************************************
Features of R
Why Use R?
• Statistical Analysis: R is designed for analysis and provides an extensive
collection of graphical and statistical techniques, making it a preferred choice
for statisticians and data analysts.
• Open Source: R is open-source software, which means it is freely available
to anyone. It is supported by a vibrant community of users and developers.
• Data Visualization: R boasts an array of libraries like ggplot2 that enable the
creation of high-quality, customizable data visualizations.
• Data Manipulation: R offers tools for data manipulation and
transformation. For example, it simplifies the process of filtering, summarizing
and transforming data.
• Integration: R can be easily integrated with other programming languages and
data sources. It has connectors to various databases and can be used in
conjunction with Python, SQL and other tools.
• Community and Packages: R has a vast ecosystem of packages that extend its
functionality. There are packages to help with most analytics needs.
Features of R Programming Language
• R Packages: One of the major features of R is its wide availability of
libraries. R has CRAN (Comprehensive R Archive Network), which is a
repository holding more than 10,000 packages.
• Distributed Computing: Distributed computing is a model in which
components of a software system are shared among multiple computers to
improve efficiency and performance. Two packages used for distributed
programming in R, ddR and multidplyr, were released in November 2015.
Statistical Features of R
• Basic Statistics: The most common basic statistics terms are the mean, mode,
and median. These are all known as “Measures of Central Tendency.” So using
the R language we can measure central tendency very easily.
• Static graphics: R is rich with facilities for creating and developing interesting
static graphics. R contains functionality for many plot types including graphic
maps, mosaic plots, biplots, and the list goes on.
• Probability distributions: Probability distributions play a vital role in statistics
and by using R we can easily handle various types of probability distributions
such as Binomial Distribution, Normal Distribution, Chi-squared Distribution,
and many more.
• Data analysis: It provides a large, coherent, and integrated collection of tools
for data analysis.
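The measures of central tendency and the probability distributions listed above can be tried with a short sketch. The data in scores is made up for illustration, and stat_mode is a hypothetical helper, since base R has no built-in function for the statistical mode.

```r
scores <- c(2, 4, 4, 5, 7, 9, 4)

mean(scores)    # arithmetic mean: 5
median(scores)  # middle value: 4

# Base R has no function for the statistical mode (mode() reports the
# storage type), so this hypothetical helper picks the most frequent value.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
stat_mode(scores)  # 4

# Probability distributions: density and cumulative probability.
dnorm(0)                         # standard normal density at 0
pbinom(2, size = 5, prob = 0.5)  # P(X <= 2) for Binomial(5, 0.5): 0.5
```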
***************************************************************************
R Operators
Operators

Operators are used to perform operations on variables and values.

In the example below, we use the + operator to add together two values:

Example
10 + 5

R divides the operators in the following groups:

• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
• Miscellaneous operators

R Arithmetic Operators

Arithmetic operators are used with numeric values to perform common mathematical
operations:

Operator   Name                                Example
+          Addition                            x + y
-          Subtraction                         x - y
*          Multiplication                      x * y
/          Division                            x / y
^          Exponent                            x ^ y
%%         Modulus (Remainder from division)   x %% y
%/%        Integer Division                    x %/% y
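To see these operators in action, here is a minimal sketch. The values 17 and 5 are arbitrary choices for illustration.

```r
x <- 17
y <- 5

x + y    # 22
x - y    # 12
x * y    # 85
x / y    # 3.4
x ^ 2    # 289
x %% y   # 2  (remainder of 17 divided by 5)
x %/% y  # 3  (whole number of times 5 fits into 17)
```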

R Assignment Operators

Assignment operators are used to assign values to variables:

Example
my_var <- 3

my_var <<- 3

3 -> my_var

3 ->> my_var

my_var # print my_var
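The difference between <- and <<- only shows up inside functions. The sketch below is an illustration with made-up names (counter, set_local, set_global): <- creates a variable local to the function, while <<- assigns to a variable of that name in an enclosing environment.

```r
counter <- 0

# <- inside a function creates a local variable; the outer one stays unchanged.
set_local <- function() {
  counter <- 10
}

# <<- assigns to the variable found in an enclosing environment,
# here the counter defined at the top of this example.
set_global <- function() {
  counter <<- 10
}

set_local()
after_local <- counter    # still 0

set_global()
after_global <- counter   # now 10
```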


R Comparison Operators

Comparison operators are used to compare two values:

Operator   Name                       Example
==         Equal                      x == y
!=         Not equal                  x != y
>          Greater than               x > y
<          Less than                  x < y
>=         Greater than or equal to   x >= y
<=         Less than or equal to      x <= y

R Logical Operators

Logical operators are used to combine conditional statements:

Operator   Description
&          Element-wise logical AND. Returns TRUE if both elements are TRUE
&&         Logical AND for single values. Returns TRUE if both statements are TRUE
|          Element-wise logical OR. Returns TRUE if at least one of the elements is TRUE
||         Logical OR for single values. Returns TRUE if at least one of the statements is TRUE
!          Logical NOT. Returns FALSE if the statement is TRUE, and TRUE if it is FALSE
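The element-wise operators work position by position on vectors, while && and || are intended for single TRUE/FALSE values. A minimal sketch with made-up values:

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

a & b    # TRUE FALSE FALSE : compares the vectors position by position
a | b    # TRUE TRUE FALSE
!a       # FALSE FALSE TRUE

# && and || are meant for single TRUE/FALSE values, as in an if condition.
p <- TRUE
q <- FALSE
p && q   # FALSE
p || q   # TRUE
```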

R Miscellaneous Operators

Miscellaneous operators are used to manipulate data:

Operator   Description                                  Example
:          Creates a series of numbers in a sequence    x <- 1:10
%in%       Find out if an element belongs to a vector   x %in% y
%*%        Matrix Multiplication                        x <- Matrix1 %*% Matrix2
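A short sketch of the three operators follows. The matrices m1 and m2 are made-up examples, and matrix() and diag() are base R functions not introduced above.

```r
x <- 1:5                     # the sequence 1 2 3 4 5

3 %in% x                     # TRUE
c(2, 9) %in% x               # TRUE FALSE

# Two 2 x 2 matrices, filled column by column.
m1 <- matrix(1:4, nrow = 2)  # columns: (1, 2) and (3, 4)
m2 <- diag(2)                # identity matrix
m1 %*% m2                    # multiplying by the identity returns m1's values
```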

***************************************************************************
R Decision Making
R If ... Else
Conditions and If Statements

R supports the usual logical conditions from mathematics:

Operator   Name                       Example
==         Equal                      x == y
!=         Not equal                  x != y
>          Greater than               x > y
<          Less than                  x < y
>=         Greater than or equal to   x >= y
<=         Less than or equal to      x <= y

These conditions can be used in several ways, most commonly in "if statements" and loops.

The if Statement

An "if statement" is written with the if keyword, and it is used to specify a block of code to
be executed if a condition is TRUE:

Example
a <- 33
b <- 200
if (b > a) {
  print("b is greater than a")
}

In this example we use two variables, a and b, which are used as a part of the if statement to
test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than 33,
and so we print to screen that "b is greater than a".

R uses curly brackets { } to define the scope in the code.

Else If

The else if keyword is R's way of saying "if the previous conditions were not true, then try
this condition":

Example
a <- 33
b <- 33

if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
}

In this example a is equal to b, so the first condition is not true, but the else if condition is
true, so we print to screen that "a and b are equal".

You can use as many else if statements as you want in R.

If Else

The else keyword catches anything which isn't caught by the preceding conditions:

Example
a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
} else {
  print("a is greater than b")
}

In this example, a is greater than b, so the first condition is not true, also the else if condition
is not true, so we go to the else condition and print to screen that "a is greater than b".

You can also use else without else if:

Example
a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}
Nested If Statements

You can also have if statements inside if statements. These are called nested if statements.

Example
x <- 41

if (x > 10) {
  print("Above ten")
  if (x > 20) {
    print("and also above 20!")
  } else {
    print("but not above 20.")
  }
} else {
  print("not above 10.")
}
AND

The & symbol (and) is a logical operator, and is used to combine conditional statements:

Example

Test if a is greater than b, AND if c is greater than a:

a <- 200
b <- 33
c <- 500
if (a > b & c > a) {
  print("Both conditions are true")
}
OR

The | symbol (or) is a logical operator, and is used to combine conditional statements:

Example

Test if a is greater than b, or if a is greater than c:

a <- 200
b <- 33
c <- 500

if (a > b | a > c) {
  print("At least one of the conditions is true")
}
***************************************************************************
R Looping Statements
R While Loop
Loops

Loops can execute a block of code repeatedly, as long as a specified condition holds.

Loops are handy because they save time, reduce errors, and they make code more readable.

This section covers two of R's loop commands:

• while loops
• for loops

R While Loops

With the while loop we can execute a set of statements as long as a condition is TRUE:

Example

Print i as long as i is less than 6:

i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
}

In the example above, the loop will continue to produce numbers ranging from 1 to 5. The
loop will stop at 6 because 6 < 6 is FALSE.

The while loop requires relevant variables to be ready. In this example we need to define an
indexing variable, i, which we set to 1.

Note: remember to increment i, or else the loop will continue forever.

Break

With the break statement, we can stop the loop even if the while condition is TRUE:

Example

Exit the loop if i is equal to 4.


i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
  if (i == 4) {
    break
  }
}

The loop will stop at 3 because we have chosen to finish the loop by using
the break statement when i is equal to 4 (i == 4).

Next

With the next statement, we can skip an iteration without terminating the loop:

Example

Skip the value of 3:

i <- 0
while (i < 6) {
  i <- i + 1
  if (i == 3) {
    next
  }
  print(i)
}

When the loop passes the value 3, it will skip it and continue to loop.

Yahtzee!
If .. Else Combined with a While Loop

To demonstrate a practical example, let us say we play a game of Yahtzee!

Example

Print "Yahtzee!" if the dice number is 6:

dice <- 1
while (dice <= 6) {
  if (dice < 6) {
    print("No Yahtzee")
  } else {
    print("Yahtzee!")
  }
  dice <- dice + 1
}
R For Loop
For Loops

A for loop is used for iterating over a sequence:

Example
for (x in 1:10) {
  print(x)
}

This is less like the for keyword in other programming languages, and works more like an
iterator method as found in other object-orientated programming languages.

With the for loop we can execute a set of statements, once for each item in a vector, array,
list, etc.

You will learn about lists, vectors, etc. in a later chapter.

Example

Print every item in a list:

fruits <- list("apple", "banana", "cherry")

for (x in fruits) {
  print(x)
}
Example

Print each number on a die:

dice <- c(1, 2, 3, 4, 5, 6)

for (x in dice) {
  print(x)
}

The for loop does not require an indexing variable to be set beforehand, unlike the while loop.

Break

With the break statement, we can stop the loop before it has looped through all the items:
Example

Stop the loop at "cherry":

fruits <- list("apple", "banana", "cherry")

for (x in fruits) {
  if (x == "cherry") {
    break
  }
  print(x)
}

The loop will stop at "cherry" because we have chosen to finish the loop by using
the break statement when x is equal to "cherry" (x == "cherry").
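The next statement shown earlier for while loops can be used in a for loop in the same way. The sketch below skips "banana"; the printed vector is a hypothetical addition that only records which items were reached.

```r
fruits <- list("apple", "banana", "cherry")
printed <- character(0)

for (x in fruits) {
  if (x == "banana") {
    next            # skip "banana" and move on to the next item
  }
  print(x)
  printed <- c(printed, x)
}
```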

***************************************************************************
