Text Book of Principal R Programming For Data Analytics - 05

This document discusses RStudio, an integrated development environment for the R programming language. It provides a customizable workbench with tools for working with R like a console, editor, plots, workspace, and help. The editor has features like syntax highlighting and code completion. Code can be executed directly from the editor by line, selection, or file. RStudio also supports authoring Sweave and TeX documents and runs on Windows, Mac, and Linux. Real-life uses of R by companies like Facebook, Ford, Google, and the National Weather Service are also summarized.

Book I: Module & Tutorial

Principle of R Programming Language


for Data Processing and Analysis
By Azhari
Department of Computer Science and Electronics
Universitas Gadjah Mada

Book
R for Fundamental Data Analysis in Market Research
Sujata Ramnarayan

2020-08-20

Chapter I
Introduction R & Application
1 Chapter I Introduction

1.1 Overview
R is a programming language and open-source software environment that is broadly used for
purposes such as data analysis, graphing, and reporting. R is commonly used in statistical
analysis, scientific computing, machine learning, and data visualization. Because it also
supports general programming, it is more powerful than some other statistical tools for data
processing and analysis.
R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand. R made its first appearance in 1993. Since mid-1997 there has been a core group
(called the "R Core Team") who can modify the R source code archive.
At its core, R is an interpreted computer language that supports modular programming with
branching, looping, and functions. R can integrate procedures written in C, C++, .NET, Python,
or Fortran for greater efficiency.
R is freely available under the GNU General Public License and comes with pre-compiled
binaries for various operating systems including Linux, Windows and Mac. R is free software
distributed under a GNU-style copyleft and is an official part of the GNU Project, where it is
known as GNU S.

1.2 Features of R
The following are the important features of R:
1. R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions, and input and output facilities.
2. R has an effective data handling and storage facility.
3. R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
4. R provides a large, coherent and integrated collection of tools for data analysis.
5. R provides graphical facilities for data analysis and display, either directly on the
computer screen or printed on paper.
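Features 1 and 3 can be illustrated with a short sketch. The function name factorial_recursive and the sample values are assumptions chosen for illustration; they do not come from the original module.

```r
# Hypothetical example: a user-defined recursive function with a conditional
factorial_recursive <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_recursive(n - 1)
}

# A loop that calls the function for several inputs
for (k in 1:5) {
  cat(k, "! =", factorial_recursive(k), "\n")
}

# Operators act element-wise on whole vectors and matrices
heights <- c(1.52, 1.715, 1.65)   # assumed sample values
weights <- c(81, 93, 78)
ratio <- weights / heights^2
print(round(ratio, 1))

m <- matrix(1:4, nrow = 2)
print(m * 2)
```

The same arithmetic operator works on a single number, a vector, or a matrix, which is why many calculations in R need no explicit loop.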
R is one of the most widely used statistical programming languages. It is a popular choice
among data scientists and is backed by an active community of contributors. R is taught in
universities and used in mission-critical business applications.

1.3 Applications of R Programming


Some important applications of the R programming language in the domain of data
science are listed below (source: https://data-flair.training/blogs/r-applications/).

(1) Finance
Data science is widely used in the financial industry, and R is a popular tool for this
role, because it provides an advanced statistical suite that can carry out the necessary
financial tasks.

With the help of R, financial institutions can measure downside risk, adjust risk
performance, and use visualizations such as candlestick charts, density plots, and
drawdown plots.
R also provides tools for moving averages, autoregression and time-series analysis, which
form the crux of financial applications. R is widely used for credit risk analysis at firms
like ANZ and for portfolio management.
The finance industry also uses R's time-series statistical methods to model stock-market
movements and predict share prices. R provides facilities for financial data mining through
packages such as quantmod, pdfetch, TFX, and pwt, which make it easy to extract data from
online sources. With the help of Shiny, you can also present financial products through
vivid and engaging visualizations.
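As a rough illustration of the moving-average and autoregression tools mentioned above, the sketch below uses only base R (the stats package). The price values are invented for illustration and are not real market data.

```r
# Hypothetical closing prices, invented for illustration only
prices <- c(100, 102, 101, 105, 107, 106, 110, 108, 111, 115)

# 3-point trailing moving average; sides = 1 uses current and past values only
ma3 <- stats::filter(prices, rep(1 / 3, 3), sides = 1)
print(ma3)

# Treat the prices as a time series and fit a first-order autoregressive model
price_ts <- ts(prices)
fit <- ar(price_ts, order.max = 1, aic = FALSE)
print(fit$ar)
```

The first two entries of the moving average are NA because a 3-point window is not yet full at those positions.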
(2) Banking
Just like financial institutions, banks use R for credit risk modeling and
other forms of risk analytics.
Banks make heavy use of the Mortgage Haircut Model, which allows them to take over the
property in case of loan default. Mortgage haircut modelling involves the sales price
distribution, the volatility of the sales price, and the calculation of expected shortfall. For these
purposes, R is often used alongside proprietary tools like SAS.
R is also used in conjunction with Hadoop to facilitate the analysis of customer quality,
customer segmentation, and retention.
Bank of America uses R for financial reporting. With the help of R, the data scientists
at Bank of America can analyze financial losses and make use of R's visualization tools.

(3) Healthcare
Genetics, bioinformatics, drug discovery, and epidemiology are some of the fields in healthcare
that make heavy use of R. With the help of R, organizations in these fields can crunch data and
process information, providing an essential backdrop for further analysis.
In drug discovery, R is widely used for analyzing pre-clinical trial and drug-safety data. It also
provides a suite for exploratory data analysis and vivid visualization tools.
R is also popular for the Bioconductor project, which provides packages for
analyzing genomic data. R is also used for statistical modeling in the field
of epidemiology, where data scientists analyze and predict the spread of diseases.

(4) Social Media


For many beginners in data science and R, social media is a data playground. Sentiment
analysis and other forms of social media data mining are some of the important statistical
techniques that are used with R.
Social media is also a challenging field for data science because the data on social
media websites is mostly unstructured. R is used for social media analytics, for
segmenting potential customers, and for targeting them with product offers.
Mining user sentiment is another popular category in social media analytics.
With the help of R, companies can build statistical models that analyze user sentiment,
allowing them to improve the user experience.

(8) Real-Life Use Cases of R Language


A list of application areas is incomplete without examples of how people and companies
actually use the R programming language.
1. Facebook – Facebook uses R for behavior analysis related to status updates and its social
network graph. It is also used for predicting colleague interactions.
2. Ford Motor Company – Ford relies on Hadoop. It also relies on R for statistical analysis
and for data-driven support for decision making.
3. Google – Google uses R to calculate ROI on advertising campaigns, to predict
economic activity, and to improve the efficiency of online advertising.
4. Foursquare – R is an important part of the stack behind Foursquare’s recommendation engine.
5. John Deere – Statisticians at John Deere use R for time series modeling and
geospatial analysis in a reliable and reproducible way. The results are then integrated with
Excel and SAP.
6. Microsoft – Microsoft uses R for the Xbox matchmaking service and as a statistical
engine within the Azure ML framework.
7. Mozilla – The foundation behind the Firefox web browser uses R to visualize
web activity.
8. New York Times – R is used in the news cycle at The New York Times to crunch data
and prepare graphics before they go to print.
9. Thomas Cook – Thomas Cook uses R for prediction and for fuzzy logic systems that
automate the price settings of its last-minute offers.
10. National Weather Service – The National Weather Service uses R at its River Forecast
Centers, where it is used to generate graphics for flood forecasting.
11. Twitter – R is part of Twitter’s data science toolbox for sophisticated statistical modeling.
12. Trulia – Trulia, the real-estate analysis website, uses R for predicting house prices and
local crime rates.
13. ANZ Bank – ANZ, the fourth largest bank in Australia, uses R for its credit risk analysis.

Chapter II
R IDE (Integrated Development Environment)
2 IDE (Integrated Development Environment)

2.1 RStudio
2.1.1 Overview of RStudio
RStudio is an integrated development environment (IDE) for the R programming language.
Some of its features include:

• Customizable workbench with all of the tools required to work with R in one place
(console, source, plots, workspace, help, history, etc.).
• Syntax highlighting editor with code completion.
• Execute code directly from the source editor (line, selection, or file).
• Full support for authoring Sweave and TeX documents.
• Runs on Windows, Mac, and Linux, and has a community-maintained FreeBSD port.
• Can also be run as a server, enabling multiple users to access the RStudio IDE using a
web browser.

Figure 2.1 RStudio IDE



2.2 Simple R Program with R Studio


A simple program in RStudio can be viewed and tried in the book module: Module 01.

Figure 2.2 RStudio IDE with Simple R Code (panes: R Script Code, R Environment, R Console)

• R Script: As the name suggests, this is where you write code. To run it,
select the line(s) of code and press Ctrl + Enter. Alternatively, you can click
the small ‘Run’ button at the top right corner of the R Script pane.
• R Console: This area shows the output of the code you run. You can also type
code directly in the console, but code entered there cannot easily be traced later. This is
where the R script comes into use.
• R Environment: This space displays the objects that have been created or loaded, including
data sets, variables, vectors, and functions. To check whether data has been loaded properly
in R, always look at this area.
• Graphical Output: This space displays the graphs created during exploratory data
analysis. Besides graphs, it also lets you select packages and seek help in R’s embedded
official documentation.

2.3 Visual Studio Code


2.3.1 R Programming in Visual Studio Code
The Visual Studio Code IDE (VS Code) can support and run the R programming language
once the R extension is installed in the VS Code environment. The R extension for VS Code
provides extended syntax highlighting, code completion, linting, formatting,
interaction with R terminals, viewers for data, plots, workspace variables, and help pages,
and package management.

Figure 2.3 Adding the R extension to VS Code (see more at: https://code.visualstudio.com/docs/languages/r)

2.3.2 Simple R Program with Visual Studio Code



Figure 2.5 Run the R program codes with the OneCompiler IDE (at https://onecompiler.com/r)

2.4.2 JDoodle R Compiler

JDoodle is an online programming platform where you can learn, teach, practice, develop,
assess and collaborate. JDoodle aims to be a one-stop shop for anything programming.
Currently, JDoodle has the following services:

• An online compiler and IDE service for 76+ languages and 2 databases, with
collaborative programming and code sharing features.
• A REST-based compiler API to integrate compilers into your applications.
• An IDE plugin solution to include IDEs in your web applications without using APIs.
• An online assessment and course platform for teaching and assessing
programming.
• Fullscreen mode: side-by-side code and output is available. Click the icon near the Execute
button to switch.
• Dark theme: click the icon near the Execute button and select the dark theme.

Figure 2.6 Run the R program codes with the JDoodle IDE (at https://www.jdoodle.com/execute-r-online/)

2.4.3 Online IDE R Compiler

Online R Compiler is a web-based tool powered by the ACE code editor. This tool can be used to
learn, build, run, and test your programs. You can open code from your local machine and continue to
build it in this IDE. Scripts and results can be downloaded. Features of this tool:

• Simple and clean design; lightweight, easy, and fast
• Interactive program execution, which lets the user supply program inputs in real
time
• Helpful for beginners to learn and practice programs
• Dark and light theme options and a customizable code editor with more themes
• Options to copy or download the output of the program
• Expandable output terminal
• Code sharing option that saves your code in the cloud so that it can be accessed
anytime and anywhere with an internet connection

Operator Description Example

print(v^t)
It produces the following result:
[1] 256.000 166.375 1296.000

5.2 Relational Operators


The following table shows the relational operators supported by the R language. Each element of the
first vector is compared with the corresponding element of the second vector. The result of
each comparison is a Boolean (logical) value.

Operator: >
Description: Checks if each element of the first vector is greater than the corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v>t)
It produces the following result:
[1] FALSE TRUE FALSE FALSE

Operator: <
Description: Checks if each element of the first vector is less than the corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v < t)
It produces the following result:
[1] TRUE FALSE TRUE FALSE

Operator: ==
Description: Checks if each element of the first vector is equal to the corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v == t)
It produces the following result:
[1] FALSE FALSE FALSE TRUE

Operator: <=
Description: Checks if each element of the first vector is less than or equal to the corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v<=t)
It produces the following result:
[1] TRUE FALSE TRUE TRUE

Operator: >=
Description: Checks if each element of the first vector is greater than or equal to the corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v>=t)
It produces the following result:
[1] FALSE TRUE FALSE TRUE

Operator: !=
Description: Checks if each element of the first vector is unequal to the corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v!=t)
It produces the following result:
[1] TRUE TRUE TRUE FALSE

Source: https://www.tutorialspoint.com/r/r_operators.htm
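Because a comparison returns one logical value per element, its result is often used to select elements. The sketch below reuses the vectors v and t from the table; the variable name is_smaller is an illustrative assumption.

```r
v <- c(2, 5.5, 6, 9)
t <- c(8, 2.5, 14, 9)

# The comparison returns a logical vector with one entry per pair of elements
is_smaller <- v < t

# Use the logical vector to keep only the matching elements of v
print(v[is_smaller])

# which() returns the positions where the condition holds
print(which(v >= t))

# any() and all() summarise the element-wise result as a single value
print(any(v == t))
print(all(v != t))
```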

It produces the following result:
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i

Operator: -> or ->>
Description: Called right assignment.
Example:
c(3,1,TRUE,2+3i) -> v1
c(3,1,TRUE,2+3i) ->> v2
print(v1)
print(v2)
It produces the following result:
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i

5.5 Miscellaneous Operators


These operators are used for specific purposes rather than for general mathematical or
logical computation.

Operator: :
Description: Colon operator. It creates a sequence of consecutive numbers as a vector.
Example:
v <- 2:8
print(v)
It produces the following result:
[1] 2 3 4 5 6 7 8

Operator: %in%
Description: This operator is used to identify whether an element belongs to a vector.
Example:
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)
It produces the following result:
[1] TRUE
[1] FALSE

Operator: %*%
Description: Matrix multiplication operator. In the example it multiplies a matrix by its transpose.
Example:
M = matrix(c(2,6,5,1,10,4), nrow = 2, ncol = 3, byrow = TRUE)
t = M %*% t(M)
print(t)
It produces the following result:
     [,1] [,2]
[1,]   65   82
[2,]   82  117
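A common source of confusion is the difference between the element-wise operator * and the matrix product %*%. The following sketch contrasts the two; the matrices A and B are small illustrative values that are not taken from the original table.

```r
A <- matrix(1:4, nrow = 2)             # 2 x 2 matrix, filled by column
B <- matrix(c(0, 1, 1, 0), nrow = 2)   # swaps the columns of A when multiplied

print(A * B)     # element-wise product
print(A %*% B)   # matrix product

# %in% is vectorised over its left operand
print(c(3, 8, 12) %in% 1:10)
```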

2.4.5 Paiza R Online

Figure 2.9 Run the R program codes with Paiza R Online (at https://paiza.io/en/projects/new?language=r)

# Example 2
students_score_test1 <- c(79, 82, 84, 91, 83, 88)
students_score_test2 <- c(87, 80, 85, 90, 95, 76)

# Print each student's index, both test scores, and their sum
for (i in 1:6) {
  cat("\n", i, students_score_test1[i], students_score_test2[i],
      students_score_test1[i] + students_score_test2[i])
}

# Example 3
randomdata1 <- rnorm(30)  # create a vector filled with random normal values

# Print each value and accumulate the total
tot <- 0
for (i in seq_along(randomdata1)) {
  cat("\n", format(i, width = 2, justify = "right"),
      format(randomdata1[i], width = 8, justify = "right", digits = 2))
  tot <- tot + randomdata1[i]
}
cat("\ntotal =", tot)

(2) Next Statement


A next statement is one of the control statements in R programming that is used to skip the
current iteration of a loop without terminating the loop. Whenever a next statement is
encountered, further evaluation of the code is skipped and the next iteration of the loop starts.

R Syntax & Code, Comments & Output

for (value in vector) {
  # commands / block of code
  if (condition) {
    next   # skip the rest of this iteration and continue with the next value
  }
}
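A minimal runnable sketch of the next statement is shown here. It reuses the score vector from the for-loop example; the threshold of 83 and the variable name passed are assumptions made for illustration.

```r
students_score_test1 <- c(79, 82, 84, 91, 83, 88)

passed <- c()  # collects the scores that are not skipped
for (score in students_score_test1) {
  if (score < 83) {
    next  # skip scores below the assumed threshold of 83
  }
  passed <- c(passed, score)
  cat("kept score:", score, "\n")
}
print(passed)
```

The loop still visits every score; next only skips the remaining statements for the values that fail the condition.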

(3) while Loops in R


A while loop in R is a close cousin of the for loop in R. However, a while loop will check a
logical condition, and keep running the loop as long as the condition is true. If the condition
in the while loop in R is always true, the while loop will be an infinite loop, and our program
will never stop running. This is something we definitely want to avoid! When writing a while
loop in R, we want to ensure that at some point the condition will be false so the loop can stop
running.
The while loop structure plays a major role in heavy analytical tasks like simulation and
optimization. Optimization is the act of looking for a set of parameters that either maximizes
or minimizes some goal.

R Syntax Example R Code


while (Boolean_expression) {
commands/block of code
:
:
}
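The sketch below shows a while loop whose condition is guaranteed to become false. The step size and tolerance are hypothetical values chosen only to demonstrate the stopping behaviour.

```r
# Hypothetical loop: halve a step size until it falls below a tolerance
step <- 1
tolerance <- 0.01   # assumed stopping threshold
iterations <- 0

while (step > tolerance) {
  step <- step / 2              # shrinking the step makes the condition eventually FALSE
  iterations <- iterations + 1
}
cat("iterations:", iterations, " final step:", step, "\n")
```

Because step is halved on every pass, the condition step > tolerance cannot stay true forever, which avoids the infinite loop described above.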

We can use the class() function to check the data type of a variable:

R Code and Comments

# Example 1
sales_manager <- "Julian"    # type character/string
message1 <- "This is my first R code"
message2 <- "List my friends and colours"
house_temperature <- 30      # type numeric

print(message1)
cat("class of variable message1 is", class(message1), "\n")

print(house_temperature)
cat("class of variable house_temperature is", class(house_temperature), "\n")

# Example 2: run in the online IDE at https://geekflare.com/online-compiler/r

Variables can be assigned values using the leftward (<-), rightward (->), and equals (=) operators. The
values of variables can be printed using the print() or cat() function. The cat() function
combines multiple items into a continuous print output.

Example in R Codes and Comments

# Example 1
list_students = c("Linda", "Andi", "Andri", "Yulia")   # assignment using the equals operator
list_colors <- c("red", "green", "yellow", "blue")     # assignment using the leftward operator
c(1500, 3750, 8250, 3300) -> list_bananas_prices       # assignment using the rightward operator
young_boy <- TRUE
single_parent <- FALSE

print(list_colors)
print(list_bananas_prices)
print(list_students)
cat("class of variable list_students is", class(list_students), "\n")
cat("class of variable young_boy is", class(young_boy), "\n")
print(single_parent)

# Example 2: run in the RStudio IDE

Chapter IV
R – Fundamental Data Structures
4 Chapter IV R – Fundamental Data Structures
https://www.datamentor.io/r-programming/matrix/
In every programming language, we use variables to store information such as data
structures or an object's data. Variables are reserved memory locations for storing
values. This implies that when we create a variable, some memory is set aside for it.
We may want to store information of various data types, including
character, wide character, integer, floating point, double floating point, Boolean, etc. The
operating system allocates memory and determines what can be stored in the reserved memory
based on the data type of a variable.
Unlike other programming languages such as C and Java, R does not declare variables as a
particular data type. Instead, variables are assigned R-objects, and the data type of the
R-object becomes the data type of the variable. In the R programming language, the types
of data objects can be categorized as follows:

• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames

The vector object is the most basic of these objects. It comes in six data types (logical,
numeric, integer, complex, character, and raw), commonly known as the six classes of atomic
vectors. The atomic vectors serve as the foundation for the rest of the R-objects.
Please keep in mind that the number of classes in R is not limited to these six. For example,
we can combine many atomic vectors to form an array whose class is array.

Object       Modes                                                        Several modes possible?
vector       numeric, character, complex or logical                       No
list         numeric, character, complex, logical, function, expression   Yes
matrix       numeric, character, complex or logical                       No
array        numeric, character, complex or logical                       No
factor       numeric or character                                         No
data frame   numeric, character, complex or logical                       Yes
ts           numeric, character, complex or logical                       No
Source: Emmanuel Paradis, 2005, R for Beginners.

A vector is a variable in the commonly accepted meaning. A factor is a categorical variable. An
array is a table with k dimensions; a matrix is the special case of an array with k = 2.
The elements of an array or of a matrix are all of the same mode. A data frame is a table
composed of one or more vectors and/or factors that all have the same length but may have
different modes. A 'ts' is a time series data set, so it carries additional attributes such as
frequency and dates.
Finally, a list can contain any sort of object, including another list. For a vector, its mode and
length are sufficient to describe the data. For other objects, further information is necessary,
and it is provided through non-intrinsic attributes. One of these attributes is dim, which
corresponds to the dimensions of an object.
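The mode, length, and dim attribute of an object can be inspected directly. The following is a small sketch with illustrative objects x, m, and l, whose names and values are assumptions.

```r
x <- c(2.5, 3.1, 4.8)        # a numeric vector
m <- matrix(1:6, nrow = 2)   # a matrix carries a dim attribute
l <- list(1, "a", TRUE)      # a list may mix modes

print(mode(x)); print(length(x))
print(mode(m)); print(dim(m))
print(mode(l))
print(attributes(m))         # lists the dim attribute of the matrix
print(is.null(dim(x)))       # a plain vector has no dim attribute
```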

4.1 Vectors
A vector is the most common data structure in R. It is a sequence of elements of the same basic
type. The vector() function can be used to create a vector. The default mode is logical, but we
can use constructors such as character(), numeric(), etc., to create a vector of a specific type.
Elements of a vector can be accessed using vector indexing, as shown in Example 3. The vector
used for indexing can be a logical, integer, or character vector.
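A brief sketch of the constructors and the three kinds of index vector described above follows. The variable names and the values in scores are illustrative assumptions.

```r
v_default <- vector(length = 3)              # default mode is logical
v_numeric <- vector("numeric", length = 3)   # numeric vector of zeros
v_char    <- character(2)                    # character vector of empty strings

print(v_default)
print(v_numeric)
print(v_char)

# Indexing with an integer, a logical, and a character (name) vector
scores <- c(math = 91.2, physics = 82.1, biology = 71.9)
print(scores[c(1, 3)])
print(scores[c(TRUE, FALSE, TRUE)])
print(scores["physics"])
```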

Example in R Codes and Comments

# Example 1
student_ID <- c(1011, 1015, 1020, 1023, 1036, 1040, 1048, 1052)
student_active_status <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
student_city <- c("Yogya", "Yogya", "Solo", "Jakarta", "Surabaya", "Jakarta", "Bandung",
                  "Semarang")

print(student_ID)
print(student_active_status)
print(student_city)

cat("type", class(student_ID), "\n")
cat("type", class(student_active_status), "\n")
cat("type", class(student_city), "\n")

# Example 2
score_project <- c(91.2, 82.1, 71.9, 53.5, 90.2); print(score_project)

# define sequence data
vector_test1 <- 80:95; print(vector_test1)
# define a sequence with step = 0.4
vector_test2 <- seq(80, 84, by = 0.4); print(vector_test2)

# Example 3
# define named (tagged) elements; mixing types coerces every element to character
vector_biodata <- c("name" = "Andini", "age" = 19, "speaks" = c("English", "Arab"), 100)

print(vector_biodata)
print(vector_biodata[c(1:2)])                       # access elements by index
print(vector_biodata[-2])                           # access all elements except index 2
print(vector_biodata[c(F, F, T, T, T)])             # access elements where the index is TRUE
print(vector_biodata[c("name", "speaks1", "age")])  # access elements by tag (name)

4.2 Lists
A list is an R-object that can contain many different types of elements, such as vectors,
functions, and even another list. In the second example, product name, Rate, Available Stock,
and in order number are called tags, which make it easier to reference the components of the
list. However, tags are optional. We can create the same list without the tags. In that case,
numeric indices are used by default.

Example in R Codes and Output

# Example 1
student_accounting <- list("Andini Maharani", "Dian Larasati",
                           "Melani Susiana")
student_management <- list("Riani Sastra", "Himawan Putra",
                           "Yudi Mahendra ", "Sarah Marranti")

str(student_accounting)
str(student_management)
cat("Management students :");
str(student_management[1])
str(student_management[2])

# Example 2
product1 <- list("product name" = "Lemon", "Rate" = 3.5,
                 "Available Stock" = TRUE, "in order number" = 10:3)
product2 <- list("product name" = "Milk", "Rate" = 3.2,
                 "Available Stock" = TRUE, "in order number" = 3:10)
str(product1)
str(product2)
str(product2[1]); str(product2[4])

# Example 3
book_data1 <- list("title" = "Accounting I", "author" = "Indra Swadika",
                   "year" = 2020, "page" = 140, "price" = 265.5,
                   "publisher" = "UGM Press", "available stock" = TRUE)
book_data2 <- list("title" = "Management Science", "author" = "Judica Hadi",
                   "year" = 2018, "page" = 312, "price" = 538.2,
                   "publisher" = "UGM Press", "available stock" = TRUE)
book_data3 <- list("title" = "Management I", "author" = "Julia Stone",
                   "year" = 2021, "page" = 205, "price" = 465.5,
                   "publisher" = "UGM Press", "available stock" = FALSE)
# c() concatenates the two lists into one flat list of 14 elements
book_economy_list <- c(book_data1, book_data2)

str(book_economy_list)
str(book_economy_list[1])
str(book_economy_list[2])
str(book_economy_list[8])
str(book_economy_list[9])
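The examples above use single brackets, which always return a list. The sketch below contrasts single brackets, double brackets, and the $ operator on the product1 list from the second example; the variable names sub_list, rate, same, and name are assumptions.

```r
product1 <- list("product name" = "Lemon", "Rate" = 3.5,
                 "Available Stock" = TRUE, "in order number" = 10:3)

sub_list <- product1[2]              # single brackets: a list of length 1
rate     <- product1[[2]]            # double brackets: the element itself
same     <- product1$Rate            # $ accesses a component by its tag
name     <- product1$`product name`  # backticks are needed for tags with spaces

print(class(sub_list)); print(class(rate))
print(name)
```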

4.3 Matrices
A matrix is a two-dimensional rectangular data set whose elements are all of the same type. It
can be created using a vector input to the matrix function. Matrix elements can
be numeric, character, or logical. An array is the generalization of matrices to 3 or more
dimensions (commonly known as stratified tables).
Most readers will be familiar with matrices from mathematics; see the example matrices A (3x4),
B (5x4), and C (2x2). In general, each element of a matrix X is identified by its row i and column j
and is written x_ij, as shown in matrix X.

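$$
A(x) = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & \cos x & -\sin x & 1 \\ 0 & \sin x & \cos x & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 10 & 12 & 15 & 18 \\ 40 & 43 & 44 & 45 \\ 60 & 62 & 64 & 66 \\ 30 & 34 & 36 & 38 \\ 50 & 51 & 52 & 53 \end{pmatrix}
$$

$$
C = \begin{pmatrix} 0.5 & 0.8 \\ 0.5 & 0.1 \end{pmatrix}, \quad
X = \begin{pmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \\ x_{31} & x_{32} & x_{33} & x_{34} \end{pmatrix}
$$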

Matrices can be created using the matrix() function. According to the R documentation, the
usage of matrix() is:
Var_data <- matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

• data: a data vector (default NA)


• nrow: desired number of rows (first dimension, ‘top down’; default 1).
• ncol: desired number of columns (second dimension, ‘left to right’; default 1).
• byrow: logical, whether or not to fill by row (default FALSE; fill by column).
• dimnames: optional list of length 2 with row names and column names (default NULL).
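The arguments listed above can be combined as in the following sketch, which also shows how dimnames allows access by name. The row and column names and the data values are assumptions chosen for illustration.

```r
scores <- matrix(
  data = c(79, 82, 84, 87, 80, 85),
  nrow = 3, ncol = 2, byrow = FALSE,   # fill column by column
  dimnames = list(c("s1", "s2", "s3"), c("test1", "test2"))
)
print(scores)
print(scores["s2", "test2"])   # access one cell by row and column name
print(scores[, "test1"])       # access a whole column
```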

Example in R Codes and Comments

# Example 1
M1 = matrix(c(10, 12, 15, 18, 40, 43, 44, 45), nrow = 2, ncol = 4, byrow = TRUE)
M2 = matrix(c(100, 100, 100, 100, 200, 200, 200, 200), nrow = 2, ncol = 4, byrow = TRUE)
M3 = M1 + M2                    # element-wise addition
M4 = as.vector(M3)              # flatten the matrix column by column
M5 <- matrix(data = 20:9, nrow = 2, byrow = TRUE)    # fill by row
M6 <- matrix(data = 20:9, nrow = 2, byrow = FALSE)   # fill by column

print(M1); print(dim(M1)); print(nrow(M1)); print(ncol(M1))
print(M2); print(length(M2)); print(M3); print(M4)

print(M5); print(M6)

Matrices (like vectors) can only contain data of one type. We can create numeric matrices,
integer matrices, character matrices, and logical matrices by supplying values of the
corresponding type in the data argument when creating a matrix. Example 2 shows matrices built
from vectors of different types (double, integer, character, and logical).
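A minimal sketch of matrices of the four types, assuming small illustrative values, can be checked with typeof(). The matrix names are hypothetical.

```r
m_double    <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2)
m_integer   <- matrix(1:4, nrow = 2)
m_character <- matrix(c("a", "b", "c", "d"), nrow = 2)
m_logical   <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)

print(typeof(m_double))
print(typeof(m_integer))
print(typeof(m_character))
print(typeof(m_logical))

# Mixing types coerces every element to the most general type (here character)
m_mixed <- matrix(c(1, "b", TRUE, 4), nrow = 2)
print(typeof(m_mixed))
```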

(1) Applications of R Matrices


In geology, matrices are used for taking surveys and for plotting graphs, statistics,
and studies in different fields.

• Matrices can represent real-world data such as the traits of a population. They are a
convenient representation for plotting common survey data.
• In robotics and automation, matrices are the basic elements for describing robot movements.
• Matrices are used in calculating gross domestic product in economics, which
helps in calculating the efficiency of goods and products.
• In computer-based applications, matrices play a vital role in the projection of a
three-dimensional image onto a two-dimensional screen, creating realistic-seeming
motion.
• In physics-related applications, matrices can be applied in the study of electrical
circuits.

4.4 Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions.
The array() function takes a dim argument that specifies the size of each dimension. In
the example below we create an array consisting of two 3x3 matrices.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

When we execute the above code, it produces the following result:

, , 1

     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"

, , 2

     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"
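Individual elements and whole slices of an array are selected with one index per dimension. The sketch below uses a numeric array so that the positions are easy to follow; the array contents are an illustrative assumption.

```r
a <- array(1:18, dim = c(3, 3, 2))   # two 3x3 matrices filled with 1..18

print(dim(a))
print(a[2, 3, 1])    # row 2, column 3 of the first matrix
print(a[, , 2])      # the whole second 3x3 matrix
```

Leaving an index empty, as in a[, , 2], selects everything along that dimension.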

4.5 Factors
Factors are R objects that are created from a vector. A factor stores the vector along with
the distinct values of its elements as labels (levels). The labels are always character
strings, irrespective of whether the input vector is numeric, character, or Boolean. Factors
are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the count of levels.

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.


factor_apple <- factor(apple_colors)

# Print the factor.


print(factor_apple)
print(nlevels(factor_apple))
When we execute the above code, it produces the following result −
[1] green  green  yellow red    red    red    green
Levels: green red yellow
[1] 3

4.6 Data Frames


Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can
contain a different mode of data. The first column can be numeric while the second column
is character and the third column is logical. A data frame is a list of vectors of equal length.
Data frames are created using the data.frame() function.
# Create the data frame.
BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
print(BMI)
When we execute the above code, it produces the following result −
  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
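
Because a data frame is a list of equal-length vectors, each column can be read or added with the $ operator. The sketch below rebuilds the same data frame and adds a body mass index column. It assumes that height is recorded in centimetres and weight in kilograms, which the original text does not state, and the column name bmi is introduced here only for illustration.

```r
# Rebuild the data frame from the example so this block runs on its own.
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age    = c(42, 38, 26)
)

# Read a single column as a vector.
print(BMI$height)

# Select rows by condition and columns by name.
print(BMI[BMI$Age > 30, c("gender", "Age")])

# Add a new column computed from existing ones.
# Assumption: height is in cm and weight is in kg.
BMI$bmi <- BMI$weight / (BMI$height / 100)^2
print(round(BMI$bmi, 1))

# str() summarises the type of every column.
str(BMI)
```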

4.7 Vectors

A vector is simply a list of items that are of the same type.

To combine a list of items into a vector, use the c() function and separate
the items by a comma.

In the example below, we create a vector variable called fruits that combines
strings:
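
The fruits example itself is not reproduced in this copy of the text. A minimal sketch of what such a vector could look like follows; the three fruit names are assumptions chosen purely for illustration.

```r
# Combine three strings into one character vector with c().
# The fruit names are illustrative assumptions, not taken from the original text.
fruits <- c("banana", "apple", "orange")

print(fruits)          # all elements of the vector
print(length(fruits))  # number of elements
print(class(fruits))   # every element shares one type
```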

4.7.1 Using a For-Loop on a DataFrame


Just in the same way as we did with the above matrix, we can loop
through a dataframe, which is also a 2-dimensional data structure:
super_sleepers <- data.frame(rating=1:4,
animal=c('koala', 'hedgehog', 'sloth', 'panda'),
country=c('Australia', 'Italy', 'Peru', 'China'),
avg_sleep_hours=c(21, 18, 17, 10))

print(super_sleepers)
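
The listing above only builds and prints the data frame. One possible way to loop through it row by row is sketched below; the sentences vector and the wording of each sentence are introduced here for illustration only.

```r
# Rebuild the data frame from the text so this block runs on its own.
super_sleepers <- data.frame(
  rating = 1:4,
  animal = c("koala", "hedgehog", "sloth", "panda"),
  country = c("Australia", "Italy", "Peru", "China"),
  avg_sleep_hours = c(21, 18, 17, 10)
)

# Loop over the row indices and build one sentence per row.
sentences <- character(nrow(super_sleepers))
for (i in seq_len(nrow(super_sleepers))) {
  sentences[i] <- paste0(
    super_sleepers$animal[i], " (", super_sleepers$country[i],
    ") sleeps about ", super_sleepers$avg_sleep_hours[i], " hours a day"
  )
  print(sentences[i])
}
```

seq_len(nrow(...)) is used instead of 1:nrow(...) so that the loop body is skipped entirely when the data frame has no rows.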

A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix,
rows run horizontally and columns run vertically. In R programming, matrices are
two-dimensional, homogeneous data structures. These are some examples of matrices:

4.7.2 Syntax
The basic syntax for creating a matrix in R is −
matrix(data, nrow, ncol, byrow, dimnames)
Following is the description of the parameters used −
• data is the input vector which becomes the data elements of the matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical flag. If TRUE, the input vector elements are
arranged by row; if FALSE (the default), they are arranged by column.
• dimnames is a list holding the names assigned to the rows and columns.
4.7.3 Example

# Matrices & tables

# W4-00
writeLines("\n# Experiment w4-00")

# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)

# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

# Define the row and column names.
# The variables are called row_names and col_names so that they do not
# mask the base functions rownames() and colnames().
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE,
            dimnames = list(row_names, col_names))
print(P)

# Create a 3x3 matrix


A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

cat("Dimension of the matrix:\n")


print(dim(A))

cat("Number of rows:\n")
print(nrow(A))

cat("Number of columns:\n")
print(ncol(A))

cat("Number of elements:\n")
print(length(A))
# OR
print(prod(dim(A)))

# R program to illustrate
# access submatrices in a matrix

# Create a 3x3 matrix


A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

cat("Accessing the first three rows and the first two columns\n")
print(A[1:3, 1:2])

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Edit the element in the 3rd row and 3rd column
# from 9 to 30
# by direct assignment
A[3, 3] = 30

cat("After editing the matrix\n")
print(A)

# R program to illustrate
# concatenation of a row to a matrix

# Create a 3x3 matrix


A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Creating another 1x3 matrix


B = matrix(
c(10, 11, 12),
nrow = 1,
ncol = 3
)
cat("The 1x3 matrix:\n")
print(B)

# Add a new row using rbind()


C = rbind(A, B)

cat("After concatenation of a row:\n")


print(C)

# R program to illustrate
# column deletion in a matrix

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("Before deleting the 2nd column\n")
print(A)

# 2nd-column deletion: a negative column index drops that column
A = A[, -2]

cat("After deleting the 2nd column\n")
print(A)

Chapter V
Operators
5 Chapter V Operators

5.1 Arithmetic Operators


The following table shows the arithmetic operators supported by the R language. The
operators act on each element of the vector.
Operator   Description / Example

+          Adds two vectors.
           v <- c(2, 5.5, 6)
           t <- c(8, 3, 4)
           print(v + t)
           it produces the following result −
           [1] 10.0  8.5 10.0

-          Subtracts the second vector from the first.
           v <- c(2, 5.5, 6)
           t <- c(8, 3, 4)
           print(v - t)
           it produces the following result −
           [1] -6.0  2.5  2.0

*          Multiplies both vectors.
           v <- c(2, 5.5, 6)
           t <- c(8, 3, 4)
           print(v * t)
           it produces the following result −
           [1] 16.0 16.5 24.0

/          Divides the first vector by the second.
           v <- c(2, 5.5, 6)
           t <- c(8, 3, 4)
           print(v / t)
           it produces the following result −
           [1] 0.250000 1.833333 1.500000

%%         Gives the remainder of dividing the first vector by the second.
           v <- c(2, 5.5, 6)
           t <- c(8, 3, 4)
           print(v %% t)
           it produces the following result −
           [1] 2.0 2.5 2.0

%/%        Gives the quotient (integer division) of the first vector by the second.
           v <- c(2, 5.5, 6)
           t <- c(8, 3, 4)
           print(v %/% t)
           it produces the following result −
           [1] 0 1 1

^          Raises the first vector to the exponent given by the second vector.
           v <- c(2, 5.5, 6)
           t <- c(8, 3, 4)
           print(v ^ t)
           it produces the following result −
           [1]  256.000  166.375 1296.000

5.2 Relational Operators


The following table shows the relational operators supported by the R language. Each element
of the first vector is compared with the corresponding element of the second vector. The
result of each comparison is a Boolean value.

Operator   Description / Example

>          Checks if each element of the first vector is greater than the
           corresponding element of the second vector.
           v <- c(2, 5.5, 6, 9)
           t <- c(8, 2.5, 14, 9)
           print(v > t)
           it produces the following result −
           [1] FALSE  TRUE FALSE FALSE

<          Checks if each element of the first vector is less than the
           corresponding element of the second vector.
           v <- c(2, 5.5, 6, 9)
           t <- c(8, 2.5, 14, 9)
           print(v < t)
           it produces the following result −
           [1]  TRUE FALSE  TRUE FALSE

==         Checks if each element of the first vector is equal to the
           corresponding element of the second vector.
           v <- c(2, 5.5, 6, 9)
           t <- c(8, 2.5, 14, 9)
           print(v == t)
           it produces the following result −
           [1] FALSE FALSE FALSE  TRUE

<=         Checks if each element of the first vector is less than or equal to
           the corresponding element of the second vector.
           v <- c(2, 5.5, 6, 9)
           t <- c(8, 2.5, 14, 9)
           print(v <= t)
           it produces the following result −
           [1]  TRUE FALSE  TRUE  TRUE

>=         Checks if each element of the first vector is greater than or equal
           to the corresponding element of the second vector.
           v <- c(2, 5.5, 6, 9)
           t <- c(8, 2.5, 14, 9)
           print(v >= t)
           it produces the following result −
           [1] FALSE  TRUE FALSE  TRUE

!=         Checks if each element of the first vector is unequal to the
           corresponding element of the second vector.
           v <- c(2, 5.5, 6, 9)
           t <- c(8, 2.5, 14, 9)
           print(v != t)
           it produces the following result −
           [1]  TRUE  TRUE  TRUE FALSE

Source: https://www.tutorialspoint.com/r/r_operators.htm

5.3 Logical Operators


The following table shows the logical operators supported by the R language. They apply only
to vectors of type logical, numeric, or complex. Zero is treated as FALSE and every nonzero
number is treated as TRUE.
For the element-wise operators, each element of the first vector is combined with the
corresponding element of the second vector. The result is a Boolean value.

Operator   Description / Example

&          Called the element-wise logical AND operator. It combines each
           element of the first vector with the corresponding element of the
           second vector and gives TRUE if both elements are TRUE.
           v <- c(3, 1, TRUE, 2+3i)
           t <- c(4, 1, FALSE, 2+3i)
           print(v & t)
           it produces the following result −
           [1]  TRUE  TRUE FALSE  TRUE

|          Called the element-wise logical OR operator. It combines each
           element of the first vector with the corresponding element of the
           second vector and gives TRUE if at least one of the elements is TRUE.
           v <- c(3, 0, TRUE, 2+2i)
           t <- c(4, 0, FALSE, 2+3i)
           print(v | t)
           it produces the following result −
           [1]  TRUE FALSE  TRUE  TRUE

!          Called the logical NOT operator. It takes each element of the
           vector and gives the opposite logical value.
           v <- c(3, 0, TRUE, 2+2i)
           print(!v)
           it produces the following result −
           [1] FALSE  TRUE FALSE FALSE

&&         Called the logical AND operator. It works on single values and
           gives TRUE only if both are TRUE. Since R 4.3.0 it is an error to
           pass operands with more than one element, so the example selects
           the first element of each vector explicitly.
           v <- c(3, 0, TRUE, 2+2i)
           t <- c(1, 3, TRUE, 2+3i)
           print(v[1] && t[1])
           it produces the following result −
           [1] TRUE

||         Called the logical OR operator. It works on single values and
           gives TRUE if at least one of them is TRUE. As with &&, the example
           selects the first element of each vector explicitly.
           v <- c(0, 0, TRUE, 2+2i)
           t <- c(0, 3, TRUE, 2+3i)
           print(v[1] || t[1])
           it produces the following result −
           [1] FALSE

5.4 Assignment Operators


These operators are used to assign values to vectors.

Operator             Description / Example

<-  or  =  or  <<-   Called left assignment.
                     v1 <- c(3, 1, TRUE, 2+3i)
                     v2 <<- c(3, 1, TRUE, 2+3i)
                     v3 = c(3, 1, TRUE, 2+3i)
                     print(v1)
                     print(v2)
                     print(v3)
                     it produces the following result −
                     [1] 3+0i 1+0i 1+0i 2+3i
                     [1] 3+0i 1+0i 1+0i 2+3i
                     [1] 3+0i 1+0i 1+0i 2+3i

->  or  ->>          Called right assignment.
                     c(3, 1, TRUE, 2+3i) -> v1
                     c(3, 1, TRUE, 2+3i) ->> v2
                     print(v1)
                     print(v2)
                     it produces the following result −
                     [1] 3+0i 1+0i 1+0i 2+3i
                     [1] 3+0i 1+0i 1+0i 2+3i

5.5 Miscellaneous Operators


These operators are used for specific purposes rather than for general mathematical or
logical computation.

Operator   Description / Example

:          Colon operator. It creates a series of numbers in sequence for a
           vector.
           v <- 2:8
           print(v)
           it produces the following result −
           [1] 2 3 4 5 6 7 8

%in%       This operator is used to identify whether an element belongs to a
           vector.
           v1 <- 8
           v2 <- 12
           t <- 1:10
           print(v1 %in% t)
           print(v2 %in% t)
           it produces the following result −
           [1] TRUE
           [1] FALSE

%*%        This operator performs matrix multiplication. In the example, a
           matrix is multiplied by its transpose.
           M = matrix(c(2, 6, 5, 1, 10, 4), nrow = 2, ncol = 3, byrow = TRUE)
           t = M %*% t(M)
           print(t)
           it produces the following result −
                [,1] [,2]
           [1,]   65   82
           [2,]   82  117

Chapter VI
R – Loops & Control Structures
6 R - Loops

6.1 R – Looping structure


Programming languages provide various control structures that allow for more complicated
execution paths.
A loop statement allows us to execute a statement or group of statements multiple times and
the following is the general form of a loop statement in most of the programming languages.

(1) for Loops in R


For loops are pretty much the only looping construct that you will need in R. While you may
occasionally find a need for other types of loops, in my experience doing data analysis, I’ve
found very few situations where a for loop wasn’t sufficient.
In R, for loops take an iterator variable and assign it successive values from a sequence or
vector. For loops are most commonly used for iterating over the elements of an object (list,
vector, etc.).

R Syntax & R Example

# Syntax: iterate over the elements of a vector
for (element in vectors) {
  commands/block of code
  :
}

# Example 1: loop over a character vector and a list
list_days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
list_months <- list("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
                    "Aug", "Sep", "Oct", "Nov", "Dec")

for (aDay in list_days) {
  print(aDay)
}
for (aMonth in list_months) {
  print(aMonth)
}
cat("\n\n")
# grading_acconting is not defined in this section; it is assumed to have
# been created earlier in the session.
for (grade in grading_acconting) {
  cat(grade, " ")
}

# Example 2: loop over indices to combine two score vectors
students_score_test1 <- c(79, 82, 84, 91, 83, 88)
students_score_test2 <- c(87, 80, 85, 90, 95, 76)

for (i in seq_along(students_score_test1)) {
  cat("\n", i, students_score_test1[i], students_score_test2[i],
      students_score_test1[i] + students_score_test2[i])
}

# Example 3: accumulate a running total
randomdata1 <- rnorm(30)  # Create a vector filled with random normal values

tot <- 0
for (i in seq_along(randomdata1)) {
  cat("\n", format(i, width = 2, justify = "right"),
      format(randomdata1[i], width = 8, justify = "right", digits = 2))
  tot <- tot + randomdata1[i]
}
cat("\ntotal =", tot)

(2) Next Statement


A next statement is one of the control statements in R programming that is used to skip the
current iteration of a loop without terminating the loop. Whenever a next statement is
encountered, further evaluation of the code is skipped and the next iteration of the loop starts.

R Syntax & Code Comments & output


for (value in vector) {
commands/block of code
:
if (condition is TRUE) {
next
}
}
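
As a small illustration of the syntax above, the sketch below walks through the numbers 1 to 10 and uses next to skip the even ones. The variable name odd_numbers is introduced here only for the example.

```r
# Collect and print only the odd numbers from 1 to 10.
odd_numbers <- c()
for (value in 1:10) {
  if (value %% 2 == 0) {
    next  # even number: skip the rest of this iteration
  }
  odd_numbers <- c(odd_numbers, value)
  print(value)
}
```

The loop itself keeps running after each next; only the statements below it are skipped for that iteration.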

(3) while Loops in R


A while loop in R is a close cousin of the for loop in R. However, a while loop will check a
logical condition, and keep running the loop as long as the condition is true. If the condition
in the while loop in R is always true, the while loop will be an infinite loop, and our program
will never stop running. This is something we definitely want to avoid! When writing a while
loop in R, we want to ensure that at some point the condition will be false so the loop can stop
running.
The while loop structure plays a major role in heavy analytical tasks like simulation and
optimization. Optimization is the act of looking for a set of parameters that either maximize
or minimize some goal.

R Syntax Example R Code


while (Boolean_expression) {
commands/block of code
:
:
}
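
A minimal sketch of a while loop whose condition is guaranteed to become FALSE: a value is doubled until it exceeds a limit. The limit of 100 and the variable names are chosen only for illustration.

```r
# Keep doubling a value until it exceeds the limit.
value <- 1
steps <- 0
while (value <= 100) {
  value <- value * 2   # value grows, so the condition eventually fails
  steps <- steps + 1
}
cat("value =", value, "after", steps, "steps\n")
```

Because value changes inside the body, the loop cannot run forever, which is the property the paragraph above asks for.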

(4) Repeat Loop in R


There may be a situation when you need to execute a block of code several times without
knowing in advance how many repetitions are needed. A repeat loop runs its block again and
again and has no stopping condition of its own, so at some point the block must exit the
loop explicitly. The basic syntax for creating a repeat loop in R therefore combines repeat
with a break statement.

R Syntax Example R Code


repeat {
commands/block of code
:
if (condition) {
commands/block of code
break
}
}
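
The following sketch shows the pattern with a simple counter that leaves the loop once it reaches 5. The stopping value is arbitrary and chosen only for illustration.

```r
# Count up from 1 and leave the loop once the counter reaches 5.
counter <- 0
repeat {
  counter <- counter + 1
  print(counter)
  if (counter >= 5) {
    break  # without this, the loop would run forever
  }
}
```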

6.2 R – Control Structure


Decision structures, or Control structures require the programmer to specify one or more
conditions to be evaluated or tested by the program, along with a statement or statements to
be executed if the condition is determined to be true, and optionally, other statements to be
executed if the condition is determined to be false.
Following is the general form of a typical decision making structure found in most of the
programming languages

(1) if- Statement


It is one of the control statements in R programming that consists of a Boolean expression and
a set of statements. If the Boolean expression evaluates to TRUE, the set of statements is
executed. If the Boolean expression evaluates to FALSE, the statements after the end of the If
statement are executed. The basic syntax for the If statement is given below:

R Syntax Example R Code


if (Boolean_expression is TRUE) {
Commands/block of code
:
}
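
A short example of the if statement, assuming a hypothetical test score and a hypothetical pass mark of 60:

```r
score <- 82       # hypothetical test score
passed <- FALSE

# The block runs only when the condition is TRUE.
if (score >= 60) {
  passed <- TRUE
  print("Passed")
}
```

If score were below 60, the block would be skipped and passed would stay FALSE.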

(2) if-else Statement


An if statement can be followed by an else statement, which contains a block of code to be
executed when the Boolean expression in the if statement evaluates to FALSE. The basic
syntax of it is given below:

R Syntax Example R Code


if (Boolean_expression is TRUE) {
Commands/block of code 1
:
} else {
Commands/block of code 2
:
}
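
A minimal sketch of if-else that labels a number as even or odd; the variable names are introduced here for illustration.

```r
number <- 7

# Exactly one of the two blocks is executed.
if (number %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}
print(parity)
```

Note that else must appear on the same line as the closing brace of the if block; otherwise R treats the if statement as complete when code is entered line by line at the console.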

(3) Multiple if-else Statement


An else if statement is included between the if and else statements. Multiple else if
statements can be included after an if statement. Once an if statement or an else if
statement evaluates to TRUE, none of the remaining else if or else statements will be evaluated.
The basic syntax of it is given below:

R Syntax Example R Code


if (Boolean_expression1) {
Commands/block of code 1
:
} else if (Boolean_expression2) {
Commands/block of code 2
:
} else if (Boolean_expression3) {
Commands/block of code 3
:
} else {
Commands/block of code 4
:
}
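
The chain can be sketched with a small function that turns a score into a letter grade. The function name to_grade and the grade boundaries are hypothetical and chosen only for illustration.

```r
# Hypothetical grade boundaries, chosen only for illustration.
to_grade <- function(score) {
  if (score >= 85) {
    "A"
  } else if (score >= 70) {
    "B"
  } else if (score >= 55) {
    "C"
  } else {
    "D"
  }
}

print(to_grade(91))
print(to_grade(70))
print(to_grade(60))
print(to_grade(10))
```

The conditions are tested from top to bottom, so a score of 91 stops at the first branch and the later comparisons are never evaluated.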

(4) Switch Statement


The switch statement is one of the control statements in R programming which is used to
compare a variable against a set of values. Each value is called a case.
The basic syntax for a switch statement is as follows:

R Syntax Example R Code


switch(expression,
  case1 = Commands/block of code 1,
  case2 = Commands/block of code 2,
  case3 = Commands/block of code 3
)
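
A minimal sketch of switch with a character expression follows. The function name day_type and the labels it returns are hypothetical. A case with an empty right-hand side falls through to the next case, and a final unnamed argument acts as the default.

```r
# Classify a day abbreviation; names and labels are illustrative assumptions.
day_type <- function(day) {
  switch(day,
    "Sat" = ,                     # empty case: falls through to "Sun"
    "Sun" = "weekend",
    "Mon" = "start of the week",
    "other weekday"               # unnamed last argument is the default
  )
}

print(day_type("Sat"))
print(day_type("Mon"))
print(day_type("Thu"))
```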


Source: https://www.geeksforgeeks.org/r-programming-for-data-science/

Most common R libraries for data science


• dplyr: The dplyr package is used for data wrangling and data analysis. It
provides functions for working with data frames in R, and it works with
local data frames as well as with remote database tables. dplyr is built
around five functions, one for each of the tasks you might need to
perform:
Select certain columns of data.
Filter your data to select specific rows.
Arrange the rows of your data into order.
Mutate your data frame to contain new columns.
Summarize chunks of your data in some way.
• ggplot2: R is most famous for its visualization library ggplot2. It provides
an aesthetically pleasing set of graphics. The ggplot2 library
implements a “grammar of graphics” (Wilkinson, 2005). This approach
gives us a coherent way to produce visualizations by expressing
relationships between the attributes of data and their graphical
representation.
• esquisse: This package has brought the most important feature of
Tableau to R. Just drag and drop, and get your visualization done in
minutes. It is built on top of ggplot2. It allows us to draw
bar graphs, curves, scatter plots, and histograms, then export the graph or
retrieve the code generating the graph.
• tidyr: tidyr is a package that we use for tidying or cleaning the data. We
consider data to be tidy when each variable forms a column and
each observation forms a row.
• Shiny: This is a very well known package in R. When you want to share
your stuff with people around you and make it easier for them to know
and explore it visually, you can use shiny. It’s a Data Scientist’s best
friend.
• caret: caret stands for classification and regression training. Using this
package, you can model complex regression and classification problems.
• E1071: This package has wide use for implementing clustering, Fourier
Transform, Naive Bayes, SVM and other types of miscellaneous functions.
• mlr: This package is well suited to machine learning tasks. It has almost
all of the important and useful algorithms for performing machine
learning tasks. It can also be described as an extensible framework for
classification, regression, clustering, multi-classification, and survival
analysis.

6.3 Other R libraries worth mentioning:


• Lubridate
• Knitr
• DT(DataTables)

• RCrawler
• Leaflet
• Janitor
• Plotly

6.4 Applications of R for Data Science


Top Companies that use R for Data Science:
• Google: At Google, R is a popular choice for performing many analytical
operations. The Google Flu Trends project makes use of R to analyze
trends and patterns in searches associated with flu.
• Facebook: Facebook makes heavy use of R for social network analytics. It
uses R for gaining insights about the behavior of its users and for establishing
relationships between them.
• IBM: IBM is one of the major investors in R. It recently joined the R
consortium. IBM also utilizes R for developing various analytical
solutions. It has used R in IBM Watson – an open computing platform.
• Uber: Uber makes use of the R package shiny for accessing its charting
components. Shiny is an R package for building interactive web applications
with embedded interactive graphics.
