R Script
Table of Contents
1 R and RStudio
1.1 R vs RStudio
1.2 Introduction to R
1.2.1 How can you use R?
1.2.2 Getting Started with R
1.3 Advantages of R
2 Installation of R Packages
2.1 R Packages
2.2 Command
2.3 File Import in R
3 Essentials of the R Language
3.1 File R1(calculation)
3.2 File R2(vectors)
3.3 File R3(matrices)
3.4 File R4(Arrays and Lists)
3.4.1 Arrays
3.4.2 Lists
3.5 File R5(Loops)
3.6 File R6(Factors and Data Frame)
3.6.1 Factors
3.6.2 Data Frame
3.7 File R7(Conditional and Control Flows)
3.8 File R8(importing from excel)
3.9 File R9(text)
3.10 File R10(Apply Function: A Versatile Tool for Data Manipulation)
3.10.1 apply()
3.10.2 lapply()
3.10.3 sapply()
3.10.4 vapply()
3.10.5 tapply()
4 Data Visualisation using R
4.1 R11(Histograms)
4.2 R12(Boxplot)
4.3 R13(Line plot)
4.3.1 Line Plot with single series of data
4.3.2 Line Plot with multiple series of data
4.4 R14(Scatter Plots)
4.5 R15(Bar Chart)
5 Descriptive Statistics Using R
5.1 Summary (Descriptive Statistics 1)
5.2 Measure of Central Tendency (Descriptive Statistics 2)
5.2.1 Arithmetic Mean, Median and Mode
5.2.2 Mode using function from the data frame
5.3 Measure of Dispersion (Descriptive Statistics 3)
5.3.1 Range
5.3.2 Variance
5.3.3 Standard Deviation
5.4 Datasheet (Descriptive Statistics 4)
5.4.1 Tribbles in R: A Concise Data Frame Creation
5.5 Student data case (Descriptive Statistics 5)
6 Relationship between two variables
6.1 Covariance (Descriptive Statistics 6)
6.2 Correlation (Descriptive Statistics 7)
6.3 Coefficient of Determination (Descriptive Statistics 8)
7 Citation
8 Regression using R
8.1 Simple Regression using R
8.2 Multiple Regression using R
8.2.1 Case1
8.2.2 Case2
1 R and RStudio
R's open-source nature, extensive statistical capabilities, powerful data visualisation tools, and a vibrant
community make it a compelling choice for data scientists, statisticians, and researchers.
• R is a programming language for statistical computing and graphics.
• R is a successor to the ‘S’ language.
• The name ‘R’ is derived from the first letter of its developers’ names, Robert Gentleman and
Ross Ihaka.
• R provides many graphical and statistical tools, such as linear and nonlinear modelling,
classification, classical statistical tests, and clustering.
• It runs on various UNIX platforms and other systems, including Windows and macOS.
• R offers integrated software facilities, including operators for calculations, intermediate tools
for data manipulation, data visualisation, and data storage and handling facilities.
• R is a fully developed language with loops, conditionals, and input and output functions.
Together with its integrated facilities, this is why it is often described as the ‘R environment’.
RStudio
• RStudio is an integrated development environment (IDE).
• It is particularly designed to work with the R programming language.
• RStudio can be broadly divided into 4 panes, which include the Console and the Plots pane.
1.1 R vs RStudio
Basis | R | RStudio
Elaborative process | R is the core engine for performing data analysis and computations. However, it is less elaborative than RStudio. | RStudio is more elaborative in nature, as it provides a more user-friendly environment for working with R.
1.2 Introduction to R
Imagine R as a powerful Swiss Army knife for data analysis. It's a programming language and software
environment designed specifically for statistical computing and data visualisation. Think of it as a tool
that allows you to explore, manipulate, and extract insights from data, no matter how complex or messy
it might be.
3. Learn the Basics: Start with basic R syntax, data structures (vectors, matrices, data
frames), and fundamental statistical functions.
4. Explore Packages: Discover and install packages that cater to your specific needs, such as
tidyverse for data manipulation and visualisation, caret for machine learning, and ggplot2
for advanced plotting.
1.3 Advantages of R
R has become a cornerstone for data analysis and statistical computing due to its numerous advantages:
1. Open-Source and Free:
➢ Cost-Effective: No licensing fees, making it accessible to everyone.
➢ Community-Driven: A large and active community contributes to its development and
provides extensive support.
2. Comprehensive Statistical Capabilities:
➢ Statistical Tests: R offers a wide range of statistical tests for hypothesis testing, regression
analysis, and more.
➢ Machine Learning: Powerful machine-learning algorithms for classification, regression,
clustering, and predictive modelling.
3. Data Visualisation:
➢ High-Quality Graphics: Create stunning visualisations with packages like ggplot2, lattice,
and plotly.
➢ Customisation: Tailor plots to specific needs, including interactive visualisations.
4. Flexibility and Extensibility:
➢ Package Ecosystem: A vast collection of packages (over 18,000) for various statistical and
data analysis tasks.
➢ Custom Function Creation: Develop custom functions to tailor the analysis to specific
requirements.
5. Reproducible Research:
➢ Version Control: Track changes and ensure reproducibility.
➢ R Markdown: Create dynamic documents combining code, output, and narrative text.
6. Platform Independence:
➢ Cross-Platform Compatibility: Runs on Windows, macOS, and Linux.
7. Strong Community Support:
➢ Active Forums and Communities: Online resources for help and collaboration.
➢ Tutorials and Documentation: Extensive documentation and tutorials available.
8. Integration with Other Tools:
➢ Interoperability: Seamlessly integrates with other tools like Python, SQL, and Hadoop.
9. Data Wrangling and Manipulation:
➢ Powerful Data Manipulation: Efficiently clean, transform, and reshape data with
packages like dplyr and tidyr.
10. Big Data Analysis:
➢ Scalability: Handles large datasets with packages like sparklyr and bigr.
2 Installation of R Packages
2.1 R Packages
➢ base
➢ readxl
➢ readr
➢ dplyr
➢ tidyr
➢ tibble
➢ tidyverse
➢ ggplot2
➢ lmtest
➢ car
➢ graphics
➢ stats
2.2 Command
➢ install.packages("tidyverse")
➢ library(tidyverse)
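Before calling install.packages(), it can help to check whether a package is already installed. The sketch below is one way to do that in base R; the vector name wanted is illustrative, and it lists only packages that ship with R so that the check runs without a network connection.

```r
# Packages to check; these two ship with every R installation
wanted <- c("stats", "graphics")

# requireNamespace() returns TRUE when the package can be loaded
installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that would still need install.packages()
to_install <- wanted[!installed]
print(to_install)
```

In practice, wanted would hold names from the list in 2.1, and any names left in to_install could be passed to install.packages().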
3 Essentials of the R Language
3.1 File R1(calculation)
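# Calculation with R
10+17
10*15
140/7
100-6
2^5                # exponentiation: 32
2+2-2/2            # division first: 2 + 2 - 1 = 3
3*5/6              # left to right: 2.5
3/5*6              # left to right: 3.6
1:20               # integers from 1 to 20
20:1               # integers from 20 down to 1
seq(1,100, by=5)   # 1, 6, 11, ..., 96
seq(1,100, by=3)   # 1, 4, 7, ..., 100
seq(1,100, 5)      # the third positional argument is by
seq(1,100, 3)
rep(7,10)          # the value 7 repeated 10 times
# Arithmetic functions in R
abs(-15)
exp(1)             # Euler's number e
log(exp(1))        # natural logarithm: 1
log(10)
log10(10)          # base-10 logarithm: 1
log10(exp(1))
log(16,4)          # logarithm of 16 to base 4: 2
# Create a variable
x <- -100
x + 70
abs(x)
u <- 19
v <- 11
u+v
sum(u, v)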
Remember:
Order of Operations: R follows the standard order of operations (PEMDAS/BODMAS):
Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction
(from left to right).
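The order of operations can be checked directly in R. The short sketch below uses illustrative variable names; it also shows one case that often surprises beginners, namely that the exponent binds more tightly than a leading minus sign.

```r
# Division is evaluated before addition and subtraction
plain <- 2 + 2 - 2 / 2          # 2 + 2 - 1 = 3

# Parentheses change the grouping
grouped <- (2 + 2 - 2) / 2      # 2 / 2 = 1

# Multiplication and division run left to right
left_to_right <- 3 / 5 * 6      # 0.6 * 6 = 3.6

# The exponent is applied before the unary minus
neg_power <- -2^2               # -(2^2) = -4
paren_power <- (-2)^2           # 4

print(c(plain, grouped, left_to_right, neg_power, paren_power))
```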
D <- matrix(c(10,40,70,20,50,80,30,60,90), nrow=3)
D
2*D-5
E <- matrix(c(5,6,7,4,8,9,10,6,11),nrow=3)
E
3*E+5
D+E
D-E
DE<-D%*%E
DE
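The operator %*% performs matrix multiplication, which differs from the element-wise product given by *. The sketch below reuses the matrices D and E defined above (recreated here so that it runs on its own) and checks one entry of the product by hand; the names elementwise, product and by_hand are illustrative.

```r
# matrix() fills column by column, so the first row of D is 10, 20, 30
D <- matrix(c(10, 40, 70, 20, 50, 80, 30, 60, 90), nrow = 3)
E <- matrix(c(5, 6, 7, 4, 8, 9, 10, 6, 11), nrow = 3)

# Element-wise product: each entry of D times the matching entry of E
elementwise <- D * E

# Matrix product: rows of D combined with columns of E
product <- D %*% E

# Entry [1, 1] of the matrix product is row 1 of D times column 1 of E
by_hand <- sum(D[1, ] * E[, 1])   # 10*5 + 20*6 + 30*7 = 380

print(elementwise[1, 1])          # 10 * 5 = 50
print(product[1, 1])
```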
i <- i + 1
if (i > 15) {
break
}
print("z is greater than 5")
} else {
print("z is less than or equal to 5")
}
#Break
for (i in 1:10) {
if (i == 5) {
break
}
print(i)
}
#next
for (j in 1:10) {
if (j %% 2 == 0) {
next
}
print(j)
}
3.8 File R8(importing from excel)
#importing from excel
library(readxl)
CreditLimit <- read_excel("C:/Users/ADMIN/OneDrive/Desktop/R/R Data/CreditLimit.xlsx")
View(CreditLimit)
Remember:
File Import in R
File | Format | Package Required
Text | .txt | base
CSV | .csv | readr
Excel | .xlsx / .xls | readxl
SPSS, SAS, STATA | .sav / .sas7bdat / .dta | haven
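As a small illustration of importing a file, the sketch below writes a tiny CSV file to a temporary location and reads it back. It uses read.csv() from base R rather than the readr package listed in the table, so that it runs without extra packages; the column names and values are hypothetical and are not taken from CreditLimit.xlsx.

```r
# Create a temporary CSV file with hypothetical content
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,limit", "A,1000", "B,2500"), csv_path)

# Read the file into a data frame; the first line becomes the header
credit <- read.csv(csv_path)

# Inspect the structure of the imported data
str(credit)
```

With readr installed, read_csv(csv_path) could be used in the same place and returns a tibble instead of a plain data frame.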
The apply family of functions in R provides efficient ways to apply a function to elements of an array
or list. These functions can significantly streamline your data analysis tasks.
3.10.1 apply()
Purpose: Applies a function to the margins of an array.
Syntax: apply(X, MARGIN, FUN, ...)
• X: The array to which the function is applied.
• MARGIN: A vector specifying the margins to apply the function over: 1 for rows,
2 for columns, c(1, 2) for both.
• FUN: The function to be applied.
• ...: Additional arguments to be passed to the function.
Example:
# Create a matrix
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
# Calculate the sum of each row
row_sums <- apply(my_matrix, 1, sum)
print(row_sums)
3.10.2 lapply()
Purpose: Applies a function to each element of a list or vector and returns a list.
Syntax: lapply(X, FUN, ...)
• X: The list or vector to which the function is applied.
• FUN: The function to be applied.
• ...: Additional arguments to be passed to the function.
Example:
# Create a list of numbers
my_list <- list(1, 2, 3, 4, 5)
# Square each element
squared_list <- lapply(my_list, function(x) x^2)
print(squared_list)
3.10.3 sapply()
Purpose: Similar to lapply(), but simplifies the output to a vector or matrix.
Syntax: sapply(X, FUN, ..., simplify = TRUE)
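A minimal sketch of sapply(), using the same kind of input as the lapply() example; the names my_list and squared_vec are illustrative. The result is a plain numeric vector rather than a list.

```r
# Create a list of numbers
my_list <- list(1, 2, 3, 4, 5)

# Square each element; sapply() simplifies the result to a vector
squared_vec <- sapply(my_list, function(x) x^2)
print(squared_vec)

# The same call with lapply() would return a list instead
print(is.list(squared_vec))
```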
3.10.4 vapply()
Purpose: Like sapply(), but requires you to specify the type and length of each result through FUN.VALUE, and stops with an error if a result does not match.
Syntax: vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
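A minimal sketch of vapply(), with an illustrative character vector named words. The third argument, integer(1), states that every call must return exactly one integer; the second call shows what happens when that promise is not met.

```r
# Hypothetical input: a character vector
words <- c("apple", "fig", "banana")

# nchar() returns one integer per word, matching integer(1)
word_lengths <- vapply(words, nchar, integer(1))
print(word_lengths)

# Declaring the wrong type makes vapply() stop with an error,
# which is caught here and turned into a short message
wrong_type <- tryCatch(
  vapply(words, nchar, character(1)),
  error = function(e) "type mismatch"
)
print(wrong_type)
```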
3.10.5 tapply()
Purpose: Applies a function to subsets of a vector, splitting the vector based on factors.
Syntax: tapply(X, INDEX, FUN, ..., simplify = TRUE)
Example:
# Create a vector and a factor
x <- 1:10
f <- factor(rep(c("A", "B"), 5))
# Calculate the mean of x for each level of f
means <- tapply(x, f, mean)
print(means)
4 Data Visualisation using R
4.1 R11(Histograms)
• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Text(base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
hist(iris$Sepal.Length)
hist(iris$Sepal.Length, col="steelblue")
hist(iris$Sepal.Width, col="red")
hist(iris$Sepal.Width, col="yellow")
# Set the heading with main, the x and y axis labels with xlab and ylab, and the colour with col
hist(iris$Petal.Length,
main='Histogram',
xlab='Length',
ylab='Frequency',
col='red')
4.2 R12(Boxplot)
• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Text(base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
boxplot(iris$Sepal.Length)
# Set the heading with main, the x and y axis labels with xlab and ylab,
# the fill colour of the boxplot with col, and its outline colour with border
boxplot(iris$Petal.Length,
main='Petal Length',
xlab='Species',
ylab='Petal Length',
col='pink',
border = 'red')
OR
boxplot(iris$Petal.Length,main='Petal Length', xlab='Species', ylab='Petal Length', col='pink', border
= 'red')
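The calls above draw a single box for all petal lengths, although the x axis is labelled Species. To draw one box per species, the formula interface of boxplot() can be used. This is a sketch of that variation, not part of the original script; the name bp is illustrative.

```r
data("iris")

# Petal.Length ~ Species draws one box for each level of Species
bp <- boxplot(Petal.Length ~ Species, data = iris,
              main = 'Petal Length by Species',
              xlab = 'Species',
              ylab = 'Petal Length',
              col = 'pink',
              border = 'red')

# boxplot() also returns the group names and the plotted statistics
print(bp$names)
```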
plot(l1)
#Plot a vector using points
plot(l1,type = 'p')
#Plot a vector using lines
plot(l1,type = 'l')
#Plot a vector using both points and lines
plot(l1,type = 'o')
#Plot a vector using both points and lines with colour
l2<- c(1,2,8,13,40)
plot(l2,type = 'o', col='red')
#Plot a vector using both points and lines with colour, heading, label of the x & y axis
l3<- c(5,2,11,7,20,15,22,17,25)
plot(l3,type = 'o', col='green', main='Line Plot', xlab='points', ylab='Frequency')
plot(iris$Sepal.Length, iris$Sepal.Width,pch=2)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=3)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=4)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=5)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=6)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=7)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=8)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=9)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=10)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=11)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=12)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=13)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=14)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=15)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=16)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=17)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=18)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=19)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=20)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=21)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=22)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=23)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=24)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=25)
# Plot a scatter diagram with dot-shaped points in red, the heading Scatter Plot,
# and sepal_length and sepal_width as the labels of the x and y axes
plot(iris$Sepal.Length, iris$Sepal.Width,col="red", main="Scatter Plot",xlab="sepal_length",
ylab="sepal_width", pch=20)
5 Descriptive Statistics Using R
5.1 Summary (Descriptive Statistics 1)
#Summary of a data sheet
data("iris")
View(iris)
summary(iris)
• Mean
• Minimum
• Median
• Quartiles
• Maximum
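For a single numeric column, the statistics listed above can be seen by name. A minimal sketch using the Sepal.Length column of iris; the name s is illustrative.

```r
data("iris")

# summary() of a numeric vector returns six named statistics
s <- summary(iris$Sepal.Length)
print(s)

# The names show which statistic each value is
print(names(s))
```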
mode_z
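Base R has no built-in function for the statistical mode (the function mode() reports the storage type of an object instead). The sketch below shows one common way to compute it; the function name stat_mode and the values in z are hypothetical and are not taken from the original script.

```r
# Return the most frequent value(s) of a vector
stat_mode <- function(v) {
  uniq <- unique(v)                      # distinct values
  counts <- tabulate(match(v, uniq))     # how often each one occurs
  uniq[counts == max(counts)]            # keep the value(s) with the top count
}

# Hypothetical data
z <- c(2, 3, 3, 5, 7, 3, 5)
mode_z <- stat_mode(z)
mode_z
```

If several values tie for the highest frequency, stat_mode() returns all of them.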
tribble(~X,~Y,"v",15,"w",5,"x",25,"y",20)
tribble(~A,~B,"m",11:14,"n",2:6,"o",21:25,"p",51:56)
cor(x,y)
cor(x,y, method = "pearson")
cor(x,y, method = "kendall")
cor(x,y, method = "spearman")
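Covariance, Pearson correlation and the coefficient of determination are closely linked: the correlation is the covariance divided by the product of the two standard deviations, and in a simple regression the coefficient of determination equals the squared correlation. The sketch below checks both relations; the vectors x and y hold hypothetical values.

```r
# Hypothetical data
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 7, 9, 15)

# Pearson correlation computed from the covariance
r_manual <- cov(x, y) / (sd(x) * sd(y))
r <- cor(x, y)

# Coefficient of determination as the squared correlation
r_squared <- r^2

# The same quantity reported by a simple linear regression
fit_r2 <- summary(lm(y ~ x))$r.squared

print(c(r_manual, r, r_squared, fit_r2))
```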
7 Citation
citation()
To cite R in publications, use:
@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2024},
url = {https://www.R-project.org/},
}
8 Regression using R
8.1 Simple Regression using R
#Perfect Regression Model_1
x1<-c(1,2,3,4,5)
x2<-c(3,5,7,9,11)
reg1<-lm(x2~x1)
summary(reg1)
plot(reg1)
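#Perfect Regression Model_2
#Note: the first four points lie exactly on x4 = 2 - x3; the last value, -9, does not, so this fit is close but not exact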
x3<-c(6,7,8,9,10)
x4<-c(-4,-5,-6,-7,-9)
reg2<-lm(x4~x3)
summary(reg2)
plot(reg2)
#imperfect Regression Model
x5<-c(11,12,13,14,15)
x6<-c(8,11,-5,22,-0)
reg3<-lm(x6~x5)
summary(reg3)
plot(reg3)
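For the first model above, the fitted line can be read off with coef(). Because x2 equals 1 + 2 * x1 exactly, the intercept is 1, the slope is 2, and the residuals are zero up to rounding error. A minimal sketch, recreating the data so that it runs on its own; the names coefs and max_resid are illustrative.

```r
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(3, 5, 7, 9, 11)
reg1 <- lm(x2 ~ x1)

# Intercept and slope of the fitted line
coefs <- coef(reg1)
print(coefs)

# Largest absolute residual; effectively zero for a perfect fit
max_resid <- max(abs(residuals(reg1)))
print(max_resid)
```

For a fit this exact, summary(reg1) warns that the summary may be unreliable, which is expected.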
floor = c(12, 9, 8,7,6,4,3,2,1)
)
# Create a multiple regression model
model <- lm(price ~ sqft + floor, data = house_data)
#Making Predictions
new_data <- data.frame(sqft = 1600, floor = 7)
predicted_price <- predict(model, newdata = new_data)
print(predicted_price)
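A prediction from a multiple regression is the intercept plus each coefficient times the corresponding new value. The self-contained sketch below checks this by hand. Only the floor values come from the text above; the price and sqft values are hypothetical placeholders chosen for illustration, so the resulting numbers carry no meaning.

```r
# Hypothetical data set: price and sqft are made-up illustrative values
house_data <- data.frame(
  price = c(310, 400, 250, 350, 200, 320, 260, 370, 220),
  sqft  = c(1500, 2100, 1200, 1800, 950, 1650, 1300, 2000, 1100),
  floor = c(12, 9, 8, 7, 6, 4, 3, 2, 1)
)

# Fit the multiple regression model
model <- lm(price ~ sqft + floor, data = house_data)

# Prediction with predict()
new_data <- data.frame(sqft = 1600, floor = 7)
predicted_price <- predict(model, newdata = new_data)

# The same prediction computed by hand from the coefficients
manual_price <- sum(coef(model) * c(1, 1600, 7))

print(c(predicted_price, manual_price))
```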
Getting started with R:
Introduction to R, Advantages of R, Installation of R Packages, Importing data from
spreadsheet files, Commands and Syntax, Packages and Libraries.
Data Structures in R:
Vectors, Matrices, Arrays, Lists, Factors, Data Frames, Conditionals and Control Flows,
Loops, Functions, and Apply family.
Descriptive Statistics Using R:
Importing data files; data visualisation using charts: histograms, bar charts, box plots, line
graphs, scatter plots, etc.
Data description: Measure of Central Tendency, Measure of Dispersion.
Relationship between variables: Covariance, Correlation and Coefficient of Determination.