CH 3

Uploaded by

Rashi Mehta

Chapter 3 - Loading and Handling Data in R

To generate 10 random numbers between 1 and 100, write:

round(runif(10, min = 1, max = 100))

Operator    Example
=           X = 5
<-          X <- 5
->          5 -> x
<<-         X <<- 2
Expression, Variables and Functions
Expressions
Operation           Operator            Description
Addition            x + y               y added to x
Subtraction         x - y               y subtracted from x
Multiplication      x * y               x multiplied by y
Division            x / y               x divided by y
Exponentiation      x ^ y or x ** y     x raised to the power y
Modulus             x %% y              Remainder of (x divided by y)
Integer division    x %/% y             x divided by y but rounded down
Square root         sqrt(x)             Computing the square root of x
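
The arithmetic operators can be tried directly at the R console. The sketch below is only an illustration; the values of x and y are made up for the example.

```r
# Illustrative values (not from the slides)
x <- 7
y <- 2

x + y      # addition: 9
x - y      # subtraction: 5
x * y      # multiplication: 14
x / y      # division: 3.5
x ^ y      # exponentiation: 49 (x ** y gives the same result)
x %% y     # modulus, the remainder of 7 / 2: 1
x %/% y    # integer division, rounded down: 3
sqrt(16)   # square root: 4
```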
Relational Operators

Operator    Description
x > y       True if the left operand is greater than the right
x < y       True if the left operand is less than the right
x == y      True if the left operand is equal to the right
x != y      True if the left operand is not equal to the right
x >= y      True if the left operand is greater than or equal to the right
x <= y      True if the left operand is less than or equal to the right
Logical values
Logical values are TRUE and FALSE or T and F. Note that these are case sensitive.
The equality operator is ==.
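
A short sketch of relational operators returning logical values. The numbers are chosen only for illustration.

```r
# Illustrative values (not from the slides)
x <- 5
y <- 8

x > y     # FALSE
x < y     # TRUE
x == y    # FALSE
x != y    # TRUE
x >= 5    # TRUE
y <= 5    # FALSE

# T and F are shorthand for TRUE and FALSE
TRUE == T   # TRUE
# Lowercase spellings such as true or false are not logical values in R
```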

Dates
The default format of date is “YYYY-MM-DD”
Print system’s date

Print system’s time
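
The slides do not show the commands themselves, so the following is a minimal sketch using the base R functions Sys.Date() and Sys.time(). The variable names and the sample date string are assumptions made for the example.

```r
# Current date, printed in the default YYYY-MM-DD format
today <- Sys.Date()
print(today)

# Current date and time
now <- Sys.time()
print(now)

# A date can also be built from a string in the default format
# (the date below is an arbitrary example)
d <- as.Date("2024-01-31")
format(d, "%d/%m/%Y")   # "31/01/2024"
```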


Variables
(i) Assign a value of 50 to the variable by the name, “Var”.

(ii) Print the value in the variable, “Var”.

(iii) Perform arithmetic operations on the variable, “Var”.
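
The three steps above could be written as follows. The particular arithmetic operations are only examples.

```r
# (i) assign the value 50 to the variable Var
Var <- 50

# (ii) print the value stored in Var
print(Var)   # 50

# (iii) arithmetic on Var (operations chosen for illustration)
Var + 10     # 60
Var * 2      # 100
Var / 4      # 12.5
```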

atrix(t.test.res$p.value, file="ttest.tsv", sep="\t")

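Data Type
• Vectors
• a = c(1, 2, 4, 5, 6)
• b = c("abc", "ddd", "ccc")   # character vector
• sort(a)      # sort the values
• a[1] <- 5    # replace the value at index 1
• a[1:3]       # the first three positions will be printed
• Lists
• a = list("aa", 55, 33, "bb")   # a list can mix character and numeric values
• b = list("da", 55, 1, "bb")
• c(a, b)          # merge the two lists into one flat list
• C = list(a, b)   # nest both lists inside a new list
• Arrays - store data in more than 2 dimensions
• arr1 = c(11, 13, 15, 16, 14)
• arr2 = c(55, 54, 2, 15, 12)
• arr3 = array(c(arr1, arr2), dim = c(3, 3, 5))   # the 10 values are recycled to fill all 45 cells
• Matrices
• Mtr = matrix(c(arr1, arr2), 5, 2)   # 5 rows and 2 columns hold the 10 values exactly
• Factors
• Fact1 = factor(arr1)
• Data Frames
• data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 3:7)   # every column must have the same number of rows
• data1 <- c("aa", "bb", "cc", "dd")
• Merge: merge() joins two data frames on their common columns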
Vectors
• Vectors are stored like arrays in C
• Vector indices begin at 1
• All vector elements must have the same mode, such as integer, numeric (floating-point number), character (string), logical (Boolean) or complex.

Create a vector of numbers

The c function (c is short for combine) creates a new vector consisting of three
values: 4, 7, and 8.
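
A minimal sketch of the call described above. The variable name v is an assumption made for the example.

```r
# Combine three values into one numeric vector
v <- c(4, 7, 8)
print(v)     # 4 7 8

length(v)    # 3
class(v)     # "numeric"
```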
A vector cannot hold values of different data types. Consider the example below.
We are trying to place integer, string and boolean values together in a vector.

Note: All the values are converted to the same data type, i.e. “character”.
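
The slide's own example is not reproduced in the text, so here is a small sketch of the same behaviour. The values and the name mixed are made up for the illustration.

```r
# An integer, a string and a logical value placed in one vector
mixed <- c(1L, "abc", TRUE)

print(mixed)    # "1" "abc" "TRUE"
class(mixed)    # "character"
```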
Accessing the value(s) in the vector
Create a variable by the name, “VariableSeq” and assign to it a vector consisting of string values.

• To access values in a vector, specify the indices at which the values are present in the vector. Indices start at 1.
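
A sketch of creating and indexing the vector. The string values are hypothetical, since the slides do not list them.

```r
# Hypothetical string values for illustration
VariableSeq <- c("aa", "bb", "cc", "dd")

VariableSeq[1]         # first element: "aa"
VariableSeq[c(2, 4)]   # second and fourth elements: "bb" "dd"
VariableSeq[2:3]       # a range of positions: "bb" "cc"
```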
Vector math
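
Arithmetic on vectors is applied element by element. The vectors a and b below are assumptions made for this sketch, not values from the slides.

```r
# Illustrative vectors
a <- c(1, 2, 3)
b <- c(10, 20, 30)

a + b      # element-wise addition: 11 22 33
a * 2      # every element multiplied by 2: 2 4 6
a * b      # element-wise multiplication: 10 40 90
sum(a)     # 6
mean(b)    # 20
```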
Matrices
Create a matrix, “mat”, 3 rows high and 4 columns wide using a vector

Access the element present in the 2nd row and 3rd column of the matrix, “mat”.
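
One possible way to do this, assuming the matrix is filled with the numbers 1 to 12 (the slides do not state the values):

```r
# 3 rows, 4 columns; matrix() fills column by column by default
mat <- matrix(1:12, nrow = 3, ncol = 4)
print(mat)

# Element in the 2nd row and 3rd column
mat[2, 3]   # 8
```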
To access the 2nd column of the matrix, simply provide the column number and
omit the row number.

To access the 2nd and 3rd columns of the matrix, simply provide the column
numbers and omit the row number.
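
A short sketch of column access, again assuming a 3 x 4 matrix filled with 1 to 12:

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)

# 2nd column: leave the row position empty
mat[, 2]       # 4 5 6

# 2nd and 3rd columns together (the result is a 3 x 2 matrix)
mat[, 2:3]
```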
List
Create a list, “emp”, with three elements: “EmpName”, “EmpUnit” and “EmpSal”.

Print the list “emp” to display its elements.

Retrieve the names of the elements in the list “emp”.
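
A minimal sketch of these steps. The employee name and salary are hypothetical placeholders; only the unit "IT" appears in the slides.

```r
# EmpName and EmpSal values are made up for the example
emp <- list(EmpName = "Alex", EmpUnit = "IT", EmpSal = 55000)

# Display all elements of the list
print(emp)

# Names of the elements
names(emp)     # "EmpName" "EmpUnit" "EmpSal"

# A single element can be read by name
emp$EmpName    # "Alex"
```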

Add an element with the name “EmpDesg” and value “Software Engineer” to the list, “emp”.

Delete the element with the name “EmpUnit” and value “IT” from the list, “emp”.
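
Adding and deleting list elements might look like this. The starting list uses a hypothetical name and salary.

```r
# Starting list (EmpName and EmpSal values are made up)
emp <- list(EmpName = "Alex", EmpUnit = "IT", EmpSal = 55000)

# Add a new element by assigning to a new name
emp$EmpDesg <- "Software Engineer"
names(emp)    # "EmpName" "EmpUnit" "EmpSal" "EmpDesg"

# Delete an element by assigning NULL to it
emp$EmpUnit <- NULL
names(emp)    # "EmpName" "EmpSal" "EmpDesg"
```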
Recursive list
A recursive list means a list within a list.
Let us begin with two lists, “emp” and “emp1”.
The elements in both the lists are as shown below:

Combine both the lists into a single list by the name “EmpList”.
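
The elements of the two lists are not reproduced in the text, so the sketch below uses hypothetical values to show how a list of lists can be built and read.

```r
# Hypothetical contents for both lists
emp  <- list(EmpName = "Alex",  EmpUnit = "IT", EmpSal = 55000)
emp1 <- list(EmpName = "Priya", EmpUnit = "HR", EmpSal = 60000)

# Each list becomes one element of EmpList
EmpList <- list(emp, emp1)

length(EmpList)         # 2
EmpList[[2]]$EmpName    # "Priya"
str(EmpList)            # shows the nested structure
```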
Manipulating Text in Data

substr(a, start, stop)
∙ Arguments: a is a character vector; the start and stop arguments contain numeric values
∙ Description: returns the part of the string that starts at the start argument and ends at the stop argument

strsplit(a, split, …)
∙ Arguments: a is a character vector; split is also a character vector that contains a regular expression for splitting
∙ Description: splits the given text string into substrings

paste(…, sep = " ", …)
∙ Arguments: the dots “…” define R objects; the sep argument is a character string for separating the objects
∙ Description: concatenates string vectors after converting the objects into strings
∙ Example: paste('a', 1:5, sep = ' ')

grep(pattern, a)
∙ Arguments: the pattern argument contains the pattern to match; a is a character vector
∙ Description: searches for the pattern in the given character vector and returns the indices of the matching elements; grepl() returns a logical vector instead
∙ Example: x <- c("d", "a", "c", "abba")
∙ Example: grep("a", x)
∙ Example: grepl("a", x)

toupper(a)
∙ Arguments: a is a character vector
∙ Description: converts a string into uppercase

tolower(a)
∙ Arguments: a is a character vector
∙ Description: converts a string into lowercase
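
The text functions listed above can be tried on a small example. The string s is an assumption made for this sketch; the vector x is the one used in the slides.

```r
# Illustrative string (not from the slides)
s <- "Data Analytics"

substr(s, 1, 4)               # "Data"
strsplit(s, " ")[[1]]         # "Data" "Analytics"
paste("a", 1:5, sep = " ")    # "a 1" "a 2" "a 3" "a 4" "a 5"

x <- c("d", "a", "c", "abba")
grep("a", x)                  # indices of matches: 2 4
grepl("a", x)                 # FALSE TRUE FALSE TRUE
grep("a", x, value = TRUE)    # the matching strings: "a" "abba"

toupper(s)                    # "DATA ANALYTICS"
tolower(s)                    # "data analytics"
```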
Exploring a Dataset
In every function below, the dataset argument contains the name of the dataset.

names(dataset)
∙ Description: displays the variables of the given dataset

summary(dataset)
∙ Description: displays the summary of the given dataset

str(dataset)
∙ Description: displays the structure of the given dataset

head(dataset, n)
∙ Arguments: n is a numeric value giving the number of top rows to display
∙ Description: displays the top rows according to the value of n; if n is not provided, the function displays the top 6 rows of the dataset by default

tail(dataset, n)
∙ Arguments: n is a numeric value giving the number of bottom rows to display
∙ Description: displays the bottom rows according to the value of n; if n is not provided, the function displays the bottom 6 rows of the dataset by default

class(dataset)
∙ Description: displays the class of the dataset

dim(dataset)
∙ Description: returns the dimension of the dataset, that is, the total number of rows and columns

table(dataset$variablename)
∙ Arguments: variablename contains the name of the variable
∙ Description: counts how many times each category (distinct value) of the variable occurs
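
As an illustration, the functions can be applied to iris, a dataset that ships with R. The choice of iris is an assumption made for this sketch; the slides do not name a dataset.

```r
names(iris)            # variable names
dim(iris)              # 150 rows and 5 columns
class(iris)            # "data.frame"
str(iris)              # structure: type and first values of each variable
summary(iris)          # summary statistics for each variable

head(iris, 3)          # top 3 rows
tail(iris)             # bottom 6 rows (the default)

table(iris$Species)    # count of each species: 50 each
```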
Aggregating and Group Processing of a Variable
The syntax of the aggregate() function is as follows:
aggregate(x, …) or

aggregate(x, by, FUN, …)

where x is an object; the by argument defines the list of group elements of the particular variable of the dataset; the FUN argument is a statistic function that is applied to each group and returns a numeric value; the dots “…” define the other optional arguments.
Create data frame

• data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 1, group = c("A", "A", "B", "C", "C"))

• aggregate(data[colnames(data) != "group"], by = list(data$group), FUN = mean)
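
The same group means can also be obtained with the formula interface of aggregate(). This is an alternative sketch rather than the form used in the slides; the name res is an assumption.

```r
data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 1,
                   group = c("A", "A", "B", "C", "C"))

# ". ~ group" means: summarise every other column, grouped by group
res <- aggregate(. ~ group, data = data, FUN = mean)
print(res)
#   group  x1  x2 x3
# 1     A 1.5 2.5  1
# 2     B 3.0 4.0  1
# 3     C 4.5 5.5  1
```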

Exercise - Create Data Frame
• A = 10 numbers
• B = 10 random numbers
• C = Marks
• D = Name (2 names, each repeated five times)

• Find the average of each column, grouped by name.
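
One possible way to approach the exercise is sketched below. The seed, the value ranges and the two names are all assumptions made for the illustration.

```r
set.seed(1)   # assumed seed, only to make the random numbers repeatable

df <- data.frame(
  A = 1:10,                                    # 10 numbers
  B = round(runif(10, min = 1, max = 100)),    # 10 random numbers
  C = round(runif(10, min = 0, max = 100)),    # marks (assumed range 0 to 100)
  D = rep(c("Asha", "Ravi"), times = 5)        # 2 hypothetical names, five times each
)

# Average of A, B and C for each name
avg <- aggregate(df[c("A", "B", "C")], by = list(Name = df$D), FUN = mean)
print(avg)
```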

tapply() Function
The tapply() function is also a built-in function of R and works similarly to the function aggregate().

tapply(x, …) or

tapply(x, INDEX, FUN, …)

where x is an object that defines the summary variable; the INDEX argument defines the list of group elements, also called the group variable; the FUN argument is a statistic function that returns a numeric value after the given statistic operations; the dots “…” define the other optional arguments.
Create Data Frame
• price = round(rnorm(15, sd = 8, mean = 25))   # rnorm is the R function that simulates random variates having a specified normal distribution
• class = sample(1:4, size = 15, replace = TRUE)   # random class numbers from 1 to 4
• shop1 = sample(paste("Store", 1:4), size = 15, replace = TRUE)

• tapply(price, class, mean)   # mean price for each class

• tapply(price, shop1, mean)   # mean price for each shop

• class1 = factor(class, levels = 1:4, labels = c("a", "b", "c", "d"))   # levels = 1:4 keeps the four labels valid even if a class was not sampled

• tapply(price, class1, mean)

Exercise
Generate 30 samples of movie ticket prices with average = 200 and sd = 20.
Define 3 classes.
Define 5 multiplexes.

Find the mean ticket price for each class type and for each multiplex.
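
A possible sketch for this exercise, using tapply(). The seed, the variable names and the class and multiplex labels are assumptions made for the illustration.

```r
set.seed(42)   # assumed seed, only to make the sample repeatable

# 30 ticket prices from a normal distribution with mean 200 and sd 20
ticket_price <- round(rnorm(30, mean = 200, sd = 20))

# Hypothetical labels for 3 classes and 5 multiplexes
seat_class <- sample(c("Silver", "Gold", "Platinum"), size = 30, replace = TRUE)
multiplex  <- sample(paste("Multiplex", 1:5), size = 30, replace = TRUE)

# Mean ticket price per class and per multiplex
by_class     <- tapply(ticket_price, seat_class, mean)
by_multiplex <- tapply(ticket_price, multiplex, mean)
print(by_class)
print(by_multiplex)
```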
Selecting variables
• Install the dplyr and tidyr packages

library(dplyr)

# keep the variables name, height, and gender
newdata <- select(starwars, name, height, gender)

# keep the variables name and all variables
# between mass and species inclusive
newdata <- select(starwars, name, mass:species)

# keep all variables except birth_year and gender
newdata <- select(starwars, -birth_year, -gender)
Selecting observations

library(dplyr)

# select females
# (in recent dplyr versions the value "female" is stored in the sex column;
# the gender column holds "feminine" and "masculine")
newdata <- filter(starwars, sex == "female")

# select females that are from Alderaan
newdata <- filter(starwars, sex == "female" & homeworld == "Alderaan")

# select individuals that are from
# Alderaan, Coruscant, or Endor
newdata <- filter(starwars, homeworld == "Alderaan" | homeworld == "Coruscant" | homeworld == "Endor")
Creating/Recoding variables
• The mutate function allows you to create new variables or transform existing ones.

library(dplyr)

# convert height in centimeters to inches,
# and mass in kilograms to pounds
newdata <- mutate(starwars, height = height * 0.394, mass = mass * 2.205)

• The ifelse function (part of base R) can be used for recoding data. The format is ifelse(test, return if TRUE, return if FALSE).

library(dplyr)

# if height is greater than 180
# then heightcat = "tall",
# otherwise heightcat = "short"
newdata <- mutate(starwars, heightcat = ifelse(height > 180, "tall", "short"))

# set heights greater than 200 or
# less than 75 to missing
newdata <- mutate(starwars, height = ifelse(height < 75 | height > 200, NA, height))
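
Because ifelse() belongs to base R, its behaviour can be checked on a plain vector without dplyr. The heights below are made-up values for this sketch, including one missing value.

```r
# Hypothetical heights in centimeters
height <- c(172, 202, 66, NA, 188)

# Recode into two categories; a missing height stays missing
heightcat <- ifelse(height > 180, "tall", "short")
heightcat       # "short" "tall" "short" NA "tall"

# Set implausible heights (below 75 or above 200) to missing
height_clean <- ifelse(height < 75 | height > 200, NA, height)
height_clean    # 172 NA NA NA 188
```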
Summarizing data
• The summarize function can be used to reduce multiple values down to a single value (such as a mean). It is often used in conjunction with the group_by function to calculate statistics by group. In the code below, the na.rm=TRUE option is used to drop missing values before calculating the means.

library(dplyr)

# calculate mean height and mass
newdata <- summarize(starwars,
                     mean_ht = mean(height, na.rm=TRUE),
                     mean_mass = mean(mass, na.rm=TRUE))
newdata

# calculate mean height and weight by gender
newdata <- group_by(starwars, gender)
newdata <- summarize(newdata,
                     mean_ht = mean(height, na.rm=TRUE),
                     mean_wt = mean(mass, na.rm=TRUE))
newdata
Methods for Reading Data
Reading CSV Files
A CSV file uses the .csv extension and stores tabular data as plain text. The following function reads data from a CSV file:
read.csv("filename")
where filename is the name of the CSV file that needs to be imported.
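
To try read.csv() without an existing file, the sketch below first writes a small CSV file to a temporary location and then reads it back. The file contents and variable names are assumptions made for the illustration.

```r
# Temporary file path, so nothing in the working directory is touched
csv_path <- tempfile(fileext = ".csv")

# Small hypothetical table written out as CSV
scores <- data.frame(name = c("Asha", "Ravi"), marks = c(82, 91))
write.csv(scores, csv_path, row.names = FALSE)

# Read the CSV file back into a data frame
loaded <- read.csv(csv_path)
str(loaded)
head(loaded)
```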

Reading Spreadsheets
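read.xlsx("filename", …)
where the filename argument defines the path of the file to be read; the dots “…” define the other optional arguments. Note that read.xlsx() is not part of base R; it is provided by add-on packages such as xlsx and openxlsx, which must be installed and loaded first.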
