R Programming Cont..

MBA(BA) R introduction ppts

Uploaded by

Dr Shweta RAI
Copyright
© All Rights Reserved

R programming

Data Frames
• A data frame is R's general-purpose object for storing
tabular data. It can be thought of as a matrix in which
each column may hold a different data type, although all
values within one column share the same type. A data
frame has three principal components: the data, the
rows, and the columns.
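As a quick illustration of these three components, the sketch below builds a small data frame and inspects its dimensions and column types. The name demo and its values are made up for this example.

```r
# hypothetical data frame, used only for illustration
demo <- data.frame(
  name   = c("A", "B", "C"),
  age    = c(25, 18, 30),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

nrow(demo)            # number of rows
ncol(demo)            # number of columns
names(demo)           # column names
sapply(demo, class)   # each column keeps its own data type
```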
Create a data frame
x <- data.frame(GENDER = c("M", "F", "M", "F"), AGE = c(25, 18, 2, 8),
                WEIGHT = c(2, 5, 2, 8), HEIGHT = c(6, 3, 2, 4))
print(x)
# output
#   GENDER AGE WEIGHT HEIGHT
# 1      M  25      2      6
# 2      F  18      5      3
# 3      M   2      2      2
# 4      F   8      8      4
# R program to create a data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)
# Create the data frame
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = TRUE
)
# Print the data frame.
print(emp.data)
dplyr and tidyr
• dplyr is a package that provides a grammar of data
manipulation: a small, consistent set of verbs that covers
the most common data manipulation tasks. Every dplyr verb
takes a data.frame as input and returns a data.frame.
• tidyr contains tools for changing the shape (pivoting) and
hierarchy (nesting and unnesting) of a dataset, turning
deeply nested lists into rectangular data frames
("rectangling"), and extracting values out of string
columns. It also includes tools for working with missing
values (both implicit and explicit).
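The slides give no tidyr example, so here is a minimal sketch of pivoting and of handling missing values. The scores data frame and its column names are hypothetical, and the tidyr package is assumed to be installed.

```r
library(tidyr)

# hypothetical scores in "wide" form: one column per match
scores <- data.frame(
  player = c("A", "B", "C"),
  match1 = c(10, 25, NA),
  match2 = c(40, NA, 15)
)

# pivot to "long" form: one row per player/match combination
long <- pivot_longer(scores,
                     cols = c(match1, match2),
                     names_to = "match",
                     values_to = "runs")
long

# drop the rows whose runs value is missing
complete <- drop_na(long, runs)
complete
```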
# install dplyr (needed only once), then load it
install.packages('dplyr')
library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, NA, 5))
# fetch players who scored more
# than 100 runs
filter(stats, runs>100)
# Create DataFrame
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai', 'ram', 'deepika', 'sahithi', 'kumar', 'scott', 'Don', 'Lin'),
  gender = c('M', 'M', 'F', 'F', 'M', 'M', 'M', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16',
                  '1995-03-02', '1991-6-21', '1986-3-24', '1990-8-26')),
  state = c('CA', 'NY', NA, NA, 'DC', 'DW', 'AZ', 'PH'),
  row.names = c('r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8')
)
df
The dplyr filter() function filters the rows of a data frame by row name, by column
value, by a list of values, or by multiple conditions. Here, %>% is an infix operator
that acts as a pipe: it passes the left-hand side of the operator as the first
argument of the function on the right-hand side.

# Load dplyr library
library('dplyr')
# filter() by row name
df %>% filter(rownames(df) == 'r3')
# filter() by column Value
df %>% filter(gender == 'M')
# filter() by list of values
df %>% filter(state %in% c("CA", "AZ", "PH"))
# filter() by multiple conditions
df %>% filter(gender == 'M' & id > 15)
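To make the pipe behaviour concrete, the sketch below compares a piped call with the equivalent direct call and then chains two verbs. The people data frame is hypothetical and used only for this illustration.

```r
library(dplyr)

# hypothetical data, used only for this illustration
people <- data.frame(
  id     = c(1, 2, 3, 4),
  gender = c("M", "F", "M", "F")
)

# the two calls below are equivalent
piped  <- people %>% filter(gender == "M")
direct <- filter(people, gender == "M")
identical(piped, direct)

# pipes can be chained: each result feeds the next verb
people %>%
  filter(gender == "M") %>%
  select(id)
```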
dplyr::select() Examples
The dplyr select() function selects columns (variables) from a data frame. It takes
the data frame as its first argument and a variable name, or a vector of variable
names or index positions, as the second. The examples below also use slice(), which
selects or drops rows by position.

# select() single column
df %>% select('id')
# select() multiple columns
df %>% select(c('id','name'))
# Select multiple columns by index position
df %>% select(c(1,2))
# Select rows 2 and 3
df %>% slice(2,3)
# Select rows from list
df %>% slice(c(2,3,5,6))
# select rows by range
df %>% slice(2:6)
# Drop rows using slice()
df %>% slice(-2,-3,-4,-5,-6)
# Drop by range
df %>% slice(-2:-6)
dplyr::mutate() Examples
Use the mutate() function and its variants
mutate_all(), mutate_if() and mutate_at() from the
dplyr package to replace or update the values of a
column (string, integer, or any other type) in an R
data frame.

# replace on selected column
# str_replace() comes from the stringr package
library(stringr)
df %>%
  mutate(name = str_replace(name, "sai", "sairam"))
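The variants named above are not shown in the slides, so the following is a minimal sketch of mutate(), mutate_if() and mutate_at(). The books data frame is hypothetical. Recent dplyr releases mark these scoped variants as superseded in favour of across(), but they still work.

```r
library(dplyr)

# hypothetical data, used only for this illustration
books <- data.frame(
  name  = c("spark", "r", "java"),
  price = c(144, 321, 567),
  stringsAsFactors = FALSE
)

# mutate(): add a column computed from an existing one
books %>% mutate(price_with_tax = price * 1.1)

# mutate_if(): apply a function to every column that passes a test
upper <- books %>% mutate_if(is.character, toupper)
upper

# mutate_at(): apply a function to the named columns only
halved <- books %>% mutate_at(c("price"), function(p) p / 2)
halved
```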
dplyr::rename() Examples
The rename() function of dplyr changes the names of columns in a data frame. The first example below
renames the column id to c1; the later examples rename multiple columns by name and by index position.

# Change the column name - id to c1
my_dataframe %>% rename("c1" = "id")
# Rename multiple columns by name
my_dataframe <- my_dataframe %>%
  rename("c1" = "id", "c2" = "pages", "c3" = "name")
# Rename multiple columns by index
my_dataframe <- my_dataframe %>%
  rename(col1 = 1, col2 = 2)
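The rename() calls above use a my_dataframe that the slides never define. The sketch below supplies a hypothetical one with the assumed columns id, pages and name, so the new_name = old_name pattern can be run end to end.

```r
library(dplyr)

# hypothetical data frame with the column names the examples assume
my_dataframe <- data.frame(
  id    = c(11, 22),
  pages = c(32, 45),
  name  = c("spark", "python")
)

# rename() takes new_name = old_name; unlisted columns are left unchanged
renamed <- my_dataframe %>% rename(c1 = id, c2 = pages)
names(renamed)
```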
dplyr::distinct() Examples
The distinct() function of dplyr selects the unique rows of the input data frame. When called without any column
names as arguments, it identifies unique rows by comparing the values in all columns.

# Create dataframe
df = data.frame(id = c(11, 11, 33, 44, 44),
                pages = c(32, 32, 33, 22, 22),
                name = c("spark", "spark", "R", "java", "jsp"),
                chapters = c(76, 76, 11, 15, 15),
                price = c(144, 144, 321, 567, 567))
df
# Load dplyr library
library(dplyr)
# Distinct rows
df2 <- df %>% distinct()
df2
# Distinct on selected columns
df2 <- df %>% distinct(id, pages)
df2
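When distinct() is given column names, only those columns are returned by default. The sketch below shows the .keep_all argument, which keeps the remaining columns of the first row found for each distinct value. The books data frame is a hypothetical, trimmed-down version of the data used above.

```r
library(dplyr)

# hypothetical data, a shortened version of the data frame above
books <- data.frame(
  id    = c(11, 11, 33, 44, 44),
  name  = c("spark", "spark", "R", "java", "jsp"),
  price = c(144, 144, 321, 567, 567)
)

# only the id column is returned
books %>% distinct(id)

# first row for each id, with all columns retained
first_per_id <- books %>% distinct(id, .keep_all = TRUE)
first_per_id
```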
dplyr::arrange() Examples
The dplyr arrange() function sorts the rows of a data frame in ascending or descending order of one or more
column values.

# Create Data Frame
df = data.frame(id = c(11, 22, 33, 44, 55),
                name = c("spark", "python", "R", "jsp", "java"),
                price = c(144, NA, 321, 567, 567),
                publish_date = as.Date(c("2007-06-22", "2004-02-13", "2006-05-18",
                                         "2010-09-02", "2007-07-20")))
# Load dplyr library
library(dplyr)
# Using arrange in ascending order
df2 <- df %>% arrange(price)
df2
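The example above sorts in ascending order only. A minimal sketch of descending order uses desc(), and a second column can break ties. The books data frame repeats the values above under a different, hypothetical name. arrange() places missing values at the end in either direction.

```r
library(dplyr)

# same values as above, under a hypothetical name
books <- data.frame(
  id    = c(11, 22, 33, 44, 55),
  name  = c("spark", "python", "R", "jsp", "java"),
  price = c(144, NA, 321, 567, 567)
)

# descending order of price; the NA price is placed last
by_price_desc <- books %>% arrange(desc(price))
by_price_desc

# descending price, then ascending name to break the tie at 567
tie_broken <- books %>% arrange(desc(price), name)
tie_broken
```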
dplyr::group_by()
The group_by() function groups the rows of a data frame by one or more columns so that aggregations can be performed on each group.

# Create Data Frame
df = read.csv('/Users/admin/apps/github/r-examples/resources/emp.csv')
df
# Load dplyr
library(dplyr)
# group_by() on department
grp_tbl <- df %>% group_by(department)
grp_tbl
# summarise on grouped data
agg_tbl <- grp_tbl %>% summarise(sum(salary))
agg_tbl
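The contents of emp.csv are not shown, so the sketch below uses a hypothetical emp data frame with the same department and salary columns to show grouping and aggregation in one self-contained run. The names and figures are made up.

```r
library(dplyr)

# hypothetical employee data standing in for emp.csv
emp <- data.frame(
  name       = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  department = c("IT", "HR", "IT", "HR", "IT"),
  salary     = c(600, 500, 700, 800, 900)
)

# one output row per department, with named summary columns
agg_tbl <- emp %>%
  group_by(department) %>%
  summarise(total_salary = sum(salary),
            mean_salary  = mean(salary),
            employees    = n())
agg_tbl
```

Naming the summary columns (total_salary and so on) avoids the default column name `sum(salary)` that an unnamed expression produces.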
The structure of an R data frame can be inspected with the str() function. It can display
even the internal structure of large nested lists, and it gives compact output, roughly one
line per element, that tells the user about a basic R object and its constituents.

# R program to get the
# structure of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using str(); it prints the structure itself and returns NULL,
# so wrapping it in print() would only add a stray NULL line
str(friend.data)
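Since str() also handles nested lists, here is a minimal sketch on a hypothetical list called team. The max.level argument limits how deep the outline goes.

```r
# hypothetical nested list, used only for illustration
team <- list(
  name    = "India",
  players = list(
    batsmen = c("Sachin", "Dravid"),
    keeper  = "Dhoni"
  ),
  matches = 5L
)

# compact, indented outline of the whole nested structure
str(team)

# show only the top level of nesting
str(team, max.level = 1)
```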
The statistical summary and nature of the data in an R data frame can be obtained with the
summary() function. summary() is a generic function: it is also used to summarise the results of
various model fitting functions, and the method it invokes depends on the class of its first
argument.

# R program to get the
# summary of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))
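Because summary() is generic, the same call behaves differently depending on the class of its argument. The sketch below uses only base R and the built-in cars dataset; the variable names are chosen for illustration.

```r
# numeric vector: minimum, quartiles, mean and maximum
summary(c(2, 4, 6, 8))

# factor: counts per level
gender_counts <- summary(factor(c("M", "F", "M")))
gender_counts

# fitted linear model: dispatches to the summary method for "lm" objects
fit <- lm(dist ~ speed, data = cars)   # cars is a built-in dataset
fit_summary <- summary(fit)
class(fit_summary)
```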
Extracting data from an R data frame means accessing its rows or
columns. A specific column can be extracted from an R data frame by its
column name.

# R program to extract
# data from the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Extracting friend_name column
result <- data.frame(friend.data$friend_name)
print(result)
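The text mentions rows as well as columns, so the sketch below lists the common base R ways of extracting either. It reuses the friend.data values from above; the result variable names are chosen for illustration.

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

friend.data$friend_name        # column as a vector
friend.data[["friend_name"]]   # same vector, name given as a string
one_col <- friend.data["friend_name"]   # one-column data frame

friend.data[2, ]               # second row, all columns
picked <- friend.data[c(1, 3), "friend_name"]     # rows 1 and 3 of one column
late <- friend.data[friend.data$friend_id > 3, ]  # rows matching a condition
```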
An existing R data frame can be expanded by adding new
columns and rows to it.
# R program to expand
# the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Expanding data frame
friend.data$location <- c("Kolkata", "Delhi",
                          "Bangalore", "Hyderabad",
                          "Chennai")
resultant <- friend.data
# print the modified data frame
print(resultant)
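The program above adds a column only. A minimal sketch of adding a row uses rbind(). The sixth entry is a made-up value for illustration, and the new row must use the same column names as the existing data frame.

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# hypothetical new row with matching column names
new_row <- data.frame(
  friend_id = 6L,
  friend_name = "Kohli",
  stringsAsFactors = FALSE
)

# append the row to the bottom of the data frame
expanded <- rbind(friend.data, new_row)
print(expanded)
```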
Rows and columns can also be removed from an
existing R data frame.
library(dplyr)
# Create a data frame
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)

# Remove a row with friend_id = 3
data <- subset(data, friend_id != 3)

# Remove the 'location' column
data <- select(data, -location)
print(data)
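The same removals can be done without dplyr. The sketch below is a base R alternative, not the method used in the slides, and the name friends is chosen for illustration.

```r
friends <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)

# remove the row with friend_id = 3 by keeping every other row
friends <- friends[friends$friend_id != 3, ]

# remove the first remaining row by position
friends <- friends[-1, ]

# remove the location column by assigning NULL to it
friends$location <- NULL

print(friends)
```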
