Advanced R Programming Tidyverse Packages Notes

The Tidyverse is a collection of R packages designed for data science, facilitating data exploration, visualization, and transformation. Key packages include ggplot2 for visualization, dplyr for data manipulation, and tidyr for data cleaning. The document also details specific functions within these packages, such as gather, separate, and unite in tidyr, along with examples of their usage.

Tidyverse packages

Tidyverse package: The Tidyverse is a collection of R packages built for data science around a common design philosophy, so the packages share a consistent grammar and consistent data structures.
It covers the whole workflow, from data import and exploration to data visualization.
For example, readr imports data, tibble and tidyr help tidy it, dplyr and stringr handle data transformation, and ggplot2 handles data visualization.

Seven core Tidyverse packages are mentioned in these notes: ggplot2, dplyr, tidyr, readr, purrr, tibble, and stringr (the core set also includes forcats). All of them are installed at once with the install.packages("tidyverse") command and loaded together with library(tidyverse).
The Tidyverse packages fall into the following groups:
1. Data Visualization and Exploration
• ggplot2
2. Data Wrangling and Transformation
• dplyr
• tidyr
• stringr
3. Data Import and Management
• tibble
• readr
4. Functional Programming
• purrr

Note: All of the packages above are part of the tidyverse package, so there is no need to install them separately.
1) tidyr: tidyr is a data cleaning library in R that helps to create tidy data. Tidy data means that every cell holds a single value, every column is a variable, and every row is an observation.
The sole purpose of the tidyr package is to simplify the process of creating tidy data. Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. Once your data is tidy, you spend less time fighting with the tools and more time working on your analysis.

a) gather() function: It takes multiple columns and collapses them into key-value pairs, duplicating all other columns as needed. In other words, it makes "wide" data longer.
Example:
install.packages("tidyverse")   # run once to install
library(tidyverse)              # also loads dplyr and tidyr
# Create a data frame
n <- 10
tidy_df <- data.frame(
  s.no = 1:n,
  Group.1 = c(23, 345, 76, 212, 88, 199, 72, 35, 90, 265),
  Group.2 = c(117, 89, 66, 334, 90, 101, 178, 233, 45, 200),
  Group.3 = c(29, 101, 239, 289, 176, 320, 89, 109, 199, 56)
)
tidy_df
# Apply gather(): collapse the three Group columns into key-value pairs
long <- tidy_df %>%
  gather(Group, Frequency, Group.1:Group.3)
print(long)
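In current tidyr releases gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshaping with the newer function; the data frame wide_df and its values are illustrative stand-ins for tidy_df, not part of the original notes.

```r
library(tidyr)

# Small wide table in the same shape as tidy_df (illustrative values)
wide_df <- data.frame(
  s.no    = 1:3,
  Group.1 = c(23, 345, 76),
  Group.2 = c(117, 89, 66),
  Group.3 = c(29, 101, 239)
)

# pivot_longer() is the newer counterpart of gather()
long_df <- pivot_longer(
  wide_df,
  cols      = Group.1:Group.3,  # columns to collapse
  names_to  = "Group",          # plays the role of gather()'s key
  values_to = "Frequency"       # plays the role of gather()'s value
)
print(long_df)
```

The rows come out ordered by s.no rather than by Group, so the result may look different from gather() output even though it holds the same key-value pairs.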
b) separate(): It turns a single character column into multiple columns, splitting the values wherever a separator appears. Here "Group.1" is split at the dot into "Group" and "1".
Example:
separate_data<-long %>%
separate(Group,c("Allotment","Number"))
separate_data
c) unite(): It merges two or more columns into one column. The unite() function is a convenience function that pastes together multiple variable values into one. In essence, it combines several variables of a single observation into one variable.
Example:
# The argument name is lowercase sep; it sets the separator placed between the values
unite_data <- separate_data %>%
  unite(Group, Allotment, Number, sep = ".")
print(unite_data)
d) spread(): It reshapes data from a long format back to a wide format.
The spread() function spreads a key-value pair across multiple columns.
Example:
back_to_wide<-unite_data %>%
spread(Group,Frequency)
print(back_to_wide)
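Just as gather() has a newer counterpart, spread() is superseded by pivot_wider() in current tidyr releases. A minimal sketch follows; long_df and its values are illustrative and are not taken from the notes.

```r
library(tidyr)

# Small long table: one row per (s.no, Group) pair (illustrative values)
long_df <- data.frame(
  s.no      = rep(1:3, times = 2),
  Group     = rep(c("Group.1", "Group.2"), each = 3),
  Frequency = c(23, 345, 76, 117, 89, 66)
)

# pivot_wider() is the newer counterpart of spread()
wide_again <- pivot_wider(
  long_df,
  names_from  = Group,      # values here become the new column names
  values_from = Frequency   # values here fill the new columns
)
print(wide_again)
```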
e) nest(): It creates a list-column of data frames that holds the nested variables, leaving one row for each combination of the remaining columns.
Nesting is implicitly a summarizing operation. This is useful in conjunction with other summaries that work with whole datasets, most notably models.
Example:
df<-tidy_df
df1<-df %>% nest(data=c(Group.2))
df1
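The remark about models can be made concrete with a short sketch. It assumes the built-in mtcars data set and a simple linear model of mpg on wt; the column names fit and slope are hypothetical names chosen for this illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per cylinder count; all other columns go into the list-column `data`
models <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    # Fit one linear model per nested data frame
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Pull the wt coefficient out of each fitted model
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])
  )

models %>% select(cyl, slope)
```

Each row of models now carries its own data subset and fitted model, which keeps related objects together while working group by group.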
f) unnest(): It reverses the nest operation, making each element of the list-column its own row. It can handle list-columns that contain atomic vectors, lists, or data frames (but not a mixture of the different types).
Example:
# df1 holds Group.2 inside the list-column named data, so unnest that column
df1 %>% unnest(data)
nun <- iris
names(nun)
nested_iris <- nun %>% nest(data = c(Species))
head(nested_iris)
head(nested_iris %>% unnest(data))

Note:
.drop and .preserve: Older versions of tidyr used .drop to control whether additional list-columns were dropped and .preserve to name list-columns to keep in the output. Both arguments are deprecated since tidyr 1.0.0, where all list-columns are preserved, so they are omitted from the calls above.

g) fill() function: It fills in the missing values in selected columns using the previous entry. This is useful for the common output format in which values are not repeated and are recorded only when they change. Missing values are replaced in atomic vectors; NULL is replaced in lists.
Example:
df <- data.frame(Month = 1:6,
                 year = c(2000, rep(NA, 5)))
df
df %>% fill(year)
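By default fill() carries the previous entry downward. When the recorded value sits at the bottom of a block, the .direction argument can reverse this. A small sketch, with an illustrative sales data frame that is not part of the notes:

```r
library(tidyr)

# The year is recorded only on the last row of the block (illustrative data)
sales <- data.frame(
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  year    = c(NA, NA, NA, 2001)
)

# .direction = "up" fills each NA from the next non-missing entry below it
filled_up <- fill(sales, year, .direction = "up")
print(filled_up)
```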
h) full_seq() function: It fills in the values of a numeric vector that should have been observed but weren't, given a fixed period between values.
Example:
num_vec <- c(3, 7, 9, 14, 19, 20)
full_vec <- full_seq(num_vec, 1)   # period of 1
full_vec
i) drop_na() function: This function drops rows containing missing values.
Example:
drop_df <- tibble(s.no = 1:10,
                  Name = c("Jhon", "Smith", "Perer", "luke", "King", rep(NA, 5)))
drop_df
dfg <- drop_df %>% drop_na(Name)
dfg
j) replace_na() function: This function replaces missing values with a value you supply.
Example:
drop_df <- tibble(s.no = 1:10,
                  Name = c("Jhon", "Smith", "Perer", "luke", "King", rep(NA, 5)))
drop_df %>% replace_na(list(Name = "Henry"))


2) tibble package: A tibble is a modern form of the data.frame that keeps the useful parts and drops the parts that are less helpful. Tibbles do not change variable names or types the way data.frames can, and they do not do partial matching; they also surface problems sooner, for example when a variable does not exist.
As a result, code that uses tibbles tends to be cleaner and more robust. Tibbles are also easier to use with larger datasets that contain more complex objects, in part because of an enhanced print() method.
We can create new tibbles from column vectors using the tibble() function, and we can create a tibble row by row using the tribble() function. To install tibble, the simplest method is to install the tidyverse using:
install.packages("tidyverse")
Tibbles have a printing method that shows only the first 10 rows and all the columns that fit on the screen. This is useful when working with large data.
Example:
tf <- tibble(x = letters, y = 1:26, z = sample(50, 26))
tf          # prints only the first 10 rows
View(tf)    # opens the full table in the data viewer
# Check tibble: is_tibble() checks whether a particular data set is a tibble.
Example:
is_tibble(mtcars)
# as_tibble() converts a data frame to a tibble.
Example:
is_tibble(as_tibble(mtcars))
# glimpse: glimpse() prints a transposed view of the data, with one line per column showing its name, type, and first values.
Example:
glimpse(mtcars)
# enframe: enframe() converts an atomic vector or list into a two-column tibble (name and value).
Example:
enframe(1:3)
# deframe: deframe() converts a data frame back to a vector; a one-column tibble gives a plain vector.
Example:
v <- deframe(tibble(a = 1:3))
is_tibble(v)   # FALSE, because v is a vector
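enframe() and deframe() are easiest to understand as a round trip on a named vector. A minimal sketch, where the vector scores is an illustrative assumption:

```r
library(tibble)

# A named numeric vector (illustrative values)
scores <- c(a = 10, b = 20, c = 30)

# enframe() stores the names in `name` and the values in `value`
tbl <- enframe(scores)
print(tbl)

# deframe() uses the first column as names and the second as values
back <- deframe(tbl)
print(back)
```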
# Mathematical operations: a column can be computed from columns defined earlier in the same call.
mat <- tibble(x = 1:5, y = 1, z = x^2 + y)
mat
# Add rows
Example:
df<-tibble(x=1:3,y=3:1)
df %>% add_row(x=4,y=0)
df %>% add_row(x=4:5,y=0,.before=2)
Note: To insert the new rows before the second row, use the .before option.
# Add column
Example:
df<-tibble(x=1:3,y=3:1)
df %>% add_column(z=-1:1,w=0)
# To get the last 3 dates, subtract 1:3 from Sys.Date().
Example:
data_t <- tibble(a = 1:3, b = letters[1:3],
                 c = Sys.Date() - 1:3)
print(data_t)
# To get the next 3 dates, add 1:3 to Sys.Date().
Example:
data1 <- data.frame(a = 1:3, b = letters[1:3],
                    c = Sys.Date() + 1:3)
print(data1)
3) purrr package: purrr is a popular R package that provides a consistent and powerful set of tools for working with functions and vectors. It was developed by Hadley Wickham and is part of the tidyverse suite of packages. purrr is an essential package for functional programming in R.
purrr provides functions built around functional programming concepts such as mapping, filtering, and reducing. These functions work with lists, data frames, and other objects, making it easier to handle complex data structures.
The main functions provided by purrr include map(), walk(), reduce(), accumulate(), and compose(). They can be used for a variety of tasks, such as applying a function to each element of a list, filtering a list based on a condition, and reducing a list to a single value.
Example1:
my_list<-list(
c(1,2,6),
c(4,7,1),
c(9,1,5)
)
# Find the mean of each vector using map(); both forms below are equivalent
my_list %>% map(mean)
map(my_list, mean)
Example2:
df<-iris[1:4]
means<-map(df,mean)
means
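The examples above cover mapping only. The sketch below adds filtering and reducing on the same list, using map_dbl(), keep(), and reduce() from purrr; the result names are illustrative.

```r
library(purrr)

my_list <- list(
  c(1, 2, 6),
  c(4, 7, 1),
  c(9, 1, 5)
)

# Mapping: map_dbl() returns a numeric vector instead of a list
means <- map_dbl(my_list, mean)

# Filtering: keep only the vectors whose mean is greater than 3
above_three <- keep(my_list, ~ mean(.x) > 3)

# Reducing: add the vectors together element by element
total <- reduce(my_list, `+`)

means
above_three
total
```

The typed variants such as map_dbl() fail with an error if a result is not a single number, which catches mistakes that map() would pass through silently.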

4) readr package: readr reads different file formats using different functions, namely read_csv() for comma-separated files, read_tsv() for tab-separated files, read_table() for whitespace-separated tabular files, and read_delim() for files with any delimiter.
The readr library provides a simple and fast way to read rectangular data such as csv, tsv, and other delimited files.
Note: Before reading a file by a relative name, check the working directory with getwd() (and change it with setwd() if needed). The example below uses a full path instead.
getwd()
my_path <- "E:\\R programme\\R.Directory\\CLASS (2).CSV"

# read_csv() takes the file path as its first argument
dset <- read_csv(my_path)
dset
# Class of the dataset
dset %>% class()
# View the dataset
dset %>% View()
# Rename a column (assumes the file has a column named X2)
datas <- dset %>% rename(Gen = X2)
datas
# Delete columns (assumes the file has columns named X3 and X4)
datas <- dset %>% select(-X3, -X4)
datas
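Because the file above lives on a specific machine, here is a self-contained sketch of the same read, rename, and drop steps. It assumes readr 2.0 or later, where literal text can be passed to read_csv() by wrapping it in I(); the column names and values are invented for illustration.

```r
library(readr)
library(dplyr)

# Literal CSV text standing in for a file on disk (illustrative data)
csv_text <- "Name,Gen,Age,Height\nAlfred,M,14,69\nAlice,F,13,56.5\n"

# I() marks the string as literal data rather than a file path
dset <- read_csv(I(csv_text), show_col_types = FALSE)

# Rename one column, then drop two others
datas <- dset %>%
  rename(Gender = Gen) %>%
  select(-Age, -Height)

print(datas)
```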
5) stringr: stringr is a library for working with strings, with many functions used in data cleaning and data preparation tasks.
All of the functions in stringr start with str_ and take a string vector as their first argument. Some of these functions include str_detect(), str_extract(), str_match(), str_count(), str_replace(), str_subset(), etc. To install stringr, the simplest method is to install the tidyverse using:
install.packages("tidyverse")
a) str_length() function: It returns a numeric vector giving the number of characters in each string.
Example:
my_string <- c("No", "analysis", "without", "data", "and", "R")
str_length(my_string)
b) str_c() function: It joins multiple strings into a single string.
Example1:
str_c(my_string, collapse = " ")
Note: Put one space between the quotation marks in the collapse option so that the words are separated by a space.
Example2: str_c() has another argument called sep (the separator). With sep alone it returns a character vector of length 20.
str_c("x", 1:20, sep = "-")
Example3: Adding collapse joins those 20 elements into one string.
str_c("x", 1:20, sep = "-", collapse = ",")
c) str_replace_na(): Suppose we want to append "-d" to every element. An NA element cannot be concatenated, so the result for it is NA. The str_replace_na() function converts NA into the character string "NA" first.
Example:
str_c(c("a", NA, "b"), "-d")
str_c(str_replace_na(c("a", NA, "b")), "-d")
d) str_sub() function: It extracts part of a string by character position, and it can also modify that part of the string by assignment.

Example1:
mystring<-"Data is the new Science"
str_sub(mystring,1,4)
Example2:
str_sub(mystring,17,-1)<-"art"
mystring
e) str_split(): This function splits a string into pieces at each match of a pattern.
Example:
str_split(mystring, pattern = " ")
f) str_to_lower(): This function converts a string to lower case.
Example:
str_to_lower(c("ABC", "JKL"))
g) str_to_upper(): This function converts a string to upper case.
Example:
str_to_upper(c("abc", "mno"))
h) str_to_title(): This function converts a string to title case (first letter of each word capitalized).
Example:
str_to_title(c("abnm", "word"))
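The pattern-matching functions named at the start of this section (str_detect(), str_count(), str_extract(), str_replace(), str_subset()) have no examples above. A short sketch follows; the fruits vector is an illustrative assumption.

```r
library(stringr)

fruits <- c("apple pie", "banana split", "cherry tart", "plain rice")

# Does each string contain "an"?
has_an <- str_detect(fruits, "an")

# How many times does "a" occur in each string?
n_a <- str_count(fruits, "a")

# Extract the first run of lowercase letters (the first word)
first_word <- str_extract(fruits, "[a-z]+")

# Replace the first space with an underscore
underscored <- str_replace(fruits, " ", "_")

# Keep only the strings that start with "b" or "c"
b_or_c <- str_subset(fruits, "^b|^c")

has_an
n_a
first_word
underscored
b_or_c
```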
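6) tribble(): The tribble() function, part of the tibble package, creates a tibble row by row in a readable layout.
Example:
# Column names are given as formulas (~name); the values follow row by row
test_scores <- tribble(~test_1, ~test_2, ~test_3, ~test_4,
                       56, 67, 78, 89,
                       54, 67, 21, 20,
                       89, 43, 21, 29,
                       23, 24, 25, 25)
print(test_scores)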
