R Programming Unit-3 Complete Notes
Unit 3 – tibble
Contents:
• Advanced Graphics:
Advanced plotting using Trellis; ggplot2; Lattice; examples that present panels of
scatterplots using xyplot(); simple use of xyplot()
Unit 3
Advanced Graphics:
Advanced plotting using Trellis
Lattice is an add-on package that implements Trellis graphics (originally developed for S and S-PLUS)
in R. It is a powerful and elegant high-level data visualization system, with an emphasis on multivariate
data, that is sufficient for typical graphics needs and flexible enough to handle most nonstandard
requirements. This section covers the basics of lattice and gives pointers to further resources.
lattice provides a high-level system for statistical graphics that is independent of traditional R graphics.
It is modeled on the Trellis suite in S-PLUS and implements most of its features; in fact, lattice can
be considered an implementation of the general principles of Trellis graphics.
It uses the grid package as the underlying implementation engine, and thus inherits many of its
features by default.
Trellis displays are defined by the type of graphic and the roles different variables play in it. Each
display type is associated with a corresponding high-level function (histogram, densityplot, etc.).
The possible roles depend on the type of display, but typical ones are:
• Primary variables: those that define the primary display (e.g., gcsescore in the previous examples).
• Conditioning variables: divide the data into subgroups, each of which is presented in a separate panel
(e.g., score in the last two examples).
• Grouping variables: subgroups are contrasted within panels by superposing the corresponding displays
(e.g., gender in the last example).
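These roles can be sketched with the built-in iris data set (the gcsescore, score, and gender examples referred to above are not included in these notes, so Species stands in for both the conditioning and the grouping variable):

```r
library(lattice)

# Primary variables: Petal.Length and Sepal.Length define the scatterplot.
# Conditioning variable: Species splits the data into one panel per species.
xyplot(Sepal.Length ~ Petal.Length | Species, data = iris)

# Grouping variable: the same subgroups are instead superposed within a
# single panel, distinguished by plotting symbol and colour.
xyplot(Sepal.Length ~ Petal.Length, groups = Species, data = iris,
       auto.key = TRUE)
```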
ggplot2
The ggplot2 package in R, based on the Grammar of Graphics, is a free, open-source, and
easy-to-use visualization package, and one of the most widely used in R. It was written by
Hadley Wickham.
A plot is built up from several layers. The building blocks of the grammar of graphics are:
• Data: the data set itself
• Aesthetics: how the data map onto aesthetic attributes such as the x-axis, y-axis, color, fill,
size, labels, alpha, shape, line width, and line type
• Geometries: how the data are displayed, using points, lines, histograms, bars, or boxplots
• Facets: display subsets of the data in rows and columns of panels
• Statistics: binning, smoothing, and descriptive or intermediate summaries
• Coordinates: the space between data and display, using Cartesian, fixed, or polar coordinates and axis limits
• Themes: the non-data ink (titles, fonts, background, and so on)
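As a sketch of how these layers combine, the following builds a plot from the built-in iris data set, one layer at a time:

```r
library(ggplot2)

ggplot(iris, aes(x = Petal.Length, y = Sepal.Length,
                 colour = Species)) +  # data + aesthetics
  geom_point() +                       # geometry: points
  geom_smooth(method = "lm") +         # statistics: a linear smoother per group
  facet_wrap(~ Species) +              # facets: one panel per species
  theme_minimal()                      # theme: non-data appearance
```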
Lattice
The lattice package was written by Deepayan Sarkar. It provides better defaults than base-R
graphics and adds the ability to display multivariate relationships. The package supports the
creation of trellis graphs: graphs that display a variable, or the relationship between variables,
conditioned on one or more other variables.
The typical format is:
graph_type(formula, data=)
library(lattice)
xyplot(Sepal.Length ~ Petal.Length,
data = iris)
Reading Files with readr
readr supports the following file formats with these read_*() functions:
• read_csv(): comma-separated values (CSV) files
• read_tsv(): tab-separated values (TSV) files
• read_delim(): delimited files (CSV and TSV are important special cases)
• read_fwf(): fixed-width files
• read_table(): whitespace-separated files
• read_log(): web log files
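All of these functions share essentially the same interface; only the delimiter handling differs. As a small sketch (using I() to pass literal data instead of a file name, which requires readr 2.0 or later):

```r
library(readr)

# Tab-separated input; read_tsv() fixes the delimiter to a tab.
read_tsv(I("A\tB\n1\tx\n2\ty"))

# The same data with a semicolon delimiter; read_delim() takes any
# single-character delimiter via the delim argument.
read_delim(I("A;B\n1;x\n2;y"), delim = ";")
```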
File Headers
The first line in a comma-separated file is not always the column names; that information might be
available elsewhere, outside the file. If you do not want to interpret the first line as column names,
you can use the option col_names = FALSE:
read_csv(
file = "data/data.csv",
col_names = FALSE
)
With col_names = FALSE, the header in data/data.csv is interpreted as part of the data, and because
the header consists of strings, read_csv infers that all the column types are strings. If we did not
have the header, for example, if we had the file data/data-no-header.csv:
1, a, a, 1.2
2, b, b, 2.1
3, c, c, 13.0
then we would get the same data frame as before, except that the names would be autogenerated:
read_csv(
file = "data/data-no-header.csv",
col_names = FALSE
)
If you have data in a file without a header, but you do not want the autogenerated names, you can
provide column names to the col_names option:
read_csv(
file = "data/data-no-header.csv",
col_names = c("X", "Y", "Z", "W")
)
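Since the data/ files used in these notes are not included, the same behaviour can be reproduced with an inline string (again using I(), available in readr 2.0 and later):

```r
library(readr)

# Headerless data equivalent to data/data-no-header.csv, with custom names.
read_csv(
  I("1, a, a, 1.2\n2, b, b, 2.1\n3, c, c, 13.0"),
  col_names = c("X", "Y", "Z", "W")
)
```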
Column Types
When read_csv parses a file, it infers the type of each column. This inference can be slow or,
worse, incorrect. If you know a priori what the types should be, you can specify them using the
col_types option. If you do this, read_csv will not guess at the types. It will, however, replace
values that it cannot parse as the specified type with NA.
By default, read_csv guesses, so we could make this explicit using the type specification "????":
read_csv(
file = "data/data.csv",
col_types = "????"
)
The results of the guesses are double for columns A and D and character for columns B and C. If we
wanted to make this explicit, we could use "dccd":
read_csv(
file = "data/data.csv",
col_types = "dccd"
)
If you want an integer type for column A, you can use "iccd":
read_csv(
file = "data/data.csv",
col_types = "iccd"
)
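The same specification can be tried without the data file by supplying inline data that mirrors data/data.csv:

```r
library(readr)

# i = integer, c = character, d = double; one letter per column.
read_csv(
  I("A,B,C,D\n1,a,a,1.2\n2,b,b,2.1\n3,c,c,13.0"),
  col_types = "iccd"
)
```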
If you only want to fix the types of some columns, and let read_csv guess the rest, you can use the
cols() function:
read_csv(
file = "data/data.csv",
col_types = cols(D = col_character())
)
Most of the col_ functions do not take any arguments, but they are affected by the locale parameter
in the same way that the string specifications are. For factors and for date, time, and datetime types,
however, you have more control over the format through the col_ functions: you can use their
arguments to specify how read_csv should parse dates and how it should construct factors.
For factors, you can explicitly set the levels. If you do not, the column parser will set the levels
in the order it sees the different strings in the column. For example, in data/data.csv the strings in
columns B and C are in the order a, b, and c. By default, the two columns will be interpreted as
characters, but if we specify that C should be a factor, we get one where the levels are a, b, and c,
in that order.
my_data <- read_csv(
file = "data/data.csv",
col_types = cols(C = col_factor())
)
If we want the levels in a different order, we can give col_factor() a levels argument.
my_data <- read_csv(
file = "data/data.csv",
col_types = cols(
C = col_factor(levels = c("c", "b", "a"))
)
)
my_data$C
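Using inline data in place of data/data.csv, the reordered levels can be inspected with levels():

```r
library(readr)

my_data <- read_csv(
  I("A,B,C,D\n1,a,a,1.2\n2,b,b,2.1\n3,c,c,13.0"),
  col_types = cols(C = col_factor(levels = c("c", "b", "a")))
)

# The levels come out in the order we specified, not the order seen.
levels(my_data$C)   # "c" "b" "a"
```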
There are also shortcuts for frequently used date and time formats.
As we saw earlier, you can set the date and time format using the locale() function. If you do not,
the default codes are %AD for dates and %AT for times (there is no locale() argument for datetimes).
These codes specify YMD and H:M/H:M:S formats, respectively, but are more relaxed in matching
the patterns. The date parser, for example, allows different separators: for dates, both “1975-02-15”
and “1975/02/15” will be read as February the 15th, 1975, and for times, both “18:00” and “6:00 pm”
denote six o’clock in the evening.
In the following text, I give a few examples. I will use the functions parse_date, parse_time, and
parse_datetime rather than read_csv with column type specifications. These functions are used by
read_csv when you specify a date, time, or datetime column type, but using read_csv for the
examples would be unnecessarily verbose. Each takes a vector of string representations of dates and
times. For more examples, you can read the function documentation: ?col_datetime. Parsing time is
simplest; there is not much variation in how time points are written. The main difference is whether
you use a 24-hour or a 12-hour clock. The %R and %T codes expect 24-hour clocks and differ in
whether seconds are included or not.
parse_time(c("18:00"), format = "%R")
parse_time(c("18:00:30"), format = "%T")
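The relaxed default date code can be seen with parse_date, and parse_datetime handles ISO-8601-style input:

```r
library(readr)

parse_date("1975-02-15")               # default %AD format
parse_date("1975/02/15")               # different separator, same date
parse_datetime("1975-02-15 18:00:30")  # ISO-8601-style datetime
```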
Space-separated Columns
The preceding functions all read delimiter-separated columns: they expect a single character to
separate one column from the next. If the argument trim_ws is TRUE, they ignore whitespace around
the values; this argument is TRUE by default for read_csv, read_csv2, and read_tsv, but FALSE for
read_delim. The functions read_table and read_table2 take a different approach and separate columns
by one or more spaces. The simpler of the two is read_table2: it expects any sequence of whitespace
to separate columns. Consider
read_table2(
"A  B  C  D
   1  2  3  4
   15 16 17 18"
)
The header names are separated by two spaces. The first data line has spaces before the first value
because the string is indented the way it is; between the columns, there are also two spaces. The
second data line again has several spaces before the first value, but this time only a single space
between the columns. If we used a delimiter character to specify that a space should separate
columns, we would have to have exactly the same number of spaces between each column. The
read_table function instead reads the data as fixed-width columns. It uses the whitespace in the file
to figure out the widths of the columns; after this, each line is split into substrings matching those
widths and assigned to the corresponding columns.
read_table(
"
A    B   C  D
121  xyz 14 15
 22  abc 24 25
"
)
Here the columns are aligned, and the rows are interpreted as we might expect. Aligned, here, means
that the spaces between columns sit at the same positions in every row. If you do not have spaces at
the same locations in all rows, columns will be merged:
read_table(
"
A    B     C D
121  xyz 14 15
 22  abc 24 25
"
)
Here, the header C is at the position that should separate columns C and D, and these columns are
therefore merged. If you have spaces in all rows, but data between them in some columns only, you
will get an error. For example, your data might look like this:
read_table(
"
A    B       C  D
121  xyz   x 14 15
 22  abc     24 25
"
)
where the x in the first data line sits between two all-space columns. If you need more specialized
fixed-width files, you might want to consider the read_fwf function; see its documentation for
details: ?read_fwf. The read_table and read_table2 functions take many of the same arguments as the
delimiter-based parsers, so you can, for example, specify column types and set the locale in the same
way as before.
Not part of the core Tidyverse (the packages loaded when you load the tidyverse package) is readxl.
Its read_excel function does exactly what it says on the tin: it reads Excel spreadsheets into R. Its
interface is similar to the functions in readr; where it differs is in Excel-specific options, such as
which sheet to read. Such options are clearly only needed when reading Excel files.
tibble
A tibble is the Tidyverse version of a data frame; its class vector, c("tbl_df", "tbl", "data.frame"),
includes data.frame. This means that generic functions, if not specialized for the other classes, will
use the data.frame version, and this, in turn, means that you can often use tibbles in functions that
expect data frames. It does not mean that you can always use tibbles as a replacement for a data
frame. If you run into this problem, you can translate a tibble into a data frame using as.data.frame():
y <- as.data.frame(x)
y
You can create a tibble from vectors using the tibble() function:
x <- tibble(
x = 1:100,
y = x^2,
z = y^2
)
x
Two things to notice here: when you print a tibble, you only see the first ten lines. This is because
the tibble has enough lines that printing all of them would flood the console. If a tibble has more
than 20 rows, you will only see the first ten; if it has fewer, you will see all the rows. You can
change how many lines you see using the n option to print():
print(x, n = 2)
If a tibble has more columns than your console can show, only some will be printed. You can change
the number of characters printed per line using the width option to print().
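Both options can be sketched on a tibble built from scratch:

```r
library(tibble)

x <- tibble(x = 1:100, y = x^2)

print(x, n = 2)       # print only the first two rows
print(x, width = 20)  # limit the printed output to 20 characters wide
```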
Indexing Tibbles
You can index a tibble in much the same way as you can index a data frame. You can extract a
column using single-bracket indexing ([]), either by name or by index:
x <- read_csv(file = "data/data.csv")
y <- as.data.frame(x)
x["A"]
The result is a tibble or data frame, respectively, containing a single column. If you use double
brackets ([[]]), you will get the vector contained in a column rather than a tibble/data frame:
x[["A"]]
You will also get the underlying vector of a column if you use $-indexing:
x$A
You cannot do this using single brackets ([]). You can extract a subset of rows and columns if you
use two indices.
For example, you can get the first two rows in the first two columns using [1:2,1:2]:
x[1:2,1:2]
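The indexing forms above can be compared side by side on a small tibble:

```r
library(tibble)

x <- tibble(A = 1:3, B = c("a", "b", "c"))

x["A"]       # single brackets: a one-column tibble
x[["A"]]     # double brackets: the underlying vector 1 2 3
x$A          # $-indexing: also the underlying vector
x[1:2, 1:2]  # first two rows of the first two columns, as a tibble
```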