
DSR - Unit 2 - 3.3 LineGraphs

This document outlines a lecture on Data Science with R, focusing on the creation and use of line graphs. It covers the use of ggplot for visualizing continuous and categorical variables, as well as techniques for enhancing graph appearance and adding multiple lines. Resources for further reading are also provided.


Established as per the Section 2(f) of the UGC Act, 1956

Approved by AICTE, COA and BCI, New Delhi

Unit 3: Lecture 3.3


Data Science with R
Department of Computer Science and Engineering

Sailaja Thota

AY:2021-22
OUTLINE
Recap of previous Lecture
Topic for the Lecture

Objective and Outcome of Lecture

Lecture Discussion

Loop Functions
RECAP OF PREVIOUS LECTURE

Exploring Basic Graphs


Bar Graphs
TOPIC OF THE LECTURE
• Line Graphs
OBJECTIVE AND OUTCOME OF LECTURE

Lecture Objective: Explain the graphics functions in R.

Lecture Outcome: Outline the creation and use of graphs in R.
LINE GRAPHS

• Line graphs are typically used for visualizing how one continuous variable, on the y-axis, changes in relation to another continuous variable, on the x-axis.
• Often the x variable represents time, but it may also represent some other continuous quantity; for example, the amount of a drug administered to experimental subjects.
• Line graphs can also be used with a discrete variable on the x-axis. This is appropriate when the variable is ordered (e.g., “small,” “medium,” “large”), but not when the variable is unordered (e.g., “cow,” “goose,” “pig”).
BASIC LINE GRAPH

Use ggplot() with geom_line(), and specify which variables are mapped to x
and y (Figure 4-1):
library(ggplot2) # Load ggplot2; BOD is a built-in R data set
ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line()
Line graphs can be made with discrete (categorical) or continuous (numeric)
variables on the x-axis. In the example here, the variable Time is numeric, but
it can be treated as a categorical variable by converting it to a factor with
factor() (Figure 4-2). When the x variable is a factor, you must also use
aes(group = 1) so that ggplot knows that the data points belong together
and should be connected with a line.
In the BOD data set there is no entry for Time = 6, so there is no level 6
when Time is converted to a factor. Factors hold categorical values, and in
that context, 6 is just another value. It happens to not be in the data set,
so there’s no space for it on the x-axis.
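The code for this slide is not reproduced above. A minimal sketch of the factor conversion, assuming ggplot2 is installed and using the built-in BOD data set (the names BOD1 and p_factor are chosen here for illustration only):

```r
library(ggplot2)

# Work on a copy so the built-in BOD data set is left untouched
BOD1 <- BOD

# Treat Time as categorical; the levels are only the values present in the data
BOD1$Time <- factor(BOD1$Time)

# group = 1 tells ggplot that all points belong to a single line
p_factor <- ggplot(BOD1, aes(x = Time, y = demand, group = 1)) +
  geom_line()
p_factor
```

Calling levels(BOD1$Time) afterwards is a quick way to confirm which values ended up on the x-axis.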
With ggplot2, the default y range of a line graph is just enough to include
the y values in the data. For some kinds of data it’s better to have the y
range start from zero. You can use ylim() to set the range, or you can use
expand_limits() to expand the range to include a value. Either approach can
set the range from zero to the maximum value of the demand column in BOD
(Figure 4-3).
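Both options can be sketched as follows, assuming ggplot2 is installed (the object names p_ylim and p_expand are illustrative):

```r
library(ggplot2)

# Option 1: set the y range explicitly, from zero to the largest demand value
p_ylim <- ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  ylim(0, max(BOD$demand))

# Option 2: keep the default range but expand it so that it includes zero
p_expand <- ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  expand_limits(y = 0)

p_ylim
p_expand
```

ylim() fixes both ends of the range, while expand_limits() only guarantees that the given value is included, so the second form keeps working if larger values are added to the data later.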
ADDING POINTS TO A LINE GRAPH
With the log y-axis, you can see that the rate of proportional change has
increased in the last thousand years. The estimates for the years before 0
have a roughly constant rate of change of 10 times per 5,000 years. In the
most recent 1,000 years, the population has increased at a much faster rate.
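The plot described here is not reproduced on the slide. A sketch of how it could be drawn, assuming the gcookbook package is installed and that the population figures come from its worldpop data set (columns Year and Population); the object names are illustrative:

```r
library(ggplot2)
library(gcookbook) # Assumed source of the worldpop data set

# Add points to a line graph by layering geom_point() on top of geom_line()
p_points <- ggplot(worldpop, aes(x = Year, y = Population)) +
  geom_line() +
  geom_point()

# The same graph with a log10 y-axis, which shows proportional change
p_log <- p_points + scale_y_log10()

p_points
p_log
```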
MAKING A LINE GRAPH WITH MULTIPLE LINES

• You want to make a line graph with more than one line.
• Solution
• In addition to the variables mapped to the x- and y-axes, map another
(discrete) variable to colour or linetype, as shown in Figure 4-6.
• The tg data has three columns, including the factor supp, which we
mapped to colour and linetype:
tg
#>   supp dose length
#> 1   OJ  0.5  13.23
#> 2   OJ  1.0  22.70
#> 3   OJ  2.0  26.06
#> 4   VC  0.5   7.98
#> 5   VC  1.0  16.77
#> 6   VC  2.0  26.14
If the x variable is a factor, you must also tell ggplot to group by that
same variable, as described next.
Line graphs can be used with a continuous or categorical variable on the
x-axis. Sometimes the variable mapped to the x-axis is conceived of as being
categorical, even when it’s stored as a number. In the example here, there
are three values of dose: 0.5, 1.0, and 2.0. You may want to treat these as
categories rather than values on a continuous scale. To do this, convert dose
to a factor with factor(dose) and add group = supp (Figure 4-7).
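The plotting code for Figures 4-6 and 4-7 is not shown on the slide. A minimal sketch, assuming ggplot2 is installed; tg is rebuilt here from the six rows printed above so that the example does not depend on gcookbook, and the plot object names are illustrative:

```r
library(ggplot2)

# tg as printed on the slide: mean tooth length by supplement and dose
tg <- data.frame(
  supp   = factor(c("OJ", "OJ", "OJ", "VC", "VC", "VC")),
  dose   = c(0.5, 1.0, 2.0, 0.5, 1.0, 2.0),
  length = c(13.23, 22.70, 26.06, 7.98, 16.77, 26.14)
)

# Map supp to colour: one line per supplement type
p_colour <- ggplot(tg, aes(x = dose, y = length, colour = supp)) +
  geom_line()

# Map supp to linetype instead
p_linetype <- ggplot(tg, aes(x = dose, y = length, linetype = supp)) +
  geom_line()

# Treat dose as categorical; the grouping must then be stated explicitly
p_factor_dose <- ggplot(tg, aes(x = factor(dose), y = length,
                                colour = supp, group = supp)) +
  geom_line()

p_colour
p_linetype
p_factor_dose
```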
Notice the use of group = supp. Without this statement, ggplot won’t
know how to group the data together to draw the lines, and it will give
an error:
ggplot(tg, aes(x = factor(dose), y = length, colour = supp)) +
  geom_line()
#> geom_path: Each group consists of only one observation. Do you need to
#> adjust the group aesthetic?
Another common problem when the incorrect grouping is used is that you will
see a jagged sawtooth pattern, as in Figure 4-8:
ggplot(tg, aes(x = dose, y = length)) +
  geom_line()
This happens because there are multiple data points at each x location, and
ggplot thinks they’re all in one group. The data points for each group are
connected with a single line, leading to the sawtooth pattern. If any
discrete variables are mapped to aesthetics like colour or linetype, they are
automatically used as grouping variables. But if you want to use other
variables for grouping (that aren’t mapped to an aesthetic), they should be
used with group.
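The difference between the two cases can be sketched as follows, assuming ggplot2 is installed; tg is rebuilt from the values printed earlier in the lecture, and the object names are illustrative:

```r
library(ggplot2)

# tg rebuilt from the table shown earlier in the lecture
tg <- data.frame(
  supp   = factor(c("OJ", "OJ", "OJ", "VC", "VC", "VC")),
  dose   = c(0.5, 1.0, 2.0, 0.5, 1.0, 2.0),
  length = c(13.23, 22.70, 26.06, 7.98, 16.77, 26.14)
)

# No grouping: all six points are joined by one line (sawtooth pattern)
p_sawtooth <- ggplot(tg, aes(x = dose, y = length)) +
  geom_line()

# Explicit grouping by supp, without mapping it to colour or linetype
p_grouped <- ggplot(tg, aes(x = dose, y = length, group = supp)) +
  geom_line()

p_sawtooth
p_grouped
```

The second plot draws two identical-looking black lines, one per supplement, which is the case where group is needed because no other aesthetic carries the grouping.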
• Sometimes points will overlap. In these cases, you may want to dodge them,
which means their positions will be adjusted left and right (Figure 4-10).
When doing so, you must also dodge the lines, or else only the points will
move and they will be misaligned. You must also specify how far they should
move when dodged.
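A minimal sketch of dodging both layers by the same amount, assuming ggplot2 is installed; tg is rebuilt from the values printed earlier, and the width of 0.2 and the choice to map supp to shape are illustrative:

```r
library(ggplot2)

# tg rebuilt from the table shown earlier in the lecture
tg <- data.frame(
  supp   = factor(c("OJ", "OJ", "OJ", "VC", "VC", "VC")),
  dose   = c(0.5, 1.0, 2.0, 0.5, 1.0, 2.0),
  length = c(13.23, 22.70, 26.06, 7.98, 16.77, 26.14)
)

# One dodge specification shared by lines and points keeps them aligned
pd <- position_dodge(0.2)

p_dodge <- ggplot(tg, aes(x = dose, y = length, shape = supp)) +
  geom_line(position = pd) +
  geom_point(position = pd, size = 4)
p_dodge
```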
CHANGING THE APPEARANCE OF LINES

• You want to change the appearance of the lines in a line graph.


• Solution
• The type of line (solid, dashed, dotted, etc.) is set with linetype, the
thickness (in mm) with size (named linewidth for lines in ggplot2 3.4.0 and
later), and the color of the line with colour (or color). These properties
can be set (as shown in Figure 4-11) by passing them values in the call to
geom_line().
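As a sketch, the three properties can be passed to geom_line() like this. The example assumes ggplot2 3.4.0 or later, where line thickness is given as linewidth; with older versions, size = 1 plays the same role. The specific values are illustrative:

```r
library(ggplot2)

# Constant line properties are set outside aes(), directly in geom_line()
p_line <- ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line(linetype = "dashed", linewidth = 1, colour = "blue")
p_line
```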
• The default colors aren’t the most appealing, so you may want to
use a different palette, as shown in Figure 4-12, by using
scale_colour_brewer() or scale_colour_manual():
library(gcookbook) # Load gcookbook for the tg data set
ggplot(tg, aes(x = dose, y = length, colour = supp)) +
  geom_line() +
  scale_colour_brewer(palette = "Set1")
CHANGING THE APPEARANCE OF POINTS

You want to change the appearance of the points in a line graph.


In geom_point(), set the size, shape, colour, and/or fill outside of aes()
(the result is shown in Figure 4-14):
ggplot(BOD, aes(x = Time, y = demand)) +
  geom_line() +
  geom_point(size = 4, shape = 22, colour = "darkred", fill = "pink")
The default shape for points is a solid circle, the default size is 2, and
the default color is black. The fill color is relevant only for some point
shapes (numbered 21–25), which have separate outlines and fill colors
(see Recipe 5.3 for a chart of shapes). The fill color is typically NA, or
empty; you can fill it with white to get hollow-looking circles, as shown
in Figure 4-15:
ggplot(BOD, aes(x = Time, y = demand)) +
geom_line() +
geom_point(size = 4, shape = 21, fill = "white")
If the points and lines have different colors, you should specify the points
after the lines, so that they are drawn on top. Otherwise, the lines will be
drawn on top of the points.
The default colors are not very appealing, so you may want to use a different
palette, using scale_colour_brewer() or scale_colour_manual().
To set a single constant shape or size for all the points, as in Figure 4-16,
specify shape or size outside of aes():
library(gcookbook) # Load gcookbook for the tg data set
# Save the position_dodge specification because we'll use it multiple times
pd <- position_dodge(0.2)
ggplot(tg, aes(x = dose, y = length, fill = supp)) +
  geom_line(position = pd) +
  geom_point(shape = 21, size = 3, position = pd) +
  scale_fill_manual(values = c("black", "white"))
MAKING A GRAPH WITH A SHADED AREA

• Use geom_area() to get a shaded area
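A minimal sketch of a shaded area graph, assuming ggplot2 is installed. The slide does not say which data it used; the built-in sunspot.year time series is used here as an assumption, and the fill colour and transparency are illustrative:

```r
library(ggplot2)

# Convert the built-in sunspot.year time series into a data frame
sunspotyear <- data.frame(
  Year     = as.numeric(time(sunspot.year)),
  Sunspots = as.numeric(sunspot.year)
)

# geom_area() fills the region between the line and the x-axis
p_area <- ggplot(sunspotyear, aes(x = Year, y = Sunspots)) +
  geom_area(fill = "blue", alpha = 0.2)
p_area
```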


MAKING A STACKED AREA GRAPH
Use geom_area() and map a factor to fill (Figure 4-20):
library(gcookbook) # Load gcookbook for the uspopage data set
ggplot(uspopage, aes(x = Year, y = Thousands, fill = AgeGroup)) +
  geom_area()
ADDING A CONFIDENCE REGION

• Use geom_ribbon() and map values to ymin and ymax. In the climate data set,
Anomaly10y is a 10-year running average of the deviation (in Celsius) from
the average 1950–1980 temperature, and Unc10y is the 95% confidence interval.
We’ll set ymax and ymin to Anomaly10y plus or minus Unc10y.
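The ribbon can be sketched as follows. This assumes the gcookbook package is installed and that its climate data set has a Source column in which "Berkeley" identifies one of the series; the subset step, the alpha value, and the object names are assumptions made for illustration:

```r
library(ggplot2)
library(gcookbook) # Assumed source of the climate data set

# Keep a single data source so there is one value per year
climate_mod <- subset(climate, Source == "Berkeley",
                      select = c(Year, Anomaly10y, Unc10y))

# Draw the shaded confidence region first, then the line on top of it
p_ribbon <- ggplot(climate_mod, aes(x = Year, y = Anomaly10y)) +
  geom_ribbon(aes(ymin = Anomaly10y - Unc10y, ymax = Anomaly10y + Unc10y),
              alpha = 0.2) +
  geom_line()
p_ribbon
```

Setting alpha makes the ribbon semi-transparent, so grid lines and the line itself stay visible through the shaded band.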
TEXTBOOKS/RESOURCES

• Roger D. Peng, “R Programming for Data Science”, Leanpub, 2015
• Winston Chang, “R Graphics Cookbook: Practical Recipes for Visualizing Data”, O'Reilly Media, 2012
• https://www.r-project.org/about.html
• https://cran.r-project.org/mirrors.html
• https://rstudio.com/products/rstudio/download/
DISCUSSION

• 5 MINUTES

Functions in R
