Unit 4 DVTTT
Customizing Graphs
3-D Scatterplot
▹ The ggplot2 package and its extensions can’t create a 3-D plot.
▹ However, you can create a 3-D
scatterplot with the scatterplot3d
function in the scatterplot3d package.
▹ Let’s plot automobile mileage vs. engine displacement vs. car weight using the data in the mtcars data frame.
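A minimal sketch of this first plot follows. It assumes the scatterplot3d package is installed (for example with `install.packages("scatterplot3d")`). Putting displacement on x, weight on y, and mileage on z is one reasonable axis assignment and may not match the original slide exactly.

```r
# Basic 3-D scatterplot of the built-in mtcars data
library(scatterplot3d)

data(mtcars)

# x = engine displacement, y = car weight, z = miles per gallon.
# scatterplot3d() invisibly returns a list of helper functions; keep it.
s3d <- with(mtcars,
            scatterplot3d(x = disp, y = wt, z = mpg,
                          main = "3-D Scatterplot Example 1"))
```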
Now let’s modify the graph by replacing the points with filled blue circles, adding drop lines to the x-y plane, and creating more meaningful labels.
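A sketch of these modifications, again assuming scatterplot3d is installed. `color`, `pch = 19` (filled circles), and `type = "h"` (vertical lines dropped to the x-y plane) are standard scatterplot3d arguments. The label wording is illustrative.

```r
library(scatterplot3d)
data(mtcars)

s3d <- with(mtcars,
            scatterplot3d(x = disp, y = wt, z = mpg,
                          color = "blue",   # point color
                          pch = 19,         # filled circles
                          type = "h",       # drop lines to the x-y plane
                          main = "3-D Scatterplot Example 2",
                          xlab = "Displacement (cu. in.)",
                          ylab = "Weight (1000 lbs)",
                          zlab = "Miles/(US) gallon"))
```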
▹ Next, label the points.
▹ To do this, save the results of the scatterplot3d function to an object, use its xyz.convert function to convert coordinates from 3-D (x, y, z) to 2-D projections (x, y), and apply the text function to add labels to the graph.
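The three steps can be sketched like this, assuming scatterplot3d is installed. `xyz.convert` is one of the helper functions stored in the object that scatterplot3d returns. The `cex` and `pos` values (half-size text placed to the right of each point) are illustrative choices.

```r
library(scatterplot3d)
data(mtcars)

# 1. Save the plot object so its helper functions can be reused
s3d <- with(mtcars,
            scatterplot3d(x = disp, y = wt, z = mpg,
                          color = "blue", pch = 19, type = "h",
                          main = "3-D Scatterplot Example 3",
                          xlab = "Displacement (cu. in.)",
                          ylab = "Weight (1000 lbs)",
                          zlab = "Miles/(US) gallon"))

# 2. Project the 3-D (x, y, z) coordinates onto the 2-D drawing surface
coords <- s3d$xyz.convert(mtcars$disp, mtcars$wt, mtcars$mpg)

# 3. Add the car names at half size, to the right of each point
text(coords$x, coords$y,
     labels = rownames(mtcars),
     cex = 0.5, pos = 4)
```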
▹ As a final step, we will add information
on the number of cylinders in each car.
▹ We’ll add a column to the mtcars data frame indicating the color for each point.
▹ For good measure, we will shorten the y-axis, change the drop lines to dashed lines, and add a legend.
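Here is a sketch of the final version, assuming scatterplot3d is installed. `lty.hplot = 2` makes the drop lines dashed and `scale.y = 0.75` shortens the y-axis relative to the others; both are scatterplot3d arguments. The color mapping and the new column name `pcolor` are illustrative choices.

```r
library(scatterplot3d)
data(mtcars)

# New column holding a color for each point, based on cylinder count
cyl_colors <- c("4" = "red", "6" = "blue", "8" = "darkgreen")
mtcars$pcolor <- unname(cyl_colors[as.character(mtcars$cyl)])

s3d <- with(mtcars,
            scatterplot3d(x = disp, y = wt, z = mpg,
                          color = pcolor, pch = 19,
                          type = "h",
                          lty.hplot = 2,   # dashed drop lines
                          scale.y = 0.75,  # shorter y-axis
                          main = "3-D Scatterplot Example 4",
                          xlab = "Displacement (cu. in.)",
                          ylab = "Weight (1000 lbs)",
                          zlab = "Miles/(US) gallon"))

# Label the points as before
coords <- s3d$xyz.convert(mtcars$disp, mtcars$wt, mtcars$mpg)
text(coords$x, coords$y, labels = rownames(mtcars), cex = 0.5, pos = 4)

# Legend mapping colors to the number of cylinders (no box, half-size text)
legend("topleft", inset = 0.05, bty = "n", cex = 0.5,
       title = "Number of Cylinders",
       legend = names(cyl_colors), fill = cyl_colors)
```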
▹ We can easily see that the car with the highest mileage (Toyota Corolla) has low engine displacement, low weight, and 4 cylinders.
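This reading of the plot can be checked against the raw data with base R alone (no extra packages):

```r
data(mtcars)

# The car with the highest mileage, with the other plotted variables
best <- mtcars[which.max(mtcars$mpg), c("mpg", "disp", "wt", "cyl")]
print(best)   # Toyota Corolla: mpg 33.9, disp 71.1, wt 1.835, cyl 4

# Compare against the medians of the same variables
sapply(mtcars[, c("mpg", "disp", "wt", "cyl")], median)
```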
Biplots
▹ A biplot is a specialized graph that attempts to represent the relationships between observations, between variables, and between observations and variables in a low-dimensional (usually two-dimensional) space.
▹ Let’s create a biplot for the mtcars
dataset, using the fviz_pca function
from the factoextra package.
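A minimal sketch, assuming the factoextra package is installed. The principal components are fitted first with base R's `prcomp`, then passed to `fviz_pca`. Standardizing the variables (`scale. = TRUE`) is an assumption here. It is a sensible one, because the mtcars variables are measured in very different units. Because the result is a ggplot2 object, ordinary ggplot2 layers such as a theme and a title can be added to it.

```r
library(ggplot2)
library(factoextra)
data(mtcars)

# Principal components of the centered and standardized variables
fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Biplot: cars as points, variables as arrows; repel avoids overlapping labels
p <- fviz_pca(fit, repel = TRUE, labelsize = 3) +
  theme_bw() +
  labs(title = "Biplot of mtcars data")
print(p)
```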
▹ The fviz_pca function produces a ggplot2
graph.
▹ Dim1 and Dim2 are the first two principal components: linear combinations of the original p variables.
▹ $PC_1 = \beta_{10} + \beta_{11}x_1 + \beta_{12}x_2 + \beta_{13}x_3 + \cdots + \beta_{1p}x_p$
▹ $PC_2 = \beta_{20} + \beta_{21}x_1 + \beta_{22}x_2 + \beta_{23}x_3 + \cdots + \beta_{2p}x_p$
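To see that each component really is a weighted sum of the variables, the PC1 scores can be rebuilt by hand. This sketch assumes a standardized PCA fitted with base R's `prcomp`. There, the loadings $w_{1j}$ in `fit$rotation[, "PC1"]` (notation introduced here) multiply the centered and scaled variables. Rewritten in terms of the raw $x_j$, this gives $\beta_{1j} = w_{1j}/s_j$ and an intercept $\beta_{10} = -\sum_j w_{1j}\bar{x}_j/s_j$, where $\bar{x}_j$ and $s_j$ are the mean and standard deviation of variable $j$.

```r
data(mtcars)
fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Weights (loadings) for PC1: one coefficient per original variable
w1 <- fit$rotation[, "PC1"]

# Standardize the variables exactly as prcomp did
z <- scale(mtcars, center = fit$center, scale = fit$scale)

# PC1 score for each car = weighted sum of its standardized values
pc1_manual <- as.vector(z %*% w1)

# Same PC1 written with raw-variable coefficients and an intercept
beta1  <- w1 / fit$scale
beta10 <- -sum(w1 * fit$center / fit$scale)
pc1_raw <- as.vector(beta10 + as.matrix(mtcars) %*% beta1)

all.equal(pc1_manual, unname(fit$x[, "PC1"]))
all.equal(pc1_raw, pc1_manual)
```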
▹ The weights of these linear combinations (the $\beta_{ij}$s) are chosen to maximize the variance accounted for in the original variables.
▹ Additionally, the principal components
(PCs) are constrained to be
uncorrelated with each other.
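Both properties can be checked on the fitted model with base R. This sketch again assumes a standardized `prcomp` fit. In `prcomp`, each weight vector also has unit length, which keeps "maximize the variance" well defined.

```r
data(mtcars)
fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Correlations among component scores: 1 on the diagonal, ~0 elsewhere
round(cor(fit$x[, 1:4]), 10)

# Component variances come out in decreasing order
fit$sdev^2

# Each column of weights has unit length
colSums(fit$rotation^2)
```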
▹ In this graph, the first PC accounts for
60% of the variability in the original
data.
▹ The second PC accounts for 24%.
Together, they account for 84% of the
variability in the original p = 11
variables.
▹ As you can see, both the observations
(cars) and variables (car
characteristics) are plotted in the same
graph.
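The percentages on the axes can be reproduced from the component variances. This assumes the biplot was built from a standardized PCA of all 11 mtcars variables, as in the sketch above. An unstandardized fit would give different numbers.

```r
data(mtcars)
fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each component
prop_var <- fit$sdev^2 / sum(fit$sdev^2)
round(prop_var[1:2], 2)        # about 0.60 and 0.24
round(sum(prop_var[1:2]), 2)   # about 0.84

# The same figures appear in the "Proportion of Variance" row
summary(fit)$importance
```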
▹ Points represent observations. Smaller
distances between points suggest similar
values on the original set of variables.
▹ For example, the Toyota Corolla and Honda Civic are similar to each other, as are the Chrysler Imperial and Lincoln Continental.
▹ However, the Toyota Corolla is very
different from the Lincoln Continental.
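"Smaller distances" can be made concrete by computing Euclidean distances between cars using only their Dim1 and Dim2 scores. This sketch assumes the same standardized `prcomp` fit as the biplot. Distances in two dimensions only approximate distances on all 11 variables.

```r
data(mtcars)
fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Pairwise distances between cars in the plane of the first two PCs
d <- as.matrix(dist(fit$x[, 1:2]))

picked <- c("Toyota Corolla", "Honda Civic",
            "Chrysler Imperial", "Lincoln Continental")
round(d[picked, picked], 2)
```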
▹ The observations that are farthest along the direction of a variable’s vector have the highest values on that variable.
▹ For example, the Toyota Corolla and Honda Civic have higher values on mpg. The Toyota Corona has a higher qsec. The Duster 360 has more cylinders.
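These readings can be compared with the raw values in mtcars. The sketch below uses each variable's median as a simple reference point (an illustrative choice).

```r
data(mtcars)

# Medians as a reference point for "higher" values
meds <- sapply(mtcars[, c("mpg", "qsec", "cyl")], median)
meds

mtcars[c("Toyota Corolla", "Honda Civic"), "mpg"]   # 33.9 and 30.4
mtcars["Toyota Corona", "qsec"]                     # 20.01
mtcars["Duster 360", "cyl"]                         # 8
```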
▹ Care must be taken in interpreting
biplots.
▹ They are only accurate when the
percentage of variance accounted for is
high.
▹ Always check your conclusions against the original data. See the article by Forrest Young to learn more about interpreting biplots correctly.
Thanks!