Unit IV: Variation, Missing Values, Covariation, Patterns and Models

The document discusses variation, missing values, covariation, patterns, and models in exploratory data analysis. It explains how to visualize distributions of categorical and continuous variables using bar charts and histograms. It also covers replacing missing values, examining covariation using boxplots, identifying patterns in data, and fitting simple linear models.

Uploaded by

venkatasaisumanth74



UNIT-IV

Variation, missing values, covariation, patterns and models

Variation
● Variation is the tendency of the values of a variable to change from
measurement to measurement.
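● Both continuous and categorical variables vary. If you measure a continuous
variable twice, you will usually get two different results. A categorical
variable varies when you measure it across different subjects (e.g., the eye
colors of different people).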
● Every variable has its own pattern of variation, which can reveal
interesting information.
● The best way to understand that pattern is to visualize the distribution of
the variable’s values.
Variation
Visualizing Distributions
● In R, categorical variables are usually saved as factors or character
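vectors. To examine the distribution of a categorical variable, use a bar
chart (the examples assume library(tidyverse), which loads ggplot2, its
diamonds and mpg datasets, and dplyr):
● ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))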
● You can also compute these values manually with dplyr::count():
● diamonds %>% count(cut)
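The bar heights drawn by geom_bar() are the same counts that count() returns. A minimal sketch, assuming ggplot2 and dplyr are installed, that extracts the computed bar heights from the plot object with ggplot2::layer_data() and compares them with dplyr::count(); the names `p`, `bar_heights` and `cut_counts` are illustrative.

```r
library(ggplot2)
library(dplyr)

# Build the bar chart as an object instead of printing it
p <- ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))

# layer_data() returns the data ggplot2 computed for the first layer;
# for geom_bar() the `count` column holds the bar heights
bar_heights <- layer_data(p)$count

# The same numbers computed directly with dplyr
cut_counts <- diamonds %>% count(cut)
print(cut_counts)
```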
Variation
Visualizing Distributions
● A variable is continuous if it can take any of an infinite set of ordered
values.
● Numbers and date-times are two examples of continuous variables.
● To examine the distribution of a continuous variable, use a histogram.
● ggplot(data = diamonds) + geom_histogram(mapping = aes(x = carat),
binwidth = 0.5)
● You can compute this by hand by combining dplyr::count() and
ggplot2::cut_width():
● diamonds %>% count(cut_width(carat, 0.5))
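cut_width() turns a continuous variable into a factor of equal-width intervals, which is the binning a histogram performs. A short sketch under the same package assumptions that stores the binned counts and checks that every diamond falls into exactly one bin, both by hand and in the histogram's own computed data; `carat_bins` and `hist_counts` are illustrative names.

```r
library(ggplot2)
library(dplyr)

# Name the binned variable `bin`; it is a factor whose levels are intervals of width 0.5
carat_bins <- diamonds %>% count(bin = cut_width(carat, 0.5))
print(carat_bins)

# Bar heights computed by geom_histogram() with the same binwidth
hist_counts <- layer_data(
  ggplot(data = diamonds) + geom_histogram(mapping = aes(x = carat), binwidth = 0.5)
)$count
```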
Variation
Visualizing Distributions
● To zoom in, keep only the diamonds smaller than three carats and choose a
smaller binwidth:
● smaller <- diamonds %>% filter(carat < 3)
● ggplot(data = smaller, mapping = aes(x = carat)) +
geom_histogram(binwidth = 0.1)
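A smaller binwidth splits the same data into more bins, which is why it reveals more detail. A small sketch counting the non-empty bins in `smaller` at the two binwidths used in this unit; `coarse` and `fine` are illustrative names.

```r
library(ggplot2)
library(dplyr)

smaller <- diamonds %>% filter(carat < 3)

# Number of non-empty bins at each binwidth
coarse <- smaller %>% count(cut_width(carat, 0.5))
fine   <- smaller %>% count(cut_width(carat, 0.1))

cat("bins at width 0.5:", nrow(coarse), "\n")
cat("bins at width 0.1:", nrow(fine), "\n")
```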
Variation
Visualizing Distributions
● If you wish to overlay multiple histograms in the same plot, use
geom_freqpoly() instead of geom_histogram().
● geom_freqpoly() performs the same calculation as geom_histogram(), but
displays the counts with lines instead of bars.
● It’s much easier to understand overlapping lines than overlapping bars:
● ggplot(data = smaller, mapping = aes(x = carat, color = cut)) +
geom_freqpoly(binwidth = 0.1)
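Each coloured line is a separate set of bin counts for one level of cut. A sketch computing the non-zero counts by hand by grouping on both the colour variable and the bin (geom_freqpoly() additionally draws the empty bins as zeros); `line_counts` is an illustrative name.

```r
library(ggplot2)
library(dplyr)

smaller <- diamonds %>% filter(carat < 3)

# One row per (cut, bin) combination: the points each coloured line connects
line_counts <- smaller %>%
  count(cut, bin = cut_width(carat, 0.1))

head(line_counts)
```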
Missing Values
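● If you find unusual values in your data (here, diamond widths y outside 3
to 20 mm), you have two options. The first is to drop the entire row with
the strange values:
● diamonds2 <- diamonds %>% filter(between(y, 3, 20))
● This is usually not recommended: one invalid measurement does not mean
the other measurements in the same row are invalid too.
● Instead, replace the unusual values with missing values. The easiest way
to do this is to use mutate() to replace the variable with a modified copy.
You can use the ifelse() function to replace unusual values with NA:
● diamonds2 <- diamonds %>% mutate(y = ifelse(y < 3 | y > 20, NA, y))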
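The two strategies differ in how much data they keep. A minimal sketch comparing them on diamonds; `dropped` and `replaced` are illustrative names for the two versions of `diamonds2`.

```r
library(ggplot2)
library(dplyr)

# Option 1: drop every row whose y value is outside [3, 20]
dropped <- diamonds %>% filter(between(y, 3, 20))

# Option 2: keep every row, but mark the unusual y values as NA
replaced <- diamonds %>% mutate(y = ifelse(y < 3 | y > 20, NA, y))

cat("rows after dropping: ", nrow(dropped), "\n")
cat("rows after replacing:", nrow(replaced), "\n")
cat("y values set to NA:  ", sum(is.na(replaced$y)), "\n")
```

With replacement, the x, z, price and other columns of those rows remain available for analysis.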
Exploratory Data Analysis (EDA)
Missing Values
● ggplot2 does not plot rows with missing values, but it warns that it has
removed them:
● ggplot(data = diamonds2, mapping = aes(x = x, y = y)) + geom_point()
● #> Warning: Removed 9 rows containing missing values (geom_point).
● To suppress that warning, set na.rm = TRUE:
● ggplot(data = diamonds2, mapping = aes(x = x, y = y)) +
geom_point(na.rm = TRUE)
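The number in that warning can be checked before plotting: geom_point() needs both x and y, so every row missing either one is dropped. A sketch that recreates `diamonds2` with the ifelse() replacement so it runs on its own; `n_incomplete` is an illustrative name.

```r
library(ggplot2)
library(dplyr)

# Recreate diamonds2 with the unusual y values replaced by NA
diamonds2 <- diamonds %>% mutate(y = ifelse(y < 3 | y > 20, NA, y))

# Rows missing x or y cannot be drawn as points
n_incomplete <- diamonds2 %>%
  filter(is.na(x) | is.na(y)) %>%
  nrow()

cat("rows geom_point() will remove:", n_incomplete, "\n")
```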
Covariation
● Covariation is the tendency for the values of two or more variables to vary
together in a related way.
● The best way to spot covariation is to visualize the relationship between
two or more variables.
● A common way to display the distribution of a continuous variable broken
down by a categorical variable is the boxplot.
● A boxplot is a type of visual shorthand for a distribution of values that is
popular among statisticians.
Covariation
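Each boxplot consists of:
● A box that stretches from the 25th percentile of the distribution to the 75th
percentile, a distance known as the interquartile range (IQR).
● In the middle of the box is a line that displays the median, i.e., 50th
percentile, of the distribution.
● These three lines (the two edges of the box and the median) give you a
sense of the spread of the distribution and whether or not the distribution
is symmetric about the median or skewed to one side.
● Points that fall more than 1.5 times the IQR from either edge of the box
are drawn individually as outliers.
● A line (or whisker) extends from each end of the box to the farthest
non-outlier point in the distribution.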
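The numbers behind each box can be computed directly. A sketch, assuming dplyr, that summarises price by cut with quantile() and IQR() and derives the 1.5 times IQR outlier limits (1.5 is geom_boxplot()'s default `coef`); `box_stats` and its column names are illustrative.

```r
library(ggplot2)
library(dplyr)

box_stats <- diamonds %>%
  group_by(cut) %>%
  summarise(
    q1     = quantile(price, 0.25),  # lower edge of the box
    median = median(price),          # line inside the box
    q3     = quantile(price, 0.75),  # upper edge of the box
    iqr    = IQR(price)              # height of the box
  ) %>%
  mutate(
    # Points beyond these limits are drawn individually as outliers
    lower_limit = q1 - 1.5 * iqr,
    upper_limit = q3 + 1.5 * iqr
  )

print(box_stats)
```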
Covariation
● Let’s take a look at the distribution of price by cut using geom_boxplot():
● ggplot(data = diamonds, mapping = aes(x = cut, y = price)) +
geom_boxplot()
● The same kind of plot shows highway mileage (hwy) by car class in the
mpg dataset:
● ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + geom_boxplot()
● cut is an ordered factor (Fair < Good < Very Good < Premium < Ideal), but
many categorical variables, such as class, don’t have such an intrinsic
order, so you might want to reorder them to make a more informative
display. One way to do that is with the reorder() function:
● ggplot(data = mpg) + geom_boxplot(mapping = aes(x = reorder(class,
hwy, FUN = median), y = hwy))
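reorder() returns a factor whose levels are sorted by a summary of another variable, which is what puts the boxes in order. A sketch showing the new level order and confirming that the class medians increase along it; `class_by_hwy` and `class_medians` are illustrative names.

```r
library(ggplot2)

# Levels of class sorted by the median hwy within each class
class_by_hwy <- reorder(mpg$class, mpg$hwy, FUN = median)
print(levels(class_by_hwy))

# Median hwy per class, listed in the new level order
class_medians <- tapply(mpg$hwy, class_by_hwy, median)
print(class_medians)
```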
Covariation
● If you have long variable names, geom_boxplot() will work better if you
flip it 90°. You can do that with coord_flip():
● ggplot(data = mpg) + geom_boxplot(mapping = aes(x = reorder(class,
hwy, FUN = median), y = hwy)) + coord_flip()
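coord_flip() still works, but since ggplot2 3.3.0 geom_boxplot() can also draw horizontal boxes directly when the categorical variable is mapped to y. A sketch of that alternative, assuming ggplot2 3.3.0 or later; `p` and `built` are illustrative names, and print(p) would display the plot.

```r
library(ggplot2)

# Continuous variable on x, reordered class on y;
# ggplot2 detects the horizontal orientation automatically
p <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = hwy, y = reorder(class, hwy, FUN = median)))

# ggplot_build() computes the statistics without drawing: one row per box
built <- ggplot_build(p)
```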
Patterns and models
● Patterns in your data provide clues about relationships.
● If a systematic relationship exists between two variables it will appear as a
pattern in the data.
● If you spot a pattern, ask yourself:
● Could this pattern be due to coincidence (i.e., random chance)?
● How can you describe the relationship implied by the pattern?
● How strong is the relationship implied by the pattern?
● What other variables might affect the relationship?
● Does the relationship change if you look at individual subgroups of the
data?
● ggplot(data = faithful) + geom_point(mapping = aes(x = eruptions, y =
waiting))

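● The scatterplot of Old Faithful eruption length (eruptions) against the
waiting time to the next eruption (waiting) shows a pattern: longer
eruptions are associated with longer waits. It also displays two clusters.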
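Two of the questions above (how strong is the relationship, and does it change across subgroups) can be given rough numbers. A sketch using base R only: a correlation coefficient, plus mean waiting times on either side of a 3-minute eruption length. The 3-minute split is a hypothetical threshold chosen by eye from the scatterplot, not a value from the source.

```r
# faithful ships with base R (datasets package), so no extra packages are needed
r <- cor(faithful$eruptions, faithful$waiting)
cat("correlation:", round(r, 2), "\n")

# Hypothetical split at 3 minutes, picked by eye between the two clusters
wait_after_short <- faithful$waiting[faithful$eruptions < 3]
wait_after_long  <- faithful$waiting[faithful$eruptions >= 3]

cat("mean wait, short eruptions:", round(mean(wait_after_short), 1), "\n")
cat("mean wait, long eruptions: ", round(mean(wait_after_long), 1), "\n")
```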

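● Models are a tool for extracting patterns out of data. The following code
fits a model that predicts price from carat (both on a log scale) and then
computes the residuals, the difference between the observed and the
predicted value. The residuals show the price of each diamond once the
effect of carat has been removed:
● library(tidyverse)
● library(modelr)
● mod <- lm(log(price) ~ log(carat), data = diamonds)
● diamonds2 <- diamonds %>% add_residuals(mod) %>% mutate(resid = exp(resid))
● ggplot(data = diamonds2) + geom_point(mapping = aes(x = carat, y = resid))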
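Because the model is fitted on log(price), exp(resid) reads as a price ratio: 1.2 means about 20% more expensive than the model predicts for that carat. A sketch, assuming ggplot2, dplyr and modelr are installed, confirming that add_residuals() gives the same values as residuals() on the fitted model; `with_resid` and `ratio` are illustrative names.

```r
library(ggplot2)
library(dplyr)
library(modelr)

mod <- lm(log(price) ~ log(carat), data = diamonds)

# add_residuals() stores observed minus predicted log(price) in `resid`
with_resid <- diamonds %>% add_residuals(mod)

# Back-transform the log-scale residual into a ratio of actual to predicted price
with_resid <- with_resid %>% mutate(ratio = exp(resid))
summary(with_resid$ratio)
```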
● Once you’ve removed the strong relationship between carat and price,
you can see what you expect in the relationship between cut and price:
relative to their size, better quality diamonds are more expensive.
● ggplot(data = diamonds2) + geom_boxplot(mapping = aes(x = cut, y =
resid))
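The boxplot can be summarised numerically as a median price ratio per cut, where values above 1 mean the diamonds cost more than the carat-only model predicts. A sketch under the same package assumptions; `cut_summary` and `median_ratio` are illustrative names.

```r
library(ggplot2)
library(dplyr)
library(modelr)

mod <- lm(log(price) ~ log(carat), data = diamonds)
diamonds2 <- diamonds %>%
  add_residuals(mod) %>%
  mutate(resid = exp(resid))

# Median residual ratio and group size for each cut
cut_summary <- diamonds2 %>%
  group_by(cut) %>%
  summarise(median_ratio = median(resid), n = n())

print(cut_summary)
```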

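● ggplot2 calls: the first two arguments to ggplot() are data and mapping,
and the first two arguments to aes() are x and y, so these names are often
left out. Written with every argument named, a call looks like this:
● ggplot(data = faithful, mapping = aes(x = eruptions)) +
geom_freqpoly(binwidth = 0.25)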
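Because ggplot() takes data and mapping as its first two arguments and aes() takes x first, the named call can be shortened without changing the result. A sketch that builds both versions and checks that ggplot2 computes identical layer data for them; `verbose` and `concise` are illustrative names.

```r
library(ggplot2)

# Every argument named
verbose <- ggplot(data = faithful, mapping = aes(x = eruptions)) +
  geom_freqpoly(binwidth = 0.25)

# Same plot with the positional argument names left out
concise <- ggplot(faithful, aes(eruptions)) +
  geom_freqpoly(binwidth = 0.25)
```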
THANK YOU
