VISUALIZING DATA IN R.
Presentation by:
Ummiya Mohammedi
MSc-2Cs
1213163320
Data visualization tools and techniques offer executives and other
knowledge workers new approaches to improve their ability to grasp
the information hidden in their data.
Data visualization is a general term that describes any effort to help
people understand the significance of data by placing it in a visual
context. Patterns, trends and correlations that might go undetected in
text-based data can be exposed and recognized more easily with data
visualization software.
It isn't just the huge range of statistical analyses available in R
that draws data practitioners to the language. Over the years, R has
also developed a rich ecosystem of charts, plots and visualizations.
ggplot2 is a data visualization package for the statistical programming language R.
Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland
Wilkinson's Grammar of Graphics, a general scheme for data visualization that
breaks graphs up into semantic components such as scales and layers.
ggplot2 can serve as a replacement for the base graphics in R and contains a
number of defaults for web and print display of common scales.
Since 2005, ggplot2 has grown to become one of the most popular R
packages. It was originally licensed under the GNU GPL v2; current releases
are distributed under the MIT license.
Basic Visualization
  Histogram
  Bar / Line Chart
  Box Plot
  Scatter Plot
Advanced Visualization
  Heat Map
  Mosaic Plot
  Map Visualization
  3D Graphs
  Correlogram
1. Histogram
A histogram breaks the data into bins (or breaks) and shows the
frequency distribution across these bins. You can also change the
breaks and see what effect this has on how easy the visualization
is to understand.
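A minimal sketch of changing the breaks, assuming R's built-in faithful dataset (Old Faithful eruption durations); the two break settings are illustrative choices, not values from the original slide. Drawing both histograms side by side makes the effect of bin width easy to see.

```r
# Eruption durations (minutes) from the built-in 'faithful' dataset
eruptions <- faithful$eruptions

op <- par(mfrow = c(1, 2))   # two panels side by side
# breaks = 5 is only a suggestion: hist() passes it to pretty(),
# so the final number of bins may differ slightly
h_coarse <- hist(eruptions, breaks = 5,
                 main = "breaks = 5", xlab = "Eruption time (min)")
h_fine <- hist(eruptions, breaks = 30,
               main = "breaks = 30", xlab = "Eruption time (min)")
par(op)                      # restore the previous layout

# hist() also returns the bins it used and the counts in each bin
length(h_coarse$counts)
length(h_fine$counts)
```

With few bins the two clusters of eruption times blur together; with more bins the bimodal shape becomes obvious.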
Data visualization using R
2. Bar / Line Chart
Line Chart
A line chart of the built-in AirPassengers dataset (monthly international
airline passenger totals, 1949 to 1960) shows the increase in passengers
over the period. Line charts are commonly preferred when analysing a
trend over time. A line plot is also suitable when we need to compare
relative changes in quantities across some variable (such as time).
The code is:
plot(AirPassengers, type = "l")  # simple line plot
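To illustrate comparing relative changes across time, here is a sketch that draws several series on one set of axes. It assumes the built-in monthly UK lung disease death series ldeaths (total), mdeaths (male) and fdeaths (female); ts.plot() is base R's helper for plotting multiple time series together.

```r
# Three built-in monthly time series, 1974 to 1979
cols <- c("black", "blue", "red")
ts.plot(ldeaths, mdeaths, fdeaths,
        col = cols, lty = 1,
        ylab = "Deaths", main = "UK lung disease deaths")
legend("topright", legend = c("Total", "Male", "Female"),
       col = cols, lty = 1)
```

Because the series share one time axis, the seasonal peaks and their relative sizes can be compared at a glance.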
Bar Chart
Bar plots are suitable for comparing cumulative totals across several
groups. Stacked bar plots go a step further and split each bar into
its subcategories.
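The original slide's code was not preserved, so the following is only a sketch of a simple and a stacked bar plot with base R's barplot(), assuming the built-in mtcars dataset; the choice of variables (gears and cylinders) and colours is illustrative.

```r
# Cross-tabulate cars by cylinders (rows) and gears (columns)
counts <- table(mtcars$cyl, mtcars$gear)

op <- par(mfrow = c(1, 2))
# Simple bar plot: total number of cars for each gear count
barplot(colSums(counts), main = "Cars by gears",
        xlab = "Number of gears", ylab = "Count")
# Stacked bar plot: each gear bar split by number of cylinders
barplot(counts, main = "Cars by gears and cylinders",
        xlab = "Number of gears",
        col = c("grey80", "grey50", "grey20"),
        legend.text = paste(rownames(counts), "cyl"))
par(op)
```

Passing a matrix (here a two-way table) makes barplot() stack the rows within each column; a plain vector gives one bar per element.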
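3. Box Plot (including group-by option)
A box plot shows five summary statistics: the minimum, the 25th percentile, the
median, the 75th percentile and the maximum. It is therefore useful for visualizing
the spread of the data and drawing inferences from it. (In R's boxplot(), the
whiskers by default extend only to the most extreme points lying within 1.5 times
the box length, roughly the interquartile range, from the box; points beyond them
are drawn individually as outliers.)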
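The slide's own code was lost, so here is a minimal sketch assuming the built-in mtcars dataset: first a single box plot, then the group-by form using a formula (mpg ~ cyl), which draws one box per number of cylinders.

```r
# Single box plot of fuel efficiency
boxplot(mtcars$mpg, main = "Miles per gallon", ylab = "mpg")

# Group-by: one box for each value of cyl
bp <- boxplot(mpg ~ cyl, data = mtcars,
              main = "Miles per gallon by cylinders",
              xlab = "Cylinders", ylab = "mpg")

# The returned object holds the statistics drawn for each group:
# rows are lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
bp$names
```

The third row of bp$stats holds the group medians (26, 19.7 and 15.2 mpg for 4, 6 and 8 cylinders).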
Ingest Data
When reading data into R, we generally use the read.table() or
read.csv() function. These open a file and return its contents as
a data frame.
For example, we might store the contents of a file in a variable
named bugData. Notice that R code conventionally uses the <- assignment
operator rather than the = used in most other languages (although =
also works for top-level assignment in R).
There are several parameters that we can pass to read.table().
Among the most often used are sep, header, row.names and col.names.
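A self-contained sketch of reading a file into bugData. Since the original file is not available, it writes a small hypothetical CSV (the column names and values are invented for illustration) to a temporary file first, then reads it with read.csv() and with read.table() using explicit parameters.

```r
# Hypothetical bug-count data written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("module,open,closed",
             "parser,12,40",
             "ui,7,22",
             "network,3,15"), csv_path)

# read.csv() is read.table() with defaults such as header = TRUE and sep = ","
bugData <- read.csv(csv_path)

# The equivalent read.table() call with explicit parameters;
# row.names = 1 uses the first column as the row names
bugData2 <- read.table(csv_path, header = TRUE, sep = ",", row.names = 1)

str(bugData)
bugData2["ui", "open"]
```

Using row.names = 1 makes rows addressable by module name, which is convenient for later plotting and lookup.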
R provides the plot() function, which can be used to create time
series charts. We can either pass in a complete data structure that
R already knows how to plot, such as the time series in the example
below, or pass in vectors to serve as the x- and y-axes of the chart.
Run ?plot in the console for the full documentation.
plot(Nile, col = "blue", bty = "7", lwd = 2, xlab = "", ylab = "", main = "Flow of the River Nile")
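The Nile example passes a whole time series object. The other form mentioned above, separate vectors for the x- and y-axes, looks like the sketch below; the year and sales values are hypothetical.

```r
# x and y supplied as separate numeric vectors (hypothetical data)
year  <- 2015:2020
sales <- c(120, 135, 150, 148, 170, 190)

plot(year, sales, type = "o",   # "o" draws points joined by lines
     xlab = "Year", ylab = "Sales (units)",
     main = "Sales per year")
```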
R also provides a barplot() function to create bar charts.
The barplot() function accepts either a vector or a matrix
as its data.
barplot(as.matrix(USPersonalExpenditure), main="US Personal Expenditures")
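A short sketch of both input types: a named vector (hypothetical quarterly values) gives one bar per element, while a matrix gives one bar per column, stacked by default or grouped with beside = TRUE. The matrix example reuses the built-in USPersonalExpenditure data, restricted to two of its years.

```r
# Vector input: one bar per element, labelled by the names
heights <- c(Q1 = 23, Q2 = 31, Q3 = 28, Q4 = 35)   # hypothetical values
mp <- barplot(heights, main = "Sales by quarter", col = "steelblue")

# Matrix input with beside = TRUE: grouped instead of stacked bars
mp2 <- barplot(USPersonalExpenditure[, c("1940", "1960")], beside = TRUE,
               legend.text = rownames(USPersonalExpenditure))
```

barplot() returns the bar midpoints, which is useful for adding labels or lines on top of the bars.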
R provides the hist() function to create histograms.
The hist() function accepts a numeric vector of values.
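A minimal sketch of hist() on a numeric vector, assuming the built-in Nile series (100 annual flow measurements) converted to a plain vector. The returned object exposes the bin boundaries and counts that were drawn.

```r
# hist() takes a numeric vector; here, annual flow of the Nile
h <- hist(as.numeric(Nile), col = "lightblue",
          main = "Distribution of Nile flow", xlab = "Flow")
h$breaks   # bin boundaries (Sturges' rule is the default)
h$counts   # number of observations in each bin
```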
Usage
ggplot(data = NULL, mapping = aes(), ..., environment = parent.frame())
Arguments
data
Default dataset to use for the plot. If it is not already a data frame, it will be
converted to one by fortify(). If not specified, it must be supplied in each layer
added to the plot.
mapping
Default list of aesthetic mappings to use for the plot. If not specified, it must be
supplied in each layer added to the plot.
environment
If a variable defined in the aesthetic mapping is not found in the data, ggplot will
look for it in this environment. It defaults to the environment in which ggplot() is
called. (In recent versions of ggplot2 this argument is deprecated, because aes()
now captures the environment it is called from.)
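A sketch of the data and mapping arguments, assuming the ggplot2 package is installed and using the built-in mtcars dataset. The second plot maps colour to highlight, a hypothetical variable that is not a column of mtcars, so ggplot2 finds it in the calling environment instead.

```r
library(ggplot2)

# data and mapping supplied up front: every layer inherits them
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# 'highlight' is not a column of mtcars; it is looked up in the
# environment where aes() was called
highlight <- mtcars$hp > 200
p2 <- ggplot(mtcars, aes(x = wt, y = mpg, colour = highlight)) +
  geom_point()
print(p2)
```

The looked-up vector must have one value per row of the data (here 32), otherwise the plot fails when it is built.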
ggplot() is used to construct the initial plot object, and is
almost always followed by + to add components to the plot. There
are three common ways to invoke ggplot():
ggplot(df, aes(x, y, <other aesthetics>))
ggplot(df)
ggplot()
The first method is recommended if all layers use the same data
and the same set of aesthetics, although this method can also be
used to add a layer using data from another data frame.
The second method specifies the default data frame to use for the
plot, but no aesthetics are defined up front. This is useful when one
data frame is used predominantly as layers are added, but the
aesthetics may vary from one layer to another.
The third method initializes a skeleton ggplot object which is
fleshed out as layers are added. This method is useful when
multiple data frames are used to produce different layers, as is
often the case in complex graphics.
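The three invocation styles can be sketched side by side as follows, assuming ggplot2 is installed and using mtcars; group_means is a hypothetical second data frame built with aggregate() to show layers drawing on different data.

```r
library(ggplot2)

# 1. Data and aesthetics shared by all layers
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# 2. Default data only; each layer sets its own aesthetics
p2 <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_rug(aes(x = wt))

# 3. Empty skeleton; each layer brings its own data and aesthetics
group_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
p3 <- ggplot() +
  geom_point(data = mtcars, aes(x = cyl, y = mpg)) +
  geom_point(data = group_means, aes(x = cyl, y = mpg),
             colour = "red", size = 4)
```

In the third plot, the red points are the per-cylinder means drawn from a different data frame than the raw observations.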