Data Visualization Using Python and R

The document outlines a course on data visualization for reporting using Python and R, as well as enterprise tools like Looker and Tableau. It includes a release schedule for course materials, examples of effective visualizations, best practices, and resources for further learning. The course aims to equip learners with the skills to create and customize various types of visualizations, including line graphs, histograms, and scatter plots.

Uploaded by

Santhiya S

Data Management

Data Visualization For Reporting


Topics
The 6th V of Big Data :)

Local tools
● Python: Seaborn, Matplotlib, etc.
● R: ggplot, base R, etc.

Enterprise Tools
● Looker / Tableau / Power BI / etc.
Course Materials Release Schedule
The next three weeks' worth of content will be released in two 1.5-week segments
rather than three 1-week segments like we have seen previously.

Content for this 1.5-week session: Visualization for Reporting Using Python and R

Content released in 1.5 weeks: Visualization for Reporting Using Enterprise Tools

Assignment due dates, as usual, will be posted with the individual assignments. The
release schedule above relates to content availability and may not align exactly
with assignment release dates.
Data Management
Data Visualization For Reporting using Python and R
Visualizations Part 1: Introduction and Overview
https://clauswilke.com/dataviz/index.html

Why do visualizations matter?


Example: Text
Adjusted for inflation, the highest grossing movie of all time (U.S. domestic figures)
is Gone With the Wind, with an adjusted lifetime gross of nearly $1.9 billion. This is
followed by Star Wars Episode IV: A New Hope with an adjusted lifetime gross of
over $1.6 billion. The Sound of Music and E.T. the Extra Terrestrial come in at a little
over $1.3 billion, and rounding out the top five is Titanic at a little under $1.3 billion.
By comparison, the top grossing movie in unadjusted terms, Star Wars Episode VII:
The Force Awakens, comes in at number 11 when adjusting for inflation with just
over $1 billion in inflation-adjusted dollars.
Sources:
https://www.boxofficemojo.com/chart/top_lifetime_gross_adjusted/?adjust_gross_to=2022
https://www.boxofficemojo.com/chart/top_lifetime_gross/
Example: Table

Movie                                        Inflation-adjusted Domestic Gross
Gone With the Wind                           $1.9 billion
Star Wars Episode IV - A New Hope            $1.67 billion
The Sound of Music                           $1.33 billion
E.T. the Extra Terrestrial                   $1.33 billion
Titanic                                      $1.27 billion
Star Wars Episode VII - The Force Awakens    $1.01 billion


Example: Chart / Visualization
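A minimal sketch of how the table above could be drawn as a chart in Python with Matplotlib. The values are copied from the table; the helper name plot_adjusted_gross, the colors, and the figure size are illustrative assumptions, not part of the course materials.

```python
import matplotlib.pyplot as plt

# Values copied from the table above
# (inflation-adjusted U.S. domestic gross, in billions of dollars).
movies = [
    ("Gone With the Wind", 1.90),
    ("Star Wars Episode IV - A New Hope", 1.67),
    ("The Sound of Music", 1.33),
    ("E.T. the Extra Terrestrial", 1.33),
    ("Titanic", 1.27),
    ("Star Wars Episode VII - The Force Awakens", 1.01),
]


def plot_adjusted_gross(rows):
    """Hypothetical helper: horizontal bar chart, first row at the top."""
    titles = [title for title, _ in rows]
    gross = [value for _, value in rows]

    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.barh(titles, gross, color="steelblue")
    ax.invert_yaxis()        # keep the table order, largest value on top
    ax.set_xlim(left=0)      # bars on a linear scale start at 0
    ax.set_xlabel("Inflation-adjusted domestic gross (billions of USD)")
    ax.set_title("Inflation-adjusted domestic gross of selected movies")
    ax.bar_label(bars, fmt="%.2f", padding=3)  # direct value labels
    fig.tight_layout()
    return fig, ax


fig, ax = plot_adjusted_gross(movies)
```

In a Colab or JupyterLab notebook the figure is displayed at the end of the cell; in a plain script, plt.show() or fig.savefig(...) would be needed.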
Approach
Must understand the data and what insights you are looking to derive

Use the right visualization for the best explanatory power


Categories
https://clauswilke.com/dataviz/directory-of-visualizations.html
Amounts: bar, point/line, and heatmap
Distributions: histograms, density plots, q-q plots, box plots
Proportions: pie, bar, stacked bar and density plots, mosaics
Associations: scatter, correlograms
Time Series: scatter, line
Trends: smoothing (LOESS, splines), functional forms
Geography: projection, layers, choropleth, cartograms
Uncertainty: frequency framing, shaded densities, error bars, confidence bands
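As one illustration of the Trends and Uncertainty categories, the sketch below fits a straight line (a simple functional form) to the means of repeated measurements and draws error bars of one standard error. The data are synthetic and all variable names are assumptions made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # synthetic data, for illustration only

x = np.arange(1, 11)
# 20 noisy repeated measurements at each x, generated around y = 2x + 1.
samples = 2.0 * x[:, None] + 1.0 + rng.normal(0, 3, size=(10, 20))

mean = samples.mean(axis=1)
# Standard error of the mean at each x.
sem = samples.std(axis=1, ddof=1) / np.sqrt(samples.shape[1])

# Trend as a functional form: least-squares straight line through the means.
slope, intercept = np.polyfit(x, mean, deg=1)

fig, ax = plt.subplots(figsize=(6, 4))
ax.errorbar(x, mean, yerr=sem, fmt="o", capsize=3,
            label="Mean ± 1 standard error")
ax.plot(x, slope * x + intercept, color="tab:red", label="Linear fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Trend line with error bars")
ax.legend()
fig.tight_layout()
```

Smoothers such as LOESS or splines would replace the np.polyfit step when no simple functional form is expected.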
Best Practices
https://clauswilke.com/dataviz/proportional-ink.html#visualizations-along-linear-axes (and following)

Bars on linear scale should always start at 0

2D charts: place independent variable on the x-axis, dependent on the y-axis

Overlapping points? Consider partial transparency and/or jitter

Use direct data point labels instead of color coding for more than 8 categories

Avoid large fill areas with highly saturated color

Use monotonic color scales (light to dark - avoid circular scales like rainbow)

Consider using a color-blindness simulator to check visuals for potential issues
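Several of the practices above can be sketched in a few lines of Matplotlib. The example uses synthetic data; the variable names, the jitter width of 0.15, the alpha of 0.4, and the choice of the "viridis" color map (a sequential scale that is monotonic in lightness) are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # synthetic data, for illustration only

# Independent variable: a rating from 1 to 5. Dependent variable: a noisy score.
rating = rng.integers(1, 6, size=300)
score = 10 * rating + rng.normal(0, 5, size=300)

# Jitter: small horizontal noise so points sharing a rating do not overlap.
jittered = rating + rng.uniform(-0.15, 0.15, size=rating.size)

fig, (ax_pts, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Independent variable on the x-axis, dependent on the y-axis;
# partial transparency plus a monotonic color scale.
sc = ax_pts.scatter(jittered, score, c=score, cmap="viridis", alpha=0.4)
ax_pts.set_xlabel("Rating (independent variable)")
ax_pts.set_ylabel("Score (dependent variable)")
ax_pts.set_title("Jitter and partial transparency")
fig.colorbar(sc, ax=ax_pts, label="Score")

# Bars on a linear scale start at 0; a muted fill avoids large saturated areas.
levels = np.arange(1, 6)
means = [score[rating == r].mean() for r in levels]
ax_bar.bar(levels, means, color="lightsteelblue")
ax_bar.set_ylim(bottom=0)
ax_bar.set_xlabel("Rating")
ax_bar.set_ylabel("Mean score")
ax_bar.set_title("Bars starting at 0")

fig.tight_layout()
```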


How to Approach the Rest of This Module
The content is a series of notebook demos and YouTube tutorials that go
in depth on creating visualizations from Python and R dataframes. At a minimum,
you are expected to know how to build and customize the following items
on real data in both programming languages:

- Line graphs
- Histograms
- Box plots
- Scatter plots
- Legends, titles, labels

Beyond the basic build, you should know how to customize these objects, including
their size, color, and placement, in a Colab or JupyterLab notebook or in .Rmd file output.
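For orientation, here is a minimal Python sketch of the listed items on a small synthetic pandas dataframe. The column names, colors, figure size, and legend positions are assumptions for illustration; the course notebooks may take a different approach, for example with Seaborn.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # synthetic data, for illustration only
df = pd.DataFrame({
    "day": np.arange(60),
    "group": np.repeat(["A", "B"], 30),
    "x": rng.normal(50, 10, 60),
})
df["y"] = 0.8 * df["x"] + rng.normal(0, 5, 60)

# Size: figsize is in inches. Placement: a 2x2 grid of axes.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Line graph
axes[0, 0].plot(df["day"], df["y"], color="tab:blue", label="y")
axes[0, 0].set_title("Line graph")
axes[0, 0].set_xlabel("Day")
axes[0, 0].set_ylabel("y")
axes[0, 0].legend(loc="upper left")

# Histogram
axes[0, 1].hist(df["x"], bins=12, color="tab:orange", edgecolor="white")
axes[0, 1].set_title("Histogram")
axes[0, 1].set_xlabel("x")
axes[0, 1].set_ylabel("Count")

# Box plot, one box per group
names = sorted(df["group"].unique())
axes[1, 0].boxplot([df.loc[df["group"] == n, "y"].to_numpy() for n in names])
axes[1, 0].set_xticklabels(names)
axes[1, 0].set_title("Box plot")
axes[1, 0].set_xlabel("Group")
axes[1, 0].set_ylabel("y")

# Scatter plot with one color and legend entry per group
for name, part in df.groupby("group"):
    axes[1, 1].scatter(part["x"], part["y"], alpha=0.7, label=f"Group {name}")
axes[1, 1].set_title("Scatter plot")
axes[1, 1].set_xlabel("x")
axes[1, 1].set_ylabel("y")
axes[1, 1].legend(title="Group", loc="lower right")

fig.suptitle("Basic chart types with titles, labels, and legends")
fig.tight_layout()
```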
Python Visualizations
Charts in Colaboratory Notebook Walkthrough
https://colab.research.google.com/notebooks/charts.ipynb
Altair Notebook Walkthrough
https://colab.research.google.com/notebooks/snippets/altair.ipynb
Visualizations in Python: Deep Dive Tutorials
https://www.youtube.com/watch?v=Nt84_TzRkbo

~2.5 hours at 1x speed


R Visualizations
R Basic Visualization Notebook Walkthrough
https://colab.research.google.com/drive/1WrnZbUpHS6cI8fwFyJ2j0Wv2tA88HIY4?usp=sharing
Visualizations in R using ggplot: Deep Dive Tutorials
https://www.youtube.com/playlist?list=PLtL57Fdbwb_C6RS0JtBojTNOMVlgpeJkS

~2.5 hours at 1x speed (you can skip the R programming video)


Optional: Visualizations using Base R
https://www.youtube.com/watch?v=_WyUme_H2ZQ

~30 minutes at 1x speed
