Data Visualization

This document outlines a course module on Data Analysis and Visualization, focusing on the importance and types of data visualization, as well as best practices for creating effective visualizations. It covers the advantages and disadvantages of various data visualization tools, and emphasizes the need for clear communication of insights to stakeholders. Additionally, it provides an introduction to using Excel for data visualization, including creating and customizing charts, and advanced techniques for data presentation.

Uploaded by

khuddush89

Course: MsDS

Data Analysis and Visualization

Module: 1
Introduction to Data Visualization

Learning Objective:
After studying this module, students will be able to:

● Understand the definition of data visualization and its importance in
data analysis

● Learn about different types of data visualizations, including their
advantages and best use cases

● Explore the advantages and disadvantages of different visualization
tools and platforms

● Understand best practices for creating effective data visualizations,
including how to choose the right visualization, use clear labels and
colors, provide context, and maintain consistency

● Be able to apply these principles to create effective data visualizations
that communicate insights to stakeholders

Structure
1.1 What is Data Visualization?
1.2 Why is Data Visualization important?
1.3 Types of Data Visualization
1.4 Advantages and disadvantages of Data Visualization tools
1.5 Best practices for creating effective Data Visualizations
1.6 Summary
1.7 Self-Assessment Questions
1.8 References

1.1 What is Data Visualization?


Data visualization is the process of presenting data in a visual or graphical
format that makes it easier to understand and analyze complex
information. By using charts, graphs, maps, and other visual aids, data
visualization allows users to identify patterns, relationships, and trends that
might not be apparent from raw data alone. This process can help decision-
makers better understand the significance of data and make informed
decisions based on data-driven insights.

1.2 Why is Data Visualization important?


Data visualization plays a critical role in data analysis because it helps users
identify insights and trends in large data sets quickly. Here are some key
reasons why data visualization is essential for effective data analysis:
1. Easily Identifies Patterns and Trends: With the help of data
visualization, it's easy to spot patterns, trends, and correlations in
data sets. Instead of looking at raw data, users can visualize the data
in the form of charts and graphs, making it easier to identify patterns
that might not be obvious otherwise.
2. Enables Effective Communication of Insights: Data visualization
enables users to communicate insights and findings to stakeholders
in a clear and concise manner. Charts and graphs are often easier to
understand than a table of numbers, making it easier to present data
and insights to non-technical stakeholders.
3. Identifies Outliers and Anomalies: Data visualization makes it easy
to identify outliers and anomalies that require further investigation.
This can help users identify data quality issues or uncover insights
that might have been missed otherwise.
4. Facilitates Informed Decision-Making: Data visualization helps
decision-makers make informed decisions based on data-driven
insights. By providing clear, concise information about data, data
visualization enables decision-makers to make better-informed
decisions.
5. Improves Data Quality: Data visualization can improve data quality
by identifying errors and inconsistencies. By visualizing data, it's
easier to spot errors and inconsistencies that might not be
immediately apparent in raw data.

1.3 Types of Data Visualization


1. Bar Charts: Bar charts are used to compare the values of different
categories or groups. They use rectangular bars to represent data values,
with the length of each bar indicating the value of the data.
2. Histograms: Histograms are used to show the frequency distribution of
numerical data. They use bars to represent the number of data points
that fall within each interval or bin.
3. Line Charts: Line charts use lines to connect data points and are used to
show trends over time. They are often used to represent time-series
data and can help identify patterns or trends over a specific period.
4. Pie Charts: Pie charts are used to show the relative proportions of
different categories. They use slices of a circle, with each slice
representing a portion of the whole.
5. Scatter Plots: Scatter plots use dots to represent data points and are
used to show the relationship between two variables. They are often
used to identify correlations or patterns in data.
6. Box Plots: Box plots are used to show the distribution of numerical data.
They use a box to represent the middle 50% of the data (from the first
to the third quartile), a line inside the box to mark the median, and
whiskers that extend toward the smallest and largest values. Outliers
are often drawn as separate points.
7. Doughnut Charts: Doughnut charts are similar to pie charts but use a
hollow circle instead of a solid one. They are often used to show the
relative proportions of data categories.
8. Pareto Charts: Pareto charts are used to show the relative frequency of
different categories in descending order. They use bars to represent the
frequency of each category and a line to represent the cumulative
percentage.
9. Bullet Charts: Bullet charts are used to show progress toward a goal or
target. They use a bar to represent progress and a line to represent the
target.
10. Gantt Charts: Gantt charts are used to show project timelines and
dependencies. They use bars to represent tasks and arrows to show the
dependencies between them.
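
To make the idea of histogram bins concrete, here is a minimal sketch in Python that counts how many values fall into each equal-width interval. The function name `histogram_counts` and the exam scores are hypothetical and invented for illustration; they do not come from the course material.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each half-open bin [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)   # which bin the value belongs to
        lo = start + index * bin_width          # lower edge of that bin
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical exam scores, invented for illustration only.
scores = [52, 57, 61, 64, 68, 70, 71, 75, 78, 83, 85, 91]
counts = histogram_counts(scores, bin_width=10, start=50)

# Print a text histogram: one '#' per data point in the bin.
for lo, n in counts.items():
    print(f"{lo}-{lo + 9}: {'#' * n}")
```

Unlike a bar chart, where each bar stands for a named category, each bar here stands for a numeric range, so the choice of bin width changes the shape of the chart.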

1.4 Advantages and disadvantages of Data Visualization tools


Data visualization tools are critical in modern data analysis because they
help to make large and complex datasets easier to understand, analyze,
and communicate. However, like any technology, these tools have their
advantages and disadvantages, and it is essential to consider them before
selecting a data visualization tool.

Here are some of the advantages and disadvantages of data visualization
tools:
Advantages:
1. Improved Data Understanding: Data visualization tools help users to
understand and analyze data more efficiently. By converting data
into graphical representations, users can identify patterns, trends,
and relationships that would be difficult to discern in raw data.
2. Improved Decision-making: Data visualization tools help users to
make better decisions by enabling them to identify key insights from
data quickly. These insights can be used to identify opportunities,
mitigate risks, and optimize performance.
3. Enhanced Communication: Data visualization tools enable users to
communicate data insights more effectively. By using graphical
representations, users can communicate complex data sets to a wide
range of audiences, including non-experts.
4. Increased Efficiency: Data visualization tools help users to save time
by streamlining data analysis. Rather than manually sifting through
data sets, users can use these tools to quickly identify relevant
insights.
5. Greater Scalability: Data visualization tools can handle large and
complex data sets, making it easier to analyze data from multiple
sources.

Disadvantages:
1. Misinterpretation: Data visualization tools can be misleading if not
used correctly. Misleading or poorly designed visualizations can lead
to incorrect conclusions, making it essential to ensure that the
visualization accurately represents the data.
2. Biases: Visualizations can be biased, either intentionally or
unintentionally, through choices such as which data to include or how
axes are scaled. This can lead to incorrect conclusions, making it
essential to ensure that the data is objectively represented.
3. Complexity: Data visualization tools can be complex and require
significant resources to use effectively. Users may require specialized
training to use these tools, and they may need to invest in hardware
and software to handle large and complex data sets.
4. Cost: Data visualization tools can be expensive, making it essential to
consider the cost before selecting a tool. Some data visualization
tools may require additional licenses or subscriptions, making it
important to consider the ongoing cost of using the tool.
5. Data Security: Data visualization tools may require users to upload
data to a cloud-based server, making it essential to consider data
security before selecting a tool. Users may need to ensure that the
data is adequately protected and that the tool complies with data
privacy regulations.

1.5 Best practices for creating effective Data Visualizations


Creating effective data visualizations is essential to ensure that the data is
accurately represented and that the insights gained from the data are
easily understood. Here are some best practices for creating effective data
visualizations:

● Know your Audience: It is essential to understand the audience before
creating a visualization. The visualization should be tailored to the
audience's knowledge level, interests, and goals.

● Use the Right Visualization: The visualization should be chosen based on
the type of data being presented. The visualization should accurately
represent the data and help to highlight the key insights.

● Keep it Simple: The visualization should be simple and easy to
understand. Cluttered or complex visualizations can be difficult to
interpret and may lead to incorrect conclusions.

● Use Clear Labels: The labels used in the visualization should be clear and
concise. The labels should help to explain the data being presented and
make it easier to interpret the visualization.

● Use Colors Effectively: Colors can be used to highlight specific data
points or to group related data. However, it is essential to use colors
effectively and ensure that they do not distract from the data being
presented.

● Provide Context: The visualization should provide context to the data
being presented. This can be done by including background information,
annotations, or additional data.

● Test and Iterate: The visualization should be tested with the target
audience to ensure that it effectively communicates the data insights.
Feedback should be used to improve the visualization and iterate until
the final product is effective.

● Choose the Right Tools: The tools used to create the visualization should
be chosen based on the data being presented and the goals of the
visualization. The tools should be easy to use and provide the necessary
features to create effective visualizations.

● Maintain Consistency: Consistency is essential when creating multiple
visualizations. The visualizations should use the same style, colors, and
labels to ensure that the audience can easily compare and understand
the data.

● Keep Data Secure: The data used in the visualization should be secure
and comply with data privacy regulations. The visualization should not
include sensitive or personal data that could be used to identify
individuals.

Some common pitfalls to avoid when presenting data visually include:


1. Misrepresenting the data: Make sure that your visualization accurately
represents the data and avoids distorting or misrepresenting it.
2. Using unclear labels or colors: Use clear labels and colors that make it
easy for stakeholders to understand the data and identify patterns.
3. Using misleading or biased visuals: Avoid using visuals that are
intentionally or unintentionally misleading, such as by altering the scale
or using inappropriate chart types.
4. Failing to provide context: Without proper context, data can be difficult
to interpret and may not provide meaningful insights. Always provide
context around the data you are presenting.
5. Not updating the data: Data can quickly become outdated, so it's
important to update your visualizations regularly to ensure that they are
still relevant and accurate.
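
The pitfall of altering the scale can be shown with simple arithmetic. The sketch below, written in Python, compares how tall two bars appear relative to each other when the value axis starts at zero and when it is truncated. The function name `visual_ratio` and the two values are hypothetical, chosen only to illustrate the effect.

```python
def visual_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn heights of bars a and b when the axis starts at axis_min."""
    return (a - axis_min) / (b - axis_min)


# Hypothetical values, invented for illustration only.
last_year, this_year = 100.0, 104.0

honest = visual_ratio(this_year, last_year)                 # axis starts at 0
truncated = visual_ratio(this_year, last_year, axis_min=96.0)  # axis starts at 96

print(f"axis from 0:  this year's bar looks {honest:.2f}x as tall")
print(f"axis from 96: this year's bar looks {truncated:.2f}x as tall")
```

With the axis starting at 96, a 4% increase is drawn as a bar twice as tall, which is why bar charts are generally expected to start at zero.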

1.6 Summary

● Data visualization is the graphical representation of data and
information.

● The importance of data visualization in data analysis is to help
understand complex data, discover patterns and trends, and
communicate insights to stakeholders.

● There are many types of data visualizations, including bar charts,
histograms, line charts, pie charts, scatter plots, box plots, and others,
each with their own advantages and best use cases.

● Data visualization tools have advantages such as helping to identify
patterns and trends in data, but also have disadvantages such as
potential for data distortion and over-reliance on the tool instead of
critical thinking.

● Best practices for data visualization include knowing your audience,
using the right visualization, keeping it simple, using clear labels and
colors, providing context, testing and iterating, choosing the right tools,
maintaining consistency, and keeping data secure.

1.7 Self-Assessment Questions


1. What is data visualization, and why is it important in data analysis?
2. What are some common types of data visualizations, and in what
situations are they most effective?
3. What are some advantages and disadvantages of using data
visualization tools?
4. How can data visualization be used to communicate insights to
stakeholders, and what are some common pitfalls to avoid when
presenting data visually?
5. How can you use data visualization to communicate insights to different
types of stakeholders, such as executives, technical experts, or general
audiences?

1.8 References

● "The Visual Display of Quantitative Information" by Edward Tufte

● "Information Dashboard Design: Displaying Data for At-a-Glance
Monitoring" by Stephen Few

● "Data Visualization: A Practical Introduction" by Kieran Healy


Course: MsDS

Data Analysis and Visualization

Module: 2
Data Visualization with Excel

Learning Objective:
After studying this module, students will be able to:

● Work with the basics of Microsoft Excel, including how to input and
manipulate data, and apply some basic data analysis techniques.

● Create and format different types of charts and visualizations in
Excel, including bar charts, line charts, pie charts, scatter plots, and box
plots.

● Use advanced data visualization techniques in Excel, such as conditional
formatting, sparklines, and data bars.

● Create dashboards and reports in Excel, including how to link
charts and tables together and create interactive features.

● Apply best practices for data visualization and presentation, including
how to choose the right chart type, format and style charts, and present
data effectively to different audiences.

Structure
2.1 Introduction to Excel for Data Visualization
2.2 Creating basic charts (bar charts, line charts, pie charts, etc.) in Excel
2.3 Advanced charts (scatter plots, box plots, pareto charts, etc.) in Excel
2.4 Customizing charts in Excel (colors, fonts, labels, etc.)
2.5 Advanced Data Visualization Techniques
2.6 Dashboards and Reports in Excel
2.7 Summary
2.8 Self-Assessment Questions
2.9 References
2.1 Introduction to Excel for Data Visualization
Excel is a powerful tool for data visualization that allows you to analyze and
present data in a variety of formats. With its user-friendly interface and a
wide range of features, Excel is a popular choice for data visualization
across various industries. Here are some key points about Excel for data
visualization:
1. Data input: Excel allows you to input data in various formats,
including CSV, text files, and databases. You can also manually input
data into Excel spreadsheets. Excel has a range of data input features
such as the "Text to Columns" feature, which allows you to split data
in a single column into multiple columns based on a delimiter.
2. Data manipulation: Excel offers a range of tools for data
manipulation, including sorting and filtering data, conditional
formatting, and data validation. These tools help to ensure data
accuracy and make it easier to identify patterns and trends in your
data.
3. Formulas and functions: Excel has a vast library of built-in formulas
and functions that allow you to perform complex calculations on
your data. These include basic math functions, financial functions,
statistical functions, and more. Excel also allows you to create
custom formulas using the built-in formula editor.
4. Charts and graphs: Excel offers a range of chart and graph types that
allow you to visualize your data in a variety of ways. These include
bar charts, line charts, scatter plots, pie charts, and more. Excel
charts and graphs can be customized with different colors, fonts, and
styles to create a visually appealing presentation.
5. Pivot tables: Pivot tables are a powerful tool in Excel for summarizing
and analyzing large sets of data. With pivot tables, you can quickly
and easily create tables that show the relationship between different
variables in your data. You can also use pivot charts to visualize the
data in your pivot table.
6. Data analysis: Excel offers a range of tools for data analysis, including
statistical functions, regression analysis, and what-if analysis. These
tools help to identify patterns and trends in your data and make it
easier to make informed decisions based on that data.
Overall, Excel is a versatile tool for data visualization that offers a range of
features for data input, manipulation, analysis, and presentation. By
mastering these features, you can create compelling visualizations that
help to communicate insights from your data to others.
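
To show what a pivot table does with data, here is a minimal sketch in Python rather than Excel. It groups records by one field for the rows and another for the columns, then sums a value field, which is the core of a pivot table summary. The function name `pivot_sum`, the field names, and the sales records are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales records, invented for illustration only.
records = [
    {"region": "North", "product": "A", "sales": 120},
    {"region": "North", "product": "B", "sales": 80},
    {"region": "South", "product": "A", "sales": 150},
    {"region": "South", "product": "B", "sales": 60},
    {"region": "North", "product": "A", "sales": 40},
]


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(records, "region", "product", "sales")
for region, totals in pivot.items():
    print(region, totals)
```

In Excel the same result comes from placing one field in the Rows area, another in Columns, and a numeric field in Values with Sum as the aggregation.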

2.2 Creating basic charts (bar charts, line charts, pie charts, etc.) in Excel
Excel offers a range of chart types that allow you to visualize your data in
different ways. Here is an explanation of some of the basic chart types
available in Excel and how they are constructed, each with example data
in tabular format:

1. Bar chart: A bar chart is used to compare data across different
categories. It is created by plotting bars of equal width whose height
varies with the data values. (Excel calls the vertical form a column
chart and the horizontal form a bar chart.) Here is example data for a
bar chart:

Month   Sales
Jan     500
Feb     750
Mar     900
Apr     600
May     800
Jun     700

2. Line chart: A line chart is used to show trends in data over time or
across categories. It is created by connecting data points, plotted
against a horizontal axis, with a line. Here is example data for a line
chart:

Year    Sales
2015    500
2016    750
2017    900
2018    600
2019    800
2020    700

3. Pie chart: A pie chart is used to show the proportion of data in
different categories. It is created by dividing a circle into sectors that
represent the proportion of each category. Here is example data for a
pie chart:

Category   Sales
A          500
B          750
C          900
D          600
E          800
F          700

Overall, creating basic charts in Excel is a straightforward process that
involves selecting the data range and chart type, customizing the chart as
desired, and presenting the chart to communicate insights from the data.
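
The bar and pie examples above can also be reproduced outside Excel. The sketch below is not part of the Excel workflow: it assumes Python, and the plotting helper additionally assumes the third-party Matplotlib library is installed. The function names `percentage_shares` and `save_bar_chart` and the output file name are hypothetical. The first function computes the share of the whole that each pie slice represents; the second draws the monthly sales as vertical bars.

```python
# Data copied from the example tables in this section.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
monthly_sales = [500, 750, 900, 600, 800, 700]
categories = ["A", "B", "C", "D", "E", "F"]
category_sales = [500, 750, 900, 600, 800, 700]


def percentage_shares(values):
    """Return each value as a percentage of the total, rounded to 0.1."""
    total = sum(values)
    return [round(100 * v / total, 1) for v in values]


def save_bar_chart(labels, values, path):
    """Draw a vertical bar chart and save it to path (requires Matplotlib)."""
    import matplotlib              # imported lazily so the rest runs without it
    matplotlib.use("Agg")          # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales")
    ax.set_title("Monthly Sales")
    fig.savefig(path)
    plt.close(fig)


# Slice sizes for the pie chart example.
shares = percentage_shares(category_sales)
for name, share in zip(categories, shares):
    print(f"Category {name}: {share}%")
```

If Matplotlib is available, calling `save_bar_chart(months, monthly_sales, "monthly_sales.png")` writes the chart to an image file.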

2.3 Advanced charts (scatter plots, box plots, pareto charts, etc.) in Excel
Excel offers several advanced chart types that can help you analyze and
communicate complex data sets. Here is an explanation of some of the
commonly used advanced charts in Excel, each with example data in
tabular format:

1. Scatter plot: A scatter plot is used to show the relationship between
two variables. It is created by plotting data points on a coordinate grid
where one variable is represented on the horizontal axis and the other
variable is represented on the vertical axis. Here is example data for a
scatter plot:

X   Y
1   3
2   4
3   7
4   8
5   10

2. Box plot: A box plot, also known as a box and whisker plot, is used to
show the distribution of data in a set. It is created by drawing a box
from the first quartile to the third quartile, with a line inside the box
representing the median. Whiskers extend from the box to show the
range of the data. Here is example data for a box plot:

Data
2
4
5
7
8
10
12
14

3. Pareto chart: A Pareto chart is used to show the relative frequency of
different categories in a set. It is created by plotting bars representing
the frequency of each category in descending order, and a line
representing the cumulative frequency. Here is example data for a
Pareto chart:

Defect               Frequency
Scratches            120
Broken parts         70
Incorrect labeling   50
Missing parts        30
Wrong color          20
Total                290

Overall, creating advanced charts in Excel requires a deeper understanding
of the data and the best way to present it visually. By using the right chart
type, customizing it appropriately, and presenting it effectively, you can
create powerful data visualizations that help communicate insights from
the data.
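
The numbers behind these three charts can be worked out directly from the example tables. The sketch below uses Python's standard library, not Excel, and its function names are hypothetical. It computes the correlation shown by the scatter plot, the five values a box plot draws, and the cumulative percentage line of the Pareto chart. Quartile conventions differ between tools, so the quartiles here (Python's default "exclusive" method) may not match every charting tool.

```python
import math
import statistics

# Data copied from the example tables in this section.
xs, ys = [1, 2, 3, 4, 5], [3, 4, 7, 8, 10]
box_data = [2, 4, 5, 7, 8, 10, 12, 14]
defects = {"Scratches": 120, "Broken parts": 70, "Incorrect labeling": 50,
           "Missing parts": 30, "Wrong color": 20}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # "exclusive" method
    return min(values), q1, median, q3, max(values)


def pareto_rows(freq):
    """Categories in descending order with their cumulative percentage."""
    total = sum(freq.values())
    rows, running = [], 0
    for name, count in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, round(100 * running / total, 1)))
    return rows


r = pearson_r(xs, ys)
summary = five_number_summary(box_data)
pareto = pareto_rows(defects)

print(f"correlation: {r:.3f}")
print("five-number summary:", summary)
for name, count, cumulative in pareto:
    print(f"{name:<20}{count:>5}{cumulative:>8}%")
```

The cumulative column makes the Pareto reading explicit: the first three defect types account for more than 80% of all defects in the example.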

2.4 Customizing charts in Excel (colors, fonts, labels, etc.)


Customizing charts in Excel is a great way to make your data visualizations
more engaging and informative. Here are the steps you can follow to
customize charts in Excel:
1. Select the chart you want to customize by clicking on it.
2. Go to the "Chart Tools" section of the ribbon, which will appear
when the chart is selected.
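3. Use the "Design", "Layout", and "Format" tabs to customize the
chart as needed. (The "Layout" tab exists in Excel 2007 and 2010; in
Excel 2013 and later, its commands are available through "Add Chart
Element" on the "Design" tab.)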
4. In the "Design" tab, you can choose from pre-set chart styles and
tweak chart elements such as titles and data labels.
5. In the "Layout" tab, you can add chart elements such as axis titles,
gridlines, and legends, as well as adjust the size and position of the
chart.
6. In the "Format" tab, you can adjust the colors, fonts, and shapes
used in the chart.
7. To change the colors, click on the chart element you want to change,
such as a data series or chart area, then select the "Format" tab and
choose a new color scheme from the "Shape Styles" or "Chart Styles"
sections.
8. To change the font, select the chart element you want to change,
then go to the "Home" tab and use the "Font" section to choose a
new font face, size, or color.
9. To add or edit labels, right-click on the chart element you want to
label, such as a data point or axis label, then choose "Add Data
Labels" or "Format Axis" to customize the label text, position, and
formatting.
By following these steps, you can customize your charts in Excel to make
them more visually appealing and effective for communicating your data
insights. Remember to keep your audience in mind when making design
choices, and use clear and concise labels to help them understand the data
being presented.

2.5 Advanced Data Visualization Techniques


Excel provides several advanced data visualization techniques that can help
you analyze and present complex data sets in a more effective and
impactful way. Some of these techniques include:
1. Sparklines: Sparklines are small, simple charts that can be embedded
within a cell to show trends and patterns in your data. Excel provides
three types of sparklines: line, column, and win/loss.
2. Heat maps: Heat maps are graphical representations of data that use
color coding to indicate the relative values of each data point. They are
especially useful for visualizing large data sets with many variables.
3. Treemap charts: Treemap charts are hierarchical visualizations that use
nested rectangles to show the relative sizes of different data points.
They are particularly useful for visualizing the distribution of large data
sets across different categories or groups.
4. Waterfall charts: Waterfall charts are used to show how an initial value
is affected by a series of positive and negative changes, leading to a final
value. They are often used in financial analysis to show how a
company's profits or losses change over time.
Example:
To create a waterfall chart for the data in the table that follows these
steps:
1. Select the data range including the labels.
2. Go to the Insert tab in the Ribbon.
3. Click on the Waterfall chart type and select the default waterfall
chart.
4. The chart will be inserted in the worksheet.
Now, we need to make some adjustments to the chart to better display
the data. These adjustments assume that the Invisible, Increase, and
Decrease columns are plotted as stacked series, as in a stacked column
chart; Excel's built-in Waterfall chart type (Excel 2016 and later)
calculates the floating bars itself and has no "Invisible" series to hide.
5. Double-click on the "Invisible" bar, which represents the Carryover
Balance. This will open the Format Data Point pane on the right.
6. In the Fill section, select "No fill" to make the bar invisible.
7. In the Border section, select "No line" to remove the border around
the bar.
8. Right-click on the "Increase" bars and select "Format Data Series".
9. In the Fill section, select green as the color for the "Increase" bars.
10. Right-click on the "Decrease" bars and select "Format Data Series".
11. In the Fill section, select red as the color for the "Decrease" bars.
12. Right-click on the "Net Cash Flow" bars and select "Format Data
Series".
13. In the Fill section, select blue as the color for the "Net Cash Flow"
bars.
Now, we will adjust the axis scale to better display the data.
14. Right-click on the vertical axis and select "Format Axis".
15. In the Axis Options section, set the Minimum Bounds to 0 and the
Maximum Bounds to 120000.
Finally, we will make some cosmetic adjustments to the chart.
16. Right-click on the legend and select "Delete" to remove it from the
chart.
17. Right-click on the horizontal gridlines and select "Delete" to remove
them from the chart.
18. Click on the chart title and change it to "Cash Flow Waterfall Chart".

Period  Invisible  Increase  Decrease  Net Cash Flow  Labels
Carryover Balance  $100,000  $0  $100,000
Q1 FY2018  $75,000  $0  $25,000  -$25,000  $25,000 (25% ↓)
Q2 FY2018  $75,000  $10,000  $0  $10,000  $10,000 (12% ↑)
Q3 FY2018  $85,000  $14,000  $0  $14,000  $14,000 (14% ↑)
Q4 FY2018  $84,000  $0  $15,000  -$15,000  $15,000 (15% ↓)
Q1 FY2019  $79,000  $0  $5,000  -$5,000  $5,000 (6% ↓)
Q2 FY2019  $79,000  $7,000  $0  $7,000  $7,000 (8% ↑)
Q3 FY2019  $86,000  $8,500  $0  $8,500  $8,500 (9% ↑)
Q4 FY2019  $84,500  $0  $10,000  -$10,000  $10,000 (11% ↓)
Q1 FY2020  $68,500  $0  $16,000  -$16,000  $16,000 (19% ↓)
Q2 FY2020  $68,500  $10,000  $0  $10,000  $10,000 (13% ↑)
Current Balance  $78,500  $0  $78,500

5. 3D charts: 3D charts add an extra dimension of depth to your data
visualizations, which can make them more visually striking. However,
they should be used sparingly and only when necessary, as perspective
and depth can make it harder to read and compare values accurately.
6. Conditional formatting: Conditional formatting is a powerful tool that
allows you to apply different formatting styles to your data based on
predefined rules. For example, you could use conditional formatting to
highlight cells that meet a certain threshold or contain specific text.
By using these advanced data visualization techniques in Excel, you can
create more engaging and informative data visualizations that help you
make more informed decisions and communicate your insights more
effectively. However, it's important to use these techniques judiciously and
only when they are appropriate for your data and audience.
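
The helper columns in the waterfall table above follow a simple rule, sketched here in Python rather than Excel. Starting from the carryover balance of $100,000, each quarter's net cash flow is split into an Increase or a Decrease, and the Invisible value is the height at which the visible bar has to start. The function name `waterfall_rows` is hypothetical, and the sketch assumes the balance never drops below zero.

```python
# Opening balance and quarterly net cash flows copied from the waterfall table.
opening_balance = 100_000
net_flows = [
    ("Q1 FY2018", -25_000), ("Q2 FY2018", 10_000), ("Q3 FY2018", 14_000),
    ("Q4 FY2018", -15_000), ("Q1 FY2019", -5_000), ("Q2 FY2019", 7_000),
    ("Q3 FY2019", 8_500), ("Q4 FY2019", -10_000), ("Q1 FY2020", -16_000),
    ("Q2 FY2020", 10_000),
]


def waterfall_rows(start, flows):
    """Build the Invisible / Increase / Decrease helper values per period."""
    rows, balance = [], start
    for period, change in flows:
        new_balance = balance + change
        rows.append({
            "period": period,
            # The hidden base is the lower of the balances before and after.
            "invisible": min(balance, new_balance),
            "increase": max(change, 0),
            "decrease": max(-change, 0),
        })
        balance = new_balance
    return rows, balance


rows, closing_balance = waterfall_rows(opening_balance, net_flows)
for row in rows:
    print(f"{row['period']:<10}{row['invisible']:>9,}"
          f"{row['increase']:>9,}{row['decrease']:>9,}")
print(f"Current balance: {closing_balance:,}")
```

The computed Invisible values match the table, and the closing balance equals the Current Balance row, which is a useful check that the helper columns were filled in consistently.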
2.6 Dashboards and Reports in Excel
Dashboards and reports in Excel allow you to present data in a visually
appealing and organized way. Here are some key components of a
dashboard or report in Excel:
1. Data sources: A dashboard or report in Excel typically pulls data from
one or more sources such as Excel spreadsheets, databases, or
external APIs.
2. Key performance indicators (KPIs): KPIs are metrics that help you
track progress towards your goals. These can be displayed in a
dashboard or report as charts or tables.
3. Visualizations: Charts, tables, and other visualizations can help you
present data in an easy-to-understand way. Excel offers a variety of
chart types such as bar charts, line charts, pie charts, and more.
4. Filters and slicers: Filters and slicers allow you to interact with your
data and focus on specific subsets of information. For example, you
could use a filter to show data for a specific time period or
geographic region.
5. Navigation and interactivity: Dashboards and reports should be easy
to navigate and provide interactive features such as drop-down
menus or buttons that allow users to explore the data.
6. Design elements: The design of a dashboard or report should be
visually appealing and consistent. This includes elements such as
colors, fonts, and layout.
As an example, consider a sales dashboard in Excel.
This dashboard pulls data from an Excel spreadsheet and displays KPIs and
visualizations related to sales and revenue. It includes a filter to allow users
to focus on data for a specific time period, as well as interactive charts that
allow users to drill down into the data. The design is consistent and visually
appealing, with a color scheme that matches the company's branding.
Creating a dashboard or report in Excel requires careful planning and
design. By incorporating key components such as data sources, KPIs,
visualizations, filters, navigation, and design elements, you can create a
dashboard or report that effectively communicates important information
to your audience.
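
The interplay between filters and KPIs can be sketched in a few lines of Python. This is an illustration of the idea, not an Excel feature: a filter narrows the rows, and the KPI is recomputed from whatever remains, which is what a slicer does to the charts on a dashboard. The function name `kpi`, the field names, and all figures are hypothetical and invented for illustration.

```python
# Hypothetical revenue records, invented for illustration only.
orders = [
    {"quarter": "Q1", "region": "East", "revenue": 1200, "target": 1000},
    {"quarter": "Q1", "region": "West", "revenue": 800, "target": 1000},
    {"quarter": "Q2", "region": "East", "revenue": 1500, "target": 1200},
    {"quarter": "Q2", "region": "West", "revenue": 900, "target": 1000},
]


def kpi(rows, **filters):
    """Total revenue, total target, and attainment for rows matching filters."""
    selected = [r for r in rows
                if all(r[field] == value for field, value in filters.items())]
    revenue = sum(r["revenue"] for r in selected)
    target = sum(r["target"] for r in selected)
    # Attainment is undefined when nothing matches or the target is zero.
    attainment = round(100 * revenue / target, 1) if target else None
    return {"revenue": revenue, "target": target, "attainment_pct": attainment}


overall = kpi(orders)                      # no filter: the whole data set
first_quarter = kpi(orders, quarter="Q1")  # like selecting Q1 in a slicer
east_only = kpi(orders, region="East")

print("Overall:", overall)
print("Q1 only:", first_quarter)
print("East only:", east_only)
```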

2.7 Summary

● Excel is a powerful tool for data visualization and analysis, with a variety
of chart types and customization options available.

● Basic data analysis in Excel involves organizing and summarizing data
using tools such as filters, sorting, and conditional formatting.

● Common chart types in Excel include bar charts, line charts, pie charts,
scatter plots, and box plots.

● Advanced chart types in Excel include waterfall charts, treemaps, and
heatmaps.

● Customizing charts in Excel involves changing colors, fonts, labels, and
other design elements to create a visually appealing and effective
visualization.

● Excel pivot tables and pivot charts allow for more advanced data
analysis and exploration.

● Dashboards and reports in Excel involve combining data sources, KPIs,
visualizations, filters, navigation, and design elements to create a
comprehensive and visually appealing display of data.

2.8 Self-Assessment Questions
1. What is Excel used for, and why is it a useful tool for data analysis?
2. How can you use Excel to create a bar chart, and what type of data is
best suited for this type of chart?
3. What are some common visualizations in Excel, and when should you
use each type of visualization?
4. How do you create a pivot table in Excel, and what are some benefits of
using pivot tables for data analysis?
5. What are some advanced data visualization techniques in Excel, and
how can you use them to create more complex and informative
visualizations?

2.9 References

● "Excel Charts for Dummies" by Ken Bluttman

● "Effective Data Visualization: The Right Chart for the Right Data" by
Stephanie Evergreen

● "Excel Data Analysis: Modeling and Simulation" by Hector Guerrero


Course: MsDS

Data Analysis and Visualization

Module: 3
Data Visualization with Python
Learning Objective:
After studying this module, students will be able to:
● Understand the basics of Python programming language and data
analysis in Python
● Create basic data visualizations such as bar charts, line charts, pie charts,
scatter plots, and box plots using Python libraries
● Explore and work with popular Python data visualization libraries such
as Matplotlib, Seaborn, and Plotly
● Learn advanced data visualization techniques such as interactive
visualizations and network graphs
Structure
3.1 Introduction to Python
3.2 Basic Data Analysis in Python
3.3 Creating Bar Charts, Line Charts, Pie Charts, Scatter Plots, Box Plots, and
Other Common Visualizations in Python
3.4 Working with Python Data Visualization Libraries - Matplotlib, Seaborn,
Plotly
3.5 Advanced Data Visualization Techniques in Python
3.6 Dashboards and Reports in Python
3.7 Summary
3.8 Self-Assessment Questions
3.9 References

3.1 Introduction to Python


Python is a high-level, interpreted programming language that is used for a
wide range of tasks, including web development, data analysis, and machine
learning. Guido van Rossum began developing it in the late 1980s and first
released it in 1991; it has since become one of the most popular programming
languages in the world.
Some key features of Python include:
1. Easy to learn: Python has a relatively simple syntax that is easy to
read and understand, making it a good language for beginners.
2. Interpreted: Python code is run by an interpreter, rather than being
compiled ahead of time like some other languages. This shortens the
edit-and-test cycle during development.
3. Multi-paradigm: Python supports both procedural programming and
object-oriented programming, as well as functional programming
concepts.
4. Large community and ecosystem: Python has a large and active
community of developers, and there are many libraries and
frameworks available for various tasks, such as data analysis, web
development, and machine learning.
Overall, Python is a versatile language that can be used for many different
tasks. Its ease of use, large community, and rich ecosystem make it a
popular choice for developers and data scientists alike.
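To make the multi-paradigm point concrete, the short sketch below computes the same sum in three styles. The list values and the Accumulator class are made up for illustration and are not part of the course material.

```python
from functools import reduce

values = [3, 1, 4, 1, 5]  # illustrative data

# Procedural style: an explicit loop that updates a running total.
total_procedural = 0
for v in values:
    total_procedural += v


# Object-oriented style: state and behavior are bundled in a class.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value


acc = Accumulator()
for v in values:
    acc.add(v)
total_oop = acc.total

# Functional style: fold the list into one value without reassigning variables.
total_functional = reduce(lambda a, b: a + b, values, 0)

print(total_procedural, total_oop, total_functional)
```

All three approaches produce the same result; which one reads best depends on the size and structure of the program.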

3.2 Basic Data Analysis in Python


Basic data analysis in Python involves using Python's data structures and
libraries such as pandas to perform exploratory data analysis (EDA) on a
dataset. Here are some of the key concepts and techniques involved in
basic data analysis in Python:
1. Importing data: Before you can analyze data in Python, you need to
import it into your Python environment. This can be done using
functions like pandas.read_csv() or pandas.read_excel().
2. Exploring the data: Once you have imported your data, you should
explore it to gain an understanding of its structure, features, and
values. Some basic techniques for exploring data include looking at
summary statistics (e.g. mean, median, standard deviation),
frequency tables, and histograms.
3. Cleaning the data: Data is rarely perfect, and you will likely need to
clean and preprocess it before you can perform any meaningful
analysis. This can involve tasks like removing missing or invalid values,
transforming variables, or combining datasets.
4. Manipulating data: Once your data is cleaned, you can start
manipulating it to answer specific research questions. This might
involve tasks like filtering rows, selecting columns, or creating new
variables.
5. Visualizing data: Data visualization is an important part of
exploratory data analysis, as it can help you identify patterns, trends,
and outliers in your data. Python has several powerful visualization
libraries, such as Matplotlib and Seaborn, that you can use to create
various types of plots and charts.
6. Basic statistical analysis: Python has many libraries for performing
statistical analysis, including NumPy and SciPy. With these libraries,
you can calculate descriptive statistics and correlation coefficients,
run basic statistical tests (e.g. t-tests, ANOVA), and fit simple
regression models.
Overall, basic data analysis in Python involves a combination of data import,
exploration, cleaning, manipulation, visualization, and statistical analysis.
These techniques can be applied to a wide variety of datasets, making
Python a powerful tool for data analysis.
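The import, exploration, cleaning, and manipulation steps above can be sketched with pandas as follows. The CSV content, column names, and values are hypothetical and stand in for a real file; this is one possible workflow under those assumptions, not a prescribed one.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """name,department,salary
Ana,Sales,52000
Ben,Sales,
Cara,IT,61000
Dev,IT,58000
Dev,IT,58000
"""

# Import: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Explore: summary statistics and a frequency table.
summary = df["salary"].describe()
counts = df["department"].value_counts()

# Clean: drop exact duplicate rows, then rows with a missing salary.
clean = df.drop_duplicates().dropna(subset=["salary"])

# Manipulate: create a new variable and filter rows.
clean = clean.assign(salary_k=clean["salary"] / 1000)
it_only = clean[clean["department"] == "IT"]

# Basic statistics: mean salary per department.
mean_by_dept = clean.groupby("department")["salary"].mean()
print(mean_by_dept)
```

In practice the StringIO object would be replaced by the path to a CSV file, and the cleaning rules would depend on what missing or duplicated values mean in the dataset.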

3.3 Creating Bar Charts, Line Charts, Pie Charts, Scatter Plots, Box Plots,
and Other Common Visualizations in Python
Here's an overview of some common visualizations in Python and how to
create them:
1. Bar Charts:
Bar charts are a popular way to visualize categorical data. In Python, bar
charts can be created using the Matplotlib or Seaborn library. Here's an
example:

import matplotlib.pyplot as plt


# Create some data
x = ['A', 'B', 'C', 'D']
y = [10, 24, 12, 20]
# Create a bar chart
plt.bar(x, y)
# Add some labels and a title
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart Example')
# Show the plot
plt.show()

This code creates a bar chart with four categories on the x-axis and their
corresponding values on the y-axis.
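For comparison, the same bar chart can be sketched with Seaborn, which works from a DataFrame and column names. The column names "category" and "value" are assumptions chosen for this illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same data as the Matplotlib example, arranged as a DataFrame.
df = pd.DataFrame({"category": ["A", "B", "C", "D"],
                   "value": [10, 24, 12, 20]})

fig, ax = plt.subplots()
# barplot draws one bar per category on the given Axes.
sns.barplot(data=df, x="category", y="value", ax=ax)
ax.set_xlabel("Categories")
ax.set_ylabel("Values")
ax.set_title("Bar Chart Example (Seaborn)")
```

To view the chart interactively, remove the matplotlib.use line and call plt.show() at the end.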
2. Line Charts:
Line charts are used to show trends or patterns in data over time or
another continuous variable. In Python, line charts can be created using
Matplotlib or Seaborn. Here's an example:
import matplotlib.pyplot as plt
# Create some data
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 6, 14]
# Create a line chart
plt.plot(x, y)
# Add some labels and a title
plt.xlabel('Time')
plt.ylabel('Values')
plt.title('Line Chart Example')
# Show the plot
plt.show()

This code creates a line chart with values on the y-axis plotted against time
on the x-axis.

3. Pie Charts:
Pie charts are a useful way to show the proportion of each category in a
dataset. In Python, pie charts can be created using Matplotlib; Seaborn does
not provide its own pie chart function. Here's an example:
import matplotlib.pyplot as plt
# Create some data
sizes = [30, 20, 15, 10, 25]
labels = ['Category A', 'Category B', 'Category C', 'Category D', 'Category E']
# Create a pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
# Add a title
plt.title('Pie Chart Example')
# Show the plot
plt.show()

This code creates a pie chart with five categories and their corresponding
proportions.
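The percentages that autopct prints are each value divided by the total, times 100. The sketch below reproduces that arithmetic by hand, using hypothetical raw counts that happen to give the same proportions as the example.

```python
# Hypothetical raw counts; they do not need to sum to 100.
sizes = [12, 8, 6, 4, 10]
labels = ['Category A', 'Category B', 'Category C', 'Category D', 'Category E']

total = sum(sizes)
# Each slice's share of the whole, in percent.
percentages = [100 * s / total for s in sizes]

for label, pct in zip(labels, percentages):
    # Same '%1.1f%%' format string that the pie example passes to autopct.
    print('%s: %1.1f%%' % (label, pct))
```

Because the slices are normalized by their sum, passing raw counts or precomputed percentages to plt.pie yields the same chart.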

4. Scatter Plots:
Scatter plots are used to show the relationship between two continuous
variables. In Python, scatter plots can be created using Matplotlib or
Seaborn. Here's an example:
import matplotlib.pyplot as plt
# Create some data
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 6, 14]
# Create a scatter plot
plt.scatter(x, y)
# Add some labels and a title
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.title('Scatter Plot Example')
# Show the plot
plt.show()

This code creates a scatter plot with one variable on the x-axis and another
on the y-axis.
5. Box Plots:
Box plots are used to show the distribution of a dataset and any outliers. In
Python, box plots can be created using Matplotlib or Seaborn. Here's an
example:
import matplotlib.pyplot as plt
# Create some data
data = [10, 8, 12, 6, 14]
# Create a box plot
plt.boxplot(data)
# Add a title
plt.title('Box Plot Example')
# Show the plot
plt.show()

This code creates a box plot with the data values shown as a box with
whiskers indicating the range of the data and any outliers as dots outside of
the whiskers.
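The numbers behind a box plot can be computed directly. The sketch below assumes the common 1.5 x IQR rule for whiskers, which is Matplotlib's default setting, and uses the same five data values as the example.

```python
import numpy as np

data = np.array([10, 8, 12, 6, 14])

# The box spans the first to third quartile; the line inside is the median.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

# Whiskers reach the most extreme points that are still inside the fences.
whisker_low = data[data >= lower_fence].min()
whisker_high = data[data <= upper_fence].max()

print(q1, median, q3, iqr, whisker_low, whisker_high, outliers)
```

For this small dataset no value falls outside the fences, so the box plot above shows no outlier dots and its whiskers span the full range of the data.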

6. Other Common Visualizations:


There are many other common visualizations that can be created using
Python. Some examples include:
⮚ Histograms: used to show the distribution of a continuous variable
Here's an example of how to create a histogram in Python using the
Matplotlib library:
import matplotlib.pyplot as plt
import numpy as np
# Create some data
data = np.random.normal(0, 1, 1000)
# Create a histogram
plt.hist(data, bins=30, density=True, alpha=0.5, color='blue')
# Add some labels and a title
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Histogram Example')
# Show the plot
plt.show()
This code creates a histogram that shows the distribution of a continuous
variable (in this case, data generated from a normal distribution). The
Matplotlib hist function creates a histogram with the specified number of
bins, and the density parameter is set to True to normalize the histogram
so that it represents the probability density rather than the frequency of
each bin. The alpha parameter controls the transparency of the bars, and
the color parameter sets the color of the bars.
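To see what the density parameter does, the following sketch computes the histogram with NumPy and checks the normalization by hand. The fixed random seed is an assumption added here so the numbers are reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility
data = rng.normal(0, 1, 1000)

# Raw counts per bin and the bin edges.
counts, edges = np.histogram(data, bins=30)
# Density-normalized heights for the same bins.
density, _ = np.histogram(data, bins=30, density=True)

widths = np.diff(edges)
# density = count / (total count * bin width)
manual_density = counts / (counts.sum() * widths)

# The bar areas of a density histogram add up to 1.
area = (density * widths).sum()
print(area)
```

With density=True the bar heights are therefore not probabilities themselves; each bar's area (height times width) is the fraction of observations in that bin.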

⮚ Heatmaps: used to show values across the combinations of two categorical
variables
Here's an example of how to create a heatmap in Python using the
Seaborn library:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Create some data
data = pd.DataFrame({'Category 1': ['A', 'B', 'C', 'D', 'E'],
'Category 2': ['F', 'G', 'H', 'I', 'J'],
'Values': [10, 20, 30, 40, 50]})
# Reshape the data into a pivot table
data_pivot = data.pivot(index='Category 1', columns='Category 2',
values='Values')
# Create a heatmap
sns.heatmap(data_pivot)
# Add some labels and a title
plt.xlabel('Category 2')
plt.ylabel('Category 1')
plt.title('Heatmap Example')
# Show the plot
plt.show()

This code creates a heatmap that shows the values for combinations of two
categorical variables (Category 1 and Category 2). The Seaborn heatmap
function automatically creates a color-coded visualization of the data; the
color bar beside the plot shows which colors correspond to higher and lower
values, and the cmap parameter selects a different color scale. The pivot
function is used to reshape the data into a format that is compatible with
the heatmap function.
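In the sample data above, each Category 1 value occurs with only one Category 2 value, so the pivoted table is empty (NaN) everywhere except along one diagonal. A heatmap is more informative when every combination has a value. The sketch below shows such a table using made-up values for illustration.

```python
import pandas as pd

# Hypothetical data with a value for every combination of the two categories.
data = pd.DataFrame({
    'Category 1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Category 2': ['F', 'G', 'H', 'F', 'G', 'H'],
    'Values': [10, 20, 30, 40, 50, 60],
})

# Rows become Category 1, columns become Category 2, cells hold the values.
data_pivot = data.pivot(index='Category 1', columns='Category 2',
                        values='Values')
print(data_pivot)
```

Passing this data_pivot to sns.heatmap would fill all six cells. Adding annot=True prints each value inside its cell.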

⮚ Area charts: used to show the trend of a continuous variable over time
or another continuous variable
Here's an example of how to create an area chart in Python using the
Matplotlib library:
import matplotlib.pyplot as plt
import numpy as np
# Create some data
t = np.arange(0.0, 2.0, 0.01)
s1 = np.sin(2*np.pi*t)
s2 = np.exp(-t)
s3 = np.sin(4*np.pi*t)
# Create an area chart
plt.fill_between(t, 0, s1, alpha=0.3, label='sine')
plt.fill_between(t, s1, s1+s2, alpha=0.3, label='exponential')
plt.fill_between(t, s1+s2, s1+s2+s3, alpha=0.3, label='sine 4x')
plt.legend()
# Add some labels and a title
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Area Chart Example')
# Show the plot
plt.show()

This code creates an area chart that shows the trend of a continuous
variable (in this case, sine, exponential, and sine 4x functions) over time.
The Matplotlib fill_between function is used to create the area chart, with
the alpha parameter controlling the transparency of the fill. The label
parameter is used to specify the label for each area chart, and the legend
function is used to display the legend.
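Matplotlib also offers a stackplot function that stacks the series for you instead of computing the running sums by hand. In the sketch below the two sine series are shifted up by 1 so that all values are non-negative. That shift is an assumption made here because stacked areas are easiest to read for non-negative data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 2.0, 0.01)
s1 = 1 + np.sin(2 * np.pi * t)  # shifted up so values stay non-negative
s2 = np.exp(-t)
s3 = 1 + np.sin(4 * np.pi * t)  # shifted up so values stay non-negative

fig, ax = plt.subplots()
# stackplot draws each series on top of the cumulative sum of the previous ones.
layers = ax.stackplot(t, s1, s2, s3,
                      labels=['sine', 'exponential', 'sine 4x'], alpha=0.3)
ax.legend()
ax.set_xlabel('Time (s)')
ax.set_ylabel('Amplitude')
ax.set_title('Stacked Area Chart Example')
```

The top edge of the stack is the sum of all three series, so this form suits quantities that add up to a meaningful total.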

These visualizations can be created using Matplotlib or Seaborn, as well as
other Python visualization libraries like Plotly.
In summary, Python provides a wide range of tools for creating
visualizations for different types of data. Whether you want to show trends,
compare categories, or visualize the distribution of data, there's a
visualization technique that can help.
3.4 Working with Python Data Visualization Libraries - Matplotlib,
Seaborn, Plotly
Matplotlib, Seaborn, and Plotly are popular Python data visualization
libraries that can be used to create a variety of charts, plots, and
visualizations. Here's an overview of each library and its features:
1. Matplotlib:
● Matplotlib is a widely used data visualization library for Python
that provides a variety of chart types, including line charts,
scatter plots, bar charts, histograms, and more.
● It provides a high level of customization and control over plot
elements such as colors, labels, titles, legends, and axes.
● Matplotlib can be used to create static and interactive
visualizations, and supports a variety of file formats such as
PNG, PDF, and SVG.
● While Matplotlib can be a bit verbose and require a lot of code
to create complex visualizations, it provides a lot of flexibility
and is often used as a building block for other data
visualization libraries.
2. Seaborn:
● Seaborn is a Python data visualization library based on
Matplotlib that provides a higher-level interface for creating
attractive and informative statistical graphics.
● Seaborn provides a variety of chart types such as scatter plots,
line plots, bar plots, and heat maps, and includes built-in
support for statistical visualizations such as regression plots,
distribution plots, and box plots.
● Seaborn also includes features for color palettes, style themes,
and grid layouts that make it easy to create professional-
looking visualizations with minimal code.
● While Seaborn may not provide as much flexibility as
Matplotlib, it is often easier to use and can create attractive
visualizations quickly.
3. Plotly:
● Plotly is a Python data visualization library that provides
interactive, web-based visualizations for data exploration and
analysis.
● Plotly provides a variety of chart types such as scatter plots,
line charts, bar charts, and 3D charts, and includes built-in
support for animations, sliders, and hover effects that make it
easy to explore complex data.
● Plotly can be used to create static and interactive
visualizations, and supports a variety of file formats such as
PNG, PDF, and HTML.
● While Plotly can be a bit more difficult to use than Matplotlib
or Seaborn, it provides a lot of flexibility and is well-suited for
creating interactive visualizations for web applications and
dashboards.
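As an illustration of the file formats mentioned for Matplotlib, the sketch below renders one figure to PNG, PDF, and SVG. It writes to in-memory buffers rather than to disk, which is a choice made for this example. Passing a file name to savefig works the same way.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [10, 8, 12, 6, 14])
ax.set_title('Line Chart Example')

# Render the same figure in three formats.
outputs = {}
for fmt in ('png', 'pdf', 'svg'):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)
    outputs[fmt] = buffer.getvalue()

print({fmt: len(content) for fmt, content in outputs.items()})
```

PNG is a raster format suited to web pages and slides, while PDF and SVG are vector formats that stay sharp at any zoom level.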
3.5 Advanced Data Visualization Techniques in Python
Advanced data visualization techniques in Python can help you to
communicate insights and patterns in your data more effectively. Here are
some advanced techniques and examples of how to use them in Python:

1. Multi-Panel Visualizations: Multi-panel visualizations are useful when
you want to compare different visualizations side by side. In Python, you
can create multi-panel visualizations using subplots in Matplotlib or
FacetGrid in Seaborn. For example, you could create a 2x2 grid of
scatter plots to compare relationships between different variables.

Here's example code for creating a 2x2 grid of scatter plots using
Matplotlib:
import numpy as np
import matplotlib.pyplot as plt
# Create some sample data
x = np.random.normal(0, 1, size=100)
y1 = x + np.random.normal(0, 0.5, size=100)
y2 = -x + np.random.normal(0, 0.5, size=100)
y3 = np.random.normal(0, 1, size=100)
y4 = np.random.normal(0, 0.5, size=100)
# Create a figure with a 2x2 grid of subplots
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))
# Plot the data in each subplot
axs[0, 0].scatter(x, y1)
axs[0, 0].set_title('Plot 1')
axs[0, 1].scatter(x, y2)
axs[0, 1].set_title('Plot 2')
axs[1, 0].scatter(x, y3)
axs[1, 0].set_title('Plot 3')
axs[1, 1].scatter(x, y4)
axs[1, 1].set_title('Plot 4')
# Add a title for the entire figure
fig.suptitle('Comparison of Different Plots')
# Display the plot
plt.show()

In this example, we create a 2x2 grid of subplots using the subplots
function in Matplotlib. We then plot some sample data in each subplot
using the scatter function. Finally, we add a title for the entire figure using
the suptitle function and display the plot using the show function.
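A comparable grid can be sketched with Seaborn's FacetGrid, which expects the data in long format: one row per point, plus a column that names the panel. The column names "x", "y", and "panel" and the fixed random seed are assumptions made for this illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # seeded generator for reproducibility
x = rng.normal(0, 1, size=100)
panels = {
    'Plot 1': x + rng.normal(0, 0.5, size=100),
    'Plot 2': -x + rng.normal(0, 0.5, size=100),
    'Plot 3': rng.normal(0, 1, size=100),
    'Plot 4': rng.normal(0, 0.5, size=100),
}

# Long format: one row per point, with a column naming its panel.
df = pd.concat(
    [pd.DataFrame({'x': x, 'y': y, 'panel': name})
     for name, y in panels.items()],
    ignore_index=True,
)

# One panel per distinct value of 'panel', wrapped into two columns.
grid = sns.FacetGrid(df, col='panel', col_wrap=2, height=3)
grid.map_dataframe(sns.scatterplot, x='x', y='y')
grid.set_titles(col_template='{col_name}')
```

FacetGrid shares the axes across panels by default, which makes the panels directly comparable. The subplots approach gives finer control over each individual panel.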

2. Interactive Visualizations: Interactive visualizations allow the user to
explore the data in real-time and can be more engaging than static
visualizations. Plotly is a Python library that allows you to create
interactive visualizations such as interactive maps, sliders, and hover
effects. For example, you could create an interactive scatter plot with
tooltips that display additional information about each data point when
the user hovers over it.
Here's example code for an interactive scatter plot using Plotly:
import plotly.graph_objects as go
import pandas as pd
# create some sample data
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [2, 1, 3, 2, 4],
'text': ['A', 'B', 'C', 'D', 'E'],
'color': ['red', 'blue', 'green', 'yellow', 'purple']
})
# create the scatter plot
fig = go.Figure(data=go.Scatter(
x=df['x'],
y=df['y'],
mode='markers',
marker=dict(
color=df['color'],
size=10
),
text=df['text']
))
# add tooltips to display additional information
fig.update_traces(hovertemplate='Text: %{text}<br>X: %{x}<br>Y: %{y}<br>'
'Color: %{marker.color}')
# add axis labels and title
fig.update_layout(
xaxis_title='X Axis',
yaxis_title='Y Axis',
title='Interactive Scatter Plot with Tooltips'
)
# display the plot
fig.show()

This code creates a scatter plot with five data points, each with an x and y
coordinate, a label (text), and a color. The hovertemplate parameter is
used to define what information is displayed in the tooltip when the user
hovers over a data point. Finally, the update_layout method is used to add
axis labels and a title to the plot, and the show method is used to display
the plot.

3. Network Visualizations: Network visualizations allow you to visualize
relationships between entities in a network, such as social networks or
transportation networks. Python has several libraries that allow you to
create network visualizations, including NetworkX and Plotly. For
example, you could create a network visualization of a social network to
visualize connections between different people.

Here's an example of creating a network visualization using
NetworkX in Python:
import networkx as nx
import matplotlib.pyplot as plt
# Create graph
G = nx.Graph()
G.add_edge('Alice', 'Bob')
G.add_edge('Bob', 'Charlie')
G.add_edge('Charlie', 'David')
G.add_edge('David', 'Eve')
G.add_edge('Eve', 'Frank')
# Draw graph
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos, with_labels=True)
plt.show()

In this example, we first create a Graph object using NetworkX and add
edges between nodes representing people in a social network. We then
use the spring_layout function to compute the positions of the nodes and
use the draw_networkx function to draw the graph with labels. Finally, we
display the graph using plt.show().
This will create a simple network visualization where each node represents
a person and the edges represent connections between them. You can
customize the appearance of the nodes and edges using various
parameters in NetworkX.
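As a sketch of that customization, the example below sizes each node by its number of connections and styles the edges. The scaling factor, colors, and layout seed are arbitrary choices made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ('Alice', 'Bob'), ('Bob', 'Charlie'), ('Charlie', 'David'),
    ('David', 'Eve'), ('Eve', 'Frank'),
])

# Degree = number of connections per person.
degrees = dict(G.degree())
# Larger nodes for better-connected people (300 is an arbitrary scale).
node_sizes = [300 * degrees[node] for node in G.nodes()]

# A fixed seed makes the layout reproducible between runs.
pos = nx.spring_layout(G, seed=42)

fig, ax = plt.subplots()
nx.draw_networkx(G, pos, ax=ax, with_labels=True,
                 node_size=node_sizes, node_color='lightblue',
                 edge_color='gray', width=2)
ax.set_axis_off()
```

In this chain-shaped network the two people at the ends have one connection each and everyone else has two, so the end nodes are drawn smaller.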

4. 3D Visualizations: 3D visualizations allow you to visualize data in three
dimensions and can be useful for exploring complex relationships and
patterns. Python has several libraries that allow you to create 3D
visualizations, including Matplotlib, Plotly, and Mayavi. For example,
you could create a 3D scatter plot to visualize the relationship between
three different variables.

Here's an example of creating a 3D scatter plot using Matplotlib in Python:
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
x = np.random.normal(size=500)
y = np.random.normal(size=500)
z = x + y + np.random.normal(size=500)
# Create 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
In this example, we first generate some random data for the variables x, y,
and z. We then create a 3D scatter plot using the scatter method of the
Axes3D object in Matplotlib. We set the c parameter to z to color the
points based on the value of z, and we use the 'viridis' colormap to map
values to colors. Finally, we set the labels for the x, y, and z axes and
display the plot using plt.show().
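A static 3D plot shows only one viewing angle, so it often helps to set the camera explicitly and to add a color bar that explains the color mapping. The sketch below does both. The angles and the random seed are arbitrary values chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility
x = rng.normal(size=500)
y = rng.normal(size=500)
z = x + y + rng.normal(size=500)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
points = ax.scatter(x, y, z, c=z, cmap='viridis')

# Color bar: shows which colors correspond to which z values.
fig.colorbar(points, ax=ax, label='Z')

# Camera position: elevation above the x-y plane and rotation around z.
ax.view_init(elev=20, azim=45)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
```

Trying a few elev and azim values before saving the figure can reveal structure that is hidden from the default angle.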

Overall, these advanced data visualization techniques in Python can help
you to create more effective and engaging visualizations that allow you to
explore and communicate patterns and insights in your data. The choice of
technique will depend on the specific data and the insights you want to
communicate.

3.6 Dashboards and Reports in Python


Dashboards and reports are essential tools for data analysis and
visualization in various fields, including business, healthcare, and finance.
They provide a way to present complex data in an organized and easily
understandable manner. Python offers several libraries for creating
dashboards and reports, including Plotly Dash, Streamlit, and Flask.
Here are some key points and concepts related to creating dashboards and
reports in Python:
1. Dashboard Layout: The layout is one of the critical components of a
dashboard. It defines how the data will be displayed and organized
on the screen. A typical dashboard may contain several widgets or
components, including charts, tables, and dropdown menus.
2. Interactivity: Dashboards should be interactive and responsive,
allowing users to explore the data and change parameters
dynamically. Python libraries such as Plotly Dash and Streamlit
provide easy-to-use tools for creating interactive dashboards.
3. Data Sources: Dashboards typically rely on data from various sources,
including databases, APIs, and flat files. Python offers several
libraries for working with data, such as Pandas for data manipulation
and analysis, and SQLAlchemy for database access.
4. Data Visualization: Data visualization is an essential component of a
dashboard or report. Python libraries such as Matplotlib, Seaborn,
Plotly, and Bokeh provide powerful tools for creating interactive and
informative visualizations.
5. Deployment: Once the dashboard is created, it needs to be deployed
to a web server or cloud platform so that it can be accessed by users.
Web frameworks such as Flask and Django can serve dashboards and web
applications, and cloud platforms such as Heroku can host them.

3.7 Summary

● Python provides powerful tools for data visualization, allowing users to
explore and communicate data insights effectively.

● Basic data analysis in Python involves data manipulation, cleaning, and
transformation, as well as descriptive statistics to summarize the data.

● Python offers a range of visualization techniques, including bar charts,
line charts, pie charts, scatter plots, and box plots, which can be created
using libraries such as Matplotlib, Seaborn, and Plotly.

● Matplotlib is a popular Python library that offers fine-grained control
over plots, while Seaborn builds on it with a higher-level interface for
attractive statistical graphics.

● Plotly is a library that allows for the creation of interactive visualizations,
such as maps and sliders.

● Advanced visualization techniques in Python include 3D plotting,
network visualization, and geospatial visualization using libraries like
Plotly, NetworkX, and GeoPandas.

● Dashboards and reports in Python allow users to create interactive data
visualizations that can be easily shared with others. Dash is a popular
library for building interactive dashboards.

3.8 Self-Assessment Questions
1. What are some common types of data visualizations and what are they
used for?
2. What are some advantages of using Python for data visualization?
3. How can you create a scatter plot in Python using Matplotlib or Seaborn?
4. Describe the process of creating a dashboard or report in Python. What
are some key elements that should be included?

5. What is an interactive visualization and how can you create one in
Python using Plotly?
6. What are some advanced data visualization techniques you can use in
Python, such as 3D plots or animation?
7. How can you use Matplotlib, Seaborn, and Plotly to create visualizations
in Python? What are some advantages and disadvantages of each library?
3.9 References

● "Python Data Science Handbook: Essential Tools for Working with Data"
by Jake VanderPlas

● "Data Visualization with Python and JavaScript: Scrape, Clean, Explore &
Transform Your Data" by Kyran Dale

● "Python Plotting with Matplotlib" by Ben Root


Module: 4
Data Visualization with Tableau and Power BI

Learning Objective:
After studying this module, students will be able to:
● Understand the importance of data visualization and its role in gaining
insights from complex data sets.
● Learn how to connect to different data sources in Tableau and Power BI.
● Develop skills to prepare and clean data to make it suitable for
visualization.
● Create basic and advanced visualizations using various charts and
graphs such as bar charts, line charts, scatter plots, heat maps, and
geographic maps.
● Customize visualizations by formatting and labeling them effectively.
● Use parameters and filters to enhance visualizations and make them
more interactive.
● Develop skills to create calculated fields and measures in Tableau and
Power BI.
● Understand how to build interactive dashboards and reports in Tableau
and Power BI.
● Learn how to share and publish visualizations with others.
● Gain a comprehensive understanding of the capabilities and limitations
of Tableau and Power BI for data visualization.

Structure
4.1 Introduction
4.2 Connecting to Data Sources
4.3 Creating Basic Visualizations
4.4 Working with Calculations and Functions
4.5 Advanced Visualization Techniques
4.6 Dashboards and Reports
4.7 Summary
4.8 Self-Assessment Questions
4.9 References

4.1 Introduction
Data visualization is the graphical representation of data and information.
It helps to make data more accessible, understandable, and actionable. The
use of data visualization tools such as Tableau and Power BI has become
increasingly popular in recent years. These tools help organizations to
analyze, interpret and present data in a visually appealing way.
Data visualization and its importance:
Data visualization is essential because it helps to identify patterns,
relationships, and trends in data that may not be apparent in a raw form.
Visualization of data makes it easier for users to comprehend complex data,
leading to better decision-making.
Data visualization tools allow users to create interactive dashboards and
reports, providing real-time insights into the data. They enable
organizations to quickly identify areas of improvement, track key
performance indicators, and make data-driven decisions.

Overview of Tableau and Power BI:


Tableau and Power BI are two of the most popular data visualization tools
available in the market. Both of these tools offer powerful data
visualization capabilities that help organizations to gain insights into their
data quickly.
⮚ Tableau is a data visualization and business intelligence tool that
allows users to connect to various data sources, create interactive
dashboards, and publish them on Tableau Server or Tableau Online.
Tableau provides an intuitive drag-and-drop interface that allows
users to create complex visualizations quickly.
⮚ Power BI is a business analytics service provided by Microsoft that
provides users with interactive visualizations and business
intelligence capabilities. It allows users to connect to various data
sources, including cloud-based and on-premise data, to create
reports and dashboards. Power BI also provides natural language
query capabilities, making it easier for users to ask questions about
their data.
Both Tableau and Power BI are highly flexible and provide extensive
customization options for creating visualizations. They offer a wide range of
charts, graphs, and other visualization types that can be used to display
data in an easily digestible manner.

4.2 Connecting to Data Sources


Both Tableau and Power BI offer a wide range of data connectors that
allow users to connect to various data sources quickly. These connectors
enable users to connect to various data sources, including spreadsheets,
databases, cloud-based data sources, and web-based data sources.
⮚ Tableau supports over 65 different data connectors, including
Microsoft Excel, SQL Server, Oracle, Amazon Redshift, Google
Analytics, and many more. Users can also connect to web-based data
sources, such as JSON and XML files, through the use of web
connectors.
⮚ Power BI also supports a wide range of data connectors, including
Microsoft Excel, SQL Server, Oracle, Amazon Redshift, and many
more. Additionally, Power BI provides users with the ability to
connect to cloud-based data sources, such as Microsoft Dynamics
365, Salesforce, and Google Analytics.

Data Preparation and Cleaning:


Before creating visualizations, it is essential to prepare and clean the data.
Data preparation involves the process of transforming raw data into a
structured format that can be analyzed and visualized. Data cleaning is the
process of identifying and correcting errors in the data, such as missing
values, duplicate records, or inconsistent data.

Both Tableau and Power BI offer data preparation and cleaning tools that
help users to perform these tasks quickly and efficiently.
⮚ Tableau's data preparation tool, Tableau Prep, provides a drag-and-
drop interface that allows users to combine, clean, and shape their
data before analysis. Tableau Prep allows users to perform a range of
data cleaning operations, such as removing duplicates, filtering data,
and filling in missing values. Tableau Prep also provides users with
the ability to automate data preparation workflows, saving time and
reducing errors.
⮚ Power BI's data preparation tool, Power Query, provides users with a
similar drag-and-drop interface that allows them to transform and
clean their data. Power Query also allows users to perform a range of
data cleaning operations, such as removing duplicates, splitting
columns, and pivoting data. Power Query provides users with the
ability to automate data preparation workflows and create reusable
data transformation scripts.
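For readers coming from the Python module, the cleaning operations named above (removing duplicates, filling in missing values, splitting columns, and pivoting) can be sketched in pandas. This only illustrates the operations themselves and does not show how Tableau Prep or Power Query implement them. The table and column names are made up.

```python
import pandas as pd

# Hypothetical raw data with a duplicate row and a missing value.
raw = pd.DataFrame({
    'customer': ['Ana Lopez', 'Ana Lopez', 'Ben Kim', 'Cara Diaz'],
    'quarter': ['Q1', 'Q1', 'Q1', 'Q2'],
    'sales': [100.0, 100.0, None, 250.0],
})

# Remove duplicates: keep one copy of each identical row.
deduped = raw.drop_duplicates()

# Fill in missing values: treat a missing sales figure as 0 here.
filled = deduped.fillna({'sales': 0.0})

# Split a column: separate the full name into first and last name.
names = filled['customer'].str.split(' ', n=1, expand=True)
filled = filled.assign(first_name=names[0], last_name=names[1])

# Pivot: one row per customer, one column per quarter.
wide = filled.pivot(index='customer', columns='quarter', values='sales')
print(wide)
```

Whether a missing value should become 0, be estimated, or be dropped depends on the data. The same decision has to be made in any tool.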
4.3 Creating Basic Visualizations
1. Bar Charts:
Bar charts are used to display categorical data with rectangular bars, where
the height or length of each bar represents the value of the data. In
Tableau and Power BI, creating a bar chart is simple:
1. Drag the categorical variable you want to display on the x-axis, and
the quantitative variable you want to display on the y-axis.
2. Choose the bar chart type from the visualization options.
3. Format the chart by adding labels, titles, and colors.
For example, to create a bar chart in Tableau :
1. Launch Tableau and connect to your data source. You can connect to
a variety of data sources, including Excel files, CSV files, SQL
databases, and more.
2. Drag and drop your data source onto the "Drag a Table Here" section
of the workspace.
3. Once your data is loaded, select a worksheet tab (for example,
"Sheet 1") at the bottom of the screen.
4. From the "Marks" card on the left side of the screen, select "Bar" as
the chart type.
5. Drag the dimension you want to use for the X-axis (usually a
categorical variable) onto the "Columns" shelf.
6. Drag the measure you want to use for the Y-axis (usually a numerical
variable) onto the "Rows" shelf.
7. You can customize your chart by adding filters, sorting, formatting,
and labels using the options in the "Marks" card and other sections
of the interface.
8. You can also add additional elements to your chart, such as a legend
or a title, by selecting the "Worksheet" menu and choosing "Show
Cards" or "Show Title."
9. Finally, you can save your chart by selecting "File" > "Save" or by
publishing it to Tableau Server or Tableau Public.

2. Line Charts:
Line charts are used to display trends over time, where data is represented
by a series of points connected by lines. In Tableau and Power BI, creating a
line chart is similar to creating a bar chart:
1. Drag the time variable you want to display on the x-axis, and the
quantitative variable you want to display on the y-axis.
2. Choose the line chart type from the visualization options.
3. Format the chart by adding labels, titles, and colors.
For example, to create a line chart in Power BI that displays the monthly
sales of a product:
1. Open Power BI Desktop and click on "Get Data" on the Home tab.
2. Select the data source you want to use and click on "Connect."
3. Once your data is loaded, go to the "Fields" pane and select the
columns you want to use in your chart (in this case, the date and
the sales columns).
4. Drag the date column to the "Axis" field well and the sales column
to the "Values" field well.
5. If Power BI creates a column chart by default, change it to a line
chart by clicking on the "Line chart" button on the Visualizations pane.
6. You can customize your chart further by adding titles, changing
colors, and modifying the axes.

3. Pie Charts:
Pie charts are used to display the proportion of data in different categories
as slices of a pie. In Tableau and Power BI, creating a pie chart is simple:
1. Drag the categorical variable you want to display onto the "Color" or
"Label" shelf.
2. Choose the pie chart type from the visualization options.
3. Format the chart by adding labels, titles, and colors.
For example, to create a Pie chart in Tableau:
1. Connect to your data source.
2. Drag and drop the dimension and measure you want to use for your
chart.
3. Click on "Show Me" and select the pie chart option.
4. Customize your chart using the "Marks" card.
5. Save your chart by clicking on "File" and selecting "Save" or "Save As".
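Each slice of a pie chart is sized by its category's share of the total. As a minimal sketch of that calculation, assuming made-up category totals and a hypothetical helper named `slice_percentages`:

```python
# Hypothetical sales totals per product category.
category_sales = {"Furniture": 300.0, "Technology": 500.0, "Office Supplies": 200.0}

def slice_percentages(totals):
    """Return each category's share of the grand total, in percent."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("cannot compute shares when the total is zero")
    return {name: 100.0 * value / grand_total for name, value in totals.items()}

shares = slice_percentages(category_sales)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Because the shares always add up to 100%, a pie chart is only appropriate when the categories together make up a meaningful whole.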

4. Scatter Plots:
Scatter plots are used to display the relationship between two variables,
where each data point is represented by a dot on a two-dimensional plane.
In Tableau and Power BI, creating a scatter plot is simple:
1. Drag the two quantitative variables you want to display onto the
x-axis and y-axis.
2. Choose the scatter plot type from the visualization options.
3. Format the chart by adding labels, titles, and colors.
For example, to create a scatter plot in Power BI that displays the
relationship between a product's price and sales:
1. Open Power BI Desktop and load your data into the data model.
2. Go to the "Visualizations" pane on the right-hand side of the screen.
3. Click on the "Scatter Chart" icon to create a new scatter plot.
4. Drag the field that contains the product's price to the "X Axis"
field well of the "Visualizations" pane.
5. Drag the field that contains the product's sales to the "Y Axis"
field well of the "Visualizations" pane.
6. You may also want to add a field to the "Legend" field well to
display different categories of products in different colors or
shapes on the scatter plot.
7. Customize the appearance of the scatter plot by adjusting the size,
color, and shape of the data points, adding a title and axis labels, and
changing the background color, font, and theme of the visualization.
8. You can also add additional features to the scatter plot, such as trend
lines, data labels, tooltips, and filters, to provide more insights and
interactivity for your users.
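A trend line on a scatter plot is commonly a straight line fitted by ordinary least squares. The sketch below shows that calculation in plain Python. The price and sales figures are invented, `fit_trend_line` is a hypothetical helper, and the tools themselves may offer other trend models as well.

```python
def fit_trend_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical products: unit price and units sold.
prices = [10.0, 20.0, 30.0, 40.0]
units_sold = [95.0, 80.0, 65.0, 50.0]

slope, intercept = fit_trend_line(prices, units_sold)
print(f"units_sold = {slope:.2f} * price + {intercept:.2f}")
```

A negative slope, as in this made-up data, would suggest that higher prices go together with lower sales. The line describes an association only and does not establish a cause.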

5. Box Plots:
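Box plots are used to display the distribution of data, where the box
represents the middle 50% of the data (the interquartile range), the
whiskers extend to the most extreme values that are not treated as
outliers (commonly those within 1.5 times the interquartile range of the
box), and any outliers are displayed as individual points. Tableau offers
a box-and-whisker plot among its built-in chart types, while Power BI
typically requires a custom visual. The general steps are: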
1. Drag the quantitative variable you want to display onto the rows or
columns shelf.
2. Choose the box plot type from the visualization options.
3. Format the chart by adding labels, titles, and colors.
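The numbers behind a box plot can be computed directly. The sketch below uses Python's `statistics.quantiles` and the common 1.5 × IQR convention for whiskers and outliers. The delivery times are invented, `box_plot_summary` is a hypothetical helper, and quartile conventions differ slightly between tools, so the values a given tool draws may not match exactly.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

# Hypothetical delivery times in hours; one value is unusually large.
delivery_hours = [12, 15, 17, 19, 20, 22, 24, 25, 60]
summary = box_plot_summary(delivery_hours)
for name, value in summary.items():
    print(f"{name}: {value}")
```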

● Other Common Visualizations:


In addition to the above, Tableau and Power BI offer many other types of
visualizations, such as heat maps, area charts, tree maps, and more. The
steps to create these visualizations are similar to those described above,
and may require additional customization depending on the specific needs
of the user.

⮚ Customizing Visualizations:
In Tableau and Power BI, users can customize their visualizations in various
ways, such as:
1. Adding and formatting labels, titles, and legends.
2. Changing the color scheme and style of the chart.
3. Adding additional data or creating calculated fields.
4. Adjusting the axes and scales of the chart.
For example, in Tableau, users can add a title to a chart by selecting the
"Worksheet" menu, then "Show Title", and double-clicking the title to
enter the desired text. They can also format the legend by right-clicking
it, choosing "Format Legends", and changing the font and other formatting.
Similarly, in Power BI, users can add a title to a chart by selecting the
chart, opening the formatting options in the "Visualizations" pane, then
choosing "Title" and entering the desired text. They can also format the
legend by selecting the chart, then choosing "Legend" from the formatting
options, and changing the font, size, and position of the legend.

4.4 Working with Calculations and Functions


Calculated fields and measures are essential components of data analysis in
Tableau and Power BI. Calculated fields allow users to create new fields
based on existing fields, while measures allow users to perform calculations
on data values. Basic functions and operators can also be used to
manipulate data and create more complex calculations.

Creating Calculated Fields:


To create a calculated field in Tableau, the user should begin by
right-clicking on the data pane and selecting the "Create Calculated Field" option.
Next, they can provide a name for the field and write a formula using the
available fields and operators. Once the formula is written, the user can
click on "OK" to save the calculated field.
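As an illustration, one can create a calculated field that computes the profit
margin by dividing the profit by the sales:
[Profit] / [Sales]
Written this way, the ratio is evaluated for each row. To compute the margin
on aggregated totals instead, use SUM([Profit]) / SUM([Sales]).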
To create a calculated field in Power BI, users can follow these steps:
1. Select the "Modeling" tab from the ribbon
2. Click "New Column"
3. In the formula bar, enter a name for the column followed by a formula
using fields and operators
4. Press Enter or click the check mark in the formula bar to save the
calculated column
For example, users can create a calculated field that calculates the profit
margin by dividing the profit by the sales:
Profit Margin = DIVIDE([Profit], [Sales])
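DAX's DIVIDE function returns a blank (or an alternate result, if one is supplied) instead of an error when the denominator is zero. The following Python sketch mimics that behavior and contrasts a row-level margin with a margin computed on totals. The order rows and the helper `safe_divide` are assumptions made for illustration.

```python
def safe_divide(numerator, denominator, alternate=None):
    """Divide, returning `alternate` instead of raising when the denominator is zero."""
    if denominator == 0:
        return alternate
    return numerator / denominator

# Hypothetical order rows with Profit and Sales fields.
orders = [
    {"Profit": 20.0, "Sales": 100.0},
    {"Profit": 30.0, "Sales": 300.0},
    {"Profit": 0.0, "Sales": 0.0},  # a row with no sales
]

# Row-level margin: one value per row, like a calculated column.
row_margins = [safe_divide(o["Profit"], o["Sales"]) for o in orders]

# Aggregate margin: total profit divided by total sales.
total_margin = safe_divide(
    sum(o["Profit"] for o in orders),
    sum(o["Sales"] for o in orders),
)

print(row_margins)
print(total_margin)
```

The two approaches answer different questions: averaging the row-level margins weights every order equally, while the aggregate margin weights orders by their sales.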

Using Basic Functions and Operators:


Tableau and Power BI provide a range of basic functions and operators that
can be used to manipulate data in calculated fields and measures. Some of
the most commonly used functions and operators include:
1. Mathematical functions: Aggregate functions such as SUM, AVG (named
AVERAGE in Power BI), MIN, and MAX can be used to perform
calculations on data values.
2. String functions: String functions such as LEFT, RIGHT, and MID can
be used to extract or manipulate text data.
3. Date functions: Date functions such as YEAR, MONTH, and DAY can
be used to extract or manipulate date data.
4. Logical operators: Logical operators such as AND, OR, and NOT can
be used to create conditional statements in calculated fields.

For example, users can create a calculated field that calculates the total
profit for each category in Tableau using the SUM function:
SUM([Profit])
Users can also create a calculated field that extracts the year from a date
field in Power BI using the YEAR function:
Year = YEAR([Date])
Overall, calculated fields and basic functions and operators are powerful
tools for data analysis in Tableau and Power BI, allowing users to create
custom calculations and manipulate data in a variety of ways.
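For readers who know Python, the sketch below shows rough analogues of a few of these functions on a small, invented data set. The helpers `left` and `mid` are hypothetical and only imitate the 1-based behavior of LEFT and MID. They are not part of either tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows with a product code, category, order date, and profit.
rows = [
    {"code": "FUR-1001", "category": "Furniture", "date": date(2022, 11, 3), "profit": 40.0},
    {"code": "TEC-2002", "category": "Technology", "date": date(2023, 1, 15), "profit": 75.0},
    {"code": "FUR-1003", "category": "Furniture", "date": date(2023, 6, 9), "profit": 25.0},
]

def left(text, n):
    """First n characters, similar to LEFT(text, n)."""
    return text[:n]

def mid(text, start, length):
    """Substring from a 1-based start position, similar to MID(text, start, length)."""
    return text[start - 1:start - 1 + length]

# Analogue of SUM([Profit]) evaluated for each category.
profit_by_category = defaultdict(float)
for row in rows:
    profit_by_category[row["category"]] += row["profit"]

# Analogue of YEAR([Date]) evaluated for each row.
years = [row["date"].year for row in rows]

# String functions applied to the product code.
prefixes = [left(row["code"], 3) for row in rows]
numbers = [mid(row["code"], 5, 4) for row in rows]

print(dict(profit_by_category), years, prefixes, numbers)
```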

● How to use parameters and filters to enhance visualizations and make
them more interactive:
In both Tableau and Power BI, parameters and filters are powerful tools
that allow users to enhance visualizations and make them more interactive.
Here are the steps to use parameters and filters in both tools:
Tableau:
1. Create a parameter by right-clicking on the data pane and selecting
"Create Parameter."
2. Name the parameter and select its data type.
3. Set the allowable values for the parameter.
4. Create a calculated field that uses the parameter.
5. Drag the calculated field onto the "Filters" shelf to use it as a filter,
and show the parameter control so that viewers can change its value.
Power BI:
1. Open the "Filters" pane, next to the "Visualizations" pane.
2. Drag and drop the field you want to filter on into the "Filters" section.
3. Choose the type of filter you want to apply, such as a basic filter,
advanced filter, or relative date filter.
4. Set the filter conditions based on the data.
5. Add slicers to the report to allow users to interact with the data and
filter the visualizations.
Using parameters and filters in both Tableau and Power BI can help create
more dynamic and interactive visualizations that allow users to explore the
data in more detail.
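The idea of a parameter feeding a calculated field that is then used as a filter can be sketched in Python as follows. The sales rows, the parameter name `min_amount`, and both helper functions are hypothetical and only illustrate the logic, not either tool's implementation.

```python
# Hypothetical regional sales rows.
sales = [
    {"region": "East", "amount": 1200.0},
    {"region": "West", "amount": 450.0},
    {"region": "North", "amount": 800.0},
    {"region": "South", "amount": 300.0},
]

def meets_threshold(row, min_amount):
    """Calculated field: True when the row's amount reaches the parameter value."""
    return row["amount"] >= min_amount

def filter_by_parameter(rows, min_amount):
    """Keep only the rows for which the calculated field is True."""
    return [row for row in rows if meets_threshold(row, min_amount)]

# Changing the parameter value changes which rows the view would show.
for threshold in (500.0, 1000.0):
    visible = [row["region"] for row in filter_by_parameter(sales, threshold)]
    print(f"min_amount={threshold}: {visible}")
```

The underlying data never changes. Only the parameter value does, which is what lets a viewer explore the data without editing the visualization itself.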
● How to share and publish visualizations with others.
To share and publish visualizations with others in Tableau and Power BI,
follow these steps:
1. In Tableau, go to the "Server" menu and select "Publish Workbook".
In Power BI Desktop, click on "Publish" on the Home tab.
2. Choose the appropriate publishing option, such as to Tableau Server
or Tableau Online (now Tableau Cloud) for Tableau, or to the Power BI
service for Power BI.
3. In Tableau, select the worksheets and dashboards to publish, and
provide a name and description for the workbook. In Power BI, select
the report to publish and provide a name and description.
4. Specify any required authentication information and provide access
permissions for the workbook or report.
5. Once published, the workbook or report can be accessed by others
who have appropriate permissions.
6. To share the workbook or report with others, provide the
appropriate URL or embed code, or create a Tableau or Power BI
account for them.
7. To update the workbook or report, make changes in the original file
and publish again. Any users who have access to the published
workbook or report will be able to see the updates.
By following these steps, you can easily share and publish your
visualizations with others in Tableau and Power BI.

4.5 Advanced Visualization Techniques


Tableau and Power BI offer many advanced visualization techniques that
allow users to explore and analyze complex data sets. Some of the
advanced visualization techniques include:
1. Heat maps: A heat map is a type of chart that uses color to represent
data values. Heat maps are particularly useful for visualizing large
data sets with many variables. In Tableau, users can create a heat
map by dragging dimensions onto the rows and columns shelves and a
measure onto "Color" on the Marks card. In Power BI, a similar result
is commonly built with a matrix visual and conditional formatting of
the cell background color.
2. Tree maps: A tree map is a type of chart that displays hierarchical
data using nested rectangles. Each rectangle represents a category,
and the size of the rectangle corresponds to the value of the
category. In Tableau, users can create tree maps by dragging a
measure onto "Size" and a dimension onto "Detail" or "Label" on the
Marks card. In Power BI, users can select the "Treemap" visual and
add a field to "Category" and a measure to "Values".
3. Geographic maps: A geographic map is a type of chart that displays
data on a map, allowing users to visualize spatial relationships in
their data. In Tableau and Power BI, users can create geographic
maps by adding geographic data (such as country, city, or latitude
and longitude) to their data set and then placing the geographic
field on a map visualization.
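A heat map is essentially a pivot table whose cells are colored by value. The sketch below builds such a table from flat records and scales each cell to the range 0 to 1, which could then be mapped onto a color ramp. The records and the helpers `pivot_sum` and `color_intensity` are illustrative assumptions.

```python
def pivot_sum(rows, row_key, col_key, value_key):
    """Build {row_label: {col_label: summed value}} from flat records."""
    table = {}
    for rec in rows:
        cell = table.setdefault(rec[row_key], {})
        cell[rec[col_key]] = cell.get(rec[col_key], 0.0) + rec[value_key]
    return table

def color_intensity(table):
    """Scale every cell to 0..1 so it can be mapped onto a color ramp."""
    values = [v for cols in table.values() for v in cols.values()]
    low, high = min(values), max(values)
    span = high - low
    return {
        r: {c: (v - low) / span if span else 0.0 for c, v in cols.items()}
        for r, cols in table.items()
    }

# Hypothetical sales records by region and category.
records = [
    {"region": "East", "category": "Furniture", "sales": 100.0},
    {"region": "East", "category": "Technology", "sales": 300.0},
    {"region": "West", "category": "Furniture", "sales": 200.0},
    {"region": "West", "category": "Technology", "sales": 500.0},
    {"region": "East", "category": "Furniture", "sales": 50.0},
]

table = pivot_sum(records, "region", "category", "sales")
intensity = color_intensity(table)
for region, cols in intensity.items():
    print(region, {cat: round(val, 2) for cat, val in cols.items()})
```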

Using Parameters and Filters to Enhance Visualizations:


Parameters and filters are powerful tools in Tableau and Power BI that
allow users to customize their visualizations and explore their data in
greater depth. Parameters and filters can be used to:
1. Limit the data displayed in a chart: Users can create filters that limit
the data displayed in a chart to specific values, ranges, or conditions.
For example, users can create a filter that only displays data from a
specific region, or only displays data from the last year.
2. Allow users to interact with a chart: Users can create parameters
that allow other users to interact with a chart, by selecting different
options or values. For example, users can create a parameter that
allows other users to choose between different chart types, or
different time periods.
3. Customize the appearance of a chart: Users can create parameters
that allow other users to customize the appearance of a chart, by
adjusting colors, sizes, or other formatting options. For example,
users can create a parameter that allows other users to change the
color scheme of a chart, or the size of the data points.
For example, in Tableau, users can create a filter by dragging a dimension
onto the "Filters" shelf and selecting the desired values. They can also
create a parameter by clicking the drop-down arrow in the Data pane,
selecting "Create Parameter", and choosing the desired options.
Similarly, in Power BI, users can create a filter by dragging the desired
field into the "Filters" pane and selecting values. They can also create a
parameter by selecting "New parameter" on the "Modeling" tab, and
choosing the desired options.
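As a minimal sketch of how a filter and a parameter work together, the Python function below filters the data to one region and lets a parameter choose which measure is displayed. The data, the function name `build_view`, and the measure names are hypothetical.

```python
# Hypothetical monthly figures for two regions.
data = [
    {"region": "East", "month": "2023-01", "sales": 100.0, "profit": 20.0},
    {"region": "East", "month": "2023-02", "sales": 120.0, "profit": 30.0},
    {"region": "West", "month": "2023-01", "sales": 90.0, "profit": 10.0},
    {"region": "West", "month": "2023-02", "sales": 70.0, "profit": 5.0},
]

def build_view(rows, region, measure):
    """Filter to one region, then return (month, value) pairs for the chosen measure."""
    if measure not in ("sales", "profit"):
        raise ValueError(f"unknown measure: {measure}")
    return [(r["month"], r[measure]) for r in rows if r["region"] == region]

# The filter limits the rows; the parameter decides what is plotted.
print(build_view(data, region="East", measure="sales"))
print(build_view(data, region="West", measure="profit"))
```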

4.6 Dashboards and Reports


Dashboards and reports are key components of data visualization in
Tableau and Power BI. Dashboards provide an interactive and customizable
view of data, while reports offer a more static and structured view. Both
dashboards and reports can be used to present data in a meaningful and
engaging way.

Building Interactive Dashboards:


To build an interactive dashboard in Tableau, users can follow these steps:
1. Create a new dashboard by selecting "Dashboard" > "New Dashboard"
from the menu bar
2. Drag and drop visualizations onto the dashboard canvas
3. Customize the layout and formatting of the dashboard using the
formatting pane
4. Add interactive elements such as filters, parameters, and actions to
allow users to interact with the data
For example, users can create a dashboard that displays sales data by
region and category, with filters to allow users to explore the data further.

Sharing and Publishing Visualizations:


Once a visualization or dashboard is created, it can be shared and
published with others. In Tableau, users can publish their visualizations to
Tableau Server or Tableau Public, allowing others to view and interact with
the data. In Power BI, users can publish their visualizations to the Power BI
service or export them as a report to share with others.
To share a visualization in Tableau, users can follow these steps:
1. Connect to Tableau Server or Tableau Public
2. Select the visualization to be published
3. Click "Publish" and follow the prompts to publish the visualization
To share a visualization in Power BI, users can follow these steps:
1. Publish the visualization to the Power BI service
2. Share the visualization with others by granting them access to the
report or dashboard
3. Embed the visualization in a website or other platform using the
Power BI API
Overall, dashboards and reports are powerful tools for presenting data in
Tableau and Power BI, allowing users to create interactive and engaging
visualizations that can be shared and published with others.

4.7 Summary

● Data visualization is a crucial aspect of data analysis, and tools like
Tableau and Power BI provide a user-friendly way to create effective
visualizations.

● Connecting to data sources is the first step in creating visualizations, and
Tableau and Power BI support a wide range of data sources.

● Data preparation and cleaning is an important part of the data analysis
process, and these tools offer ways to clean and transform data for
better visualization.

● Common visualizations like bar charts, line charts, and scatter plots can
be created easily in both Tableau and Power BI, with customization
options for formatting and labeling.

● Advanced visualization techniques, like heat maps and geographic maps,
can help users better understand and analyze their data, while
parameters and filters can enhance visualizations for better insights.

● Finally, Tableau and Power BI allow users to create interactive
dashboards and reports that can be shared and published with others,
making data analysis and visualization more accessible and collaborative.

4.8 Self-Assessment Questions


1. What are some common data sources that can be connected to in
Tableau and Power BI?
2. What are some advanced visualization techniques available in Tableau
and Power BI, and how can they be used to gain deeper insights into
data?
3. How can parameters and filters be used to enhance visualizations in
Tableau and Power BI?
4. What is the process for creating calculated fields and measures in
Tableau and Power BI?
5. How can interactive dashboards and reports be created in Tableau and
Power BI, and what benefits do they offer for data analysis?
6. How can visualizations be shared and published with others using
Tableau and Power BI?

4.9 References

● Tableau for Dummies by Molly Monsey and Paul Sochan

● Power BI for Dummies by Ken Withee

● The Big Book of Dashboards: Visualizing Your Data Using Real-World
Business Scenarios by Steve Wexler, Jeffrey Shaffer, and Andy Cotgreave

Course: MsDS

Data Analysis and Visualization

Module: 5
Best Practices and Insights

Learning Objective:
After studying this module, students will be able to:

● Understand the importance of data visualization in communicating
insights and informing decision-making.

● Identify best practices for creating effective data visualizations using
tools such as Tableau, Python, Power BI, and Excel.

● Interpret insights from data visualizations accurately and objectively,
considering context, patterns, and potential biases.

● Generate effective dashboards and reports that provide context, include
actionable insights, and are tailored to the audience.

● Understand the trends and directions shaping the future of data
visualization, including automation and AI, interactivity, integration with
other technologies, expanding data sources, and more sophisticated
visualizations.

Structure
5.1 Best Practices for Data Visualization
5.2 Interpreting Insights from Data Visualizations
5.3 Tips for Effective Dashboard and Report Generation
5.4 Future of Data Visualization
5.5 Summary
5.6 Self-Assessment Questions
5.7 References

5.1 Best Practices for Data Visualization
Data visualization is an important aspect of data analysis, as it enables us to
transform raw data into a visually appealing format that can be easily
understood and interpreted by stakeholders. In order to create effective
data visualizations, there are several best practices that should be followed,
regardless of the tool being used. Here are some of the best practices for
data visualization:
1. Know your audience: Understanding the audience's needs, goals, and
interests is crucial to creating effective data visualizations. Different
stakeholders may have different priorities and interests, so it is
important to tailor the visualization to the audience.
2. Choose the right chart type: Selecting the appropriate chart type is
important for effectively conveying the data. Common chart types
include bar charts, line charts, scatter plots, and heat maps. Some
tools, such as Tableau and Power BI, offer a wide variety of chart
types to choose from.
3. Keep it simple: Avoid cluttering the visualization with unnecessary
information. Only include the data that is necessary to convey the
message.
4. Use color effectively: Color can be a powerful tool for highlighting
important data points or trends, but it should be used sparingly and
consistently. Too much color can be overwhelming, while
inconsistent use of color can be confusing.
5. Label clearly: Labels should be clear and easy to read. Avoid
abbreviations or acronyms that may not be understood by all
stakeholders.
6. Provide context: Providing context is important for helping
stakeholders understand the significance of the data. This can
include annotations, axis labels, or reference lines.
7. Test and iterate: It is important to test the visualization with
stakeholders and iterate based on their feedback. This can help
ensure that the visualization effectively communicates the desired
message.
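The "choose the right chart type" guideline can be expressed as a simple lookup from analysis goal to chart type. The sketch below is a rule of thumb only. The goal names and the function `suggest_chart` are assumptions, and real choices also depend on the audience and the data.

```python
def suggest_chart(goal):
    """Map an analysis goal to a commonly used chart type (rule of thumb only)."""
    suggestions = {
        "compare categories": "bar chart",
        "trend over time": "line chart",
        "part of a whole": "pie chart",
        "relationship between two variables": "scatter plot",
        "distribution": "box plot",
    }
    try:
        return suggestions[goal]
    except KeyError:
        raise ValueError(f"no suggestion for goal: {goal!r}") from None

for goal in ("trend over time", "distribution"):
    print(f"{goal} -> {suggest_chart(goal)}")
```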

Different tools offer different capabilities for data visualization. Some of
the popular tools used for data visualization are:
1. Tableau: Tableau is a powerful data visualization tool that allows
users to create interactive dashboards and visualizations. Tableau
offers a wide variety of chart types, and allows for easy filtering and
drilling down into the data.
2. Python: Python is a popular programming language for data analysis
and visualization. Libraries such as Matplotlib, Seaborn, and Plotly
offer a range of chart types and customization options.
3. Power BI: Power BI is a Microsoft tool that allows users to create
interactive dashboards and reports. Power BI offers a range of
visualizations and allows for easy integration with other Microsoft
tools such as Excel.
4. Excel: Excel is a widely used tool for data analysis and visualization.
Excel offers a range of chart types and customization options, and
allows for easy integration with other Microsoft tools such as Power
BI.
Overall, regardless of the tool being used, following best practices for data
visualization can help ensure that the visualization effectively
communicates the desired message and is well-received by stakeholders.

5.2 Interpreting Insights from Data Visualizations


Data visualization is an important tool for understanding and
communicating insights from data. However, simply creating a visualization
is not enough: it is important to be able to interpret the insights from the
visualization in order to take action or make decisions based on the data.
Here are some tips for interpreting insights from data visualizations:
1. Understand the purpose of the visualization: Before interpreting the
insights, it is important to understand the purpose of the
visualization. What question is the visualization trying to answer?
What insights should be gleaned from the visualization? This
understanding can help guide the interpretation process.
2. Look for patterns and trends: Data visualizations can help highlight
patterns and trends in the data that may not be apparent from raw
data. Look for trends over time, patterns in relationships between
variables, or clusters of data points that may indicate a specific trend
or group.
3. Identify outliers: Outliers are data points that are significantly
different from the rest of the data. Identifying outliers can help you
understand the factors that may be contributing to these differences
and can inform decision-making.
4. Compare and contrast: Comparing and contrasting different parts of
the visualization can help identify differences and similarities. For
example, comparing different geographic regions or different time
periods can help understand the factors that may be contributing to
differences in the data.
5. Consider the limitations: Data visualizations are a powerful tool, but
they are not without limitations. It is important to consider the
limitations of the data, the visualization, and any underlying
assumptions or biases that may be present.
6. Seek additional context: Additional context can help explain the
factors that may be contributing to the insights. This may include
demographic data, industry-specific knowledge, or other contextual
factors.
7. Ask questions: Finally, it is important to ask questions and seek
clarification when interpreting insights from data visualizations. This
can help ensure that the insights are fully understood and can inform
decision-making.
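One simple way to check whether a point that looks unusual on a chart really is an outlier is to compute its z-score, that is, its distance from the mean in standard deviations. The sketch below assumes a threshold of 2.0 and uses invented weekly order counts. Both the threshold and the helper `flag_outliers` are assumptions, and other rules (such as the interquartile range rule) are also widely used.

```python
import statistics

def flag_outliers(values, z_threshold=2.0):
    """Return values whose distance from the mean exceeds z_threshold standard deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > z_threshold]

# Hypothetical weekly order counts; the final week is unusually high.
weekly_orders = [50, 52, 49, 51, 48, 50, 53, 47, 120]
print(flag_outliers(weekly_orders))
```

A flagged value is a prompt for investigation, for example a promotion or a data entry error, rather than something to delete automatically.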

5.3 Tips for Effective Dashboard and Report Generation


Dashboards and reports are important tools for communicating insights
from data to stakeholders. They can help summarize complex information
in an easily digestible format, and can be customized to meet the needs of
different audiences. Here are some tips for effective dashboard and report
generation:
1. Define the purpose and audience: Before creating a dashboard or
report, it is important to define the purpose and audience. What
questions does the dashboard or report aim to answer? Who is the
audience and what information do they need? This information can
help guide the design and content of the dashboard or report.
2. Keep it simple: Dashboards and reports should be designed to be
easily understood at a glance. Use simple, clear visuals and avoid
cluttering the dashboard or report with unnecessary information.
3. Use visuals effectively: Visuals can help communicate insights from
data more effectively than text alone. Use visuals such as charts,
graphs, and maps to highlight key insights and trends.
4. Use consistent formatting: Consistent formatting can help make the
dashboard or report more visually appealing and easier to read. Use
consistent colors, fonts, and sizing throughout the dashboard or
report.
5. Include actionable insights: Dashboards and reports should not only
communicate insights, but also provide actionable recommendations.
This can help stakeholders make informed decisions based on the
data.
6. Test and iterate: It is important to test the dashboard or report with
stakeholders and iterate based on their feedback. This can help
ensure that the dashboard or report effectively communicates the
desired message.
7. Automate where possible: Automating the data retrieval and
visualization process can help save time and reduce manual errors. Tools
such as Tableau, Power BI, and Google Data Studio (now Looker Studio)
offer features for refreshing data automatically so that visualizations
stay up to date.
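Automation can be as simple as a script that recomputes headline figures whenever new data arrives. The sketch below parses CSV text and returns the figures a report might show. The column names `region` and `sales`, the sample data, and the function `summarize_sales` are assumptions for illustration.

```python
import csv
import io

def summarize_sales(csv_text):
    """Parse CSV text with `region` and `sales` columns and return headline figures."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["sales"])
    if not totals:
        raise ValueError("no data rows found")
    top_region = max(totals, key=totals.get)
    return {
        "total_sales": sum(totals.values()),
        "top_region": top_region,
        "by_region": totals,
    }

# Hypothetical export; in practice this text would come from a file or database.
sample = "region,sales\nEast,100\nWest,250\nEast,50\n"
report = summarize_sales(sample)
print(report)
```

Running the same function on each new export keeps the reported figures consistent and removes the need to update them by hand.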

5.4 Future of Data Visualization


The field of data visualization is constantly evolving, driven by advances in
technology and the increasing need for organizations to make data-driven
decisions. Here are some of the trends and directions that are shaping the
future of data visualization:
1. Increased automation and AI: With the growing amount of data
being generated, there is a need for tools and techniques that can
quickly and accurately process and analyze large datasets.
Automation and AI technologies are increasingly being used in data
visualization to streamline the process of data preparation, analysis,
and visualization.
2. Greater emphasis on interactivity: Interactive data visualization tools
allow users to explore and manipulate data in real time, providing a
more immersive and engaging experience. This trend is likely to
continue, with data visualization tools increasingly offering features
such as zooming, filtering, and highlighting to provide users with
greater control over the data.
3. Integration with other technologies: Data visualization is increasingly
being integrated with other technologies such as machine learning,
natural language processing, and augmented reality. For example,
machine learning algorithms can be used to identify patterns and
relationships in data, which can then be visualized in an interactive
dashboard.
4. Expansion of data sources: Data visualization tools are expanding to
support a wider range of data sources, including streaming data,
unstructured data, and IoT data. This will enable organizations to
gain insights from a wider range of data sources, and to make faster
decisions based on real-time data.
5. More sophisticated visualizations: As data visualization technologies
become more advanced, we can expect to see more sophisticated
and complex visualizations. This will enable organizations to gain
deeper insights from data, and to communicate these insights more
effectively to stakeholders.
6. Increasing importance of data literacy: As data visualization becomes
more complex and integrated with other technologies, there is a
growing need for individuals with data literacy skills. Data literacy
involves not only the ability to interpret data visualizations, but also
to understand the underlying data, the methods used to analyze it,
and the limitations and biases inherent in the data.

5.5 Summary

● Best practices for data visualization include choosing the right type of
visualization for the data, using clear and concise labels and titles, and
avoiding clutter.

● Interpreting insights from data visualizations requires understanding the
context of the data, analyzing patterns and trends, and considering
potential biases and limitations.

● Effective dashboard and report generation involves defining the
purpose and audience, using visuals effectively, providing context,
including actionable insights, and testing and iterating the design.

● The future of data visualization is likely to be shaped by automation and
AI, greater interactivity, integration with other technologies, expanding
data sources, and more sophisticated visualizations.

● As data becomes increasingly central to decision-making, data
visualization will continue to play a crucial role in communicating
insights and driving informed decisions. It is important for individuals to
develop data literacy skills to effectively interpret and communicate
insights from data.

5.6 Self-Assessment Questions


1. What are some common mistakes to avoid when creating a data
visualization?
2. How can you ensure that you are interpreting insights from a data
visualization accurately and objectively?
3. What are some best practices for creating an effective dashboard or
report?
4. How might automation and AI impact the future of data visualization?
5. What steps can you take to improve your data literacy skills and
effectively communicate insights from data visualizations?

5.7 References

● "Storytelling with Data: A Data Visualization Guide for Business
Professionals" by Cole Nussbaumer Knaflic.

● "Data Visualization Made Simple: Insights into Becoming Visual" by
Kristen Sosulski.

● "Information Dashboard Design: Displaying Data for At-a-Glance
Monitoring" by Stephen Few.
