Data Visualization notes
1. Dimensions
Definition: Dimensions are qualitative, categorical fields used to describe data. They
represent the "what" in your analysis.
Examples: Time (year, quarter, month), Product (name, category), Region (city, country),
Customer (name, ID).
Purpose: They help break down the data into categories or groups for analysis. In Power BI,
dimensions are used to slice, filter, or categorize data, creating context for your numeric
values (measures).
2. Measures
Definition: Measures are quantitative, numeric values that represent the "how much" or
"how many" aspect of your data. They are typically calculated fields that summarize data.
Purpose: Measures allow you to perform aggregations (sum, average, min, max, etc.) and are
often the focus of reporting and visualizations. In Power BI, measures are frequently created
using DAX (Data Analysis Expressions) to calculate complex metrics dynamically.
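As a rough illustration of how a measure is summarized across a dimension, the sketch below sums a numeric field per category in plain Python. The rows, the field names (region, amount) and the helper total_by are hypothetical; in Power BI the same idea would normally be expressed as a DAX measure.

```python
from collections import defaultdict

# Hypothetical sales rows: "region" is a dimension, "amount" is the numeric field.
sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 50.0},
    {"region": "South", "amount": 30.0},
]

def total_by(rows, dimension, field):
    """Sum `field` for each distinct value of `dimension`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[field]
    return dict(totals)

# The dimension supplies the groups; the measure is the aggregated value.
total_sales = total_by(sales, "region", "amount")
print(total_sales)  # {'North': 170.0, 'South': 110.0}
```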
3. Hierarchy
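Definition: A hierarchy is an ordered set of related dimension levels, arranged from the most
general to the most detailed.
Examples: Time (year > quarter > month), Region (country > city), Product (category > name).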
Purpose: Hierarchies allow users to drill down or roll up through levels of data to see trends
at varying levels of detail. Power BI automatically identifies hierarchies in date fields, but you
can create custom hierarchies for other dimensions like location or product.
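Drilling down and rolling up can be pictured as grouping by more or fewer levels of the same hierarchy. The sketch below assumes hypothetical month-level rows and a hypothetical helper roll_up; it is not how Power BI implements hierarchies, only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical rows at month level; year > quarter > month form a date hierarchy.
rows = [
    {"year": 2023, "quarter": "Q1", "month": "Jan", "sales": 10},
    {"year": 2023, "quarter": "Q1", "month": "Feb", "sales": 20},
    {"year": 2023, "quarter": "Q2", "month": "Apr", "sales": 30},
    {"year": 2024, "quarter": "Q1", "month": "Jan", "sales": 40},
]

HIERARCHY = ["year", "quarter", "month"]

def roll_up(rows, depth):
    """Aggregate sales using only the first `depth` levels of the hierarchy."""
    levels = HIERARCHY[:depth]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in levels)
        totals[key] += row["sales"]
    return dict(totals)

by_year = roll_up(rows, 1)     # rolled up to the top level
by_quarter = roll_up(rows, 2)  # drilled down one level
```

Here by_year has one total per year, while by_quarter splits each year into its quarters.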
4. Grain (Granularity)
Definition: Grain refers to the level of detail or granularity of the data stored in your dataset.
It defines what each row in a table represents.
Examples:
o A dataset with daily grain means each row represents one day’s data.
When defining the grain in a data model, consistency is key. For instance, measures should
be aggregated at the same level of granularity as the dimensions they are related to.
In Power BI, when building relationships between tables, the grain of each table impacts how
well the data model functions, especially in one-to-many or many-to-many relationships.
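A small sketch of why matching grain matters: daily sales are aggregated up to monthly grain before being compared with monthly targets. The table contents and the helper to_monthly are hypothetical, and dates are assumed to be ISO strings (YYYY-MM-DD).

```python
from collections import defaultdict

# Hypothetical fact table at daily grain: one row per day.
daily_sales = [
    ("2024-01-05", 100),
    ("2024-01-20", 150),
    ("2024-02-03", 200),
]

# Hypothetical target table at monthly grain: one row per month.
monthly_targets = {"2024-01": 300, "2024-02": 150}

def to_monthly(daily_rows):
    """Aggregate daily rows up to monthly grain (key = 'YYYY-MM')."""
    monthly = defaultdict(int)
    for date, amount in daily_rows:
        monthly[date[:7]] += amount
    return dict(monthly)

monthly_sales = to_monthly(daily_sales)

# Both tables now share the same grain, so they can be compared month by month.
variance = {m: monthly_sales.get(m, 0) - t for m, t in monthly_targets.items()}
```

Comparing a daily row directly against a monthly target would mix two grains and give misleading results.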
Dimensions and measures are essential for building reports and dashboards. In Power BI,
dimensions usually come from columns in your data tables, while measures are either
predefined or created using DAX.
Hierarchies allow users to explore data interactively, drilling down to more detailed views
within a chart. Power BI supports dynamic exploration using hierarchies, providing an easy
way to shift from overview to detail.
Understanding the grain of your data helps you manage your data sources effectively. For
example, when importing data into Power BI from a database, the grain will dictate how
much data you pull in and at what level of detail you analyze it.
In data visualization, understanding human perception is essential for creating effective visuals. Two
key concepts related to this are preattentive attributes and the goldfish effect, both of which
influence how users process and retain information from visual representations. Let’s break these
down:
1. Preattentive Attributes
Definition: Preattentive attributes are visual properties that the human brain processes
almost instantly and subconsciously, within milliseconds, before conscious attention is
directed to the entire image. These attributes help us quickly focus on important aspects of
visual data.
Examples:
o Size: Making a large bar in a bar chart stand out to show dominance in a particular
category.
o Shape: Different shapes (e.g., circles vs. squares) can be used to signify different
categories or values.
o Position: Objects placed near the center of a layout are often perceived as more
important.
o In Power BI, for example, you can use contrasting colors or varying sizes to highlight
certain measures or dimensions in a dashboard to draw users' eyes to important
metrics.
o Use contrast (such as red vs. gray) to highlight critical KPIs in reports.
o Adjust size to emphasize the most important metric, like making key bars in a chart
taller.
o Use positioning to place critical visuals in the center or at the top-left corner of a
report, as these areas naturally attract attention.
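The contrast idea above can be sketched as a small helper that gives the largest value an accent color and mutes everything else. The function name highlight_colors and the sample values are hypothetical; the resulting list could be passed to whatever charting tool is in use.

```python
def highlight_colors(values, highlight="red", muted="gray"):
    """Return one color per value: `highlight` for the maximum, `muted` for the rest."""
    top = max(values)
    # If several values tie for the maximum, all of them are highlighted.
    return [highlight if v == top else muted for v in values]

kpi_values = [42, 87, 55, 61]
colors = highlight_colors(kpi_values)
print(colors)  # ['gray', 'red', 'gray', 'gray']
```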
2. Goldfish Effect
Definition: The "goldfish effect" is a popular shorthand for the idea that human attention
spans are short, likening them to the supposed attention span of a goldfish, often quoted as
around 8 seconds (a figure that is widely repeated but poorly supported). In data
visualization, the practical point is that people typically focus on a visual only briefly, so
it must communicate its key message quickly.
o Since users have limited attention spans, particularly in today’s fast-paced digital
environment, your visualizations need to capture their attention quickly and
effectively. If the visual is too complex or cluttered, users might lose interest before
getting the insights.
o Simplify visualizations: Avoid overloading a single chart with too many elements. For
example, using 10 different colors in a bar chart could overwhelm the user, causing
them to lose focus.
o Direct attention: Use preattentive attributes like color or size to ensure that the
most critical information stands out immediately.
o Use storytelling: Guide the viewer through a narrative, emphasizing key takeaways
early on in the visualization so they don’t have to work hard to get insights.
o Minimalism: Design your reports and dashboards with simplicity in mind. Use clean
layouts, avoid excessive labels, and ensure that each element has a clear purpose.
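One possible way to act on the "simplify" advice is to keep only the largest categories and fold the rest into a single bucket before charting. The helper group_small_categories, the "Other" label and the sample data are assumptions made for illustration.

```python
def group_small_categories(totals, keep=5, other_label="Other"):
    """Keep the `keep` largest categories and sum the rest into one bucket."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result = dict(ranked[:keep])
    remainder = sum(value for _, value in ranked[keep:])
    if remainder:
        result[other_label] = remainder
    return result

sales_by_product = {"A": 50, "B": 40, "C": 5, "D": 3, "E": 2}
simplified = group_small_categories(sales_by_product, keep=2)
print(simplified)  # {'A': 50, 'B': 40, 'Other': 10}
```

The chart then needs three colors instead of five, which leaves less for the viewer to decode.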
Preattentive attributes help users focus on what’s important without consciously thinking
about it. Effective use of color, size, and position can significantly improve a visualization’s
clarity.
The goldfish effect reminds designers that attention is a limited resource, so visualizations
must communicate key insights quickly and simply.
In data visualization, choosing the right type of visual is critical for effectively conveying your data.
Each type of visual is suited for specific types of data, relationships, and insights you want to
communicate. Here's a guide to different types of visualizations, and examples of which is best suited
for particular scenarios:
1. Comparisons
Bar Charts
Best for: Comparing discrete categories, showing quantities across different groups.
Horizontal Bar Chart: Useful when category names are long or when you have many
categories.
Column Charts
Best for: Comparing values across categories when the focus is on the magnitude (e.g.,
monthly sales, product performance).
Stacked Bar/Column Charts
Best for: Showing the composition of different categories within a whole, while still allowing
for comparison across groups.
Bullet Charts
2. Trends Over Time
Line Charts
Best for: Showing trends over time for continuous data, illustrating patterns, and
relationships.
Example: Plotting stock prices, sales over months, or website traffic trends.
Area Charts
Best for: Visualizing the magnitude of change over time, where the focus is on total volume
or accumulated trends.
Example: Showing website traffic breakdown over time (e.g., organic, referral, and direct
traffic).
Sparkline Charts
Best for: Showing a simple, condensed view of trends in a small space, typically without axes.
Example: Adding a quick sales trendline next to key performance indicators in a report.
3. Distributions
Histogram
Best for: Visualizing the distribution of a single numeric variable, showing the frequency of
different ranges.
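To make "frequency of different ranges" concrete, the sketch below counts values into fixed-width bins, which is the calculation a histogram displays. The helper histogram, the bin width and the sample ages are hypothetical.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall into each [lower, lower + bin_width) range."""
    counts = {}
    for v in values:
        # Lower edge of the bin that contains v.
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

ages = [22, 25, 31, 38, 39, 41, 47, 52]
age_bins = histogram(ages, bin_width=10, start=20)
print(age_bins)  # {20: 2, 30: 3, 40: 2, 50: 1}
```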
Box Plots
Best for: Displaying the distribution of data and identifying outliers, showing medians,
quartiles, and variability.
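The quantities such a plot shows (median, quartiles, outliers) can be computed as sketched below. The 1.5 x IQR rule for flagging outliers is a common convention and an assumption here, not something these notes specify; the helper box_plot_summary and the data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the quartiles plus the points beyond the 1.5 * IQR fences."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

order_values = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(order_values)
print(summary)  # {'q1': 4.0, 'median': 6.0, 'q3': 8.0, 'outliers': [30]}
```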
Violin Plots
Best for: Showing the distribution of the data with its density, similar to a box plot but with
more detail.
4. Part-to-Whole Relationships
Pie Charts
Best for: Showing proportions or percentages of a whole, but only effective with a small
number of categories (3-5).
Example: Displaying the percentage of total sales coming from each product category.
Caution: Avoid pie charts if there are too many slices or if the differences between them are
subtle.
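A minimal sketch of the part-to-whole calculation behind a pie chart, with a guard reflecting the "small number of categories" advice. The helper pie_shares, the limit of 5 slices as a hard check, and the sample figures are assumptions for illustration.

```python
def pie_shares(totals, max_slices=5):
    """Convert category totals to percentages of the whole."""
    if len(totals) > max_slices:
        raise ValueError("too many slices for a readable pie chart")
    grand_total = sum(totals.values())
    return {name: round(100 * value / grand_total, 1) for name, value in totals.items()}

sales_by_category = {"Electronics": 500, "Clothing": 300, "Toys": 200}
shares = pie_shares(sales_by_category)
print(shares)  # {'Electronics': 50.0, 'Clothing': 30.0, 'Toys': 20.0}
```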
Donut Charts
Best for: Like a pie chart but with a center cut out, allowing room for more data or text in the
center.
Example: Displaying the percentage breakdown of a company's revenue streams with the
total amount in the center.
Treemaps
Stacked Area Charts
Best for: Showing how different categories contribute to a whole over time.
Example: Displaying how different sales channels (online, in-store) contribute to overall
revenue month by month.
5. Relationships and Correlations
Scatter Plots
Best for: Showing the relationship between two continuous variables to identify correlations,
clusters, or trends.
Example: Examining the relationship between advertising spend and sales, or age and
income.
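The strength of a linear relationship seen in a scatter plot is often summarized with the Pearson correlation coefficient. The sketch below computes it by hand; the helper pearson_r and the sample data (constructed to be perfectly linear) are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]  # exactly 2 * spend + 5, so the points lie on a line
r = pearson_r(ad_spend, sales)
print(round(r, 3))  # 1.0
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none. Real data will rarely be this clean.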
Bubble Charts
Best for: Adding a third variable to a scatter plot through the size of the bubble, allowing for
multi-dimensional analysis.
Example: Comparing marketing spend vs. revenue, where bubble size represents the number
of customers in each region.
6. Ranking or Ordering
Funnel Charts
Best for: Displaying a process that narrows as it progresses, useful for showing stages in a
process or pipeline.
Example: Visualizing conversion rates through a sales funnel from initial leads to closed
deals.
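The numbers behind a funnel chart are usually stage-to-stage conversion rates. The sketch below derives them from hypothetical stage counts; the stage names and the helper stage_conversion are illustrative only.

```python
# Hypothetical pipeline: (stage name, number of records reaching that stage).
funnel = [
    ("Leads", 1000),
    ("Qualified", 400),
    ("Proposals", 100),
    ("Closed deals", 25),
]

def stage_conversion(stages):
    """Return the share of each stage that reaches the next stage."""
    rates = []
    for (name, count), (next_name, next_count) in zip(stages, stages[1:]):
        rates.append((f"{name} -> {next_name}", next_count / count))
    return rates

rates = stage_conversion(funnel)
overall = funnel[-1][1] / funnel[0][1]  # share of leads that end as closed deals
```

With these figures the stage rates are 40%, 25% and 25%, and the overall conversion is 2.5%.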
Waterfall Charts
Best for: Showing how an initial value increases or decreases over time through a series of
changes.
Example: Analyzing the breakdown of profits by various factors, like revenue, costs, and
other expenses.
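A waterfall chart draws each change as a floating bar that starts where the previous one ended. The running totals can be sketched as follows; the step labels, amounts and the helper waterfall_levels are hypothetical.

```python
from itertools import accumulate

# Hypothetical profit breakdown: positive values add, negative values subtract.
steps = [
    ("Revenue", 500),
    ("Cost of goods", -200),
    ("Operating expenses", -150),
    ("Other income", 30),
]

def waterfall_levels(steps):
    """Return (label, start, end) for each floating bar in a waterfall chart."""
    ends = list(accumulate(delta for _, delta in steps))
    starts = [0] + ends[:-1]
    return [(label, start, end) for (label, _), start, end in zip(steps, starts, ends)]

bars = waterfall_levels(steps)
profit = bars[-1][2]  # the end of the last bar is the final value
print(profit)  # 180
```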
Gantt Charts
Best for: Displaying the duration and progression of tasks in a project over time.
Example: Tracking the progress of a project’s timeline, showing when tasks start, end, and
overlap.
7. Geospatial Data
Maps (Choropleth)
Best for: Visualizing data that has a geographical component, using color to represent
intensity or magnitude across regions.
Example: Displaying sales by country or region, highlighting areas with the most customers.
Bubble Maps
Best for: Displaying numeric data on a map using bubbles to represent magnitude or size at
specific geographic points.
Example: Showing the number of stores in various cities, where the size of the bubble
represents the number of stores.
Heat Maps
Best for: Displaying intensity of data in two dimensions using color, useful for showing
density.
o Best visual: Bar chart (or a map if the regions are geographically distinct)
o Why: A bar chart clearly shows the differences between regions and is effective for
comparison.
o Why: Line charts are ideal for displaying trends over time, making it easy to spot
increases, decreases, or seasonality.
o Why: Both visuals are excellent for showing part-to-whole relationships, but
treemaps allow for hierarchical data representation as well.
4. Scenario: Visualizing the relationship between marketing spend and revenue across
different regions
o Why: Histograms display frequency distribution, which is perfect for showing how
age groups are spread across a customer base.
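The guidance in this section can be condensed into a simple lookup from analysis goal to a suggested visual. This is only a sketch: the goal labels and the function suggest_chart are hypothetical, and the mapping keeps one typical choice per goal rather than every option discussed above.

```python
# Goal -> suggested visual, taken from the chart guide in these notes.
CHART_GUIDE = {
    "comparison": "bar chart",
    "trend over time": "line chart",
    "distribution": "histogram",
    "part-to-whole": "pie chart or treemap",
    "relationship": "scatter plot",
    "geographic": "choropleth map",
}

def suggest_chart(goal):
    """Look up a suggested visual for an analysis goal (case-insensitive)."""
    return CHART_GUIDE.get(goal.strip().lower(), "no suggestion")

print(suggest_chart("Trend over time"))  # line chart
```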