Data Visualization notes

Uploaded by

pgp24.daksh
Data Visualization

1. Dimensions

 Definition: Dimensions are qualitative, categorical fields used to describe data. They
represent the "what" in your analysis.

 Examples: Time (year, quarter, month), Product (name, category), Region (city, country),
Customer (name, ID).

 Purpose: They help break down the data into categories or groups for analysis. In Power BI,
dimensions are used to slice, filter, or categorize data, creating context for your numeric
values (measures).

2. Measures

 Definition: Measures are quantitative, numeric values that represent the "how much" or
"how many" aspect of your data. They are typically calculated fields that summarize data.

 Examples: Total Sales, Profit, Quantity Sold, Average Revenue.

 Purpose: Measures allow you to perform aggregations (sum, average, min, max, etc.) and are
often the focus of reporting and visualizations. In Power BI, measures are frequently created
using DAX (Data Analysis Expressions) to calculate complex metrics dynamically.
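
The split between dimensions and measures can be sketched outside Power BI as well. The snippet below is a minimal Python illustration, not DAX: the rows, the field names, and the helper `total_by` are all hypothetical. It groups rows by a dimension and sums a numeric field to produce a measure such as Total Sales.

```python
from collections import defaultdict

# Hypothetical sales rows: "region" and "category" act as dimensions,
# "sales" is the numeric field that a measure is computed from.
rows = [
    {"region": "North", "category": "Bikes", "sales": 120.0},
    {"region": "North", "category": "Helmets", "sales": 30.0},
    {"region": "South", "category": "Bikes", "sales": 200.0},
    {"region": "South", "category": "Helmets", "sales": 50.0},
]


def total_by(rows, dimension, field):
    """Sum a numeric field for each distinct value of a dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[field]
    return dict(totals)


# The same measure (total sales) sliced by two different dimensions.
total_sales_by_region = total_by(rows, "region", "sales")
total_sales_by_category = total_by(rows, "category", "sales")
print(total_sales_by_region)
print(total_sales_by_category)
```

Changing the dimension changes the context of the result, while the measure itself (a sum of sales) stays the same.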

3. Hierarchy

 Definition: A hierarchy represents a logical arrangement of data into multiple levels of
granularity, often ordered from broadest to most detailed.

 Examples:

o Time Hierarchy: Year > Quarter > Month > Day

o Geographic Hierarchy: Country > Region > State > City

o Product Hierarchy: Category > Sub-category > Product

 Purpose: Hierarchies allow users to drill down or roll up through levels of data to see trends
at varying levels of detail. Power BI automatically identifies hierarchies in date fields, but you
can create custom hierarchies for other dimensions like location or product.
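
Drilling down and rolling up amount to aggregating the same data at different depths of a hierarchy. The following is a rough Python sketch under assumed data: the rows, the `HIERARCHY` list, and `aggregate_at_level` are illustrative names, not Power BI features.

```python
from collections import defaultdict

# Hypothetical rows; the hierarchy is Category > Sub-category > Product.
HIERARCHY = ["category", "subcategory", "product"]
rows = [
    {"category": "Bikes", "subcategory": "Road", "product": "R100", "sales": 500},
    {"category": "Bikes", "subcategory": "Road", "product": "R200", "sales": 300},
    {"category": "Bikes", "subcategory": "Mountain", "product": "M10", "sales": 400},
    {"category": "Gear", "subcategory": "Helmets", "product": "H1", "sales": 100},
]


def aggregate_at_level(rows, depth):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    levels = HIERARCHY[:depth]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in levels)
        totals[key] += row["sales"]
    return dict(totals)


rolled_up = aggregate_at_level(rows, 1)     # broadest level: category
drilled_down = aggregate_at_level(rows, 2)  # one level deeper: sub-category
print(rolled_up)
print(drilled_down)
```

The grand total is identical at every depth; only the level of detail changes.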

4. Grain (Granularity)

 Definition: Grain refers to the level of detail or granularity of the data stored in your dataset.
It defines what each row in a table represents.

 Examples:

o A dataset with daily grain means each row represents one day’s data.

o A dataset with transactional grain means each row represents an individual
transaction.

o Customer-level grain means each row represents one customer.


 Purpose: The grain impacts how granular your analysis can be. A finer grain (like transaction-
level) allows more detailed analysis, but may require more storage and processing power.
Coarser grain (like yearly or monthly summary data) is more efficient but less detailed.

Important Considerations in Grain:

 When defining the grain in a data model, consistency is key. For instance, measures should
be aggregated at the same level of granularity as the dimensions they are related to.

 In Power BI, when building relationships between tables, the grain of each table impacts how
well the data model functions, especially in one-to-many or many-to-many relationships.
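
To make the idea of grain concrete, the sketch below re-aggregates a transaction-grain table to a daily grain. It is an illustration in Python with made-up data; the field names and `to_daily_grain` are assumptions, not part of any Power BI API.

```python
from collections import defaultdict

# Hypothetical transaction-grain table: one row per transaction.
transactions = [
    {"txn_id": 1, "date": "2024-01-01", "amount": 20.0},
    {"txn_id": 2, "date": "2024-01-01", "amount": 35.0},
    {"txn_id": 3, "date": "2024-01-02", "amount": 10.0},
]


def to_daily_grain(transactions):
    """Re-aggregate so that each output row represents one day."""
    daily = defaultdict(lambda: {"amount": 0.0, "txn_count": 0})
    for txn in transactions:
        day = daily[txn["date"]]
        day["amount"] += txn["amount"]
        day["txn_count"] += 1
    return [{"date": date, **values} for date, values in sorted(daily.items())]


daily_sales = to_daily_grain(transactions)
print(daily_sales)
```

Moving to the coarser grain shrinks the table but discards per-transaction detail: individual transaction IDs can no longer be recovered from `daily_sales`.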

Practical Applications in Power BI

 Dimensions and measures are essential for building reports and dashboards. In Power BI,
dimensions usually come from columns in your data tables, while measures are either
implicit (a numeric column that Power BI aggregates automatically) or created explicitly
using DAX.

 Hierarchies allow users to explore data interactively, drilling down to more detailed views
within a chart. Power BI supports dynamic exploration using hierarchies, providing an easy
way to shift from overview to detail.

 Understanding the grain of your data helps you manage your data sources effectively. For
example, when importing data into Power BI from a database, the grain will dictate how
much data you pull in and at what level of detail you analyze it.

In data visualization, understanding human perception is essential for creating effective visuals. Two
key concepts related to this are preattentive attributes and the goldfish effect, both of which
influence how users process and retain information from visual representations. Let’s break these
down:

1. Preattentive Attributes

 Definition: Preattentive attributes are visual properties that the human brain processes
almost instantly and subconsciously, within milliseconds, before conscious attention is
directed to the entire image. These attributes help us quickly focus on important aspects of
visual data.

 Examples:

o Color: Using a bright color to highlight a critical number in a dashboard.

o Size: Making a large bar in a bar chart stand out to show dominance in a particular
category.

o Shape: Different shapes (e.g., circles vs. squares) can be used to signify different
categories or values.

o Position: Objects placed near the center of a layout are often perceived as more
important.

o Orientation: A tilted line or arrow can immediately catch attention in a chart.

 Why it’s important:


o When used correctly, preattentive attributes can guide users' attention to key data
points without overwhelming them. Effective use of these attributes makes
visualizations easier to comprehend and more impactful.

o In Power BI, for example, you can use contrasting colors or varying sizes to highlight
certain measures or dimensions in a dashboard to draw users' eyes to important
metrics.

 Common Preattentive Attributes:

o Color (Hue, Saturation)

o Size (Length, Area, Volume)

o Shape (Circles, Squares, etc.)

o Position on Page (Near or far, left or right)

o Motion (Movement captures immediate attention)

o Orientation (Slant or rotation of objects)

o Line Width (Thicker or thinner lines)

o Texture (Solid vs. dashed lines)

 Application in Power BI:

o Use contrast (like red vs. gray) to highlight critical KPIs in reports.

o Adjust size to emphasize the most important metric, like making key bars in a chart
taller.

o Use positioning to place critical visuals in the center or at the top-left corner of a
report, as these areas naturally attract attention.
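
The contrast idea can be expressed as a small rule: give one accent color to the value that matters and a muted color to everything else. Below is a minimal Python sketch. The hex colors and the function name `highlight_colors` are arbitrary choices for illustration, and the resulting list would still have to be passed to whatever charting tool is in use.

```python
def highlight_colors(values, highlight="#d62728", muted="#b0b0b0"):
    """Return one color per value: the maximum gets the highlight color.

    Ties are all highlighted, which may or may not be desirable.
    """
    if not values:
        return []
    top = max(values)
    return [highlight if value == top else muted for value in values]


kpi_values = [42, 87, 55, 61]  # hypothetical KPI values, one per bar
colors = highlight_colors(kpi_values)
print(colors)
```

Only the second bar receives the accent color, so it is the one element that differs from its neighbors on a preattentive attribute.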

2. Goldfish Effect

 Definition: The "goldfish effect" is a concept that describes the declining human attention
span, likening it to the supposed attention span of a goldfish (a widely quoted figure of
around 8 seconds that is not well supported by research). In data visualization, the practical
point is that people often focus on a visualization for only a very short period, so your
visual needs to communicate its key message quickly.

 Why it’s important:

o Since users have limited attention spans, particularly in today’s fast-paced digital
environment, your visualizations need to capture their attention quickly and
effectively. If the visual is too complex or cluttered, users might lose interest before
getting the insights.

 Application in Power BI:

o Simplify visualizations: Avoid overloading a single chart with too many elements. For
example, using 10 different colors in a bar chart could overwhelm the user, causing
them to lose focus.

o Direct attention: Use preattentive attributes like color or size to ensure that the
most critical information stands out immediately.

o Use storytelling: Guide the viewer through a narrative, emphasizing key takeaways
early on in the visualization so they don’t have to work hard to get insights.

o Minimalism: Design your reports and dashboards with simplicity in mind. Use clean
layouts, avoid excessive labels, and ensure that each element has a clear purpose.
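
One common way to simplify a chart with too many categories is to keep the largest few and fold the rest into a single bucket. The sketch below shows this in Python. The function name, the "Other" label, and the sample numbers are assumptions for illustration.

```python
def top_n_with_other(totals, n=5, other_label="Other"):
    """Keep the n largest categories and sum the remainder into one bucket.

    Assumes no existing category is already named `other_label`.
    """
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    result = dict(head)
    if tail:
        result[other_label] = sum(value for _, value in tail)
    return result


# Hypothetical sales by product line.
sales = {"A": 50, "B": 40, "C": 5, "D": 3, "E": 2}
simplified = top_n_with_other(sales, n=2)
print(simplified)
```

Five bars become three, and the overall total is preserved, so the chart stays honest while demanding less attention.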

Key Takeaways for Data Visualization:

 Preattentive attributes help users focus on what’s important without consciously thinking
about it. Effective use of color, size, and position can significantly improve a visualization’s
clarity.

 The goldfish effect reminds designers that attention is a limited resource, so visualizations
must communicate key insights quickly and simply.

In data visualization, choosing the right type of visual is critical for effectively conveying your data.
Each type of visual is suited for specific types of data, relationships, and insights you want to
communicate. Here's a guide to different types of visualizations, and examples of which is best suited
for particular scenarios:

1. Comparing Categories or Values

Bar Charts

 Best for: Comparing discrete categories, showing quantities across different groups.

 Example: Comparing sales across different regions or product categories.

 Horizontal Bar Chart: Useful when category names are long or when you have many
categories.

Column Charts

 Best for: Comparing values across categories when the focus is on the magnitude (e.g.,
monthly sales, product performance).

 Example: Comparing monthly revenue or sales performance for different products.

Stacked Bar/Column Charts

 Best for: Showing the composition of different categories within a whole, while still allowing
for comparison across groups.

 Example: Breaking down total sales by region into product categories.

Bullet Charts

 Best for: Comparing performance against a target or goal, often in dashboards.

 Example: Displaying current sales performance versus target sales.


2. Trends Over Time

Line Charts

 Best for: Showing trends over time for continuous data and illustrating patterns and
relationships.

 Example: Plotting stock prices, sales over months, or website traffic trends.

Area Charts

 Best for: Visualizing the magnitude of change over time, where the focus is on total volume
or accumulated trends.

 Example: Showing website traffic breakdown over time (e.g., organic, referral, and direct
traffic).

Sparkline Charts

 Best for: Showing a simple, condensed view of trends in a small space, typically without axes.

 Example: Adding a quick sales trendline next to key performance indicators in a report.
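
The idea of a sparkline, a trend squeezed into a tiny space with no axes, can be imitated in plain text. The following Python sketch maps each value to one of eight Unicode block characters. It only illustrates the concept and is unrelated to how Power BI renders sparklines.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest


def sparkline(values):
    """Render a sequence of numbers as a one-line text sparkline."""
    if not values:
        return ""
    low, high = min(values), max(values)
    if high == low:
        # A flat series has no trend to show; use the lowest block.
        return BLOCKS[0] * len(values)
    scale = (len(BLOCKS) - 1) / (high - low)
    # Adding 0.5 before truncating rounds to the nearest block index.
    return "".join(BLOCKS[int((value - low) * scale + 0.5)] for value in values)


monthly_sales = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical values
print(sparkline(monthly_sales))
```

Because each series is scaled to its own minimum and maximum, sparklines show shape rather than absolute magnitude, which is why they usually sit next to the actual KPI value.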

3. Distributions

Histogram

 Best for: Visualizing the distribution of a single numeric variable, showing the frequency of
different ranges.

 Example: Displaying the distribution of customer ages or order values.
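
A histogram is built by assigning each value to a range (bin) and counting how many values land in each bin. Here is a minimal Python sketch with fixed-width bins. The ages, the bin width of 10, and the function name are illustrative assumptions.

```python
def histogram_counts(values, bin_width, start=0):
    """Count values per fixed-width bin; returns {bin_lower_edge: count}."""
    counts = {}
    for value in values:
        # Floor the value to the lower edge of the bin that contains it.
        lower = start + ((value - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


ages = [23, 27, 31, 35, 38, 41, 45, 52, 29, 33]  # hypothetical customer ages
age_bins = histogram_counts(ages, bin_width=10, start=20)
print(age_bins)
```

Each key is the lower edge of a bin, so `30` stands for ages 30 to 39. The choice of bin width changes the shape the reader sees, so it is worth trying more than one.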

Box Plot (Box-and-Whisker Plot)

 Best for: Displaying the distribution of data and identifying outliers, showing medians,
quartiles, and variability.

 Example: Analyzing the spread of salaries in a company or distribution of sales across
different branches.
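
The numbers behind a box plot can be computed directly. The sketch below uses Python's `statistics.quantiles` for the quartiles and flags outliers with the common 1.5 × IQR convention. That threshold is an assumption added here, not something these notes specify, and the salary figures are made up.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, IQR and outliers (1.5 * IQR rule) for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


salaries = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical, in arbitrary units
summary = box_plot_summary(salaries)
print(summary)
```

Different tools use slightly different quartile methods, so the exact quartile values may not match another package on small samples.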

Violin Plot (Advanced)

 Best for: Showing the distribution of the data with its density, similar to a box plot but with
more detail.

 Example: Visualizing customer satisfaction scores distribution across different regions.

4. Part-to-Whole Relationships

Pie Charts

 Best for: Showing proportions or percentages of a whole, but only effective with a small
number of categories (3-5).

 Example: Displaying the percentage of total sales coming from each product category.

 Caution: Avoid pie charts if there are too many slices or if the differences between them are
subtle.

Donut Charts

 Best for: Like a pie chart but with a center cut out, allowing room for more data or text in the
center.

 Example: Displaying the percentage breakdown of a company's revenue streams with the
total amount in the center.

Treemaps

 Best for: Displaying hierarchical data as a part-to-whole relationship using rectangles of
varying size to represent categories.

 Example: Visualizing sales contribution by product categories and subcategories.

Stacked Area Charts

 Best for: Showing how different categories contribute to a whole over time.

 Example: Displaying how different sales channels (online, in-store) contribute to overall
revenue month by month.

5. Relationships and Correlations

Scatter Plots

 Best for: Showing the relationship between two continuous variables to identify correlations,
clusters, or trends.

 Example: Examining the relationship between advertising spend and sales, or age and
income.
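
A scatter plot is often paired with a correlation coefficient that summarizes the linear relationship it shows. The following Python sketch computes Pearson's r by hand. The sample figures are invented and `pearson_r` is an illustrative helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


ad_spend = [10, 20, 30, 40]  # hypothetical advertising spend
revenue = [25, 45, 65, 85]   # hypothetical sales, perfectly linear in spend
r = pearson_r(ad_spend, revenue)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship. The coefficient can miss curved patterns and clusters, which is why the scatter plot itself should still be inspected.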

Bubble Charts

 Best for: Adding a third variable to a scatter plot through the size of the bubble, allowing for
multi-dimensional analysis.

 Example: Comparing marketing spend vs. revenue, where bubble size represents the number
of customers in each region.
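
When a third variable is encoded as bubble size, a widely used convention is to make the bubble's area, not its radius, proportional to the value. Otherwise large values look exaggerated. The Python sketch below applies that convention. The function name and the `max_radius` parameter are hypothetical, and the charting tool's own sizing behaviour should be checked before relying on it.

```python
from math import sqrt


def bubble_radii(values, max_radius=30.0):
    """Scale radii so that bubble AREA is proportional to each value.

    Assumes all values are non-negative and at least one is positive.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    return [max_radius * sqrt(value / largest) for value in values]


customers = [100, 25]  # hypothetical customer counts for two regions
radii = bubble_radii(customers, max_radius=10.0)
print(radii)
```

A region with a quarter of the customers gets half the radius, which corresponds to a quarter of the area.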

6. Processes and Sequential Changes

Funnel Charts

 Best for: Displaying a process that narrows as it progresses, useful for showing stages in a
process or pipeline.

 Example: Visualizing conversion rates through a sales funnel from initial leads to closed
deals.
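
The conversion rates a funnel chart displays are simple ratios between consecutive stages. Here is a small Python sketch. The stage names and counts are made up for illustration.

```python
# Hypothetical sales funnel: (stage name, number of prospects at that stage).
funnel = [
    ("Leads", 1000),
    ("Qualified", 400),
    ("Proposals", 200),
    ("Closed deals", 50),
]


def stage_conversion(funnel):
    """Conversion rate into each stage from the stage before it."""
    rates = []
    for (_, previous_count), (name, count) in zip(funnel, funnel[1:]):
        rate = count / previous_count if previous_count else 0.0
        rates.append((name, rate))
    return rates


conversion = stage_conversion(funnel)
overall = funnel[-1][1] / funnel[0][1]  # share of leads that became deals
print(conversion)
print(overall)
```

Showing the stage-to-stage rate alongside the overall rate makes it easier to see which step loses the most prospects.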

Waterfall Charts

 Best for: Showing how an initial value increases or decreases over time through a series of
changes.

 Example: Analyzing the breakdown of profits by various factors, like revenue, costs, and
other expenses.
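
Each bar in a waterfall chart starts where the previous one ended, so the underlying calculation is a running total. The Python sketch below derives the start and end of every bar. The labels and amounts are hypothetical.

```python
def waterfall_steps(start, changes):
    """Return ([(label, bar_start, bar_end), ...], final_value)."""
    steps = []
    running = start
    for label, delta in changes:
        steps.append((label, running, running + delta))
        running += delta
    return steps, running


# Hypothetical profit breakdown: positive values add, negative values subtract.
changes = [("Revenue", 1000), ("Costs", -600), ("Other expenses", -150)]
steps, profit = waterfall_steps(0, changes)
print(steps)
print(profit)
```

The final running total (250 here) is what the closing bar of the chart would show.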

Gantt Charts

 Best for: Displaying the duration and progression of tasks in a project over time.

 Example: Tracking the progress of a project’s timeline, showing when tasks start, end, and
overlap.

7. Geospatial Data

Maps (Choropleth)

 Best for: Visualizing data that has a geographical component, using color to represent
intensity or magnitude across regions.

 Example: Displaying sales by country or region, highlighting areas with the most customers.

Bubble Maps

 Best for: Displaying numeric data on a map using bubbles to represent magnitude or size at
specific geographic points.

 Example: Showing the number of stores in various cities, where the size of the bubble
represents the number of stores.

Heat Maps

 Best for: Displaying intensity of data in two dimensions using color, useful for showing
density.

 Example: Visualizing customer concentration or website traffic on a geographical map.

Best Visual Given a Scenario

1. Scenario: Comparing sales performance across different regions

o Best visual: Bar chart (or a map if the regions are geographically distinct)

o Why: A bar chart clearly shows the differences between regions and is effective for
comparison.

2. Scenario: Analyzing monthly sales trends over the past year

o Best visual: Line chart

o Why: Line charts are ideal for displaying trends over time, making it easy to spot
increases, decreases, or seasonality.

3. Scenario: Showing how total revenue is distributed across product categories

o Best visual: Treemap or Pie Chart (if few categories)

o Why: Both visuals are excellent for showing part-to-whole relationships, but
treemaps allow for hierarchical data representation as well.

4. Scenario: Visualizing the relationship between marketing spend and revenue across
different regions

o Best visual: Scatter plot


o Why: Scatter plots show how two variables relate to one another, making them ideal
for correlation analysis.

5. Scenario: Visualizing customer age distribution

o Best visual: Histogram

o Why: Histograms display frequency distribution, which is perfect for showing how
age groups are spread across a customer base.
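
The scenarios above can be condensed into a small lookup table. The Python sketch below is only a memory aid built from the five scenarios in this section. The goal labels and the function `suggest_chart` are invented for illustration, and real chart choice still depends on the data and the audience.

```python
# Analysis goal -> visual recommended in the scenarios above.
CHART_GUIDE = {
    "compare categories": "bar chart",
    "trend over time": "line chart",
    "part-to-whole": "treemap or pie chart",
    "relationship": "scatter plot",
    "distribution": "histogram",
}


def suggest_chart(goal):
    """Look up a suggested visual for a goal; case and spacing are ignored."""
    return CHART_GUIDE.get(goal.strip().lower(), "no suggestion in this guide")


print(suggest_chart("Trend over time"))
print(suggest_chart("ranking"))
```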
