DV Notes ET - 1

Uploaded by

Rahul Mishra
Copyright: © All Rights Reserved

…..

an online tutoring platform that wishes to analyze and visualize its website traffic, student engagement, and tutor performance.

Tools Used: Postman, Tableau Prep, Tableau Desktop, and Google Colab.

Describe the steps you would take to create an end-to-end data visualization cycle, mentioning which tool you would use at each step and why

Step 1: Data Collection (Tool: Postman)

The first step in any data visualization project is gathering the raw data. EduTutor will likely have multiple sources of data such as website traffic logs (perhaps from a tool like Google Analytics), student engagement data (from the learning management system), and tutor performance metrics (which could be stored in a separate system or database). These data sources might expose APIs for real-time or batch data collection.

Using Postman, we can interact with these APIs to request data, test endpoints, and make sure we are retrieving the required data in formats such as JSON or CSV. Postman is especially useful because it allows you to structure requests with the correct authorization tokens, headers, and parameters. You can also save requests for reuse or even schedule them using Postman’s built-in tools.

Example:
For website traffic, a GET request to the Google Analytics API can retrieve metrics such as session counts, user demographics, and traffic sources. For student engagement, you might query the LMS API to get data on attendance, test scores, or participation metrics. For tutor performance, an internal API could provide data such as student feedback ratings, hours worked, or success rates in courses.

Postman is highly effective here because it supports not just GET and POST requests but also more advanced API workflows such as pagination handling, response validation, and chaining multiple API calls. It also provides a UI for viewing and saving responses, which can later be downloaded or exported to other tools.

Step 2: Data Preparation & Cleaning (Tool: Tableau Prep)

Once data has been collected from various APIs, it is typically in raw form, with inconsistencies, duplicates, missing values, and potentially incorrect formats. Data preparation is essential to ensure the data is clean, relevant, and ready for analysis. For this step, we use Tableau Prep, a tool specifically designed for data cleaning and transformation.

Using Tableau Prep, we can import data from the files we exported from Postman (like JSON or CSV). Tableau Prep allows us to join, filter, clean, and aggregate the data into a usable format. Key tasks that might be performed here include:

• Handling Missing Values: For example, if some students have not submitted assignments, the engagement data might have missing values. We can impute missing data (if appropriate) or remove incomplete rows.
• Removing Duplicates: Data collected from different sources might have duplicate records. For instance, if a student is enrolled in multiple classes, the engagement data may contain duplicate entries, which can be removed using Tableau Prep's deduplication features.
• Combining Datasets: We will need to merge datasets (e.g., joining website traffic data with student engagement and tutor performance). Tableau Prep provides a visual interface for joining or blending datasets based on common fields like student IDs or session timestamps.

Example:
If the website traffic dataset has timestamps, and the engagement data includes interaction times, we can join them based on a common time field to analyze how traffic affects engagement in real time.

Tableau Prep also allows for calculated fields, so if we need to create derived metrics like engagement scores (e.g., combining quiz scores, attendance rates, and interaction frequency), we can calculate these fields in Tableau Prep before moving on to the visualization stage.

Step 3: Exploratory Data Analysis (EDA) (Tool: Google Colab)

Before jumping into final visualizations, performing exploratory data analysis (EDA) is a crucial step. This involves generating initial insights and understanding the data’s structure, distribution, correlations, and trends. For this purpose, we can leverage Google Colab with Python’s powerful libraries such as Pandas, Matplotlib, and Seaborn.

Google Colab is particularly useful because it allows you to write and execute Python code in a cloud environment, meaning you don’t need to worry about your local machine’s computational limits. Additionally, Python offers more flexibility for customized analysis compared to Tableau’s drag-and-drop interface.

Some key tasks during EDA include:

• Data Summarization: Using Pandas to calculate summary statistics like the mean, median, standard deviation, and distribution of values across different variables (e.g., average student engagement score or the most frequent tutor performance rating).
• Correlations: Using Seaborn to create correlation heatmaps, showing how different variables (e.g., traffic and engagement, or tutor ratings and student success) relate to each other.
• Visual Exploration: Plotting histograms, bar charts, and scatter plots to explore relationships between key variables. This helps in identifying outliers, trends, or clusters within the data.

Example:
A scatter plot could show the relationship between tutor performance ratings and student success rates, revealing if higher-rated tutors consistently lead to better student outcomes.

The insights gained during EDA will inform which visualizations to use in Tableau, helping to ensure that we focus on the most important trends and relationships when creating the final dashboard.

Step 4: Data Visualization (Tool: Tableau Desktop)
The final and most crucial step in the data visualization cycle is creating meaningful and interactive visualizations. Tableau Desktop is ideal for this step because of its versatility in building interactive dashboards with a wide range of visualization types (e.g., bar charts, line graphs, scatter plots, and geographic maps).

In this stage, you will create visualizations based on the key metrics identified during the previous steps:

• Line Charts: To show the trend of website traffic over time (daily, weekly, or monthly). These could help identify peak traffic periods or patterns over the academic year.
• Bar Charts: To visualize student engagement across different activities (e.g., attendance, quiz scores, or forum participation). You can use bar charts to compare engagement across different student segments or tutor groups.
• Scatter Plots: To explore the relationship between tutor performance ratings and student success. A scatter plot can help identify which tutors are performing better and if there is any correlation with student achievements.

Example:
A dashboard could combine these charts to provide a holistic view of how traffic, engagement, and tutor performance are interrelated. Filters can be added to allow users to focus on specific time periods, tutor groups, or student cohorts.

Tableau’s interactive features (such as tooltips, filters, and drill-down capabilities) make it easy for stakeholders to explore the data and gain insights specific to their interests.

Step 5: Final Reporting and Sharing (Tool: Tableau Desktop)

Once the visualizations are complete, they need to be shared with stakeholders such as the EduTutor management team or external clients. Tableau makes it easy to compile visualizations into dashboards or storyboards, which can be shared as interactive reports or exported as static images or PDFs.

For wider distribution, you can publish the dashboard to Tableau Public or Tableau Server, enabling anyone with a link to interact with the visualizations online. Tableau also allows you to control access permissions, ensuring sensitive data is only available to authorized users.

Potential challenges we might encounter in this project and solutions to handle them.

1. Data Quality Issues

One of the most common challenges in any data project is the issue of data quality. Data from various APIs might be incomplete, inconsistent, or contain errors such as incorrect timestamps, missing values, or duplicate records.

Solution:
To address these issues, a robust data cleaning process is required, which can be achieved through Tableau Prep. Data cleaning involves identifying and correcting errors in the dataset and ensuring that all fields are consistent (e.g., using the same date format across all records). Tableau Prep’s user-friendly interface allows for detailed transformations, such as filtering out bad data or imputing missing values using averages or medians.

2. API Limitations

When working with APIs, rate limits or incomplete responses can present a challenge. Some APIs may restrict the number of requests you can make per minute, or they may not provide all the data you need in a single response.

Solution:
To overcome this, you can use Postman’s collection runner to automate API requests in batches over time, ensuring you gather all necessary data without hitting API rate limits. Additionally, using pagination techniques within Postman can help retrieve large datasets over multiple requests.

3. Large Data Volumes

As EduTutor grows, the amount of data collected (e.g., from website traffic or student interactions) might become too large to handle efficiently, slowing down the process of loading, cleaning, and visualizing data.

Solution:
For large datasets, consider using aggregations or sampling. Aggregating data (e.g., daily instead of hourly traffic counts) reduces the data size without losing significant insights. Also, processing large datasets in Google Colab can provide cloud-based computational power, alleviating the strain on local resources.

4. Tool Integration and Workflow Management

Ensuring smooth data flow between Postman, Tableau Prep, Google Colab, and Tableau Desktop could be challenging, as each tool serves a different function.

Solution:
Clearly define the workflow and automate as many processes as possible. For example, once the data is collected in Postman, automate its transformation into a format suitable for Tableau Prep. Use Google Colab to handle complex calculations before sending the processed data to Tableau Desktop for final visualization.

2. …… is a sports gear retailer, and they have data on their monthly sales and customer satisfaction ratings. They want to explore:
Relationships between monthly sales and customer satisfaction ratings.
How sales in each category (Footwear, Apparel, Equipment) are contributing to the total monthly sales.

- Explain when and why you would use Dual Axis in Tableau for creating any visualization for HighPeak Sports Co. for the tasks they want to achieve with their data

Understanding Dual Axis in Tableau:
A Dual Axis in Tableau is a feature that allows you to overlay two different measures on the same graph while using two separate axes, one on the left and one on the right. This is particularly useful when the measures you want to compare have different scales or units but are related in some way.

For HighPeak Sports Co., comparing monthly sales and customer satisfaction ratings requires a thoughtful visualization approach, as these two measures are quite different in nature. Monthly sales are typically measured in currency, while customer satisfaction is often measured on a scale (e.g., from 1 to 10 or 0 to 100).

When to Use Dual Axis in Tableau for HighPeak Sports:

Dual Axis visualizations are ideal in the following situations, relevant to HighPeak Sports Co.:

1. Different Units of Measurement:
The company’s two key measures, monthly sales (in dollars or another currency) and customer satisfaction (in a rating system), use different units. Plotting them on the same graph using a Dual Axis allows both metrics to be visualized simultaneously, with one measure on the primary axis and the other on the secondary axis.
2. Different Scales of Data:
Monthly sales values may vary significantly, perhaps ranging from thousands to millions of dollars, while customer satisfaction ratings are typically bounded by a smaller scale (e.g., 0–10 or 0–100). A dual-axis plot allows each variable to be scaled appropriately. If we plotted both variables on the same axis, the lower magnitude of satisfaction ratings might be visually overshadowed by the sales data, making the relationship between the two harder to discern.
3. Comparing Trends Over Time:
HighPeak Sports Co. wants to explore the relationship between sales and customer satisfaction over time. A line chart with a Dual Axis allows you to visualize how these two variables change month by month. For instance, you can see if higher sales correlate with better customer satisfaction, or if there are periods where sales spike but customer satisfaction drops.
4. Overlaying Different Data Types:
You might want to represent sales data as bars (which give a clear visual representation of volume) and satisfaction ratings as a line (to emphasize trends). A Dual Axis lets you combine these two types of visualizations in one chart. For example, sales can be represented with vertical bars for each month, while customer satisfaction ratings can be represented with a line running across the months, providing a quick, intuitive comparison of both measures.

Why Dual Axis is Useful for HighPeak Sports Co.:

1. Analyzing Relationships:
One of the main objectives is to analyze the relationship between monthly sales and customer satisfaction. For instance, if customer satisfaction ratings decrease during months with higher sales, this could indicate issues with customer experience during high-demand periods (perhaps slower service or product shortages). Conversely, if both metrics increase together, this might indicate that successful sales campaigns or improved product offerings are positively impacting customer sentiment. A Dual Axis visualization allows for an intuitive side-by-side comparison of these two related but distinct measures. You can easily identify patterns or discrepancies, helping decision-makers at HighPeak Sports Co. determine if customer satisfaction is driving sales or if poor customer service is hindering repeat purchases.
2. Uncovering Hidden Insights:
The ability to view both metrics in one chart, each with its own axis, uncovers insights that might be missed if the data were visualized separately. For instance, suppose you observe a trend where months with lower customer satisfaction ratings correlate with higher sales. This might suggest that aggressive marketing campaigns are driving sales, but at the expense of customer experience. Such insights could lead to strategies that focus not only on boosting sales but also on maintaining high customer satisfaction.
3. Enabling Better Decision-Making:
HighPeak Sports Co.’s management can make more informed decisions based on the visual relationship between sales and satisfaction. If the visualization shows a strong positive correlation, it suggests that improving customer satisfaction (through better products or services) could lead to higher sales. Conversely, if there is no relationship, management might decide to focus on other factors like pricing or product variety to improve sales.

Example of Dual Axis Implementation in Tableau for HighPeak Sports:

• Primary Axis (Left): Monthly sales are plotted as bars. Each bar represents the sales for a given month.
• Secondary Axis (Right): Customer satisfaction ratings are plotted as a line chart overlaying the bars. This helps to visually track how customer satisfaction fluctuates relative to sales.

This setup allows HighPeak Sports Co. to easily observe how changes in sales are associated with shifts in customer satisfaction, uncovering actionable business insights.

- What does Fixed Level of Detail (LOD) do in Tableau? How could it be useful for HighPeak Sports Co. in analyzing their data?

Understanding Fixed Level of Detail (LOD) in Tableau:

Level of Detail (LOD) expressions in Tableau allow you to control the granularity (or detail) of calculations independently of the visualization. Normally, Tableau’s default behavior is to aggregate data based on the dimensions used in the view. However, LOD expressions let you perform calculations at different levels of granularity than the default.

There are three main types of LOD expressions in Tableau:

1. Fixed LOD: Calculates a value at a specified level of detail, regardless of what is displayed in the view.
2. Include LOD: Adds extra dimensions to the calculation, making it more granular.
3. Exclude LOD: Excludes certain dimensions from the calculation, making it less granular.
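
As a rough analogy outside Tableau, the Fixed case can be sketched in Python with pandas (the library these notes already use in Google Colab): an aggregate is computed at the chosen level and repeated on every row, whatever finer breakdown the table shows. The data, column names, and figures below are invented for illustration, and this only approximates the behaviour described above; it is not how Tableau itself is implemented.

```python
import pandas as pd

# Hypothetical sample data: names and numbers are made up for illustration.
sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "category": ["Footwear", "Apparel", "Equipment"] * 2,
    "sales": [100.0, 60.0, 40.0, 90.0, 80.0, 30.0],
})

# Analogue of a Fixed LOD on month: the monthly total is computed once per
# month and repeated on every row, even though rows are split by category.
sales["month_total"] = sales.groupby("month")["sales"].transform("sum")

# Each category's contribution to its month's total.
sales["share_of_month"] = sales["sales"] / sales["month_total"]

print(sales)
```

The month_total column stays the same for every row of a given month even though the rows are broken down by category, which mirrors how a Fixed calculation ignores the other dimensions in the view.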
A Fixed LOD expression is particularly useful when you want to perform calculations at a specific, fixed level of granularity, regardless of what dimensions are used in the visualization.

How Fixed LOD Works:

With a Fixed LOD expression, you can perform calculations on data at a defined level of granularity, even if the current visualization is at a different level. For example, you might want to calculate the total monthly sales across all product categories, even when you are looking at sales broken down by category in the visualization.

Syntax Example:
{FIXED [Dimension]: SUM([Measure])}

For example:

• {FIXED [Month]: SUM([Sales])} would calculate the total sales per month, regardless of what other dimensions (like product category) are included in the visualization.
• {FIXED [Customer]: AVG([Satisfaction])} would calculate the average satisfaction score for each customer, even if you’re viewing data at a different level, such as by region or month.

How Fixed LOD is Useful for HighPeak Sports Co.:

1. Consistent Aggregation Across Categories: HighPeak Sports Co. is interested in understanding how sales from different categories (Footwear, Apparel, Equipment) contribute to total monthly sales. A Fixed LOD expression could be used to calculate total monthly sales across all categories, while still allowing the visualization to show a breakdown of sales by category.

For example, you could use a Fixed LOD to calculate the total monthly sales, regardless of category, and then compare that total to the sales for individual categories like Footwear or Apparel. This would help you understand which categories are driving sales in a given month, and how much they contribute to the overall total.

Example:
{FIXED [Month]: SUM([Sales])} would give the total sales for each month, regardless of the product category. Then, by adding the product category dimension to the visualization, you can break down the contribution of each category to the monthly total while still displaying the total monthly sales.

2. Customer Satisfaction Analysis: HighPeak Sports Co. might want to calculate the overall average customer satisfaction for each month, regardless of which store or product line the customer interacted with. Using a Fixed LOD expression, you can calculate the average customer satisfaction at a fixed level of detail (e.g., by month or by customer) and then visualize how satisfaction varies over time or across different categories.

Example:
{FIXED [Customer ID]: AVG([Satisfaction])} could be used to calculate the average satisfaction score for each customer. This would allow the company to analyze customer satisfaction trends at the customer level, even if the visualization is showing data at a broader level (e.g., by month or by store location).

3. Granular Analysis with Aggregated Visualizations: HighPeak Sports Co. may want to perform analyses that require different levels of granularity. For instance, they might want to see total monthly sales across all categories but also analyze average sales per product within each category. A Fixed LOD expression allows you to mix granular calculations with aggregated data, providing deeper insights without altering the structure of the visualization.

Example:
You could use a Fixed LOD to calculate the average sale per product while still displaying total sales per category. This would allow HighPeak Sports Co. to compare the performance of individual products to the overall performance of their category.

Advantages of Using Fixed LOD for HighPeak Sports Co.:

1. Custom Aggregation: LOD expressions give HighPeak Sports the flexibility to aggregate data at different levels, independent of the current visualization. This is particularly useful when analyzing hierarchical data (e.g., categories and subcategories of products).
2. Simplified Calculations: Rather than creating multiple visualizations at different levels of detail, LOD expressions allow HighPeak Sports to use a single visualization while performing detailed analyses behind the scenes.
3. Improved Decision-Making: By gaining a clearer understanding of how specific categories or time periods contribute to the overall sales or satisfaction metrics, HighPeak Sports can make more informed decisions about inventory management, pricing strategies, or customer service improvements.

Fashion Fiesta

Background:
Fashion Fiesta is a leading fashion retail chain in India. It operates 200 stores across the country. As part of their annual review, they are evaluating the sales performance for the year. The data team has collected data related to monthly sales, customer footfall, and average transaction values for each store.

Broad Data Fields:

• Monthly sales (INR) for each store.
• Monthly customer footfalls for each store.
• Average transaction value (INR) for each store.

Imagine you have the above data in a raw form. Detail the steps you would take in the data visualization lifecycle to provide insightful visualizations for Fashion Fiesta's annual review.
The data visualization lifecycle involves several critical steps that transform raw data into meaningful and actionable insights. Below is a breakdown of how you would approach this for Fashion Fiesta:

Step 1: Data Collection

The first step involves gathering raw data from various sources such as point-of-sale (POS) systems, customer management systems, or external databases. In this case, we have access to three critical data points for each store: monthly sales, customer footfall, and average transaction value. These could be in the form of Excel files, CSV files, or an SQL database.

At this stage, you must ensure the accuracy and completeness of the data collected from all 200 stores. If there are missing data points or inconsistencies (e.g., missing months for certain stores), these need to be addressed in the later stages.

Actions:

• Extract and load data from the systems into a structured format.
• Identify and flag missing or inconsistent data entries.

Step 2: Data Cleaning and Preprocessing

The data gathered is likely to have inconsistencies such as missing values, duplicated entries, or incorrect formats. For example, some stores might not have submitted complete sales data, or customer footfalls may have been inaccurately recorded.

In this step, the following tasks are performed:

• Handling Missing Data: If monthly sales or customer footfalls for a store are missing for certain months, interpolation or other methods can be used to estimate the missing data. Alternatively, the specific stores with incomplete data can be flagged and excluded from certain analyses.
• Formatting: Ensure that the data types are consistent across all entries (e.g., ensure that all sales figures are recorded in the same currency and all dates follow a consistent format).
• Data Transformation: Depending on the needs of the analysis, the raw data may need to be aggregated. For example, monthly sales data could be rolled up into quarterly or annual totals for higher-level analysis.

Tools:
For this step, tools like Tableau Prep or Google Colab (with Python libraries such as Pandas for data manipulation) can be used to clean and preprocess the data.

Step 3: Exploratory Data Analysis (EDA)

Once the data is clean and ready, it’s time for Exploratory Data Analysis (EDA). This phase is essential to understand the basic trends, patterns, and anomalies in the dataset before moving to visualization. Some key actions in this stage include:

• Summary Statistics: Calculate the mean, median, maximum, and minimum for each metric (monthly sales, customer footfall, and average transaction value) across all stores.
• Correlation Analysis: Check for any correlations between the variables. For example, does an increase in footfall correlate with higher sales? Does a higher average transaction value indicate more spending per customer?
• Outlier Detection: Identify stores that are outliers in terms of sales or customer footfall. These could represent underperforming stores or stores that are significantly outperforming the average.

Tools:
For this, you can use Python (through Google Colab) for generating quick insights using libraries like Pandas and Matplotlib/Seaborn for basic visualizations such as scatter plots, histograms, and correlation matrices.

Insights from EDA:

• High-Level Patterns: Identify general trends, such as whether customer footfall has been increasing over time or whether certain stores consistently outperform others.
• Regional Performance: Are stores in certain regions or cities performing better than others? This could lead to more localized insights that could influence marketing or inventory strategies.

Step 4: Data Modeling (if necessary)

If Fashion Fiesta wants to predict future performance or understand deeper relationships, this is the stage where predictive models or advanced analytics are applied. This could involve:

• Sales Forecasting: Building a model to predict future sales based on past trends.
• Customer Behavior Models: Understanding customer behavior by analyzing footfall data in conjunction with average transaction values.

While not always necessary in a standard data visualization lifecycle, this step can offer powerful predictive insights when needed.

Tools:
You can use Google Colab with libraries like Scikit-learn or Tableau’s built-in forecasting capabilities for this stage.

Step 5: Data Visualization Design and Creation

This is the most critical step where the cleaned data is transformed into actionable visualizations. For Fashion Fiesta, several key types of visualizations could be highly effective:

Visualizations for Monthly Sales:

• Line Charts: A line chart can display trends in monthly sales across all stores over time. This could help identify seasonal trends, such as increased sales during festive periods.
• Bar Charts: A bar chart comparing the sales performance of different stores can reveal which stores are performing above or below the company average. It could also show the sales contributions of different stores to the total company revenue.

Visualizations for Customer Footfall:

• Heatmaps: A heatmap showing customer footfalls across all stores would provide an at-a-glance view of which stores are attracting the most customers. This could also be broken down by region to see if specific areas are outperforming others in customer visits.

Visualizations for Average Transaction Value:

• Scatter Plots: A scatter plot comparing customer footfall with average transaction value per store could reveal patterns in customer behavior. For example, stores with higher footfalls might have lower average transaction values, while stores with fewer customers might see higher transaction values (indicating a more affluent customer base).

Interactive Dashboards:

• Combining all of the above visualizations into a dashboard allows management to interact with the data, exploring specific regions, time periods, or performance metrics. Tableau offers great flexibility in creating interactive, drill-down dashboards that allow users to filter by specific stores or regions and view details in real-time.

Tools:
Use Tableau Desktop to build these visualizations. Tableau’s drag-and-drop functionality, combined with its ability to handle large datasets, makes it well suited to building dashboards that can be easily shared with key stakeholders.

Step 6: Insights and Decision Making

Once the visualizations are complete, the next step is to interpret the findings and make data-driven decisions. Based on the visualizations, Fashion Fiesta’s management can identify key insights, such as:

• Top Performing Stores: Stores that consistently perform above average in sales and customer footfall.
• Underperforming Stores: Stores with low sales, low footfall, or low average transaction values, indicating potential areas for improvement (e.g., better marketing or inventory adjustments).
• Footfall vs. Sales: The correlation between footfall and sales can inform decisions on marketing campaigns. If stores with high footfall don’t correspond with high sales, this may indicate a need for improved conversion strategies (e.g., better store layouts, promotions, or staff training).
• Regional Insights: By breaking down performance by region, management can focus on improving operations in underperforming areas or invest more in high-performing areas.

Step 7: Reporting and Sharing

Finally, it’s important to share the results with the necessary stakeholders. Tableau makes it easy to share dashboards either via Tableau Server, Tableau Public, or by exporting them as PDFs or interactive reports. The ability to interact with the visualizations means that stakeholders can explore the data themselves, drilling down into specific regions or stores for deeper analysis.

Identify which of the data types provided are discrete and which are continuous. Are these data types dimensions or measures?

Understanding Discrete vs. Continuous Data:

• Discrete Data refers to data that consists of distinct, separate values. These values can be counted and are typically categorical. Examples include store names or months.
• Continuous Data refers to data that can take any value within a range and can be measured. This is typically numeric data like sales figures or customer footfall.

For the Fashion Fiesta dataset, the data types can be classified as follows:

1. Monthly Sales (INR):
o Continuous: This is a numeric value that can take any value within a range and represents the total sales for each store.
o Measure: Since it is a quantitative value used for calculations, it is a measure.
2. Monthly Customer Footfalls:
o Continuous: Tableau treats this numeric field as continuous by default, and it is plotted along a continuous axis. Strictly speaking, a count of visitors takes only non-negative whole-number values, so in the statistical sense it is a discrete count.
o Measure: This is also a quantitative measure.
3. Average Transaction Value (INR):
o Continuous: This value is calculated based on the total sales and number of transactions, and it can take any value within a range.
o Measure: It is a numeric measure representing the average sales per transaction.

Dimensions vs. Measures:

• Dimensions are fields used to slice and dice the data. They are typically categorical fields, like store names or product categories.
• Measures are numerical fields that can be aggregated (e.g., sales, revenue, or customer footfall).
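
The split between dimensions and measures can also be sketched in Python with pandas: dimensions act as the grouping keys, and measures are the numeric columns being aggregated. The store names, regions, and figures below are hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

# Hypothetical store-level data: all values are invented for illustration.
stores = pd.DataFrame({
    "store": ["S1", "S2", "S3", "S4"],              # dimension
    "region": ["North", "North", "South", "South"], # dimension
    "monthly_sales": [500000, 300000, 450000, 250000],  # measure (INR)
    "footfall": [2000, 1500, 1800, 1000],               # measure
})

# Slice by a dimension (region) and aggregate the measures.
by_region = stores.groupby("region")[["monthly_sales", "footfall"]].sum()

print(by_region)
```

Sums suit additive measures such as sales and footfall. An average such as average transaction value would normally be aggregated with a mean, or recomputed from totals, rather than summed.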
In the Fashion Fiesta case, store names, regions, and months would be classified as
dimensions, while monthly sales, customer footfall, and average transaction value are
measures because they can be aggregated and analyzed quantitatively.

Urban Heights

Background:
Urban Heights is a prominent real estate developer in India. They have constructed 10 new
residential complexes in Mumbai over the last year. The marketing team wants to understand
how apartment prices vary with square footage and amenities offered. They have collated
data on the apartment sales over the past year.

Broad Data Fields:

• Square footage of each sold apartment.
• Selling price (INR) of each apartment.
• List of amenities offered in each complex (e.g., swimming pool, gym, park, etc.).

How would you visualize the relationship between the square footage of apartments and
their selling prices? Which type of chart would be most appropriate and why?

To understand the relationship between apartment square footage and selling price, we need
to choose a visualization technique that allows for easy comparison and analysis of how these
two variables are related. The ideal chart for this purpose would be a scatter plot.

Why Use a Scatter Plot?

A scatter plot is the most effective way to visualize the relationship between two continuous
variables, in this case square footage and selling price. It allows us to plot individual data
points for each apartment, where one axis (the x-axis) represents the square footage and the
other axis (the y-axis) represents the selling price.

Key Benefits of Using a Scatter Plot:

1. Direct Visualization of Relationships:
A scatter plot shows how one variable changes as the other increases or decreases. In
the case of apartment sales, we are interested in understanding whether larger
apartments (with greater square footage) tend to sell at higher prices. A scatter plot
would help reveal this relationship immediately by showing whether there is a
positive correlation (i.e., as square footage increases, selling price increases).
2. Trend Identification:
A scatter plot also helps to identify any trends in the data. For example, if most data
points form a line or curve, it suggests a consistent relationship between square
footage and selling price. If there is no clear pattern, this could indicate that other
factors (such as location or amenities) are influencing prices more than square
footage.
3. Outlier Detection:
A scatter plot makes it easy to spot outliers, that is, data points that deviate significantly
from the overall trend. For example, an apartment with a large square footage but a
much lower price than expected might suggest an issue with that property (e.g., poor
location, poor quality, or lack of amenities). Conversely, a small apartment with a
very high selling price might indicate that it is in a premium location or offers highly
desirable amenities.
4. Potential for Regression Analysis:
A scatter plot can easily accommodate a regression line or trend line, which
provides a more formal analysis of the relationship between the two variables. A
regression line would quantify the relationship between square footage and selling
price, showing the average increase in price per additional square foot.

Step-by-Step Approach to Creating the Scatter Plot in Tableau

1. Prepare the Data:
Begin by ensuring that the data is clean and formatted correctly. This means checking
that both square footage and selling prices are continuous numeric values, and any
missing or erroneous data is handled appropriately.
2. Set Up the Scatter Plot:
o X-axis: Place the square footage of each apartment on the x-axis.
o Y-axis: Place the selling price on the y-axis.
o Color/Shape Encoding (optional): Use color or shape to differentiate between
different amenities offered or regions in Mumbai, allowing for an additional
layer of analysis. For example, apartments with premium amenities (e.g.,
swimming pools, gyms, etc.) might be colored differently from those without.
3. Add Trend Line (optional but useful):
After plotting the points, you can add a trend line in Tableau. This will allow you to
visually assess how closely selling prices follow square footage. A trend line can also
give you a formula that approximates the relationship between the two variables (e.g.,
for every additional 100 square feet, the price increases by INR X).
4. Adjust the Visualization:
o Adjust the axes if necessary to ensure that the chart is easy to read. You might
want to fix the scale of the axes to prevent outliers from distorting the overall
view of the data.
o Consider using tooltips to display detailed information about each data point
(e.g., the apartment's specific square footage, price, and the amenities offered)
when users hover over the points.

Why Other Types of Charts Are Less Suitable:

• Bar Charts:
While bar charts are excellent for comparing discrete categories (e.g., different
apartment complexes), they are not suitable for visualizing the relationship between
two continuous variables. A bar chart would only allow you to compare the total
selling price or square footage for specific categories (e.g., apartment types) rather
than show the direct relationship between price and size.
• Line Charts:
Line charts are best suited for visualizing data over time or showing cumulative
trends. In this case, because square footage and selling price are not time series data, a
line chart would not provide meaningful insights.
• Heat Maps:
Heat maps could be used to show regions of high or low price per square foot, but
they do not effectively convey the specific relationship between square footage and
price for individual apartments. Heat maps are better for aggregated or categorical
data.

Thus, the scatter plot remains the most effective choice for this analysis.

Based on the data types provided, differentiate between discrete and continuous data.
Identify which ones are dimensions and which ones are measures.

Understanding the difference between discrete and continuous data is crucial in data
visualization, as it impacts how you visualize and analyze the data. Here’s how the data types
from the "Urban Heights" case break down:

Discrete vs. Continuous Data

• Discrete Data:
Discrete data refers to values that are distinct and separate. These values cannot be
subdivided into smaller parts. Discrete data often represent categories, labels, or items
that can be counted.
• Continuous Data:
Continuous data, on the other hand, can take any value within a range and can be
divided into finer increments. Continuous data is usually numeric and measurable.

Discrete Data for Urban Heights:

1. List of Amenities:
o Data Type: Discrete
o Explanation: The list of amenities offered (e.g., swimming pool, gym, park,
etc.) is discrete because there are distinct categories of amenities. A given
apartment either has a swimming pool or it doesn’t; it cannot have a fraction
of a swimming pool. These categories are separate and cannot be meaningfully
divided into smaller parts.
o Dimension: Since amenities are categorical data that help to segment or filter
the data, they are treated as dimensions in Tableau. Dimensions typically
describe qualitative characteristics.

Continuous Data for Urban Heights:

1. Square Footage of Apartments:
o Data Type: Continuous
o Explanation: Square footage is continuous because it is a numerical measure
that can take any value within a given range. You can measure square footage
down to any level of precision (e.g., 1200 square feet, 1250.5 square feet).
Continuous data like square footage is typically used to measure or quantify.
o Measure: Since it is a numeric field used for calculations and aggregations
(e.g., finding the average size of apartments), square footage is treated as a
measure in Tableau.
2. Selling Price of Apartments:
o Data Type: Continuous
o Explanation: Selling price is also continuous because it can take any value
within a certain range. Prices can vary widely (e.g., INR 3,000,000 or INR
5,750,000) and are measurable. Selling prices are not confined to specific
categories or set values.
o Measure: Like square footage, selling price is a measure in Tableau because it
is a numeric field that can be aggregated or averaged across different
categories.

Dimensions vs. Measures in Tableau

• Dimensions:
Dimensions are typically categorical fields that describe the data. They are used to
segment, filter, or group data for analysis. In this case:
o The list of amenities (e.g., swimming pool, gym, etc.) is a dimension because
it is a categorical variable that describes features of the apartments.
o If the data included locations or complex names, these would also be
dimensions because they help categorize the data.
• Measures:
Measures are numeric fields that can be aggregated or calculated. They are the values
we want to analyze or measure. In this case:
o Square footage and selling price are measures because they are numeric
values that can be averaged, summed, or used in calculations.

Additional Insights:

Understanding the difference between dimensions and measures is essential because it
dictates how you’ll build visualizations in Tableau. For example, dimensions like amenities
might be used to filter the data (e.g., show only apartments with swimming pools), while
measures like selling price can be used to create aggregated metrics (e.g., calculate the
average price of apartments with swimming pools).

When creating a scatter plot to visualize the relationship between square footage and selling
price, dimensions like amenities could be used to color-code the points on the scatter plot,
allowing you to see how different amenities affect the price-per-square-foot relationship.

Sigmoid Function

Tableau has no built-in sigmoid function, so to use one you need to define it in a
calculated field. With some basic mathematical understanding, you can write the sigmoid
using Tableau's formula language.

A sigmoid function is often used in logistic regression, neural networks, and other machine
learning models to output values between 0 and 1, and its general form is:

f(x) = 1 / (1 + e^(-x))

Here, e is the base of the natural logarithm, and x is the input variable.

Steps to Create a Sigmoid Function in Tableau

1. Open Tableau Desktop and load your data.
2. Go to the Data pane and create a calculated field.
3. In the calculated field editor, you can define the sigmoid function as follows:

Sigmoid Formula in Tableau:

1 / (1 + EXP(-[Your_Field]))

• Replace [Your_Field] with the variable you want to transform using the sigmoid
function.
• EXP() is Tableau’s built-in function that returns e raised to the power of its argument.

For example, if you're applying this sigmoid function to a numeric field named Sales, the
formula would look like:

1 / (1 + EXP(-[Sales]))

Because raw sales figures are usually large, this expression returns a value very close to 1
for almost every row. Centering and scaling the field first keeps the outputs spread across
the 0 to 1 range.

4. After creating the calculated field, drag it onto the canvas and use it in your
visualizations as needed.

Advantages of Using a Sigmoid Function in Tableau

1. Normalization:
A sigmoid function transforms values into a range between 0 and 1. This can be
useful when you're working with metrics that have varying scales, allowing you to
compare values across different fields or datasets more easily. It’s particularly useful
in predictive models or machine learning implementations.
2. Non-Linearity:
The sigmoid function introduces non-linearity into your analysis, which can be
helpful if you’re dealing with complex relationships between variables. For example,
in logistic regression, it helps model probabilities that are constrained between 0 and
1.
3. Smoothing:
The sigmoid function smooths the input values, making it a good choice for
visualizing probability-like data where a smooth transition between 0 and 1 is
important. This is often useful in data that trends toward extremes but where you still
want a smooth curve rather than a sharp cut-off.
4. Interpretation of Probability:
The output of the sigmoid function is often interpreted as a probability in logistic
regression models. When used in visualizations, this can help analysts understand the
likelihood of certain outcomes, such as customer churn, purchase likelihood, or other
binary outcomes.
5. Data Scaling:
The sigmoid function can be used for scaling data in scenarios where you need a
compressed, non-linear scale. This can help reduce the impact of outliers and extreme
values, making the data easier to visualize and interpret.

Disadvantages of Using a Sigmoid Function in Tableau

1. Vanishing Gradients:
One of the most well-known problems with the sigmoid function in machine learning
is the vanishing gradient problem. If the input values are too high or too low (e.g.,
very large positive or negative numbers), the function's output can become saturated
at 0 or 1. In such cases, small changes in input won't result in meaningful changes in
the output. For Tableau users, this means that if you apply the sigmoid function to
values with large variations, many of them may appear clustered at the extremes (near
0 or 1), losing interpretability.
2. Non-Linear Transformation:
The sigmoid function introduces non-linearity into the data, which may not always be
desirable, especially if you are looking for a linear relationship between variables. The
non-linear nature of the function can sometimes obscure trends in the data, especially
for exploratory data analysis.
3. Output Limited Between 0 and 1:
While this is generally an advantage, there are cases where the compression of values
into the range of 0 to 1 may not be desirable. For instance, if you're dealing with data
that spans a larger range and you want to preserve the full range of variance, a linear
transformation might be more appropriate.
4. Interpretation Complexity:
When visualizing data using the sigmoid function, the interpretation can become less
intuitive for non-technical users. Unlike linear functions, the sigmoid curve introduces
a complexity that may require additional explanation to stakeholders who are not
familiar with mathematical functions.
5. Sensitivity to Input Values:
If your data contains large outliers or has a wide range of values, applying the sigmoid
function could compress your data too much, making it harder to spot meaningful
differences between the values pushed toward 0 or 1. In such cases, other
transformations like min-max scaling or z-scores may be more appropriate.
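
The saturation issue described above can be checked with a small Python sketch. The sales figures are hypothetical, and standardizing with z-scores before applying the sigmoid is only one possible choice of pre-scaling, not a method prescribed by these notes.

```python
import math
from statistics import mean, pstdev


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def standardize(values):
    """Z-score each value: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical sales figures (an assumption for illustration only).
sales = [120.0, 450.0, 800.0, 1500.0, 3200.0]

# Applying the sigmoid to raw values: every output saturates at the top of the range.
raw = [sigmoid(v) for v in sales]

# Applying it after standardizing: outputs stay spread out between 0 and 1.
scaled = [sigmoid(z) for z in standardize(sales)]

print(raw)
print([round(s, 3) for s in scaled])
```

In Tableau terms, this corresponds to applying the calculated field to a centered and scaled version of the field rather than to the raw field.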
Use Cases for the Sigmoid Function in Tableau

1. Predictive Analytics:
If you are building dashboards that integrate predictive models (e.g., logistic
regression for binary classification tasks such as predicting customer churn), the
sigmoid function helps in converting linear model outputs to probabilities between 0
and 1, making the results easier to interpret and visualize.
2. Clustering and Segmentation:
When segmenting customers, products, or other data points, the sigmoid function can
help scale features into a comparable range, allowing for more consistent clustering
results or visualizations.
3. Binary Classifications:
In cases where the data is used to represent the likelihood of a binary outcome (such
as whether a customer will make a purchase or not), the sigmoid function provides an
intuitive way to transform model outputs into probabilities.

What is a Sankey Chart?

A Sankey Chart is a type of flow diagram that visualizes the movement or flow of data
between different stages, categories, or processes. The width of the lines, or "flows," between
categories is proportional to the magnitude or quantity of the flow. Sankey charts are
particularly useful for showing how elements transition from one state to another or how
quantities split and merge as they move through different stages.

In simple terms, a Sankey chart shows the relationships between different variables, where
the flow between categories can represent things like money, energy, materials, or people.

Common Elements in a Sankey Chart:

1. Nodes: These represent the distinct entities or categories.
2. Links/Flows: These represent the connections or relationships between the nodes.
The width of each link represents the magnitude or volume of the flow.

How to Create a Sankey Chart in Tableau

Tableau does not have a built-in Sankey chart option, but you can create one through custom
techniques using calculated fields and specialized data preparation. Here’s a step-by-step
guide:

Steps to Create a Sankey Chart in Tableau:

1. Data Preparation:
To create a Sankey diagram, your data should have two key components:
o The source (where the flow starts).
o The target (where the flow ends).
Additionally, there should be a value that represents the size of the flow between the
source and the target. Ensure that your dataset is organized with this structure.
2. Pivot Data:
Often, you’ll need to pivot your data to ensure that the flows from source to target are
in the right format. For example, if you have multiple stages, you may need to unpivot
some columns to turn them into rows.
3. Creating a Path:
Sankey charts rely on smooth curves to represent the flow. You will need to create a
calculated field to define the path that Tableau can follow when connecting nodes.
Here’s a sample calculated field that gives each point a normalized position along the
path, running from 0 at the source to 1 at the target:

(INDEX() - 1) / (WINDOW_MAX(INDEX()) - 1)

This position is the input to the curve calculation that shapes the lines between the nodes.

4. Create the Sankey Shape:
o Use Bezier curves for smooth transitions between nodes. This can be
achieved by calculating intermediary points between the start and end
positions of the flows.
o In the Rows and Columns shelf, drag and drop the respective fields to create
the start and end positions of your Sankey flows.
5. Add Measure to Control Flow Size:
Drag the measure that represents the flow magnitude (e.g., quantity, value) onto
Size on the Marks card. This will ensure that the width of the flows is proportional to the
value they represent.
6. Finalize with Formatting:
Once the paths are drawn, use formatting techniques to adjust the colors and thickness
of the lines. You can assign distinct colors to the nodes and flows to make the chart
more readable.
7. Dual-Axis for Labels:
To display the flow labels and make the chart more informative, use a dual-axis in
Tableau. One axis will display the flow, and the other axis will contain the labels.

Alternatively, you can also leverage Sankey Chart templates available from Tableau’s
community forums to speed up the process. However, creating a Sankey from scratch allows
for more customization.

Advantages of Sankey Charts

1. Visualizing Complex Flows:
Sankey charts are excellent for visualizing complex relationships between different
categories. They make it easy to see how quantities split and merge as they move
through different stages or processes.
2. Proportional Representation:
The width of the flows is proportional to the quantity they represent, making it easy to
compare different flows at a glance. This makes it ideal for showing differences in
magnitude between categories.
3. Clarity of Resource Allocation:
Sankey diagrams are particularly useful for tracking resource allocation, such as
energy consumption, material use, or financial flows. They help highlight
inefficiencies or bottlenecks in a system.
4. Intuitive Understanding:
By representing flows with varying widths, Sankey charts provide a more intuitive
understanding of how quantities move from one state to another, especially when
compared to traditional bar or pie charts.
5. Interactive Potential:
In Tableau, Sankey charts can be interactive, allowing users to hover over flows to see
detailed information or to filter the chart dynamically based on user input.

Disadvantages of Sankey Charts

1. Complexity in Creation:
Unlike standard charts like bar graphs or line charts, Sankey charts require more
complex data preparation and calculated fields. Tableau doesn’t offer a built-in
Sankey chart type, so you must manually set up the data structure, which can be
time-consuming.
2. Difficulty in Labeling:
Labeling the flows in a Sankey chart can be challenging, especially when there are
numerous flows with varying widths. Overlapping or closely spaced flows can make
it hard to display text labels clearly.
3. Overcrowded Visuals:
If there are too many nodes or flows, the chart can become visually cluttered, making
it hard for users to interpret the data. This is particularly problematic in datasets with
many categories or highly granular flows.
4. Not Ideal for All Data Types:
Sankey charts are best suited for data that involves flows, such as energy consumption
or financial transfers. They are not as effective for simple comparisons or time-series
data, where other chart types may be more appropriate.
5. Limited Scalability:
As the number of categories or connections increases, the Sankey chart becomes
harder to read. This limits its scalability, particularly for datasets with large numbers
of nodes or connections.

Use Cases for Sankey Charts

1. Energy Flow Analysis:
One of the most common uses for Sankey charts is in energy flow diagrams, where
they can show how energy is distributed from various sources (e.g., coal, solar, wind)
through different processes (generation, transmission, consumption) and where losses
occur.
2. Financial Flows:
Sankey charts are often used to visualize the movement of money, such as showing
how funds are allocated across different departments, projects, or investment
portfolios.
3. Customer Journey Analysis:
Companies can use Sankey charts to visualize customer journeys, tracking how
customers move through different stages of a sales funnel, from initial engagement to
conversion.
4. Website Traffic Analysis:
In web analytics, Sankey diagrams are useful for visualizing how website visitors
move between different pages or sections of a website. You can see where traffic
drops off and identify popular entry or exit points.
5. Supply Chain Flow:
In supply chain management, Sankey charts can visualize the flow of materials or
goods between suppliers, manufacturing stages, and distribution centers, allowing for
better tracking of resources.
6. Human Resources Allocation:
Sankey diagrams can be used to track employee movement within a company,
showing how staff flow from one department to another, how resources are allocated,
and where inefficiencies may exist.
7. Sales and Marketing Attribution:
Sankey charts can help visualize how marketing channels contribute to sales, showing
how traffic moves from one marketing campaign to conversion or how various
channels influence customer decision-making.

Example: Energy Flow Use Case

An energy company might use a Sankey diagram to show how energy is distributed across
different sectors (industrial, residential, etc.) and how much energy is lost during
transmission. The company can visualize the efficiency of its energy distribution system by
seeing how much energy is consumed by each sector and where the greatest losses occur.

What is a Lollipop Chart?

A Lollipop Chart is a variation of a bar chart, where instead of full bars, the data is
represented by a line with a circle (the "lollipop") at the top. The length of the line indicates
the value, while the circle provides a clear visual focus for each data point. This type of chart
is often used to visualize rankings, comparisons, or discrete quantities.

A Lollipop Chart combines the simplicity of a bar chart with a cleaner and more modern
design, making it especially useful when dealing with many data points in a compact space.

How to Create a Lollipop Chart in Tableau

Here’s a step-by-step guide to creating a Lollipop Chart in Tableau:

Step 1: Load Your Data

• Open Tableau and load your dataset. For example, let's say you're working with sales
data across different categories.

Step 2: Create a Bar Chart

1. Drag the dimension (e.g., Category) to the Rows shelf.
2. Drag the measure (e.g., Sales) to the Columns shelf.
o This will generate a basic bar chart showing sales across different categories.

Step 3: Duplicate the Measure

1. Now, drag the same Sales measure to the Columns shelf again, right next to the
original.
o You should now have two identical bar charts, side by side.

Step 4: Convert One of the Bars into a Circle

1. Click on the second Sales field in the Marks card (it will be labeled SUM(Sales)).
2. Change the mark type from Bar to Circle.
o This will convert the bars from the second Sales axis into individual circles.

Step 5: Combine and Synchronize the Axes

1. Right-click the second SUM(Sales) field on the Columns shelf and select "Dual Axis"
so that the bars and circles are drawn in the same pane.
2. Right-click on the second axis (on the top or bottom of the chart, depending on where
the axis is displayed) and select "Synchronize Axis".
o This will ensure that both the lines and circles use the same scale, making the
circles align with the end of the lines from the original bar chart.

Step 6: Format and Customize

• Thin the Bars: On the Marks card for the first SUM(Sales), click on Size and drag the
slider down so that the bars appear as thin lines.
• Size the Circles: Adjust the size of the circles in the Marks card by clicking on Size
and dragging the slider.
• Color the Circles and Lines: You can color the lollipops by dragging a dimension
(e.g., Category) to Color on the Marks card, giving each lollipop a distinct color.

Step 7: Hide the Second Axis

1. Right-click on the second axis (the one added when you duplicated the Sales field)
and uncheck "Show Header" to hide the second axis.
o This cleans up the chart and leaves only the lollipop shapes visible.

Step 8: Final Adjustments

• You can further format the chart by adjusting labels, grid lines, and background colors
as needed to make the Lollipop Chart visually appealing and informative.

Now, you’ve created a Lollipop Chart in Tableau that combines the bar’s quantitative
line with a clean, focused circle for each data point.

Advantages of Lollipop Charts

1. Cleaner Aesthetic:
Lollipop Charts have a more minimalist appearance than traditional bar charts,
making them ideal when you want a sleek, modern design. They reduce visual clutter,
making it easier for the viewer to focus on the values at the end of the lines.
2. Easy to Interpret:
Like bar charts, Lollipop Charts are simple and intuitive, making it easy for viewers to
understand rankings, comparisons, and relative magnitudes. The added circles
highlight the actual values, improving readability.
3. Compact:
Lollipop Charts can convey the same amount of information as bar charts but take up
less visual space. This makes them useful when you're dealing with a large number of
categories or data points.
4. Emphasizes Data Points:
The circle (or "lollipop") at the end of each line emphasizes the precise value, making
it easy to spot differences between values. This is especially helpful in cases where
minor differences between values are significant.
5. Visual Focus:
The combination of a line and a circle draws the eye to the most important part of the
visualization: the data point itself (the circle). This makes it easier to identify trends,
anomalies, or outliers in the data.

Disadvantages of Lollipop Charts

1. Less Suitable for Dense Data:
Lollipop Charts are not ideal when working with large datasets or highly granular
data. If you have too many data points, the chart can become visually cluttered, and
the individual lollipops may overlap or lose their distinctiveness.
2. Additional Steps to Create:
Unlike bar charts, which are easy to create with one drag-and-drop action, Lollipop
Charts in Tableau require additional steps (duplicating measures, changing marks,
synchronizing axes). This adds a bit of complexity to their creation.
3. Less Standard:
While bar charts are universally understood, Lollipop Charts may not be as familiar to
all audiences. Some viewers might take longer to understand the visualization,
especially if they haven’t encountered this chart type before.
4. Limited in Use Cases:
Lollipop Charts are most effective when comparing discrete categories. They are not
ideal for showing continuous data or trends over time. For time-series data or
continuous variables, line charts or bar charts might be better suited.

Use Cases for Lollipop Charts

1. Rankings and Comparisons:
Lollipop Charts are excellent for visualizing rankings, such as showing the top
products by sales, the best-performing regions, or the most visited websites. The
circles emphasize the exact position of each item on the chart, making it easy to
compare different categories.

Example: Comparing sales across different product categories, with each category
represented by a line and the sales number emphasized by a lollipop.

2. Highlighting Discrete Data:
When you want to compare discrete categories (e.g., department performance,
customer satisfaction scores, or profit margins across regions), a Lollipop Chart
provides a clear way to show the relative magnitudes while reducing the visual weight
of the bars.
3. Visualizing Small Differences:
Lollipop Charts are useful when the differences between data points are subtle. The
combination of a line and circle makes it easier to notice slight differences compared
to a traditional bar chart, where bars may appear to blend together.
4. Survey Results:
Lollipop Charts are often used to show survey results where respondents choose
between a set of discrete options. For example, you could use a Lollipop Chart to
display how many respondents selected each option in a satisfaction survey.
5. Comparing Metrics Across Categories:
In situations where you need to show several categories (such as department-wise
performance or comparison across different regions), Lollipop Charts make it easy to
compare metrics at a glance while keeping the design minimal.

Example: Showing revenue per department, with each department represented by a
circle at the top of a thin line corresponding to the revenue amount.

6. Performance Analysis:
A Lollipop Chart can help analyze the performance of various metrics in a compact
and visually appealing way, allowing users to track targets or KPIs (Key Performance
Indicators).

Example: Showing employee performance where the line shows the target
performance and the lollipop indicates the actual performance. This makes it easier to
spot how close the actual performance is to the target.

Conclusion

Lollipop Charts are a great alternative to bar charts when you want a cleaner, more focused
design that emphasizes data points. They are easy to interpret, visually appealing, and ideal
for situations where you’re comparing discrete categories or highlighting differences in
ranking or magnitude. However, they are less effective for larger datasets or continuous data.

In Tableau, Lollipop Charts require additional steps for creation, but they are highly versatile
in their application and can enhance the clarity of your visualizations in many use cases.

What is a Dual Axis Chart?

A Dual Axis Chart (also known as a combination chart) allows you to plot two different
measures (metrics) on the same chart with two vertical axes. Each axis can have its own
scale, making it easier to compare two different types of data that may have different units or
ranges.

For example, you can plot sales on one axis (in dollars) and profit margin on the second axis
(in percentage). This chart type is useful when comparing different variables with distinct
units or scales.

Key Elements of a Dual Axis Chart:

1. Primary Axis: One of the two vertical axes that plots the first measure.
2. Secondary Axis: The second vertical axis that plots the second measure.
3. Marks: These represent the data points in different visual forms (e.g., bars, lines, or
circles) for each measure on the chart.
4. Synchronization (Optional): The scales on both axes can be synchronized to make
comparisons easier.

How to Create a Dual Axis Chart in Tableau

Creating a dual-axis chart in Tableau is straightforward, but it involves some key steps to
ensure that both axes are properly aligned and formatted.

Step-by-Step Guide:

1. Load the Data:
Open Tableau and load the dataset you want to visualize.
2. Drag the First Measure:
Drag a dimension to one shelf and the first measure (e.g., Sales) to the other. For
example, to visualize sales over time against a vertical axis, drag Date (or any other
dimension) to the Columns shelf and Sales to the Rows shelf.
3. Add the Second Measure:
Drag the second measure (e.g., Profit Margin) to the same shelf as the first measure
(Rows in this example), placing it next to the first. Tableau will automatically create
a second axis in a separate pane.
4. Convert to Dual Axis:
o Right-click the second measure on the shelf and select Dual Axis. This action
will overlay both measures on the same chart, each using its own axis, with
the second axis shown on the right-hand side of the chart.
5. Synchronize Axes (Optional):
If both measures are in the same units and on similar scales, right-click on the
secondary axis and select Synchronize Axis to ensure that both axes are scaled the
same way.
6. Customize Marks:
By default, Tableau will assign the same mark type to both measures (e.g., line or
bar). You can change this by clicking on the Marks card for each measure and selecting
different mark types. For example:
o The first measure can be shown as bars.
o The second measure can be shown as lines.
7. Format and Adjust:
o Adjust the colors to differentiate between the two measures.
o Hide or display axis titles depending on your preferences.
o You can also add labels to improve readability.

Advantages of Dual Axis Charts

1. Compare Two Different Metrics:
The most significant advantage of a dual-axis chart is that it allows you to compare
two different metrics that might have different scales or units. For example, you can
compare revenue (in dollars) and profit margin (in percentage) on the same chart.
2. Efficient Use of Space:
Instead of using two separate charts to compare metrics, a dual-axis chart combines
the information in one place, allowing for more compact and insightful visualizations.
3. Highlight Relationships:
Dual-axis charts help visualize relationships between two variables. For instance, you
can see how marketing spend impacts sales over time, or how temperature relates
to energy consumption.
4. Flexibility:
You can use different chart types (bars, lines, area charts, etc.) for each axis. This
flexibility allows for more expressive visualizations. For instance, you could show
profit as bars and profit margin as a line on the same chart.
5. Focus on Important Metrics:
By using two axes, you can bring focus to the most important metrics, allowing users
to quickly see both values and trends.

Disadvantages of Dual Axis Charts

1. Confusing for Users:
If not designed properly, dual-axis charts can be confusing, especially if the two
metrics have vastly different scales. Users may find it difficult to interpret the
relationship between the two axes if they are not synchronized or labeled clearly.
2. Cluttered Visuals:
Dual-axis charts can become cluttered, especially if there are too many data points or
if the visual design (colors, axis labels, legends) is not well thought out. The chart can
quickly become overwhelming and hard to interpret.
3. Difficult to Synchronize:
In cases where the two measures have significantly different ranges (e.g., sales in the
millions and profit margin in percentages), synchronizing the axes forces both onto one
scale, which flattens the smaller measure and reduces interpretability.
4. Misleading Interpretation:
If the scales are not appropriately matched or clearly labeled, users may misinterpret
the relationship between the two variables. For example, one metric may appear to
rise faster than the other due to different scales.
5. Not Ideal for Complex Data:
Dual-axis charts are best used for simple comparisons. When dealing with more
complex datasets involving multiple variables, using more sophisticated chart types
(such as scatter plots or heatmaps) may be more effective.

Use Cases for Dual Axis Charts

1. Sales and Profit Analysis:
A common use case for dual-axis charts is comparing sales and profit over time. You
can show sales on one axis (as bars) and profit margin on the other axis (as a line).
This setup allows users to see both metrics at the same time and understand how
profit trends relative to sales.

Example: A retail company might want to compare sales revenue with profit margin
to understand whether higher sales correspond to higher profits.

2. Marketing Spend vs. ROI:
Another use case is tracking marketing spend on one axis and return on investment
(ROI) on the second axis. This shows how investments in marketing lead to returns
over time or across different channels.

Example: A digital marketing agency might track marketing expenditure against the
revenue generated by each campaign to assess effectiveness.

3. Temperature vs. Energy Consumption:
A utility company might use a dual-axis chart to track temperature (on one axis) and
energy consumption (on the second axis) to analyze how external factors influence
energy demand.

4. Website Traffic vs. Conversion Rate:
Dual-axis charts can be used to compare website traffic on one axis and conversion
rates on the other. This can help an e-commerce company understand whether an
increase in traffic translates into an increase in conversions.

Example: A company might want to analyze whether increased website visitors
during a promotional event resulted in a higher conversion rate.

5. Stock Price vs. Trading Volume:
A financial analyst might use a dual-axis chart to show stock price on one axis and
trading volume on the second axis. This helps visualize the relationship between
stock price movements and market activity.

Example: Tracking stock price trends against the volume of shares traded to identify
potential buying or selling pressures.

6. Production Output vs. Downtime:
In manufacturing, a dual-axis chart can be used to compare production output on
one axis with machine downtime on the other. This helps managers understand
whether machine maintenance or failures are affecting overall output.

Example: A factory might track how machine failures correlate with drops in
production output over a specific period.

Conclusion

Dual Axis Charts in Tableau are powerful tools for visualizing two related metrics on the
same chart. They are especially useful when you need to compare variables with different
units or scales, such as sales vs. profit margin, marketing spend vs. ROI, or temperature vs.
energy consumption.

However, they can become confusing if not designed carefully, especially when the two
metrics have vastly different ranges. For this reason, careful attention must be paid to axis
labels, synchronization, and formatting to avoid misinterpretation.

In summary, dual-axis charts are best suited for comparing two measures side by side,
allowing you to leverage the flexibility of combining different types of data on a single,
compact visualization.

What is Level of Detail (LOD) in Tableau?

Level of Detail (LOD) expressions in Tableau are a powerful feature that allows you to
control the granularity (or "level of detail") of your calculations independently of what is
shown in your visualization. Typically, Tableau aggregates data based on the dimensions in
the view (i.e., the dimension fields on the Rows and Columns shelves and the Marks card).
However, LOD expressions let you define calculations at a different level of detail,
regardless of the granularity of the view.

In essence, LOD expressions help you perform calculations at a finer (more
granular) level, at a coarser level, or at a fixed level that is entirely independent of what’s
displayed in the visualization.
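
To make "a calculation at a different level of detail than the view" concrete, here is a minimal Python sketch. It is an analogy, not Tableau code, and the rows and field names are made up for illustration. It attaches a region-level sales total to every product-level row, which is the kind of result a coarser-level calculation produces inside a product-level view.

```python
from collections import defaultdict

# Hypothetical product-level rows: the granularity a view by product would show.
rows = [
    {"region": "East", "product": "Chair", "sales": 100},
    {"region": "East", "product": "Desk", "sales": 300},
    {"region": "West", "product": "Chair", "sales": 200},
]


def add_region_total(rows):
    """Attach the region-level total and each row's share of it."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["sales"]
    return [
        {
            **row,
            "region_total": totals[row["region"]],
            "share_of_region": row["sales"] / totals[row["region"]],
        }
        for row in rows
    ]


result = add_region_total(rows)
for row in result:
    print(row)
```

Each row keeps its own product-level sales value while also carrying the total for its region, so the two levels can be compared or divided in the same table.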

Types of LOD Expressions

There are three types of LOD expressions in Tableau:

1. Fixed LOD
 o Definition: A Fixed LOD expression calculates values at the specified level of
 detail, regardless of what dimensions are in the view. This allows you to create
 a calculation that is unaffected by the granularity of the current visualization.
 o Syntax: {FIXED [Dimension1], [Dimension2]: SUM([Measure])}
 o Example:
 Suppose you want to calculate the total sales by region, but your view shows
 sales by individual products. You can use {FIXED [Region]:
 SUM([Sales])} to aggregate sales at the region level, even though the view is
 broken down by product.
2. Include LOD
 o Definition: An Include LOD expression adds a dimension to the existing level
 of detail in the view. This allows you to include more granular data in your
 calculation, even if that level of granularity is not present in the current view.
 o Syntax: {INCLUDE [Dimension]: SUM([Measure])}
 o Example:
 If you want to calculate average sales per customer even though your view
 shows sales by region, you can use {INCLUDE [Customer Name]:
 SUM([Sales])} to compute sales per customer and then aggregate the result
 with AVG in the view, that is, AVG({INCLUDE [Customer Name]: SUM([Sales])}),
 while still keeping the region-level granularity in the view.
3. Exclude LOD
 o Definition: An Exclude LOD expression removes a dimension from the view's
 level of detail to perform a calculation at a less granular level. This allows you
 to ignore specific dimensions in your calculation.
 o Syntax: {EXCLUDE [Dimension]: SUM([Measure])}
 o Example:
 If you have a view that shows sales by product and region, but you want to
 calculate total sales by region, ignoring the product breakdown, you can use
 {EXCLUDE [Product]: SUM([Sales])} to calculate regional sales by
 excluding the product level.

Use Cases for LOD Expressions in Tableau

1. Calculating a Fixed Total:
If you want to calculate a total that doesn't change depending on what dimensions are
in the view, you can use a Fixed LOD. For example, if you want to calculate the total
sales for each region, but you are looking at the data broken down by month and
product, {FIXED [Region]: SUM([Sales])} will give you the total sales for the
region, regardless of the product or month.
2. Customer Lifetime Value:
To calculate customer lifetime value (CLV), you need to sum up all purchases a
customer has made, regardless of how you break down the data (e.g., by individual
orders or dates). You can use a Fixed LOD expression like {FIXED [Customer ID]:
SUM([Sales])} to calculate the total sales for each customer.
3. Per-Category Average Calculation:
Suppose you're looking at sales by product category but want to calculate the average
sales per product within each category. An Include LOD like {INCLUDE
[Product]: SUM([Sales])}, aggregated with AVG in the view, allows you to account
for product-level data even when you're viewing category-level data.
4. Removing Extra Granularity:
If you have a detailed view (e.g., sales broken down by product and region) but want
to calculate a higher-level total (e.g., total sales by region), you can use an Exclude
LOD like {EXCLUDE [Product]: SUM([Sales])} to roll up the data to the region
level while ignoring the product dimension.

Advantages of LOD Expressions

1. Fine Control over Aggregation:
LOD expressions allow you to control exactly how Tableau aggregates your data,
independently of the dimensions in the view (and, for Fixed LODs, of its ordinary
dimension filters). This flexibility is particularly useful when you need calculations
that don’t align with the current visualization’s level of detail.
2. Simplifies Complex Calculations:
LOD expressions simplify the process of performing calculations that might otherwise
require more complex data manipulation or reshaping. They enable powerful
calculations that would be hard to perform using simple calculated fields.
3. Improved Analysis Flexibility:
LOD expressions give you more freedom to analyze data at multiple levels of
granularity in the same view. For example, you can look at total sales per region while
simultaneously calculating the average sale per customer within each region.
4. Preserves Data Context:
LOD expressions can maintain important context in your calculations. For example,
you can aggregate data at a higher level (e.g., total sales for a country) while still
showing data at a more granular level (e.g., sales by city).

Disadvantages of LOD Expressions

1. Performance Issues with Large Datasets:
When using LOD expressions with large datasets, particularly Fixed LOD
expressions, Tableau may need to perform more calculations, which can slow down
your dashboard or workbook performance. Complex LOD expressions applied to
large datasets can increase query processing time.
2. Complex Syntax:
For users unfamiliar with Tableau’s calculation language, LOD expressions can be
challenging to write and understand. Their syntax may seem complex, especially
when combining multiple expressions or when working with more advanced
calculations.
3. Confusion in Interpretation:
Since LOD expressions operate independently of the view’s level of detail, it can be
confusing for new users to interpret why certain values don’t change with the view’s
filters or dimensions. This disconnect between what’s displayed and what’s calculated
can sometimes lead to misunderstandings.
4. Learning Curve:
There is a learning curve associated with understanding how LOD expressions work,
particularly when users are new to Tableau or not familiar with aggregate
calculations. This can slow down the initial learning process for users.

Use Cases for Level of Detail in Tableau

1. Customer Segmentation:
You can use LOD expressions to segment customers based on their total purchase
amount (lifetime value). For example, you can calculate the total sales for each
customer using a Fixed LOD ({FIXED [Customer ID]: SUM([Sales])}), then
segment customers into different tiers (e.g., Gold, Silver, Bronze).
2. Cohort Analysis:
LOD expressions are great for cohort analysis where you need to group customers or
users by specific cohorts (e.g., acquisition month) and calculate metrics like retention
or lifetime value. You can use a Fixed LOD to calculate total sales per cohort and
then apply additional analysis to see trends.
3. Sales per Region, Ignoring Date:
Suppose you are showing sales trends over time but also want to include a fixed total
for each region, regardless of the month selected. You can create a Fixed LOD
expression ({FIXED [Region]: SUM([Sales])}) to display the total sales for each
region, no matter how an ordinary dimension filter on date narrows the view.
4. Proportion of Total:
LOD expressions allow you to calculate the proportion of a value relative to a
higher-level total. For example, if you want to calculate a product’s sales as a percentage of
the total regional sales, you can use LOD expressions to first calculate the total sales
by region ({FIXED [Region]: SUM([Sales])}), then divide the individual product
sales by the total regional sales.
5. Filtering Without Affecting Calculations:
If you have a filter applied to the view (e.g., filtering by region or date), you can use a
Fixed LOD expression to ensure that certain calculations (like overall sales or total
customer count) remain unaffected by the filter. This holds for ordinary dimension
filters; context, data source, and extract filters are applied before Fixed LODs and
still affect them. This is helpful in reports where you
want to see both the filtered and unfiltered results simultaneously.

Conclusion

Level of Detail (LOD) expressions in Tableau offer a powerful way to control how data is
aggregated and calculated, independent of the dimensions in the view. With Fixed, Include,
and Exclude LOD expressions, you can perform more granular or higher-level calculations,
allowing for more complex and meaningful data analysis. While they provide significant
flexibility, LOD expressions can also introduce complexity, so understanding when and how
to use them effectively is essential for mastering Tableau.

What is a Difference Chart?

A Difference Chart is a type of chart used to show the difference between two related
measures over time or across categories. It visually highlights the gap between two data
series, making it easier to see changes, trends, or variations between them. Difference charts
are often used to compare metrics such as actual vs. target performance, sales vs. costs, or
forecast vs. actual results.

In Tableau, a difference chart can be created using multiple techniques, such as combining
line charts or area charts with calculated fields to show the difference between the two
measures.

How to Create a Difference Chart in Tableau

Here’s a step-by-step guide on how to create a difference chart in Tableau:

Step 1: Load Your Data

• Open Tableau and load your dataset. For example, you might want to compare sales
and profit over time.

Step 2: Create a Line Chart for One Measure

1. Drag a dimension (e.g., Date) to the Columns shelf.
2. Drag the first measure (e.g., Sales) to the Rows shelf to create a line chart.

Step 3: Add the Second Measure

1. Drag the second measure (e.g., Profit) to the Rows shelf, below the first measure.
2. Now, you’ll see two separate line charts, one for Sales and one for Profit.

Step 4: Create a Calculated Field for the Difference

1. Create a new calculated field that represents the difference between the two
measures. For example:

 [Sales] - [Profit]

 o This calculated field will compute the difference between sales and profit at
 each point in time.

Step 5: Add the Difference to the Visualization

1. Drag the calculated field (e.g., Sales vs. Profit Difference) to the Rows shelf,
alongside the original measures.
2. Depending on your preference, you can display the difference as an area chart, a bar
chart, or another type of chart that visually highlights the gap between the two
measures.

Step 6: Synchronize Axes (Optional)

• If the two measures you’re comparing have different scales, you might want to
synchronize the axes so that the comparison becomes clearer. Right-click on one of
the axes and select Synchronize Axis.

Step 7: Format the Chart

• Customize colors: Use distinct colors to highlight positive vs. negative differences.
For example, green for positive values and red for negative values.
• Adjust labels: Add labels to the chart to clearly show the difference values at each
data point.
• Highlight important differences: You can use annotations to highlight key insights,
such as significant changes between the two measures at specific points.

Advantages of Difference Charts

1. Visual Clarity for Comparisons:
Difference charts are excellent for making it easy to see the gap between two
measures over time or across categories. They provide a clear visual indication of
whether the values are converging, diverging, or staying consistent.
2. Highlight Trends:
This type of chart helps to highlight trends, especially when there are large differences
between two measures. It can also show when one measure exceeds another or when
both are moving in opposite directions.
3. Effective for Time Series Data:
Difference charts are particularly useful for comparing two measures over time,
making it easy to spot seasonal trends, fluctuations, or periods of over- or
under-performance.
4. Good for Target vs. Actual Performance:
When comparing actual vs. target performance (e.g., sales vs. budget, forecast vs.
actual), a difference chart visually illustrates how well you are doing relative to your
goals.
5. Customizable:
Tableau allows a high degree of customization for difference charts, including color
coding, annotations, and synchronized axes, which helps to improve interpretability.

Disadvantages of Difference Charts

1. Not Ideal for More Than Two Measures:
Difference charts are most effective when comparing two measures. Adding more
than two measures to the comparison can make the chart cluttered and difficult to
interpret, requiring alternative chart types for better clarity.
2. Scalability Issues:
If the two measures being compared have vastly different scales (e.g., one in
thousands and the other in percentages), synchronizing the axes can distort the
visualization, or you may need to use dual axes. This can sometimes make the chart
harder to read and interpret.
3. Not Suitable for All Data Types:
Difference charts are not ideal for comparing categorical data or non-continuous
variables. They work best with time series or quantitative data where differences
between two measures can be computed and are meaningful.
4. Performance on Large Datasets:
If you are working with large datasets and complex calculated fields, Tableau may
slow down, especially if you are calculating differences at a highly granular level.

Use Cases for Difference Charts

1. Actual vs. Target (Budget) Performance:
A common use case is to compare actual performance against a target or budget. For
example, a company can use a difference chart to compare actual sales vs. target
sales over time to easily spot whether they are exceeding or falling short of their
goals.

Example: Comparing monthly sales vs. the budgeted target for a fiscal year, where
the difference highlights over- or under-performance in each month.

2. Sales vs. Costs:
Companies can use a difference chart to compare sales vs. cost of goods sold
(COGS). The difference between the two measures shows the gross profit over time,
which can help in analyzing profitability trends.

Example: Comparing revenue with expenses to highlight profit margins across
different quarters or product categories.

3. Profit vs. Loss:
A business can use a difference chart to compare profit vs. loss over a period. The
chart makes it easy to visualize periods of profit or loss and to understand what drove
those differences.

Example: Comparing operating profit with expenses for a retail company across
multiple stores, allowing management to see which stores are generating more profit
and which are operating at a loss.

4. Stock Price vs. Market Index:
Investors can use a difference chart to compare the performance of a specific stock
against a broader market index (e.g., S&P 500). The difference between the stock
and the index shows whether the stock is outperforming or underperforming the
market.

Example: Comparing the performance of Apple stock with the NASDAQ index over
time to assess relative performance.

5. Employee Productivity vs. Target Productivity:
HR departments might use difference charts to visualize employee productivity vs.
target productivity. This helps to quickly spot periods where employees are
exceeding expectations or falling behind.

Example: Comparing the number of tasks completed by employees vs. the target
number of tasks, allowing managers to identify underperformance or high achievers.

6. Forecast vs. Actual Results:
In forecasting scenarios, difference charts can be used to compare forecasted results
with actual results. This can help identify how accurate predictions were, and
whether adjustments need to be made to future forecasts.

Example: Comparing forecasted revenue for a product launch vs. the actual revenue
after the launch, which helps evaluate the accuracy of the forecast.
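
The calculation behind all of these use cases is the same: subtract one measure from the other at every point, and let the sign of the result drive the color. A minimal Python sketch of that logic follows, with made-up monthly actual and target figures; the names and numbers are hypothetical and this is not Tableau syntax.

```python
# Hypothetical monthly figures; any two related measures would work.
actual = {"Jan": 120, "Feb": 95, "Mar": 110}
target = {"Jan": 100, "Feb": 100, "Mar": 110}


def difference_series(first, second):
    """Return (period, difference, status) for each period present in both series."""
    series = []
    for period in first:
        if period not in second:
            continue  # skip periods that cannot be compared
        diff = first[period] - second[period]
        if diff > 0:
            status = "above"
        elif diff < 0:
            status = "below"
        else:
            status = "on target"
        series.append((period, diff, status))
    return series


gaps = difference_series(actual, target)
for period, diff, status in gaps:
    print(f"{period}: {diff:+d} ({status})")
```

The status value plays the role of the color rule suggested in the formatting step (for example, green for "above" and red for "below").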

Conclusion

Difference Charts in Tableau are a great way to visualize the variation between two related
measures over time or across categories. They make it easy to spot differences, trends, or
patterns between the two measures, and are particularly useful in business scenarios like
comparing actual vs. target performance, sales vs. costs, and profit vs. loss.

However, they are best suited for comparing two measures at a time, and their effectiveness
diminishes when trying to compare more than two variables or when dealing with categorical
data. Despite some limitations, difference charts are highly valuable for time series data and
goal-tracking scenarios, offering clear visual insights that aid decision-making.

What is a Stacked Bar Chart?

A Stacked Bar Chart is a type of bar chart in which bars are divided into segments that
represent different subcategories or components of a whole. The bars are stacked vertically or
horizontally, and the total length of each bar represents the cumulative value of all the
segments. Each segment of the bar represents a different part of the whole, making it easy to
compare both the total values and the individual segments across different categories.

How to Create a Stacked Bar Chart in Tableau

Here’s a step-by-step guide on creating a stacked bar chart in Tableau:

Step 1: Load Your Data

• Open Tableau and connect to your dataset, which could contain categories (e.g.,
products) and subcategories (e.g., sales regions or departments).

Step 2: Build a Bar Chart

1. Drag the dimension that defines the bars (e.g., Category or Region) to the Columns
shelf.
2. Drag the measure you want to visualize (e.g., Sales or Profit) to the Rows shelf. This
creates a simple bar chart.

Step 3: Add a Dimension for Stacking

1. Drag another dimension (e.g., Sub-Category or Segment) to Color on the
Marks card.
 o This will split the bars into stacked segments, with each color representing a
 different subcategory or segment.

Step 4: Adjust the Stack

• If Tableau does not automatically stack the bars, you can manually adjust it:
 o Open the Analysis menu in the top menu bar and ensure that Stack Marks is set
 to "On."

Step 5: Format and Customize

• Adjust Colors: You can customize the color scheme to distinguish between the
subcategories.
• Add Labels: To add value labels for each segment of the stack, click on the Label
option in the Marks card.
• Adjust Sorting: If necessary, you can sort the categories or subcategories to ensure
the most meaningful comparisons.

Now you have a stacked bar chart that shows the cumulative values across categories, with
each bar split into segments based on the chosen subcategory or dimension.

Advantages of Stacked Bar Charts

1. Easy Comparison of Totals and Parts:
Stacked bar charts allow you to compare both the total values (represented by the full
length of the bars) and the individual components (the stacked segments within the
bars). This makes them ideal for comparing both absolute and relative contributions
of subcategories.
2. Efficient Use of Space:
By stacking subcategories in a single bar, stacked bar charts allow you to present
more information within a smaller space. This is particularly useful when comparing
multiple categories or dimensions without overwhelming the viewer.
3. Clear Representation of Distribution:
Stacked bar charts make it easy to see the distribution of subcategories within each
bar, helping viewers understand the composition of each total and how different parts
contribute to the whole.
4. Quick Visual Comparison:
Stacked bar charts offer a quick way to visually compare how different subcategories
contribute to the overall value for each main category. For example, it’s easy to see
which segment dominates in different categories.
5. Shows Trends Over Time:
When used with a time series dimension (e.g., months, years), stacked bar charts can
show how the total value and the distribution of subcategories change over time,
making it easy to spot trends.

Disadvantages of Stacked Bar Charts

1. Difficult to Compare Individual Segments:
While stacked bar charts make it easy to compare the totals across categories, it can
be difficult to compare individual segments within the bars. Only the bottom segment
of each bar starts from a common baseline, so segments in the middle or at the top
of the bars can be hard to measure accurately against one another,
especially if the bars are different lengths.
2. Visual Clutter:
Stacked bar charts can become visually cluttered when there are too many categories
or subcategories. If the bars have too many segments, it can be difficult to
differentiate between them, making the chart harder to interpret.
3. Not Ideal for Fine Detail Comparison:
Stacked bar charts are not ideal when you need to make precise comparisons between
subcategories across different bars. If accurate segment-level comparisons are
required, other chart types (e.g., grouped bar charts or line charts) may be more
appropriate.
4. Overlapping Data Values:
In a stacked bar chart, each bar represents a cumulative total, and this can obscure
small differences in the individual segments. Small subcategories may be visually
overwhelmed by larger ones, making them difficult to see and compare.
5. Harder to Interpret for Many Data Points:
As the number of data points increases, stacked bar charts become harder to read and
interpret, especially if there are many subcategories to compare.

Use Cases for Stacked Bar Charts

1. Sales by Region and Product:
Stacked bar charts are frequently used in sales analysis to show total sales broken
down by product or region. For example, you can stack different product categories
(e.g., electronics, apparel, home goods) within bars that represent each region, making
it easy to see which products contribute the most to total sales in each area.

Example: A company can visualize the sales performance of different products across
various geographic regions. The total bar length shows total sales in each region,
while the stacked segments represent the individual contribution of each product
category.

2. Budget Allocation:
Stacked bar charts can be used to visualize how a total budget is divided among
different departments or projects. Each bar could represent a department's budget,
with the segments showing how the budget is allocated across different projects or
initiatives.

Example: A university might use a stacked bar chart to show its budget allocation for
different departments (e.g., research, administration, infrastructure), with each
segment representing subcategories like faculty salaries, research funding, and student
services.

3. Customer Segmentation:
Companies can use stacked bar charts to analyze customer segments (e.g., age
groups, income levels) within various categories (e.g., product preferences,
purchasing behavior). This provides a clear view of how different segments contribute
to overall customer behavior.

Example: A retail company can visualize its sales broken down by customer
segments like gender, age group, and income level to understand which segment
contributes the most to overall sales in each product category.

4. Profit and Loss by Quarter:
A business might use a stacked bar chart to show profit and loss over time, with each
bar representing a quarter. The segments within each bar could show different profit
centers, such as different product lines or business units.

Example: A company can visualize total profit for each quarter, broken down by
product line (e.g., electronics, furniture, appliances) to see which lines contributed the
most to overall profits during each quarter.

5. Resource Utilization:
In project management, stacked bar charts can be used to show resource utilization
across different teams or departments. Each bar could represent a project, with the
segments showing how much of each team’s resources were allocated to the project.

Example: A project manager can use a stacked bar chart to show how different teams
(e.g., engineering, marketing, operations) contributed to various phases of a project,
helping to allocate resources more effectively.

6. Market Share Analysis:
Stacked bar charts are ideal for showing market share across different competitors or
brands. Each bar can represent total market size for a given period, with the segments
representing the market share of individual brands or companies.

Example: A company can visualize how its market share compares to competitors
across different regions or time periods, with each bar showing the total market size
and the segments representing individual brands' share of the market.

Conclusion

Stacked Bar Charts are a highly effective tool for visualizing both the total values and the
individual components that make up those totals. They work well when you need to show the
composition of different categories or segments, such as sales by region, budget allocation, or
market share distribution. However, they can become cluttered when there are too many
segments, and they are not ideal for precise comparisons of individual subcategories.

Despite their limitations, stacked bar charts are widely used in business, sales analysis,
finance, and project management, offering a clear and compact way to present data with
multiple dimensions.
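
What a stacked bar encodes comes down to simple arithmetic: each segment starts where the previous one ended, and the last segment ends at the bar's total. The Python sketch below works this out for made-up sales figures by region and product category; the data and the function name are hypothetical and only illustrate the layout, not how Tableau computes it internally.

```python
# Hypothetical sales by region (one bar each) and product category (one segment each).
sales = {
    "East": {"Electronics": 50, "Apparel": 30, "Home Goods": 20},
    "West": {"Electronics": 10, "Apparel": 60, "Home Goods": 30},
}


def stack_segments(bars):
    """For each bar, return (segment, start, end) tuples plus the bar total."""
    layout = {}
    for bar, segments in bars.items():
        start = 0
        placed = []
        for name, value in segments.items():
            placed.append((name, start, start + value))
            start += value  # the next segment begins where this one ends
        layout[bar] = {"segments": placed, "total": start}
    return layout


layout = stack_segments(sales)
for bar, info in layout.items():
    print(bar, info["total"], info["segments"])
```

Only the first segment of each bar starts at zero. Every later segment starts at a different height in each bar, which is why totals are easy to compare across bars while individual segments higher in the stack are not.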
