Tableau Daily Notes

The document provides an overview of Tableau and its data source file formats, including the transition from .tde extracts to the .hyper format. It outlines common date functions in Tableau, the differences between OLTP and OLAP systems, and the structure of data warehouses, including star and snowflake schemas. Additionally, it discusses the importance of normalization in database design and the benefits and drawbacks of different normalization forms.
Tableau

Tools used before Tableau:
Crystal Reports: formatting reports in invoice layout
Excel (Microsoft): basic dashboards, tables, charts
Oracle Discoverer: reporting from Oracle DB
IBM Cognos (till 2000): reporting, scorecards, OLAP

Evolution of BI: Spreadsheets → Static Reports → OLAP Cubes → Dashboards → Self-Service BI → AI-Driven Analytics

.tds (Tableau Data Source)

A .tds file acts like a template for connecting to the actual data source: it contains connection details and instructions on how to connect to and access the data, but not the data itself.

Shadow extracts are temporary files Tableau creates when you connect to certain file-based data sources; they help Tableau load data faster in Tableau Desktop.

Note: .tde extract files are no longer supported for Tableau versions beyond 2024.2. All extracts are now in the .hyper format.

Below is a clear and practical list of date functions in Tableau, commonly used for reporting, filtering, comparisons, and custom date logic.

Quick ratio calculations:

Sales Ratio: SUM([Sales]) / TOTAL(SUM([Sales]))

Profit Ratio: SUM([Profit]) / SUM([Sales])

Common Date Functions in Tableau


1. TODAY()
● Returns the current date.
● Example: TODAY() → 2025-06-10

2. NOW()
● Returns the current date and time.
● Example: NOW() → 2025-06-10 11:05:30 AM

3. DATEPART(part, date)
● Returns a number representing the part of the date.
● Example:
DATEPART('month', [Order Date]) → 6
DATEPART('weekday', [Order Date]) → 1–7

4. DATENAME(part, date)
● Returns the name (as text) of the part of a date.
● Example:
DATENAME('month', [Order Date]) → "June"
DATENAME('weekday', [Order Date]) → "Tuesday"

5. DATEDIFF(part, start_date, end_date)
● Returns the difference between two dates, in units like year, month, or day.
● Example:
DATEDIFF('year', [Order Date], TODAY()) → 2 (years difference)

6. DATEADD(part, number, date)
● Adds (or subtracts) a number of units to/from a date.
● Example:
DATEADD('month', 1, [Order Date]) → adds 1 month
DATEADD('day', -7, TODAY()) → 7 days ago

7. DATETRUNC(part, date)
● Truncates the date to the beginning of the specified part.
● Example:
DATETRUNC('year', [Order Date]) → 2025-01-01
DATETRUNC('month', TODAY()) → first day of current month

8. MAKEDATE(year, month, day)
● Creates a date from year, month, and day values.
● Example: MAKEDATE(2023, 5, 10) → 2023-05-10

9. MAKEDATETIME(date, time)
● Creates a date and time value.
● Example:
MAKEDATETIME("2024-06-10", "14:00:00") → 2024-06-10 2:00 PM

10. ISDATE(string)
● Checks if a string is a valid date.
● Example: ISDATE("2024-01-01") → TRUE

11. MIN() and MAX() (with dates)
● Returns the earliest or latest date from a field.
● Example: MIN([Order Date]) → first order date
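The behavior of several of these functions can be sketched with Python's standard datetime module; the sample dates below are hypothetical stand-ins for [Order Date] and TODAY():

```python
from datetime import date, timedelta

# Hypothetical sample dates standing in for [Order Date] and TODAY()
order_date = date(2023, 6, 10)
today = date(2025, 6, 10)

# DATEPART('month', [Order Date]) -> 6
month_num = order_date.month

# DATENAME('month', [Order Date]) -> "June"
month_name = order_date.strftime("%B")

# DATEDIFF('year', [Order Date], TODAY()) -> 2
years_diff = today.year - order_date.year

# DATEADD('day', -7, TODAY()) -> 7 days ago
week_ago = today - timedelta(days=7)

# DATETRUNC('month', TODAY()) -> first day of the current month
month_start = today.replace(day=1)

print(month_num, month_name, years_diff, week_ago, month_start)
```

Note that DATEDIFF('year', …) counts calendar-year boundaries crossed, which is what the simple year subtraction above mimics.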

Common Use Cases

Task                       Function(s)
Year-to-Date (YTD) filter  YEAR([Order Date]) = YEAR(TODAY())
Last 30 days               [Order Date] >= DATEADD('day', -30, TODAY())
Current Month Sales        DATETRUNC('month', [Order Date]) = DATETRUNC('month', TODAY())
Previous Year Sales        YEAR([Order Date]) = YEAR(TODAY()) - 1

Data Warehouse Summary:

● All popular data warehouses use SQL as their primary querying language.
● Every data warehouse is different, so their SQL flavors also differ.
○ For instance, Snowflake has the MEDIAN and RATIO_TO_REPORT functions, which are not available in some other data warehouses.
● Data warehouses are split into a few layers:
○ Storage
○ Processing
○ Infrastructure management
● One can connect to data warehouses using:
○ Web clients
○ The command line
○ BI tools like Tableau and Mode Analytics
○ Programming languages like Python
○ Third-party ETL connectors like Hevo
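As a rough illustration of the programmatic route, here is a minimal Python DB-API sketch; sqlite3 stands in for a real warehouse connector (for example snowflake-connector-python), which follows the same connect/execute/fetch pattern:

```python
import sqlite3  # stand-in engine; real warehouse connectors expose the same DB-API shape

# The connect/execute/fetch/close flow below is the generic Python DB-API
# pattern; only the connect() call changes per warehouse.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("East", 100.0), ("West", 250.0), ("East", 50.0)])
cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
rows = cur.fetchall()
print(rows)  # [('East', 150.0), ('West', 250.0)]
conn.close()
```

The table name and data here are purely illustrative; the point is that Python talks to warehouses through this cursor interface.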

Key concepts of data warehouses:

1. Hosted and fully managed on the cloud
○ There is no need to provision hardware or software.
○ There is no need to update or maintain modern data warehouses; data warehouse providers take care of it.
○ Based on need, the hardware can be changed by clicking a few buttons.

2. Performance at scale
○ Data warehouses are built to query billions of rows (structured and semi-structured).
○ Data can be streamed in and out of warehouses multiple times within one second.
○ More than 15,000 rows can be inserted in one go.

3. Usage-based pricing
○ Pay only when you query, load, or unload.
○ Pay only for the resources used at that point in time.

4. Central repository (single source of truth)
○ Data from multiple sources are migrated into data warehouses.
○ They are cleaned, transformed, and aggregated to derive insights.
○ Business Intelligence tools like Sisense and Tableau are plugged in on top of the aggregated data.
○ Aggregated data are also fed into machine learning models.

5. Highly secure
○ Data warehouses are often HIPAA and GDPR compliant.
○ Data can be masked as needed.
○ They support MFA (multi-factor authentication).
○ Data is encrypted in transit and at rest.
○ Access controls can be as granular as needed.

6. Data marketplace
○ Data warehouses come with a marketplace which gives us access to:
◆ Geospatial data: can be joined with production data to perform demography-based analysis.
◆ Public health datasets, including COVID: can be used to analyze health trends.
◆ Public sector financial data: can be used to monitor the economy.

OLTP (Online Transaction Processing):

● Purpose: Manages day-to-day business operations and transactions like order entry, banking transactions, and retail sales.
● Characteristics:
○ High volume of small, short transactions.
○ Focus on data consistency and reliability.
○ Optimized for quick reads and writes.
○ Relational databases are commonly used.
● Examples: Banking systems, e-commerce websites, inventory management systems.
● Data Storage: Typically uses normalized databases to ensure data integrity.

OLAP (Online Analytical Processing):

● Purpose: Analyzes historical data to provide insights for business decision-making.
● Characteristics:
○ Complex queries and data analysis on large datasets.
○ Focus on multidimensional analysis (e.g., slicing and dicing data).
○ Optimized for complex analytical queries.
○ Data warehouses and data marts are commonly used.
● Examples: Sales reporting, market analysis, financial forecasting, budgeting.
● Data Storage: Often uses denormalized databases or data cubes to facilitate faster analysis.

In essence, OLTP handles the "what" (current transactions), while OLAP explores the "why" (historical trends and patterns).

ETL (Extract, Transform, Load): This process extracts data from source systems, transforms it into a consistent format, and loads it into the data warehouse.
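A minimal sketch of the ETL idea in Python, with hypothetical raw rows and a plain list standing in for the warehouse table:

```python
# Hypothetical raw rows "extracted" from a source system
raw_rows = [
    {"name": " alice ", "amount": "100"},
    {"name": "BOB", "amount": "250.5"},
]

def transform(row):
    """Standardize names and cast amounts to numbers (the 'T' in ETL)."""
    return {"name": row["name"].strip().title(), "amount": float(row["amount"])}

# "Load": in a real pipeline this would be an INSERT into the warehouse;
# here the warehouse table is just a list.
warehouse_table = [transform(r) for r in raw_rows]
print(warehouse_table)
# [{'name': 'Alice', 'amount': 100.0}, {'name': 'Bob', 'amount': 250.5}]
```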

Today’s most popular data warehouses are Snowflake, AWS Redshift, Google
BigQuery, and Azure Synapse Analytics.

Data modeling techniques

A star schema is a simple, denormalized structure with a central fact table surrounded by dimension tables. A snowflake schema, on the other hand, is a normalized structure with dimension tables that can be further broken down into sub-dimensions.

Star Schema:
● Structure: A central fact table surrounded by dimension tables.
● Denormalized: Dimension tables are not further normalized.
● Simplicity: Easier to understand and query.
● Performance: Generally faster for simpler queries due to fewer joins.
● Use Cases: Ideal for simpler data analysis and BI reporting.

Snowflake Schema:
● Structure: A central fact table with dimension tables that can be further normalized into sub-dimension tables.
● Normalized: Dimension tables are normalized, reducing data redundancy.
● Complexity: More complex to design and understand than a star schema.
● Performance: May be slower for complex queries due to more joins, but more efficient for storage.
● Use Cases: Suitable for complex datasets with many hierarchies and when data consistency is crucial.

Galaxy Schema (Fact Constellation):
● Structure: Multiple fact tables sharing dimension tables.
● Advantages: Supports complex queries across multiple business processes.
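A star schema can be sketched with a tiny in-memory SQLite database; the table and column names below are illustrative, not from any particular dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Dimension table: descriptive attributes
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, "
            "product_name TEXT, category TEXT)")
# Fact table: measures plus a foreign key into the dimension
cur.execute("CREATE TABLE fact_sales (order_id INTEGER, "
            "product_id INTEGER, sales_amount REAL)")
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Laptop", "Tech"), (2, "Chair", "Furniture")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(101, 1, 1200.0), (102, 2, 150.0), (103, 1, 800.0)])
# A typical star-schema query: one join from the fact to a dimension
cur.execute("""
    SELECT d.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product d ON f.product_id = d.product_id
    GROUP BY d.category
    ORDER BY d.category
""")
result = cur.fetchall()
print(result)  # [('Furniture', 150.0), ('Tech', 2000.0)]
conn.close()
```

In a snowflake schema, dim_product would itself join out to further tables (e.g., a separate category table), adding joins to the same query.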

Key Differences Summarized:

Feature                        Star Schema   Snowflake Schema
Normalization                  Denormalized  Normalized
Complexity                     Simpler       More complex
Performance (simple queries)   Faster        Slower
Performance (complex queries)  Slower        Faster
Data Redundancy                Higher        Lower
Storage                        Higher        Lower

Fact tables store numerical, measurable data, while dimension tables provide descriptive context for that data. They work together to enable efficient data analysis and reporting.

Fact Table:
● Purpose:
Stores quantitative data, often referred to as "facts" or "measures,"
that represent business events or processes.

● Contents:
Includes numerical metrics (e.g., sales amount, order quantity, profit),
and foreign keys that link to dimension tables.

● Example:
A sales fact table might include columns for order ID, product ID,
customer ID, order date, and sales amount.

Dimension Table:
● Purpose:
Stores descriptive attributes about the business context related to the
facts in the fact table.

● Contents:
Contains textual and categorical data (e.g., product names, customer
demographics, time periods), providing context for analysis.

● Example:
A dimension table for products might include columns for product ID,
product name, category, price, and supplier.

Key Differences:
● Data Type:
Fact tables primarily contain numerical data, while dimension tables
contain descriptive attributes.

● Relationship:
Fact tables and dimension tables are linked through foreign keys,
allowing for querying and analysis based on specific dimensions.

● Granularity:
Fact tables represent the lowest level of detail (grain) in the data, while
dimension tables provide various levels of aggregation and filtering.

A normalized database is a database design that organizes data into related tables to reduce redundancy and ensure data integrity. It follows a set of rules called normal forms (1NF, 2NF, 3NF, etc.) that break down large, repetitive tables into smaller, more manageable ones.

Goal                      Benefit
Avoid data duplication    Saves storage, prevents inconsistency
Maintain data integrity   Changes update in one place only
Make relationships clear  Easier to enforce logical structure
Improve data accuracy     Reduces update, insert, and delete anomalies
Normal Form               Rule                                                         Example Problem Solved
1NF – First Normal Form   No repeating groups or arrays; each cell holds one value.    No multiple phone numbers in one field
2NF – Second Normal Form  Meet 1NF + no partial dependency on composite keys.          Break out fields that depend on only part of a key
3NF – Third Normal Form   Meet 2NF + no transitive dependencies (non-key → non-key).   Separate fields that depend on other non-key fields

Pros                    Cons
Eliminates duplication  Complex queries (more joins)
Saves storage           Slower for reporting/BI
Ensures consistency     Can be harder for non-technical users
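A minimal Python sketch of the 1NF and 3NF fixes, using purely illustrative field names:

```python
# One denormalized row: multiple phone numbers crammed into a single field
# (violates 1NF), and supplier_city depends on supplier rather than on the
# product key (a transitive dependency, violating 3NF).
denormalized = [
    {"product_id": 1, "supplier": "Acme", "supplier_city": "Pune",
     "phones": "111-1111, 222-2222"},
]

# 1NF fix: one value per cell -> split phones into separate rows
phones = [{"product_id": r["product_id"], "phone": p.strip()}
          for r in denormalized for p in r["phones"].split(",")]

# 3NF fix: move supplier_city into its own table keyed by supplier
suppliers = {r["supplier"]: r["supplier_city"] for r in denormalized}
products = [{"product_id": r["product_id"], "supplier": r["supplier"]}
            for r in denormalized]

print(len(phones), suppliers)  # 2 {'Acme': 'Pune'}
```

After the split, a supplier's city is stored once, so updating it touches one row instead of every product row.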
