Data Modeling

Data modeling is the process of organizing raw data into structured formats so that it is easier to analyze and understand. It involves identifying the data, organizing it, connecting related data, and defining rules. There are three main types of models: conceptual, logical, and physical. Two popular ways to organize data for analytics are the Star Schema, which is simple and fast to query, and the Snowflake Schema, which is more complex and normalizes its dimension tables to reduce redundancy.

Uploaded by suraj

What is Data Modeling?

Data modeling helps us organize data in a way that makes it easy to use, understand,
and analyze.

In simple terms:

• Data is raw information (e.g., names, addresses, sales numbers).
• Modeling is the process of structuring this data so that it’s useful.

Why Do We Need Data Modeling?

Without data modeling:

• Data can be messy and hard to work with.
• It becomes difficult to find the information you need.
• Reports and analyses might not make sense.

With data modeling:

• Data is organized logically.
• It’s easier to build systems like databases, reports, or dashboards.
• Everyone using the data understands how it’s structured.

How Does Data Modeling Work?

Think of data modeling as arranging things into tables (like Excel sheets) and connecting
them properly. Here’s a simple breakdown:

1. Identify the Data

o Figure out what kind of information you’re dealing with.
o Example:
▪ Customer data: Name, age, address.
▪ Sales data: Product name, price, quantity sold.

2. Organize the Data

o Group related data together.
o Example:
▪ A "Customer" table with columns like Name, Age, Address.
▪ A "Sales" table with columns like Product, Price, Quantity.

3. Connect the Data

o Link related tables.
o Example:
▪ Each customer can have multiple sales records, so you connect the "Customer" table to
the "Sales" table using a unique ID (like Customer ID).

4. Define Rules

o Set rules for how the data should behave.
o Example:
▪ Every customer must have a unique ID.
▪ Prices in the "Sales" table cannot be negative.
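
The four steps above can be sketched in a few lines. This is a minimal illustration, not the original project's code: it assumes SQLite (through Python's standard sqlite3 module) as the database, and the SaleID column, the column types, and the sample rows are invented for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Steps 1 and 2: identify the data and group it into tables.
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,      -- rule: every customer has a unique ID
    Name       TEXT NOT NULL,
    Age        INTEGER,
    Address    TEXT
);
CREATE TABLE Sales (
    SaleID     INTEGER PRIMARY KEY,      -- assumed column, not named in the text
    -- Step 3: connect each sale to its customer.
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    Product    TEXT NOT NULL,
    -- Step 4: rule that prices cannot be negative.
    Price      REAL NOT NULL CHECK (Price >= 0),
    Quantity   INTEGER NOT NULL
);
""")

# Made-up sample rows: one customer with two sales.
conn.execute("INSERT INTO Customer VALUES (1, 'Asha', 31, '12 Lake Road')")
conn.execute("INSERT INTO Sales VALUES (1, 1, 'Notebook', 4.50, 3)")
conn.execute("INSERT INTO Sales VALUES (2, 1, 'Pen', 1.20, 10)")


def try_insert(sql):
    """Return True if the row is accepted, False if a rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


negative_price_ok = try_insert("INSERT INTO Sales VALUES (3, 1, 'Eraser', -2.00, 1)")
duplicate_id_ok = try_insert("INSERT INTO Customer VALUES (1, 'Ravi', 40, '9 Hill St')")
unknown_customer_ok = try_insert("INSERT INTO Sales VALUES (4, 99, 'Ruler', 2.00, 1)")
sales_for_customer = conn.execute(
    "SELECT COUNT(*) FROM Sales WHERE CustomerID = 1"
).fetchone()[0]

print(negative_price_ok, duplicate_id_ok, unknown_customer_ok, sales_for_customer)
```

The three rejected inserts show the rules doing their job: the database itself refuses a negative price, a repeated customer ID, and a sale that points at a customer who does not exist.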

Types of Data Models

There are three main types of data models, depending on how detailed they are:

1. Conceptual Model

o The big picture.
o What data do we need? Who will use it?
o Example: "We need customer and sales data."

2. Logical Model

o More detailed.
o Defines the structure of the data (tables, columns, relationships).
o Example: "The ‘Customer’ table has Name, Age, Address. The ‘Sales’ table has Product,
Price, Quantity."

3. Physical Model

o The actual implementation.
o How the data is stored in a database.
o Example: Writing SQL statements (such as CREATE TABLE) to create tables and relationships.

Tools Used for Data Modeling

Several tools help create and work with data models. Some popular ones are:

• Microsoft Excel: For simple data organization.
• Power BI/Tableau: For creating visual models and dashboards.
• SQL: For creating and managing databases.
• ER Diagram Tools: For drawing diagrams of how tables are connected.

Snowflake and Star Schema

Snowflake and Star Schema are two popular approaches to organizing data in a
database, especially for analytics and reporting purposes. Both are types of dimensional
modeling, which is a method used to structure data so it’s easy to query and analyze.

What is Dimensional Modeling?

Dimensional modeling is a way to organize data into two main types of tables:

• Fact Tables: These store the "measures" or "metrics" (numbers you want to analyze, like
sales amount, quantity sold, etc.).
• Dimension Tables: These store descriptive information about the data (like customer
name, product details, date, etc.).

The goal is to make it easy to answer business questions like:

• "How many products were sold last month?"
• "What was the total revenue from a specific region?"
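
To make the fact/dimension split concrete, here is a small sketch of how the two questions above could be answered. It assumes SQLite via Python's sqlite3 module; the table names DateDim and Region, the sample rows, and the choice of May 2024 as "last month" are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive values we filter and group by.
CREATE TABLE DateDim (DateID INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE Region  (RegionID INTEGER PRIMARY KEY, RegionName TEXT);

-- The fact table holds the numbers, plus keys pointing at the dimensions.
CREATE TABLE Sales (
    DateID   INTEGER REFERENCES DateDim(DateID),
    RegionID INTEGER REFERENCES Region(RegionID),
    Quantity INTEGER,
    Revenue  REAL
);

-- Hypothetical sample rows.
INSERT INTO DateDim VALUES (1, 28, 4, 2024), (2, 3, 5, 2024), (3, 17, 5, 2024);
INSERT INTO Region  VALUES (1, 'North'), (2, 'South');
INSERT INTO Sales   VALUES (1, 1, 5, 50.0), (2, 1, 2, 20.0), (2, 2, 4, 40.0), (3, 2, 1, 10.0);
""")

# "How many products were sold last month?" (taking May 2024 as the month in question)
units_in_may = conn.execute("""
    SELECT SUM(s.Quantity)
    FROM Sales s
    JOIN DateDim d ON d.DateID = s.DateID
    WHERE d.Year = 2024 AND d.Month = 5
""").fetchone()[0]

# "What was the total revenue from a specific region?"
north_revenue = conn.execute("""
    SELECT SUM(s.Revenue)
    FROM Sales s
    JOIN Region r ON r.RegionID = s.RegionID
    WHERE r.RegionName = 'North'
""").fetchone()[0]

print(units_in_may, north_revenue)
```

In both queries the measure (Quantity or Revenue) comes from the fact table, while the filter (month or region name) comes from a dimension table.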

What is Star Schema?

Structure:

• One central Fact Table surrounded by multiple Dimension Tables.
• The relationships between the fact table and dimension tables are direct (no
intermediate tables).

Example:
Imagine a retail company that wants to track sales data.

• Fact Table: Sales
o Columns: SalesID, ProductID, CustomerID, DateID, Quantity, Revenue.
• Dimension Tables:
o Product: Product details like ProductID, ProductName, Category, Price.
o Customer: Customer details like CustomerID, Name, City, Country.
o Date: Date details like DateID, Day, Month, Year.

[Diagram: Star Schema]

Key Features:

• Simple and easy to understand.
• Fast for querying because each dimension is only one join away from the fact table.
• Well suited to small and medium-sized datasets, where the extra storage used by repeated values is not a concern.

What is Snowflake Schema?

Structure:

• The dimension tables are normalized (broken down further into sub-dimensions).
• Some dimension tables are connected to other dimension tables.

Example:
Using the same retail company example, but now with normalized dimensions:

• Fact Table: Sales
o Columns: SalesID, ProductID, CustomerID, DateID, Quantity, Revenue.
• Dimension Tables:
o Product → Connected to a Category table.
▪ Product: ProductID, ProductName, CategoryID.
▪ Category: CategoryID, CategoryName.
o Customer → Connected to a City table, which is connected to a Country table.
▪ Customer: CustomerID, Name, CityID.
▪ City: CityID, CityName, CountryID.
▪ Country: CountryID, CountryName.
o Date: DateID, Day, Month, Year.

[Diagram: Snowflake Schema]

Key Features:

• More complex than Star Schema due to additional normalization.
• Reduces data redundancy (e.g., instead of storing full city and country names
repeatedly, they’re stored in separate tables).
• Usually slower for querying because there are more joins.
• Best suited for large datasets where storage efficiency is important.
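
The redundancy trade-off can be seen in a small side-by-side sketch. It assumes SQLite via Python's sqlite3 module; the CustomerFlat table name and the sample customers are hypothetical and exist only to contrast a star-style dimension with its snowflake-style equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension: city and country names are repeated on every customer row.
CREATE TABLE CustomerFlat (
    CustomerID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Country TEXT
);
INSERT INTO CustomerFlat VALUES
    (1, 'Asha', 'Pune', 'India'),
    (2, 'Ravi', 'Pune', 'India'),
    (3, 'Meera', 'Delhi', 'India');

-- Snowflake-style dimensions: each name is stored once and referenced by ID.
CREATE TABLE Country (CountryID INTEGER PRIMARY KEY, CountryName TEXT);
CREATE TABLE City (
    CityID INTEGER PRIMARY KEY, CityName TEXT,
    CountryID INTEGER REFERENCES Country(CountryID)
);
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY, Name TEXT,
    CityID INTEGER REFERENCES City(CityID)
);
INSERT INTO Country  VALUES (1, 'India');
INSERT INTO City     VALUES (1, 'Pune', 1), (2, 'Delhi', 1);
INSERT INTO Customer VALUES (1, 'Asha', 1), (2, 'Ravi', 1), (3, 'Meera', 2);
""")

# How many times is the country name physically stored in each design?
flat_copies = conn.execute(
    "SELECT COUNT(*) FROM CustomerFlat WHERE Country = 'India'"
).fetchone()[0]
normalized_copies = conn.execute(
    "SELECT COUNT(*) FROM Country WHERE CountryName = 'India'"
).fetchone()[0]

# The price of normalization: two joins to read what the flat table returns directly.
normalized_rows = conn.execute("""
    SELECT cu.Name, ci.CityName, co.CountryName
    FROM Customer cu
    JOIN City    ci ON ci.CityID    = cu.CityID
    JOIN Country co ON co.CountryID = ci.CountryID
    ORDER BY cu.CustomerID
""").fetchall()
flat_rows = conn.execute(
    "SELECT Name, City, Country FROM CustomerFlat ORDER BY CustomerID"
).fetchall()

print(flat_copies, normalized_copies, normalized_rows == flat_rows)
```

Both designs return the same customer details; the normalized one stores the country name once instead of once per customer, at the cost of extra joins.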

When to Use Each Schema?

• Star Schema:
o When simplicity and query performance are priorities.
o For smaller datasets where storage efficiency isn’t a concern.
o Ideal for business intelligence tools like Power BI, Tableau, etc.
• Snowflake Schema:
o When dealing with very large datasets where storage efficiency matters.
o When you need to reduce data redundancy and maintain strict normalization.
o Commonly used in data warehouses for enterprise-level systems.

Summary

• Star Schema: Simple, fast, and great for small to medium datasets. Fact table is directly
connected to dimension tables.
• Snowflake Schema: Normalized, efficient for large datasets, but slower for queries due
to more joins.
• Both are widely used in data warehousing and analytics. The choice depends on your
dataset size, performance needs, and storage constraints.

PROJECT ON DATA MODELING USING SQL


We’ll follow the process of Conceptual → Logical → Physical modeling and then
implement both Star Schema and Snowflake Schema using SQL.

Step 1: Conceptual Model

The conceptual model is the high-level understanding of the data and its relationships.
It defines:

• What entities (tables) are involved?
• What attributes (columns) describe each entity?
• How are the entities related to each other?

Example Scenario:

Let’s use a Retail Sales System as our example. The system tracks sales data for
products sold to customers.

Entities (Tables):

• Sales: Tracks sales transactions.
• Product: Stores product details.
• Customer: Stores customer details.
• Date: Stores date-related information.

Relationships:

• A sale involves one Product, one Customer, and one Date.
• Each product belongs to a category.
• Each customer lives in a city, which belongs to a country.

[Diagram: conceptual model]

Step 2: Logical Model

The logical model adds more detail to the conceptual model. It specifies:

• The structure of tables (columns and data types).
• Primary keys (unique identifiers for each table).
• Foreign keys (relationships between tables).

[Diagram: logical model]

Star Schema Logical Model:

• Fact Table: Sales (contains measures like quantity and revenue).
• Dimension Tables: Product, Customer, Date.

Snowflake Schema Logical Model:

• Same as the Star Schema, but dimension tables are normalized:
o Product is linked to Category.
o Customer is linked to City, and City is linked to Country.

[Diagram: Snowflake Schema logical model]

Step 3: Physical Model

The physical model is the actual implementation in SQL. It includes:

• Creating tables with proper data types.
• Defining primary and foreign keys.
• Inserting sample records for testing.

SQL Implementation

Star Schema

• Product Table (Dim)
• Customer Table (Dim)
• Date Table (Dim)
• Sales Table (Fact)

Testing the Data Models

To test the models, you can run queries to analyze the data. For example:

Query in Star Schema:
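
As a hedged sketch of what the star schema tables and a test query might look like, the block below builds the four tables described earlier and runs one analytical query. It assumes SQLite through Python's sqlite3 module rather than whichever database the project used. The column types and sample rows are invented, and the date dimension is named DateDim here (an assumption) because DATE is a reserved word in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Dimension tables: descriptive attributes, one row per product, customer, or date.
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    Category    TEXT NOT NULL,
    Price       REAL NOT NULL CHECK (Price >= 0)
);
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    City       TEXT NOT NULL,
    Country    TEXT NOT NULL
);
CREATE TABLE DateDim (
    DateID INTEGER PRIMARY KEY,
    Day    INTEGER NOT NULL,
    Month  INTEGER NOT NULL,
    Year   INTEGER NOT NULL
);

-- Fact table: the measures plus one foreign key per dimension.
CREATE TABLE Sales (
    SalesID    INTEGER PRIMARY KEY,
    ProductID  INTEGER NOT NULL REFERENCES Product(ProductID),
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    DateID     INTEGER NOT NULL REFERENCES DateDim(DateID),
    Quantity   INTEGER NOT NULL,
    Revenue    REAL NOT NULL
);

-- Made-up sample records for testing.
INSERT INTO Product VALUES
    (1, 'Notebook', 'Stationery', 4.5),
    (2, 'Pen', 'Stationery', 1.5),
    (3, 'Headphones', 'Electronics', 25.0);
INSERT INTO Customer VALUES
    (1, 'Asha', 'Pune', 'India'),
    (2, 'Liam', 'Leeds', 'UK');
INSERT INTO DateDim VALUES (1, 3, 5, 2024), (2, 17, 5, 2024);
INSERT INTO Sales VALUES
    (1, 1, 1, 1, 2, 9.0),
    (2, 3, 1, 1, 1, 25.0),
    (3, 2, 2, 2, 4, 6.0),
    (4, 3, 2, 2, 2, 50.0);
""")

# Units and revenue by country and category for 2024: one join per dimension.
star_rows = conn.execute("""
    SELECT c.Country, p.Category, SUM(s.Quantity), SUM(s.Revenue)
    FROM Sales s
    JOIN Product  p ON p.ProductID  = s.ProductID
    JOIN Customer c ON c.CustomerID = s.CustomerID
    JOIN DateDim  d ON d.DateID     = s.DateID
    WHERE d.Year = 2024
    GROUP BY c.Country, p.Category
    ORDER BY c.Country, p.Category
""").fetchall()

for row in star_rows:
    print(row)
```

Every attribute the query needs (Country, Category, Year) sits directly on a dimension table, so three joins are enough.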


Snowflake Schema

• Category Table (Dim)
• Product Table (Dim)
• Country Table (Dim)
• City Table (Dim)
• Customer Table (Dim)
• Date Table (Dim)
• Sales Table (Fact)


Testing the Data Models

To test the models, you can run queries to analyze the data. For example:

Query in Snowflake Schema:
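
The snowflake version can be sketched the same way, again assuming SQLite via Python's sqlite3 module. The table and column names follow the snowflake example earlier in the document, except for DateDim, which is an assumed name chosen because DATE is a reserved word in some databases. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Normalized dimensions: Product -> Category, Customer -> City -> Country.
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT NOT NULL);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    CategoryID  INTEGER NOT NULL REFERENCES Category(CategoryID)
);
CREATE TABLE Country (CountryID INTEGER PRIMARY KEY, CountryName TEXT NOT NULL);
CREATE TABLE City (
    CityID    INTEGER PRIMARY KEY,
    CityName  TEXT NOT NULL,
    CountryID INTEGER NOT NULL REFERENCES Country(CountryID)
);
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    CityID     INTEGER NOT NULL REFERENCES City(CityID)
);
CREATE TABLE DateDim (
    DateID INTEGER PRIMARY KEY,
    Day    INTEGER NOT NULL,
    Month  INTEGER NOT NULL,
    Year   INTEGER NOT NULL
);

-- The fact table has the same columns as in the star schema.
CREATE TABLE Sales (
    SalesID    INTEGER PRIMARY KEY,
    ProductID  INTEGER NOT NULL REFERENCES Product(ProductID),
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    DateID     INTEGER NOT NULL REFERENCES DateDim(DateID),
    Quantity   INTEGER NOT NULL,
    Revenue    REAL NOT NULL
);

-- Made-up sample records, inserted parents first so the foreign keys resolve.
INSERT INTO Category VALUES (1, 'Stationery'), (2, 'Electronics');
INSERT INTO Product  VALUES (1, 'Notebook', 1), (2, 'Pen', 1), (3, 'Headphones', 2);
INSERT INTO Country  VALUES (1, 'India'), (2, 'UK');
INSERT INTO City     VALUES (1, 'Pune', 1), (2, 'Leeds', 2);
INSERT INTO Customer VALUES (1, 'Asha', 1), (2, 'Liam', 2);
INSERT INTO DateDim  VALUES (1, 3, 5, 2024), (2, 17, 5, 2024);
INSERT INTO Sales VALUES
    (1, 1, 1, 1, 2, 9.0),
    (2, 3, 1, 1, 1, 25.0),
    (3, 2, 2, 2, 4, 6.0),
    (4, 3, 2, 2, 2, 50.0);
""")

# Units and revenue by country and category for 2024.
# Reaching CountryName and CategoryName means walking through the sub-dimensions.
snowflake_rows = conn.execute("""
    SELECT co.CountryName, ca.CategoryName, SUM(s.Quantity), SUM(s.Revenue)
    FROM Sales s
    JOIN Product  p  ON p.ProductID   = s.ProductID
    JOIN Category ca ON ca.CategoryID = p.CategoryID
    JOIN Customer cu ON cu.CustomerID = s.CustomerID
    JOIN City     ci ON ci.CityID     = cu.CityID
    JOIN Country  co ON co.CountryID  = ci.CountryID
    JOIN DateDim  d  ON d.DateID      = s.DateID
    WHERE d.Year = 2024
    GROUP BY co.CountryName, ca.CategoryName
    ORDER BY co.CountryName, ca.CategoryName
""").fetchall()

for row in snowflake_rows:
    print(row)
```

The query needs six joins because category and country names now live in their own tables, which is the extra query cost the snowflake design accepts in exchange for storing each name once.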

Key Takeaways

• Star Schema is simpler and faster for querying because it has fewer joins.
• Snowflake Schema reduces redundancy and is more storage-efficient but requires
more joins.
• Both schemas are useful depending on the size of your dataset and performance
requirements.
