Data Modeling
Data modeling helps us organize data in a way that makes it easy to use, understand,
and analyze.
In simple terms:
Think of data modeling as arranging things into tables (like Excel sheets) and connecting
them properly. Here’s a simple breakdown:
▪ Each customer can have multiple sales records, so you connect the "Customer" table to
the "Sales" table using a unique ID (like Customer ID).
4. Define Rules
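Connecting tables through a shared ID and enforcing rules on them can be sketched with Python's built-in sqlite3 module. This is a minimal sketch that assumes "rules" means constraints: required columns, valid value ranges, and references that must point to an existing row. The table and column names are hypothetical.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,   -- the unique ID that links the tables
            name        TEXT NOT NULL
        );
        CREATE TABLE sales (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            quantity    INTEGER NOT NULL CHECK (quantity > 0)   -- a simple business rule
        );
    """)
    return conn

def try_insert_sale(conn: sqlite3.Connection, sale_id: int, customer_id: int, quantity: int) -> bool:
    """Return True if the sale satisfies every rule, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sales VALUES (?, ?, ?)", (sale_id, customer_id, quantity))
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_db()
with conn:
    conn.execute("INSERT INTO customer VALUES (1, 'Alice')")

print(try_insert_sale(conn, 1, 1, 3))   # True: customer 1 exists
print(try_insert_sale(conn, 2, 1, 5))   # True: one customer, many sales
print(try_insert_sale(conn, 3, 99, 1))  # False: there is no customer 99
print(try_insert_sale(conn, 4, 1, 0))   # False: quantity must be positive
```

The database, not the application, rejects the bad rows here, so every program that writes to these tables has to follow the same rules.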
There are three main types of data models, depending on how detailed they are:
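1. Conceptual Model
o High-level view of the data: the main entities and how they relate.
2. Logical Model
o More detailed than the conceptual model.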
o Defines the structure of the data (tables, columns, relationships).
o Example: "The ‘Customer’ table has Name, Age, Address. The ‘Sales’ table has Product,
Price, Quantity."
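3. Physical Model
o Most detailed.
o Defines how the data is stored in a specific database (data types, keys, indexes).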
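The physical model turns a logical design into concrete database objects. Below is a sketch of the Customer/Sales example as a physical model for SQLite. The column names follow the logical example, while the data types, the customer_id link, and the index name are illustrative assumptions, not part of the original model.

```python
import sqlite3

# Physical model for SQLite: concrete types, keys, and an index.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT    NOT NULL,
    age         INTEGER,
    address     TEXT
);
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product     TEXT    NOT NULL,
    price       NUMERIC NOT NULL,
    quantity    INTEGER NOT NULL
);
-- Physical-only detail: an index to speed up lookups of a customer's sales.
CREATE INDEX idx_sales_customer ON sales(customer_id);
"""

def column_types(conn: sqlite3.Connection, table: str) -> dict:
    """Map each column name to its declared type, as recorded in SQLite's catalog."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so only pass trusted names.
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

conn = sqlite3.connect(":memory:")
conn.executescript(PHYSICAL_DDL)
print(column_types(conn, "sales"))
```

The same logical model could produce different physical models on other databases, since type names and indexing options vary between systems.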
There are tools that help create data models visually. Some popular ones are:
Star Schema and Snowflake Schema are two popular ways to organize data in a
database, especially for analytics and reporting. Both are types of dimensional
modeling, a method of structuring data so it is easy to query and analyze.
• Fact Tables: These store the "measures" or "metrics" (numbers you want to analyze, like
sales amount, quantity sold, etc.).
• Dimension Tables: These store descriptive information about the data (like customer
name, product details, date, etc.).
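To make the split between fact and dimension tables concrete, here is a minimal sketch in plain Python with made-up data. The fact rows carry a product_id plus numeric measures, and the dimension supplies the descriptive labels used for grouping. All names and values are hypothetical.

```python
from collections import defaultdict

# Dimension table: descriptive attributes, keyed by product_id.
dim_product = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Headphones", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}

# Fact table: one row per sale, holding a foreign key plus numeric measures.
fact_sales = [
    {"product_id": 1, "quantity": 2, "amount": 2000.0},
    {"product_id": 2, "quantity": 5, "amount": 250.0},
    {"product_id": 3, "quantity": 1, "amount": 300.0},
    {"product_id": 1, "quantity": 1, "amount": 1000.0},
]

def total_by(attribute: str) -> dict:
    """Sum the 'amount' measure, grouped by a descriptive attribute from the dimension."""
    totals = defaultdict(float)
    for row in fact_sales:
        label = dim_product[row["product_id"]][attribute]  # look up the dimension row
        totals[label] += row["amount"]
    return dict(totals)

print(total_by("category"))  # {'Electronics': 3250.0, 'Furniture': 300.0}
```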
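Star Schema
Structure:
• A central fact table is connected directly to each dimension table.
• The dimension tables are denormalized (not broken down into sub-dimensions).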
Example:
Imagine a retail company that wants to track sales data.
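One possible star schema for this retail example is sketched below with Python's sqlite3 module. The table names (fact_sales, dim_customer, dim_product, dim_date) and the sample rows are illustrative assumptions.

```python
import sqlite3

# Star schema sketch: one fact table joined directly to each dimension.
STAR_DDL = """
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    quantity    INTEGER,
    amount      REAL
);
"""

def build_star() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_DDL)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Alice", "Berlin"), (2, "Bob", "Paris")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240105, "2024-01-05", 2024), (20240210, "2024-02-10", 2024)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 20240105, 1, 1000.0),
                      (2, 2, 20240105, 2, 600.0),
                      (1, 2, 20240210, 1, 300.0)])
    return conn

# Each dimension is exactly one join away from the fact table.
SALES_BY_CITY_AND_CATEGORY = """
SELECT c.city, p.category, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_customer c ON f.customer_id = c.customer_id
JOIN dim_product  p ON f.product_id  = p.product_id
GROUP BY c.city, p.category
ORDER BY c.city, p.category;
"""

conn = build_star()
for row in conn.execute(SALES_BY_CITY_AND_CATEGORY):
    print(row)
```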
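Key Features:
• Simple structure that is easy to understand.
• Fast queries, because every dimension table is only one join away from the fact table.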
Snowflake Schema
Structure:
• The dimension tables are normalized (broken down further into sub-dimensions).
• Some dimension tables are connected to other dimension tables.
Example:
Using the same retail company example, but now with normalized dimensions:
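A sketch of the snowflake version, again with sqlite3: the product dimension is normalized so that category details move into their own table. The names and rows are hypothetical. Note the extra join needed to reach the category.

```python
import sqlite3

# Snowflake sketch: dim_product points to a separate dim_category table.
SNOWFLAKE_DDL = """
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity   INTEGER,
    amount     REAL
);
"""

# Reaching the category now takes two joins: fact -> product -> category.
SALES_BY_CATEGORY = """
SELECT cat.category_name, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product  p   ON f.product_id  = p.product_id
JOIN dim_category cat ON p.category_id = cat.category_id
GROUP BY cat.category_name
ORDER BY cat.category_name;
"""

def build_snowflake() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SNOWFLAKE_DDL)
    conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                     [(1, "Electronics"), (2, "Furniture")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", 1), (2, "Headphones", 1), (3, "Desk", 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 1000.0), (2, 3, 150.0), (3, 2, 600.0)])
    return conn

conn = build_snowflake()
print(conn.execute(SALES_BY_CATEGORY).fetchall())
```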
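Key Features:
• Less data redundancy, because shared descriptive data is stored only once.
• Queries need more joins, which can make them slower.
When to Use the Snowflake Schema: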
o When dealing with very large datasets where storage efficiency matters.
o When you need to reduce data redundancy and maintain strict normalization.
o Commonly used in data warehouses for enterprise-level systems.
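The redundancy point can be illustrated by counting how often a category name is stored in each design. This is a minimal sketch with a hypothetical product list; real savings depend on how many rows share each descriptive value.

```python
# Hypothetical product list: (product_name, category_name).
products = [
    ("Laptop", "Electronics"),
    ("Headphones", "Electronics"),
    ("Monitor", "Electronics"),
    ("Desk", "Furniture"),
    ("Chair", "Furniture"),
]

def star_product_rows(items):
    """Star: the category name is copied into every product row."""
    return [{"product": name, "category": cat} for name, cat in items]

def snowflake_tables(items):
    """Snowflake: each category name is stored once; products keep only its id."""
    category_ids = {}
    for _, cat in items:
        category_ids.setdefault(cat, len(category_ids) + 1)  # ids in first-seen order
    categories = [{"category_id": cid, "category": cat} for cat, cid in category_ids.items()]
    product_rows = [{"product": name, "category_id": category_ids[cat]} for name, cat in items]
    return categories, product_rows

star = star_product_rows(products)
categories, snow_products = snowflake_tables(products)

star_copies = len(star)              # one category string per product row
snowflake_copies = len(categories)   # one category string per distinct category
print(star_copies, snowflake_copies)  # 5 2
```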
Summary
• Star Schema: Simple, fast, and great for small to medium datasets. Fact table is directly
connected to dimension tables.
• Snowflake Schema: Normalized and storage-efficient for large datasets, but slower to
query because of the extra joins.
• Both are widely used in data warehousing and analytics. The choice depends on your
dataset size, performance needs, and storage constraints.
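Step 1: Conceptual Model
The conceptual model is the high-level understanding of the data and its relationships.
It defines the main entities (tables) and how they relate to each other, without going into
columns or data types.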
Example Scenario:
Let’s use a Retail Sales System as our example. The system tracks sales data for
products sold to customers.
Entities (Tables): Customer, Product, and Sales.
Relationships: a customer can make many sales, and a product can appear in many sales.
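As a rough sketch, a conceptual model for this retail scenario can be written down as plain data: entities and the relationships between them, with no columns or types yet. The entity names and the one-to-many cardinalities are assumptions based on the scenario (customers buy products), and the helper functions are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    """A conceptual link between two entities."""
    source: str
    target: str
    cardinality: str  # "1:N" means one source item relates to many target items

# Conceptual level: only entity names and how they relate.
ENTITIES = {"Customer", "Product", "Sales"}
RELATIONSHIPS = [
    Relationship("Customer", "Sales", "1:N"),  # a customer can make many purchases
    Relationship("Product", "Sales", "1:N"),   # a product can appear in many sales
]

def undefined_entities(entities, relationships):
    """Return entity names used in relationships but never declared (should be empty)."""
    used = {name for r in relationships for name in (r.source, r.target)}
    return used - entities

def related_to(entity, relationships):
    """List the entities directly linked to the given one."""
    linked = []
    for r in relationships:
        if r.source == entity:
            linked.append(r.target)
        elif r.target == entity:
            linked.append(r.source)
    return linked

print(undefined_entities(ENTITIES, RELATIONSHIPS))  # set()
print(related_to("Sales", RELATIONSHIPS))           # ['Customer', 'Product']
```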
Step 2: Logical Model
The logical model adds more detail to the conceptual model. It specifies the tables, their
columns, and the primary and foreign keys that connect them.
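A logical model can be captured as a database-independent description of tables, columns, and keys. The sketch below uses a plain Python dictionary with hypothetical table and column names, plus a small check that every foreign key points at an existing primary key.

```python
# Logical model: tables, columns, primary keys (pk) and foreign keys (fks),
# still independent of any particular database.
LOGICAL_MODEL = {
    "Customer": {"columns": ["customer_id", "name", "city"], "pk": "customer_id", "fks": {}},
    "Product":  {"columns": ["product_id", "name", "category"], "pk": "product_id", "fks": {}},
    "Sales": {
        "columns": ["sale_id", "customer_id", "product_id", "quantity", "amount"],
        "pk": "sale_id",
        # column -> (referenced table, referenced column)
        "fks": {"customer_id": ("Customer", "customer_id"),
                "product_id": ("Product", "product_id")},
    },
}

def broken_references(model):
    """Return FK definitions whose column is missing or whose target is not a primary key."""
    problems = []
    for table, spec in model.items():
        for column, (ref_table, ref_column) in spec["fks"].items():
            if column not in spec["columns"]:
                problems.append((table, column, "column not in table"))
            elif ref_table not in model or model[ref_table]["pk"] != ref_column:
                problems.append((table, column, f"{ref_table}.{ref_column} is not an existing primary key"))
    return problems

print(broken_references(LOGICAL_MODEL))  # []
```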
SQL Implementation
Star Schema
Product Table (Dim)
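A sketch of what a star-schema product dimension could look like in SQL, run here through Python's sqlite3 module. The column names, the surrogate key (product_key), and the business key (product_code) are assumptions, not a definitive table definition. The category sits in the same row, as is typical for a denormalized star-schema dimension.

```python
import sqlite3

DIM_PRODUCT_DDL = """
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key referenced by the fact table
    product_code TEXT NOT NULL UNIQUE,  -- business key from the source system
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL,         -- denormalized: no separate category table
    unit_price   REAL
);
"""

def load_products(conn: sqlite3.Connection, rows) -> None:
    """Insert (code, name, category, price) tuples; SQLite assigns product_key."""
    conn.executemany(
        "INSERT INTO dim_product (product_code, product_name, category, unit_price) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )

def key_for_code(conn: sqlite3.Connection, code: str):
    """Look up the surrogate key for a business code, or None if it is unknown."""
    row = conn.execute(
        "SELECT product_key FROM dim_product WHERE product_code = ?", (code,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript(DIM_PRODUCT_DDL)
load_products(conn, [("P-100", "Laptop", "Electronics", 1000.0),
                     ("P-200", "Desk", "Furniture", 300.0)])
print(key_for_code(conn, "P-200"))  # 2
```

A lookup like key_for_code is what a loading job would use to translate source product codes into the keys stored in the fact table.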
To test the models, you can run queries to analyze the data. For example:
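One possible check, sketched with sqlite3: load both designs with the same hypothetical sales, ask the same question (total sales per category), and confirm the answers match. To keep the sketch short, both designs share one fact table, which is a simplification; table names are illustrative.

```python
import sqlite3

DDL = """
CREATE TABLE star_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE snow_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE snow_product (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES snow_category(category_id));
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
INSERT INTO star_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO snow_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO snow_product VALUES (1, 'Laptop', 1), (2, 'Desk', 2);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 300.0), (1, 500.0);
"""

# Star: one join from the fact table to the product dimension.
STAR_QUERY = """
SELECT p.category, SUM(f.amount)
FROM fact_sales f JOIN star_product p ON f.product_id = p.product_id
GROUP BY p.category ORDER BY p.category;
"""

# Snowflake: an extra join to reach the category table.
SNOWFLAKE_QUERY = """
SELECT c.category, SUM(f.amount)
FROM fact_sales f
JOIN snow_product p  ON f.product_id  = p.product_id
JOIN snow_category c ON p.category_id = c.category_id
GROUP BY c.category ORDER BY c.category;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()

# Same answer from both models; only the number of joins differs.
print(star_result)
print(star_result == snowflake_result)
```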
Key Takeaways
• Star Schema is simpler and faster for querying because it has fewer joins.
• Snowflake Schema reduces redundancy and is more storage-efficient but requires
more joins.
• Both schemas are useful depending on the size of your dataset and performance
requirements.