ADMT
Query optimization is the process by which a DBMS finds the most efficient way to execute a query. The goal is to minimize the resources needed to retrieve the required data, such as time, CPU, memory, and disk I/O, without changing the results of the query.
How Query Optimization Works:
1. Query Parsing:
○ When a query is submitted, the DBMS first checks its syntax and semantics. This step ensures the query is valid and
correctly references the database schema.
2. Generation of Possible Execution Plans:
○ The DBMS generates multiple execution plans, which are different ways to retrieve the required data. Each execution
plan involves a different combination of operations (e.g., table scans, index scans, join algorithms) that can be used to
execute the query.
3. Cost Estimation:
○ For each execution plan, the optimizer estimates the cost in terms of resource usage (CPU, memory, disk I/O) and
execution time. Cost estimates are based on factors like:
▪ Number of rows in the table.
▪ Availability and selectivity of indexes.
▪ Estimated number of disk accesses or joins.
○ The goal is to choose the plan with the lowest cost.
4. Selection of the Optimal Plan:
○ The query optimizer picks the most efficient execution plan based on the cost estimations and then passes it to the
query execution engine.
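Most relational DBMSs let you inspect the plan the optimizer chose via an EXPLAIN statement. A minimal sketch (the EMPLOYEE table is illustrative, and the exact output format varies by DBMS):
-- ask the optimizer to show its chosen plan instead of running the query
EXPLAIN
SELECT Name FROM EMPLOYEE WHERE Department = 'Sales';
-- with a selective index on Department, the plan will typically show an
-- index scan; without one, a full table scan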
Account-Level Privileges
Account-level privileges are permissions related to the management of the database system or user accounts. These privileges typically
control the overall access and management of the database environment. Common account-level privileges include:
1. CREATE USER: Allows the creation of new database user accounts.
2. DROP USER: Permits the deletion of existing user accounts.
3. ALTER USER: Allows modifications to existing user accounts (e.g., changing passwords).
4. GRANT: Permits a user to grant specific privileges to other users.
5. REVOKE: Allows a user to remove privileges previously granted to other users.
6. CREATE DATABASE: Allows the creation of new databases.
7. DROP DATABASE: Permits the deletion of existing databases.
8. ALTER DATABASE: Allows modifications to the database structure or settings.
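A hedged sketch of these commands in MySQL-style syntax (user and database names are illustrative; exact syntax varies by DBMS):
CREATE USER 'analyst'@'localhost' IDENTIFIED BY 'initial_pw';  -- create a new account
ALTER USER 'analyst'@'localhost' IDENTIFIED BY 'new_pw';       -- change the password
CREATE DATABASE sales_db;                                      -- create a new database
DROP DATABASE sales_db;                                        -- delete a database
DROP USER 'analyst'@'localhost';                               -- delete the account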
Relational-Level Privileges
Relational-level privileges are permissions granted to users to perform actions on database objects such as tables, views, procedures, etc.
These privileges are typically more granular and specific to the data and objects within the database. Common relational-level privileges
include:
1. SELECT: Allows a user to retrieve data from tables or views.
2. INSERT: Permits a user to add new rows of data to tables.
3. UPDATE: Allows a user to modify existing data within tables.
4. DELETE: Permits a user to remove data from tables.
5. CREATE: Allows the creation of new database objects like tables, views, indexes, etc.
6. DROP: Permits the deletion of existing database objects.
7. ALTER: Allows modifications to existing database objects (e.g., changing table structure).
8. REFERENCES: Allows a user to create foreign key constraints that reference a table.
9. EXECUTE: Permits a user to run stored procedures or functions.
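For example (object and user names are illustrative):
GRANT SELECT, INSERT, UPDATE ON EMPLOYEE TO analyst;   -- data access on a table
GRANT REFERENCES ON EMPLOYEE TO analyst;               -- allow foreign keys that reference EMPLOYEE
GRANT EXECUTE ON PROCEDURE monthly_report TO analyst;  -- run a stored procedure (hypothetical name)
REVOKE INSERT ON EMPLOYEE FROM analyst;                -- withdraw one privilege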
A temporal database model is a type of database that tracks changes to data over time. Unlike a regular database, which only stores
the current state of the data, a temporal database keeps a history of how the data looked at different points in time. This allows you
to query not only the current data but also past versions.
Example:
In a normal database, if a user’s address is updated, the old address is simply replaced. In a temporal database, both the old and
new addresses are stored, along with the dates they were valid.
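A minimal sketch of how this can be modeled with valid-time columns (table and column names are illustrative):
-- each address version carries the period during which it was valid
CREATE TABLE customer_address (
  customer_id INT,
  address     VARCHAR(200),
  valid_from  DATE,
  valid_to    DATE    -- NULL means "currently valid"
);
-- "What was customer 42's address on 2020-06-15?"
SELECT address FROM customer_address
WHERE customer_id = 42
  AND valid_from <= DATE '2020-06-15'
  AND (valid_to IS NULL OR valid_to > DATE '2020-06-15');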
Applications
Financial Systems: Track historical financial transactions, stock prices, and account balances for auditing and analysis.
Healthcare: Manage patient medical records and treatment histories over time.
Version Control Systems: Manage and track different versions of documents, software, or code changes.
Challenges
1. Data Storage: Requires more storage due to the need to maintain multiple versions of data.
2. Schema Design: More complex than non-temporal databases due to the inclusion of time dimensions.
3. Query Processing: Slower due to the additional complexity of managing and retrieving temporal data.
Spatial Database Model
Applications:
• Geographic Information Systems (GIS): Used to store, manage, and analyze geographic data, allowing for map
visualization and spatial analysis.
• Urban Planning: Helps design city layouts, analyze traffic patterns, and plan infrastructure projects by evaluating
spatial relationships.
• Navigation Systems: Stores map data, calculates routes, and provides directions, considering road types and traffic
conditions.
A Distributed Database System is a type of database system where data is stored across multiple physical
locations, which may be on different computers or networks.
Despite being distributed, the system appears to users as a single, unified database.
The main goal of a distributed database system is to allow data to be stored in various locations while still enabling
efficient access, management, and query processing.
A Distributed Database System is a union of two important technologies: the database management system (DBMS) and the computer network.
DBMS technology tends toward centralization of data, while computer network technology tends toward decentralization.
Features
• Data Distribution: Data is stored across multiple physical locations, often on different computers or networks.
• Transparency: The system hides the complexity of data distribution, presenting a unified view to users.
• Replication: Data may be copied to multiple sites for better availability and fault tolerance.
• Autonomy: Each site can independently manage its data and transactions.
• Scalability: The system can easily grow by adding more nodes or databases.
• Fault Tolerance: If one site fails, others can continue to operate, ensuring system reliability.
• Concurrency Control: Manages simultaneous access to data across multiple sites to avoid conflicts.
Granting Privileges
Without Grant Option:
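An illustrative sketch (user and table names are hypothetical):
-- without GRANT OPTION: user1 can read EMPLOYEE but cannot pass the privilege on
GRANT SELECT ON EMPLOYEE TO user1;
-- with GRANT OPTION: user1 may also grant SELECT on EMPLOYEE to other users
GRANT SELECT ON EMPLOYEE TO user1 WITH GRANT OPTION;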
Revoking Privileges
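For example (illustrative names; cascading behavior varies by DBMS):
REVOKE SELECT ON EMPLOYEE FROM user1;
-- in many systems this cascades: privileges that user1 had granted to
-- others using GRANT OPTION are revoked as well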
Attribute-Specific Privileges
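For example, UPDATE can be limited to particular columns (column names are illustrative):
-- user1 may update only Salary and Department, not other attributes
GRANT UPDATE (Salary, Department) ON EMPLOYEE TO user1;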
Summary
Granting: Allows assigning privileges to users, with or without the ability to propagate these privileges.
Revoking: Removes previously granted privileges and may affect users who received privileges indirectly.
Attribute-Specific: Privileges like UPDATE can be restricted to specific attributes of a table.
1. Horizontal Fragmentation:
• Definition: Horizontal fragmentation divides a table into subsets of rows (fragments) based on a condition over one or more attributes.
Example: Consider an EMPLOYEE table:
EMPLOYEE(Emp_ID, Name, Department, Location)
Let’s say we want to fragment this table based on the Location attribute.
We can define two fragments based on location:
• Fragment 1 (Employees located in New York):
EMPLOYEE_NY = SELECT * FROM EMPLOYEE WHERE Location = 'New York'
• Fragment 2 (Employees located in San Francisco):
EMPLOYEE_SF = SELECT * FROM EMPLOYEE WHERE Location = 'San Francisco'
These fragments store rows of employees based on their location. Each fragment is stored locally at the
respective branch (New York, San Francisco).
2. Derived Horizontal Fragmentation:
• Definition:
Derived horizontal fragmentation fragments a table based on the fragmentation of a related table, rather than on the table's own attributes. Each fragment is derived, via the linking key, from a fragment of the related (owner) table.
Purpose: This helps in maintaining locality of reference, ensuring that data related across multiple
tables is stored together.
Example: Now, consider another table PROJECT that is related to EMPLOYEE via the Emp_ID (foreign key):
PROJECT(Project_ID, Emp_ID, Project_Name)
Let’s say we have already horizontally fragmented the EMPLOYEE table by location, as in the example
above. We now want to fragment the PROJECT table using derived horizontal fragmentation. Since each
employee works on projects, we can fragment PROJECT based on the employees’ location (using the
Emp_ID as a reference from the EMPLOYEE table).
• Fragment 1 (Projects of employees located in New York):
PROJECT_NY = SELECT * FROM PROJECT WHERE Emp_ID IN (SELECT Emp_ID FROM EMPLOYEE WHERE Location = 'New York')
• Fragment 2 (Projects of employees located in San Francisco):
PROJECT_SF = SELECT * FROM PROJECT WHERE Emp_ID IN (SELECT Emp_ID FROM EMPLOYEE WHERE Location = 'San Francisco')
Here, the PROJECT table is fragmented based on the fragmentation of the EMPLOYEE table, which is known
as derived fragmentation.
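A useful check on any horizontal fragmentation is that the original relation can be reconstructed as the union of its fragments:
-- reconstructing EMPLOYEE from its location fragments
SELECT * FROM EMPLOYEE_NY
UNION ALL
SELECT * FROM EMPLOYEE_SF;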
Query evaluation is the process by which a database management system (DBMS) translates and
executes a high-level query, typically written in SQL, into a series of low-level operations that
retrieve or manipulate data stored in the database.
Aspect | DAC (Discretionary Access Control) | MAC (Mandatory Access Control) | RBAC (Role-Based Access Control)
Access Control | Controlled by data owner/creator. | Controlled by central authority via security labels. | Based on roles assigned to users.
Flexibility | High: Owners can decide and change access permissions easily. | Low: Access is strictly defined by system policies and cannot be changed by users. | Moderate: Access can be changed by assigning or removing roles.
Common Use | Personal computers or smaller systems where users manage their own files. | Highly secure environments like military or government systems. | Businesses and organizations where roles and responsibilities are clearly defined.
Access Based On | Ownership of resources (files, data, etc.). | Security labels or classifications (e.g., Top Secret, Confidential). | Job roles (e.g., Manager, Developer, HR).
Granularity | Fine-grained: Individual users can be granted or denied access to specific files or resources. | Coarse-grained: Access is based on broader security labels, with strict rules. | Medium-grained: Access is based on roles that can apply to many users.
Ease of Management | Harder: Owners must manually set permissions for every resource. | Strict but simpler: System administrators define policies, and users have no control. | Easier: Admins manage roles, and users are assigned to roles.
Security Level | Lower: Can be less secure if owners set permissions carelessly. | Very High: Strong security, no user discretion, but less flexibility. | Medium to High: Security depends on how roles and permissions are defined.
Examples | File sharing on a personal computer, shared documents. | Government agencies, military systems with classified data. | Corporate environments, hospitals, or large organizations.
Advantages | Flexible and easy to use for small-scale systems. | Highly secure, especially in sensitive environments. | Easy to manage in large organizations with structured roles.
Disadvantages | Can lead to accidental security lapses due to human error. | Very rigid, lacks flexibility for dynamic environments. | Requires well-defined roles, can be complex to set up initially.
1. Consistency
Consistency means that all the nodes (databases) inside a network will have the same copies of a replicated data item visible for various transactions. It guarantees that every
node in a distributed cluster returns the same, most recent, and successful write. It refers to every client having the same view of the data. There are various types of
consistency models. Consistency in CAP refers to sequential consistency, a very strong form of consistency.
Note that the concept of Consistency in ACID and CAP are slightly different since in CAP, it refers to the consistency of the values in different copies of the same data item in a
replicated distributed system. In ACID, it refers to the fact that a transaction will not violate the integrity constraints specified on the database schema.
For example, a user checks his account balance and sees that he has 500 rupees. He spends 200 rupees on some products, so 200 must be deducted, changing his account balance to 300 rupees. This change must be committed and communicated to all other databases that hold this user's details. Otherwise there will be inconsistency, and another database might show his account balance as 500 rupees, which is not true.
(Figure: Consistency problem)
2. Availability
Availability means that each read or write request for a data item will either be processed successfully or will receive a message that the operation cannot be completed. Every
non-failing node returns a response for all the read and write requests in a reasonable amount of time. The key word here is “every”. In simple terms, every node (on either side
of a network partition) must be able to respond in a reasonable amount of time.
For example, user A is a content creator with 1,000 users subscribed to his channel. Another user B, who is far away from user A, tries to subscribe to user A's channel. Since the distance between the two users is large, they are connected to different database nodes of the social media network. If the distributed system follows the principle of availability, user B must be able to subscribe to user A's channel.
(Figure: Availability problem)
3. Partition Tolerance
A network partition occurs when there is a failure in communication between nodes (computers or servers) in a distributed system. This means that the system is split into two
or more parts that cannot communicate with each other.
Partition tolerance means that even when such a communication breakdown happens, the system can still function and serve requests in the parts that are operational, without
crashing or losing data.
For example, take the same social media network where two users are trying to find the subscriber count of a particular channel. Due to a technical fault, a network outage occurs and the database node used by user B loses its connection with the first database. The subscriber count is still shown to user B using a replica of the data from database 1 that was backed up prior to the outage. Hence the distributed system is partition tolerant.
1. Key-Value Database
• What it is: A key-value database is like a big dictionary where each piece of data (the value) is stored and accessed using a unique identifier (the key). It's simple and highly
scalable.
• How it works: Think of it like a locker system. You have a key (ID), and you use it to open a locker that contains something inside (the value).
• Use case: It's great for simple lookups, caching, or session management, where you just need to store and retrieve data based on a unique key.
• Example: Redis, Amazon DynamoDB
• Example scenario: In an e-commerce app, you can store user session data (like a shopping cart) with the user ID as the key and the cart data as the value.
• Advantages:
• Very fast read and write operations because of its simple data model.
• Suitable for scenarios where data relationships are not complex.
• Disadvantages:
• Lacks the ability to perform complex queries or relationships between different data entries.
• Not ideal when there is a need for querying based on the value fields.
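For illustration only, the key-value access pattern can be mimicked in SQL as a two-column table keyed by a unique identifier (real key-value stores such as Redis use their own GET/SET commands; the table and key format here are hypothetical):
-- a key-value store is conceptually one table: unique key -> opaque value
CREATE TABLE session_store (
  session_key   VARCHAR(64) PRIMARY KEY,  -- the unique key, e.g. 'cart:101'
  session_value TEXT                      -- the value (JSON, blob, etc.)
);
-- "SET": store a user's shopping cart under their ID
INSERT INTO session_store VALUES ('cart:101', '{"items": ["book", "pen"]}');
-- "GET": retrieve by key via a primary-key lookup
SELECT session_value FROM session_store WHERE session_key = 'cart:101';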
3. Graph-Based Database
• What it is: A graph database is designed to store data in a graph structure, where data is represented as nodes (entities) and edges (relationships between those entities).
It’s particularly useful for managing and querying complex relationships between data.
• How it works: Imagine a social network where users (nodes) are connected to each other by relationships (edges). The graph database helps to quickly find things like
"Who are the friends of my friends?"
• Use case: Ideal for applications where relationships are key, such as social networks, recommendation engines, and fraud detection.
• Example: Neo4j, Amazon Neptune
• Example scenario: In a social media app, you can use a graph database to find mutual friends, suggest new connections, or analyze how users are connected through
various interactions.
• Advantages:
• Excels at handling complex relationships and interconnected data.
• Queries involving pathfinding, shortest paths, and patterns within the data can be very efficient.
• Disadvantages:
• Can be more complex to design and query compared to other database types.
• May not perform as well with simple lookups or scenarios that don’t require relationship traversal.
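Graph databases are usually queried in a graph language such as Cypher; to make the idea concrete in SQL terms, a "friends of my friends" traversal over a hypothetical edge table looks like the join-heavy query below, which is exactly the kind of query graph databases are built to make cheap:
-- hypothetical friendship edge table (one row per directed edge)
CREATE TABLE friendship (user_id INT, friend_id INT);

-- friends of friends of user 1, excluding user 1 and direct friends
SELECT DISTINCT f2.friend_id
FROM friendship f1
JOIN friendship f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 1
  AND f2.friend_id <> 1
  AND f2.friend_id NOT IN (SELECT friend_id FROM friendship WHERE user_id = 1);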
From <https://www.geeksforgeeks.org/difference-between-olap-and-oltp-in-dbms/>
Here is a comparison of OLAP and OLTP in tabular format:
Feature | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing)
Purpose | Analytical, decision-making support | Transactional, operational processing
Data Processing | Complex queries on large datasets | Simple queries for day-to-day tasks
Operations | Read-heavy | Write-heavy
Data Volume | Large volume of historical data | Small volume, current data
Data Type | Historical and aggregated | Current, real-time, and detailed
Query Type | Multi-dimensional, complex queries | Simple, short queries (INSERT, UPDATE)
Response Time | Slower (minutes to hours) | Fast (milliseconds to seconds)
Normalization | Denormalized (star/snowflake schema) | Highly normalized (3NF or higher)
Users | Data analysts, management | Operational staff, end-users
Data Integrity | Not as critical | Critical (ACID compliance)
Backup and Recovery | Periodic | Frequent (real-time recovery needed)
Example Use Cases | Data warehousing, BI, reporting | Banking, order entry, e-commerce
Focus | Insights and trends | Day-to-day transactions
System Throughput | Low to medium | High (supports many concurrent users)
This comparison highlights the key differences between OLAP and OLTP in terms of purpose, data handling, and system characteristics.
OLAP Operations
OLAP (Online Analytical Processing) operations enable users to analyze multidimensional data from multiple perspectives. Here are the primary OLAP operations:
1. Roll-Up (Aggregation)
• Description: This operation aggregates data by climbing up a hierarchy or reducing the level of detail.
• Example: Summarizing sales data from monthly to quarterly figures.
2. Drill-Down
• Description: The reverse of roll-up; moves from less detailed to more detailed data by stepping down a hierarchy.
• Example: Breaking quarterly sales figures down into monthly figures.
3. Slice
• Description: Selects a single value for one dimension, producing a sub-cube.
• Example: Viewing sales data for the year 2024 only.
4. Dice
• Description: Selects specific values on two or more dimensions, producing a smaller sub-cube.
• Example: Viewing sales of two product lines in two regions.
5. Pivot (Rotate)
• Description: Rotates the data axes to provide an alternative presentation of the same data.
• Example: Swapping rows and columns to view regions by product instead of products by region.
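A roll-up corresponds to a GROUP BY at a coarser level of a hierarchy. A minimal SQL sketch (table and column names are illustrative):
-- monthly sales rolled up to quarterly figures
SELECT year, quarter, SUM(sales_amount) AS total_sales
FROM sales_by_month
GROUP BY year, quarter;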
BASE Properties
The BASE properties are a set of principles used in the design of distributed systems, particularly NoSQL databases, to provide a more flexible alternative to the rigid
consistency model of relational databases, which follow the ACID properties. BASE stands for Basically Available, Soft state, Eventual consistency, and these properties
prioritize availability and partition tolerance over immediate consistency. Here’s a breakdown:
1. Basically Available
• Description: The system guarantees availability, meaning that the database will always respond to a request (even if it returns an outdated or approximate response in
some cases).
• Implication: Instead of failing during partitioning or network issues, the system remains functional and serves requests, though the data may not be the most up-to-date.
• Example: A NoSQL database may serve read requests even if it can't immediately synchronize writes due to network issues, but it will still provide some form of response.
2. Soft State
• Description: The state of the system may change over time, even without input due to the eventual propagation of changes across nodes.
• Implication: The system doesn’t have to maintain a consistent state at all times. The state is "soft" because it’s continuously evolving until it reaches a stable state.
• Example: In distributed databases, updates to a record may not immediately be reflected across all nodes but will propagate over time.
3. Eventual Consistency
• Description: The system will eventually reach a consistent state, given enough time, assuming no new updates are made. However, immediate consistency isn’t guaranteed.
• Implication: While updates made to the database may not be visible to all users immediately, all nodes will eventually reflect the latest state.
• Example: A distributed database might return slightly different results depending on which node is queried, but over time, all nodes will converge on the same value.
BASE vs ACID
• ACID (Atomicity, Consistency, Isolation, Durability) emphasizes strict consistency and integrity, often at the expense of availability or performance in distributed systems.
• BASE, on the other hand, trades strict consistency for high availability and partition tolerance, making it suitable for large-scale, distributed systems such as NoSQL
databases (e.g., Cassandra, DynamoDB).
Summary
BASE properties provide a foundation for building distributed systems where high availability and fault tolerance are crucial. They sacrifice immediate consistency but allow the
system to remain operational even under failure conditions, making them suitable for modern, large-scale, highly distributed applications.
NoSQL
NoSQL (short for "Not Only SQL") is a class of database management systems designed to handle large volumes of unstructured, semi-structured, or structured data that
relational databases (RDBMS) may struggle to manage efficiently. Unlike traditional SQL databases that rely on structured data and fixed schemas, NoSQL databases provide
a more flexible approach to storing and retrieving data, making them ideal for modern, large-scale, and real-time web applications.
Key Characteristics of NoSQL Databases
1. Non-Relational:
○ NoSQL databases do not rely on the relational model used by SQL databases (with tables, rows, and columns). Instead, they use flexible models such as document,
key-value, wide-column, or graph-based data storage.
2. Schema-less or Flexible Schema:
○ Unlike SQL databases that require predefined schemas, NoSQL databases can store data without a fixed schema, allowing for more flexible and dynamic data structures. This is particularly useful when the data format is constantly changing or when dealing with unstructured data.
ETL Process
The ETL process refers to Extract, Transform, and Load, a sequence of steps used in data integration and preparation to move data from various sources into a target system
(usually a data warehouse or data mart). It is essential for organizations to consolidate data from different sources, clean and process it, and make it available for analysis,
reporting, and business intelligence purposes. Here's a breakdown of each step in the ETL process:
1. Extract (E)
• Description: The extraction phase involves retrieving raw data from multiple, often disparate, source systems. These sources can include databases, flat files, web services,
cloud applications, or other structured/unstructured data repositories.
• Challenges: Data may be in different formats, and systems may have varying update frequencies.
• Examples of Sources:
Databases (e.g., MySQL, Oracle, SQL Server)
Steps of the ETL Process
The ETL (Extract, Transform, Load) process consists of three major steps: Extract, Transform, and Load. These steps enable organizations to consolidate, clean, and move data
from multiple sources into a centralized system, like a data warehouse, for analysis and reporting. Let’s discuss each step in detail:
1. Extract
The first step in the ETL process involves extracting raw data from various sources. This data could come from structured, semi-structured, or unstructured sources such as
databases, flat files, APIs, cloud applications, etc.
Key Activities in Extraction:
• Identify Data Sources: Pinpoint the systems from which data needs to be extracted, such as operational databases, CRM systems, external web services, or even
spreadsheets.
• Extract Methods:
Full Extraction: All data is extracted from the source system, regardless of whether it has changed. This is typically used for initial data loads.
Incremental Extraction: Only data added or changed since the last extraction is pulled, which reduces the load on source systems.
+-----------+       +--------------+       +--------------+
|  Extract  | --->  |  Transform   | --->  |     Load     |
+-----------+       +--------------+       +--------------+
| Source    |       | Cleansing    |       | Data Store   |
| Systems   |       | Formatting   |       | (e.g., DW)   |
+-----------+       | Aggregation  |       +--------------+
                    +--------------+
Summary of ETL Steps:
1. Extract: Pull data from various sources.
2. Transform: Cleanse, validate, and prepare the data to fit business requirements.
3. Load: Move the transformed data into the target system for storage, analysis, and reporting.
Key Benefits of ETL:
• Data Consolidation: Aggregates data from multiple sources into a single, unified view.
• Improved Data Quality: Cleans and standardizes data, ensuring accuracy and consistency.
• Efficiency: Automates data integration and transformation processes, reducing manual effort.
• Actionable Insights: Provides ready-to-use data for business intelligence and analytics platforms, enabling informed decision-making.
The ETL process is foundational for building data warehouses, enabling businesses to handle large volumes of diverse data efficiently, supporting data-driven decision-making.
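As a concrete illustration, once data has been extracted into a staging table, the Transform and Load steps often reduce to SQL of the following shape (the staging and warehouse table names and the cleansing rules are hypothetical):
-- Transform + Load: cleanse staged rows and load them into the warehouse
INSERT INTO dw_sales (sale_date, product_id, amount)
SELECT CAST(s.sale_date AS DATE),   -- standardize the date format
       s.product_id,
       COALESCE(s.amount, 0)        -- replace missing amounts with 0
FROM staging_sales s
WHERE s.product_id IS NOT NULL;     -- drop rows that fail validation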
Data Warehouse
A data warehouse is a centralized repository designed to store large volumes of data from various sources, structured in a way that supports efficient querying, analysis, and
reporting. It acts as a central hub for an organization's historical data, enabling business intelligence (BI), analytics, and decision-making processes. Data in a data warehouse is
typically transformed, cleaned, and organized into schemas (often star or snowflake schemas) to optimize for analytical workloads.
Characteristics of a Data Warehouse:
1. Subject-Oriented:
○ A data warehouse is organized around key business subjects or domains, such as sales, finance, customer, or product, rather than specific applications or operations. This allows for better focus on data analysis related to particular business areas, making it easier to perform queries and reports on specific subjects.
○ Example: A data warehouse might store data specifically related to customer behavior or product performance to generate insights.
2. Non-Volatile:
○ Once data is entered into the data warehouse, it doesn’t change. Data is read-only and used primarily for analysis and reporting. This characteristic ensures that
historical data remains consistent and unchanged over time, allowing for accurate trend analysis and historical reporting.
○ Example: Monthly sales data from five years ago remains the same, ensuring consistency when generating year-over-year comparisons.
These characteristics make data warehouses highly effective for business intelligence and analytical purposes, providing reliable and organized data for decision-making.
Surrogate Key
A surrogate key is an artificially generated, unique identifier used in a database table to represent a record. Unlike natural keys (which use existing, real-world data like Social
Security numbers or email addresses), surrogate keys have no business meaning and are typically implemented as sequential numbers or GUIDs (Globally Unique Identifiers).
Key Characteristics of a Surrogate Key:
1. Uniqueness: Each record in the table has a unique surrogate key, ensuring that it can be uniquely identified.
2. No Business Meaning: Surrogate keys are independent of the data itself and do not have any intrinsic meaning related to the business context.
3. Immutability: Surrogate keys do not change, even if the associated data is modified.
Example:
In a customer table:
• CustomerID (surrogate key): 101, 102, 103...
• CustomerName (natural key): John Doe, Jane Smith...
Here, CustomerID is a surrogate key, a simple number that uniquely identifies each customer, and it has no business significance.
Advantages of Surrogate Keys:
• Consistency: Surrogate keys remain constant even if other business attributes change.
• Efficiency: They are often simpler and faster for database indexing and querying than natural keys, especially when dealing with large datasets.
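A surrogate key is typically declared as an auto-generated column. A minimal sketch in MySQL-style syntax (other DBMSs use IDENTITY columns or sequences; names are illustrative):
CREATE TABLE Customer (
  CustomerID   INT AUTO_INCREMENT PRIMARY KEY,  -- surrogate key, no business meaning
  CustomerName VARCHAR(100),
  Email        VARCHAR(100)                     -- natural/business attribute
);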
Fact Table:
A fact table is the central table in a star schema or snowflake schema of a data warehouse that stores quantitative data (facts) for analysis. These facts are typically numerical
measures, like sales amount, quantity sold, or revenue, which can be aggregated and analyzed. Each fact is associated with dimension tables that describe the context (e.g.,
date, product, customer).
Key Characteristics:
• Contains measurable data: Stores numeric values like sales revenue, number of units sold, etc.
• Linked to dimension tables: Fact tables are surrounded by dimension tables (e.g., time, location, product).
• Foreign keys: They contain foreign keys that reference the primary keys in dimension tables.
Example:
A sales fact table might include:
DateKey | ProductKey | StoreKey | SalesAmount | UnitsSold
20241001 | 101 | 201 | 5000 | 100
20241002 | 102 | 202 | 8000 | 150
Here, DateKey, ProductKey, and StoreKey are foreign keys linking to dimension tables, while SalesAmount and UnitsSold are the facts (measurable values).
Factless Fact Table:
A factless fact table is a type of fact table that does not contain any measurable, numeric facts. Instead, it captures events or occurrences that happen, but with no associated
metrics. Factless fact tables are often used to track the presence or absence of certain events or conditions.
Use Cases:
• Tracking Events: Logging events like student attendance, product promotion, or customer activity without having measurable data (e.g., sales amount).
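A minimal DDL sketch of both kinds of table (table, column, and dimension names are illustrative):
-- fact table: foreign keys to dimensions plus numeric measures
CREATE TABLE SalesFact (
  DateKey     INT REFERENCES DimDate(DateKey),
  ProductKey  INT REFERENCES DimProduct(ProductKey),
  StoreKey    INT REFERENCES DimStore(StoreKey),
  SalesAmount DECIMAL(12,2),
  UnitsSold   INT
);
-- factless fact table: only foreign keys, recording that an event occurred
CREATE TABLE AttendanceFact (
  DateKey    INT REFERENCES DimDate(DateKey),
  StudentKey INT REFERENCES DimStudent(StudentKey),
  ClassKey   INT REFERENCES DimClass(ClassKey)
);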
Snowflake Schema:
A snowflake schema is a logical arrangement of tables in a data warehouse where the central fact table is connected to multiple dimension tables, and the dimension tables
themselves may be normalized into additional related tables. This leads to a structure resembling a snowflake, hence the name. The snowflake schema is an extension of the
star schema, where dimension tables are further broken down into sub-dimension tables to reduce redundancy.
Key Features:
1. Normalized Dimension Tables:
○ Unlike a star schema, where dimension tables are usually denormalized, in a snowflake schema, they are normalized. This means the dimension tables are split into
multiple related tables.
○ For example, instead of having a single Product dimension table, it might be split into Product, Category, and Brand tables.
2. Multiple Levels of Tables:
○ Dimension tables may have multiple levels of related tables. These tables are connected via foreign keys, creating a multi-level structure.
○ This design helps minimize data redundancy but may increase the complexity of queries due to more joins between tables.
3. Central Fact Table:
○ The central fact table stores measurable data (e.g., sales, revenue, units sold) and links to the dimension tables via foreign keys.
Example of a Snowflake Schema:
Fact Table: SalesFact
DateKey | ProductKey | CustomerKey | StoreKey | SalesAmount
20241001 | 101 | 1001 | 201 | 5000
20241002 | 102 | 1002 | 202 | 8000
Dimension Tables:
• Product Dimension: Normalized into multiple tables:
○ Product: Contains basic product information.
○ Category: Contains product categories.
○ Brand: Contains brand details.
ProductKey | ProductName | CategoryKey | BrandKey
101 | Phone | 10 | 1
102 | Laptop | 11 | 2
○ Category:
CategoryKey | CategoryName
10 | Electronics
11 | Computers
○ Brand:
BrandKey | BrandName
1 | Apple
2 | Dell
• Customer Dimension: Normalized into Customer and Region tables.
CustomerKey | CustomerName | RegionKey
1001 | John Doe | 101
1002 | Jane Smith | 102
○ Region:
RegionKey | RegionName
101 | North America
102 | Europe
Advantages:
1. Reduced Data Redundancy: Due to normalization, the snowflake schema minimizes data duplication, making the storage more efficient.
2. Better Data Integrity: With normalization, updates and changes to dimensions can be made in a single table, ensuring consistency.
3. Easier to Maintain: By breaking down dimension tables, it becomes easier to manage and modify dimensions independently.
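The normalization of the Product dimension above can be expressed in DDL roughly as follows (a sketch based on the example tables):
CREATE TABLE Category (
  CategoryKey  INT PRIMARY KEY,
  CategoryName VARCHAR(50)
);
CREATE TABLE Brand (
  BrandKey  INT PRIMARY KEY,
  BrandName VARCHAR(50)
);
-- the Product dimension references its sub-dimension tables
CREATE TABLE Product (
  ProductKey  INT PRIMARY KEY,
  ProductName VARCHAR(100),
  CategoryKey INT REFERENCES Category(CategoryKey),
  BrandKey    INT REFERENCES Brand(BrandKey)
);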