Unit 2
1. Entities
o Represent real-world objects, concepts, or events that have data stored about
them.
o Examples: Customer, Order, Product.
2. Attributes
o Characteristics or properties of entities.
o Examples: Customer (CustomerID, Name, Address), Order (OrderID, OrderDate,
TotalAmount).
3. Relationships
o Define how entities are related to each other.
o Examples: A Customer places an Order, an Order contains Products.
4. Primary Keys
o Unique identifiers for entities.
o Examples: CustomerID for Customer, OrderID for Order.
5. Foreign Keys
o Attributes that create a link between entities.
o Examples: CustomerID in Order to link an Order to a Customer.
6. Normalization
o The process of organizing data to minimize redundancy.
o Ensures that data is logically stored, avoiding anomalies during data operations
(insert, update, delete).
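To make the redundancy point concrete, the following is a minimal Python sketch, not a full normalization procedure. It assumes a small made-up list of flat order rows that repeat the customer's details, and splits them into a Customer collection keyed by CustomerID and an Order collection that keeps CustomerID only as a foreign key. The function name `normalize` and the sample rows are illustrative assumptions.

```python
# Flat rows repeat the customer's name and address on every order (redundancy).
flat_orders = [
    {"OrderID": 1, "CustomerID": 10, "Name": "Ada", "Address": "1 Main St", "TotalAmount": 25.0},
    {"OrderID": 2, "CustomerID": 10, "Name": "Ada", "Address": "1 Main St", "TotalAmount": 40.0},
    {"OrderID": 3, "CustomerID": 11, "Name": "Bob", "Address": "2 Oak Ave", "TotalAmount": 15.0},
]


def normalize(rows):
    """Split flat rows into customers (keyed by CustomerID) and orders."""
    customers = {}
    orders = []
    for row in rows:
        # Each customer's details are stored once, under the primary key.
        customers[row["CustomerID"]] = {"Name": row["Name"], "Address": row["Address"]}
        # The order keeps only CustomerID, acting as a foreign key.
        orders.append(
            {
                "OrderID": row["OrderID"],
                "CustomerID": row["CustomerID"],
                "TotalAmount": row["TotalAmount"],
            }
        )
    return customers, orders


customers, orders = normalize(flat_orders)
```

After the split, a change of address is made in one place instead of on every order row, which is the update anomaly that normalization is meant to avoid.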
1. Identify Entities
o Determine the main objects and concepts relevant to the business requirements.
2. Define Attributes
o List out the attributes for each entity, ensuring they capture all necessary data.
3. Establish Relationships
o Identify and define how entities interact with each other.
4. Apply Normalization
o Normalize the data to ensure it is organized efficiently. Common forms include
1NF (First Normal Form), 2NF (Second Normal Form), and 3NF (Third Normal
Form).
5. Define Keys
o Assign primary keys to each entity and foreign keys to establish relationships.
6. Validate the Model
o Review the model with stakeholders to ensure it meets business requirements.
Clarity and Communication: Provides a clear and detailed representation of business data
requirements, facilitating communication among stakeholders.
Data Integrity: Ensures data accuracy and consistency through normalization and
relationship definitions.
System Independence: Abstracts data structures from physical considerations, making the
model independent of specific database technologies.
Scalability: A well-designed LDM can adapt to changing business requirements and is
scalable for future growth.
Customer:
o Attributes: CustomerID (Primary Key), Name, Email, Address
Order:
o Attributes: OrderID (Primary Key), OrderDate, TotalAmount, CustomerID
(Foreign Key)
Product:
o Attributes: ProductID (Primary Key), ProductName, Price
OrderDetail:
o Attributes: OrderDetailID (Primary Key), OrderID (Foreign Key), ProductID
(Foreign Key), Quantity, UnitPrice
Relationships:
Customer (CustomerID, Name, Email, Address)
|
| (1-to-many relationship)
|
Order (OrderID, OrderDate, TotalAmount, CustomerID)
|
| (1-to-many relationship)
|
OrderDetail (OrderDetailID, OrderID, ProductID, Quantity, UnitPrice)
|
| (many-to-1 relationship)
|
Product (ProductID, ProductName, Price)
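Because a logical model is independent of any database product, the example above can also be sketched in plain Python. The dataclasses below mirror the four entities; the snake_case field names and the helper `orders_for` are assumptions made for illustration, not part of the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int  # primary key
    name: str
    email: str
    address: str


@dataclass(frozen=True)
class Order:
    order_id: int  # primary key
    order_date: str
    total_amount: float
    customer_id: int  # foreign key -> Customer


@dataclass(frozen=True)
class Product:
    product_id: int  # primary key
    product_name: str
    price: float


@dataclass(frozen=True)
class OrderDetail:
    order_detail_id: int  # primary key
    order_id: int  # foreign key -> Order
    product_id: int  # foreign key -> Product
    quantity: int
    unit_price: float


def orders_for(customer, orders):
    """Follow the 1-to-many relationship from a Customer to its Orders."""
    return [o for o in orders if o.customer_id == customer.customer_id]
```

OrderDetail sits between Order and Product, so the many-to-many link between orders and products is expressed as two 1-to-many relationships.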
A Physical Data Model (PDM) represents how data is stored in a database, detailing the technical
implementation and structure. It includes specifications for tables, columns, data types, indexes,
and constraints, and it maps the logical data model to the physical storage.
1. Tables
o Represent entities from the logical model, implemented as database tables.
o Each table has columns corresponding to attributes.
2. Columns
o Define the data type and constraints for each attribute.
o Include metadata such as data type (e.g., VARCHAR, INT), nullability, default
values, etc.
3. Primary Keys
o Uniquely identify rows in a table.
o Ensure that each record can be uniquely retrieved.
4. Foreign Keys
o Enforce relationships between tables.
o Ensure referential integrity by requiring each foreign key value to match a primary
key value in the referenced table.
5. Indexes
o Improve the speed of data retrieval.
o Can be applied to one or more columns to speed up queries.
6. Constraints
o Rules applied to data in the table to ensure data integrity.
o Include primary key constraints, foreign key constraints, unique constraints, and
check constraints.
7. Views
o Virtual tables based on the result set of a query.
o Provide a way to simplify complex queries and enhance security by limiting
access to specific data.
8. Partitioning
o Divides a table into smaller, more manageable pieces for performance and
maintenance.
o Can be based on ranges, lists, or hashing.
9. Storage Parameters
o Define how and where data is physically stored.
o Includes tablespaces, file groups, and storage engines.
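Several of these components can be tried out with Python's built-in `sqlite3` module. The sketch below uses SQLite only as a convenient stand-in engine; the NOT NULL, UNIQUE, and CHECK choices and the view name `CustomerDirectory` are assumptions made for illustration. Partitioning and storage parameters are left out because they depend on the database product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       VARCHAR(100) NOT NULL,
    Email      VARCHAR(100) UNIQUE,
    Address    VARCHAR(255)
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    Price       DECIMAL NOT NULL CHECK (Price >= 0)
);
-- Hypothetical view: exposes names only, hiding Email and Address.
CREATE VIEW CustomerDirectory AS
    SELECT CustomerID, Name FROM Customer;
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Ada', 'ada@example.com', '1 Main St')")

# The CHECK constraint should reject a negative price.
try:
    conn.execute("INSERT INTO Product VALUES (1, 'Widget', -5)")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

# Reading through the view returns only the columns it selects.
directory_rows = conn.execute("SELECT * FROM CustomerDirectory").fetchall()
```

The view limits what a reader of `CustomerDirectory` can see, while the CHECK constraint keeps invalid prices out of the table regardless of which application writes to it.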
Customer
o Columns: CustomerID (INT, Primary Key), Name (VARCHAR(100)), Email
(VARCHAR(100)), Address (VARCHAR(255))
Order
o Columns: OrderID (INT, Primary Key), OrderDate (DATETIME), TotalAmount
(DECIMAL), CustomerID (INT, Foreign Key)
Product
o Columns: ProductID (INT, Primary Key), ProductName (VARCHAR(100)),
Price (DECIMAL)
OrderDetail
o Columns: OrderDetailID (INT, Primary Key), OrderID (INT, Foreign Key),
ProductID (INT, Foreign Key), Quantity (INT), UnitPrice (DECIMAL)
Constraints:
Primary Keys:
o Customer (CustomerID)
o Order (OrderID)
o Product (ProductID)
o OrderDetail (OrderDetailID)
Foreign Keys:
o Order.CustomerID references Customer.CustomerID
o OrderDetail.OrderID references Order.OrderID
o OrderDetail.ProductID references Product.ProductID
Indexes:
CustomerEmailIndex: Index on Customer(Email) to speed up searches by email.
Storage Parameters:
Tablespace:
o Assign tables to specific tablespaces for better storage management.
Customer
-------
CustomerID (PK, INT)
Name (VARCHAR(100))
Email (VARCHAR(100))
Address (VARCHAR(255))

Order
-------
OrderID (PK, INT)
OrderDate (DATETIME)
TotalAmount (DECIMAL)
CustomerID (FK, INT)

Product
-------
ProductID (PK, INT)
ProductName (VARCHAR(100))
Price (DECIMAL)

OrderDetail
-------
OrderDetailID (PK, INT)
OrderID (FK, INT)
ProductID (FK, INT)
Quantity (INT)
UnitPrice (DECIMAL)
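The physical model above could be written out as DDL along the following lines. This is a sketch under stated assumptions: it runs against an in-memory SQLite database through Python's `sqlite3` module, it uses `INTEGER PRIMARY KEY` (SQLite's spelling for an integer key), it quotes `"Order"` because ORDER is a reserved word in SQL, and it omits tablespaces, which SQLite does not have. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       VARCHAR(100),
    Email      VARCHAR(100),
    Address    VARCHAR(255)
);
CREATE TABLE "Order" (
    OrderID     INTEGER PRIMARY KEY,
    OrderDate   DATETIME,
    TotalAmount DECIMAL,
    CustomerID  INT REFERENCES Customer(CustomerID)
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName VARCHAR(100),
    Price       DECIMAL
);
CREATE TABLE OrderDetail (
    OrderDetailID INTEGER PRIMARY KEY,
    OrderID       INT REFERENCES "Order"(OrderID),
    ProductID     INT REFERENCES Product(ProductID),
    Quantity      INT,
    UnitPrice     DECIMAL
);
CREATE INDEX CustomerEmailIndex ON Customer(Email);
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Ada', 'ada@example.com', '1 Main St')")
conn.execute("""INSERT INTO "Order" VALUES (1, '2024-01-15 10:00:00', 25.0, 1)""")

# An order pointing at a customer that does not exist should be rejected.
try:
    conn.execute("""INSERT INTO "Order" VALUES (2, '2024-01-16 09:30:00', 40.0, 99)""")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# List the user-created indexes recorded in SQLite's schema catalog.
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```

On another database platform the data types, the quoting of `Order`, and the storage clauses would differ, which is why the physical model is tied to a specific technology while the logical model is not.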
A Dimensional Data Model (DDM) is designed to support data warehousing and online
analytical processing (OLAP) by structuring data for efficient querying and reporting. This
model is based on dimensions and facts, providing a way to organize data that facilitates
complex queries and analysis. The key components of a Dimensional Data Model include fact
tables and dimension tables.
1. Star Schema
o Structure: A single fact table in the center connected to multiple dimension tables.
o Characteristics: Simple design, easy to understand and query.
o Use Case: Suitable for simpler data marts and smaller data warehouses.
2. Snowflake Schema
o Structure: Similar to a star schema but with normalized dimension tables.
o Characteristics: More complex due to normalization, which can reduce
redundancy.
o Use Case: Useful when dimensions have hierarchical relationships that need to be
explicitly defined.
3. Galaxy Schema (or Fact Constellation Schema)
o Structure: Multiple fact tables sharing dimension tables.
o Characteristics: Complex schema supporting multiple business processes.
o Use Case: Suitable for large-scale data warehouses with multiple star schemas.
SalesFact
o SalesID (Primary Key)
o DateID (Foreign Key)
o ProductID (Foreign Key)
o StoreID (Foreign Key)
o SalesAmount (DECIMAL)
o QuantitySold (INT)
Dimension Tables:
DateDimension
o DateID (Primary Key)
o Date
o Month
o Quarter
o Year
ProductDimension
o ProductID (Primary Key)
o ProductName
o Category
o Supplier
StoreDimension
o StoreID (Primary Key)
o StoreName
o Location
o Region
                               DateDimension
                               -------------
                               DateID (PK)
                               Date
                               Month
                               Quarter
                               Year
                                   |
                                   |
                                   |
ProductDimension ------------- SalesFact ------------- StoreDimension
----------------               ---------               --------------
ProductID (PK)                 SalesID (PK)            StoreID (PK)
ProductName                    DateID (FK)             StoreName
Category                       ProductID (FK)          Location
Supplier                       StoreID (FK)            Region
                               SalesAmount
                               QuantitySold
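A typical query against this star schema joins the fact table to the dimensions it needs and aggregates a measure. The sketch below builds the schema in an in-memory SQLite database through Python's `sqlite3` module and totals SalesAmount by Region and Quarter. The sample rows and the TEXT/INT column types for the dimensions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DateDimension (
    DateID INTEGER PRIMARY KEY, Date TEXT, Month INT, Quarter INT, Year INT
);
CREATE TABLE ProductDimension (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT, Supplier TEXT
);
CREATE TABLE StoreDimension (
    StoreID INTEGER PRIMARY KEY, StoreName TEXT, Location TEXT, Region TEXT
);
CREATE TABLE SalesFact (
    SalesID      INTEGER PRIMARY KEY,
    DateID       INT REFERENCES DateDimension(DateID),
    ProductID    INT REFERENCES ProductDimension(ProductID),
    StoreID      INT REFERENCES StoreDimension(StoreID),
    SalesAmount  DECIMAL,
    QuantitySold INT
);
-- Made-up sample data.
INSERT INTO DateDimension VALUES (1, '2024-01-15', 1, 1, 2024), (2, '2024-04-02', 4, 2, 2024);
INSERT INTO ProductDimension VALUES (1, 'Widget', 'Tools', 'Acme');
INSERT INTO StoreDimension VALUES (1, 'Downtown', 'City A', 'North'), (2, 'Mall', 'City B', 'South');
INSERT INTO SalesFact VALUES (1, 1, 1, 1, 100, 4), (2, 2, 1, 1, 50, 2), (3, 2, 1, 2, 75, 3);
""")

# Join the fact table to two dimensions and aggregate the SalesAmount measure.
rows = conn.execute("""
    SELECT s.Region, d.Quarter, SUM(f.SalesAmount) AS TotalSales
    FROM SalesFact AS f
    JOIN DateDimension AS d ON d.DateID = f.DateID
    JOIN StoreDimension AS s ON s.StoreID = f.StoreID
    GROUP BY s.Region, d.Quarter
    ORDER BY s.Region, d.Quarter
""").fetchall()

for region, quarter, total in rows:
    print(region, quarter, total)
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries short; in a snowflake schema the same report might need extra joins through the normalized dimension tables.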
1. Business-Oriented:
o Purpose: Defines business entities, their attributes, and relationships in a business
context.
o Audience: Business stakeholders, subject matter experts, and technical teams.
2. Abstraction:
o Level of Detail: Provides a high-level overview of data requirements without
specifying how data will be physically stored or implemented.
o Focus: Emphasizes what data is needed rather than how it will be used or
processed.
3. Entities and Relationships:
o Entities: Represent major business objects or concepts.
o Relationships: Capture associations and connections between entities.
4. Attributes:
o Describe Properties: Defines the characteristics or properties of each entity.
o Example: Customer (CustomerID, Name, Address).
5. Normalization:
o Avoids Redundancy: Ensures that data is organized efficiently to minimize
redundancy and anomalies.
o Example: Ensuring each piece of information is stored only once.
6. Notation:
o Diagrammatic Representation: Often depicted using Entity-Relationship
Diagrams (ERDs) or similar diagramming techniques.
o Symbols: Entities represented as rectangles, relationships as lines connecting
entities.
Customer
o Attributes: CustomerID, Name, Email, Address.
Order
o Attributes: OrderID, OrderDate, TotalAmount.
o Relationships: Customer places Order.
Product
o Attributes: ProductID, ProductName, Price.
o Relationships: Order contains Products.
Customer (CustomerID, Name, Email, Address)
|
| (1-to-many relationship)
|
Order (OrderID, OrderDate, TotalAmount)
|
| (many-to-many relationship)
|
Product (ProductID, ProductName, Price)
Unified View: Provides a single, consistent view of data across the organization.
Improved Decision-Making: Facilitates better decision-making through comprehensive
and accurate data insights.
Efficiency and Consistency: Promotes efficiency by reducing redundancy, ensuring data
consistency, and simplifying data integration efforts.
Alignment with Business Goals: Aligns data management practices with business
strategies and objectives.
Scalability and Adaptability: Supports scalability and adapts to evolving business
requirements and technological changes.
Complexity: Managing the complexity of integrating diverse data sources and business
domains.
Coordination: Requires coordination across different departments, stakeholders, and IT
teams.
Maintenance: Requires ongoing maintenance and updates to reflect changes in business
processes and technology.
Governance: Ensuring proper governance and stewardship to maintain data quality and
integrity.
The Data Modeling Development Life Cycle (DMDLC) outlines the stages and processes
involved in creating, maintaining, and evolving data models within an organization. It
encompasses activities from initial planning and requirements gathering to implementation and
ongoing maintenance. Here are the typical stages of the Data Modeling Development Life Cycle:
Purpose: Translate the Conceptual Data Model into a more detailed and normalized
structure.
Activities:
o Refine entities, attributes, and relationships into a Logical Data Model (LDM).
o Apply normalization techniques (e.g., 1NF, 2NF, 3NF) to minimize redundancy
and ensure data integrity.
o Specify primary keys, foreign keys, and constraints.
o Validate the LDM with stakeholders to ensure accuracy and completeness.
Purpose: Design the physical implementation of the data model for specific database
systems.
Activities:
o Translate the Logical Data Model into a Physical Data Model (PDM).
o Define database tables, columns, data types, indexes, and storage parameters.
o Address performance considerations (e.g., partitioning, indexing) based on
database platform requirements.
o Document the PDM and generate database schema scripts for implementation.
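The last activity, generating schema scripts, can be sketched as a small Python function that renders a CREATE TABLE statement from a table description. The function name `render_create_table` and its argument layout are hypothetical; real modeling tools generate far more complete scripts (foreign keys, indexes, storage clauses).

```python
def render_create_table(table, columns, primary_key):
    """Render one CREATE TABLE statement from a simple table description.

    columns is a list of (column_name, sql_type) pairs.
    """
    lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    lines.append(f"    PRIMARY KEY ({primary_key})")
    body = ",\n".join(lines)
    # The table name is quoted so that reserved words such as Order still work.
    return f'CREATE TABLE "{table}" (\n{body}\n);'


ddl = render_create_table(
    "Product",
    [("ProductID", "INT"), ("ProductName", "VARCHAR(100)"), ("Price", "DECIMAL")],
    "ProductID",
)
print(ddl)
```

Keeping the table description as data and generating the script from it means the documented model and the deployed schema are less likely to drift apart.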
Purpose: Manage changes, updates, and enhancements to the data model over time.
Activities:
o Monitor data model performance and usage patterns.
o Address data quality issues and maintain metadata documentation.
o Incorporate changes driven by business needs, regulatory requirements, or
technological advancements.
o Conduct periodic reviews and optimizations to ensure the data model remains
aligned with evolving business goals.
Key Considerations:
Iterative Process: Data modeling often involves iterative cycles of refinement and
validation based on feedback from stakeholders and users.
Collaboration: Collaboration between business stakeholders, data architects, database
administrators, and developers is critical throughout the DMDLC.
Documentation: Documenting each stage of the data modeling process is essential for
transparency, knowledge transfer, and future maintenance.
Benefits of DMDLC:
Challenges: