DDB Final Note (Full)
CSE-4845 || Distributed Database || Part-A & B || Final Term Note (Hon. MMR Sir Supremacy)
By- Sorowar Mahabub, C201032
Segment-0
Imp. 01. Discuss the Normalization step of Query Decomposition. What do you know about Query Optimization?
Normalization of Query Decomposition:
Query decomposition involves mapping a calculus query (such as SQL) to algebraic operations like select, project, join,
and rename. The goal is to produce a semantically correct and efficient query that avoids redundant work. The process of
query decomposition comprises several steps, and one of the crucial steps is normalization.
• Normalization Steps:
• Lexical and Syntactic Analysis: check the validity of the query, similar to compiler checks; verify attributes and relations; perform type checking on the qualification.
• In SQL, the WHERE clause is often the most complex part, with arbitrary predicates preceded by quantifiers (e.g., EXISTS, FOR ALL).
• Normalize the qualification to Conjunctive Normal Form (CNF) or Disjunctive Normal Form (DNF), as illustrated below.
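As a small illustration (the relation and predicate are assumed here, not taken from the lecture), a WHERE clause such as
WHERE PNO = 'P1' AND (DUR = 12 OR DUR = 24)
is already in CNF: PNO = 'P1' ∧ (DUR = 12 ∨ DUR = 24), while its DNF form distributes the conjunction over the disjunction: (PNO = 'P1' ∧ DUR = 12) ∨ (PNO = 'P1' ∧ DUR = 24).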
Query Optimization: Query optimization is a crucial and challenging aspect of overall query processing. The objective is
to minimize the cost function, which includes I/O cost, CPU cost, and communication cost.
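One commonly cited shape of this cost function (the coefficient names follow the usual textbook notation and are not necessarily the lecture's) is:
Total cost = T_CPU × #instructions + T_I/O × #disk I/Os + T_MSG × #messages + T_TR × #bytes transferred
where the last two terms capture the communication cost that typically dominates in wide-area distributed systems.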
..
2. What do you know about Anomalies? Explain the different types of anomalies with examples.
Anomalies in database systems are inconsistencies or errors that can arise in the data stored in the database. These
anomalies can occur due to improper database design or due to the way that data is manipulated by transactions.
Anomalies can lead to data integrity issues and can make it difficult to retrieve accurate information from the database.
There are three main types of anomalies in database systems:
1. Update Anomaly:
An update anomaly occurs when a single operation can change multiple records in the database in a way that violates the
database's integrity constraints. This can lead to data inconsistencies and make it difficult to maintain the accuracy of the
database.
Example: Consider a database that stores information about employees and their departments. The database has two tables:
• EMPLOYEES: This table stores employee information, including employee ID, name, and department ID.
• DEPARTMENTS: This table stores department information, including department ID, department name, and manager ID.
If the database is not properly designed, an update anomaly could occur when an employee's department is changed. For example, if an employee is moved from department 1 to department 2, the following update statement could be executed:

UPDATE EMPLOYEES
SET DEPT_ID = 2
WHERE EMP_ID = 123;

This update statement would change the department ID for employee 123 to 2. However, if employee 123 is also the manager of department 1, then this update would also change the manager ID for department 1 to 2. This is an update anomaly because it has changed multiple records in the database (the employee record and the department record) in a way that violates the database's integrity constraints.
2. Insertion Anomaly
An insertion anomaly occurs when a new record cannot be inserted into the database because it violates the database's
integrity constraints. This can lead to data loss and make it difficult to add new information to the database.
Example: Consider the same database as the update anomaly example. If the database is not properly designed, an insertion anomaly could occur when a new employee is hired into a department that does not have a manager. For example, if the following insert statement is executed:

INSERT INTO EMPLOYEES (EMP_ID, NAME, DEPT_ID)
VALUES (456, 'John Smith', 3);

This insert statement would fail because department 3 does not have a manager. This is an insertion anomaly because it is not possible to insert a new employee record into the database while maintaining the database's integrity constraints.
3. Deletion Anomaly
A deletion anomaly occurs when a deletion operation can delete multiple records from the database in a way that violates
the database's integrity constraints. This can lead to data loss and make it difficult to remove information from the
database.
Example: Consider the same database as the update anomaly example. If the database is not properly designed, a deletion anomaly could occur when the manager of a department is deleted. For example, if the following delete statement is executed:

DELETE FROM EMPLOYEES
WHERE EMP_ID = 123;

This delete statement would delete the employee record for employee 123. However, if employee 123 is also the manager of department 1, then this delete would also delete the department record for department 1. This is a deletion anomaly because it has deleted multiple records from the database (the employee record and the department record) in a way that violates the database's integrity constraints.
Preventing Anomalies
Anomalies can be prevented by using proper database design techniques, such as normalization. Normalization is a process of
organizing data in a database to reduce redundancy and ensure data integrity. There are three levels of normalization: first normal
form (1NF), second normal form (2NF), and third normal form (3NF). Databases that are normalized to 3NF are generally free from anomalies.
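As a small illustration (the unnormalized table below is assumed for illustration, not part of the example above), storing everything in one flat table mixes employee and department facts, which is exactly what makes the anomalies possible; splitting it keeps each fact in one place:

Unnormalized: EMP_DEPT(EMP_ID, NAME, DEPT_ID, DEPT_NAME, MANAGER_ID)
Normalized (3NF): EMPLOYEES(EMP_ID, NAME, DEPT_ID) and DEPARTMENTS(DEPT_ID, DEPT_NAME, MANAGER_ID)

After the split, a department's manager is recorded exactly once, so changing an employee's department, hiring into a manager-less department, or deleting a manager no longer touches or loses unrelated facts.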
.
Imp. 3. What do you know about Data Localization? Discuss the Normalization of Query Decomposition.
Data localization is a technique used in distributed database systems to optimize query processing by taking into account
the distribution of data across different fragments. The goal of data localization is to reduce the amount of data that needs
to be transferred between fragments during query execution.
Benefits of data localization:
• Reduced network traffic: By minimizing data
transfer between fragments, data localization can
significantly reduce network traffic, which can
improve the overall performance of distributed
database systems.
• Improved query performance: By reducing the
amount of data that needs to be processed, data
localization can improve the performance of
individual queries.
• Reduced contention: Data localization can help to
reduce contention for shared resources, such as
network bandwidth and CPU time.
Data localization techniques: There are a number of different data localization techniques that can be used, including:
Fragment replication: Replicating data fragments to multiple nodes can reduce the amount of data that needs to be transferred during
query execution.
Data partitioning: Partitioning data fragments based on specific attributes can improve the efficiency of certain types of queries.
Data caching: Caching frequently accessed data can reduce the need to retrieve data from remote fragments.
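A small illustration of localization by reduction (the fragmentation and the predicate are assumed, not taken from the lecture): suppose EMP is horizontally fragmented into EMP1 = σ_ENO≤'E3'(EMP) and EMP2 = σ_ENO>'E3'(EMP), stored at different sites. The query σ_ENO='E5'(EMP) is first localized as σ_ENO='E5'(EMP1 ∪ EMP2); because the predicate contradicts the qualification of EMP1, that fragment can be eliminated and the reduced query σ_ENO='E5'(EMP2) is sent only to the site holding EMP2, so no data from the other site crosses the network.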
.
4. How many steps do you need for Distributed Query Processing? Discuss them.
Distributed Query Processing involves transforming a high-level query on a distributed database into an equivalent and
efficient lower-level query on relation fragments. It is a more complex process compared to centralized query processing
due to the fragmentation/replication of relations and the additional communication costs associated with distributed
environments. The distributed query processing can be broken down into several steps:
1. Parsing and Translation:
• Syntax Check: Ensure that the query follows the correct syntax and verify the relations involved.
• Translation: Translate the high-level query (relational calculus/SQL) into an equivalent relational algebra expression that considers the distributed nature of the database and the fragmentation/replication of relations.
2. Optimization:
• Generate Evaluation Plan: Generate an optimal evaluation plan for the query, considering the distributed nature of the database and minimizing costs associated with communication, I/O, and CPU.
• Consider Fragmentation and Replication: Optimize the query plan by taking into account the distribution of data across fragments and potential replication strategies.
• Cost Estimation: Estimate the costs associated with different execution plans and choose the plan with the lowest cost.
3. Evaluation:
• Execution Plan Execution: The query-execution engine takes the optimal evaluation plan generated in the optimization step and executes that plan.
• Answer Retrieval: Retrieve and return the answers to the query based on the execution of the plan.
• Handling Communication Costs: In a distributed environment, communication costs become crucial. The execution plan must consider the location of data fragments, potentially involving data transmission between different nodes.
Exercise 5. Indicate whether the given following schedules can produce anomalies; the symbols ci and ai indicate
the result (commit or abort) of the transaction: (All examples added)-
Ques. Anomaly: r1(x); w1(x); r2(x); w2(y); a1; c2
Answer: Dirty read (T2 reads x written by T1, which then aborts).
Ques. Anomaly: r1(x); r2(x); w2(x); w1(x); c1; c2
Answer: Update loss (lost update: T1 overwrites T2's write on the basis of a stale read).
Ques. Anomaly: r1(x); w1(x); r2(y); w2(y); a1; c2
Answer: No anomaly.
Ques. Anomaly: r1(x); r2(x); w2(x); r1(y); c1; c2
Answer: No anomaly.
Ques. Anomaly: r1(x); r2(x); r2(y); w2(y); r1(z); a1; c2
Answer: No anomaly.
Ques. Anomaly: r1(x); w1(x); r2(x); w2(x); c1; c2
Answer: No anomaly.
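A minimal Python sketch (the schedule encoding and the function name are assumptions for illustration, not part of the course material) that flags dirty reads in schedules like those above:

# A schedule is a list of (op, txn, item) tuples; 'commit'/'abort' entries carry item=None.
def dirty_reads(schedule):
    """Return (reader, writer, item) triples where a transaction read a value
    written by another transaction that later aborted (a dirty read)."""
    pending_writes = {}   # item -> transaction holding the last uncommitted write
    read_from = []        # (reader, writer, item) edges observed so far
    outcome = {}          # txn -> 'commit' or 'abort'
    for op, txn, item in schedule:
        if op == 'w':
            pending_writes[item] = txn
        elif op == 'r':
            writer = pending_writes.get(item)
            if writer is not None and writer != txn:
                read_from.append((txn, writer, item))
        else:                                   # 'commit' or 'abort'
            outcome[txn] = op
            pending_writes = {k: v for k, v in pending_writes.items() if v != txn}
    return [(r, w, x) for (r, w, x) in read_from if outcome.get(w) == 'abort']

# First exercise above: r1(x); w1(x); r2(x); w2(y); a1; c2
s = [('r', 1, 'x'), ('w', 1, 'x'), ('r', 2, 'x'), ('w', 2, 'y'),
     ('abort', 1, None), ('commit', 2, None)]
print(dirty_reads(s))   # [(2, 1, 'x')] -> T2 dirty-read x from the aborted T1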
.
6. Write down the processing issues of Transactions. What do you know about the Formalization of Transactions?
Processing Issues of Transactions
Transactions are fundamental units of work in database systems. They ensure that data remains consistent in the face of
concurrent access and failures. However, there are a number of processing issues that can arise when handling
transactions. These issues can be categorized into the following areas:
• Concurrency control: Transactions need to be coordinated to ensure that they do not interfere with each other and
that the database remains in a consistent state. This can be a challenge, especially when there are a large number of
concurrent transactions.
• Isolation: Transactions should be isolated from each other, so that they cannot see the uncommitted changes of other
transactions. This is important to prevent data anomalies and ensure that transactions are atomic.
• Durability: Transactions should be durable, meaning that their effects should be permanent even in the event of
system failures. This is typically achieved by logging the changes made by transactions and using the log to recover in
case of a failure.
• Performance: Transactions should be processed efficiently, so that they do not impact the performance of other
transactions or the overall performance of the database system.
Formalization of Transactions
The formalization of transactions provides a rigorous framework for understanding and reasoning about transaction
processing. This framework is based on the concept of a transaction model, which is a mathematical model that captures
the behavior of transactions. Transaction models can be used to prove properties of transactions, such as consistency,
isolation, and durability.
Benefits of Formalizing Transactions
The formalization of transactions has a number of benefits, including:
• Improved understanding: Transaction models provide a precise and unambiguous way of understanding the behavior of
transactions.
• Formal reasoning: Transaction models can be used to prove properties of transactions, such as consistency, isolation, and
durability.
• Design and analysis: Transaction models can be used to design and analyze transaction processing systems.
• Correctness proofs: Transaction models can be used to prove the correctness of transaction processing algorithms.
Example: Formalizing the Two-Phase Locking (2PL) Protocol
The Two-Phase Locking (2PL) protocol is a widely used concurrency control mechanism for transactions. It ensures that transactions do not interfere
with each other and that the database remains in a consistent state. The 2PL protocol can be formalized using a transaction model.
The 2PL protocol consists of two phases:
• Growing phase: In the growing phase, transactions acquire locks on the data items they need.
• Shrinking phase: In the shrinking phase, transactions release the locks they hold.
The 2PL protocol ensures that transactions do not interfere with each other by requiring that transactions acquire a lock on a data item before they
can read or write it. If a transaction tries to acquire a lock on a data item that is already locked by another transaction, the transaction is blocked until
the lock is released.
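A minimal Python sketch of the two-phase rule (the class and method names are assumptions for illustration; it models only the growing/shrinking discipline, not a full lock manager with blocking):

class TwoPhaseLockingError(Exception):
    pass

class Transaction2PL:
    """Tracks one transaction's locks and rejects lock requests after the first unlock."""
    def __init__(self, tid):
        self.tid = tid
        self.locks = set()
        self.shrinking = False          # becomes True once any lock is released

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                "T%d violates 2PL: lock(%r) requested after an unlock" % (self.tid, item))
        self.locks.add(item)            # growing phase: acquire locks

    def unlock(self, item):
        self.locks.discard(item)
        self.shrinking = True           # shrinking phase: no new locks allowed

t = Transaction2PL(1)
t.lock('x'); t.lock('y')                # growing phase
t.unlock('x')                           # shrinking phase begins
try:
    t.lock('z')                         # violates the two-phase rule
except TwoPhaseLockingError as err:
    print(err)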
.
A transaction passes through a set of states during its lifetime: active, partially committed, committed, failed, and aborted/terminated. These states represent the fundamental phases of a transaction's lifecycle, ensuring data integrity and consistency in the face of concurrent access and potential failures.
.
8\1. Write down the steps of the process that transforms a high-level query (of relational calculus/SQL) into an
equivalent and more efficient lower-level query (of relational algebra).
Transformation of High-Level Queries to Relational Algebra
The transformation of high-level queries (of relational calculus/SQL) into an equivalent and more efficient lower-level query (of
relational algebra) is a crucial step in query processing. This process involves translating the user's intent expressed in a high-level
language into a sequence of relational algebra operations that can be directly executed by the database engine. The goal of this
transformation is to optimize the query for efficient execution while preserving its semantic meaning.
Steps in Query Transformation
The process of transforming a high-level query into relational algebra typically involves the following steps: normalization of the qualification (rewriting the WHERE clause into CNF or DNF), analysis (rejecting semantically incorrect or contradictory queries), simplification (eliminating redundant predicates), and restructuring (rewriting the calculus query as a relational algebra expression and applying transformation rules to obtain a more efficient equivalent expression). See Exercise 10 below for an example of transforming an SQL query into a relational algebra query.
8/2. Show that “a partially committed transaction is not necessarily committed”, with a necessary example.
Consider a bank transfer transaction where funds are transferred from one account to another. The transaction can be
divided into two steps:
1. Deduct the amount from the sender's account.
2. Add the amount to the recipient's account.
If the transaction is partially committed, then the first step has been completed, but the second step has not yet been
completed. In this state, the transaction is not fully committed, and the funds have not been permanently transferred from
the sender's account to the recipient's account.
If the transaction does not complete successfully, the changes made in the first step will be rolled back, so the debit to the sender's account is undone. In other words, the transaction will not be committed.
Therefore, a transaction that is partially committed is not necessarily committed: it is still possible for the transaction to be rolled back and for the changes made in the first step to be reversed.
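A minimal Python sketch of this scenario (the account names, amounts, and failure flag are assumptions for illustration): the partially committed state reached after the debit only becomes committed if the credit also succeeds; otherwise the change is rolled back.

def transfer(accounts, sender, recipient, amount, fail_after_debit=False):
    """Debit sender, then credit recipient; roll back the debit if the credit step fails."""
    snapshot = dict(accounts)            # saved state used for rollback
    try:
        accounts[sender] -= amount       # step 1: debit (transaction is now partially committed)
        if fail_after_debit:
            raise RuntimeError("failure before the credit step")
        accounts[recipient] += amount    # step 2: credit
        return "committed"
    except RuntimeError:
        accounts.clear()
        accounts.update(snapshot)        # undo step 1: the partial result never becomes durable
        return "aborted"

accounts = {"A": 100, "B": 50}
print(transfer(accounts, "A", "B", 30, fail_after_debit=True), accounts)
# prints: aborted {'A': 100, 'B': 50} -> partially committed, but ultimately not committed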
.
9. Write down the properties of Transactions. Briefly discuss them.
ACID Properties: Transactions are fundamental units of work in database systems. They ensure that data remains
consistent in the face of concurrent access and failures. ACID properties are a set of four critical properties that
transactions must satisfy to ensure data integrity and consistency. These properties are:
1. Atomicity: A transaction is either fully completed or fails entirely. There is no intermediate state where only part of
the transaction is executed.
2. Consistency: A transaction must transform the database from one consistent state to another consistent state. It cannot
leave the database in an inconsistent state.
3. Isolation: Transactions must be isolated from each other, so that they cannot see the uncommitted changes of other
transactions. This is important to prevent data anomalies and ensure that transactions are atomic.
4. Durability: Transactions should be durable, meaning that their effects should be permanent even in the event of
system failures. This is typically achieved by logging the changes made by transactions and using the log to recover in
case of a failure.
Brief Discussion: The ACID properties are essential for ensuring the reliability and integrity of database systems. They
provide a framework for designing and implementing transaction processing systems that can handle concurrent access
and failures.
• Atomicity: Atomicity ensures that transactions are all-or-nothing propositions. Either the entire transaction is executed successfully, or none of it is executed. This prevents data from being inconsistently updated due to partial transaction executions.
• Consistency: Consistency ensures that transactions maintain the overall integrity of the database. They cannot leave the database in a state that violates any data integrity constraints or relationships. This maintains the overall correctness of the data stored in the database.
• Isolation: Isolation ensures that transactions are executed independently and do not interfere with each other's operations. This prevents data anomalies or conflicts that can arise if multiple transactions attempt to modify the same data simultaneously.
• Durability: Durability ensures that the effects of committed transactions are permanent, even in the event of system failures. This is achieved by logging transaction changes to stable storage, ensuring that these changes are not lost even if the system crashes or loses power.
The ACID properties are fundamental principles for designing and managing transactions in database systems. They
provide a solid foundation for ensuring data integrity, consistency, and reliability in the face of concurrent access and
failures.
.
Exercise 10. Write the possible Transformation of a Given SQL-query into a Relational Algebra-query (Added from Monna).
Example: Transformation of an SQL-query into an RA-query. Relations: EMP (ENO, ENAME, TITLE), ASG (ENO, PNO, RESP,
DUR)
Query: Find the names of employees who are managing a project?
– High level query
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO AND DUR > 37
– Two possible transformations of the query are:
– Expression 1: π_ENAME(σ_DUR>37 ∧ EMP.ENO=ASG.ENO(EMP × ASG))
– Expression 2: π_ENAME(EMP ⋈_ENO (σ_DUR>37(ASG)))
– Expression 2 avoids the expensive and large intermediate Cartesian product, and therefore typically is better.
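To see the difference concretely, suppose (hypothetical figures, not from the note) that EMP holds 400 tuples and ASG holds 1,000 tuples, of which 100 satisfy DUR > 37. Expression 1 first materializes EMP × ASG with 400 × 1,000 = 400,000 tuples and only then applies the selection, whereas Expression 2 first reduces ASG to 100 tuples and joins them with EMP, so its intermediate results stay in the hundreds of tuples.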
.
11. Write down the properties of the Strict 2PL (two-phase locking) protocol. When are two schedules said to be
equivalent?
Properties of the Strict 2PL (two-phase locking) protocol
Strict 2PL is a variation of the Two-Phase Locking (2PL) protocol for concurrency control in database systems. In addition to the basic two-phase rule, a transaction holds its exclusive (write) locks until it commits or aborts, which gives stronger guarantees than standard 2PL.
The main properties of Strict 2PL are:
• Serializability: Strict 2PL ensures that the execution of a set of transactions is equivalent to some serial execution, meaning that the final state of the database is the same as if the transactions had been executed one at a time.
• Conflict-Serializability: Like basic 2PL, Strict 2PL guarantees conflict-serializable schedules: every pair of conflicting operations is ordered consistently with some serial execution of the transactions.
• Strictness (no cascading aborts): Because write locks are held until commit or abort, no transaction reads or overwrites uncommitted data, so aborting one transaction never forces other transactions to abort.
• Recoverability: Strict 2PL produces recoverable schedules, meaning that a transaction commits only after every transaction it read from has committed, so committed effects never depend on transactions that later abort.
Note that Strict 2PL does not by itself prevent deadlocks; a separate deadlock detection or prevention mechanism is still required.
Equivalence of Schedules
In the context of transaction processing, two schedules are said to be equivalent if they have the same effect on the database, i.e., executing either one leads to the same outcome.
There are two main notions of equivalence:
• Final-state (view) equivalence: Two schedules are equivalent if, for every possible initial state, they produce the same final state of the database.
• Conflict equivalence: Two schedules are conflict equivalent if they contain the same operations and order every pair of conflicting operations (two operations on the same data item from different transactions, at least one of which is a write) in the same way.
Strict 2PL ensures that every schedule it produces is conflict equivalent to some serial schedule, i.e., serializable. This is important for ensuring data consistency and avoiding anomalies.
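A small worked illustration (the schedules are assumed for illustration): consider the interleaved schedule S1 = r1(x); w1(x); r2(x); r1(y); w2(x); w1(y); c1; c2. Its conflicting pairs across transactions all involve x: r1(x) before w2(x), w1(x) before r2(x), and w1(x) before w2(x), so every conflict orders T1 before T2. The serial schedule S2 = r1(x); w1(x); r1(y); w1(y); c1; r2(x); w2(x); c2 orders the same conflicting pairs the same way, so S1 is conflict equivalent to S2 and is therefore serializable.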
Imp. 13. What is the site failure in 2PC protocol? Write down the phases of three phase commit protocol(3PC).
The 2PC protocol is a distributed commit protocol used to ensure atomicity (either all participants commit or all participants abort) in
distributed transactions. The protocol involves two phases:
1. Prepare Phase:
• The coordinator sends a prepare message to all participants, asking them if they are ready to commit.
• Each participant replies with either "ready" or "not ready."
2. Commit or Abort Phase:
• If all participants reply "ready," the coordinator sends a commit message to all participants.
• If any participant replies "not ready" or if a timeout occurs, the coordinator sends an abort message to all participants.
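A minimal Python sketch of the coordinator's decision rule just described (the participant names and vote strings are assumptions for illustration): the transaction commits only if every participant voted "ready"; a "not ready" vote or a missing reply (timeout or site failure) leads to a global abort.

def coordinator_decision(votes, participants):
    """votes maps participant -> 'ready' or 'not ready'; a missing entry models a timeout."""
    for p in participants:
        if votes.get(p) != "ready":      # 'not ready' vote, or no reply at all
            return "abort"
    return "commit"

print(coordinator_decision({"S1": "ready", "S2": "ready"}, ["S1", "S2"]))  # commit
print(coordinator_decision({"S1": "ready"}, ["S1", "S2"]))                 # abort (S2 never replied)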
Site failure in the context of the Two-Phase Commit (2PC) protocol refers to the situation where a participant site in
the distributed transaction becomes unavailable or fails to respond during the protocol execution. This can occur due to
various reasons, such as hardware or software failures, network disruptions, or power outages.
Now, let's discuss the implications of a site failure in the 2PC protocol:
• Participant Failure:
• If a participant fails during the 2PC protocol, and the coordinator receives no response from that participant, the coordinator has to make a decision based on the responses it did receive.
• If all other participants voted "ready," the coordinator might decide to commit the transaction, assuming the missing participant was ready.
• If the coordinator receives any "not ready" votes or if it has any doubts, it might decide to abort the transaction.
• Network Partition or Site Failure:
• If there is a network partition or a failure at a participant site after it has voted "ready," the coordinator might not receive the acknowledgment.
• If the coordinator is unable to determine the outcome of the participant, it may decide to abort the transaction to ensure consistency.
• Coordinator Failure:
• If the coordinator fails after participants have
already agreed to commit but before sending the
commit message, a new coordinator must be chosen
to complete the protocol.
• The new coordinator can use information from the
previous coordinator's logs to determine the state of
the transaction and proceed accordingly.
Shrinking Phase (of 2PL with a centralized lock manager):
1. A transaction releases a lock on a data item by sending an unlock request message to the lock manager.
2. The lock manager removes the lock from the lock table and sends an unlock grant message to the transaction.
3. The transaction can now commit or abort. If it commits, it notifies the lock manager of the committed data items, and the lock manager releases any locks held by the committed transaction. If it aborts, it notifies the lock manager of the aborted data items, and the lock manager releases any locks held by the aborted transaction.
The 2PL protocol ensures that transactions do not interfere with each other and that all transactions see a consistent view of the data.
.
Imp. 15. What do you know about the site failures in 2PC protocol? What is the Distributed Reliability Protocol
and Commit protocol?
Distributed Reliability Protocols: Distributed reliability protocols manage the coordination and recovery of distributed transactions, ensuring that all participating nodes agree on the outcome of a transaction, whether it's a successful completion or an abort. These protocols address the challenges arising from communication delays, node failures, and network disruptions that can occur in distributed environments.
Common distributed reliability protocols include:
• Two-Phase Commit (2PC): A widely used protocol that involves two phases, prepare and commit, to ensure that all participants agree on the outcome of a transaction.
• Three-Phase Commit (3PC): An extension of 2PC that adds an additional pre-commit phase to handle coordinator failures.
• Voting Commit: A protocol that uses voting among participants to determine the outcome of a transaction.
Commit Protocols: Commit protocols are specifically designed to manage the final phase of a distributed transaction, ensuring that all participating nodes agree on whether to commit or abort the transaction. They provide atomicity and durability guarantees for distributed transactions, ensuring that all participants either commit the entire transaction or abort the entire transaction.
Common commit protocols include:
• Two-Phase Commit (2PC): The commit phase of 2PC involves sending a commit message to all participants and waiting for commit-ok messages from all participants before declaring the transaction committed.
• Voting Commit: The commit phase of voting commit involves collecting votes from all participants and deciding based on the majority vote whether to commit or abort the transaction.
• Global Lock Manager (GLM): A centralized approach that uses a global lock manager to coordinate the commit process and ensure data consistency across all participants.
.
16. What is Reliability and Out-of-place Update? Describe the steps of two-phase commit protocol (2PC).
Reliability in the context of data management refers to the ability of a system to maintain data integrity and availability in
the face of errors, failures, or disruptions. It ensures that data remains consistent and accessible even when unexpected
events occur, such as hardware failures, software glitches, or network outages.
Out-of-place update is a technique used in data management to modify data without overwriting its existing location.
Instead, the updated data is placed in a new location, and the original location is marked as obsolete. This approach helps
maintain data integrity and consistency by preserving the original data in case of rollback or recovery.
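A minimal Python sketch of the idea (the slot-based structure is an assumption for illustration, not a real storage engine): each update writes the new value to a fresh slot and only redirects the current-version pointer, so the old value survives for rollback or recovery.

store = {}        # slot_id -> value (append-only storage)
current = {}      # key -> slot_id of the live version
next_slot = 0

def write(key, value):
    """Out-of-place update: place the new value in a new slot, keep the old slot intact."""
    global next_slot
    slot = next_slot
    next_slot += 1
    store[slot] = value              # new data goes to a new location
    previous = current.get(key)      # the old location is now obsolete but still readable
    current[key] = slot
    return previous

write("x", 10)
obsolete = write("x", 20)            # the update does not overwrite the slot holding 10
print(store, current, obsolete)      # {0: 10, 1: 20} {'x': 1} 0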
17. What is Concurrency Control? What do you know about the Locking based concurrency algorithms?
Concurrency control is a set of techniques that ensure data consistency and transaction integrity in the presence of
multiple concurrent users accessing and modifying a shared database. It manages the simultaneous access to shared data
by multiple users, preventing conflicts and ensuring that all users see a consistent view of the data.
Locking-based concurrency algorithms are a type of concurrency control mechanism that uses locks to control access to
shared data. A lock is a mechanism that prevents other users from modifying a data item while it is being accessed by a
user. Locks can be applied at different levels of granularity, from individual data items to entire tables or even the entire
database. Common locking-based concurrency algorithms include:
• Two-Phase Locking (2PL): A simple and widely used algorithm that involves two phases: growing and shrinking. In
the growing phase, transactions acquire locks on the data items they need, and in the shrinking phase, they release the
locks they hold.
• Optimistic Locking: An algorithm that defers lock acquisition until a transaction is about to commit. This reduces
lock contention and improves performance, but it also increases the risk of conflicts.
• Timestamp-Based Locking: An algorithm that assigns timestamps to transactions and uses timestamps to determine
the order in which locks are granted. This ensures that transactions are serialized in a consistent order.
Locking-based concurrency algorithms are effective at preventing conflicts and ensuring data consistency, but they can
also introduce overhead and reduce performance. The choice of locking algorithm depends on the specific requirements of
the application, such as the level of concurrency, the type of data being accessed, and the desired performance
characteristics.
.
Imp. 19. What is Deadlock Management? Describe the steps of the three-phase commit protocol (3PC).
Imp. 21. What is Deadlock Management? What do we need to do for Deadlock prevention?
Deadlock Management: Deadlock management is the set of techniques used in database systems to handle deadlocks, situations where two or more processes are waiting for each other to release resources, causing a standstill. Deadlocks can occur when multiple processes hold exclusive locks on resources needed by others, creating a circular dependency.
There are two main approaches to manage deadlocks:
1. Deadlock prevention: This approach ensures that a deadlock can never occur by analyzing resource allocation requests and only granting them if they cannot lead to a circular dependency. Techniques like resource ordering, lock-down, and preemption are used for prevention.
2. Deadlock detection and recovery: This approach allows deadlocks to occur but identifies them and then takes corrective action to release resources and allow processes to continue. Resource allocation graphs and wait-for graphs are used for detection, and rollback of one or more transactions is used for recovery.
Preventing Deadlocks: Deadlock prevention is a technique used to prevent deadlocks from occurring in the first place.
This can be done by carefully designing the system to avoid situations where two or more processes can wait for each
other indefinitely. There are a few different approaches to deadlock prevention, including:
1. Resource Ordering: Establish a strict ordering of resources that processes must acquire. This prevents circular
waiting patterns that lead to deadlocks.
2. Resource Preemption: Allow the system to temporarily revoke a resource held by a process if it's required by
another process in a critical situation.
3. Careful Resource Allocation: Avoid allocating all resources to a single process at the start, as this increases the
likelihood of deadlocks.
4. Eliminating Hold and Wait: Require processes to request all the resources they need up front, or to release the resources they hold before requesting new ones, so that a process never waits for a resource while holding others.
5. Deadlock Detection Algorithms: Implement algorithms that can detect deadlocks when they occur, allowing for appropriate recovery measures (a small detection sketch follows this list).
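A minimal Python sketch of wait-for-graph deadlock detection (the transaction ids and the graph encoding are assumptions for illustration): an edge T1 -> T2 means T1 is waiting for a lock held by T2, and a cycle in the graph means a deadlock.

def has_deadlock(wait_for):
    """wait_for: dict mapping a transaction to the set of transactions it waits for.
    Returns True if the wait-for graph contains a cycle (i.e., a deadlock)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY                          # on the current DFS path
        for u in wait_for.get(t, set()):
            if color.get(u, WHITE) == GRAY:      # back edge -> cycle -> deadlock
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK                         # fully explored, no cycle through t
        return False

    return any(color[t] == WHITE and visit(t) for t in list(wait_for))

print(has_deadlock({1: {2}, 2: {1}}))       # True: T1 and T2 wait on each other
print(has_deadlock({1: {2}, 2: set()}))     # False: T2 waits for nothing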
Three Phase Commit Protocol (3PC)
The Three Phase Commit Protocol (3PC) is a distributed algorithm used to ensure consistency in a distributed database
system when committing a transaction across multiple nodes. It improves upon the Two-Phase Commit (2PC) protocol by
addressing its vulnerability to coordinator failure.
Here are the steps of 3PC:
Phase 1: Voting/Prepare Phase:
1. The coordinator sends a Prepare message to all participants.
2. Participants vote based on their ability to commit:
o Vote-commit: If prepared to commit, the participant logs Ready and replies with a Vote-commit message.
o Vote-abort: If unable to commit, the participant logs Abort and replies with a Vote-abort message.
Phase 2: Pre-commit Phase:
1. If every participant voted Vote-commit, the coordinator sends a Prepare-to-commit message to all participants; if any participant voted Vote-abort, it broadcasts a Global-abort instead.
2. Participants log Prepared-to-commit and reply with a Ready-to-commit acknowledgment.
Phase 3: Commit/Deciding Phase:
1. If all participants respond with Ready-to-commit, the coordinator broadcasts a Global-commit message.
2. Participants log Commit, commit the transaction, and send an acknowledgment.
20. What are the concepts of Conceptual design and Logical design? What is the difference between star schema and snowflake schema design?
Conceptual Design
Conceptual design is the initial stage of database design, where the focus is on understanding the problem domain and identifying the entities, attributes, and relationships between them. It involves creating high-level models that represent the real-world entities and their relationships. These models are typically represented using Entity-Relationship Diagrams (ERDs) or Unified Modeling Language (UML) diagrams. The goal of conceptual design is to establish a common understanding of the data requirements among stakeholders and to identify the core concepts of the system. It helps to ensure that the database is designed to meet the specific needs of the organization.
Logical Design
Logical design takes the conceptual design one step further and translates the high-level models into a more detailed data model. It involves defining the data structures, data types, and relationships within the database. The logical model is typically represented using a specific data modeling language, such as the Data Definition Language (DDL) of a particular database management system (DBMS). The goal of logical design is to create a detailed representation of the database structure that can be implemented using a specific DBMS. It helps to ensure that the database is efficient and scalable, and that it can meet the performance and security requirements of the organization.
Star Schema vs. Snowflake Schema: Both star schema and snowflake schema are data modeling approaches used for
designing data warehouses. They differ in their structure and how they handle data redundancy.
Feature comparison (Star Schema vs. Snowflake Schema):
• Definition: Star schema is a simplified data model with flat tables; snowflake schema is a normalized data model with hierarchical tables.
• Structure: Star schema is simpler, with fact tables and dimension tables; snowflake schema is more complex, with fact tables, dimension tables, and sub-dimension tables.
• Data redundancy: Higher in the star schema; lower in the snowflake schema.
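As a small illustration (table names assumed, not from the note): in a star schema, a SALES fact table references DATE_DIM, STORE_DIM, and PRODUCT_DIM directly, and PRODUCT_DIM repeats the category name in every product row. In the snowflake version, PRODUCT_DIM is further normalized into PRODUCT_DIM(PRODUCT_ID, PRODUCT_NAME, CATEGORY_ID) and CATEGORY_DIM(CATEGORY_ID, CATEGORY_NAME), which removes the repeated category names (less redundancy) at the cost of one extra join per query.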
Data Warehouse (DW) related Exercise (If possible, go through the PDF): https://t.me/c/1653334055/127