Document Database
In the EER model, we can create groups of related entities using supertypes and subtypes. A supertype is
a general category that has specific subtypes underneath it. Each subtype has its own unique attributes
and relationships.
For example, imagine we have a database for a zoo. We can create a supertype called "animals" and
have subtypes like "mammals," "birds," and "reptiles." Each subtype would have its own specific
attributes and relationships, but they would all share some common attributes and relationships with
the "animals" supertype.
We can also use entity clusters to group related entities and their relationships into a single abstract
entity in the diagram. For example, we can create a cluster called "animal habitats" to stand in for
entities like "lion habitat," "elephant habitat," etc.
Using the EER model helps us to better understand the relationships between different groups of entities
in a database and allows us to represent more complex relationships than we can with just the basic ER
model.
SUMMARY 2
In the EER model, we can organize entities into a hierarchy using supertypes and subtypes. A supertype
is a general category of entities, while subtypes are more specific categories that inherit attributes and
relationships from their supertype.
There are two types of subtype hierarchies: disjoint, where an entity can belong to only one subtype,
and overlapping, where an entity can belong to more than one subtype. A subtype discriminator is used
to determine which subtype an entity belongs to.
Subtypes can also exhibit partial or total completeness. In a partially complete hierarchy, not all entities
in the supertype have to belong to a subtype, while in a totally complete hierarchy, every entity in the
supertype must belong to a subtype.
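The hierarchy described above can be sketched in code. This is a minimal illustration using Python's
built-in sqlite3 module, not a definitive implementation. The table and column names (animal, mammal,
bird, animal_type, fur_color, wingspan_cm) are assumptions made up for the zoo example. The animal_type
column plays the role of the subtype discriminator, and because it holds a single value per animal, the
sketch models a disjoint hierarchy. The schema alone does not force a row marked 'M' to appear only in
the mammal table; that rule would need extra checks.

```python
import sqlite3

# Hypothetical zoo schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE animal (
    animal_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    animal_type TEXT NOT NULL CHECK (animal_type IN ('M', 'B', 'R'))  -- subtype discriminator
);
CREATE TABLE mammal (
    animal_id INTEGER PRIMARY KEY REFERENCES animal(animal_id),  -- PK shared with the supertype
    fur_color TEXT
);
CREATE TABLE bird (
    animal_id   INTEGER PRIMARY KEY REFERENCES animal(animal_id),
    wingspan_cm REAL
);
"""


def build_zoo():
    """Create an in-memory database with one mammal and one bird."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO animal VALUES (1, 'Leo', 'M')")
    conn.execute("INSERT INTO mammal VALUES (1, 'tawny')")
    conn.execute("INSERT INTO animal VALUES (2, 'Kiwi', 'B')")
    conn.execute("INSERT INTO bird VALUES (2, 25.0)")
    return conn


def mammals(conn):
    """Join supertype and subtype on the shared primary key."""
    return conn.execute(
        "SELECT a.name, m.fur_color FROM animal a JOIN mammal m USING (animal_id)"
    ).fetchall()
```

Common attributes (name) live in the supertype table, and each subtype table adds only its own
attributes, which mirrors how subtypes inherit from the supertype.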
There are two approaches to developing a specialization hierarchy: specialization and generalization.
Specialization involves creating more specific subtypes from a general supertype by adding additional
attributes or relationships. Generalization involves creating a more general supertype from specific
subtypes by identifying common attributes and relationships.
Overall, the specialization hierarchy helps us to organize entities in a database and understand the
relationships between them. It allows us to represent complex relationships and to easily add or modify
attributes and relationships.
SUMMARY 3
In a database, there are often many different entities (like people, orders, or products) and relationships
between them (like a person placing an order or a product being part of an order). Sometimes, these
entities and relationships can get complex and difficult to understand when you look at them all
together.
An entity cluster is a way to group together multiple related entities and relationships into a single
abstract entity. It's like creating a "super-entity" that represents several other entities and their
relationships with each other. This can help simplify the diagram of the database and make it easier to
understand.
For example, imagine you have a database for a library. There might be entities for books, authors,
borrowers, and loans. The relationships between these entities could get complex, with loans being tied
to both books and borrowers. To simplify things, you could create an entity cluster called "Library
Transactions" that represents all loans and their relationships with books and borrowers. This cluster
would make the ERD easier to understand because it groups related entities and relationships together
in a clear way.
Overall, entity clusters are a useful tool for simplifying complex databases and making them more
understandable.
SUMMARY 4
In a database, each record needs to have a unique identifier. This identifier is called a "primary key". A
primary key can be made up of one or more columns in a table, and it's used to make sure that each
record in the table is unique.
A "natural key" is a type of primary key that comes from the real world. For example, if you have a table
of people, their social security number could be a natural key. The idea is that the natural key is
something that already uniquely identifies the record in the real world.
However, there are some criteria that a primary key needs to meet in order to work well in a database.
First, it needs to be unique - no two records can have the same primary key value. Second, it needs to be
"nonintelligent" - it shouldn't contain any meaning or information beyond being a unique identifier.
Third, it needs to be stable - it shouldn't change over time, because this could cause issues with data
consistency. And finally, it should be simple - preferably made up of a single attribute.
Sometimes, a natural key can meet all of these criteria and make a great primary key. But other times, it
might not. For example, a person's name might not be unique, or a product's serial number might
change over time. In these cases, an artificial or "surrogate" key might need to be created, which is just a
unique identifier that has no meaning beyond being a primary key.
So in summary, a natural key is a type of primary key that comes from the real world, but it might not
always meet all the criteria needed to be an effective primary key.
SUMMARY 5
In a database, each record needs to have a unique identifier. This identifier is called a "primary key".
Normally, a primary key is made up of a single column in a table. For example, in a table of students, the
primary key might be the student ID number.
However, there are some situations where a single column isn't enough to uniquely identify a record. In
these cases, a "composite key" can be used. A composite key is just a primary key that is made up of
multiple columns in a table.
One situation where a composite key is useful is when representing many-to-many relationships. A
many-to-many relationship means that one record in a table can be associated with many records in
another table, and vice versa. For example, in a database for a music store, a customer can buy many
CDs, and each CD can be bought by many customers. To represent this relationship, you would need a
third table whose composite primary key combines the ID of the customer and the ID of the CD
involved in each transaction.
Another situation where a composite key is useful is when representing a "weak entity". A weak entity is
an entity that can't be uniquely identified on its own: its primary key is derived, in whole or in part,
from the primary key of the parent entity it depends on. For example, in a database for a hospital, a room
can be modeled as a weak entity when room numbers are unique only within a building. In this case, you
could use a composite key made up of both the building ID and the room number to uniquely identify each
record in the "room" table.
In summary, a composite key is just a primary key that is made up of multiple columns in a table. It's
useful in situations where a single column isn't enough to uniquely identify a record, such as many-to-
many relationships and weak entities.
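The music store example can be sketched as follows. This is a rough illustration with Python's built-in
sqlite3 module. The table and column names (customer, cd, purchase, cust_id, cd_id) and the helper
try_purchase are assumptions invented for this sketch.

```python
import sqlite3


def build_store():
    """Create a hypothetical music store schema with a bridge table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE cd (cd_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE purchase (
            cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
            cd_id   INTEGER NOT NULL REFERENCES cd(cd_id),
            PRIMARY KEY (cust_id, cd_id)   -- composite key built from both FKs
        );
        INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO cd VALUES (10, 'Blue'), (11, 'Kind');
        INSERT INTO purchase VALUES (1, 10), (1, 11), (2, 10);
    """)
    return conn


def try_purchase(conn, cust_id, cd_id):
    """Return True if the row was inserted, False if a key constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO purchase VALUES (?, ?)", (cust_id, cd_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this key, each customer and CD combination can appear only once. If the store needed to record
repeat purchases of the same CD, the key would need another attribute such as a purchase date, or a
surrogate key.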
SUMMARY 6
In a database, a primary key is a unique identifier for each record in a table. A natural primary key is a
primary key that already exists in the data, such as a customer's social security number or an employee's
ID number. However, sometimes there is no suitable natural key available, or the primary key may be too
complex or too long to be practical. This is where surrogate primary keys come in.
Surrogate primary keys are artificially created keys that are assigned to each record in a table. They are
typically simple numeric values generated by the database management system (DBMS) that have no
inherent meaning or relationship to the data being stored. Surrogate keys can be used when a natural
key is not available or when the natural key is not suitable for use as a primary key.
Surrogate primary keys are useful in situations where the primary key is a composite key that contains
multiple data types or attributes. Using a composite key as the primary key can be cumbersome and may
result in slower performance when querying the database. By using a surrogate key, the primary key can
be simplified and made more efficient.
Surrogate keys can also be useful when the primary key is subject to change. For example, if the primary
key is based on a customer's name, and the customer changes their name, then the primary key would
need to be updated throughout the database. By using a surrogate key, this issue is avoided because the
surrogate key is not tied to any specific attribute of the data being stored.
Overall, surrogate primary keys provide a simple and efficient way to manage primary keys in a database.
They are used when a natural primary key is not available or practical, and they can simplify the primary
key and improve database performance.
SUMMARY 7
In a 1:1 relationship, there are two entities involved, and each entity has a unique identifier known as a
primary key (PK). One entity is mandatory, which means it must exist, while the other entity is optional,
which means it may or may not exist.
When it comes to placing the PK of the mandatory entity as a foreign key (FK) in the optional entity,
there are three common options:
Place the PK of the mandatory entity as a FK in the optional entity:
This means that the PK of the mandatory entity becomes a FK in the optional entity. This option is often
used when the optional entity is not expected to have many null values, or when it makes sense from a
logical standpoint.
Place the PK of the mandatory entity as a FK in the entity that causes the fewest nulls:
This option is used when one of the entities in the 1:1 relationship is expected to have fewer null values
than the other. In this case, it may be more efficient to place the PK of the mandatory entity as a FK in
the entity that is expected to have fewer null values.
Place the FK in the entity in which the relationship role is played:
This option is based on the role played by each entity in the relationship. For example, if the 1:1
relationship is between a person and their passport, the PK of the person entity could be placed as a FK
in the passport entity, since the passport "belongs to" the person.
Ultimately, the decision of where to place the PK as a FK in a 1:1 relationship depends on the specific
requirements of the database and the entities involved. It is important to choose the option that leads to
the most efficient and effective database design.
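The person and passport example above can be sketched like this, using Python's built-in sqlite3
module. All table and column names, and the helper issue_passport, are assumptions made for
illustration. The FK sits in the passport table, and a UNIQUE constraint on it keeps the relationship
1:1 instead of 1:M.

```python
import sqlite3


def build_passports():
    """Hypothetical 1:1 schema: the FK lives in passport and must be unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE passport (
            passport_no TEXT PRIMARY KEY,
            person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
        );
        INSERT INTO person VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO passport VALUES ('X100', 1);
    """)
    return conn


def issue_passport(conn, passport_no, person_id):
    """Return True if the passport was recorded, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO passport VALUES (?, ?)", (passport_no, person_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the FK is in the passport table, a person without a passport simply has no passport row, so no
column has to hold a null.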
SUMMARY 8
Time-variant data is simply data that changes over time. When you have time-variant data in your
database, it's important to keep track of the changes that occur over time. To do this, you need to create
an entity that will store the history of these changes.
This entity will contain information about the new value of the data, the date the change occurred, and
any other relevant data related to the change. This entity will maintain a one-to-many (1:M) relationship
with the entity for which the history is being maintained. This means that for each record in the main
entity, there will be one or more related records in the history entity.
For example, let's say you have a database for tracking employee salaries. If an employee's salary
changes over time, you need to keep track of these changes. To do this, you would create a history entity
that would contain the new salary, the date of the change, and any other relevant data related to the
change (such as the person who made the change). This history entity would maintain a 1:M relationship
with the employee entity, so for each employee, there would be one or more related records in the
history entity that would track changes to their salary over time.
By maintaining a history of time-variant data, you can keep track of changes over time and analyze
trends or patterns in the data. This can be useful in many different applications, such as tracking
employee performance or analyzing financial data.
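The salary history example can be sketched as follows, using Python's built-in sqlite3 module. The
column names (emp_num, start_date, salary) and the helper current_salary are assumptions for this
sketch. Dates are stored as ISO 8601 text so that sorting them as text also sorts them chronologically.

```python
import sqlite3


def build_hr():
    """Hypothetical employee table with a 1:M salary history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE employee (emp_num INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
        CREATE TABLE salary_hist (
            emp_num    INTEGER NOT NULL REFERENCES employee(emp_num),
            start_date TEXT NOT NULL,      -- ISO 8601 date, sorts chronologically as text
            salary     REAL NOT NULL,
            PRIMARY KEY (emp_num, start_date)
        );
        INSERT INTO employee VALUES (1, 'Dana');
        INSERT INTO salary_hist VALUES (1, '2021-01-01', 50000), (1, '2023-06-01', 58000);
    """)
    return conn


def current_salary(conn, emp_num):
    """Return the most recent salary for an employee, or None if there is no history."""
    row = conn.execute(
        "SELECT salary FROM salary_hist WHERE emp_num = ? "
        "ORDER BY start_date DESC LIMIT 1",
        (emp_num,),
    ).fetchone()
    return row[0] if row else None
```

Each salary change becomes a new row in the history table instead of overwriting the old value, so the
full history stays available for analysis.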
SUMMARY 9
A fan trap occurs when an entity is related to two other entities in a one-to-many relationship, and those
two other entities are also related to each other, but that relationship is not explicitly represented in the
model. This can create problems because the data model does not accurately represent the relationships
between the entities, which can lead to issues when querying or updating data.
For example, imagine a data model where Customers, Orders, and Products are related. If Customers are
related to both Orders and Products, but Orders and Products are not related to each other in the
model, a fan trap can occur. This is because it's possible for a customer to be associated with multiple
orders, and for each of those orders to be associated with multiple products, but the model doesn't
represent the relationship between orders and products. This can lead to confusion when trying to query
or update data.
Redundant relationships, on the other hand, occur when there are multiple relationship paths between
related entities. This can create confusion and inconsistencies in the data, especially if updates to one
path don't affect data in the other path.
For example, imagine a data model where Customers, Orders, and Products are related, and there are
two paths from Customers to Products: one through Orders, and one directly. This creates redundancy in
the model and can cause issues when trying to query or update data because changes made to one path
might not be reflected in the other.
To avoid these issues, it's important to carefully analyze the relationships between entities and ensure
that they are accurately represented in the data model. This will prevent fan traps and redundant
relationships from occurring and make querying and updating data much easier.
REDUNDANT
This paragraph discusses the concept of redundancy in databases. Redundancy refers to having multiple
copies of the same data or relationships in a database. While redundancy can be beneficial in other
computer environments, it is generally not a good idea in a database as it can lead to data
inconsistencies or anomalies.
Redundant relationships occur when there are multiple paths between related entities in a database.
These relationships need to remain consistent across the model to avoid data inconsistencies. However,
in some cases, redundant relationships are used to simplify database design.
The paragraph provides an example of a redundant relationship in Figure 5.14, where there is a transitive
1:M relationship between DIVISION and PLAYER through the TEAM entity set. The relationship that
connects DIVISION and PLAYER is redundant and can be safely deleted without losing any information-
generation capabilities in the model.
FAN TRAP
This paragraph discusses a database design issue known as a "fan trap." The fan trap occurs when an
entity has two or more 1:M relationships with other entities, resulting in a "fan out" relationship
representation that can be confusing and difficult to interpret.
The paragraph provides an example of a fan trap in Figure 5.12, where the DIVISION entity is in a 1:M
relationship with both TEAM and PLAYER. However, this relationship representation does not identify
which players belong to which team, creating a fan trap.
To eliminate the fan trap, the correct ERD is shown in Figure 5.13. In this design, DIVISION is in a 1:M
relationship with TEAM, and TEAM is in a 1:M relationship with PLAYER. This design allows easy
identification of which players play for which team. However, to find out which players play in which
division, a transitive relationship between DIVISION and PLAYER via the TEAM entity must be used.
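The corrected design can be sketched with Python's built-in sqlite3 module. The column names and sample
rows are assumptions invented for this illustration; only the DIVISION, TEAM, and PLAYER entities come
from the text. The query walks the transitive path from DIVISION through TEAM to PLAYER.

```python
import sqlite3


def build_league():
    """Hypothetical schema: DIVISION 1:M TEAM, TEAM 1:M PLAYER."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE division (div_id INTEGER PRIMARY KEY, div_name TEXT NOT NULL);
        CREATE TABLE team (
            team_id   INTEGER PRIMARY KEY,
            team_name TEXT NOT NULL,
            div_id    INTEGER NOT NULL REFERENCES division(div_id)
        );
        CREATE TABLE player (
            player_id   INTEGER PRIMARY KEY,
            player_name TEXT NOT NULL,
            team_id     INTEGER NOT NULL REFERENCES team(team_id)
        );
        INSERT INTO division VALUES (1, 'East'), (2, 'West');
        INSERT INTO team VALUES (10, 'Hawks', 1), (11, 'Owls', 1), (20, 'Bears', 2);
        INSERT INTO player VALUES (100, 'Kim', 10), (101, 'Lee', 11), (102, 'Roy', 20);
    """)
    return conn


def players_in_division(conn, div_name):
    """Find a division's players by joining through TEAM (the transitive path)."""
    return conn.execute(
        """
        SELECT p.player_name, t.team_name
        FROM division d
        JOIN team t   ON t.div_id = d.div_id
        JOIN player p ON p.team_id = t.team_id
        WHERE d.div_name = ?
        ORDER BY p.player_name
        """,
        (div_name,),
    ).fetchall()
```

Because each player row points to exactly one team, the result also shows which team each player
belongs to, which is the information the fan trap design could not provide.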
TIME VARIANT
Time-variant data is an important consideration in data modeling and database design, and requires
careful consideration of the specific requirements for tracking changes to data over time. By properly
modeling and managing time-variant data, organizations can ensure that they have access to a complete
and accurate historical record of their data.
The passage discusses how time-variant data can affect the relationships between entities in a data
model. For example, in the case of tracking salary histories for employees, the EMP_SALARY attribute
becomes multivalued, and for each employee, there will be one or more records in the SALARY_HIST
entity.
In another example, if the data model includes data about the different departments in the
organization and which employee manages each department, then an M:N relationship would be
necessary to track the history of all department managers as well as the current manager. This would
involve creating an MGR_HIST entity with 1:M relationships with both EMPLOYEE and DEPARTMENT to
reflect the fact that an employee could be the manager of many different departments over time, and a
department could have many different employee managers.
The passage also discusses the trade-off of such a model: each time a new manager is assigned to a
department, there will be two data modifications, one update in the DEPARTMENT entity and one insert
in the MGR_HIST entity. Finally, the passage suggests modifying the model to add a JOB_HIST entity to
maintain the employee's job history.
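The two data modifications described above can be sketched with Python's built-in sqlite3 module. The
column names (mgr_emp_num, start_date, and so on) and the helper assign_manager are assumptions for
this illustration. Wrapping both statements in one transaction is a design choice of this sketch, so
that the current manager and the history cannot drift apart.

```python
import sqlite3


def build_org():
    """Hypothetical schema: DEPARTMENT holds the current manager, MGR_HIST the history."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE employee (emp_num INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
        CREATE TABLE department (
            dept_id     INTEGER PRIMARY KEY,
            dept_name   TEXT NOT NULL,
            mgr_emp_num INTEGER REFERENCES employee(emp_num)   -- current manager
        );
        CREATE TABLE mgr_hist (
            dept_id    INTEGER NOT NULL REFERENCES department(dept_id),
            emp_num    INTEGER NOT NULL REFERENCES employee(emp_num),
            start_date TEXT NOT NULL,
            PRIMARY KEY (dept_id, start_date)
        );
        INSERT INTO employee VALUES (1, 'Dana'), (2, 'Eli');
        INSERT INTO department VALUES (10, 'Sales', NULL);
    """)
    return conn


def assign_manager(conn, dept_id, emp_num, start_date):
    """Apply both modifications together: one update and one insert."""
    with conn:  # commits if both statements succeed, rolls back if either fails
        conn.execute(
            "UPDATE department SET mgr_emp_num = ? WHERE dept_id = ?",
            (emp_num, dept_id),
        )
        conn.execute(
            "INSERT INTO mgr_hist VALUES (?, ?, ?)",
            (dept_id, emp_num, start_date),
        )
```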
CASES
When creating a relationship between two entities, it is recommended to place the primary key (PK) of
the entity on the mandatory side as a foreign key (FK) in the entity on the optional side. The FK should
also be made mandatory to ensure referential integrity.
This practice helps to maintain data integrity and consistency by ensuring that each record in the
optional entity is linked to a valid record in the mandatory entity. By making the FK mandatory, it
ensures that there cannot be any records in the optional entity without a corresponding record in the
mandatory entity.
For example, if we have an "Orders" table and a "Customers" table, we can create a relationship
between them by placing the PK of the "Customers" table (i.e., CustomerID) as a FK in the "Orders"
table. The FK should be mandatory to ensure that each order is associated with a valid customer.
Overall, this practice can help to ensure data integrity and consistency, which are important factors in
designing a high-quality database.
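The Orders and Customers example can be sketched with Python's built-in sqlite3 module. The Name and
OrderID columns and the helper add_order are assumptions for this illustration. Note that SQLite
enforces foreign keys only after PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3


def build_orders():
    """Orders.CustomerID is a mandatory (NOT NULL) foreign key to Customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Orders (
            OrderID    INTEGER PRIMARY KEY,
            CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
        );
        INSERT INTO Customers VALUES (1, 'Ana');
    """)
    return conn


def add_order(conn, order_id, customer_id):
    """Return True if the order was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

An order with no customer fails the NOT NULL rule, and an order that names a customer who does not exist
fails the foreign key rule, so every stored order is tied to a valid customer.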
ONE TO ONE
In a 1:1 relationship between two entities in a relational database, you need to decide where to place
the foreign key. One option is to place a foreign key in both entities, but this is not recommended
because it can lead to duplication and conflicts with other relationships. The preferred option is to place
the primary key of one entity as a foreign key in the other entity, and in that entity only. To determine
which primary key should be used as the foreign key, you can refer to Table 5.5, which provides guidance
based on the relationship properties in the ERD.
SURROGATE KEY
In some situations, we may not have a natural primary key that can be used to uniquely identify each
instance of an entity. For example, in the case of the park recreation facility mentioned in the text, there
may not be a natural primary key that can be used to uniquely identify each party event that takes place
in the facility.
In such cases, we can create a surrogate key, which is a primary key that is created solely for the purpose
of identifying each instance of the entity. Surrogate keys have no meaning in the real world, but they
help us to keep track of different instances of an entity.
For example, in the case of the park recreation facility, we could create a surrogate key called Event_ID to
uniquely identify each party event that takes place in the facility. This surrogate key would have no
meaning in the real world, but it would help us to distinguish one party event from another.
One advantage of using surrogate keys is that they can be generated automatically by the database
management system (DBMS), ensuring that unique values are always provided. This eliminates the
need for manual entry of primary key values, which can be time-consuming and error-prone.
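A DBMS-generated surrogate key can be sketched with Python's built-in sqlite3 module. Only the Event_ID
name comes from the example above. The other columns (event_date, room, party_name) and the helper
add_event are assumptions made for this illustration.

```python
import sqlite3


def build_events():
    """Hypothetical event table whose primary key is generated by the DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE event (
            Event_ID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            event_date TEXT NOT NULL,
            room       TEXT NOT NULL,
            party_name TEXT NOT NULL
        )
    """)
    return conn


def add_event(conn, event_date, room, party_name):
    """Insert an event without supplying a key and return the generated Event_ID."""
    with conn:
        cur = conn.execute(
            "INSERT INTO event (event_date, room, party_name) VALUES (?, ?, ?)",
            (event_date, room, party_name),
        )
    return cur.lastrowid
```

The caller never types a key value. Two events with the same date, room, and party name still receive
different Event_ID values.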
Identifiers of composite entities:
This statement is true. In a many-to-many (M:N) relationship involving composite entities, each primary
key combination is allowed only once to ensure that the relationship is unique.
This statement is true. A weak entity has a strong identifying relationship with its parent entity, which
means that it cannot exist without the parent entity.
This statement is also true. A weak entity represents a real-world object that is existence-dependent
on another real-world object. For example, a dependent child is existence-dependent on their parent
in the real world.
This statement is true as well. In a data model, a real-world object can be represented as two separate
entities in a strong identifying relationship, with the parent entity serving as the identifying owner and
the weak entity serving as the dependent. The strong identifying relationship between the two entities
ensures that the weak entity is uniquely identified within the context of its parent entity.
This statement is true. Non-intelligent primary keys are those that carry no embedded meaning about
the entity being modeled, unlike keys built from descriptive data such as a person's name. Using
non-intelligent keys can help ensure that the primary key is not tied to any external information and
remains stable over time.
This statement is also true. Primary keys should be stable over time and not subject to change. Changing
a primary key can have significant implications for the data stored in a database and can lead to
inconsistencies and errors.
This statement is true as well. While primary keys can be composed of multiple attributes, it is generally
preferable to use a single attribute whenever possible. This can help to simplify the design of the
database and make it easier to maintain.
This statement is also true. While primary keys can be composed of any data type, it is generally
preferable to use a numeric data type such as an integer. Numeric primary keys can help to simplify
database operations, improve performance, and make it easier to generate unique identifiers.
This statement is true too. Primary keys should be security-compliant: they should not be built from
attributes that could pose a security risk or privacy violation, such as a social security number.
Primary keys should also comply with any relevant data privacy regulations or industry standards.
NATURAL KEY
This statement is true. A natural key, also known as a natural identifier, is a real-world identifier that is
used to uniquely identify real-world objects. Examples of natural keys include social security numbers,
employee IDs, and product codes.
This statement is also true. Because natural keys are based on real-world identifiers that are familiar to
end users, they can help to make a database more intuitive and user-friendly. When users see a natural
key in a database, they can quickly understand what it represents and how it relates to their daily work.
This statement is also true. In some cases, a natural key can be used as the primary key of the entity
being modeled. However, there are some situations where it may be more appropriate to use a
surrogate key, which is a system-generated identifier that is not based on a real-world identifier.
Surrogate keys can be useful when the natural key is too long or complex to use as a primary key, or
when the natural key is subject to change.
One potential drawback of using natural keys as primary keys is that they can be sensitive or
confidential information, which may need to be protected in the database. This can require additional
security measures and make it more difficult to share data with external stakeholders. Additionally,
natural keys may not be unique or stable over time, which can create challenges for maintaining data
integrity and consistency in the database.
PRIMARY KEY
This statement is true. Primary keys can be composed of a single attribute or a combination of attributes
that uniquely identify each entity instance.
This statement is also true. A primary key is a unique identifier for each entity instance in the database.
No two instances of the same entity can have the same primary key.
This statement is also true. The primary key plays a critical role in maintaining entity integrity, which
means that every entity instance is uniquely identified by a primary key value that is never null. This
in turn helps to prevent duplicate data and maintain the integrity of the database.
This statement is true as well. Primary keys work in conjunction with foreign keys to establish
relationships between entities in a database. A foreign key is a field in one table that refers to the
primary key of another table, linking the two tables together. Using primary and foreign keys, we can
implement complex relationships between tables in a database and ensure the data remains consistent
and accurate.
ENTITY CLUSTERS
When creating an ER diagram, it's important to show the relationships between entities and the
attributes that define those entities. Sometimes, there are so many entities and relationships that the
diagram becomes too complex and difficult to read. In these cases, entity clusters can be used to group
related entities and simplify the diagram.
An entity cluster is a way to group together multiple related entities and relationships into a single
abstract entity. It's like creating a "super-entity" that represents several other entities and their
relationships with each other. This can help simplify the diagram of the database and make it easier to
understand.
However, when using entity clusters, the key attributes of the combined entities are no longer available.
This can cause issues with the primary key inheritance rules and relationships between entities, and can
result in the loss of foreign key attributes from some entities.
To avoid these problems, it's generally recommended to avoid displaying attributes when using entity
clusters. This helps maintain the integrity of the relationships and prevents the loss of important
information.
For example, in the Tiny College example, an entity cluster named OFFERING is used to group SEMESTER,
COURSE, and CLASS entities and relationships. Another entity cluster named LOCATION is used to group
ROOM and BUILDING entities and relationships. By using entity clusters, the ERD becomes easier to
understand and communicate to stakeholders.