DBMS Chapter 3 PDF


Chapter 5, Problem 1RQ

Problem

Define the following terms as they apply to the relational model of data: domain, attribute, n-
tuple, relation schema, relation state, degree of a relation, relational database schema, and
relational database state.

Step-by-step solution

Step 1 of 7


1. Domain: A domain is a set of atomic (indivisible) values that can appear in a particular column
of a relation. A common way to specify a domain is to give a data type (integer,
character, floating point, etc.) from which the data values forming the domain are drawn.

For example: Consider a relation schema called STUDENT that holds facts about students
in a particular course. One such fact is the name of the student, which must be a
character string, so the domain of Name is the set of character strings.


Step 2 of 7

2. Attribute: An attribute is the name of a role played by some domain in the relation schema.

For example: In the relation schema STUDENT, NAME can be one of the attributes of the relation.

NOTATIONS:

• Relation schema R of degree n >> R(A1, A2, …, An)

• Attributes >> A1, A2, …, An

• Domain of, say, A1 >> dom(A1)

• Tuple >> t


Step 3 of 7

3. N-tuple: If a relation schema consists of n attributes, i.e., the degree of the relation schema is n,
then an n-tuple is an ordered list of n values that represents a tuple, t = <v1, v2, …, vn>, where each value
vi, 1 <= i <= n, is an element of dom(Ai) or is a special NULL value.

For example: In the relation schema STUDENT, if we have four attributes, viz., Name, Roll No.,
Class, and Rank, then the n-tuple for a student can be <'Ram', 1, 10, 5>, where
student Ram has roll number 1, studies in class 10, and got rank 5 in class.

4. Relational Schema: A relation schema is a named collection of attributes that describe facts
about a real-world entity. In other words, a relation schema R, denoted by
R(A1, A2, …, An), is made up of a relation name R and a list of attributes A1, A2, …, An.

For example: STUDENT can be the name of a relation schema, and Name, Roll No., Class, and
Rank can be its four attributes.


Step 4 of 7

5. Relation State: A relation state, r, of a relation schema R(A1, A2, …, An) is a set of n-tuples. In other
words, a relation state of a relation schema is a collection of tuples, where each tuple
represents information about a single entity.

For example: In the relation schema STUDENT, the collection of tuples for two students is a relation
state.

Formal Definition: A relation state, r(R), is a mathematical relation of degree n on the domains
of all attributes, which is a subset of the Cartesian product of the domains that define R:

r(R) ⊆ (dom(A1) × dom(A2) × … × dom(An))
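
As a minimal sketch of the formal definition, a relation state can be modelled in Python as a set of tuples drawn from the Cartesian product of the attribute domains. The two tiny domains below are made-up illustrations, not part of the textbook example.

```python
from itertools import product

# Hypothetical, deliberately tiny domains for a schema STUDENT(Name, Rank).
dom_name = {"Ram", "Sita"}
dom_rank = {1, 2, 3}

# Cartesian product dom(Name) x dom(Rank): every tuple that could ever appear.
all_possible_tuples = set(product(dom_name, dom_rank))

# One relation state r(R): a set of 2-tuples chosen from that product.
relation_state = {("Ram", 2), ("Sita", 1)}

degree = 2  # number of attributes in the schema
is_valid_state = relation_state <= all_possible_tuples  # subset test
```

Because the state is a Python set, it has no tuple order and cannot hold duplicates, which mirrors the set-based definition above.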


Step 5 of 7

6. Degree of a Relation: The degree (or arity) of a relation is the number of attributes n of its
relational schema.

Step 6 of 7

7. Relational Database Schema: A relational database schema S is a set of relation schemas,
S = {R1, R2, …, Rn}, together with a set of integrity constraints IC.


Step 7 of 7

8. Relational Database State: A relational database state DB of S is a set of relation states,
DB = {r1, r2, …, rn}, such that each ri is a state of Ri and such that the ri relation states
satisfy the integrity constraints specified in IC.

Chapter 5, Problem 2RQ

Problem

Why are tuples in a relation not ordered?

Step-by-step solution

Step 1 of 2

A relation in database management is defined as a set of tuples.

And mathematically, the elements of a set have no order among them.


Step 2 of 2

Hence, the tuples in a relation are not ordered.

Chapter 5, Problem 3RQ

Problem

Why are duplicate tuples not allowed in a relation?

Step-by-step solution

Step 1 of 1

Duplicate tuples are not allowed in a relation because a relation is defined as a set of tuples, and
all elements of a set are distinct. Duplicates would also violate the relational integrity constraints:

• A key constraint states that there must be an attribute or combination of attributes in a relation
whose values are unique.

• No two tuples in a relation may have the same combination of values for all their attributes.

• If a relation contained duplicate tuples, they would agree on every attribute, including the key
attributes, which violates the key constraint.

Hence, duplicate tuples are not allowed in a relation.

Chapter 5, Problem 4RQ

Problem

What is the difference between a key and a superkey?

Step-by-step solution

Step 1 of 2

A super key SK is a set of attributes that uniquely identifies the tuples of a relation. It satisfies the
uniqueness constraint.

A key K is an attribute or set of attributes that uniquely identifies the tuples of a relation and is a
minimal super key. In other words, when any attribute is removed from a key, the remaining
attributes no longer form a super key.
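
The two definitions can be sketched as small Python checks over one relation state. This is only an illustration: the STUDENT tuples are hypothetical, and a key is a property of the schema (it must hold in every valid state), so a check against one state can refute a key but cannot prove it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def is_key(rows, attrs):
    """True if attrs is a super key and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical STUDENT tuples, used only for illustration.
students = [
    {"Roll_no": 1, "Name": "Ram", "Class": 10},
    {"Roll_no": 2, "Name": "Ram", "Class": 9},
    {"Roll_no": 3, "Name": "John", "Class": 10},
]
```

Here `["Roll_no", "Name"]` passes `is_superkey` but fails `is_key`, because `Roll_no` alone is already unique; `["Roll_no"]` passes both.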


Step 2 of 2

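The differences between a key and a super key are as follows:

• Every key is a super key, but not every super key is a key.

• A super key may contain redundant attributes; a key is minimal, so it contains none.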

Chapter 5, Problem 5RQ

Problem

Why do we designate one of the candidate keys of a relation to be the primary key?

Step-by-step solution

Step 1 of 1

Every relation must contain an attribute or combination of attributes that can be used to uniquely
identify each tuple in the relation.

• An attribute or minimal combination of attributes that can be used to uniquely identify each tuple
in a relation is known as a candidate key.

• A relation can have more than one candidate key.

• Among the candidate keys, one, usually a single simple attribute, is chosen
as the primary key.

• The primary key is the candidate key whose values are used to identify and refer to each tuple in
the relation.

Chapter 5, Problem 6RQ

Problem

Discuss the characteristics of relations that make them different from ordinary tables and files.

Step-by-step solution

Step 1 of 2

Tables, relations, and files are key concepts of the relational data model. A relation
resembles a table, but it carries added constraints so that the links between relations can be used
efficiently.

A file is basically a collection of records, or a table, stored on a physical device.


Step 2 of 2

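Even though both a relation and a table are used to store and represent data, there are differences
between them:

• The tuples of a relation have no order, whereas the rows of a table or the records of a file are
stored in some order.

• A relation cannot contain duplicate tuples, whereas an ordinary table or file can contain
duplicate rows or records.

• Each value in a tuple of a relation is atomic (indivisible).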

Chapter 5, Problem 7RQ

Problem

Discuss the various reasons that lead to the occurrence of NULL values in relations.

Step-by-step solution

Step 1 of 2

NULL value:

A NULL represents the absence of data, that is, “nothing”, stored in place of a value.

• NULL is a special marker, not an ordinary data value.

• It is different from “zero”, “blank”, or “none”, which are actual values.

For example,

if a relation records the pen and pencil that each student brings to the exam and a student brings neither,

• for that particular student, the values of those attributes are NULL.

• A NULL can mean that the value does not exist, that the value is unknown, or that the value is
not yet available.


Step 2 of 2

The occurrence of NULL values in relations:

• An attribute value in a tuple is set to NULL when the attribute is not applicable to that tuple.

• An attribute value is set to NULL when the value exists but is unknown.

• An attribute value is set to NULL when the value exists but is not available at present.

• An attribute value is set to NULL when it is not known whether a value exists at all.

• The different meanings of NULL can be conveyed by different codes.

• Operations on NULL values must be handled with care, because comparing two NULLs does not
tell whether the underlying values are equal.

Chapter 5, Problem 8RQ

Problem

Discuss the entity integrity and referential integrity constraints. Why is each considered
important?

Step-by-step solution

Step 1 of 2

Entity Integrity Constraint: It states that no primary key value can be NULL.

Importance: Primary key values are used to identify a tuple in a relation. Having NULL value for
primary key will mean that we cannot identify some tuples.

Referential Integrity Constraint: It states that a tuple in one relation that refers to another
relation must refer to an existing tuple in that relation.


Step 2 of 2

Definition using Foreign Key: For two relation schemas R1 and R2, a set of attributes FK in
relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following
conditions:

• The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are
said to reference relation R2.

• A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some
tuple t2 in the current state r2(R2) or is NULL. In the former case (t1[FK] = t2[PK]), tuple t1 is said to
refer to the tuple t2.

When these two conditions hold between R1, the referencing relation, and R2, the referenced
relation, the referential integrity constraint is said to hold.

Importance: Referential integrity constraints are specified between two relations and are used to
maintain consistency among the tuples in the two relations.
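
A minimal sketch of both constraints, using Python's built-in sqlite3 module. The table names R1 and R2 follow the definition above; the column names and sample values are hypothetical. Two SQLite-specific assumptions are made explicit in the comments: foreign key checks must be switched on, and key columns are declared NOT NULL so that a NULL primary key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
# NOT NULL is spelled out so that entity integrity is enforced on the key columns.
conn.execute("CREATE TABLE R2 (pk TEXT NOT NULL PRIMARY KEY)")
conn.execute(
    "CREATE TABLE R1 (id TEXT NOT NULL PRIMARY KEY, fk TEXT REFERENCES R2(pk))"
)
conn.execute("INSERT INTO R2 VALUES ('a')")

def try_insert(sql):
    """Run one INSERT; report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "fk matches a PK value": try_insert("INSERT INTO R1 VALUES ('t1', 'a')"),
    "fk is NULL": try_insert("INSERT INTO R1 VALUES ('t2', NULL)"),
    "fk has no match": try_insert("INSERT INTO R1 VALUES ('t3', 'zzz')"),
    "primary key is NULL": try_insert("INSERT INTO R1 VALUES (NULL, 'a')"),
}
```

The first two inserts satisfy the second condition of the definition (a matching PK value, or NULL); the third breaks referential integrity and the fourth breaks entity integrity.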

Chapter 5, Problem 9RQ

Problem

Define foreign key. What is this concept used for?

Step-by-step solution

Step 1 of 2

A foreign key is an attribute or set of attributes of one relation that refers to the primary key of
another relation and is used to maintain the relationship between the two relations.

• A relation can have more than one foreign key.

• A foreign key can contain null values.


Step 2 of 2

The concept of a foreign key is used to maintain the referential integrity constraint between two
relations and hence to maintain consistency among the tuples in the two relations.

• A non-NULL value of a foreign key must match a value of the primary key in the referenced relation.

• A foreign key value that does not exist in the primary key of the referenced relation cannot be
added.

• A tuple cannot be deleted from the referenced relation while a matching tuple exists in
the referencing relation, unless the deletion is cascaded or the foreign key is set to NULL.
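
The effect of deleting a referenced tuple can be sketched with sqlite3. The DEPT and EMP tables and their values are hypothetical stand-ins, and the ON DELETE actions shown are the options SQLite offers; other systems may name or default them differently.

```python
import sqlite3

def delete_referenced(on_delete):
    """Delete a referenced DEPT row under the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
    conn.execute("CREATE TABLE DEPT (Dnumber TEXT NOT NULL PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE EMP (Ssn TEXT NOT NULL PRIMARY KEY, "
        f"Dno TEXT REFERENCES DEPT(Dnumber) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO DEPT VALUES ('5')")
    conn.execute("INSERT INTO EMP VALUES ('123456789', '5')")
    try:
        conn.execute("DELETE FROM DEPT WHERE Dnumber = '5'")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    remaining = conn.execute("SELECT Ssn, Dno FROM EMP").fetchall()
    conn.close()
    return outcome, remaining

restrict_result = delete_referenced("RESTRICT")
cascade_result = delete_referenced("CASCADE")
set_null_result = delete_referenced("SET NULL")
```

With RESTRICT the delete is refused and the EMP row survives; with CASCADE the EMP row is removed too; with SET NULL the EMP row stays but its Dno becomes NULL.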

Chapter 5, Problem 10RQ

Problem

What is a transaction? How does it differ from an Update operation?

Step-by-step solution

Step 1 of 2

A transaction is a program in execution that involves various operations that can be done on the
database.

The operations that are included in a transaction are as follows:

• Reading data from the database.

• Deleting a tuple from the database.

• Inserting new tuples to the database

• Updating values of existing tuples in the database.


Step 2 of 2

The main difference between an update operation and a transaction is as follows:

• An update operation is a single Insert, Delete, or Modify operation applied to one relation.

• A transaction can combine more than one update operation with reads from the database, and it is
treated as one unit: it must leave the database in a consistent state that satisfies all the constraints.
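
A minimal sketch of the difference, using sqlite3. The ACCOUNT table, its CHECK constraint, and the `transfer` helper are hypothetical; the point is only that the two UPDATE operations inside one transaction are applied together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT (Id TEXT NOT NULL PRIMARY KEY, "
    "Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    """One transaction made of two updates; both apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE ACCOUNT SET Balance = Balance + ? WHERE Id = 'B'", (amount,)
            )
            conn.execute(
                "UPDATE ACCOUNT SET Balance = Balance - ? WHERE Id = 'A'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    """Read the current state as a dict of Id -> Balance."""
    return dict(conn.execute("SELECT Id, Balance FROM ACCOUNT"))

first = transfer(30)       # both updates succeed
after_first = balances()
second = transfer(500)     # second update breaks the CHECK, so the first is undone
after_second = balances()
```

In the failed transfer the credit to B has already run when the debit to A is rejected; the rollback removes it, so the database returns to the last consistent state.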

Chapter 5, Problem 11E

Problem

Suppose that each of the following Update operations is applied directly to the database state
shown in Figure 5.6. Discuss all integrity constraints violated by each operation, if any, and the
different ways of enforcing these constraints.

a. Insert <‘Robert’, ‘F’, ‘Scott’, ‘943775543’, ‘1972-06-21’, ‘2365 Newcastle Rd, Bellaire, TX’, M,
58000, ‘888665555’, 1> into EMPLOYEE.

b. Insert <‘ProductA’, 4, ‘Bellaire’, 2> into PROJECT.

c. Insert <‘Production’, 4, ‘943775543’, ‘2007-10-01’> into DEPARTMENT.

d. Insert <‘677678989’, NULL, ‘40.0’> into WORKS_ON.

e. Insert <‘453453453’, ‘John’, ‘M’, ‘1990-12-12’, ‘spouse’> into DEPENDENT.

f. Delete the WORKS_ON tuples with Essn = ‘333445555’.

g. Delete the EMPLOYEE tuple with Ssn = ‘987654321’.

h. Delete the PROJECT tuple with Pname = ‘ProductX’.

i. Modify the Mgr_ssn and Mgr_start_date of the DEPARTMENT tuple with Dnumber = 5 to
‘123456789’ and ‘2007-10-01’, respectively.

j. Modify the Super_ssn attribute of the EMPLOYEE tuple with Ssn = ‘999887777’ to
‘943775543’.

k. Modify the Hours attribute of the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10 to
‘5.0’.

Step-by-step solution
Step 1 of 11

(a)

Acceptable operation.


Step 2 of 11

(b)

Not acceptable. Violates the referential integrity constraint because the department number 2,
which is a foreign key value, is not present in the DEPARTMENT relation.

Ways of enforcing the constraint are as follows:

• Rejecting the operation and explaining the cause to the user.

• Inserting a NULL value in the department field and performing the operation.

• Prompting the user to insert a department with department number 2 in the DEPARTMENT relation and then
performing the operation.


Step 3 of 11

(c)

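Not acceptable. Violates the key constraint because a department with department number 4 already
exists. It also violates the referential integrity constraint because there is no EMPLOYEE tuple with
Ssn = ‘943775543’ for Mgr_ssn to reference.

Ways of enforcing the constraints are as follows:

• Rejecting the operation and explaining the cause to the user.

• Prompting the user to supply an unused department number and a valid Mgr_ssn, and then
performing the operation.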


Step 4 of 11

(d)

Not acceptable. Violates the entity integrity constraint and the referential integrity constraint. The
value of Pno, one of the attributes of the primary key, is NULL. Also, the value of Essn,
‘677678989’, is not present in the referenced relation, i.e., EMPLOYEE.

Ways of enforcing the constraints are as follows:

• Rejecting the operation and explaining the cause to the user.

• Prompting the user to specify correct values for the primary key and then performing the operation.


Step 5 of 11

(e)

Acceptable


Step 6 of 11

(f)

Acceptable


Step 7 of 11

(g)

Not acceptable.

Violates the referential integrity constraint because Ssn is referenced by foreign keys in the
WORKS_ON, EMPLOYEE, DEPENDENT, and DEPARTMENT relations, and deleting the tuple with Ssn
= ‘987654321’ would leave tuples in those relations that refer to a nonexistent employee.

Ways of enforcing the constraint are as follows:

• Rejecting the operation and explaining the cause to the user.

• Deleting the referencing tuples in the other relations as well (cascading the deletion).

Step 8 of 11

(h)

Not Acceptable.

Violates the referential integrity constraint because Pnumber is referenced by a foreign key of the
WORKS_ON relation. The PROJECT tuple with Pname = ‘ProductX’ has Pnumber = 1, and this value is
used in WORKS_ON tuples, so deleting the tuple would violate the referential integrity constraint.

Ways of enforcing the constraint are as follows:

• Rejecting the operation and explaining the cause to the user.

• Deleting the referencing tuples in the other relations as well (cascading the deletion).


Step 9 of 11

(i)

Acceptable.


Step 10 of 11

(j)

Not Acceptable.

Violates the referential integrity constraint because Super_ssn is a foreign key that references the
EMPLOYEE relation itself. Since no employee with Ssn = ‘943775543’ exists, the Super_ssn of an
employee cannot be ‘943775543’.

Ways of enforcing the constraint are as follows:

• Rejecting the operation and explaining the cause to the user.

• Prompting the user either to add a tuple to the EMPLOYEE relation with Ssn = ‘943775543’ or to
change Super_ssn to some valid value.


Step 11 of 11

(k)

Acceptable.

Chapter 5, Problem 12E

Problem

Consider the AIRLINE relational database schema shown in Figure, which describes a database
for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or
more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled
arrival and departure times, airports, and one or more LEG_INSTANCEs—one for each Date on
which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance,
SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival
and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a
particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which
they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE
database to enter a reservation on a particular flight or flight leg on a given date.

a. Give the operations for this update.

b. What types of constraints would you expect to check?

c. Which of these constraints are key, entity integrity, and referential integrity constraints, and
which are not?

d. Specify all the referential integrity constraints that hold on the schema shown in Figure.

The AIRLINE relational database schema.

Step-by-step solution

Step 1 of 4

a.

First, it is necessary to check whether seats are available on the particular flight or flight leg on
the given date. This can be done by checking the LEG_INSTANCE relation.

SELECT Number_of_available_seats FROM LEG_INSTANCE
WHERE Flight_number = 'FL01' AND Date = '2000-06-07';

If Number_of_available_seats > 0, then perform the following operation to reserve a seat.

INSERT INTO SEAT_RESERVATION VALUES
('FL01', '1', '2000-06-07', '1', 'John', '9910110110');
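
The two statements above can be combined into one transaction. The following is a sketch in Python's sqlite3 under stated assumptions: the column lists and primary keys are inferred from the problem text rather than copied from the figure, the `reserve` helper is hypothetical, and decrementing Number_of_available_seats is an added step that the solution above does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
conn.executescript("""
CREATE TABLE LEG_INSTANCE (
    Flight_number TEXT NOT NULL, Leg_number TEXT NOT NULL, Date TEXT NOT NULL,
    Number_of_available_seats INTEGER NOT NULL,
    PRIMARY KEY (Flight_number, Leg_number, Date));
CREATE TABLE SEAT_RESERVATION (
    Flight_number TEXT NOT NULL, Leg_number TEXT NOT NULL, Date TEXT NOT NULL,
    Seat_number TEXT NOT NULL, Customer_name TEXT, Customer_phone TEXT,
    PRIMARY KEY (Flight_number, Leg_number, Date, Seat_number),
    FOREIGN KEY (Flight_number, Leg_number, Date)
        REFERENCES LEG_INSTANCE (Flight_number, Leg_number, Date));
INSERT INTO LEG_INSTANCE VALUES ('FL01', '1', '2000-06-07', 2);
""")

LEG_FILTER = "WHERE Flight_number = ? AND Leg_number = ? AND Date = ?"

def reserve(flight, leg, date, seat, name, phone):
    """Reserve one seat; return True on success, False if it is refused."""
    key = (flight, leg, date)
    try:
        with conn:  # commit on success, roll back on a constraint violation
            row = conn.execute(
                "SELECT Number_of_available_seats FROM LEG_INSTANCE " + LEG_FILTER,
                key,
            ).fetchone()
            if row is None or row[0] <= 0:
                return False  # unknown leg instance, or no seats left
            conn.execute(
                "INSERT INTO SEAT_RESERVATION VALUES (?, ?, ?, ?, ?, ?)",
                key + (seat, name, phone),
            )
            conn.execute(
                "UPDATE LEG_INSTANCE SET Number_of_available_seats = "
                "Number_of_available_seats - 1 " + LEG_FILTER,
                key,
            )
        return True
    except sqlite3.IntegrityError:
        return False  # e.g. the seat is already reserved (duplicate primary key)

r1 = reserve("FL01", "1", "2000-06-07", "1", "John", "9910110110")
r2 = reserve("FL01", "1", "2000-06-07", "1", "Ram", "9910101010")   # same seat again
r3 = reserve("FL01", "1", "2000-06-07", "2", "Ram", "9910101010")
r4 = reserve("FL01", "1", "2000-06-07", "3", "Mario", "9090909090")  # no seats left
seats_left = conn.execute(
    "SELECT Number_of_available_seats FROM LEG_INSTANCE"
).fetchone()[0]
```

The second call is refused by the primary key of SEAT_RESERVATION, and the fourth by the seat-count check, so only two reservations are stored.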


Step 2 of 4

b.

The constraints that need to be checked to perform the update are as follows:

• Check that Number_of_available_seats in the LEG_INSTANCE relation for the particular flight on the
particular date is greater than 0.

• Check that the particular Seat_number for the particular flight on the particular date is
available.

Step 3 of 4

c.

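Checking Number_of_available_seats in the LEG_INSTANCE relation does not come under the key,
entity integrity, or referential integrity constraints.

Checking that the Seat_number is free for the particular flight on the particular date comes under
the key constraint, because the seat number is part of the primary key of SEAT_RESERVATION; entity
integrity additionally requires that none of the primary key attributes is NULL.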


Step 4 of 4

d.

A referential integrity constraint specifies that the value of a foreign key must match a value
of the primary key in the referenced relation.

The referential integrity constraints that hold are as follows:

• Flight_number of FLIGHT_LEG is a foreign key which references Flight_number of the FLIGHT relation.

• Flight_number of LEG_INSTANCE is a foreign key which references Flight_number of the FLIGHT relation.

• Flight_number of FARE is a foreign key which references Flight_number of the FLIGHT relation.

• Flight_number of SEAT_RESERVATION is a foreign key which references Flight_number of the FLIGHT relation.

• Departure_airport_code and Arrival_airport_code of FLIGHT_LEG are foreign keys which each
reference Airport_code of the AIRPORT relation.

• Departure_airport_code and Arrival_airport_code of LEG_INSTANCE are foreign keys which each
reference Airport_code of the AIRPORT relation.

• Airport_code of CAN_LAND is a foreign key which references Airport_code of the AIRPORT relation.

• Flight_number and Leg_number of LEG_INSTANCE form a foreign key which references
Flight_number and Leg_number of FLIGHT_LEG.

• Airplane_id of LEG_INSTANCE is a foreign key which references Airplane_id of the AIRPLANE relation.

• Flight_number, Leg_number, and Date of SEAT_RESERVATION form a foreign key which
references Flight_number, Leg_number, and Date of the LEG_INSTANCE relation.

• Airplane_type_name of CAN_LAND is a foreign key which references Airplane_type_name
of the AIRPLANE_TYPE relation.

Chapter 5, Problem 13E

Problem

Consider the relation CLASS(Course#, Univ_Section#, Instructor_name, Semester,
Building_code, Room#, Time_period, Weekdays, Credit_hours). This represents classes taught
in a university, with unique Univ_section#s. Identify what you think should be various candidate
keys, and write in your own words the conditions or assumptions under which each candidate
key would be valid.

Step-by-step solution

Step 1 of 2

The CLASS relation specifies that Univ_Section# values are unique for the classes that are
taught in the university.

As per the CLASS relation, the following are the possible candidate keys:

1. {Univ_Section#} – If this is unique throughout all the semesters.

2. {Instructor_name, Semester} – If at most one course is taught by an instructor in each
semester.

3. {Building_code, Room#, Time_period, Weekdays, Semester} – If, at a given time in a specific
semester, the same room cannot be used by more than one course.


Step 2 of 2

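4. {Course#, Univ_Section#, Semester} – This would be the candidate key if Univ_Section#
is not unique. This is the case in many universities, and the right key then depends on the
section-numbering rules of the university, for example when the sections of each course are
numbered separately in each semester.

5. Otherwise, {Univ_Section#, Semester} – If Univ_Section# is unique only within a semester,
that is, all the sections are assigned unique numbers throughout the semester.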

Chapter 5, Problem 14E

Problem

Consider the following six relations for an order-processing database application in a company:

CUSTOMER(Cust#, Cname, City)

ORDER(Order#, Odate, Cust#, Ord_amt)

ORDER_ITEM(Order#, Item#, Qty)

ITEM(Item#, Unit_price)

SHIPMENT(Order#, Warehouse#, Ship_date)

WAREHOUSE(Warehouse#, City)

Here, Ord_amt refers to total dollar amount of an order; Odate is the date the order was placed;
and Ship_date is the date an order (or part of an order) is shipped from the warehouse. Assume
that an order can be shipped from several warehouses. Specify the foreign keys for this schema,
stating any assumptions you make. What other constraints can you think of for this database?

Step-by-step solution

Step 1 of 2

Foreign Keys:

a. Cust# of ORDER is an FK referencing CUSTOMER: orders are taken from recognized customers only.

b. Order# of ORDER_ITEM is an FK referencing ORDER.

c. Item# of ORDER_ITEM is an FK referencing ITEM: orders are taken only for items in stock.

d. Order# of SHIPMENT is an FK referencing ORDER: shipment is done only for orders taken.

e. Warehouse# of SHIPMENT is an FK referencing WAREHOUSE: shipment is done only from the company's
warehouses.


Step 2 of 2

Other Constraints:

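• Ship_date in SHIPMENT must be greater (a later date) than Odate in ORDER. An order must be taken
before it is shipped.

• Ord_amt must equal the total of Qty × Unit_price over the items of the order, so it can never be
less than the Unit_price of any item ordered.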
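
These two constraints cannot be expressed as keys or foreign keys, so an application would have to check them itself. The following Python sketch shows one way to do that; the `check_order` helper, the dictionary layout, and the sample values are hypothetical, and it assumes Ord_amt is meant to equal the sum of Qty × Unit_price, as the problem's description of Ord_amt as the total amount suggests.

```python
from datetime import date

def check_order(order, items, unit_prices, ship_dates):
    """Return a list of the semantic constraints that one order violates.

    order       -- dict with "Odate" (date) and "Ord_amt" (int)
    items       -- list of (Item#, Qty) pairs from ORDER_ITEM
    unit_prices -- dict Item# -> Unit_price from ITEM
    ship_dates  -- list of Ship_date values from SHIPMENT for this order
    """
    problems = []
    expected = sum(qty * unit_prices[item] for item, qty in items)
    if order["Ord_amt"] != expected:
        problems.append("Ord_amt does not equal the sum of Qty * Unit_price")
    if any(ship_date <= order["Odate"] for ship_date in ship_dates):
        problems.append("Ship_date is not later than Odate")
    return problems

# Hypothetical sample data, whole currency units to avoid rounding issues.
prices = {"I1": 25, "I2": 20}
order_items = [("I1", 2), ("I2", 1)]  # total is 2*25 + 1*20 = 70

good = check_order(
    {"Odate": date(2024, 1, 10), "Ord_amt": 70},
    order_items, prices, [date(2024, 1, 12)],
)
bad = check_order(
    {"Odate": date(2024, 1, 10), "Ord_amt": 60},
    order_items, prices, [date(2024, 1, 9)],
)
```

An empty list means the order passes both checks; the second call reports both violations.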

Chapter 5, Problem 15E

Problem

Consider the following relations for a database that keeps track of business trips of salespersons
in a sales office:

SALESPERSON(Ssn, Name, Start_year, Dept_no)

TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)

EXPENSE(Trip_id, Account#, Amount)

A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating
any assumptions you make.

Step-by-step solution

Step 1 of 3

A foreign key is a column or combination of columns of one table that refers to the primary key of
another table and is used to maintain the relationship between the two tables.

• A foreign key is mainly used for establishing relationship between two tables.

• A table can have more than one foreign key.


Step 2 of 3

The foreign keys in the given relations are as follows:

• Ssn is a foreign key in TRIP relation. It references the Ssn of SALESPERSON relation.

• Trip_id is a foreign key in EXPENSE relation. It references the Trip_id of TRIP relation.


Step 3 of 3

Assume that there are additional tables that store the department information and account
details. Then the possible foreign keys are as follows:

• Dept_no is a foreign key in SALESPERSON relation.

• Account# is a foreign key in EXPENSE relation.

Chapter 5, Problem 16E

Problem

Consider the following relations for a database that keeps track of student enrollment in courses
and the books adopted for each course:

STUDENT(Ssn, Name, Major, Bdate)

COURSE(Course#, Cname, Dept)

ENROLL(Ssn, Course#, Quarter, Grade)

BOOK_ADOPTION(Course#, Quarter, Book_isbn)

TEXT(Book_isbn, Book_title, Publisher, Author)

Specify the foreign keys for this schema, stating any assumptions you make.

Step-by-step solution

Step 1 of 2

A foreign key is a column or combination of columns of one table that refers to the primary key of
another table and is used to maintain the relationship between the two tables.

• A foreign key is mainly used for establishing relationship between two tables.

• A table can have more than one foreign key.


Step 2 of 2

The foreign keys in the given relations are as follows:

• Ssn is a foreign key in the ENROLL table which references the Ssn of the STUDENT table. Ssn is the
primary key of the STUDENT table.

• Course# is a foreign key in the ENROLL table which references the Course# of the COURSE table.
Course# is the primary key of the COURSE table.

• Course# is a foreign key in the BOOK_ADOPTION table which references the Course# of the
COURSE table. Course# is the primary key of the COURSE table.

• Book_isbn is a foreign key in the BOOK_ADOPTION table which references the Book_isbn of the
TEXT table. Book_isbn is the primary key of the TEXT table.

Chapter 5, Problem 17E

Problem

Consider the following relations for a database that keeps track of automobile sales in a car
dealership (OPTION refers to some optional equipment installed on an automobile):

CAR(Serial_no, Model, Manufacturer, Price)

OPTION(Serial_no, Option_name, Price)

SALE(Salesperson_id, Serial_no, Date, Sale_price)

SALESPERSON(Salesperson_id, Name, Phone)

First, specify the foreign keys for this schema, stating any assumptions you make. Next, populate
the relations with a few sample tuples, and then give an example of an insertion in the SALE and
SALESPERSON relations that violates the referential integrity constraints and of another
insertion that does not.

Step-by-step solution

Step 1 of 4

Foreign keys are:

a. Serial_no from OPTION is an FK referencing CAR: optional equipment can be added only to cars with a serial number.

b. Serial_no from SALE is an FK referencing CAR: only a car with a serial number can be put up for sale.

c. Salesperson_id from SALE is an FK referencing SALESPERSON: any recorded salesperson can sell any car.


Step 2 of 4

Consider the following relation states:

CAR:

Serial_no  Model  Manufacturer  Price (lakh)
1          1987   Ford          7
2          1998   Tata          4
3          1988   Ferrari       20
4          1952   Ford          2

OPTION:

Serial_no  Option_name  Price
2          Abc          200
4          def          400


Step 3 of 4

SALESPERSON:

Salesperson_id  Name   Phone
Sl1             Ram    9910101010
Sl2             John   9999999999
Sl3             Mario  9090909090

SALE:

Salesperson_id  Serial_no  Date        Sale_price (lakh)
Sl1             1          2000-06-07  7.5
Sl2             2          2000-06-08  4.1


Step 4 of 4

Insertion in SALE that violates the referential integrity constraint:

Insert <‘Sl4’, ‘5’, ‘2000-07-07’, ‘21’> into SALE.

Both Salesperson_id and Serial_no are invalid: there is no salesperson ‘Sl4’ and no car with Serial_no ‘5’.

Insertion in SALE that does not violate the referential integrity constraint:

Insert <‘Sl1’, ‘4’, ‘2000-09-07’, ‘2.1’> into SALE.

An insertion in SALESPERSON cannot violate the referential integrity constraint, because SALESPERSON
has no foreign key. A valid insertion for SALESPERSON can be:

Insert <‘Sl4’, ‘Jack’, ‘9190000000’> into SALESPERSON.
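
The two SALE insertions can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the column types and the composite primary key of SALE are guesses made for the illustration, and the `insert_sale` helper is hypothetical; the sample tuples are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
conn.executescript("""
CREATE TABLE CAR (Serial_no TEXT NOT NULL PRIMARY KEY, Model TEXT,
                  Manufacturer TEXT, Price REAL);
CREATE TABLE SALESPERSON (Salesperson_id TEXT NOT NULL PRIMARY KEY,
                          Name TEXT, Phone TEXT);
CREATE TABLE SALE (
    Salesperson_id TEXT NOT NULL REFERENCES SALESPERSON (Salesperson_id),
    Serial_no TEXT NOT NULL REFERENCES CAR (Serial_no),
    Date TEXT, Sale_price REAL,
    PRIMARY KEY (Salesperson_id, Serial_no));  -- assumed key, not given in the text
INSERT INTO CAR VALUES ('1', '1987', 'Ford', 7), ('2', '1998', 'Tata', 4),
                       ('3', '1988', 'Ferrari', 20), ('4', '1952', 'Ford', 2);
INSERT INTO SALESPERSON VALUES ('Sl1', 'Ram', '9910101010'),
                               ('Sl2', 'John', '9999999999'),
                               ('Sl3', 'Mario', '9090909090');
""")

def insert_sale(row):
    """Try to insert one SALE tuple and report what the DBMS decided."""
    try:
        with conn:
            conn.execute("INSERT INTO SALE VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

violating = insert_sale(("Sl4", "5", "2000-07-07", 21))   # no such salesperson or car
valid = insert_sale(("Sl1", "4", "2000-09-07", 2.1))      # both references exist
```

The first tuple is rejected because neither foreign key value has a match; the second is accepted.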

Chapter 5, Problem 18E

Problem

Database design often involves decisions about the storage of attributes. For example, a Social
Security number can be stored as one attribute or split into three attributes (one for each of the
three hyphen-delineated groups of numbers in a Social Security number—XXX-XX-XXXX).
However, Social Security numbers are usually represented as just one attribute. The decision is
based on how the database will be used. This exercise asks you to think about specific situations
where dividing the SSN is useful.

Step-by-step solution

Step 1 of 2

Usually, during database design, the Social Security number (SSN) is stored as a single
attribute.

• SSN is made up of 9 digits divided into three parts.

• The format of SSN is XXX-XX-XXXX.

• Each part is separated by a hyphen.

• The first part represents the area number.

• The second part represents the group number.

• The third part represents the serial number.


Step 2 of 2

The situations where it is preferred to store the SSN as parts instead of as a single attribute are as
follows:

• The area number determines the location or state. In some cases, it is necessary to group the data
by location to generate statistical information.

• The situation is comparable to a phone number, where the area code (or city code), and sometimes the
country code for dialing international numbers, is needed on its own.

• An application needs to access or validate each part independently.

Chapter 5, Problem 19E

Problem

Consider a STUDENT relation in a UNIVERSITY database with the following attributes (Name,
Ssn, Local_phone, Address, Cell_phone, Age, Gpa). Note that the cell phone may be from a
different city and state (or province) from the local phone. A possible tuple of the relation is
shown below:

Name                         Ssn          Local_phone  Address                          Cell_phone  Age  Gpa
George Shaw William Edwards  123-45-6789  555-1234     123 Main St., Anytown, CA 94539  555-4321    19   3.75

a. Identify the critical missing information from the Local_phone and Cell_phone attributes. (Hint:
How do you call someone who lives in a different state or province?)

b. Would you store this additional information in the Local_phone and Cell_phone attributes or
add new attributes to the schema for STUDENT?

c. Consider the Name attribute. What are the advantages and disadvantages of splitting this field
from one attribute into three attributes (first name, middle name, and last name)?

d. What general guideline would you recommend for deciding when to store information in a
single attribute and when to split the information?

e. Suppose the student can have between 0 and 5 phones. Suggest two different designs that
allow this type of information.

Step-by-step solution

Step 1 of 5

a. The area code (state, province, or city code) is missing from the phone number information.


Step 2 of 5

b. Since the cell phone and the local phone can be from different cities or states, the additional
information must be stored with both the Local_phone and Cell_phone attributes.


Step 3 of 5

c. If Name is split into First_name, Middle_name, and Last_name attributes, there can be the following
advantages:

• Sorting can be done on the basis of first name, last name, or middle name.

Disadvantages:

• Splitting a single attribute into three attributes may increase the number of NULL values in the
database (if some students do not have a middle name).

• Extra memory will be consumed for storing NULL values of attributes that may not exist for a
particular student (e.g., middle name).


Step 4 of 5

d. To decide when to store information in a single attribute:

• When storing the information in different attributes would create NULL values, a single attribute
should be preferred.

• When atomicity cannot be maintained with a single attribute, different attributes must be
used.

• When the information needs to be sorted on the basis of some sub-field of an attribute, or when
any sub-field is needed for decision making, the single attribute must be split into several.

Step 5 of 5

e. First Design:

• STUDENT(Name, Ssn, Phone_number_count, Address, Age, Gpa)

PHONE(Ssn, Phone_number)

Second Design:

• STUDENT(Name, Ssn, Phone_number1, Phone_number2, Phone_number3, Phone_number4,
Phone_number5, Address, Age, Gpa)

Although the schema can be designed in either of the two ways, the first design is better than the
second because it leaves fewer NULL values.
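
The first design can be sketched with sqlite3 as follows. The reduced STUDENT column list, the `add_phone` helper, and the phone numbers are hypothetical; the limit of five phones comes from the exercise and is enforced here in application code, which is only one of several possible places to enforce it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
conn.executescript("""
CREATE TABLE STUDENT (Ssn TEXT NOT NULL PRIMARY KEY, Name TEXT);
CREATE TABLE PHONE (
    Ssn TEXT NOT NULL REFERENCES STUDENT (Ssn),
    Phone_number TEXT NOT NULL,
    PRIMARY KEY (Ssn, Phone_number));
INSERT INTO STUDENT VALUES ('123-45-6789', 'George Shaw William Edwards');
""")

MAX_PHONES = 5  # upper bound stated in the exercise

def add_phone(ssn, number):
    """Add a phone unless the student already has MAX_PHONES of them."""
    with conn:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM PHONE WHERE Ssn = ?", (ssn,)
        ).fetchone()
        if count >= MAX_PHONES:
            return False
        conn.execute("INSERT INTO PHONE VALUES (?, ?)", (ssn, number))
    return True

# Hypothetical numbers: five are accepted, the sixth is refused.
outcomes = [add_phone("123-45-6789", f"555-000{i}") for i in range(6)]
stored = conn.execute("SELECT COUNT(*) FROM PHONE").fetchone()[0]
```

A student with no phone simply has no PHONE rows, so this design stores no NULL phone values at all.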

Chapter 5, Problem 20E

Problem

Recent changes in privacy laws have disallowed organizations from using Social Security
numbers to identify individuals unless certain restrictions are satisfied. As a result, most U.S.
universities cannot use SSNs as primary keys (except for financial data). In practice, Student_id,
a unique identifier assigned to every student, is likely to be used as the primary key rather than
SSN since Student_id can be used throughout the system.

a. Some database designers are reluctant to use generated keys (also known as surrogate keys)
for primary keys (such as Student_id) because they are artificial. Can you propose any natural
choices of keys that can be used to identify the student record in a UNIVERSITY database?

b. Suppose that you are able to guarantee uniqueness of a natural key that includes last name.
Are you guaranteed that the last name will not change during the lifetime of the database? If last
name can change, what solutions can you propose for creating a primary key that still includes
last name but remains unique?

c. What are the advantages and disadvantages of using generated (surrogate) keys?

Step-by-step solution

Step 1 of 1

(a)

Some operation on the student's name and the original local and cell phone numbers
can jointly be used for generating an id for the student.

For example:

First name + initials of name + ‘_’ + last name + ‘_’ + digits of
local phone number + sum of digits of cell phone number + ‘_’ + increasing
record counter.

For example, take the STUDENT tuple for George Shaw William Edwards with Local_phone 555-1234
and Cell_phone 555-4321, and let it be the 57th entry into the system. The sum of the digits of
the cell phone number is 5 + 5 + 5 + 4 + 3 + 2 + 1 = 25, so we can have the unique identifier:
GeorgeGWE_Edwards_555-123425_57.

Assumptions: Each student has a different local phone number unless they share an
address, and two students with the same address will not have the same name.

Some hash operations can also be used on various fields for generation of the
key.
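
The hash-based alternative can be sketched in a few lines of Python. The choice of SHA-256, the field separator, and the 12-character length are assumptions made for the illustration, and the `hashed_student_key` helper is hypothetical.

```python
import hashlib

def hashed_student_key(name, local_phone, cell_phone, length=12):
    """Derive a fixed-length identifier by hashing the chosen fields."""
    # The separator keeps ("ab", "c", ...) and ("a", "bc", ...) from colliding.
    material = "|".join([name, local_phone, cell_phone])
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return digest[:length]

key_1 = hashed_student_key("George Shaw William Edwards", "555-1234", "555-4321")
key_2 = hashed_student_key("George Shaw William Edwards", "555-1234", "555-4321")
key_3 = hashed_student_key("George Shaw William Edwards", "555-1234", "555-9999")
```

The same fields always give the same key. Truncating the digest makes collisions more likely, so the database would still need a uniqueness constraint on the generated column, and the key changes whenever any of the hashed fields changes.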

(b)

If the natural key includes the last name, which can change, we can add a
column that stores the original last name. Because that value does not change,
it can be used in the key for identification.

(c)

Advantages of Surrogate keys:

Immutability

Surrogate keys do not change while the row exists. This has two advantages:

• Database applications won't lose their "handle" on the row because the data changes.

• Many database systems do not support cascading updates of keys across foreign keys of
related tables. This results in difficulty in modifying the primary key data.

Flexibility for changing requirements

Because of changing requirements, the attributes that uniquely identify an entity might change. In
that case, the attribute(s) initially chosen as the natural key will no longer be a suitable natural
key.

Example:

An employee ID is chosen as the natural key of an employee DB. Because of a merger with
another company, new employees from the merged company must be inserted, who have
conflicting IDs (as their IDs were independently generated when the companies were separate).
In these cases, generally a new attribute must be added to the natural key (e.g. an attribute
"original_company"). With a surrogate key, only the table that defines the surrogate key must be
changed. With natural keys, all tables (and possibly other, related software) that use the natural
key will have to change. More generally, in some problem domains it is simply not clear what
might be a suitable natural key. Surrogate keys avoid problems from choosing a natural key that
later turns out to be incorrect.
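
The merger scenario can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are hypothetical and chosen only for illustration. The sketch shows that a referencing table which stores only the surrogate key is unaffected when two employees arrive with the same company-issued ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# emp_key is the surrogate key; SQLite generates it for INTEGER PRIMARY KEY.
conn.execute("""
    CREATE TABLE employee (
        emp_key          INTEGER PRIMARY KEY,
        company_emp_id   INTEGER NOT NULL,  -- ID issued by the original company
        original_company TEXT    NOT NULL,  -- attribute added after the merger
        name             TEXT    NOT NULL,
        UNIQUE (original_company, company_emp_id)
    )
""")

# The referencing table stores only the surrogate key.
conn.execute("""
    CREATE TABLE assignment (
        emp_key INTEGER NOT NULL REFERENCES employee(emp_key),
        project TEXT    NOT NULL
    )
""")

# Two employees from different companies share the company-issued ID 101.
conn.executemany(
    "INSERT INTO employee (company_emp_id, original_company, name) VALUES (?, ?, ?)",
    [(101, "A", "Lee"), (101, "B", "Kim")],
)

keys = [row[0] for row in conn.execute("SELECT emp_key FROM employee ORDER BY emp_key")]
conn.executemany(
    "INSERT INTO assignment (emp_key, project) VALUES (?, ?)",
    [(keys[0], "Payroll"), (keys[1], "Audit")],
)
print(keys)
```

Had assignment referenced company_emp_id directly, the merger would have forced a change to its columns as well.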

Performance

Often surrogate keys are composed of a compact data type, such as a four-byte integer. This
allows the database to query the single key column faster than it could query multiple columns.

• A non-redundant distribution of keys causes the resulting b-tree index to be completely
balanced.

• If the natural key is a compound key, joining is more expensive as there are multiple columns to
compare. Surrogate keys are always contained in a single column.

Compatibility

Several database application development systems, drivers, and object-relational mapping
systems, such as Ruby on Rails or Hibernate (Java), depend on the use of integer or GUID
surrogate keys in order to support database-system-agnostic operations and object-to-row
mapping.

Disadvantages of surrogate keys:

Disassociation

Because the surrogate key is completely unrelated to the data of the row to
which it is attached, the key is disassociated from that row. Disassociated
keys are unnatural to the application's world, resulting in an additional level of
indirection from which to audit.

Query Optimization

Relational databases assume a unique index is applied to a table's primary
key. The unique index serves two purposes: 1) to enforce entity integrity
(primary key data must be unique across rows) and 2) to quickly search for
the rows queried. Since surrogate keys replace a table's identifying attributes
(the natural key), and since the identifying attributes are likely to be those
queried, the query optimizer is forced to perform a full table scan when
fulfilling likely queries. The remedy to the full table scan is to apply a
(non-unique) index on each of the identifying attributes. However, these
additional indexes will take up disk space, slow down inserts, and slow down
deletes.
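
The effect described above can be observed in SQLite through EXPLAIN QUERY PLAN. The following is a minimal sketch using Python's sqlite3 module; the student table, its columns, and the index name are hypothetical. The exact wording of the plan text varies between SQLite versions, so the sketch prints it rather than assuming a fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_key INTEGER PRIMARY KEY,  -- surrogate key
        last_name   TEXT NOT NULL,        -- identifying attribute users search by
        first_name  TEXT NOT NULL
    )
""")


def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row holds the textual detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM student WHERE last_name = 'Stone'"

before = plan(query)  # no index on last_name: the table is scanned
conn.execute("CREATE INDEX idx_student_last_name ON student (last_name)")
after = plan(query)   # the secondary index is now available to the optimizer

print(before)
print(after)
```

The index on student_key alone does not help this query, which is why the secondary index is needed and why its storage and maintenance cost is counted as a disadvantage.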

Normalization

The presence of a surrogate key can result in the database administrator
forgetting to establish, or accidentally removing, a secondary unique index on
the natural key of the table. Without a unique index on the natural key,
duplicate rows are likely to appear and are difficult to identify.
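
A secondary unique index on the natural key can be sketched as follows with Python's sqlite3 module. The course table, its columns, and the index name are hypothetical. The sketch shows that the surrogate key alone would accept a duplicate row, while the unique index on the natural key rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course (
        course_key    INTEGER PRIMARY KEY,  -- surrogate key
        dept          TEXT    NOT NULL,     -- natural key, part 1
        course_number INTEGER NOT NULL      -- natural key, part 2
    )
""")

# Secondary unique index on the natural key (dept, course_number).
conn.execute(
    "CREATE UNIQUE INDEX uq_course_natural ON course (dept, course_number)"
)

conn.execute("INSERT INTO course (dept, course_number) VALUES ('CS', 101)")

try:
    # Same natural key again; a new surrogate key would be generated for it,
    # so only the unique index stands in the way of the duplicate.
    conn.execute("INSERT INTO course (dept, course_number) VALUES ('CS', 101)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```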

Business Process Modeling

Because surrogate keys are unnatural, flaws can appear when modeling the
business requirements. Business requirements, relying on the natural key,
then need to be translated to the surrogate key.

Inadvertent Disclosure

Proprietary information may be leaked if sequential key generators are used.
By subtracting a previously generated sequential key from a recently
generated sequential key, one could learn the number of rows inserted during
that time period. This could expose, for example, the number of transactions
or new accounts per period. The solution to the inadvertent disclosure
problem is to generate a random primary key. However, a randomly generated
primary key must be checked against existing keys before it is assigned, to
prevent a duplicate that would cause the insert to be rejected.

Inadvertent Assumptions

Sequentially generated surrogate keys create the illusion that events with a
higher primary key value occurred after events with a lower primary key value.
This illusion breaks down when an event is missed during the normal data
entry process and is, instead, inserted after subsequent events have already
been inserted. The solution to the inadvertent assumption problem is to
generate a random primary key. However, a randomly generated primary key
must be checked against existing keys before it is assigned, to prevent a
duplicate that would cause the insert to be rejected.
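
Random key assignment with duplicate handling can be sketched as below, using Python's secrets and sqlite3 modules. Note the substitution: rather than querying for the key before assigning it, this sketch lets the PRIMARY KEY constraint reject a duplicate and then retries with a new random value. The account table, the insert_account function, and its parameters are hypothetical.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_key INTEGER PRIMARY KEY, owner TEXT NOT NULL)"
)


def insert_account(owner: str, key_space: int = 2**31, max_tries: int = 10) -> int:
    """Insert a row under a random key, retrying if the key is already taken."""
    for _ in range(max_tries):
        key = secrets.randbelow(key_space) + 1  # random value in 1..key_space
        try:
            conn.execute(
                "INSERT INTO account (account_key, owner) VALUES (?, ?)",
                (key, owner),
            )
            return key
        except sqlite3.IntegrityError:
            continue  # duplicate key: the insert was rejected, try another value
    raise RuntimeError("no free key found within max_tries attempts")


keys = [insert_account(f"owner{i}") for i in range(100)]
stored = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(stored)  # 100
```

Because the keys are random, the difference between two of them reveals nothing about how many rows were inserted in between, and their order says nothing about when the events occurred.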
