Section 1 - ER Modelling
Figure 1: Australian Medicare Card (DHHS, Australia)
The Australian Department of Health and Human Services (DHHS) has a requirement to store
the vaccination record of every individual (known as a patient) who resides in Australia and is
eligible for a Medicare card. For families, children under the age of 15 are stored on the family
card (refer to Figure 1). Each Medicare card number is associated with only one family. Every
Medicare card has a 'valid to' date stored as a month and year on the card (refer to Figure 1).
Each family member holds a position number on the card. For example, to identify Jessica
Smith (refer to Figure 1), both her Medicare number (1234567890) and position (4) on the
Medicare card would be required.
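The identification rule above (card number plus position) can be illustrated with a composite key. The following is only a minimal sketch using Python's built-in sqlite3 module; the table name `patient`, its column names, and the second family member "Sam Smith" are made up for illustration and are not part of the case study.

```python
import sqlite3

# In-memory database; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        medicare_no TEXT    NOT NULL,
        position    INTEGER NOT NULL,
        first_name  TEXT,
        last_name   TEXT,
        PRIMARY KEY (medicare_no, position)  -- neither column identifies a patient alone
    )
""")

# Jessica Smith comes from the case study; the other row is a hypothetical sibling.
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?)",
    [("1234567890", 3, "Sam", "Smith"), ("1234567890", 4, "Jessica", "Smith")],
)

# Both parts of the key are needed to pick out one family member.
row = conn.execute(
    "SELECT first_name, last_name FROM patient "
    "WHERE medicare_no = ? AND position = ?",
    ("1234567890", 4),
).fetchone()
print(row)
```

Filtering on the card number alone would return every family member on the card, which is why position is part of the key in this sketch.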
Each Medicare card is attached to one residential address, contact email, and phone number.
For all patients listed on a Medicare card we record their gender, birthdate, first name, last
name, and whether they have any known allergies (e.g. Penicillin, Cortisone, Codeine). If a patient
does have an allergy, we need to know what the allergy is and the severity of the reaction (severe,
moderate or mild).
The DHHS needs to record mandatory vaccinations (e.g. Measles, Polio, Whooping Cough,
Diphtheria, Tuberculosis and Tetanus) as well as optional vaccination types (e.g. HPV, Flu,
Hepatitis A, Hepatitis B, Cholera, Typhoid, Yellow Fever).
For each vaccination event that is given to patients we must record the vaccination type, date
of vaccination and the vaccination batch number. Vaccine producers can produce many
different types of vaccines and each vaccine can have many batches.
Patients can receive their vaccination from any registered doctor. Every doctor is identified by
a unique medical practitioner number (MPN). We record the Medical Practitioner's title (Dr,
Mr, Mrs, Ms, Prof.), first name, last name, registered business address, email, and business
phone numbers.
Q1. Draw a conceptual model in either Crow’s Foot or Chen’s notation for this case study
(in your script book). Be sure to write down any assumptions you make.
(20 marks)
Q2. Write SQL statements to create the tables for the above data model. Be sure to specify
primary and foreign keys. You do not need to specify whether the fields are NULL/NOT NULL.
Choose appropriate data types for attributes that are not obvious.
(5 marks)
Write a single SQL statement to correctly answer each of the following questions (3A – 3D).
DO NOT USE VIEWS or VARIABLES to answer questions. Query nesting is allowed.
employees (empid, lastname, firstname, hiredate, address, phone, managerid [FK])
orders (orderid, custid [FK], empid [FK], orderdate, shippeddate, freight, shipname)
orderdetails (orderid [FK], productid [FK], quantity, discount)
Q3A. Write a query that returns customers (company names) and the details of their orders
(orderid and orderdate), including customers who placed no orders.
(3 marks)
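As an illustration of the "including customers who placed no orders" requirement, here is a hedged sketch of an outer join run through Python's sqlite3 module. The `customers (custid, companyname)` table is an assumption, since it is not listed in the schema above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- customers is an assumed table; orders is cut down to the columns used here.
    CREATE TABLE customers (custid INTEGER PRIMARY KEY, companyname TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, custid INTEGER, orderdate TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, '2007-03-01');
""")

# LEFT OUTER JOIN keeps every customer; order columns are NULL when there is no match.
rows = conn.execute("""
    SELECT c.companyname, o.orderid, o.orderdate
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.custid = c.custid
    ORDER BY c.custid, o.orderid
""").fetchall()
print(rows)
```

An inner join would silently drop 'Globex' here, because it has no matching row in `orders`.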
Q3B. Write a query that returns the first name and last name of employees whose manager
was hired prior to 01/01/2002.
(4 marks)
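A question of this shape is usually handled with a self join on `employees`, pairing each employee with their manager's row. The sketch below is one possible approach, not a model answer; the sample employees are invented and only the columns needed for the query are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        empid     INTEGER PRIMARY KEY,
        lastname  TEXT,
        firstname TEXT,
        hiredate  TEXT,                                  -- ISO dates compare correctly as text
        managerid INTEGER REFERENCES employees (empid)
    );
    -- Invented rows: Ada manages Bob, Bob manages Cy.
    INSERT INTO employees VALUES
        (1, 'Boss',  'Ada', '2001-05-01', NULL),
        (2, 'Late',  'Bob', '2003-02-01', 1),
        (3, 'Newer', 'Cy',  '2004-07-01', 2);
""")

# e is the employee, m is that employee's manager (same table, two aliases).
rows = conn.execute("""
    SELECT e.firstname, e.lastname
    FROM employees AS e
    INNER JOIN employees AS m ON e.managerid = m.empid
    WHERE m.hiredate < '2002-01-01'
""").fetchall()
print(rows)
```

Only Bob qualifies in this made-up data: his manager Ada was hired in 2001, whereas Cy's manager Bob was hired in 2003.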
Q3C. Write a query that returns customers whose company name is ‘Google’, and for each
customer return the total number of orders and total quantities for all products that were not
discontinued (‘1’ means discontinued, ‘0’ not discontinued).
(5 marks)
Q3D. Write a query that returns the ID and company name of customers who placed orders
in 2007 but not in 2008.
(8 marks)
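"In one year but not in another" questions can be expressed with a pair of correlated subqueries. The following is a sketch under assumptions: the `customers` table and all sample rows are hypothetical, and `EXISTS` / `NOT EXISTS` is only one of several workable formulations (set difference or `NOT IN` would also do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custid INTEGER PRIMARY KEY, companyname TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, custid INTEGER, orderdate TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO orders VALUES
        (10, 1, '2007-03-01'),   -- Acme: 2007 only
        (11, 2, '2007-06-15'),   -- Globex: 2007 ...
        (12, 2, '2008-01-20'),   -- ... and 2008
        (13, 3, '2008-09-09');   -- Initech: 2008 only
""")

rows = conn.execute("""
    SELECT c.custid, c.companyname
    FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o
                  WHERE o.custid = c.custid
                    AND o.orderdate >= '2007-01-01' AND o.orderdate < '2008-01-01')
      AND NOT EXISTS (SELECT 1 FROM orders AS o
                      WHERE o.custid = c.custid
                        AND o.orderdate >= '2008-01-01' AND o.orderdate < '2009-01-01')
    ORDER BY c.custid
""").fetchall()
print(rows)
```

Half-open date ranges are used instead of a year-extraction function so the sketch does not depend on any one DBMS's date functions.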
SELECT *
FROM Employees NATURAL JOIN Orders
WHERE freight > 1000;
There are 502 buffer pages available in memory. Both relations are stored as simple heap
files. Neither relation has any indexes built on it.
Q4A. What is the cost (in disk I/O’s) of performing this join using the Block-oriented Nested
Loops Join algorithm? Provide the formulae you use to calculate your cost estimate.
(3 marks)
Q4B. What is the cost (in disk I/O’s) of performing this join using the Hash Join algorithm?
Provide the formulae you use to calculate your cost estimate.
(3 marks)
Q4C. In comparing the cost of different algorithms, we count I/O (page accesses) and ignore
all other costs. What is the reason behind this approach?
(2 marks)
Q4D. Which approach should be the least expensive for the given buffer size of 502 pages:
1. Simple Nested Loops Join
2. Page-oriented Nested Loops Join
3. Block-oriented Nested Loops Join
4. Hash Join
Please write the number of the correct response. No need to provide formulae for question
4D.
(2 marks)
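For reference, the textbook I/O cost formulae for the four join algorithms listed in Q4D can be written as small functions. This is a sketch under the usual simplifying assumptions (one input buffer and one output buffer reserved for Block Nested Loops, hash partitions that fit in memory, output cost ignored). The page and tuple counts below are placeholders invented for the example, not the statistics of the Employees and Orders relations.

```python
import math

def snlj_cost(outer_pages, outer_tuples, inner_pages):
    """Simple Nested Loops: scan the inner relation once per outer tuple."""
    return outer_pages + outer_tuples * inner_pages

def pnlj_cost(outer_pages, inner_pages):
    """Page-oriented Nested Loops: scan the inner relation once per outer page."""
    return outer_pages + outer_pages * inner_pages

def bnlj_cost(outer_pages, inner_pages, buffer_pages):
    """Block Nested Loops: scan the inner relation once per block of (B - 2) outer pages."""
    blocks = math.ceil(outer_pages / (buffer_pages - 2))
    return outer_pages + blocks * inner_pages

def hash_join_cost(outer_pages, inner_pages):
    """Hash Join: read and write both relations to partition, then read both to probe."""
    return 3 * (outer_pages + inner_pages)

# Hypothetical statistics, chosen only to exercise the formulae.
OUTER_PAGES, OUTER_TUPLES, INNER_PAGES, BUFFERS = 1_000, 100_000, 5_000, 502

print("SNLJ:", snlj_cost(OUTER_PAGES, OUTER_TUPLES, INNER_PAGES))
print("PNLJ:", pnlj_cost(OUTER_PAGES, INNER_PAGES))
print("BNLJ:", bnlj_cost(OUTER_PAGES, INNER_PAGES, BUFFERS))
print("Hash:", hash_join_cost(OUTER_PAGES, INNER_PAGES))
```

Which algorithm comes out cheapest depends on the actual relation sizes, so the ranking printed for these placeholder numbers should not be read as the answer to Q4D.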
SELECT *
FROM OrderDetails
WHERE quantity > 15 AND discount > 90;
Your job is to analyse the following query plans and estimate the cost of the best plan utilizing
the information given about different indexes in each part.
Q5A. Compute the estimated result size of the query, and the reduction factor of each filter.
(3 marks)
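The standard estimate for a range predicate `col > value` is (High - value) / (High - Low), and the result size is the number of tuples multiplied by the reduction factor of every filter. The sketch below assumes uniformly distributed values and independent predicates. The value ranges (0 to 100 for both columns) and the tuple count are assumptions made for the example, not figures from the question.

```python
def range_rf_greater_than(value, low, high):
    """Reduction factor for `col > value`, assuming values spread uniformly over [low, high]."""
    return (high - value) / (high - low)

def result_size(ntuples, *reduction_factors):
    """Estimated output rows: tuple count scaled by each (assumed independent) filter."""
    size = ntuples
    for rf in reduction_factors:
        size *= rf
    return size

# Hypothetical statistics for OrderDetails.
rf_quantity = range_rf_greater_than(15, low=0, high=100)   # quantity > 15
rf_discount = range_rf_greater_than(90, low=0, high=100)   # discount > 90
estimated_rows = result_size(10_000, rf_quantity, rf_discount)

print(rf_quantity, rf_discount, round(estimated_rows))
```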
Q5B. Compute the estimated cost of the best plan assuming that a clustered B+ tree index on
quantity is the only index available. Suppose there are 200 index pages.
(3 marks)
Q5C. Compute the estimated cost of the best plan assuming that an unclustered hash index
on discount is the only index available.
(2 marks)
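When comparing access paths, the usual approach is to cost each candidate and keep the cheapest, with a full heap scan always available as a fallback. The sketch below uses the common textbook approximation for a clustered B+ tree, (index pages + data pages) × reduction factor of the matching predicate. The number of data pages and the reduction factor are assumed values; only the 200 index pages comes from Q5B. Note that a hash index supports equality lookups only, so it cannot serve a range predicate such as `discount > 90`.

```python
def heap_scan_cost(data_pages):
    """Full scan of the heap file: every data page is read once."""
    return data_pages

def clustered_btree_cost(index_pages, data_pages, rf):
    """Clustered B+ tree: read the matching fraction of the index and of the data pages."""
    return (index_pages + data_pages) * rf

# 200 index pages is given in Q5B; the other two figures are hypothetical.
INDEX_PAGES = 200
DATA_PAGES = 1_000      # assumed size of OrderDetails in pages
RF_QUANTITY = 0.85      # assumed reduction factor for quantity > 15

candidates = {
    "heap scan": heap_scan_cost(DATA_PAGES),
    "clustered B+ tree on quantity": clustered_btree_cost(INDEX_PAGES, DATA_PAGES, RF_QUANTITY),
}
best_plan = min(candidates, key=candidates.get)
print(candidates, best_plan)
```

With a weakly selective predicate like the assumed one, the index may not beat the plain scan, which is why both plans are costed rather than assuming the index wins.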
Q5D. If you are given complete freedom to create one index to speed up this query, which
index would be the best one to answer this query? Please give complete information about
the index, e.g. is it clustered or unclustered, is it hash or B+ tree, which attributes will it cover.
(2 marks)
SELECT *
FROM Employees AS E, Orders AS O, OrderDetails AS OD
WHERE E.empid = O.empid AND O.orderid = OD.orderid;
Q6A. Compute the cost of the plan shown below. NLJ is a Page-oriented Nested Loops Join.
Assume that empid is the candidate key of Employees, orderid is the candidate key of Orders,
and 100 tuples of a resulting join between Employees and Orders can fit on one page.
(4 marks)
Q6B. Would the plan presented below be a valid candidate that System R would consider and
compute cost for during query optimisation? Why?
(3 marks)
Q6C. Consider the query presented below. Does the following equivalence class hold?
Yes/No and Why?
(3 marks)
TableName (PrimaryKey, Column, ForeignKey [FK])
AnotherTable (PrimaryKey, Column, AnotherColumn)
Item ID is the candidate key for this table. Item ID determines Description, Quan, Cost/Unit
and Dept, while Dept determines Dept Name and Dept Head.
Inventory

Item ID | Description     | Dept | Dept Name   | Dept Head     | Quan | Cost/Unit | Value
4011    | 5 ft desk       | MK   | Marketing   | Jane Thompson | 5    | 200       | 1000
4020    | File cabinet    | MK   | Marketing   | Jane Thompson | 10   | 75        | 750
4005    | Executive chair | MK   | Marketing   | Jane Thompson | 5    | 100       | 500
4036    | 5 ft desk       | ENG  | Engineering | Ahmad Rashere | 7    | 200       | 1400
(10 marks)
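Given the dependencies stated above, Dept Name and Dept Head depend on Item ID only through Dept, which is a transitive dependency. One way to see its effect is to split the rows into an item relation and a department relation and check that joining them back loses nothing. The Python sketch below does this with plain dictionaries; the relation names `items` and `departments` are made up, and treating Value as derived (Quan × Cost/Unit) is an assumption that happens to hold for the four sample rows.

```python
# Rows copied from the Inventory table:
# (item_id, description, dept, dept_name, dept_head, quan, cost_per_unit, value)
inventory = [
    (4011, "5 ft desk", "MK", "Marketing", "Jane Thompson", 5, 200, 1000),
    (4020, "File cabinet", "MK", "Marketing", "Jane Thompson", 10, 75, 750),
    (4005, "Executive chair", "MK", "Marketing", "Jane Thompson", 5, 100, 500),
    (4036, "5 ft desk", "ENG", "Engineering", "Ahmad Rashere", 7, 200, 1400),
]

items = {}        # Item ID -> (description, dept, quan, cost_per_unit)
departments = {}  # Dept    -> (dept_name, dept_head)
for item_id, desc, dept, dept_name, dept_head, quan, cost, _value in inventory:
    items[item_id] = (desc, dept, quan, cost)
    departments[dept] = (dept_name, dept_head)  # stored once per department

# Join the two relations back together and recompute Value from Quan and Cost/Unit.
rejoined = [
    (item_id, desc, dept, *departments[dept], quan, cost, quan * cost)
    for item_id, (desc, dept, quan, cost) in items.items()
]

print(len(departments), rejoined == inventory)
```

The department details for Marketing are now held once instead of three times, so a change of department head touches a single row.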
Q8A. Draw a star schema to support the design of this data warehouse, showing the attributes
in each table. Use PK to denote primary key, PFK to denote primary foreign key, and FK to
denote foreign key. You do not need to specify data types nor whether the fields are
NULL/NOT NULL.
(8 marks)
Q8B. Why are star schemas preferred over relational database designs to support decision
making?
(2 marks)
(5 marks)
Q9B. Vine is a social media sharing service where users can host 6-second video clips within
multiple categories (e.g. “Comedy”, “Science”, “Social”). Part of the database schema for the
Vine service is given below.
Data type sizes (in bytes): INT 4, DATETIME 8
There are 15 different categories that users can share videos about and 1 million users to start
with. A user posts 5 videos on average per month. Assume that the average storage
requirement for the BLOB data type is 20,000 bytes.
Estimate the disk space requirements only for the Video table at go-live and after one month
of operation.
(5 marks)
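A capacity estimate of this kind is row size multiplied by row count. The sketch below shows the arithmetic only. The column list for the Video table is hypothetical (three INT columns, one DATETIME, one BLOB), because the actual schema is not reproduced here; it also assumes no videos exist at go-live. The type sizes, user count, and posting rate are taken from the question.

```python
# Sizes from the question, in bytes.
INT_BYTES = 4
DATETIME_BYTES = 8
BLOB_BYTES = 20_000

# Hypothetical Video columns; substitute the real schema before relying on the result.
video_columns = {
    "video_id": INT_BYTES,
    "user_id": INT_BYTES,
    "category_id": INT_BYTES,
    "posted_at": DATETIME_BYTES,
    "clip": BLOB_BYTES,
}
row_bytes = sum(video_columns.values())

USERS = 1_000_000
VIDEOS_PER_USER_PER_MONTH = 5

def table_bytes(rows, bytes_per_row):
    """Raw data size, ignoring page overhead and indexes."""
    return rows * bytes_per_row

rows_at_go_live = 0  # assumption: no videos have been posted before launch
rows_after_one_month = USERS * VIDEOS_PER_USER_PER_MONTH

print(row_bytes)
print(table_bytes(rows_at_go_live, row_bytes))
print(table_bytes(rows_after_one_month, row_bytes))
```

The BLOB dominates the row size, so the estimate is far more sensitive to the average clip size than to the exact set of fixed-width columns.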
Q10A. Which of the following is NOT part of the database development lifecycle?
A) Implementation
B) Maintenance
C) Requirement analysis
D) First-level support
Q10B. A relation which is NOT in the conceptual and logical models but is made available to
users is a:
A) Data type
B) View
C) Revoke
D) Grant
Q10D. Which of the following is NOT a true statement about data and information?
A) Dates and audio are two examples of data
B) Information has been processed so that it increases the audience’s knowledge
C) Information is used to create data
D) The term “data” refers to raw facts
Q10F. Which of the following is NOT one of the basic operations of Relational Algebra?
A) Set-difference
B) Cross-product
C) Equality
D) Union
Q10J. Which of the following is NOT a valid type of logical database backup?
A) Incremental backup
B) Online backup
C) Onsite backup
D) Offline backup
Q10O. Which of the following is NOT true about choosing data types?
A) May help improve query optimisation
B) Help DBMS store and use information efficiently
C) Help minimise storage space
D) May help improve data integrity
END OF THE EXAM