Database and Data Warehouse Assignment 1
Assignment Details
Section A
Suppose that we have a relational database, dbdvdclub, with the following tables.
use dbdvdclub;
iii) tblactor_role [3]
use dbdvdclub;
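The original CREATE TABLE statement appears to have been included as a screenshot that did not survive extraction. A minimal sketch of what it might have looked like, using the column names mentioned in part g) and assumed INT data types:

CREATE TABLE tblactor_role (
    actor_id INT NOT NULL,  -- assumed type; identifies the actor
    movie_id INT NOT NULL,  -- assumed type; identifies the movie
    role_id  INT NOT NULL,  -- assumed type; identifies the role
    PRIMARY KEY (actor_id, movie_id, role_id)
);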
c) i) Using the table tblactor_acts, carefully explain what is meant by saying that
actor_id is a foreign key. [3]
Foreign keys are columns of a table that point to the candidate key of another table, where a candidate key is a set of attributes that uniquely identifies tuples in a table (Hernandez, 2003). The table that contains the foreign key is called the child table, and the table that holds the referenced primary key is called the referenced or parent table. Foreign keys act as a cross-reference between tables. For example, the actor_id column in the tblactor_acts table is a foreign key because it points to the primary key of the tblactor table, which is actor_id.
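As a hedged illustration (the actual table definitions were provided as screenshots), the foreign key could be declared as follows, with the column type assumed:

ALTER TABLE tblactor_acts
    ADD CONSTRAINT fk_actor_acts_actor  -- hypothetical constraint name
    FOREIGN KEY (actor_id) REFERENCES tblactor (actor_id);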
ii) Using table tblactor, compare and contrast Candidate Key and Alternate Key. [6]
A candidate key is a set of attributes that uniquely identifies tuples in a table; it is a super key with no repeated attributes. Because of this, the primary key must be selected from among the candidate keys, and every table must have at least one candidate key (Hernandez, 2003). A table may have multiple candidate keys but only a single primary key. The candidate keys that are not chosen as the primary key are called alternate keys. For example, in the tblactor table both actor_id and national_id are candidate keys that uniquely identify an actor; if actor_id is chosen as the primary key, then national_id becomes an alternate key.
iii) Using tables a) tblactor_acts and b) tblactor_role differentiate between a
Foreign Key and a Composite Key [6]
A composite key is a key composed of two or more attributes that collectively uniquely identify each record (Kahate, 2004). In the table tblactor_acts there are two attributes, movie_id and actor_id. Neither attribute on its own uniquely identifies a record, but their combination does, so together movie_id and actor_id qualify as a composite key. If so, then when someone searches using both the movie_id and the actor_id they should get only a single record returned. A foreign key, by contrast, references the primary key of another table (as actor_id in tblactor_acts references tblactor) rather than identifying rows within its own table.
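A minimal sketch of declaring such a composite key, assuming both columns are of type INT:

ALTER TABLE tblactor_acts
    ADD CONSTRAINT pk_actor_acts  -- hypothetical constraint name
    PRIMARY KEY (movie_id, actor_id);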
d) The DVD rental business with the database above has a policy whereby movies which
are 15 years of age (i.e. 15 years post production) are donated to the society. Write an
SQL statement to retrieve all movies which are due for donations. [3]
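The SQL answer itself was not preserved in the text. One possible statement is sketched below; it assumes tblmovie has a production_year column, which is not confirmed by the extract:

SELECT *
FROM tblmovie
WHERE YEAR(CURDATE()) - production_year >= 15;  -- 15 or more years post production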
e) Write an SQL statement to compute the average age of Actors who acted in movies which belong to the ‘Horror’ genre. [6]
In the genre table (tblgenres) the genre ID for horror movies is 100
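The statement itself appears to have been a screenshot. A possible version, assuming tblactor stores a date_of_birth column and tblmovie carries a genre_id:

SELECT AVG(TIMESTAMPDIFF(YEAR, a.date_of_birth, CURDATE())) AS average_age
FROM tblactor a
JOIN tblactor_acts aa ON aa.actor_id = a.actor_id
JOIN tblmovie m ON m.movie_id = aa.movie_id
WHERE m.genre_id = 100;  -- 100 is the Horror genre ID given above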
f) Write an SQL statement to retrieve actors who have acted in movies and played the
“Main Actor” role. [4]
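The answer was again a screenshot; a sketch assuming a tblrole table with a role_name column:

SELECT DISTINCT a.*
FROM tblactor a
JOIN tblactor_role ar ON ar.actor_id = a.actor_id
JOIN tblrole r ON r.role_id = ar.role_id
WHERE r.role_name = 'Main Actor';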
g) Explain the principle of entity integrity using the tblactor_role table. [4]
Entity integrity is concerned with ensuring that the rows in a table have no duplicate records and that the field that identifies each record within the table is unique and never null. That is, the table should have one column or a set of columns that provides a unique identifier for the rows of the table (Taylor, 2011). For example, in tblactor_role the columns actor_id, movie_id, and role_id together act as the unique identifier. This set of columns is referred to as the parent key of the table.
A primary key acts as a unique identifier for rows in the table. Entity Integrity ensures
two properties for primary keys:
- It ensures that the primary key for a row is unique and it does not match
the primary key of any other row in the table.
- It also ensures that the primary key is not null and no component of the
primary key may be set to null.
A system enforces entity integrity by ensuring that any operation that creates a
duplicate primary key or one containing nulls is rejected.
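As a sketch of this rejection behaviour, assuming the composite primary key on tblactor_role described above:

INSERT INTO tblactor_role (actor_id, movie_id, role_id) VALUES (1, 10, 5);    -- succeeds
INSERT INTO tblactor_role (actor_id, movie_id, role_id) VALUES (1, 10, 5);    -- rejected: duplicate primary key
INSERT INTO tblactor_role (actor_id, movie_id, role_id) VALUES (NULL, 10, 5); -- rejected: NULL in a key component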
Section B
The commands used to create or modify database objects such as tables, indices and users are called data definition language (DDL) commands. DDL commands are used for altering the database structure, such as creating new tables or objects along with all their attributes (data type, table name, etc.) (Casteel, 2016). The most commonly used DDL statements are CREATE, ALTER, DROP, and TRUNCATE.
For example, the CREATE command builds a new table as shown below.
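(The original screenshot is not preserved; the sketch below uses an assumed Employee table, matching the ALTER example that follows.)

CREATE TABLE Employee (
    employee_id INT,          -- assumed identifier column
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    salary DECIMAL(10, 2)
);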
An ALTER command modifies an existing database table. This command can add additional columns, drop existing columns, and even change the data type of the columns in a database table.
For example:
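(The screenshot is not preserved; a sketch consistent with the description that follows, with employee_id assumed as the key column:)

ALTER TABLE Employee
    ADD CONSTRAINT employee_pk PRIMARY KEY (employee_id);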
In this example, we added a primary key constraint to the table to enforce unique values. The constraint “employee_pk” is a primary key on the Employee table.
Data Manipulation Language
The commands used to retrieve, insert, modify and delete the data stored in database tables are called data manipulation language (DML) commands. The most commonly used DML statements are SELECT, INSERT, UPDATE, and DELETE (Thakur, 2018). For example:
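(The original examples were screenshots; the sketches below reuse the assumed Employee table from above.)

SELECT * FROM Employee;  -- retrieve all employees

INSERT INTO Employee (employee_id, first_name, last_name, salary)
VALUES (1, 'Jane', 'Doe', 1500.00);  -- insert a new row (values are illustrative)

UPDATE Employee SET salary = 1800.00 WHERE employee_id = 1;  -- modify an existing row

DELETE FROM Employee WHERE employee_id = 1;  -- remove a row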
b) Relational databases are very effective in situations for which they are appropriate. In
other situations, simpler file-based solutions may be sufficient. Suppose you are required to
implement a system for storing information about a library’s books, borrowers, and loans.
Give FOUR reasons why a database system is superior to a file-based system for this task.
Illustrate the answer with suitable examples. [4]
Data Security
A database system makes it easy to apply access constraints so that only authorized users are able to access the data. Each user has a different set of access rights, so the data is protected from issues such as identity theft, data leaks and misuse (Shukla, 2020).
Data Searching
A database management system provides inbuilt searching operations, making it easier and faster for users to search for data. Users only have to write a small query to retrieve data from the database (Shukla, 2020).
Data Integrity
In some cases constraints need to be applied to the data before it is inserted into the database. A file system does not provide any mechanism to check these constraints automatically, whereas a DBMS maintains data integrity by enforcing user-defined constraints on the data by itself (Shukla, 2020).
Easy Recovery
Database systems are able to keep a backup of data, making it easy to fully recover data in case of a failure. With file systems, however, it is not easy to keep backups, so once the system crashes the lost data cannot be recovered. A database system normally has a recovery manager which restores the data, giving it another advantage over file systems (Shukla, 2020).
c) A company wants to move its current file-based system to a database system. In many
ways, this can be seen as a good decision. Identify and describe three disadvantages in
adopting a database approach. [3]
Complexity
Database systems are very complex, and user training is required before users can work with the database system. In order for the database system to run properly, it is very important for developers, database administrators, designers, and also the end-users to have a good knowledge of it. If users are not properly trained, then this may lead to data loss or database failure.
3. a) Using relevant examples define the following terms as used in database normalisation
An insertion anomaly occurs when data cannot be added to the database because other data is absent (Ricardo, 2004). For example, if a system is designed to require that a customer be on file before a sale can be made to that customer, but a customer cannot be added until they have bought something, then you have an insertion anomaly.
An update anomaly is a data inconsistency that results from data redundancy and a partial update (Ricardo, 2004). For example, an update anomaly happens when the person charged with keeping all the records current and accurate is asked to change an employee's title due to a promotion. If the data is stored redundantly in the same table and the person misses any of the redundant copies, then there will be multiple titles associated with the employee and the end user has no way of knowing which is the correct one.
A deletion anomaly is the unintended loss of data due to the deletion of other data; that is, deletion anomalies happen when the deletion of unwanted information causes desired information to be deleted as well (Ricardo, 2004). For example, if a single database record contains information about a particular product along with information about a salesperson for the company, and the salesperson quits, then information about the product is deleted along with the salesperson information.
b) Identify and describe the 8 (eight) major duties of a database administrator. [16]
Database Design
Database design refers to the set of steps that help with designing, creating, implementing, and maintaining a business's database systems. The database administrator determines what data must be stored and how the data elements interrelate; that is, the database administrator is responsible for producing the physical and logical models for the proposed database system design.
Performance Monitoring
Database monitoring is a very important part of application maintenance. The ability to discover database issues in time can ensure that an application remains healthy and accessible. Without good monitoring in place, database outages or issues can go unnoticed until it is too late and the business is losing money and customers.
Capacity planning
Capacity planning is the process of determining the production capacity needed by an organization to meet changing demands for its products. All databases have limits on the amount of data they can store and the amount of physical memory they can use. It is therefore the job of the database administrator to decide the limit and capacity of a database and to handle all the issues related to it.
Database accessibility
It is the job of the database administrator to decide on the accessibility of a database. The database administrator determines which users can access the database and which data each user is able to access. No user has the power to access the entire database without the permission of the database administrator.
4. a) Using examples explain the Data Warehousing ETL Process [13]
The extract, transform, load (ETL) process encompasses three steps: extraction, transformation, and loading. It takes large volumes of raw data from multiple sources, converts the data for analysis, and loads it into the warehouse, bringing it all together to build a unified source of information for business intelligence (Tobin, 2020).
The ETL process is a key design concept used in the design of a data warehouse architecture. This is because it ensures that all the processes connect seamlessly and data continues to flow as defined by the business, shaping and modifying itself where and when needed according to your workflow. Below we take a look at the three primary steps involved in the ETL process.
Step 1 - Extraction
In the first step, data is extracted from the source systems into the staging area. Data may be extracted from multiple sources, for example Excel, Pastel, Sage ERP, Facebook, or text files, into the staging area. The staging area works like a buffer between the data warehouse and the source data. Since the data may be coming from multiple different sources and is in various formats, directly transferring it to the warehouse may result in corrupted data. The staging area is used for data cleansing and to organize the data (Guru99, 2020). Hence one needs a logical data map before data is extracted and loaded physically; this data map describes the relationship between the sources and the target data.
For example, consider Chicken Inn, which has many shops in Zimbabwe and the region. Let's say there is a Chicken Inn in Blantyre, Malawi, and it has its own system for recording customer visits and product purchase history, with this data stored in Excel and in the point-of-sale system. As part of the extraction process, purchase history data will be collected from the point-of-sale system and client visit information will be collected from the Excel file.
Step 2 – Transformation
This stage is closely associated with the data extraction stage. It is mainly concerned with converting data to a format that conforms to the standard schema the data warehouse uses for storage. During this stage, data cleaning and organization processes take place. All the data from the multiple source systems is normalized and converted to a single system format, improving data quality and compliance. ETL yields transformed data through these methods (Tobin, 2020):
Filtering – This refers to loading specific attributes into the data warehouse.
Cleaning – replacing NULL values with default values, etc.
Joining – multiple attributes are joined into one.
Splitting – a single attribute is split into multiple attributes.
Sorting – sorting tuples on the basis of some attribute (generally a key attribute).
Keeping our Chicken Inn example in mind, the amounts in the point-of-sale system are captured in Malawian kwacha. As part of the transformation process, the amounts would need to be converted to USD, since the information has to be submitted to the Harare headquarters, and all the client names would also be converted to uppercase.
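A sketch of what this transformation could look like in SQL, assuming a hypothetical staging table stg_sales with amount_mwk, amount_usd and customer_name columns, and an illustrative exchange rate:

UPDATE stg_sales
SET amount_usd = amount_mwk * 0.001,      -- illustrative MWK-to-USD rate, not a real figure
    customer_name = UPPER(customer_name); -- standardize client names to uppercase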
Step 3 – Loading
At this stage the data that has been extracted and transformed is now written into the
target database. Depending upon the requirements of the business, data can be loaded
into the target database in batches or all at once (Tobin, 2020).
1. Refresh: In this case the data in the data warehouse is completely rewritten.
That is older files are completely replaced.
2. Update: In this case only those changes applied to our source information are
added to the Data Warehouse. An update is typically carried out without
deleting or modifying pre-existing data.
Continuing with our Chicken Inn example, at this stage the data that was converted and formatted in the transformation stage is loaded onto the Chicken Inn servers in Harare for storage and for use in business intelligence by management.
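A sketch contrasting the two loading styles, assuming a hypothetical warehouse table dw_sales loaded from the staging table stg_sales, with a sale_date column for identifying new rows:

-- Refresh: completely rewrite the warehouse table
TRUNCATE TABLE dw_sales;
INSERT INTO dw_sales SELECT * FROM stg_sales;

-- Update: append only the newly arrived rows, leaving existing data intact
INSERT INTO dw_sales
SELECT * FROM stg_sales s
WHERE s.sale_date > (SELECT MAX(sale_date) FROM dw_sales);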
b) Differentiate the following terms as used in Data Warehousing:
Fact table
A fact table is the primary table in a dimensional model: it is a table that holds values for the attributes of the dimension tables. It contains quantitative information in a denormalized form; essentially, it contains the data that needs to be analysed. Fact tables mostly have two kinds of columns: foreign keys that join them to the dimension tables, and columns that contain the values to be analysed. A fact table mostly contains numeric data; it grows vertically, containing more records but fewer attributes (Pedamkar, 2020).
Dimension Table
In data warehousing, a dimension table is a collection of reference information about a measurable event; such events are known as facts and are stored in a fact table. Dimensions categorize and describe data warehouse facts and measures in a way that supports meaningful answers to business questions. Dimension tables form the very core of dimensional modelling (Pedamkar, 2020).
- The fact table contains measurements, metrics, and facts about a business process, while the dimension table is a companion to the fact table containing descriptive attributes used to constrain queries.
- The fact table is located at the center of a star or snowflake schema, whereas the dimension tables are located at the edges of the schema.
- A fact table is defined by its grain, or its most atomic level, whereas a dimension table should be wordy, descriptive, complete, and quality assured.
- The dimension table stores the descriptive attributes used as report labels, whereas the fact table stores the detailed measurement data.
- A fact table does not contain a hierarchy, whereas a dimension table may contain hierarchies.
Online Analytical Processing – OLAP
OLAP refers to an Online Analytical Processing system. An OLAP database stores historical data that has been input by OLTP systems. It allows a user to view different summaries of multi-dimensional data. Using OLAP, you can extract information from a large database and analyse it for decision making (techdifferences, 2018).
A star schema is the simple and common modelling paradigm where the data warehouse comprises a fact table with a single table for each dimension. The schema imitates a star, with the dimension tables presented in an outspread pattern encircling the central fact table. The fact table is connected to the dimension tables through primary and foreign keys (techdifferences, 2018).
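A minimal star schema sketch (all table and column names here are assumptions for illustration):

CREATE TABLE dim_product (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(50)
);

CREATE TABLE dim_store (
    store_id INT PRIMARY KEY,
    store_name VARCHAR(50)
);

CREATE TABLE fact_sales (
    product_id INT,
    store_id INT,
    amount DECIMAL(10, 2),  -- the measure being analysed
    FOREIGN KEY (product_id) REFERENCES dim_product (product_id),
    FOREIGN KEY (store_id) REFERENCES dim_store (store_id)
);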
A snowflake schema is a variant of the star schema which includes a hierarchical arrangement of dimension tables. In this schema, a fact table is surrounded by various dimension and sub-dimension tables connected to it through primary and foreign keys. It is named the snowflake schema because its structure is similar to a snowflake. The crucial difference between the star schema and the snowflake schema is that the star schema does not use normalization, whereas the snowflake schema uses normalization to eliminate redundancy in the data. Fact and dimension tables are essential requisites for creating either schema (techdifferences, 2018).
List of References
Hernandez, M.J. (2003). Database Design for Mere Mortals: A Hands-on Guide to Relational Database Design. Retrieved from https://books.google.co.zw/books?id=dkxsjXNayHQC&dq=candidate+key&source=gbs_navlinks_s
Kahate, A. (2004). Introduction to Database Management Systems. Pearson Education India.
Taylor, A.G. (2011). SQL for Dummies. John Wiley & Sons.
Casteel, J. (2016). Oracle 12c: SQL. United States: Cengage Learning.
Pedamkar, P. (2020, Dec 12). Data Control Language. Retrieved from https://www.educba.com/data-control-language/
Shukla, S. (2020, Oct 20). Advantages of DBMS over File system. Retrieved from https://www.geeksforgeeks.org/advantages-of-dbms-over-file-system/
Thakur, S. (2018, May 07). What is DBMS? Let's Define DBMS. Retrieved from https://whatisdbms.com/explain-data-manipulation-language-with-examples-in-dbms/
Tobin, D. (2020, September 28). ETL & Data Warehousing Explained: ETL Tool Basics. Retrieved from https://www.xplenty.com/blog/etl-data-warehousing-explained-etl-tool-basics/#:~:text=ETL%20(or%20Extract%2C%20Transform%2C%20Load)%20is%20a%20process,that%20data%20into%20your%20warehouse.
Guru99. (2020, December 12). ETL (Extract, Transform, and Load) Process in Data Warehouse. Retrieved from https://www.guru99.com/etl-extract-load-process.html
Tech Differences. (2018, December 9). Difference Between OLTP and OLAP. Retrieved from https://techdifferences.com/difference-between-oltp-and-olap.html#KeyDifferences
*****END OF PAPER*****