Database Resit
Submission Time and Date: Friday 22nd January 2021 by 18:00 GMT
Date by which Work and Feedback will be returned to Students: 23rd February 2021
Weighting: This coursework accounts for 40% of the total marks for this module
Group No:
Instructions on Submission:
Assessment # 2 Submission Template
Advanced Databases (KL7011)
Assignment Questions
You must submit all the SQL queries and any other code that you wrote in
answering any of the tasks / questions (e.g., the use of Explain Plan
statements for the queries and their outputs using Spooling or other suitable
means).
(A) Study the index definitions in sh_idx.sql. Discuss in detail (using cost-based
analysis) why these indexes (choose three different ones) are useful for
answering queries over the SH2 and DWU versions of the database. You
should not run the sh_idx.sql script at all.
(9 marks)
Answer Part 1 (A)
Provide the details of the 3 indexes whose performance impact on SH2 you are going to
compare (i.e., name the indexes and the SH2 tables on which they were created; these
indexes must not exist in your DWU version) (1 Mark):
Provide the 3 SQL queries you are going to run to compare the performance impact of
the above 3 Indexes on SH2 and the version of the same queries on DWU (3 marks):
Provide Explain Plan statements & outputs for the above 3 SQL queries you have run
to compare the performance impact of those 3 Indexes on SH2 and their version of
the same queries on DWU (3 marks):
i. sales_prod_bix on SH2
Explain plan for
Select * from SH2.Sales where prod_id = 99;
Output:
On DWU192
Explain plan for
Select * from DWU192.Sales where prod_id = 99;
Output:
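The EXPLAIN PLAN statements above only write each plan into PLAN_TABLE; nothing is printed until the plan is queried. A minimal sketch of how the SH2 and DWU192 plans for the prod_id query could be stored under separate labels and then displayed with DBMS_XPLAN, including the Cost column used in the comparison. The statement IDs 'sh2_prod' and 'dwu_prod' are hypothetical labels, and the default PLAN_TABLE is assumed:

```sql
-- SQL*Plus display settings so the plan lines are not wrapped
SET LINESIZE 200
SET PAGESIZE 100

-- Plan on the shared SH2 schema, where sales_prod_bix exists
EXPLAIN PLAN SET STATEMENT_ID = 'sh2_prod' FOR
SELECT * FROM SH2.Sales WHERE prod_id = 99;

-- Plan for the same query on DWU192, which does not have sales_prod_bix
EXPLAIN PLAN SET STATEMENT_ID = 'dwu_prod' FOR
SELECT * FROM DWU192.Sales WHERE prod_id = 99;

-- Print each stored plan, including the estimated Cost (%CPU) column
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'sh2_prod', 'TYPICAL'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'dwu_prod', 'TYPICAL'));
```

Running this while SPOOL is active captures both the statements and the formatted plans in the spool file.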
On DWU192
Explain plan for
Select * from DWU192.sales where promo_id = 1234;
Output:
On DWU192
Explain Plan for
Select * from DWU192.Customers where Cust_year_of_birth =
1990;
Output:
(B) Identify three new indexes and justify why they could be useful. Write the SQL
code for creating these indexes under your DWU account. Give example
queries with cost-based analysis for both DWU account (which will have the
new indexes) and SH2 shared schema (which will NOT have any of your new
indexes). Alternatively, you may choose to run the same queries on your DWU
account before and after creating your proposed three indexes.
(9 marks)
Provide the SQL Code for the 3 new indexes you have created on your DWU database
for comparing their performance impact on DWU (i.e., these indexes must not exist in
SH2) (3 Marks):
i. Customer_email_ix
CREATE index customer_email_ix ON
DWU192.Customers(cust_email)
NOLOGGING COMPUTE STATISTICS;
ii. Customer_city_ix
CREATE index customer_city_ix ON
DWU192.Customers(cust_city)
NOLOGGING COMPUTE STATISTICS;
iii. Prod_min_price_ix
CREATE index Prod_min_price_ix ON
DWU192.Products(Prod_min_price)
NOLOGGING COMPUTE STATISTICS;
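After the three CREATE INDEX statements have run, a quick data dictionary check can confirm that each index exists on the intended column and that COMPUTE STATISTICS has filled in the statistics the optimizer uses for costing. A sketch, assuming it is run while connected as DWU192 (USER_ views only show the current user's objects) and that the unquoted index names were stored in upper case, as Oracle does by default:

```sql
-- Which table and column each new index covers
SELECT   index_name, table_name, column_name
FROM     user_ind_columns
WHERE    index_name IN ('CUSTOMER_EMAIL_IX', 'CUSTOMER_CITY_IX', 'PROD_MIN_PRICE_IX')
ORDER BY index_name;

-- Optimizer statistics gathered for the indexes
SELECT   index_name, distinct_keys, num_rows, last_analyzed
FROM     user_indexes
WHERE    index_name IN ('CUSTOMER_EMAIL_IX', 'CUSTOMER_CITY_IX', 'PROD_MIN_PRICE_IX')
ORDER BY index_name;
```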
Provide 3 SQL queries you are going to run to compare the performance impact of
your own 3 new Indexes on DWU and the version of the same queries on SH2 (3
marks):
i. Customer_email_ix
iii. Prod_min_price_ix
Select * from DWU192.Products WHERE Prod_min_price = 5;
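The task also allows the same query to be compared on DWU192 with and without a new index. Instead of dropping and re-creating the index, one option is the NO_INDEX optimizer hint, which tells the optimizer not to use the named index for that statement. A sketch for Prod_min_price_ix, where the table alias p and the statement IDs are hypothetical; on a small table the optimizer may still prefer a full scan, in which case both costs can be equal:

```sql
-- Plan in which the optimizer is free to use Prod_min_price_ix
EXPLAIN PLAN SET STATEMENT_ID = 'price_with_ix' FOR
SELECT * FROM DWU192.Products p WHERE p.prod_min_price = 5;

-- Same query with the index excluded by the NO_INDEX hint
EXPLAIN PLAN SET STATEMENT_ID = 'price_no_ix' FOR
SELECT /*+ NO_INDEX(p prod_min_price_ix) */ *
FROM   DWU192.Products p WHERE p.prod_min_price = 5;

-- Compare total estimated costs (the root step of each plan has ID 0)
SELECT statement_id, operation, options, cost
FROM   plan_table
WHERE  id = 0
AND    statement_id IN ('price_with_ix', 'price_no_ix');
```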
Provide Explain Plan statements & outputs for the above 3 SQL queries you have run
to compare the performance impact of your 3 Indexes on DWU and their version of the
same queries on SH2 together with brief discussion on how the cost figures compare
(3 marks):
i. Explain plan for Customer_email_ix
Select * from DWU192.Customers WHERE cust_email =
'[email protected]';
Plan output After Indexing:
(8 marks)
Provide 2 SQL queries you are going to run to compare the performance impact of the
2 MVs in SH2 and the version of the same queries on DWU (2 marks):
i. Calendar_month_sales_mv on SH2
SELECT * FROM SH2.cal_month_sales_mv;
Provide Explain Plan statements & outputs for the above 2 SQL queries you have run
to compare the performance impact of those 2 MVs in SH2 and their version of the
same queries on DWU (3 marks):
i. Explain Plan Statement for Calendar_month_sales_mv on SH2 schema.
EXPLAIN PLAN FOR
SELECT * FROM SH2.cal_month_sales_mv;
Plan Output:
Plan Output:
Provide Discussion of the cost-based comparison of the above 2 sets of queries and
their explain plan cost figures (3 marks):
i. In the materialized view comparison for cal_month_sales_mv, the explain plan cost of
the query is 3 on the SH2 schema and 525 on the DWU192 schema.
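A gap of this size is what one would expect if SH2 answers the query from the precomputed MV while DWU192, which has no such MV, must join and aggregate SALES itself. A sketch of the equivalent DWU192 base-table query, assuming SH2.cal_month_sales_mv follows the definition in Oracle's SH sample schema (monthly SUM of AMOUNT_SOLD, joining SALES to TIMES); the statement ID is a hypothetical label:

```sql
-- Base-table query assumed to match the SH2 cal_month_sales_mv definition
EXPLAIN PLAN SET STATEMENT_ID = 'dwu_cal_month' FOR
SELECT   t.calendar_month_desc,
         SUM(s.amount_sold) AS dollars
FROM     DWU192.sales s
JOIN     DWU192.times t ON s.time_id = t.time_id
GROUP BY t.calendar_month_desc;

-- Show the plan; a full scan of SALES would explain the higher cost
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'dwu_cal_month', 'TYPICAL'));
```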
(D) Identify three new MVs based on the base tables in the SH schema under your
DWU account and justify why they would be useful for the users of your data
warehouse. Write the SQL code for creating these MVs. Moreover, run sample
queries on both SH2 and DWU to ensure that queries running on DWU will be
re-written by Oracle to use your proposed three MVs instead of the base tables
used in the sample queries. Note that you must not query your MVs directly in
the FROM clause; let the Oracle Query Optimizer re-write the queries and
answer them using your proposed MVs.
(12 marks)
Provide SQL code you used to create the 3 new MVs you created in your own DWU
database (i.e., these MVs must not exist in SH2) (3 marks):
i. Sales_Per_Channel_Mv
CREATE MATERIALIZED VIEW Sales_Per_Channel_Mv
ENABLE QUERY REWRITE
AS
SELECT DWU192.Channels.Channel_Desc,
COUNT(*) AS sales_count FROM DWU192.SALES INNER JOIN DWU192.CHANNELS
ON DWU192.SALES.CHANNEL_ID = DWU192.Channels.CHANNEL_ID
GROUP BY DWU192.Channels.Channel_Desc;
ii. Customers_Per_city_Mv
CREATE MATERIALIZED VIEW Customers_Per_city_Mv
ENABLE QUERY REWRITE
AS
SELECT Cust_city,
COUNT(*) AS customer_count FROM DWU192.Customers
GROUP BY cust_city;
iii. Customers_Per_gender_Mv
CREATE MATERIALIZED VIEW Customers_Per_gender_Mv
ENABLE QUERY REWRITE
AS
SELECT Cust_gender,
COUNT(*) AS customer_count FROM DWU192.Customers
GROUP BY cust_gender;
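Query rewrite is only considered for an MV whose REWRITE_ENABLED flag is Y, and under the default ENFORCED integrity level the MV must also be fresh. A sketch of a data dictionary check for the three MVs, assuming it is run as DWU192 and that the unquoted MV names were stored in upper case:

```sql
-- Confirm each MV exists, is enabled for rewrite and is not stale
SELECT mview_name, rewrite_enabled, staleness, last_refresh_date
FROM   user_mviews
WHERE  mview_name IN ('SALES_PER_CHANNEL_MV',
                      'CUSTOMERS_PER_CITY_MV',
                      'CUSTOMERS_PER_GENDER_MV');
```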
Provide the 3 SQL queries you are going to run to compare the performance impact of
your own 3 new MVs on DWU and the version of the same queries on SH2 (3 marks):
i. SQL query for Sales_Per_Channel_Mv on DWU192:
Select * from Sales_Per_Channel_Mv;
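The task asks for queries that name only the base tables, so that the optimizer rewrites them to use the MV, rather than queries that select from the MV directly. A minimal sketch of such a query for Sales_Per_Channel_Mv, assuming the session is allowed to set the query rewrite parameters and that the MV is still fresh (as it is immediately after creation); the statement IDs are hypothetical labels. If rewrite takes place, the DWU192 plan should contain a MAT_VIEW REWRITE ACCESS step on SALES_PER_CHANNEL_MV, while the SH2 plan, which has no such MV, reads the base tables:

```sql
-- Let the optimizer consider rewriting queries to use materialized views
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = ENFORCED;

-- DWU192: only base tables appear in FROM; the MV is never named
EXPLAIN PLAN SET STATEMENT_ID = 'dwu_channel' FOR
SELECT   ch.channel_desc, COUNT(*) AS sales_count
FROM     DWU192.sales s
JOIN     DWU192.channels ch ON s.channel_id = ch.channel_id
GROUP BY ch.channel_desc;

-- SH2: the same query, answered from the base tables
EXPLAIN PLAN SET STATEMENT_ID = 'sh2_channel' FOR
SELECT   ch.channel_desc, COUNT(*) AS sales_count
FROM     SH2.sales s
JOIN     SH2.channels ch ON s.channel_id = ch.channel_id
GROUP BY ch.channel_desc;

-- Show both plans; look for MAT_VIEW REWRITE ACCESS in the DWU192 plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'dwu_channel', 'TYPICAL'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'sh2_channel', 'TYPICAL'));
```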
Provide Explain Plan statements & outputs for the above 3 SQL queries you have run
to compare the performance impact of your 3 MVs on DWU and their version of the
same queries on SH2 (3 marks)
Provide Discussion of the cost-based comparison of the above 3 sets of queries and
their explain plan cost figures (3 marks):
ii. For the queries used to evaluate Customers_Per_City_Mv, the explain plan cost is 281 on
DWU192 and 275 on SH2.
iii. For the queries used to evaluate Customers_Per_gender_Mv, the explain plan cost is 3 on
DWU192 and 4 on SH2.
(E) Prior to the introduction of the special aggregation function CUBE, there was no
possibility to express an aggregation over different levels within a single SQL
statement without using the set operation UNION ALL. Every different
aggregation level needed its own SQL aggregation expression, operating on
the exact same data set n times, once for each of the n different aggregation
levels. With the introduction of CUBE in the recent database systems, Oracle
provided a single SQL command for handling the aggregation over different
levels within a single SQL statement, not only improving the runtime of this
operation but also reducing the number of internal operations necessary to run
the query and reducing the workload on the system.
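The point that every aggregation level otherwise needs its own query can be made concrete: CUBE over n grouping columns produces one grouping set for every subset of those columns, so it stands in for 2^n GROUP BY queries combined with UNION ALL. For three generic columns c_1, c_2, c_3 (names used only for illustration):

```latex
\text{CUBE}(c_1, c_2, c_3) \;=\; \mathop{\text{UNION ALL}}_{S \subseteq \{c_1, c_2, c_3\}} \text{GROUP BY } S,
\qquad \bigl|\mathcal{P}(\{c_1, c_2, c_3\})\bigr| = 2^3 = 8
```

The eight grouping sets are {c1, c2, c3}, {c1, c2}, {c1, c3}, {c2, c3}, {c1}, {c2}, {c3} and the empty set, which gives the grand total.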
i. Using CUBE, write an SQL query over the SH schema under your DWU
account involving one fact table (SALES or COSTS) and at least two
dimension tables and at least 3 grouping attributes. Provide output of
successful execution of your query. Provide reasons why your query may
be useful for users of the SH data warehouse.
(3 marks)
CUBE: CUBE is an extension of the GROUP BY clause that lets the user generate multiple
grouping sets (subtotals for every combination of the listed columns, plus a grand total) in a single query.
Syntax:
SELECT
C1,
C2,
C3,
AGGREGATE_FUNCTION (expr)
FROM TABLE_NAME
GROUP BY
CUBE (C1, C2, C3);
Example:
SELECT
PROD_ID,
QUANTITY_SOLD,
TIME_ID,
SUM (QUANTITY_SOLD)
FROM
SALES
GROUP BY
CUBE (PROD_ID, QUANTITY_SOLD, TIME_ID);
49980 3 25-DEC-01 3
49980 3 21-DEC-02 3
49980 3 07-NOV-03 3
49980 37 37
49980 37 13-JAN-01 37
49980 42 42
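The example above groups SALES by its own columns only, whereas the task asks for one fact table, at least two dimension tables and at least three grouping attributes. A sketch that meets those requirements, assuming the DWU192 copies of the SH tables PRODUCTS, CHANNELS and TIMES with their standard columns (prod_category, channel_desc, calendar_year). It could help users compare total sales by product category, channel and year, together with every subtotal, from one statement:

```sql
-- One fact table (SALES), three dimension tables, three grouping attributes
SELECT   p.prod_category,
         ch.channel_desc,
         t.calendar_year,
         SUM(s.amount_sold) AS total_sales,
         -- 0 marks a detail row; other values identify which columns were rolled up
         GROUPING_ID(p.prod_category, ch.channel_desc, t.calendar_year) AS grp_id
FROM     DWU192.sales s
JOIN     DWU192.products p  ON s.prod_id    = p.prod_id
JOIN     DWU192.channels ch ON s.channel_id = ch.channel_id
JOIN     DWU192.times t     ON s.time_id    = t.time_id
GROUP BY CUBE (p.prod_category, ch.channel_desc, t.calendar_year)
ORDER BY grp_id, p.prod_category, ch.channel_desc, t.calendar_year;
```

GROUPING_ID lets a reader tell a genuine subtotal row apart from a detail row whose grouping value happens to be NULL.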
ii. Using set operation UNION ALL (and not CUBE), write an SQL query that
produces the same result as the query in (a) above. Provide output of
successful execution of your query.
(5 marks)
Provide the UNION ALL query, its output / spool result:
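CUBE over three columns is equivalent to eight GROUP BY queries combined with UNION ALL, one per grouping set. A sketch of the UNION ALL form of the CUBE example on SALES, assuming TIME_ID is a DATE and PROD_ID and QUANTITY_SOLD are NUMBERs as in the standard SH schema. Rolled-up columns are returned as typed NULLs so that every branch has matching column types; the row order may differ from the CUBE output, but the set of rows should be the same:

```sql
-- Grouping set (prod_id, quantity_sold, time_id): detail level
SELECT prod_id, quantity_sold, time_id, SUM(quantity_sold) AS total_qty
FROM   sales GROUP BY prod_id, quantity_sold, time_id
UNION ALL
-- Two-column grouping sets
SELECT prod_id, quantity_sold, CAST(NULL AS DATE), SUM(quantity_sold)
FROM   sales GROUP BY prod_id, quantity_sold
UNION ALL
SELECT prod_id, CAST(NULL AS NUMBER), time_id, SUM(quantity_sold)
FROM   sales GROUP BY prod_id, time_id
UNION ALL
SELECT CAST(NULL AS NUMBER), quantity_sold, time_id, SUM(quantity_sold)
FROM   sales GROUP BY quantity_sold, time_id
UNION ALL
-- One-column grouping sets
SELECT prod_id, CAST(NULL AS NUMBER), CAST(NULL AS DATE), SUM(quantity_sold)
FROM   sales GROUP BY prod_id
UNION ALL
SELECT CAST(NULL AS NUMBER), quantity_sold, CAST(NULL AS DATE), SUM(quantity_sold)
FROM   sales GROUP BY quantity_sold
UNION ALL
SELECT CAST(NULL AS NUMBER), CAST(NULL AS NUMBER), time_id, SUM(quantity_sold)
FROM   sales GROUP BY time_id
UNION ALL
-- Empty grouping set: grand total
SELECT CAST(NULL AS NUMBER), CAST(NULL AS NUMBER), CAST(NULL AS DATE), SUM(quantity_sold)
FROM   sales;
```

Each branch scans SALES again, which is the repeated work the CUBE operator is meant to avoid.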
OUTPUT:
(4 marks)
Provide Explain Plan statements & outputs for the above 2 SQL queries you
have run to compare the performance of these 2 SQL queries and provide your
discussion of their costs (4 marks):
Explain Plan Statement for CUBE query:
EXPLAIN PLAN FOR
SELECT
PROD_ID,
QUANTITY_SOLD,
TIME_ID,
SUM (QUANTITY_SOLD)
FROM
SALES
GROUP BY
CUBE (PROD_ID, QUANTITY_SOLD, TIME_ID);
Output:
Output:
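The two plans can be compared without reading each full plan by looking at the root step (ID 0) in PLAN_TABLE, whose COST is the optimizer's total estimate for the statement. A sketch, assuming the CUBE and UNION ALL queries were each explained with EXPLAIN PLAN SET STATEMENT_ID = 'cube_q' FOR ... and EXPLAIN PLAN SET STATEMENT_ID = 'union_q' FOR ... respectively ('cube_q' and 'union_q' are hypothetical labels):

```sql
-- Optional: before re-running the two EXPLAIN PLANs, clear older rows
-- so that each label holds exactly one plan
DELETE FROM plan_table WHERE statement_id IN ('cube_q', 'union_q');

-- (re-run: EXPLAIN PLAN SET STATEMENT_ID = 'cube_q' FOR <CUBE query>;
--          EXPLAIN PLAN SET STATEMENT_ID = 'union_q' FOR <UNION ALL query>;)

-- Root row (ID 0) of each plan holds the total estimated cost and row count
SELECT   statement_id, operation, cost, cardinality
FROM     plan_table
WHERE    id = 0
AND      statement_id IN ('cube_q', 'union_q')
ORDER BY statement_id;
```

Because the UNION ALL form reads SALES once per branch, its cost would be expected to exceed that of the single CUBE statement, in line with the task description; the actual figures depend on the statistics in each schema.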
Jessica is the customer relations manager at UniTel. She wants to know how likely the
company's customers are to churn, based on previous experience, so that she can take
action to retain them.
verify the model accuracy. Data from all the columns is used to build the model. Several
algorithms could be selected to meet this requirement.
Oracle Data Mining (ODM) provides the following algorithms for classification:
Decision Tree
Naive Bayes
Generalized Linear Models (GLM)
Support Vector Machines (SVM)
If you have used PL/SQL API or RODM (R package for Interfacing ODM) then
provide here all the code you have used for this part including spool file contents /
outputs make sure that the output shows both the code and result / output when the
code has been executed. Hint: Use SET ECHO ON and SET SERVEROUTPUT ON.
Alternatively, if you have used the SQL-Developer’s Data Miner Workflows option,
then provide all the relevant screenshots making sure that each and every
screenshot shows your DMU username for your group, e.g., see the example
screenshots in Appendix 4.
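For the PL/SQL API route, a minimal sketch of building and applying a Decision Tree classifier with DBMS_DATA_MINING, assuming CUSTCHURN_FEB has CUSTID as a unique case identifier and CHURN as the target column. The settings table CHURN_DT_SETTINGS and the model CHURN_DT_MODEL are hypothetical names. Scoring the same table that was used for training does not measure accuracy on unseen data; for that check the data would need to be split into build and test sets:

```sql
SET ECHO ON
SET SERVEROUTPUT ON

-- Hypothetical settings table that names the algorithm to use
CREATE TABLE churn_dt_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Select Decision Tree; without this row, classification defaults to Naive Bayes
  INSERT INTO churn_dt_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_DECISION_TREE);
  COMMIT;

  -- Build a classification model that predicts CHURN for each CUSTID
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CHURN_DT_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTCHURN_FEB',
    case_id_column_name => 'CUSTID',
    target_column_name  => 'CHURN',
    settings_table_name => 'CHURN_DT_SETTINGS');
END;
/

-- Score a few customers with the predicted class and its probability
SELECT custid,
       PREDICTION(churn_dt_model USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(churn_dt_model USING *) AS churn_probability
FROM   custchurn_feb
WHERE  ROWNUM <= 10;
```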
In this part we analyse the data of a telecom company, UniTel, from the table CUSTCHURN_FEB.
The table contains the following columns:
GENDER
SENIORCITIZEN
PARTNER
DEPENDENTS
TENURE
PHONESERVICE
MULTIPLELINES
MULTIPLELINES
INTERNETSERVICE
ONLINESECURITY
ONLINEBACKUP
DEVICEPROTECTION
TECHSUPPORT
STREAMINGTV
STREAMINGMOVIES
CONTRACT
PAPERLESSBILLING
MONTHLYCHARGES
TOTALCHARGES
CHURN
CUSTID
From the above data, we analyse the previous year's data in the Oracle database to help Jessica, the
customer relations manager at UniTel, filter the data and make the necessary recommendations. To perform
this task we use the data miner's account in Oracle SQL*Plus, with SQL and PL/SQL as the query languages.
The spooled output is copied and pasted into this document.
TENURE COMPARISON:
SQL> SET ECHO ON
SQL> SPOOL ON
SQL> SELECT CUSTID, MONTHLYCHARGES, TOTALCHARGES
2 FROM CUSTCHURN_FEB
3 WHERE TOTALCHARGES> 5000
4 AND MONTHLYCHARGES> = 100
5 AND TENURE < 50
6 ORDER BY TOTALCHARGES, TENURE;
38 rows selected.
Evaluation:
In the above code, we used a WHERE clause with several conditions and an ORDER BY clause to filter and sort
the data in CUSTCHURN_FEB. These commands are explained below:
SET ECHO ON:
This SQL*Plus setting lists each command of a script as it is executed, so that the spooled output shows both
the commands and their results.
SPOOL file_name | OFF:
This SQL*Plus command writes the output displayed on screen to the named file; SPOOL OFF stops spooling and
closes the file.
ORDER BY:
This clause sorts the rows returned by a SELECT statement. Here the rows are sorted in ascending order by
TOTALCHARGES and then by TENURE.
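The recommendations below refer to the customers with the highest charges. Building on ORDER BY, a descending sort with a row limit returns them directly. A sketch, assuming TOTALCHARGES is numeric (as the TOTALCHARGES > 5000 comparison above suggests) and Oracle 12c or later for FETCH FIRST; NULLS LAST is added because Oracle places NULLs first in a descending sort by default:

```sql
-- Five customers with the highest total charges, with their tenure and churn status
SELECT   custid, tenure, monthlycharges, totalcharges, churn
FROM     custchurn_feb
ORDER BY totalcharges DESC NULLS LAST
FETCH FIRST 5 ROWS ONLY;
```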
RECOMMENDATIONS:
From the data obtained from the above queries, we can see information about the company's customers, and
recommendations can be made. As customer relations manager, Jessica is advised to analyse the data based on
the annual charges paid and the tenure information. From the above table, here are the top 5 users with
high annual charges:
Part 3:
PL/SQL
Information comes from many sources, including statistics, records, the internet and news agencies that
provide access to large amounts of relevant information. In summary, Structured Query Language (SQL) was
introduced decades ago as the standard language for querying and managing relational databases. Various
versions of SQL from multiple companies are available, and some articles and journals discuss the variant of
SQL offered by Google.
One widely used SQL-based product is Microsoft SQL Server (MS SQL Server).
SQL Server was developed for the Windows operating system and is owned and controlled by
Microsoft.
SQL Server and its related services are installed and configured as separate
components.
Transact-SQL (T-SQL) is the SQL dialect available in MS SQL Server on Windows Server, and it is
mainly used to manage transactions.
Because Google's variant was built by Google, it was used only within Google's own platforms
before it became available on other software platforms in 2016.
The SQL Server architecture consists of the following components: Database Server, Relational Server and
Storage Server. They are described as follows.
1. Database Server
A database is a collection of separate information objects that the administrator can
modify.
The server lets clients submit queries and provides a processing layer that handles the
stored data files.
Database objects such as triggers, views and stored procedures are created and executed
by the SQL database.
2. Relational Server
Relationships link two independent datasets, or tables within the same server. Data is
stored in the form of tables, that is, combinations of rows and columns.
This component handles query execution, storage handling, buffer maintenance and many
other tasks.
It relies on a storage component that manages the data on disk.
3. Storage Server
Disadvantages
For large databases, storage systems are not always the ideal setting.
Because information must be processed, converted and then loaded into a separate warehouse,
data warehouse software introduces some variance between the source data and the warehouse.
Data warehouses face enormous demand throughout their lifetime, and repair and maintenance
costs are high.
Data Mining
Data mining is the process of collecting and re-organizing knowledge from multiple datasets of an
organization, for purposes other than those the computer systems were originally designed for. It offers a
way of extracting previously unknown, reliable information from the data stored in data warehouses.
Although the results depend mostly on the validity of the research and of the organization's data, the data
mining methodology is common to many organizations.
Evaluation of Data:
In the given task, we need to perform data warehouse operations on two schemas, SH2 and DWU192, for example
on the tables SH2.CHANNELS and DWU192.CHANNELS. In each schema the table has the following
channels and attributes:
For SH2.channels:
C CHANNEL_DESC CHANNEL_CLASS
- -------------------- --------------------
For DWU192.CHANNELS:
C CHANNEL_DESC CHANNEL_CLASS
- -------------------- --------------------
S Direct Sales Direct
T Tele Sales Direct
C Catalog Indirect
I Internet Indirect
P Partners Franchise
For the above data, we used various methods to create indexes and materialized views and to run explain
plan statements in Part 1 of this paper. The output is shown in spool format.
The second part of the task involves Oracle Data Mining techniques implemented with PL/SQL. In this task we
analyse the customer churn data of the telecom company UniTel and help Jessica, the customer relations
manager, take the necessary actions based on the previous year's data.
The data for this part is taken from CUSTCHURN_FEB.
Appendix 1
You should keep records of your group discussions, meetings and communications
and any group code of conduct you agree. Unless otherwise instructed, you should
keep these as they may be requested as evidence by the module team. It is your
responsibility to try and make the group work. If there are problems during the
assessment you are expected to try and resolve these and the evidence should
show this.
Peer assessment will be used to adjust the marks you will get for assignment 2. You
are asked to think carefully about completing peer assessments. If someone has
contributed little to the assignment, why give them full marks allocated for the entire
assignment? It only diminishes the high standards of other students. Conversely, if
someone has worked really hard and well, they should be rewarded with good
marks. If everyone in the group has been ‘pulling their weight’ as agreed and
expected of them, then marks should be shared evenly, in which case, you may decide
not to submit any peer assessment forms. Further guidance is given in Appendix 2.
You should write brief comments justifying the marks you have given your peers and
yourself. This is especially important for any peer mark that is less than 5 or greater
than 8.
Each student should submit the peer assessment form separately on Blackboard.
Once all the peer assessment marks have been recorded, the average peer
assessment mark for the team and for each member is calculated.
Initially, the group work is marked out of 100% and then the individual and team
averages are used to determine each individual’s mark.
The following example illustrates how the individual mark is calculated for each
member of the group. The group mark is 60 out of 100 and the average group peer
assessment contribution mark is 8 out of 10.
A gets 9/8 * 60 = 67.5 / 100
B gets 7/8 * 60 = 52.5 / 100
C gets 6/8 * 60 = 45 / 100
Student A is rewarded for the extra effort put in, which is recognised by his/her
peers.
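The calculation in the example can be written as a single formula, where a member's average peer mark and the group average peer mark are both out of 10 and the group mark is out of 100:

```latex
\begin{aligned}
\text{individual mark} &= \frac{\text{member's average peer mark}}{\text{group average peer mark}} \times \text{group mark} \\
A &= \tfrac{9}{8} \times 60 = 67.5, \quad B = \tfrac{7}{8} \times 60 = 52.5, \quad C = \tfrac{6}{8} \times 60 = 45
\end{aligned}
```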
This table is intended for guidance rather than as a definitive document. All the
categories are ranges and you should use your judgement to position individuals
within these ranges. It is suggested that you mark in the three categories and
average the final mark. This may be a weighted average depending on your views of
the relative importance for the specific assignment.
Remember that not everybody in a team can be or should be a leader or have a
specifically defined role. This depends on how you as a team agreed to organise
yourselves and share the work.
Appendix 3
Group _________________________________________________
Write down the names of each group member (including yourself) and alongside
give a score out of 10. You are advised to read through the information on ‘Peer
Assessment’ and ‘How the Formula is Calculated’ before commencing.
You are expected to comment on each mark (including your own) to explain why you
have given it. However, it is compulsory for you to make a comment if you have
given any member a peer assessment mark that is either less than 5 or greater than
8.