Database Resit

This document provides a template for submitting Assessment #2 for the Advanced Databases module. It includes instructions for submission, the assignment questions, and appendices. The assignment involves data warehousing tasks including analyzing indexes on the SH2 database, creating new indexes on the DWU192 database and analyzing their performance impact, and discussing the usefulness of materialized views on the SH2 database through cost-based analysis of sample queries. Students are asked to submit SQL queries and code to answer the questions.

Uploaded by

V M
Assessment # 2 Submission Template

Advanced Databases (KL7011)

Programmes: MSc Advanced Computer Science / MSc Data Science

Module Code: KL7011

Module Title: Advanced Databases

Distributed on: Monday 7th December 2020

Submission Time and Date: Friday 22nd January 2021 by 18:00 GMT
Date by which Work and Feedback will be returned to Students: 23rd February 2021

Weighting: This coursework accounts for 40% of the total marks for this module

Group Work: This assessment is designed to be undertaken by a group comprising THREE students

Student IDs/Oracle Username (DWUs and DMU): DWU192

Names of students in the Group:

Group No:

Instructions on Submission:

• ONLY ONE submission is required for each group, to be submitted on Blackboard.
• The names of students in the group must be provided in the above box and must
match the group no and names already agreed on the Google doc.
• Marks allocated for your submission will be shared equally by all the students
within the group (a max of 3 members per group). However, if some members
have not contributed to the assignment as agreed and expected of them, then a
peer-assessment form should be filled and submitted on the Blackboard by each
member of the group. See Appendixes 1, 2 and 3.

Assignment Questions

Part 1: Data Warehousing Tasks (50 Marks)

This part is based on the Sales History scenario as described in Appendix 1.

You must submit all the SQL queries and any other code that you wrote in
answering any of the tasks / questions (e.g., the use of Explain Plan
statements for the queries and their outputs using Spooling or other suitable
means).

(A) Study the index definitions in sh_idx.sql. Discuss in detail (using cost-based
analysis) why these indexes (choose three different ones) are useful for
answering queries over the SH2 and DWU versions of the database. You
should not run the sh_idx.sql script at all.
(9 marks)
Answer Part 1 (A)
Provide the details of the 3 indexes whose performance impact on SH2 you are going
to compare (i.e., name the indexes and the tables on which they were created in SH2;
these indexes must not exist in your DWU version) (1 Mark):

Provide the 3 SQL queries you are going to run to compare the performance impact of
the above 3 Indexes on SH2 and the version of the same queries on DWU (3 marks):

Provide Explain Plan statements & outputs for the above 3 SQL queries you have run
to compare the performance impact of those 3 Indexes on SH2 and their version of
the same queries on DWU (3 marks):

i. sales_prod_bix on SH2
Explain plan for
Select * from SH2.Sales where prod_id = 99;
Output:

On DWU192
Explain plan for
Select * from DWU192.Sales where prod_id = 99;
Output:

ii. sales_promo_bix on SH2
Explain plan for
Select * from SH2.sales where promo_id = 1234;
Output:

On DWU192
Explain plan for
Select * from DWU192.sales where promo_id = 1234;
Output:

iii. customers_year_of_birth_bix on SH2
Explain Plan for
Select * from SH2.Customers where Cust_year_of_birth = 1990;
Output:

On DWU192
Explain Plan for
Select * from DWU192.Customers where Cust_year_of_birth =
1990;
Output:
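
The Output entries above are meant to hold the plan text. As a minimal sketch, assuming the default PLAN_TABLE is available in the session, the plan produced by the most recent EXPLAIN PLAN statement can be printed with DBMS_XPLAN.DISPLAY, which lists a Cost column for every operation:

```sql
-- Sketch only: assumes the session can read both schemas and uses the default PLAN_TABLE.
EXPLAIN PLAN FOR
SELECT * FROM SH2.Sales WHERE prod_id = 99;

-- Prints the plan that was just explained, one row per operation.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR
SELECT * FROM DWU192.Sales WHERE prod_id = 99;

-- Same call again: DISPLAY shows the latest explained statement by default.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Comparing the two printed plans side by side shows whether an index operation or a full table access is used in each schema, and what the total cost on the first plan line is.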

Provide Discussion of the cost-based comparison of the above 3 sets of
queries and their explain plan cost figures (2 marks):
i. sales_prod_bix: the plan cost is 60 on the SH2 schema, where the index exists,
and 3280 on DWU192, where it does not (roughly 55 times higher).
ii. sales_promo_bix: the plan cost is 111 on the SH2 schema and 3282 on DWU192
(roughly 30 times higher).
iii. customers_year_of_birth_bix: the plan cost is 128 on the SH2 schema and 273
on DWU192 (roughly twice as high).

(B) Identify three new indexes and justify why they could be useful. Write the SQL
code for creating these indexes under your DWU account. Give example
queries with cost-based analysis for both DWU account (which will have the
new indexes) and SH2 shared schema (which will NOT have any of your new
indexes). Alternatively, you may choose to run the same queries on your DWU
account before and after creating your proposed three indexes.

(9 marks)

Answer Part 1 (B)

Provide the SQL Code for the 3 new indexes you have created on your DWU database
for comparing their performance impact on DWU (i.e., these indexes must not exist in
SH2) (3 Marks):
i. Customer_email_ix
CREATE index customer_email_ix ON
DWU192.Customers(cust_email)
NOLOGGING COMPUTE STATISTICS;

ii. Customer_city_ix
CREATE index customer_city_ix ON
DWU192.Customers(cust_city)
NOLOGGING COMPUTE STATISTICS;

iii. Prod_min_price_ix
CREATE index Prod_min_price_ix ON
DWU192.Products(Prod_min_price)
NOLOGGING COMPUTE STATISTICS;

Provide 3 SQL queries you are going to run to compare the performance impact of
your own 3 new Indexes on DWU and the version of the same queries on SH2 (3
marks):
i. Customer_email_ix
Select * from DWU192.Customers WHERE cust_email =
'[email protected]';

ii. Customer_city_ix
Select * from DWU192.Customers WHERE cust_city =
'Evinston';

iii. Prod_min_price_ix
Select * from DWU192.Products WHERE Prod_min_price = 5;

Provide Explain Plan statements & outputs for the above 3 SQL queries you have run
to compare the performance impact of your 3 Indexes on DWU and their version of the
same queries on SH2 together with brief discussion on how the cost figures compare
(3 marks):
i. Explain plan for Customer_email_ix
Explain plan for
Select * from DWU192.Customers WHERE cust_email =
'[email protected]';
Plan output After Indexing:

ii. Explain plan for Customer_city_ix
Explain plan for
Select * from DWU192.Customers WHERE cust_city =
'Evinston';
Plan output After Indexing:

iii. Explain plan for Prod_min_price_ix
Explain Plan For
Select * from DWU192.Products WHERE Prod_min_price = 5;
Plan output After Indexing:

Cost Based Discussion
i. Customer_email_ix: the plan cost is 273 before the index is created and 79
after.
ii. Customer_city_ix: the plan cost is 81 before the index is created and 46
after.
iii. Prod_min_price_ix: the plan cost is 3 before the index is created and 4
after, so this index brings no benefit for the sample query.

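
One way to collect before and after figures like these in a single listing is to label each EXPLAIN PLAN with a statement ID and read the totals back from PLAN_TABLE. The sketch below is only an illustration: the labels 'city_before' and 'city_after' are hypothetical, and it assumes the index has not been created yet when the first statement runs.

```sql
-- Sketch only: the statement IDs are illustrative labels, not part of the assignment.
EXPLAIN PLAN SET STATEMENT_ID = 'city_before' FOR
SELECT * FROM DWU192.Customers WHERE cust_city = 'Evinston';

CREATE INDEX customer_city_ix ON DWU192.Customers(cust_city);

EXPLAIN PLAN SET STATEMENT_ID = 'city_after' FOR
SELECT * FROM DWU192.Customers WHERE cust_city = 'Evinston';

-- The row with id = 0 of each plan carries the estimated cost of the whole statement.
SELECT statement_id, cost
FROM plan_table
WHERE id = 0
AND statement_id IN ('city_before', 'city_after')
ORDER BY statement_id;
```

Because both plans stay in PLAN_TABLE, the same query can be rerun later to quote the two cost figures together in the discussion.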
(C) Given the two materialized views (MVs) defined in sh_cremv.sql, discuss in
detail why these MVs are useful for users of the SH database. You should
provide detailed examples of cost-based analysis, e.g., using Explain Plan for
running sample queries on both SH2 and DWU to illustrate your answer. You
should not run the sh_cremv.sql script at all.

(8 marks)

Answer Part 1 (C)

Provide 2 SQL queries you are going to run to compare the performance impact of the
2 MVs in SH2 and the version of the same queries on DWU (2 marks):
i. cal_month_sales_mv on SH2
SELECT * FROM SH2.cal_month_sales_mv;

Calendar month description for DWU192
SELECT
t.calendar_month_desc, sum(s.amount_sold)
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.calendar_month_desc;

ii. fweek_pscat_sales_mv on SH2
SELECT * FROM SH2.fweek_pscat_sales_mv;

Weekly sales report on DWU192
SELECT t.week_ending_day, p.prod_subcategory,
sum(s.amount_sold)
FROM sales s, times t, products p
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
GROUP BY t.week_ending_day, p.prod_subcategory;

Provide Explain Plan statements & outputs for the above 2 SQL queries you have run
to compare the performance impact of those 2 MVs in SH2 and their version of the
same queries on DWU (3 marks):
i. Explain Plan Statement for cal_month_sales_mv on SH2 schema:
EXPLAIN PLAN FOR
SELECT * FROM SH2.cal_month_sales_mv;
Plan Output:

Explain Plan Statement for the calendar_month_desc query on DWU192:
EXPLAIN PLAN FOR
SELECT t.calendar_month_desc, sum(s.amount_sold)
FROM DWU192.sales s, DWU192.times t
WHERE s.time_id = t.time_id
GROUP BY t.calendar_month_desc;
Plan Output:

ii. Explain Plan Statement for fweek_pscat_sales_mv on SH2:
EXPLAIN PLAN FOR
SELECT * FROM SH2.fweek_pscat_sales_mv;
Explain Plan Output:

Explain Plan Statement for the fweek_pscat_sales_mv query on DWU192:
EXPLAIN PLAN FOR
SELECT t.week_ending_day,
p.prod_subcategory,
sum(s.amount_sold) AS Money,
s.channel_id,
s.promo_id
FROM DWU192.sales s,
DWU192.times t,
DWU192.products p
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
GROUP BY t.week_ending_day,
p.prod_subcategory, s.channel_id, s.promo_id;
Explain Plan Output:

Provide Discussion of the cost-based comparison of the above 2 sets of queries and
their explain plan cost figures (3 marks):
i. For cal_month_sales_mv, the plan cost of the query is 3 on the SH2 schema and
525 on the DWU192 schema.
ii. For fweek_pscat_sales_mv, the plan cost of the query is 225 on both the SH2
and DWU192 schemas.

(D) Identify three new MVs based on the base tables in the SH schema under your
DWU account and justify why they would be useful for the users of your data
warehouse. Write the SQL code for creating these MVs. Moreover, run sample
queries on both SH2 and DWU to ensure that queries running on DWU will be
re-written by Oracle to use your proposed three MVs instead of the base tables
used in the sample queries. Note that you must not query your MVs directly in
the FROM clause; let the Oracle Query Optimizer re-write the queries and
answer them using your proposed MVs.
(12 marks)

Answer Part 1 (D)

Provide SQL code you used to create the 3 new MVs you created in your own DWU
database (i.e., these MVs must not exist in SH2) (3 marks):
i. Sales_Per_Channel_Mv
CREATE MATERIALIZED VIEW Sales_Per_Channel_Mv
ENABLE QUERY REWRITE
AS
SELECT DWU192.Channels.Channel_Desc,
Count(*) AS sales_count FROM DWU192.SALES INNER JOIN DWU192.CHANNELS
ON DWU192.SALES.CHANNEL_ID = DWU192.Channels.CHANNEL_ID
GROUP BY DWU192.Channels.Channel_Desc;

ii. Customers_Per_city_Mv
CREATE MATERIALIZED VIEW Customers_Per_city_Mv
ENABLE QUERY REWRITE
AS
SELECT Cust_city,
Count(*) AS cust_count FROM DWU192.Customers
GROUP BY cust_city;

iii. Customers_Per_gender_Mv
CREATE MATERIALIZED VIEW Customers_Per_gender_Mv
ENABLE QUERY REWRITE
AS
SELECT Cust_gender,
Count(*) AS cust_count FROM DWU192.Customers
GROUP BY cust_gender;
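
Since the task asks for queries that Oracle rewrites to use the MVs rather than queries that name them in the FROM clause, a rewrite check could look like the sketch below. It assumes that Sales_Per_Channel_Mv already exists, that the session is allowed to change the parameter, and that the MV is fresh enough for the optimizer to consider it; whether the rewrite actually happens depends on the optimizer.

```sql
-- Sketch only: assumes Sales_Per_Channel_Mv exists and query rewrite is permitted.
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- The query names only the base tables, never the MV.
EXPLAIN PLAN FOR
SELECT c.channel_desc, COUNT(*)
FROM DWU192.sales s INNER JOIN DWU192.channels c
ON s.channel_id = c.channel_id
GROUP BY c.channel_desc;

-- If the rewrite took place, the plan shows an operation such as
-- MAT_VIEW REWRITE ACCESS FULL on SALES_PER_CHANNEL_MV instead of the join.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running the same base-table query against SH2, where the MV does not exist, gives the cost of the unrewritten join for comparison.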

Provide the 3 SQL queries you are going to run to compare the performance impact of
your own 3 new MVs on DWU and the version of the same queries on SH2 (3 marks):
i. SQL query for Sales_Per_Channel_Mv on DWU192:
Select * from Sales_Per_Channel_Mv;

SQL query for Sales_Per_Channel_Mv on SH2:
SELECT Channels.Channel_Desc,
Count (*) FROM SH2.SALES INNER JOIN SH2.CHANNELS ON
SH2.SALES.CHANNEL_ID = SH2.Channels.CHANNEL_ID
GROUP BY Channels.Channel_Desc;

ii. SQL query for Customers_Per_City_Mv on SH2:
SELECT Cust_city, Count(*) FROM SH2.Customers
GROUP BY cust_city;

SQL query for Customers_Per_City_Mv on DWU192:
Select * from DWU192.Customers_Per_City_Mv;
Spool output:

CUST_ID CUST_FIRST_NAME CUST_LAST_NAME C


---------- -------------------- ---------------------------------------- -
CUST_YEAR_OF_BIRTH CUST_MARITAL_STATUS CUST_INCOME_LEVEL
------------------ -------------------- ------------------------------
CUST_CREDIT_LIMIT CUST_STREET_ADDRESS
----------------- ----------------------------------------
CUST_CITY CUST_STATE_PROVINCE
------------------------------ ----------------------------------------
CUST_POSTA CO CUST_MAIN_PHONE_NUMBER CUST_EMAIL
---------- -- ------------------------- ------------------------------
Quartzhill CA
89273 US 180-147-5425 [email protected]

CUST_ID CUST_FIRST_NAME CUST_LAST_NAME C


---------- -------------------- ---------------------------------------- -
CUST_YEAR_OF_BIRTH CUST_MARITAL_STATUS CUST_INCOME_LEVEL
------------------ -------------------- ------------------------------
CUST_CREDIT_LIMIT CUST_STREET_ADDRESS
----------------- ----------------------------------------
CUST_CITY CUST_STATE_PROVINCE
------------------------------ ----------------------------------------
CUST_POSTA CO CUST_MAIN_PHONE_NUMBER CUST_EMAIL
---------- -- ------------------------- ------------------------------
489500 Desma Niu M
1975 married J: 190,000 - 249,999

7000 117 North Hennepin Avenue

CUST_ID CUST_FIRST_NAME CUST_LAST_NAME C


---------- -------------------- ---------------------------------------- -
CUST_YEAR_OF_BIRTH CUST_MARITAL_STATUS CUST_INCOME_LEVEL
------------------ -------------------- ------------------------------
CUST_CREDIT_LIMIT CUST_STREET_ADDRESS
----------------- ----------------------------------------
CUST_CITY CUST_STATE_PROVINCE
------------------------------ ----------------------------------------
CUST_POSTA CO CUST_MAIN_PHONE_NUMBER CUST_EMAIL
---------- -- ------------------------- ------------------------------
Cardiff Wales - South Glamorgan
85057 UK 222-517-9641 [email protected]

50000 rows selected.

iii. SQL query for Customers_Per_gender_Mv on SH2:
SELECT SH2.customers.cust_gender, Count(*) AS custCount FROM
SH2.Customers
GROUP BY SH2.Customers.cust_gender;

SQL query for Customers_Per_gender_Mv on DWU192:
Select * from DWU192.Customers_Per_gender_Mv;

Provide Explain Plan statements & outputs for the above 3 SQL queries you have run
to compare the performance impact of your 3 MVs on DWU and their version of the
same queries on SH2 (3 marks)

i. Explain plan for Sales_Per_Channel_Mv on DWU192:
Explain plan for
Select * from DWU192.Sales_Per_Channel_Mv;
Output:

Explain plan for Sales_Per_Channel_Mv on SH2:
Explain Plan for
SELECT Channels.Channel_Desc,
Count(*) FROM SH2.SALES INNER JOIN SH2.CHANNELS ON
SH2.SALES.CHANNEL_ID = SH2.Channels.CHANNEL_ID
GROUP BY Channels.Channel_Desc;
Output:

ii. Explain plan for Customers_Per_City_Mv on DWU192:
Explain plan for
Select * from DWU192.Customers_Per_City_Mv;
Output:

Explain plan for Customers_Per_City_Mv on SH2:
Explain Plan for
SELECT Cust_city,
Count (*) FROM SH2.Customers
GROUP BY cust_city;
Output:

iii. Explain plan for Customers_Per_gender_Mv on DWU192:
Explain plan for
Select * from DWU192.Customers_Per_gender_Mv;
Output:

Explain plan for Customers_Per_gender_Mv on SH2:
Explain plan for
SELECT SH2.customers.cust_gender, Count(*) AS custCount
FROM
SH2.Customers
GROUP BY SH2.Customers.cust_gender;
Output:

Provide Discussion of the cost-based comparison of the above 3 sets of queries and
their explain plan cost figures (3 marks):

i. Sales_Per_Channel_Mv: the plan cost is 3 on DWU192, where the MV is read, and
135 on SH2, where the base tables are joined.

ii. Customers_Per_City_Mv: the plan cost is 281 on DWU192 and 275 on SH2, so the
MV shows no cost advantage for this query.

iii. Customers_Per_gender_Mv: the plan cost is 3 on DWU192 and 4 on SH2.

(E) Prior to the introduction of the special aggregation function CUBE, there was no
possibility to express an aggregation over different levels within a single SQL
statement without using the set operation UNION ALL. Every different
aggregation level needed its own SQL aggregation expression, operating on
the exact same data set n times, once for each of the n different aggregation
levels. With the introduction of CUBE in the recent database systems, Oracle
provided a single SQL command for handling the aggregation over different
levels within a single SQL statement, not only improving the runtime of this
operation but also reducing the number of internal operations necessary to run
the query and reducing the workload on the system.

i. Using CUBE, write an SQL query over the SH schema under your DWU
account involving one fact table (SALES or COSTS) and at least two
dimension tables and at least 3 grouping attributes. Provide output of
successful execution of your query. Provide reasons why your query may
be useful for users of the SH data warehouse.
(3 marks)
CUBE: This is an extension of the GROUP BY clause that lets the user generate, in a single
query, aggregates for every combination of the listed grouping columns (2^n grouping sets
for n columns).

Syntax:

SELECT
C1,
C2,
C3,
AGGREGATE_FUNCTION (C4)
FROM TABLE_NAME
GROUP BY
CUBE (C1, C2, C3);

Example:
SELECT
PROD_ID,
QUANTITY_SOLD,
TIME_ID,
SUM (QUANTITY_SOLD)
FROM
SALES
GROUP BY
CUBE (PROD_ID, QUANTITY_SOLD, TIME_ID);

Output (final rows of the result):

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 04-DEC-01 12
49980 07-DEC-01 12
49980 09-DEC-01 47
49980 12-DEC-01 12
49980 19-DEC-01 11
49980 25-DEC-01 3
49980 29-DEC-01 12
49980 01-JAN-02 42
49980 02-OCT-02 11
49980 05-OCT-02 29
49980 08-NOV-02 12

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 04-DEC-02 33
49980 05-DEC-02 24
49980 21-DEC-02 3
49980 23-DEC-02 18
49980 25-DEC-02 6
49980 27-DEC-02 28
49980 29-DEC-02 18
49980 01-JAN-03 1
49980 04-JAN-03 2
49980 07-NOV-03 3
49980 1 3

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 1 09-NOV-01 1
49980 1 21-NOV-01 1
49980 1 01-JAN-03 1
49980 2 4
49980 2 23-OCT-01 2
49980 2 04-JAN-03 2
49980 3 12
49980 3 24-OCT-01 3

49980 3 25-DEC-01 3
49980 3 21-DEC-02 3
49980 3 07-NOV-03 3

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 4 16
49980 4 13-JAN-01 4
49980 4 28-JAN-01 8
49980 4 15-OCT-01 4
49980 6 6
49980 6 25-DEC-02 6
49980 8 8
49980 8 23-JAN-01 8
49980 11 55
49980 11 18-JAN-01 11
49980 11 15-NOV-01 11

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 11 27-NOV-01 11
49980 11 19-DEC-01 11
49980 11 02-OCT-02 11
49980 12 72
49980 12 18-JAN-01 12
49980 12 04-DEC-01 12
49980 12 07-DEC-01 12
49980 12 12-DEC-01 12
49980 12 29-DEC-01 12
49980 12 08-NOV-02 12
49980 18 54

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 18 16-OCT-01 18
49980 18 23-DEC-02 18
49980 18 29-DEC-02 18
49980 21 21
49980 21 08-JAN-01 21
49980 24 24
49980 24 05-DEC-02 24
49980 26 26
49980 26 23-JAN-01 26
49980 28 28
49980 28 27-DEC-02 28

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 29 29
49980 29 05-OCT-02 29
49980 32 32
49980 32 01-NOV-01 32
49980 33 33
49980 33 04-DEC-02 33
49980 34 34
49980 34 20-NOV-01 34

49980 37 37
49980 37 13-JAN-01 37
49980 42 42

PROD_ID QUANTITY_SOLD TIME_ID SUM(QUANTITY_SOLD)


---------- ------------- --------- ------------------
49980 42 01-JAN-02 42
49980 47 47
49980 47 09-DEC-01 47

1984227 rows selected.

ii. Using set operation UNION ALL (and not CUBE), write an SQL query that
produces the same result as the query in (i) above. Provide output of
successful execution of your query.
(5 marks)
Provide the UNION ALL query, its output / spool result:

SELECT PROD_ID, QUANTITY_SOLD
FROM SALES
WHERE QUANTITY_SOLD > 250
UNION ALL
SELECT PROD_ID, UNIT_PRICE
FROM COSTS
WHERE UNIT_PRICE > 100
ORDER BY 1;

OUTPUT:

iii. Using EXPLAIN PLAN, provide a detailed discussion analysing costs of
evaluating the above queries (i.e. with and without CUBE).

(4 marks)

Provide Explain Plan statements & outputs for the above 2 SQL queries you
have run to compare the performance of these 2 SQL queries and provide your
discussion of their costs (4 marks):
Explain Plan Statement for CUBE query:
EXPLAIN PLAN FOR
SELECT
PROD_ID,
QUANTITY_SOLD,
TIME_ID,
SUM (QUANTITY_SOLD)
FROM
SALES
GROUP BY
CUBE (PROD_ID, QUANTITY_SOLD, TIME_ID);
Output:

Explain Plan Statement for UNION ALL query:


EXPLAIN PLAN FOR
SELECT PROD_ID, QUANTITY_SOLD
FROM SALES
WHERE QUANTITY_SOLD > 250
UNION ALL
SELECT PROD_ID, UNIT_PRICE
FROM COSTS
WHERE UNIT_PRICE > 100
ORDER BY 1;

Output:

Cost Based Discussion:
The plan costs reported for the CUBE and UNION ALL queries are:
CUBE: 8621
UNION ALL: 6949
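
For the two costs to be comparable, the UNION ALL statement needs one branch per grouping set of the CUBE. The sketch below illustrates the pattern on two grouping columns (four grouping sets) to keep it short; the column choice is an assumption for illustration, and a three-column CUBE would need eight branches built the same way. It also assumes the grouping columns contain no NULL values of their own.

```sql
-- CUBE over two columns produces 2^2 = 4 grouping sets in one pass.
SELECT channel_id, promo_id, SUM(quantity_sold) AS total_qty
FROM sales
GROUP BY CUBE (channel_id, promo_id);

-- The same rows built by hand: one branch per grouping set, with NULL
-- standing in for each column that is aggregated away in that branch.
SELECT channel_id, promo_id, SUM(quantity_sold) AS total_qty
FROM sales
GROUP BY channel_id, promo_id
UNION ALL
SELECT channel_id, CAST(NULL AS NUMBER), SUM(quantity_sold)
FROM sales
GROUP BY channel_id
UNION ALL
SELECT CAST(NULL AS NUMBER), promo_id, SUM(quantity_sold)
FROM sales
GROUP BY promo_id
UNION ALL
SELECT CAST(NULL AS NUMBER), CAST(NULL AS NUMBER), SUM(quantity_sold)
FROM sales;
```

The hand-built version reads SALES once per branch, which is the repeated work that the task description attributes to the UNION ALL approach. The two statements may return their rows in a different order unless an ORDER BY is added.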

Part 2: Data Mining Tasks (35 Marks)


This part is based on the UniTel scenario as described in Appendix 2. Moreover, you
must use the DMUn Oracle Data Mining Account (where 1 <= n <= 75, e.g., DMU1,
DMU2) allocated to your group.

Jessica is the customer relations manager at UniTel. She wants to know the
likelihood of churn among the company’s customers based on previous
experience, so that she can take action to retain them.

To help Jessica with her analysis, we need to investigate what could be a
suitable algorithm for solving her problem. The data from last year are used as the
training data and the data from February of this year are taken as the testing data to
verify the model accuracy. Data from all the columns are used to set up the model. To
meet the requirement, many algorithms can be selected.

Oracle Data Mining (ODM) provides the following algorithms for classification:

• Decision Tree
• Naive Bayes
• Generalized Linear Models (GLM)
• Support Vector Machines (SVM)

You are required to perform the following tasks:

1. Using PL/SQL API, SQL-Developer’s Data Miner Workflows or RODM (R
package for Interfacing ODM), develop at least TWO models based on the above
algorithms for the dataset accessible as CUSTOMERCHURN table (15 marks)

If you have used PL/SQL API or RODM (R package for Interfacing ODM) then
provide here all the code you have used for this part including spool file contents /
outputs make sure that the output shows both the code and result / output when the
code has been executed. Hint: Use SET ECHO ON and SET SERVEROUTPUT ON.

Alternatively, if you have used the SQL-Developer’s Data Miner Workflows option,
then provide all the relevant screenshots making sure that each and every
screenshot shows your DMU username for your group, e.g., see the example
screenshots in Appendix 4.
In this part we analyze the data of the telecom company UniTel from the table CUSTCHURN_FEB.
The columns available in this table are:
• GENDER
• SENIORCITIZEN
• PARTNER
• DEPENDENTS
• TENURE
• PHONESERVICE
• MULTIPLELINES
• INTERNETSERVICE
• ONLINESECURITY
• ONLINEBACKUP
• DEVICEPROTECTION
• TECHSUPPORT
• STREAMINGTV
• STREAMINGMOVIES
• CONTRACT
• PAPERLESSBILLING
• MONTHLYCHARGES
• TOTALCHARGES
• CHURN
• CUSTID

From the above data, we perform an analysis using the previous year's data held in the Oracle database
to help Jessica, the customer relations manager at UniTel, filter the data and make the necessary
recommendations. To perform this task we use the data mining account in Oracle SQL*Plus, with SQL and
PL/SQL as the query languages. The spooled output is copied and pasted into the document.

TENURE COMPARISON:
SQL> SET ECHO ON
SQL> SPOOL ON
SQL> SELECT CUSTID, MONTHLYCHARGES, TOTALCHARGES
2 FROM CUSTCHURN_FEB
3 WHERE TOTALCHARGES> 5000
4 AND MONTHLYCHARGES> = 100
5 AND TENURE < 50
6 ORDER BY TOTALCHARGES, TENURE;

CUSTID MONTHLYCHARGES TOTALCHARGES


---------- -------------- ------------
7141 112.2 5031.85
7158 103.7 5036.3
7154 108.1 5067.45
7153 106.1 5082.8
7157 106.65 5168.1

SQL> SPOOL OFF

SQL> SELECT CUSTID, MONTHLYCHARGES, TOTALCHARGES


2 FROM CUSTCHURN_FEB
3 WHERE TOTALCHARGES> 5000
4 AND MONTHLYCHARGES< = 100
5 ORDER BY TOTALCHARGES;

CUSTID MONTHLYCHARGES TOTALCHARGES


---------- -------------- ------------
7258 76.9 5023
7162 99 5038.15
7196 94.45 5124.6
7201 90.45 5229.8
7209 89.55 5231.2
7257 76.75 5233.25
7192 96.8 5283.95
7239 84.3 5289.05
7223 86.7 5309.5
7281 75.1 5336.35
7210 89.9 5450.7

CUSTID MONTHLYCHARGES TOTALCHARGES


---------- -------------- ------------
7256 79.6 5461.45
7274 79.05 5552.5
7265 81.5 5553.25
7234 81.25 5567.55
7195 98.6 5581.05
7222 90.7 5586.45
7233 86.55 5632.55
7272 80.6 5708.2
7203 99.15 5720.95
7220 92.3 5731.45
7228 90.45 5825.5

CUSTID MONTHLYCHARGES TOTALCHARGES


---------- -------------- ------------
7253 88.8 5903.15

7249 89.4 5976.9


7283 84.3 5997.1
7243 93.55 6069.25
7219 96.75 6125.4
7273 86.4 6172
7282 88.6 6201.95
7286 86.65 6224.8
7284 86.05 6309.65
7266 90.65 6322.1
7287 88.55 6362.35

CUSTID MONTHLYCHARGES TOTALCHARGES


---------- -------------- ------------
7238 99.25 6549.45
7288 92 6632.75
7275 97.65 6687.85
7245 99.5 6710.5
7268 97.65 6743.55

38 rows selected.

Evaluation:
In the above code, we used WHERE conditions and the ORDER BY clause to filter and sort the data in
CUSTCHURN_FEB. The commands are explained below:
SET ECHO ON:
This SQL*Plus command controls whether commands are echoed to the screen as they are executed (in
particular those run from a script), so that the output shows both the code and its result.

SPOOL file_name | OFF:
This SQL*Plus command directs the output of the session to a file. SPOOL OFF stops the spooling.

ORDER BY:
This is a clause of the SELECT statement in SQL. It sorts the rows of the result.
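
As a brief illustration of how these commands fit together in SQL*Plus, the sketch below spools one of the filter queries to a file. The file name is a hypothetical choice; SPOOL takes the name of the output file, and SPOOL OFF closes it.

```sql
-- Sketch only: the spool file name is an assumption.
SET ECHO ON
SET SERVEROUTPUT ON
SPOOL churn_queries.txt

SELECT CUSTID, MONTHLYCHARGES, TOTALCHARGES
FROM CUSTCHURN_FEB
WHERE TOTALCHARGES > 5000
AND MONTHLYCHARGES >= 100
AND TENURE < 50
ORDER BY TOTALCHARGES, TENURE;

SPOOL OFF
```

Everything displayed between the two SPOOL commands ends up in the file, which can then be pasted into the submission.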

RECOMMENDATIONS:
The queries above give a clear picture of the company’s customer data, from which recommendations can
be made. As customer relations manager, Jessica is recommended to analyze the data on the basis of the
total charges paid and the tenure information. From the above output, here are the 5 customers with the
highest total charges:

CUSTID MONTHLYCHARGES TOTALCHARGES


---------- -------------- ------------
7238 99.25 6549.45
7288 92 6632.75
7275 97.65 6687.85
7245 99.5 6710.5
7268 97.65 6743.55
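
For the model-building step named in the task, a minimal PL/SQL sketch of training one of the listed algorithms (Decision Tree) with DBMS_DATA_MINING could look as follows. The model name and settings table name are hypothetical, and the sketch assumes that CUSTOMERCHURN has the same CUSTID and CHURN columns as CUSTCHURN_FEB.

```sql
-- Sketch only: CHURN_DT_SETTINGS and CHURN_DT_MODEL are illustrative names.
CREATE TABLE churn_dt_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Select the Decision Tree algorithm for the classification model.
  INSERT INTO churn_dt_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_DECISION_TREE);
  COMMIT;

  -- Train on last year's data; CUSTID identifies a case, CHURN is the target.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CHURN_DT_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTOMERCHURN',
    case_id_column_name => 'CUSTID',
    target_column_name  => 'CHURN',
    settings_table_name => 'CHURN_DT_SETTINGS');
END;
/

-- Score the February test data and compare actual with predicted churn.
SELECT custid,
       churn AS actual_churn,
       PREDICTION(churn_dt_model USING *) AS predicted_churn
FROM custchurn_feb;
```

A second model, for example Naive Bayes, could be built the same way by using a separate settings table whose ALGO_NAME value is DBMS_DATA_MINING.ALGO_NAIVE_BAYES.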

Part 3:
PL/SQL
Data consists of facts and statistics, much of it shared through the internet and other sources that make
large volumes of information available. Structured Query Language (SQL) was introduced decades ago to
access and manage such data. Several vendors offer their own versions of SQL. This section looks at the
version of SQL offered by Microsoft.
• The relational database management system developed by Microsoft is Microsoft SQL Server, or
MS SQL Server for short.
• SQL Server is a Windows-based product developed and maintained by Microsoft.
• In general terms, SQL Server and its services are installed as separate components.
• T-SQL, or Transact-SQL, is the SQL dialect available on MS SQL Server; it is mainly concerned
with managing transactions.
• Because it is a Microsoft product, it existed only within the Microsoft environment until it
became usable on other platforms in 2016.

SQL Server consists of the following: Database Engine, Relational Engine, and Storage Engine. They are
described as follows.
1. Database Engine
A database is a collection of distinct data objects on which the user can perform
operations.
• The Database Engine has a relational component, which processes the queries issued
by a client, and a storage component, which manages the database files;
pages and indexes are also handled here.
• Objects such as triggers, views and stored procedures are both created and executed
by the Database Engine.

2. Relational Engine
Relationships are the links between two separate datasets or within the same database.
Data is held in the form of rows and columns, called tables.
• This handles query execution, memory management, buffer management,
threads, and more.
• It contains the query processing components.
3. Storage Engine
This is responsible for storing and retrieving data.
It does so through storage systems such as disks and SAN.
Data Warehousing
Data warehousing is a means of integrating data, analysing it and making it available to users within a
reasonable time frame so that it can make a difference.
• It supports the operational and complex management reporting that managers need for
strategic decision-making.
• A data warehouse is a well-organized store of business data, built for collecting data and
reporting on it.
• It is an approach to collecting and managing data from multiple sources in order to answer
business questions, enabling decisions that were not previously feasible.

Uses of Data Warehousing

Organizations typically start with relatively simple uses of data warehousing, and more sophisticated
uses evolve over time. The following stages of data warehouse use can be distinguished:
Offline Operational Database
In this initial stage, data warehouses are built by simply copying the data of an operational system to
another server, so that the processing load of reporting against the copied data does not affect the
performance of the operational system.
Advantages
 It gives business users an organization-wide view of the company's heterogeneous data by
helping to integrate data from sales, service, manufacturing and distribution, and other
customer-related business systems.
 It adds value for the company's customers by giving them access to better information
when data warehousing is combined with internet technology.

Disadvantages
 Data warehouses are not the ideal environment for unstructured data.
 Because data must be extracted, transformed, and loaded into the warehouse, data
warehouse data has an element of latency.
 Data warehouses can have high costs over their lifetime, and maintenance costs in
particular are high.

Data Mining
Data mining is the process of extracting and re-organizing knowledge from multiple databases of the
organization for purposes other than those for which the databases were originally designed. It offers a
way of extracting previously unknown, reliable information from the data available in data warehouses.
Although the details depend on the nature of the study and the organization, the data mining methodology is
common to many organizations.
Evaluation of Data:
In the given task, we need to perform data warehouse operations on two schemas, SH2 and DWU192,
each of which contains a CHANNELS table (SH2.CHANNELS and DWU192.CHANNELS). The tables have
the following rows and attributes in the database:
For SH2.channels:
C CHANNEL_DESC         CHANNEL_CLASS
- -------------------- --------------------
S Direct Sales         Direct
T Tele Sales           Direct
C Catalog              Indirect
I Internet             Indirect
P Partners             Franchise

For DWU192.CHANNELS:
C CHANNEL_DESC         CHANNEL_CLASS
- -------------------- --------------------
S Direct Sales         Direct
T Tele Sales           Direct
C Catalog              Indirect
I Internet             Indirect
P Partners             Franchise

For the above data, we used various methods to create different indexes, materialized views, and EXPLAIN
PLAN statements in Part 1 of this paper. The output is presented in spool format.
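
The effect of adding an index on a query plan can be illustrated in miniature. The sketch below is not the Oracle code used in Part 1: it uses Python's built-in sqlite3 module as a stand-in for Oracle, so the plan text differs from Oracle's EXPLAIN PLAN output. The column name channel_id (assumed from the truncated C header), the index name channels_class_idx, and the sample query are all assumptions made for this example.

```python
import sqlite3

# Rows copied from the CHANNELS listing above.
CHANNEL_ROWS = [
    ("S", "Direct Sales", "Direct"),
    ("T", "Tele Sales", "Direct"),
    ("C", "Catalog", "Indirect"),
    ("I", "Internet", "Indirect"),
    ("P", "Partners", "Franchise"),
]

# Hypothetical sample query that filters on CHANNEL_CLASS.
QUERY = "SELECT channel_desc FROM channels WHERE channel_class = 'Direct'"


def query_plan(conn, sql):
    """Return SQLite's plan description for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE channels ("
    "channel_id TEXT PRIMARY KEY, channel_desc TEXT, channel_class TEXT)"
)
conn.executemany("INSERT INTO channels VALUES (?, ?, ?)", CHANNEL_ROWS)

plan_before = query_plan(conn, QUERY)  # no index on channel_class yet
conn.execute("CREATE INDEX channels_class_idx ON channels (channel_class)")
plan_after = query_plan(conn, QUERY)   # the planner can now use the index

direct_channels = [
    row[0] for row in conn.execute(QUERY + " ORDER BY channel_desc")
]

print("before:", plan_before)
print("after: ", plan_after)
print(direct_channels)
```

Comparing the two printed plans shows the same kind of before-and-after evidence that the spooled EXPLAIN PLAN output provides in Oracle, although on a five-row table the cost difference itself is negligible.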
The second part of the given task involves Oracle Data Mining techniques, which are implemented using
PL/SQL. In this task, we analyze the customer churn data of a telecom company, ‘UniTel’, and help
Jessica, the customer relations manager, to take the necessary actions based on the previous year's data.
The data for this part is obtained from CUSTCHURN_FEB.

Bibliography

 Adriaans, P. a. (1996). Data Mining. Harlow, UK: Addison Wesley.

 Fahrner, C. &. (1995). A survey of database transformations based on the entity-
relationship model. Data & Knowledge Engineering.

 Lenz, H. and Shoshani, A. (1997). Summarizability in OLAP and statistical data bases.
In Proc. Int. Conf. on Scientific and Statistical Database Management, Olympia, WA,
pp. 132–143.

 Hammergren, T. (1996). Data Warehousing Strategies, Technologies, and
Techniques. New York: McGraw-Hill.

 Inmon, W. H. (1996). Building the Data Warehouse (2nd Edition). New York:
Wiley.

 Kelly, S. (1997). Data Warehousing in Action. Chichester, UK: Wiley.

 M. Golfarelli, D. M. The dimensional fact model: a conceptual model for data
warehouses. In S. Rizzi. Int. J. Coope. Informat. Sys.

 Mattison, R. (1996). Data Warehousing Strategies, Technologies, and Techniques.
New York: McGraw-Hill.

 Singh, H. (1999). Interactive Data Warehousing. Upper Saddle River, NJ: Prentice-
Hall PTR.

 Westphal, C. a. (1998). Data Mining Solutions. New York: Wiley.


Appendix 1

Group Work and Peer Assessment


The ability to work together as a group is one of the key skills that employers want.
Our programmes are structured to give you experiences of this. Skills and
techniques are covered in a variety of different modules and you are expected to
apply those learnt in one module in subsequent group work. Not all groups will work
together well and these can provide some of the most valuable learning experiences.

You should keep records of your group discussions, meetings and communications
and any group code of conduct you agree. Unless otherwise instructed, you should
keep these as they may be requested as evidence by the module team. It is your
responsibility to try and make the group work. If there are problems during the
assessment you are expected to try and resolve these and the evidence should
show this.

Peer assessment will be used to adjust the marks you will get for assignment 2. You
are asked to think carefully about completing peer assessments. If someone has
contributed little to the assignment, why give them full marks allocated for the entire
assignment? It only diminishes the high standards of other students. Conversely, if
someone has worked really hard and well, they should be rewarded with good
marks. If everyone in the group has been ‘pulling their weight’ as agreed and
expected of them, then marks should be shared evenly, in which case you may decide
not to submit any peer assessment forms. Further guidance is given in Appendix 2.

It is important that peer assessment is carried out CONFIDENTIALLY. The reason
for this is to ensure that there is no coercion. In this regard, each member must
complete the form (as shown in Appendix 3) for each group member including
themselves. The information given in Appendix 2 is there to help you assess each
of your peers. Please note that your tutors have the right to alter peer assessments
if they feel they do not reflect the true position or if the team records you have kept
cannot justify your mark.


You should write brief comments justifying the marks you have given your peers and
yourself. This is especially important for any peer mark that is less than 5 or greater
than 8.

Each student is to submit the peer assessment form separately on Blackboard.

How individual marks are calculated

Once all the peer assessment form marks have been recorded, the average peer
assessment mark for the team and for each member is calculated.
Initially, the group work is marked out of 100% and then the individual and team
averages are used to determine each individual’s mark.
The following example illustrates how the individual mark is calculated for each
member of the group. The group mark is 60 out of 100 and the average group peer
assessment contribution mark is 8 out of 10.

Student    Individual contribution    Individual mark
A          9                          9/8 * 60 = 67.5 / 100
B          7                          7/8 * 60 = 52.5 / 100
C          6                          6/8 * 60 = 45 / 100

Student A is rewarded for the extra effort put in, which is recognised by his/her
peers.
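
The calculation in the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as given in the example; the function name individual_mark is hypothetical, and whether marks are capped at 100 is not stated here, so the sketch applies no cap.

```python
def individual_mark(group_mark, team_average, contribution):
    """Scale the group mark by a member's contribution relative to the team average."""
    if team_average <= 0:
        raise ValueError("team_average must be positive")
    return contribution / team_average * group_mark


GROUP_MARK = 60    # group work mark out of 100 (from the example)
TEAM_AVERAGE = 8   # average peer assessment contribution out of 10 (from the example)
contributions = {"A": 9, "B": 7, "C": 6}

marks = {
    student: individual_mark(GROUP_MARK, TEAM_AVERAGE, score)
    for student, score in contributions.items()
}
print(marks)  # {'A': 67.5, 'B': 52.5, 'C': 45.0}
```

A contribution above the team average raises the individual mark above the group mark, and a contribution below it lowers the mark in the same proportion.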


Appendix 2 – Guidance on Peer Assessment Criteria

This table is intended for guidance rather than as a definitive document. All the
categories are ranges and you should use your judgement to position individuals
within these ranges. It is suggested that you mark in the three categories and
average the final mark. This may be a weighted average depending on your views of
the relative importance for the specific assignment.
Remember that not everybody in a team can be or should be a leader or have a
specifically defined role. This depends on how you as a team agreed to organise
yourselves and share the work.


Appendix 3

Peer Assessment Form

Group _________________________________________________

Name of Student Completing this Form _______________________________

Write down the names of each group member (including yourself) and alongside
give a score out of 10. You are advised to read through the information on ‘Peer
Assessment’ and ‘How the Formula is Calculated’ before commencing.

Name                    Mark out of 10          Comments

You are expected to comment on each mark (including your own) to explain why you
have given it. However, it is compulsory for you to make a comment if you have
given any member a peer assessment mark that is either less than 5 or greater than
8.


Appendix 4 – Example Screenshots from SQL Developer Oracle Data Miner
