Compilation Chapter 12-DDBMS-ans
b) Suppose you are selling the data warehouse idea to your users. How would you
explain to them what multidimensional data analysis is and explain its advantages?
Multidimensional data analysis refers to the processing of data in which data are
viewed as part of a multidimensional structure, one in which data are related in
many different ways. Business decision makers usually view data from a business
perspective. That is, they tend to view business data as they relate to other business
data. For example, a business data analyst might investigate the relationship
between sales and other business variables such as customers, time, product line,
and location. The multidimensional view is much more representative of a business
perspective. A good way to visualize the development and use of relationships is to
examine data pivot tables in MS Excel.
(6 marks)
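The pivot-table idea in the answer above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the dimension names and the `pivot` helper are assumptions made for the example and do not come from the exam paper.

```python
from collections import defaultdict

# Hypothetical sales facts: (product_line, location, quarter, sales_amount).
SALES = [
    ("Cameras", "North", "Q1", 100),
    ("Cameras", "South", "Q1", 150),
    ("Phones", "North", "Q1", 200),
    ("Phones", "North", "Q2", 50),
]

# Position of each dimension inside a fact tuple.
DIMENSIONS = {"product_line": 0, "location": 1, "quarter": 2}


def pivot(rows, row_dim, col_dim):
    """Sum the sales amount for every (row_dim, col_dim) pair, like a pivot table."""
    r, c = DIMENSIONS[row_dim], DIMENSIONS[col_dim]
    table = defaultdict(int)
    for row in rows:
        table[(row[r], row[c])] += row[3]
    return dict(table)


# The same facts viewed along two different pairs of dimensions.
by_product_location = pivot(SALES, "product_line", "location")
by_location_quarter = pivot(SALES, "location", "quarter")
```

The same set of facts answers both "sales by product line and location" and "sales by location and quarter", which is the essence of the multidimensional view.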
c) The data warehouse project is in the design phase. Explain how you would use a star
schema in the design.
(9 marks)
The star schema is a data modeling technique that is used to map multidimensional
decision support data into a relational database. The reason for the star schema's
development is that existing relational modeling techniques, E-R and normalization,
did not yield a database structure that served the advanced data analysis
requirements well. Star schemas yield an easily implemented model for
multidimensional data analysis while still preserving the relational structures on
which the operational database is built.
The basic star schema has four components: facts, dimensions, attributes, and
attribute hierarchies. The star schemas represent aggregated data for specific
business activities. Using the schemas, we will create multiple aggregated data
sources that will represent different aspects of business operations. For example, the
aggregation may involve total sales by selected time periods, by products, by stores,
and so on. Aggregated totals can be total product units, total sales values by
products, etc.
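As a rough sketch of the four components, the following Python snippet builds a tiny star schema in an in-memory SQLite database. All table names, column names and rows are hypothetical; the year and month columns in the time dimension stand in for an attribute hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names and attributes).
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: one row per product, store and time period, with two measures.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    units INTEGER,
    sales_value REAL
);

INSERT INTO dim_product VALUES (1, 'Kettle'), (2, 'Toaster');
INSERT INTO dim_store VALUES (10, 'Store A'), (20, 'Store B');
INSERT INTO dim_time VALUES (1, 2009, 1), (2, 2009, 2);
INSERT INTO fact_sales VALUES
    (1, 10, 1, 5, 250.0),
    (1, 20, 1, 3, 150.0),
    (2, 10, 2, 4, 120.0),
    (1, 10, 2, 2, 100.0);
""")


def totals_by_product(connection):
    """Aggregate the fact table by one dimension: total units and value per product."""
    return connection.execute("""
        SELECT p.product_name, SUM(f.units), SUM(f.sales_value)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.product_name
        ORDER BY p.product_name
    """).fetchall()


totals = totals_by_product(conn)
```

Swapping `dim_product` for `dim_store` or `dim_time` in the join gives the totals by store or by time period without changing the fact table.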
QUESTION 4
But extra cost for queries involving more than one segment residing at
different sites
ii. Differentiate by giving appropriate example between horizontal and vertical
fragmentation.
(4 marks)
Horizontally Fragmented Data
b) Consider the distributed database for an electrical company with the following
relations:
PAY EMPLOYEE
E3 J3 Consultant 10
E3 J4 Engineer 48
E4 J2 Programmer 18
Oct 2009
QUESTION 5
a) Explain the difference between a centralized database and a distributed database. Illustrate
your answers with diagrams.
A distributed database can be defined as a collection of multiple, logically interrelated
databases distributed over a computer network.
All the databases must be logically related and are managed by a DDBMS (distributed
database management system). A distributed database is not just a collection of files
stored individually at different network nodes. Rather, to form a DDBS (distributed
database system), all the files should be logically related and there should be structure among
those files.
In contrast, a centralized database is managed by a single DBMS at one site, and no data
distribution is done in this case.
(6 marks)
b. Vertically fragment
EMPLOYEE relation
c. Not fragment
DEPARTMENT or STORE relation
(3 marks)
ii. Derive the horizontal and vertical fragments using relational algebra.
Horizontal fragments
S1 = σ StoreID=10 (EMPLOYEE)
S2 = σ StoreID=20 (EMPLOYEE)
S3 = σ StoreID=30 (EMPLOYEE)
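The three selections above can be mimicked in Python to see how horizontal fragments are derived and checked. The sample tuples and the attributes other than StoreID are hypothetical, since the EMPLOYEE schema is not shown here; `select` and `union` are small helpers written for this sketch.

```python
# Hypothetical EMPLOYEE tuples; only the StoreID attribute comes from the question.
EMPLOYEE = [
    {"EmpID": "E1", "Name": "Ali", "StoreID": 10},
    {"EmpID": "E2", "Name": "Bala", "StoreID": 20},
    {"EmpID": "E3", "Name": "Chong", "StoreID": 30},
    {"EmpID": "E4", "Name": "Devi", "StoreID": 10},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def union(*fragments):
    """Relational union: combine fragments and drop duplicate tuples."""
    result = []
    for fragment in fragments:
        for row in fragment:
            if row not in result:
                result.append(row)
    return result


# One horizontal fragment per store.
S1 = select(EMPLOYEE, lambda r: r["StoreID"] == 10)
S2 = select(EMPLOYEE, lambda r: r["StoreID"] == 20)
S3 = select(EMPLOYEE, lambda r: r["StoreID"] == 30)

# The union of the fragments should give back the original relation.
reconstructed = union(S1, S2, S3)
```

Each fragment keeps every attribute of EMPLOYEE but only the tuples of one store, so the union of the fragments returns the full relation.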
Vertical fragments
For payroll information
iii. Mixed fragmentation can be applied to the scenario above. Which relation
that you chose in (i) above would you recommend for mixed
fragmentation? Give reasons for your answer and show the fragments using
relational algebra.
Employee Relation
For payroll information
Reconstruction:
EMPLOYEE = S1 ∪ S2 ∪ S3 (horizontal)
(6 marks)
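A minimal sketch of mixed fragmentation on an EMPLOYEE relation follows, assuming a vertical split into payroll and personal attributes and then a horizontal split of the personal fragment by store. The attribute names (EmpID, Name, Salary), the fragment names and the sample rows are all hypothetical; only StoreID and the idea of a payroll fragment come from the text.

```python
# Hypothetical EMPLOYEE relation; EmpID is assumed to be the primary key.
EMPLOYEE = [
    {"EmpID": "E1", "Name": "Ali", "StoreID": 10, "Salary": 3000},
    {"EmpID": "E2", "Name": "Bala", "StoreID": 20, "Salary": 3500},
    {"EmpID": "E3", "Name": "Chong", "StoreID": 30, "Salary": 4000},
]


def project(relation, attributes):
    """Relational projection: keep only the named attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def join_on_key(left, right, key):
    """Join two fragments on a shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Step 1: vertical fragmentation. Both fragments keep the key EmpID.
PAYROLL = project(EMPLOYEE, ["EmpID", "Salary"])
PERSONAL = project(EMPLOYEE, ["EmpID", "Name", "StoreID"])

# Step 2: horizontal fragmentation of the PERSONAL fragment by store.
P1 = select(PERSONAL, lambda r: r["StoreID"] == 10)
P2 = select(PERSONAL, lambda r: r["StoreID"] == 20)
P3 = select(PERSONAL, lambda r: r["StoreID"] == 30)

# Reconstruction: union the horizontal fragments, then join on the key.
personal_again = P1 + P2 + P3  # fragments are disjoint, so concatenation is a union
employee_again = join_on_key(personal_again, PAYROLL, "EmpID")
```

Keeping the primary key in every vertical fragment is what makes the join in the last step possible.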
April 2010
QUESTION 4
(6 marks)
April 2011
QUESTION 4
c) A university needs to analyze the quality of its education and wants to use data
warehousing technology. They want to summarize grades, courses, and students
data using a data cube containing grades of students per course, year, and
department.
ii. Define a data query in SQL to summarize the average grades per
department over years 2002-2007.
select course.dept, course.year, avg(grade.score)
from grade, student, course, year
where course.year >= 2002
and course.year <= 2007
and grade.ssn = student.ssn
and grade.cid = course.cid
and grade.yearcode = year.yearcode
group by course.dept, course.year
(4 marks)
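To try out a query of this shape, the sketch below runs a reduced version against an in-memory SQLite database. The two-table schema and the sample rows are assumptions made for the example; the real exam schema is not shown in this compilation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, reduced schema: only the columns the query needs.
CREATE TABLE course (cid TEXT PRIMARY KEY, dept TEXT, year INTEGER);
CREATE TABLE grade (ssn TEXT, cid TEXT REFERENCES course(cid), score REAL);

INSERT INTO course VALUES
    ('C1', 'CS', 2002), ('C2', 'CS', 2003), ('C3', 'Math', 2002), ('C4', 'CS', 2001);
INSERT INTO grade VALUES
    ('S1', 'C1', 80), ('S2', 'C1', 60), ('S1', 'C2', 90),
    ('S2', 'C3', 70), ('S1', 'C4', 50);
""")

# Average score per department and year, restricted to 2002-2007.
QUERY = """
SELECT course.dept, course.year, AVG(grade.score)
FROM grade
JOIN course ON grade.cid = course.cid
WHERE course.year BETWEEN 2002 AND 2007
GROUP BY course.dept, course.year
ORDER BY course.dept, course.year
"""

rows = conn.execute(QUERY).fetchall()
```

The course offered in 2001 falls outside the range and is filtered out before the averages are computed.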
Jan 2012
QUESTION 2
An international bank in Kuala Lumpur wants to distribute its credit card information in
Indonesia, Thailand and Vietnam. The relational schema below represents the information.
CardType is of Visa or Master. As well as distributing the data in the country concerned,
there is an additional requirement to access customer data according to personal
information or by credit card information.
As an IT consultant to the bank, you are being asked the following questions:
[Diagram: Site 1, Site 2 and Site 3, each with its own database, connected by a computer network]
(4 marks)
b) Explain the advantages of using a DDBMS for the bank.
Advantages of DDBMS:
- Improved shareability and autonomy
- Improved availability
- Improved reliability
- Improved performance
- etc. (refer to Connolly, pages 738-739, for any 2 correct answers)
Centralized: Consists of single database and DBMS stored at one site with users
distributed across the network
(C1 and C2 3 marks, C11, C12 and C13 3 marks, total = 6 marks)
e) Is your fragmentation in (d) correct? Explain your answer.
Yes, it is correct based on the rules of correctness:
Completeness: Each tuple in the relation appears in either fragment C11 or C12 or
C13 or C2.
Reconstruction:
Jun 2012
QUESTION 4
Encik Abdullah manages a small product distribution company. Because the business
is growing fast, Encik Abdullah recognizes that it is time to manage the vast
information pool to help guide the accelerating growth. Encik Abdullah, who is
familiar with spreadsheet software, currently employs a small sales force of four
people. He has asked you to develop a data warehouse application prototype that
enables him to study sales quantity by year, region, agent, and product. (This
prototype is to be used as the basis for a future data warehouse database.)
The following SALES ORDER table describes the sales quantity of Encik Abdullah’s company
by year, region, agent, and product.
SALES ORDER
(3 marks)
iii) Draw a star schema for the Data Warehouse.
Ans:
Fact table:
SALES ORDER (SalesID, TimeID, RegionID, AgentID)
Dimension tables, each linked to SALES ORDER by an "in" relationship:
TIME (TimeID, Day, Week)
AGENT (AgentID, AgentName, AgentAddress)
REGION (RegionID, RegionName)
PRODUCT (ProductID, ProductName)
(4 marks)
[Chart: values on a scale of 0 to 250 by region (East, North, South, West), with series for Carlos and Mary]
(6 marks)
b) What is the difference between ERD Snowflake Schema and ERD Constellation
Schema? Support your answers with diagrams.
Ans:
Snowflake schema: A data modeling representation for multidimensional databases. In
a relational database, a snowflake schema has multiple levels of dimension tables
related to one or more fact tables.
Page 570 (Fig. 16.11, Mannino)
Constellation schema: contains multiple fact tables in the center related to
dimension tables. Typically, the fact tables share some dimension tables.
Page 569 (Fig. 16.10, Mannino)
(6 marks)
Jan 2013
QUESTION 1
Merbok CS231 80
Dungun CS224 70
CS231 150 0 80 0 0
CS224 100 0 0 0 70
240 110 0 0 0
CS220
2012
i) Process of discovering implicit patterns in data and using these patterns for
business advantage.
iii) Allows users to navigate from a more general level to a more specific level.
iv) Retrieves a subset of a data cube similar to the restrict operator of relational
algebra.
Members
Drill-down
Data mining
Dice
Fact table
Data coupling
Measures
Slice
Snowflake schema
Constellation schema
Roll-up
Dimension table
(12 marks)
ANSWER:
i) Data mining
ii) Members
iii) Drill-down
iv) Slice
v) Snowflake schema
vi) Fact table
(2 marks each)
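Slice, roll-up and drill-down can be tried on a very small data cube. The sketch below is illustrative only: the cube contents, the dimension names and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube: (year, region, product) -> sales quantity.
DIMS = ("year", "region", "product")
CUBE = {
    (2012, "East", "Pens"): 10,
    (2012, "West", "Pens"): 20,
    (2012, "East", "Ink"): 5,
    (2013, "East", "Pens"): 7,
}


def slice_cube(cube, dim, member):
    """Slice: fix one dimension to a single member and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == member}


def roll_up(cube, keep):
    """Roll-up: sum the measure over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)


# Slice: only the cells for year 2012.
cells_2012 = slice_cube(CUBE, "year", 2012)

# Roll-up to region totals (general level), then drill down to year and region.
by_region = roll_up(CUBE, ["region"])
by_year_region = roll_up(CUBE, ["year", "region"])
```

Moving from `by_region` to `by_year_region` is a drill-down, because it goes from a more general level to a more specific one; the reverse direction is a roll-up.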
Dec 2013
QUESTION 4
LAMAN SURI, a magazine publishing company, has its headquarters in Kuala Lumpur. It has
branch offices in Alor Star (northern region – code “N”), Kuantan (eastern region – code
“E”), and Johor Baru (southern region – code “S”), while the headquarters in Kuala Lumpur
services the western region – code “W”.
Currently, the company is using a centralized database system located at its
headquarters.
The company publishes one regional magazine monthly in the above offices.
The company has 300,000 customers (subscribers) distributed throughout the regions.
On the first day of each month, an annual subscription INVOICE is printed and sent to
each customer whose subscription is due for renewal.
The company's management is aware of the problems associated with the centralized
database and has decided that it is time to upgrade the processing of the subscriptions in its
offices. Each office will handle its own customer and invoice data. The company's
management in Kuala Lumpur wants to have access to all customers and invoice data to
generate annual reports and to run queries.
b) Suggest an alternative to the centralized database for the company. Illustrate your
answer with a diagram.
Answer:
The company requires a distributed system with distributed database management
system (DDBMS) capabilities. The system will be distributed among the company’s
offices.
(3 marks)
[Diagram: sites including Site 2 (Kuala Lumpur) and Site 3 (Kuantan), each running a DDBMS, connected by a computer network]
d) Show the database fragments for the CUSTOMER relation. The fragments should
include attributes and sample data (one row only). Indicate where the fragment is
located.
Answer:
(6 marks)
Jun 2014
QUESTION 5
AM Rich Sdn. Bhd. (AM Rich) is a manufacturing firm that hires you as an IT consultant to design
a distributed relational database. The company is headquartered in Kuala Lumpur and has
major branches in Shah Alam, Jasin, and Dungun. The database involved consists of four
tables, labeled A, B, C, and D, with the following characteristics:
Table A consists of 500,000 records and is heavily used in Kuala Lumpur and Shah
Alam.
Table B consists of 100,000 records and is frequently required in all four cities.
Table C consists of 75,000 records. Records 1-30,000 are most frequently used in
Shah Alam. Records 30,001-75,000 are most frequently used in Jasin.
Table D consists of 20,000 records and is used almost exclusively in Kuala Lumpur.
e) Plan a distributed relational database design for AM Rich. Justify your placement,
replication, and partitioning of the tables.
Answer:
Table A should be replicated with a copy in Kuala Lumpur and a copy in Shah Alam. This
will minimize telecommunications costs and improve availability.
Since Table B is used in all four cities, it should be replicated for availability purposes. If it is
not particularly volatile we could consider placing a copy of it in each city. If it is somewhat
volatile we might have just two copies, placed in any two of the cities. If it is very volatile
then we have to weigh the costs of synchronous or asynchronous update against the
benefits of availability when considering replication.
Table C would certainly be a candidate for partitioning, with one partition in Shah Alam and
one in Jasin as indicated.
Since Table D is used almost exclusively in Kuala Lumpur, it should not be replicated and
should be stored only in Kuala Lumpur.
(Each table explanation, 2½ marks, total 10 marks)
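The placement decisions above can be written down as a small allocation plan. The sketch below is one possible encoding: the dictionary layout and function names are assumptions, and the entry for Table B assumes the low-volatility case (a copy in every city).

```python
# Replicated or single-site tables, following the answer above.
# Table B assumes low volatility, so it is copied to all four cities.
TABLE_SITES = {
    "A": ["Kuala Lumpur", "Shah Alam"],
    "B": ["Kuala Lumpur", "Shah Alam", "Jasin", "Dungun"],
    "D": ["Kuala Lumpur"],
}

# Table C is partitioned by record number rather than replicated.
TABLE_C_PARTITIONS = [
    (range(1, 30001), "Shah Alam"),   # records 1-30,000
    (range(30001, 75001), "Jasin"),   # records 30,001-75,000
]


def sites_for_table(table):
    """Return the sites holding a full copy of a replicated or single-site table."""
    return TABLE_SITES[table]


def site_for_table_c(record_number):
    """Return the site whose partition of Table C holds the given record."""
    for record_range, site in TABLE_C_PARTITIONS:
        if record_number in record_range:
            return site
    raise ValueError(f"record {record_number} is outside Table C (1-75,000)")
```

For instance, `site_for_table_c(30001)` resolves to Jasin, while any query on Table D is served only from Kuala Lumpur.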
f) Give TWO (2) advantages and TWO (2) disadvantages of implementing a distributed
relational database at AM Rich.
Answer:
Advantages:
Local autonomy.
Reduced communications costs because each table can be located at the site that
most heavily uses it.
Improved availability because portions of the database are available even if one or
some of the sites are down.
Disadvantages:
Several sites have to be concerned with security, concurrency, backup and recovery
controls.
Requires a distributed directory and the software to support location transparency.
Requires distributed joins.
(Any 2 advantages and 2 disadvantages, 1½ marks each, total 6 marks)
g) In the next five years, AM Rich Sdn. Bhd. is planning to implement a mobile database for
the company due to the increasing branches in the northern and southern areas. As an
IT consultant, do you think that mobile database is suitable for this company? In your
opinion, give a driving force or factor that contributes to the wide usage of mobile
database.
Answer:
A mobile database is a database that a mobile computing device can connect to
over a wireless mobile network. It is suitable for AM Rich because the company deals with many
branches, and mobile access to the database saves the cost of travelling.
Dec 2015