Compilation Chapter 12-DDBMS-ans

1) The database administrator (DBA) is responsible for controlling and managing the shared database within an organization. Key skills for a DBA include technical skills to manage the database and managerial skills to control the database administration function. 2) Multidimensional data analysis allows data to be viewed from different business perspectives by relating data in multiple ways, such as relating sales to customers, time periods, product lines, and locations. This multidimensional view better represents how businesses analyze data. 3) A star schema is a data modeling technique used to map multidimensional decision support data into a relational database. It has facts, dimensions, attributes, and attribute hierarchies to represent aggregated business data for different

Final Question ITS472

Chapter 12 – Distributed DBMS


Oct 2008
QUESTION 3
a) Who is the database administrator? List THREE managerial and technical skills
required for a DBA.
(5 marks)
The database administrator (DBA) is the person responsible for the control and
management of the shared database within an organization. The DBA controls the
database administration function within the organization.

b) Suppose you are selling the data warehouse idea to your users. How would you
explain to them what multidimensional data analysis is and explain its advantages?

Multidimensional data analysis refers to the processing of data in which data are
viewed as part of a multidimensional structure, one in which data are related in
many different ways. Business decision makers usually view data from a business
perspective. That is, they tend to view business data as they relate to other business
data. For example, a business data analyst might investigate the relationship
between sales and other business variables such as customers, time, product line,
and location. The multidimensional view is much more representative of a business
perspective. A good way to visualize the development and use of relationships is to
examine data pivot tables in MS Excel.
(6 marks)
c) The data warehouse project is in the design phase. Explain how you would use a star
schema in the design.
(9 marks)
The star schema is a data modeling technique that is used to map multidimensional
decision support data into a relational database. The reason for the star schema's
development is that existing relational modeling techniques, E-R and normalization,
did not yield a database structure that served the advanced data analysis
requirements well. Star schemas yield an easily implemented model for
multidimensional data analysis while still preserving the relational structures on
which the operational database is built.

The basic star schema has four components: facts, dimensions, attributes, and
attribute hierarchies. The star schemas represent aggregated data for specific
business activities. Using the schemas, we will create multiple aggregated data
sources that will represent different aspects of business operations. For example, the
aggregation may involve total sales by selected time periods, by products, by stores,
and so on. Aggregated totals can be total product units, total sales values by
products, etc.

QUESTION 4

a) i. Discuss TWO reasons why fragmentation is needed in distributed DBMS.


(4 marks)
- To enhance intraquery concurrency
- To increase the throughput

The drawback is the extra cost of queries that involve more than one fragment
residing at different sites.
ii. Differentiate by giving appropriate example between horizontal and vertical
fragmentation.
(4 marks)
Horizontally Fragmented Data

Horizontally fragmented data means that the rows of a relation are distributed
across different sites based on the values of one or more attributes, often the
primary key. This type of data distribution is typical where, for example, branch
offices in an organization deal mostly with a set of local customers and the related
customer data need not be accessed by other branch offices.

Vertically Fragmented Data

Vertically fragmented data is data that has been split by columns across multiple
systems. The primary key is replicated at each site. For example, a district office
may maintain client information such as name and address keyed on client number
while head office maintains client account balance and credit information, also
keyed on the same client number.
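
The two definitions above can be illustrated with a minimal Python sketch. The client rows and column names (`client_no`, `branch`, `balance`) are hypothetical and are not taken from the exam paper. A horizontal fragment keeps whole rows that satisfy a predicate, while a vertical fragment keeps selected columns and always carries the primary key.

```python
# Hypothetical client rows; the column names are assumptions for illustration.
clients = [
    {"client_no": 1, "name": "Ali", "branch": "Ipoh", "balance": 500.0},
    {"client_no": 2, "name": "Mei", "branch": "Penang", "balance": 120.0},
    {"client_no": 3, "name": "Raj", "branch": "Ipoh", "balance": 75.5},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Keep only some columns, always carrying the primary key (a projection)."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


# Horizontal: each branch holds only the rows for its own clients.
ipoh = horizontal_fragment(clients, lambda r: r["branch"] == "Ipoh")
penang = horizontal_fragment(clients, lambda r: r["branch"] == "Penang")

# Vertical: contact details at one site, credit details at another.
contact = vertical_fragment(clients, "client_no", ["name", "branch"])
credit = vertical_fragment(clients, "client_no", ["balance"])
```

Note that `client_no` appears in both vertical fragments, which is what allows the original rows to be rebuilt by a join.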

b) Consider the distributed database for an electrical company with the following
relations:

PAY

Title         Salary
Elec. Eng.    40000
Syst. Anal.   34000
Mech. Eng.    27000
Programmer    24000

EMPLOYEE

ENo   EName      Title
E1    J.Doe      Elec. Eng.
E2    M.Smith    Syst. Anal.
E3    A.Lee      Mech. Eng.
E4    J.Miller   Programmer
E5    B.Casey    Syst. Anal.
E6    L.Chu      Elec. Eng.
E7    R.Davis    Mech. Eng.
E8    J.Jones    Syst. Anal.

PROJECT

JNo   JName              Budget   Location
J1    Instrumentation    150000   Montreal
J2    Database Develop   135000   New York
J3    CAD/CAM            250000   New York

ASSIGNMENT

ENo   JNo   Responsible   Duration
E1    J1    Manager       12
E2    J1    Analyst       24
E2    J2    Analyst       6
E3    J3    Consultant    10
E3    J4    Engineer      48
E4    J2    Programmer    18

i. Write the SQL statements to define the horizontal fragments of the
PROJECT relation by location.
(6 marks)
P1 : CREATE FRAGMENT Location_Paris AS
SELECT * FROM PROJECT
WHERE Location = 'Paris';

P2 : CREATE FRAGMENT Location_Montreal AS
SELECT * FROM PROJECT
WHERE Location = 'Montreal';

P3 : CREATE FRAGMENT Location_NewYork AS
SELECT * FROM PROJECT
WHERE Location = 'New York';
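
`CREATE FRAGMENT` is the notation used in this answer and is not standard SQL. As a rough sketch of the same idea in a real engine, the Python snippet below uses SQLite views to stand in for the fragments. The view names (`P_Paris`, `P_Montreal`, `P_NewYork`) are assumptions, and the rows are the three PROJECT rows listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PROJECT (JNo TEXT PRIMARY KEY, JName TEXT, "
    "Budget INTEGER, Location TEXT)"
)
conn.executemany(
    "INSERT INTO PROJECT VALUES (?, ?, ?, ?)",
    [
        ("J1", "Instrumentation", 150000, "Montreal"),
        ("J2", "Database Develop", 135000, "New York"),
        ("J3", "CAD/CAM", 250000, "New York"),
    ],
)

# One view per location plays the role of a horizontal fragment.
# The names and literals are hard-coded, so string formatting is safe here.
fragments = {
    "P_Paris": "Paris",
    "P_Montreal": "Montreal",
    "P_NewYork": "New York",
}
for view_name, location in fragments.items():
    conn.execute(
        f"CREATE VIEW {view_name} AS "
        f"SELECT * FROM PROJECT WHERE Location = '{location}'"
    )

# Count the rows that land in each fragment.
fragment_sizes = {
    view_name: conn.execute(f"SELECT COUNT(*) FROM {view_name}").fetchone()[0]
    for view_name in fragments
}
```

With the rows shown in the question, the Paris fragment is empty because no listed project is located in Paris.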
ii. Give TWO predicates by which EMPLOYEE may be horizontally partitioned
by Salary.
(6 marks)

EMP1 = EMPLOYEE ⋉ PAY1

EMP2 = EMPLOYEE ⋉ PAY2

where

PAY1 = σ Salary ≤ 30000 (PAY)

PAY2 = σ Salary > 30000 (PAY)
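
Because Salary is stored in PAY rather than in EMPLOYEE, the employee fragments are derived through a semijoin on Title. The sketch below works this through in Python with the rows from the question. It assumes the title is spelled "Mech. Eng." consistently in both relations, and the helper name `semijoin` is illustrative.

```python
# Rows from the question; "Mech. Eng." spelling is an assumption.
PAY = [
    ("Elec. Eng.", 40000),
    ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000),
    ("Programmer", 24000),
]
EMPLOYEE = [
    ("E1", "J.Doe", "Elec. Eng."),
    ("E2", "M.Smith", "Syst. Anal."),
    ("E3", "A.Lee", "Mech. Eng."),
    ("E4", "J.Miller", "Programmer"),
    ("E5", "B.Casey", "Syst. Anal."),
    ("E6", "L.Chu", "Elec. Eng."),
    ("E7", "R.Davis", "Mech. Eng."),
    ("E8", "J.Jones", "Syst. Anal."),
]

# Primary horizontal fragments of PAY, defined by the two salary predicates.
pay1 = [row for row in PAY if row[1] <= 30000]
pay2 = [row for row in PAY if row[1] > 30000]


def semijoin(employees, pay_fragment):
    """Keep the employees whose Title appears in the given PAY fragment."""
    titles = {title for title, _salary in pay_fragment}
    return [emp for emp in employees if emp[2] in titles]


# Derived horizontal fragments of EMPLOYEE.
emp1 = semijoin(EMPLOYEE, pay1)
emp2 = semijoin(EMPLOYEE, pay2)
```

Under these assumptions, `emp1` holds E3, E4 and E7, and `emp2` holds the remaining five employees.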

Oct 2009
QUESTION 5

a) Explain the difference between a centralized database and a distributed database. Illustrate
your answers with diagrams.
A distributed database can be defined as a collection of multiple, logically
interrelated databases distributed over a computer network.

A distributed database management system (DDBMS) manages the distributed
database and makes the distribution transparent to the user.

All the databases managed by the DDBMS must be logically related. A distributed
database is not just a collection of files stored individually at different network
nodes. Rather, to form a distributed database (DDBS), all the files should be logically
related and there should be structure among those files.

A centralized database, by contrast, is managed by a DBMS at a single site, and no
data distribution is done in this case.

(6 marks)

b) Consider the retail store database schema below:

STORE (StoreID, Region, ManagerID, SquareFeet)


EMPLOYEE (EmployeeID, Name, Address, Sex, Salary, Position, DOB, StoreID)
DEPARTMENT (DepartmentID, DepartmentName, ManagerID, SalesGoal)

Assume that a data communications network links a computer at corporate
headquarters in Shah Alam with a computer in each retail outlet in Ipoh (StoreID=10),
Penang (StoreID=20) and Johor Baru (StoreID=30). The average number of employees
in each of the three stores is 100. There are 10 departments in each store.
The corporation generates all payroll checks and also keeps the employees'
personnel information.

i. State the relation(s) you would

a. Horizontally fragment: EMPLOYEE relation

b. Vertically fragment: EMPLOYEE relation

c. Not fragment: DEPARTMENT or STORE relation

(3 marks)

ii. Derive the horizontal and vertical fragments using relational algebra.

Horizontal fragments
S1 : σ StoreID=10 (EMPLOYEE)
S2 : σ StoreID=20 (EMPLOYEE)
S3 : σ StoreID=30 (EMPLOYEE)

Vertical fragments
For payroll information

S4 : π EmployeeID, Position, Sex, Salary, DOB (EMPLOYEE)

For personnel information

S5 : π EmployeeID, Name, Address, StoreID (EMPLOYEE)
(5 marks)

iii. Mixed fragmentation can be applied to the scenario above. Which relation
chosen in (i) above would you recommend for mixed fragmentation? Give
reasons for your answer and show the fragments using relational algebra.
EMPLOYEE relation
For payroll information

S4 : π EmployeeID, Position, Sex, Salary, DOB (EMPLOYEE)

For personnel information

S5 : π EmployeeID, Name, Address, StoreID (EMPLOYEE)

S51 : σ StoreID=10 (S5)
S52 : σ StoreID=20 (S5)
S53 : σ StoreID=30 (S5)

Reconstruction:

EMPLOYEE = S1 ∪ S2 ∪ S3 (horizontal)

EMPLOYEE = S4 ⋈ S5 (vertical)

EMPLOYEE = S4 ⋈ (S51 ∪ S52 ∪ S53) (mixed)

(6 marks)
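
The mixed reconstruction rule can be checked with a small Python sketch. The three employee rows are hypothetical; only the column names come from the EMPLOYEE schema in the question. The helpers `project`, `select` and `natural_join` are illustrative stand-ins for π, σ and ⋈.

```python
# Hypothetical EMPLOYEE rows; only the column names come from the schema.
EMPLOYEE = [
    {"EmployeeID": 1, "Name": "Aina", "Address": "Ipoh", "Sex": "F",
     "Salary": 3000, "Position": "Clerk", "DOB": "1990-01-01", "StoreID": 10},
    {"EmployeeID": 2, "Name": "Ben", "Address": "Penang", "Sex": "M",
     "Salary": 4200, "Position": "Manager", "DOB": "1985-05-17", "StoreID": 20},
    {"EmployeeID": 3, "Name": "Chong", "Address": "Johor Baru", "Sex": "M",
     "Salary": 2800, "Position": "Cashier", "DOB": "1992-09-30", "StoreID": 30},
]


def project(rows, columns):
    """Projection: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]


def select(rows, store_id):
    """Selection: keep the rows of one store."""
    return [row for row in rows if row["StoreID"] == store_id]


def natural_join(left, right, key):
    """Join two fragments on a shared key column."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Vertical fragments: payroll (S4) and personnel (S5).
s4 = project(EMPLOYEE, ["EmployeeID", "Position", "Sex", "Salary", "DOB"])
s5 = project(EMPLOYEE, ["EmployeeID", "Name", "Address", "StoreID"])

# S5 is then split horizontally by store: S51, S52, S53.
s51, s52, s53 = (select(s5, store_id) for store_id in (10, 20, 30))

# Reconstruction: EMPLOYEE = S4 join (S51 union S52 union S53).
rebuilt = natural_join(s4, s51 + s52 + s53, "EmployeeID")
```

The join works only because EmployeeID is kept in both vertical fragments.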

April 2010
QUESTION 4

a) i. A relation can be distributed by fragmenting or replicating it across several sites.
Explain these concepts and how they differ.
Answer:
Fragmenting – A relation may be divided into a number of subrelations, called
fragments, which are then distributed across several sites. There are two types of
fragmentation – vertical and horizontal.
Replication – This strategy consists of maintaining a complete copy of the
database at each site.
(5 marks)

ii. Explain fragmentation transparency and location transparency.
Answer:
Fragmentation transparency – The highest level of distribution transparency. If
fragmentation transparency is provided by the DDBMS then the user need not
know that the data is fragmented.

Location transparency – The middle level of distribution transparency. With
location transparency the user must know how the data has been fragmented but
still does not have to know the location of the data.

(6 marks)

b) i. List FOUR (4) types of information contained in a data dictionary.


Answer:
Data descriptors called metadata that define the source, the use, the value, and
the meaning of the data.
(4 marks)
ii. What is the difference between mandatory access control and discretionary
access control?
Answer:
Mandatory access control – A database security approach for highly sensitive and
static databases. In mandatory control approaches, each object is assigned a
classification level and each user is given a clearance level. A user can access a
database element if the user’s clearance level provides access to the classification
level of the element.

Discretionary access control – Users are assigned access rights or privileges to
specified parts of a database. It is the most common kind of security control
supported by commercial DBMSs.
(5 marks)
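
As a rough illustration of the difference, the Python sketch below models both checks. The level names, user names and privilege names are assumptions made for the example and do not come from any particular DBMS.

```python
# Mandatory access control: ordered levels, lowest first (names are assumed).
LEVELS = ["Unclassified", "Confidential", "Secret", "Top Secret"]


def mac_can_access(clearance, classification):
    """Allow access when the user's clearance is at least the object's level."""
    return LEVELS.index(clearance) >= LEVELS.index(classification)


# Discretionary access control: privileges granted per user and per object.
grants = {
    ("aina", "EMPLOYEE"): {"SELECT"},
    ("ben", "EMPLOYEE"): {"SELECT", "UPDATE"},
}


def dac_can_access(user, obj, privilege):
    """Allow access only when this privilege was granted on this object."""
    return privilege in grants.get((user, obj), set())
```

In the mandatory scheme the decision depends only on the two levels, whereas in the discretionary scheme it depends on what has been explicitly granted to the user.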

April 2011
QUESTION 4

a) Explain the significance of sparsity in a data cube.


Ans: Sparsity indicates the extent of empty cells in a data cube. Sparsity can be a
problem if two or more dimensions are related. For example, if certain products are
sold only in selected states, cells may be empty. If a large number of cells are empty,
the data cube can waste space and be slow to process. Special compression
techniques can be used to reduce the size of sparse data cubes.
(4 marks)
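
Sparsity can be expressed as the fraction of cells in the cube that are empty. The Python sketch below computes it for a small hypothetical cube of products by states; the member names and cell values are made up for illustration.

```python
def sparsity(dimensions, cells):
    """Return the fraction of empty cells in a cube.

    dimensions maps each dimension name to its list of members;
    cells maps a tuple of members to a stored measure value.
    """
    total_cells = 1
    for members in dimensions.values():
        total_cells *= len(members)
    return 1 - len(cells) / total_cells


# Hypothetical cube: 3 products x 4 states = 12 possible cells.
dimensions = {
    "product": ["Balls", "Erasers", "Pencils"],
    "state": ["Kedah", "Johor", "Perak", "Selangor"],
}
# Only 3 cells hold data because each product sells in few states.
cells = {
    ("Balls", "Kedah"): 10,
    ("Balls", "Johor"): 5,
    ("Erasers", "Perak"): 7,
}

cube_sparsity = sparsity(dimensions, cells)  # 9 of 12 cells are empty
```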
b) Give TWO (2) differences between transaction processing and decision support
processing.
Transaction processing
- Uses operational/production databases
- Short-term decisions: fulfill orders, resolve complaints, provide staffing
Decision support processing
- Uses integrated and summarized data
- Medium and long-term decisions: capacity planning, store locations, new
lines of business
(4 marks)

c) A university needs to analyze the quality of its education and wants to use data
warehousing technology. They want to summarize grades, courses, and students
data using a data cube containing grades of students per course, year, and
department.

i. Define a star schema to represent the data cube in a relational database.
Include the attributes and indicate the primary keys and foreign keys.
course(cid, cname, dept, ...) dimension
student(ssn, sname, ...) dimension
grade(ssn, cid, yearcode, score) fact table
year(yearcode, year, semester) dimension
(8 marks)

ii. Define a data query in SQL to summarize the average grades per
department over years 2002-2007.
select course.dept, year.year, avg(grade.score)
from grade, student, course, year
where year.year >= 2002
and year.year <= 2007
and grade.ssn = student.ssn
and grade.cid = course.cid
and grade.yearcode = year.yearcode
group by course.dept, year.year
(4 marks)
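
To try the query, the Python sketch below builds the star schema in an in-memory SQLite database and runs an equivalent query written with explicit joins. The sample rows are hypothetical, and the year column is qualified as `year.year` because it belongs to the year dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (cid TEXT PRIMARY KEY, cname TEXT, dept TEXT);
CREATE TABLE student (ssn TEXT PRIMARY KEY, sname TEXT);
CREATE TABLE year (yearcode INTEGER PRIMARY KEY, year INTEGER, semester INTEGER);
CREATE TABLE grade (
    ssn TEXT REFERENCES student(ssn),
    cid TEXT REFERENCES course(cid),
    yearcode INTEGER REFERENCES year(yearcode),
    score REAL,
    PRIMARY KEY (ssn, cid, yearcode)
);
-- Hypothetical sample rows
INSERT INTO course VALUES ('C1', 'Databases', 'CS'), ('C2', 'Calculus', 'Math');
INSERT INTO student VALUES ('S1', 'Aina'), ('S2', 'Ben');
INSERT INTO year VALUES (1, 2001, 1), (2, 2003, 1), (3, 2006, 2);
INSERT INTO grade VALUES
    ('S1', 'C1', 2, 80), ('S2', 'C1', 2, 60),
    ('S1', 'C2', 3, 90), ('S2', 'C2', 1, 50);
""")

# Average score per department and year, restricted to 2002-2007.
rows = conn.execute("""
    SELECT course.dept, year.year, AVG(grade.score)
    FROM grade
    JOIN student ON grade.ssn = student.ssn
    JOIN course ON grade.cid = course.cid
    JOIN year ON grade.yearcode = year.yearcode
    WHERE year.year BETWEEN 2002 AND 2007
    GROUP BY course.dept, year.year
    ORDER BY course.dept, year.year
""").fetchall()
```

With these sample rows, the 2001 grade falls outside the range and is excluded from the averages.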

Jan 2012
QUESTION 2

An international bank in Kuala Lumpur wants to distribute its credit card information in
Indonesia, Thailand and Vietnam. The relational schema below represents the information.

CUSTOMER_CREDIT_CARD (CardNo, CardType, Name, Address, Country,
CreditLimit, ExpiryDate)

CardType is either Visa or Master. As well as distributing the data in the country
concerned, there is an additional requirement to access customer data according to
personal information or by credit card information.

As an IT consultant to the bank, you are being asked the following questions:

a) Illustrate the distributed DBMS architecture that you would recommend.

[Diagram: three sites (Site 1, Site 2 and Site 3), each with its own database,
connected through a computer network]

(4 marks)
b) Explain the advantages of using a DDBMS for the bank.
Advantages of DDBMS:
- Improved shareability and autonomy
- Improved availability
- Improved reliability
- Improved performance
- etc (refer to Connolly. Pages 738 – 739 for any 2 correct answers)

(Any TWO each 2 marks, total = 4 marks)

c) Describe TWO (2) data allocation strategies that can be applied.

Centralized: Consists of a single database and DBMS stored at one site with users
distributed across the network

Partitioned: Database partitioned into disjoint fragments, each fragment assigned to
one site

Complete Replication: Consists of maintaining a complete copy of the database at each site

Selective Replication: Combination of partitioning, replication, and centralization
(Any TWO each 2 marks, total = 4 marks)

d) Suggest a suitable fragmentation schema for the CUSTOMER_CREDIT_CARD relation
based on the information given.

C1: π CardNo, Name, Address, Country (CUSTOMER_CREDIT_CARD)
C2: π CardNo, CardType, CreditLimit, ExpiryDate (CUSTOMER_CREDIT_CARD)

C11: σ Country = ‘Indonesia’ (C1)
C12: σ Country = ‘Thailand’ (C1)
C13: σ Country = ‘Vietnam’ (C1)

(C1 and C2: 3 marks; C11, C12 and C13: 3 marks; total = 6 marks)
e) Is your fragmentation in (d) correct? Explain your answer.
Yes, it is correct based on the rules of correctness:

Completeness: Each tuple in the relation appears in fragment C11, C12 or C13,
and in C2.

Reconstruction:

(C11 ∪ C12 ∪ C13) ⋈ C2 = CUSTOMER_CREDIT_CARD

Disjointness: the horizontal fragments C11, C12 and C13 are disjoint

(Any TWO each 1 mark, total = 2 marks)
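
The three correctness rules can be tested mechanically. The Python sketch below does so for the fragments in (d), using three hypothetical card rows (the card numbers, names and limits are made up) and comparing fragments by their CardNo key.

```python
# Hypothetical rows: (CardNo, CardType, Name, Address, Country,
# CreditLimit, ExpiryDate).
cards = [
    ("4000-1", "Visa", "Adi", "Jakarta", "Indonesia", 5000, "12/2015"),
    ("5000-2", "Master", "Somchai", "Bangkok", "Thailand", 8000, "06/2014"),
    ("4000-3", "Visa", "Linh", "Hanoi", "Vietnam", 3000, "09/2016"),
]

# Vertical fragments C1 (personal data) and C2 (card data).
c1 = [(no, name, addr, country)
      for (no, _type, name, addr, country, _limit, _expiry) in cards]
c2 = [(no, ctype, limit, expiry)
      for (no, ctype, _name, _addr, _country, limit, expiry) in cards]

# Horizontal fragments of C1 by country: C11, C12, C13.
horizontal = [
    [row for row in c1 if row[3] == country]
    for country in ("Indonesia", "Thailand", "Vietnam")
]

all_keys = {row[0] for row in cards}
fragment_keys = [{row[0] for row in frag} for frag in horizontal]

# Completeness: every card appears in some C1x fragment and in C2.
complete = (set().union(*fragment_keys) == all_keys
            and {row[0] for row in c2} == all_keys)

# Disjointness: no card appears in two horizontal fragments.
disjoint = sum(len(keys) for keys in fragment_keys) == len(all_keys)

# Reconstruction: (C11 union C12 union C13) joined with C2 on CardNo.
c2_by_no = {row[0]: row for row in c2}
rebuilt = [
    (no, c2_by_no[no][1], name, addr, country, c2_by_no[no][2], c2_by_no[no][3])
    for frag in horizontal
    for (no, name, addr, country) in frag
]
reconstructs = sorted(rebuilt) == sorted(cards)
```

A card from a fourth country would make `complete` false, which shows that the scheme relies on the three countries covering all customers.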

Jun 2012
QUESTION 4

a) Consider the scenario below:

Encik Abdullah manages a small product distribution company. Because the business
is growing fast, Encik Abdullah recognizes that it is time to manage the vast
information pool to help guide the accelerating growth. Encik Abdullah, who is
familiar with spreadsheet software, currently employs a small sales force of four
people. He has asked you to develop a data warehouse application prototype that
enables him to study sales quantity by year, region, agent, and product. (This
prototype is to be used as the basis for a future data warehouse database.)

The following SALES ORDER table describes the sales quantity of Encik Abdullah’s
company by year, region, agent, and product.
SALES ORDER

Year   Region   Agent    Product   Quantity
2009   East     Carlos   Erasers   50
2009   East     Tere     Erasers   12
2009   North    Carlos   Widgets   120
2009   North    Tere     Widgets   100
2009   North    Carlos   Widgets   30
2009   South    Victor   Balls     145
2009   South    Victor   Balls     34
2009   South    Victor   Balls     80
2009   West     Mary     Pencils   89
2009   West     Mary     Pencils   56
2010   East     Carlos   Pencils   45
2010   East     Victor   Balls     55
2010   North    Mary     Pencils   60
2010   North    Victor   Erasers   20
2010   South    Carlos   Widgets   30
2010   South    Mary     Widgets   75
2010   South    Mary     Widgets   50
2010   South    Tere     Balls     70
2010   South    Tere     Erasers   90
2010   West     Carlos   Widgets   25
2010   West     Tere     Balls     100

Using the data from the SALES ORDER table above:

i) Identify the appropriate fact table component.


Ans:
SALES ORDER is a fact table.
(1 mark)

ii) Identify the appropriate dimension tables.


Ans:
YEAR, REGION, AGENT, and PRODUCT are dimension tables.

(3 marks)
iii) Draw a star schema for the Data Warehouse.
Ans:

TIME (TimeID, Day, Week, ...) dimension
AGENT (AgentID, AgentName, AgentAddress, ...) dimension
SALES ORDER (SalesID, TimeID, RegionID, AgentID) fact table
REGION (RegionID, RegionName, ...) dimension
PRODUCT (ProductID, ProductName, ...) dimension

(4 marks)

iv) Describe the data cube from the table above.


Ans:

Answers other than the one below can also be accepted
(e.g. dimensions could be YEAR, PRODUCT and REGION, with measure = profit).

[Chart: quantity (axis from 0 to 300) by region (East, North, South, West);
series shown in the legend: Carlos, Mary]

(6 marks)
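
One way to describe the cube is to aggregate Quantity over chosen dimensions of the SALES ORDER table. The Python sketch below does this with the 21 rows from the question; the helper name `build_cube` is illustrative, and each cell is keyed by a tuple of dimension members.

```python
from collections import defaultdict

# (Year, Region, Agent, Product, Quantity) rows from the SALES ORDER table.
SALES = [
    (2009, "East", "Carlos", "Erasers", 50),
    (2009, "East", "Tere", "Erasers", 12),
    (2009, "North", "Carlos", "Widgets", 120),
    (2009, "North", "Tere", "Widgets", 100),
    (2009, "North", "Carlos", "Widgets", 30),
    (2009, "South", "Victor", "Balls", 145),
    (2009, "South", "Victor", "Balls", 34),
    (2009, "South", "Victor", "Balls", 80),
    (2009, "West", "Mary", "Pencils", 89),
    (2009, "West", "Mary", "Pencils", 56),
    (2010, "East", "Carlos", "Pencils", 45),
    (2010, "East", "Victor", "Balls", 55),
    (2010, "North", "Mary", "Pencils", 60),
    (2010, "North", "Victor", "Erasers", 20),
    (2010, "South", "Carlos", "Widgets", 30),
    (2010, "South", "Mary", "Widgets", 75),
    (2010, "South", "Mary", "Widgets", 50),
    (2010, "South", "Tere", "Balls", 70),
    (2010, "South", "Tere", "Erasers", 90),
    (2010, "West", "Carlos", "Widgets", 25),
    (2010, "West", "Tere", "Balls", 100),
]

COLUMN = {"Year": 0, "Region": 1, "Agent": 2, "Product": 3}


def build_cube(rows, dims):
    """Sum Quantity for every combination of the chosen dimensions."""
    cube = defaultdict(int)
    for row in rows:
        cube[tuple(row[COLUMN[d]] for d in dims)] += row[4]
    return dict(cube)


# A three-dimensional cube, then two coarser roll-ups of the same data.
cube = build_cube(SALES, ["Year", "Region", "Product"])
by_region_agent = build_cube(SALES, ["Region", "Agent"])
by_region = build_cube(SALES, ["Region"])
```

Dropping a dimension from `dims` corresponds to a roll-up, so `by_region` is the coarsest of the three views.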

b) What is the difference between ERD Snowflake Schema and ERD Constellation
Schema? Support your answers with diagrams.
Ans:
Snowflake schema: A data modeling representation for multidimensional databases. In
a relational database, a snowflake schema has multiple levels of dimension tables
related to one or more fact tables.
Page 570 (fig 16.11, Mannino)

ERD Constellation Schema: Contains multiple fact tables in the center related to
dimension tables. Typically, the fact tables share some dimension tables.
Page 569 (fig 16.10, Mannino)

(6 marks)
Jan 2013
QUESTION 1

a) Table 1.0 is part of a relational representation of FSKM student enrollment in 2012.


Table 1.0: FSKM student enrollment in 2012

Campus Program Total Student

Shah Alam CS220 240

Shah Alam CS221 300

Shah Alam CS224 100

Shah Alam CS231 150

Arau CS220 110

Merbok CS231 80

Machang CS221 200

Dungun CS224 70

i) Transform the table above into a multidimensional data cube.


ANSWER:

Year: 2012

Program   Shah Alam   Arau   Merbok   Machang   Dungun
CS231     150         0      80       0         0
CS224     100         0      0        0         70
CS221     300         0      0        200       0
CS220     240         110    0        0         0
(4 marks)

ii) List TWO (2) advantages of multidimensional representation over relational


representation for FSKM management.
(4 marks)
ANSWER:
Advantages:
Better visualization of data
Easier and smoother decision making
(2 marks each)

b) Match phrases in column P with the correct term in column Q.

i) Process of discovering implicit patterns in data and using these patterns for
business advantage.

ii) Values in dimension.

iii) Allows users to navigate from a more general level to a more specific level.

iv) Retrieves a subset of a data cube similar to the restrict operator of relational
algebra.

v) Multiple levels of dimension tables surround the fact table.

vi) Stores numeric data such as sales results.


Q

Members

Drill-down

Data mining

Dice

Fact table

Data coupling

Measures

Slice

Snowflake schema

Constellation schema

Roll-up

Dimension table

(12 marks)
ANSWER:
i) Data mining
ii) Members
iii) Drill down
iv) Slice
v) Snowflake schema
vi) Fact table
(2 marks each)

Dec 2013
QUESTION 4

LAMAN SURI, a magazine publishing company, has its headquarters in Kuala Lumpur. It has
branch offices in Alor Star (northern region – code “N”), Kuantan (eastern region – code
“E”) and Johor Baru (southern region – code “S”), while the headquarters in Kuala Lumpur
services the western region – code “W”.
- Currently, the company is using a centralized database system located at its
headquarters.
- The company publishes one regional magazine monthly in each of the above offices.
- The company has 300,000 customers (subscribers) distributed throughout the regions.
- On the first day of each month, an annual subscription INVOICE is printed and sent to
each customer whose subscription is due for renewal.

The relational schema of the LAMAN SURI database is:

CUSTOMER (CustID, Name, Address, City, State, PostCode, Region, SubscribeDate)


INVOICE (InvID, InvDate, InvTotal, CustID, Region)

The company's management is aware of the problems associated with the centralized
database and has decided that it is time to upgrade the processing of the subscriptions in its
offices. Each office will handle its own customer and invoice data. The company's
management in Kuala Lumpur wants to have access to all customers and invoice data to
generate annual reports and to run queries.

Answer the following questions:

a) List TWO (2) problems associated with centralized database.


(4 marks)
Answer:

- Bottleneck or data traffic
- Data availability is not efficient
- Reliability
- Large population – time consuming for searching data

(Any suitable answer)

b) Suggest an alternative to the centralized database for the company. Illustrate your
answer with a diagram.
Answer:
The company requires a distributed system with distributed database management
systems capabilities. The distributed system will be distributed among the company’s
offices.
(3 marks)

[Diagram: four sites (Site 1 Alor Star, Site 2 Kuala Lumpur, Site 3 Kuantan and
Site 4 Johor Baru), each running a DDBMS, connected through a computer network]


(3 marks)

c) What type of data fragmentation is needed for each table?


Answer:
The relations must be horizontally partitioned, using the Region attribute for the
CUSTOMER and INVOICE relations.
(4 marks)

d) Show the database fragments for the CUSTOMER relation. The fragments should
include attributes and sample data (one row only). Indicate where the fragment is
located.
Answer:

Fragment F1 Location: Kuala Lumpur

CustID   Name          Address              City         State      PostCode   Region   SubscribeDate
10884    Ammar Ali     Lot 32, Kg Jawa      Shah Alam    Selangor   40000      W        30/7/2010

Fragment F2 Location: Alor Star

CustID   Name          Address              City         State      PostCode   Region   SubscribeDate
66453    Nabilah Isa   19, Jalan Mawar      Sik          Kedah      08200      N        28/4/2012

Fragment F3 Location: Kuantan

CustID   Name          Address              City         State      PostCode   Region   SubscribeDate
74402    Iman Ismail   KM 5, Kg Bukit       Kota Bharu   Kelantan   15050      E        8/8/2013

Fragment F4 Location: Johor Baru

CustID   Name          Address              City         State      PostCode   Region   SubscribeDate
46546    Mina Samy     12, Jalan Panglima   Alor Gajah   Melaka     78000      S        1/6/2013

(6 marks)
Jun 2014
QUESTION 5

AM Rich Sdn. Bhd. (AM Rich) is a manufacturing firm that hires you as an IT consultant to
design a distributed relational database. The company is headquartered in Kuala Lumpur and
has major branches in Shah Alam, Jasin, and Dungun. The database involved consists of four
tables, labeled A, B, C, and D, with the following characteristics:

- Table A consists of 500,000 records and is heavily used in Kuala Lumpur and Shah
Alam.
- Table B consists of 100,000 records and is frequently required in all four cities.
- Table C consists of 75,000 records. Records 1-30,000 are most frequently used in
Shah Alam. Records 30,001-75,000 are most frequently used in Jasin.
- Table D consists of 20,000 records and is used almost exclusively in Kuala Lumpur.

e) Plan a distributed relational database design for AM Rich. Justify your placement,
replication, and partitioning of the tables.
Answer:
Table A should be replicated with a copy in Kuala Lumpur and a copy in Shah Alam. This
will minimize telecommunications costs and improve availability.

Since Table B is used in all four cities, it should be replicated for availability purposes. If it is
not particularly volatile we could consider placing a copy of it in each city. If it is somewhat
volatile we might have just two copies, placed in any two of the cities. If it is very volatile
then we have to weigh the costs of synchronous or asynchronous update against the
benefits of availability when considering replication.

Table C would certainly be a candidate for partitioning, with one partition in Shah Alam and
one in Jasin as indicated.
Since Table D is used almost exclusively in Kuala Lumpur, it should not be replicated and
should be stored only in Kuala Lumpur.
(Each table explanation, 2½ marks, total 10 marks)

f) Give TWO (2) advantages and TWO (2) disadvantages of implementing a distributed
relational database at AM Rich.
Answer:
Advantages:

- Local autonomy.
- Reduced communications costs because each table can be located at the site that
most heavily uses it.
- Improved availability because portions of the database are available even if one or
some of the sites are down.

Disadvantages:

- Several sites have to be concerned with security, concurrency, backup and recovery
controls.
- Requires a distributed directory and the software to support location transparency.
- Requires distributed joins.
(Any 2 advantages and 2 disadvantages, 1½ marks each, total 6 marks)

g) In the next five years, AM Rich Sdn. Bhd. is planning to implement a mobile database for
the company due to the increasing branches in the northern and southern areas. As an
IT consultant, do you think that mobile database is suitable for this company? In your
opinion, give a driving force or factor that contributes to the wide usage of mobile
database.
Answer:
A mobile database is a database that a mobile computing device can connect to
over a wireless mobile network. It is suitable for AM Rich because, with many
branches to deal with, it can reduce the cost of travelling.

Driving forces (factors):

- The number of smartphones in use around the world passed 1 billion in 2012.
- The next billion devices could be reached in less than three years.
- More businesses are moving toward employee mobility.
- Powerful lightweight computing devices and low-cost mobile connectivity have paved
the way for data-driven applications.
(Opinion 2 marks, one factor 2 marks, total 4 marks)

Dec 2015
-
