Data Mining Assgn 1 2025
1. Construct a bitmap index for the attributes Item, Colour and City of the following
relation:
ItemID  Item   Colour  City
I1      Nut    Grey    Faridabad
I2      Bolt   Red     Delhi
I3      Screw  Black   Noida
I4      Bolt   Black   Faridabad
I5      Screw  Blue    Noida
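A bitmap index keeps one bit vector per distinct attribute value, with one bit per row of the relation; a bit is 1 when that row holds the value. The following is a minimal Python sketch of that construction for the relation above. The function name build_bitmap_index and the tuple layout are assumptions made for illustration only.

```python
# Rows of the relation from question 1: (ItemID, Item, Colour, City).
rows = [
    ("I1", "Nut", "Grey", "Faridabad"),
    ("I2", "Bolt", "Red", "Delhi"),
    ("I3", "Screw", "Black", "Noida"),
    ("I4", "Bolt", "Black", "Faridabad"),
    ("I5", "Screw", "Blue", "Noida"),
]
columns = ["ItemID", "Item", "Colour", "City"]


def build_bitmap_index(rows, columns, attribute):
    """Return {value: bit string} with one bit per row, in row order.

    Hypothetical helper: bit i is '1' when row i holds that value.
    """
    pos = columns.index(attribute)
    index = {}
    for row_number, row in enumerate(rows):
        # Create an all-zero bit vector the first time a value is seen.
        bits = index.setdefault(row[pos], [0] * len(rows))
        bits[row_number] = 1
    return {value: "".join(map(str, bits)) for value, bits in index.items()}


for attribute in ("Item", "Colour", "City"):
    print(attribute, build_bitmap_index(rows, columns, attribute))
```

Each bit string is read left to right in row order (I1 to I5), so every row sets exactly one bit per attribute.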
2. Construct a bitmap index for the attributes brand, type, colour and risk for the following
relation cars:
CarID  Brand   Type   Colour  Risk
C1     Opel    Corsa  Grey    Low
C2     Opel    Corsa  Red     Medium
C3     Toyota  Etios  Black   Medium
C4     Audi    A4     Black   High
3. Suppose that a data warehouse consists of the three dimensions time, doctor, and patient,
and the two measures count and charge, where charge is the fee that a doctor charges a
patient for a visit.
a. Enumerate three classes of schemas that are popularly used for modeling data
warehouses.
b. Draw a schema diagram for the above data warehouse using one of the schema classes
listed in (a).
c. Starting with the base cuboid [day, doctor, patient], what specific OLAP operations
should be performed in order to list the total fee collected by each doctor in 2010?
d. To obtain the same list, write an SQL query assuming the data are stored in a relational
database with the schema fee (day, month, year, doctor, hospital, patient, count,
charge).
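One possible shape for the query in part (d) can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the column types, the doctor, hospital and patient names, and all rows are invented for illustration, and only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema from part (d); the column types are assumptions.
conn.execute(
    'CREATE TABLE fee (day INTEGER, month INTEGER, year INTEGER, '
    'doctor TEXT, hospital TEXT, patient TEXT, "count" INTEGER, charge REAL)'
)

# Hypothetical sample rows, not taken from the assignment.
conn.executemany(
    "INSERT INTO fee VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2010, "Dr. A", "H1", "P1", 1, 100.0),
        (2, 3, 2010, "Dr. A", "H1", "P2", 1, 150.0),
        (5, 6, 2010, "Dr. B", "H2", "P1", 1, 200.0),
        (7, 7, 2011, "Dr. B", "H2", "P3", 1, 300.0),
    ],
)

# Keep only 2010, then aggregate the charge per doctor.
query = """
    SELECT doctor, SUM(charge) AS total_fee
    FROM fee
    WHERE year = 2010
    GROUP BY doctor
    ORDER BY doctor
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The WHERE clause plays the role of the slice on year = 2010, and GROUP BY doctor corresponds to rolling the patient dimension up to all.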
4. Suppose that a data warehouse for Big University consists of the four dimensions student,
course, semester, and instructor, and two measures count and avg_grade. At the lowest
conceptual level (e.g., for a given student, course, semester, and instructor combination),
the avg_grade measure stores the actual course grade of the student. At higher conceptual
levels, avg_grade stores the average grade for the given combination.
a. Draw a snowflake schema diagram for the data warehouse.
b. Starting with the base cuboid [student, course, semester, instructor], what specific
OLAP operations (e.g., roll-up from semester to year) should you perform in order to
list the average grade of CS courses for each Big University student?
c. If each dimension has five levels (including all), such as “student < major < status <
university < all”, how many cuboids will this cube contain (including the base and apex
cuboids)?
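For part (c), a cuboid picks exactly one abstraction level for each dimension, so the number of cuboids is the product of the level counts over all dimensions. A short Python sketch of that arithmetic follows, assuming each of the four dimensions has five levels with "all" already counted among them; the variable names are illustrative.

```python
from math import prod

# Four dimensions (student, course, semester, instructor),
# each with five levels, "all" included.
levels_including_all = [5, 5, 5, 5]

# One level is chosen independently per dimension.
total_cuboids = prod(levels_including_all)
print(total_cuboids)  # 625
```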
5. Suppose that a data warehouse consists of the four dimensions date, spectator, location,
and game, and the two measures count and charge, where charge is the fare that a spectator
pays when watching a game on a given date. Spectators may be students, adults, or seniors,
with each category having its own charge rate.
a. Draw a star schema diagram for the data warehouse.
b. Starting with the base cuboid [date, spectator, location, game], what specific OLAP
operations should you perform in order to list the total charge paid by student spectators
at GM Place in 2010?
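The roll-up and dice operations asked for in part (b) can be mimicked on a handful of fact rows in plain Python. This is only a sketch: the rows, spectator IDs, game IDs and the "Other Arena" location are hypothetical, and the spectator status is stored directly on each row for simplicity.

```python
from collections import defaultdict

# Hypothetical base-cuboid cells:
# (date, spectator, status, location, game, charge).
facts = [
    ("2010-03-14", "S1", "student", "GM Place", "G1", 20.0),
    ("2010-03-14", "S2", "adult", "GM Place", "G1", 40.0),
    ("2010-11-02", "S3", "student", "GM Place", "G2", 25.0),
    ("2010-11-02", "S1", "student", "Other Arena", "G3", 15.0),
    ("2011-01-09", "S3", "student", "GM Place", "G4", 30.0),
]

# Roll up: date -> year, spectator -> status, game -> all.
rolled_up = defaultdict(float)
for date, _spectator, status, location, _game, charge in facts:
    year = int(date[:4])
    rolled_up[(year, status, location)] += charge

# Dice: select the single cell for students at GM Place in 2010.
total = rolled_up.get((2010, "student", "GM Place"), 0.0)
print(total)  # 45.0
```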