Assignment Data Warehousing - Odt
Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester, and instructor, and two measures, count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. (a) Draw a snowflake schema diagram for the data warehouse.
Answer: (a) The following is one possible snowflake schema diagram for the given data warehouse.
grade fact table: student_id, course_id, semester_id, instructor_id, grade
student dimension table: student_id, student_num, first_name, last_name, major_id
course dimension table: course_id, course_num, department
semester dimension table: semester_id, season, year
instructor dimension table: instructor_id, instructor_num, first_name, last_name, department
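As a rough illustration, the tables in the diagram could be declared as follows using Python's built-in sqlite3 module. The table and column names come from the diagram. The column types, the foreign-key declarations, and the separate major table (implied by major_id but not shown in the diagram) are assumptions made only for this sketch.

```python
import sqlite3

# Names follow the diagram; types and the "major" lookup table are assumptions.
DDL = """
CREATE TABLE major (major_id INTEGER PRIMARY KEY, major_name TEXT);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    student_num TEXT, first_name TEXT, last_name TEXT,
    major_id INTEGER REFERENCES major(major_id)
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY, course_num TEXT, department TEXT
);
CREATE TABLE semester (
    semester_id INTEGER PRIMARY KEY, season TEXT, year INTEGER
);
CREATE TABLE instructor (
    instructor_id INTEGER PRIMARY KEY,
    instructor_num TEXT, first_name TEXT, last_name TEXT, department TEXT
);
-- Fact table: one row per (student, course, semester, instructor) combination.
CREATE TABLE grade (
    student_id INTEGER REFERENCES student(student_id),
    course_id INTEGER REFERENCES course(course_id),
    semester_id INTEGER REFERENCES semester(semester_id),
    instructor_id INTEGER REFERENCES instructor(instructor_id),
    grade REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```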
"#$ Starting with the #ase cu#oid (student; course; semester; instructor), what specific *+%, operations "e.g. roll-up from semester to year$ should one perform in order to list the average grade of CS courses for each Big-university student.
Answer: (b) Roll up the instructor and semester dimensions to "all". Roll up the course dimension to department, and take the slice for "department = CS".
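The effect of these operations can be sketched in plain Python on a handful of made-up base-cuboid cells. The sample grades, the identifiers, and the course-to-department mapping are all hypothetical. The function name avg_grade_by_student is likewise invented for this example.

```python
from collections import defaultdict

# Hypothetical base-cuboid cells: (student, course, semester, instructor, grade).
base = [
    ("s1", "CS101", "2023-Fall", "i1", 3.0),
    ("s1", "CS202", "2024-Spring", "i2", 4.0),
    ("s1", "MATH10", "2023-Fall", "i3", 2.0),
    ("s2", "CS101", "2023-Fall", "i1", 3.5),
]

# Hypothetical course -> department hierarchy used for the roll-up on course.
course_department = {"CS101": "CS", "CS202": "CS", "MATH10": "MATH"}


def avg_grade_by_student(cells, department):
    """Average grade per student for one department's courses."""
    totals = defaultdict(lambda: [0.0, 0])  # student -> [sum of grades, count]
    for student, course, _semester, _instructor, grade in cells:
        # Ignoring semester and instructor rolls both dimensions up to "all".
        # Mapping course to its department is the roll-up on course.
        if course_department[course] != department:
            continue  # slice: keep only the requested department
        totals[student][0] += grade
        totals[student][1] += 1
    return {student: total / n for student, (total, n) in totals.items()}


result = avg_grade_by_student(base, "CS")
print(result)
```

The student dimension is left at its base level, which is why the result has one entry per student.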
"c$ -f each dimensions has five levels "including all$, such as .student/ ma0or / status / university / all., 1ow many cu#oids will this cu#e contain " including the #ase and ape2 cu#oids$3
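Answer: (c) The number of possible cuboids is the product over all dimensions, i.e.

$$\prod_{i=1}^{n} (L_i + 1),$$

where $n$ is the number of dimensions and $L_i$ denotes the number of levels in dimension $i$, not counting the top level "all" (the $+1$ accounts for it). Here $n = 4$ and each dimension has five levels including "all", so $L_i + 1 = 5$ and the cube contains

$$5^4 = 625$$

cuboids.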
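The arithmetic can be checked with a few lines of Python. The variable names are chosen for this sketch only. The level counts are the ones given in the question (four dimensions, five levels each including "all").

```python
from math import prod

# Levels per dimension as stated in the question, with "all" included.
levels_including_all = [5, 5, 5, 5]  # student, course, semester, instructor

# L_i in the formula excludes the top level "all".
levels_excluding_all = [n - 1 for n in levels_including_all]

# Total number of cuboids: product of (L_i + 1) over all dimensions.
total_cuboids = prod(l + 1 for l in levels_excluding_all)
print(total_cuboids)  # 625
```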
Q4. A data warehouse can be modeled by either a star schema or a snowflake schema. Briefly describe the similarities and the differences of the two models, and then analyze their advantages and disadvantages with regard to one another. Give your opinion of which might be more empirically useful, and state the reasons behind your answer.
Answer:
They are similar in the sense that both have a fact table as well as some dimension tables. The major difference is that some dimension tables in the snowflake schema are normalized, thereby further splitting the data into additional tables.

The advantage of the star schema is its simplicity, which enables efficient querying, but it requires more space. The snowflake schema reduces some redundancy by sharing common tables: the tables are easy to maintain and save some space. However, it is less efficient, and the saving of space is negligible in comparison with the typical magnitude of the fact table.

Therefore, empirically, the star schema is better simply because efficiency typically has higher priority over space, as long as the space requirement is not too huge. In industry, sometimes the data from a snowflake schema may be denormalized into a star schema to speed up processing. Another option is to use a snowflake schema to maintain dimensions, and then present users with the same data collapsed into a star.
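The last option, maintaining a snowflaked dimension while presenting a collapsed one, can be sketched with a database view. The example below uses Python's built-in sqlite3 module. The department table, the sample rows, and the view name course_star are hypothetical and serve only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: course points to a separate, normalized department table.
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY, department_name TEXT
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    course_num TEXT,
    department_id INTEGER REFERENCES department(department_id)
);
INSERT INTO department VALUES (1, 'CS'), (2, 'MATH');
INSERT INTO course VALUES (10, 'CS101', 1), (11, 'CS202', 1), (12, 'MATH10', 2);

-- Star-style presentation: one flat course dimension for query users.
CREATE VIEW course_star AS
SELECT c.course_id, c.course_num, d.department_name AS department
FROM course AS c
JOIN department AS d USING (department_id);
""")

# Users query the flat view and never see the extra join.
rows = conn.execute(
    "SELECT course_num, department FROM course_star ORDER BY course_id"
).fetchall()
print(rows)
```

A plain view still performs the join at query time, so it hides the snowflake without removing its cost. Materializing the joined result as a table would trade the saved space back for speed.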