Checklist For DATA3404
Storage layer
• DBMS Storage Hierarchy
• Buffer Manager
o Buffer replacement policies/pinning of pages
• Disk Storage organisation
o Column vs row store and page/record layouts
Indexing
• B+ Tree
• Static and Dynamic hashing
• Bitmap Indexes
• Index classification
• Database tuning using indexing
Query Execution
• Query processing steps
o Pipelining vs materialization
• Relational algebra expressions and query execution plans
• Physical operator algorithms: join algorithms, external sorting…
Query Optimization
• Basic query optimization steps
• Heuristic query optimizations (algebraic query transformation, equivalent RA expressions)
• Cost-based query optimization
• Role of statistics
NoSQL
• NoSQL background and classification
• Key-Value Stores, data model, querying (Dynamo & Cassandra)
• (Distributed) Column Stores, data model, querying (HBase)
• Document Stores, data model, querying (MongoDB)
What is the role of a buffer manager in a DBMS?
Estimate the minimum and maximum storage costs for a given schema.
When are indexes good, and when are they bad? What role do they play in querying?
Given a database schema and a workload specification, suggest a set of suitable indexes to improve the performance of the system.
What role does sorting play for different query execution algorithms?
What are the costs of a table scan versus a (clustered/unclustered) index scan?
Draw the best (left-deep) query plan for the following SQL query. Why left-deep?
Given an SQL query, find a good query execution plan for it. Explain your choices.
How does HDFS follow the distributed data processing principles covered in this lecture?
What kind of queries can you answer with a column store? Which ones with a document store?
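As a worked illustration for the question on table scan versus index scan costs, the sketch below compares page I/O counts under a simplified textbook-style cost model. Everything in it is an assumption made for illustration, not the unit's official model: costs are counted in page reads only, the index height is a fixed number, reading index leaf pages is ignored, and the unclustered case assumes the worst case of one data-page read per matching tuple. The table sizes and function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def table_scan_cost(num_pages):
    # A full scan reads every data page exactly once.
    return num_pages


def clustered_index_scan_cost(matching_tuples, tuples_per_page, index_height):
    # Descend the index, then read the contiguous data pages
    # that hold the matching tuples.
    return index_height + ceil_div(matching_tuples, tuples_per_page)


def unclustered_index_scan_cost(matching_tuples, index_height):
    # Descend the index, then (worst case) one data-page read
    # per matching tuple, since matches are scattered across pages.
    return index_height + matching_tuples


# Hypothetical table: 1000 pages, 100 tuples per page, index of height 2,
# and a range predicate that matches 5000 of the 100000 tuples (5%).
NUM_PAGES = 1000
TUPLES_PER_PAGE = 100
INDEX_HEIGHT = 2
MATCHING_TUPLES = 5000

costs = {
    "table scan": table_scan_cost(NUM_PAGES),
    "clustered index scan": clustered_index_scan_cost(
        MATCHING_TUPLES, TUPLES_PER_PAGE, INDEX_HEIGHT
    ),
    "unclustered index scan": unclustered_index_scan_cost(
        MATCHING_TUPLES, INDEX_HEIGHT
    ),
}

for plan, pages in costs.items():
    print(f"{plan}: {pages} page reads")
```

Under these assumptions the clustered index scan is far cheaper than the table scan, while the unclustered index scan is more expensive than simply scanning the whole table. This is one reason an optimizer may ignore an unclustered index for predicates that are not very selective.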