Checklist For DATA3404

The document provides an exam revision checklist covering various topics in database management systems, including storage layer mechanisms, indexing techniques, query execution, query optimization, distributed data management, big data processing, and NoSQL systems. Specifically, it discusses buffer management in storage, indexing methods like B+ trees and hashing, query processing steps and physical operators, cost-based optimization and statistics, distributed system architectures, MapReduce and Spark frameworks, and various NoSQL data models.

Uploaded by Abdullah
Copyright © All Rights Reserved

Exam Revision Checklist:

Storage layer
• DBMS Storage Hierarchy
• Buffer Manager
o Buffer replacement policies/pinning of pages
• Disk Storage organisation
o Column vs row store and page/record layouts

Indexing
• B+ Tree
• Static and Dynamic hashing
• Bitmap Indexes
• Index classification
• Database tuning using indexing
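
A bitmap index keeps one bit vector per distinct value of a column. A conjunctive or disjunctive predicate then becomes a bitwise AND/OR of those vectors. The minimal sketch below uses Python integers as bit vectors. The table, its column names, and the helpers `build_bitmap_index` and `matching_rows` are hypothetical.

```python
def build_bitmap_index(rows, column):
    """Return {value: bitmap}, where bit i is set if row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << i)
    return index


def matching_rows(bitmap, n_rows):
    """Decode a bitmap back into the list of row ids whose bit is set."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


# Hypothetical table with two low-cardinality columns.
rows = [
    {"status": "active", "state": "NSW"},
    {"status": "closed", "state": "VIC"},
    {"status": "active", "state": "VIC"},
    {"status": "closed", "state": "NSW"},
    {"status": "active", "state": "NSW"},
]
by_status = build_bitmap_index(rows, "status")
by_state = build_bitmap_index(rows, "state")

# WHERE status = 'active' AND state = 'NSW'  ->  AND of two bitmaps
hits = by_status["active"] & by_state["NSW"]
print(matching_rows(hits, len(rows)))  # [0, 4]
```

Bitmaps pay off for low-cardinality columns, where there are few vectors to store and combine.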

Query Execution
• Query processing steps
o Pipelining vs materialization
• Relational algebra expressions and query execution plans
• Physical operator algorithms: join algorithms, external sorting…
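
The two strategies differ in when tuples move between operators:
- **Pipelining** (the iterator model) hands each tuple to the next operator as soon as it is produced.
- **Materialization** writes each operator's complete result before the next operator starts.

Python generators give a compact sketch of the difference. The operator functions and the in-memory table are illustrative assumptions, not a DBMS API.

```python
from typing import Callable, Iterable, Iterator


def scan(table: Iterable[dict]) -> Iterator[dict]:
    for row in table:
        yield row


def select(rows: Iterable[dict], pred: Callable[[dict], bool]) -> Iterator[dict]:
    for row in rows:
        if pred(row):
            yield row


def project(rows: Iterable[dict], cols: list) -> Iterator[tuple]:
    for row in rows:
        yield tuple(row[c] for c in cols)


table = [{"id": i, "price": i * 10} for i in range(1, 6)]

# Pipelined plan: rows are pulled through all three operators on demand,
# and no intermediate result is stored in full.
pipelined = project(select(scan(table), lambda r: r["price"] > 20), ["id"])
print(next(pipelined))  # (3,)  rows are pulled only until the first match

# Materialized plan: each operator's output is collected into a list
# (a temporary "table") before the next operator runs.
tmp1 = list(scan(table))
tmp2 = list(select(tmp1, lambda r: r["price"] > 20))
result = list(project(tmp2, ["id"]))
print(result)  # [(3,), (4,), (5,)]
```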

Query Optimization
• Basic query optimization steps
• Heuristic query optimizations (algebraic query transformation, equivalent RA expressions)
• Cost-based query optimization
• Role of statistics

Distributed data management
• Distributed system architectures, CAP Theorem
• Data replication, data partitioning and sharding
• Distributed query processing, distributed join algorithms
• HDFS

Big data processing
• Scale-Agnostic computation: MapReduce Principle
• Distributed Data Processing Frameworks (Apache Spark)
• Lazy evaluation in Apache Spark
• Data Stream Processing, notions of time, window processing (Kafka, Apache Flink)
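
The MapReduce principle splits a job into three phases:
- a map function applied independently to each input record;
- a shuffle that groups the intermediate (key, value) pairs by key;
- a reduce function applied once per key.

Because the map and reduce calls are independent of each other, the same program can run on one machine or many. The single-process sketch below simulates the three phases for word count. It is not Hadoop or Spark code; `run_mapreduce`, `map_fn` and `reduce_fn` are hypothetical names.

```python
from collections import defaultdict
from itertools import chain


def map_fn(line):
    # Emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    # Combine all values that share one key.
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    # Map phase: each record is processed independently.
    intermediate = chain.from_iterable(mapper(r) for r in records)
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reducer call per key.
    return dict(reducer(k, vs) for k, vs in groups.items())


docs = ["the quick brown fox", "The lazy dog", "the fox"]
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```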

NoSQL
• NoSQL background and classification
• Key-Value Stores, data model, querying (Dynamo & Cassandra)
• (Distributed) Column Stores, data model, querying (HBase)
• Document Stores, data model, querying (MongoDB)

What is the role of a buffer manager in a DBMS?
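
A buffer manager keeps a fixed number of page frames in memory and hands pinned pages to callers. On a miss with a full pool, it evicts an unpinned page chosen by its replacement policy, writing the page back first if it is dirty. The minimal sketch below uses LRU replacement. It assumes a dictionary stands in for the disk and that pages are identified by integers; `BufferPool` and its methods are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool with pin counts and LRU replacement."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # page_id -> page contents (stands in for disk)
        self.frames = OrderedDict()   # page_id -> [data, pin_count, dirty], LRU first
        self.disk_reads = 0

    def pin(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # hit: now most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
            self.disk_reads += 1                  # miss: read page from disk
        self.frames[page_id][1] += 1
        return self.frames[page_id][0]

    def unpin(self, page_id, new_data=None):
        frame = self.frames[page_id]
        frame[1] -= 1
        if new_data is not None:                  # caller modified the page
            frame[0], frame[2] = new_data, True

    def _evict(self):
        # Least recently used page whose pin count is zero.
        victim = next((pid for pid, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        data, _, dirty = self.frames.pop(victim)
        if dirty:
            self.disk[victim] = data              # write back modified page


disk = {i: f"page-{i}" for i in range(5)}
pool = BufferPool(capacity=2, disk=disk)
pool.pin(0)
pool.unpin(0)
pool.pin(1)               # page 1 stays pinned
pool.pin(2)               # pool full: evicts page 0, the LRU unpinned page
print(list(pool.frames))  # [1, 2]
```

Pinning is what stops the replacement policy from evicting a page that an operator is still reading.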

Estimate the minimum and maximum storage costs for a given schema.
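
One common way to bound storage cost has three steps:
1. Compute the smallest possible record size (variable-length fields empty).
2. Compute the largest possible record size (variable-length fields full).
3. Work out how many records fit on a page and how many pages the table needs.

The sketch below makes several assumptions:
- the schema is hypothetical;
- pages are 4 KiB;
- each VARCHAR has a 2-byte length prefix;
- records do not span pages;
- there are no page or record headers.

Real systems add header and slot-directory overhead.

```python
import math

PAGE_SIZE = 4096   # bytes; assumed page size
LENGTH_PREFIX = 2  # assumed bytes used to store each VARCHAR's length

# Hypothetical schema: (column, type, max_length_in_bytes)
schema = [
    ("id", "INTEGER", 4),
    ("name", "VARCHAR", 50),
    ("country", "CHAR", 2),
    ("bio", "VARCHAR", 200),
]


def record_size(schema, varchar_full):
    size = 0
    for _, col_type, length in schema:
        if col_type == "VARCHAR":
            size += LENGTH_PREFIX + (length if varchar_full else 0)
        else:
            size += length  # fixed-length types always use their full size
    return size


def pages_needed(n_records, rec_size):
    per_page = PAGE_SIZE // rec_size  # records do not span pages
    return math.ceil(n_records / per_page)


n = 100_000
min_rec, max_rec = record_size(schema, False), record_size(schema, True)
print(min_rec, max_rec)                                    # 10 260
print(pages_needed(n, min_rec), pages_needed(n, max_rec))  # 245 6667
```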

What are the differences between row and column stores? When would you use which?

When are indexes beneficial, and when do they hurt performance? What role do they play in querying?

Explain the differences between B+-tree and hash indexes, including their access costs.

Given a database schema and a workload specification, suggest a set of suitable indexes to improve the performance of the system.

Explain your index choice; classify some indexes.

Determine the number of runs for an n-way sort-merge over X tuples.
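
This sketch follows the standard textbook cost model for external merge sort.

Setup:
- B is the number of buffer pages.
- N is the file size in pages. First convert the X tuples to pages: N = ceil(X / tuples per page).

Cost model:
- Pass 0 produces ceil(N / B) sorted runs of B pages each.
- Every later pass merges up to B - 1 runs at once.
- The number of passes is 1 + ceil(log_{B-1}(ceil(N / B))).
- Each pass reads and writes every page once.

The helper counts the runs left after each pass with integer arithmetic, which avoids floating-point logarithms. The function name is hypothetical.

```python
import math


def external_sort_cost(n_pages, buffers):
    """Runs after each pass, number of passes, and total page I/Os for
    external merge sort with `buffers` pages and a (buffers - 1)-way merge."""
    if buffers < 3:
        raise ValueError("need at least 3 buffer pages")
    runs = math.ceil(n_pages / buffers)         # pass 0: sort B pages at a time
    runs_per_pass = [runs]
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))  # merge B - 1 runs into one
        runs_per_pass.append(runs)
    passes = len(runs_per_pass)
    return runs_per_pass, passes, 2 * n_pages * passes  # read + write per pass


print(external_sort_cost(108, 5))  # ([22, 6, 2, 1], 4, 864)
```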

Compare the costs of nested-loops join, sort-merge-join and hash join.
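
A common textbook cost model compares the join algorithms by page I/Os, ignoring the cost of writing the final result. The setup is R (M pages, p_R tuples per page) joined with S (N pages) using B buffer pages, with R as the outer relation. The model makes these assumptions:
- Sort-merge join sorts both inputs with external merge sort, then merges them in one pass.
- Hash join is a Grace hash join with enough memory for a single partitioning pass, roughly B > sqrt(pages of the smaller relation).

The sizes in the example are hypothetical.

```python
import math


def sort_passes(pages, buffers):
    runs, passes = math.ceil(pages / buffers), 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes


def join_costs(m, n, p_r, buffers):
    """Page-I/O estimates for R join S with R (m pages) as the outer relation."""
    return {
        "tuple nested loops": m + p_r * m * n,
        "page nested loops": m + m * n,
        "block nested loops": m + math.ceil(m / (buffers - 2)) * n,
        "sort-merge": 2 * m * sort_passes(m, buffers)
        + 2 * n * sort_passes(n, buffers)
        + (m + n),
        "grace hash": 3 * (m + n),  # partition (read + write) + probe (read)
    }


for name, cost in join_costs(m=1000, n=500, p_r=100, buffers=100).items():
    print(f"{name:>20}: {cost:,}")
```

With these numbers, Grace hash join (4,500 I/Os) beats block nested loops (6,500) and sort-merge (7,500). Hash join only supports equality join conditions, whereas nested-loops join works with any join predicate.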

What limitations do the different join algorithms have?

What role does sorting play for different query execution algorithms?

What are the costs of a table scan versus a (clustered or unclustered) index scan?
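
Under a simple page-I/O model, the three access paths cost:
- **Full table scan:** reads every page.
- **Clustered index scan:** reads the index path plus only the pages holding matching tuples.
- **Unclustered index scan:** the index path plus up to one page I/O per matching tuple.

The sketch assumes a B+-tree of a given height, uniformly distributed data, and no buffering of repeatedly accessed pages. All numbers are hypothetical.

```python
import math


def scan_costs(pages, tuples, selectivity, index_height):
    matches = selectivity * tuples
    return {
        "table scan": pages,
        # matching tuples are stored contiguously, so only a fraction of pages is read
        "clustered index": index_height + math.ceil(selectivity * pages),
        # each matching tuple may live on a different data page
        "unclustered index": index_height + math.ceil(matches),
    }


# 10,000 pages holding 1,000,000 tuples, B+-tree of height 3
for sel in (0.001, 0.05):
    print(sel, scan_costs(10_000, 1_000_000, sel, 3))
```

At 5% selectivity, the unclustered index (50,003 I/Os) is already far worse than the full scan (10,000 I/Os). This is why unclustered indexes mainly help highly selective predicates.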

Briefly explain cost-based query optimization. What is the goal?

What role do database statistics play in query optimization?

How do they affect, e.g., join orders?
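
Statistics such as relation cardinalities and the number of distinct values per attribute, V(R, a), feed the selectivity estimates. The usual uniformity and containment assumptions give an estimate for an equi-join R ⋈_a S of |R|·|S| / max(V(R,a), V(S,a)). Comparing such estimates is how an optimizer can prefer the join order with the smallest intermediate result. The relations and statistics below are hypothetical.

```python
def join_size(card_r, card_s, distinct_r, distinct_s):
    # |R join S| ~= |R| * |S| / max(V(R,a), V(S,a)), rounded down
    return card_r * card_s // max(distinct_r, distinct_s)


# Hypothetical statistics for R(a), S(a, b), T(b)
card = {"R": 10_000, "S": 1_000, "T": 100}
distinct = {("R", "a"): 1_000, ("S", "a"): 1_000, ("S", "b"): 100, ("T", "b"): 100}

# Plan 1: (R join S) join T   vs   Plan 2: (S join T) join R
rs = join_size(card["R"], card["S"], distinct[("R", "a")], distinct[("S", "a")])
st = join_size(card["S"], card["T"], distinct[("S", "b")], distinct[("T", "b")])
print("intermediate |R join S| =", rs)  # 10000
print("intermediate |S join T| =", st)  # 1000
```

Both plans compute the same final result. Plan 2, however, produces an intermediate result that is ten times smaller. A cost-based optimizer using these statistics would therefore join S and T first.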

Draw the best (left-deep) query plan for a given SQL query. Why left-deep?

Given an SQL query, find a good query execution plan for it. Explain your choices.

What is the meaning of the CAP theorem?

Suggest a partitioning strategy for a given scenario.
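
Two common partitioning strategies trade off differently:
- **Hash partitioning** spreads keys evenly, which suits equality lookups and joins on the partitioning key.
- **Range partitioning** keeps nearby keys on the same node, which suits range queries but is prone to skew.

The sketch below shows both. It assumes string keys, three nodes and hypothetical range boundaries. It uses `zlib.crc32` rather than Python's built-in `hash`, because `hash` on strings is randomized per process.

```python
import bisect
import zlib


def hash_partition(key: str, n_nodes: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def range_partition(key: str, boundaries: list) -> int:
    # Sorted split points: node i holds keys with boundaries[i-1] <= key < boundaries[i].
    return bisect.bisect_right(boundaries, key)


customers = ["alice", "bob", "carol", "dave", "erin", "mallory", "trent"]
boundaries = ["f", "p"]  # node 0: keys < "f", node 1: "f" to < "p", node 2: >= "p"

for c in customers:
    print(c, hash_partition(c, 3), range_partition(c, boundaries))
```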

Explain one of the different data replication algorithms.
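
One replication scheme used by Dynamo-style systems is quorum replication. Each item is stored on N replicas; a write must be acknowledged by W of them, and a read consults R of them. If R + W > N, every read quorum overlaps every write quorum, so a read sees the latest acknowledged write.

The toy simulation below always writes to the first W replicas and reads from the last R, which is the worst case for overlap. Versions are plain integers here, which is an assumption; Dynamo itself uses vector clocks. `simulate_quorum` is a hypothetical name.

```python
def simulate_quorum(n, w, r):
    replicas = [(0, "old")] * n                 # (version, value) per replica
    for i in range(w):                          # the write reaches only W replicas
        replicas[i] = (1, "new")
    read_set = replicas[n - r:]                 # the read contacts the last R replicas
    _, value = max(read_set)                    # keep the newest version seen
    return value


print(simulate_quorum(3, 2, 2))  # new  (R + W = 4 > 3: quorums overlap)
print(simulate_quorum(3, 1, 1))  # old  (R + W = 2 <= 3: stale read possible)
```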

How is a distributed join algorithm executed for a given data set?
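
A repartition (shuffle) join works in two steps:
1. Hash-partition both inputs on the join key, so matching tuples land on the same node.
2. Run an ordinary local hash join on every node.

A broadcast join would instead ship the smaller relation to every node. The single-process sketch below simulates the repartition join, with lists standing in for nodes. The relations, node count and function names are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key, n_nodes):
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes


def local_hash_join(r_part, s_part):
    # Build a hash table on S's partition, then probe it with R's tuples.
    table = defaultdict(list)
    for key, s_val in s_part:
        table[key].append(s_val)
    return [(key, r_val, s_val) for key, r_val in r_part for s_val in table[key]]


def repartition_join(r, s, n_nodes):
    r_parts = [[] for _ in range(n_nodes)]
    s_parts = [[] for _ in range(n_nodes)]
    for t in r:                                   # shuffle R by join key
        r_parts[node_for(t[0], n_nodes)].append(t)
    for t in s:                                   # shuffle S with the same function
        s_parts[node_for(t[0], n_nodes)].append(t)
    result = []
    for i in range(n_nodes):                      # each node joins independently
        result.extend(local_hash_join(r_parts[i], s_parts[i]))
    return result


orders = [(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")]   # (customer_id, order)
customers = [(1, "Ann"), (2, "Ben"), (4, "Dee")]         # (customer_id, name)
print(sorted(repartition_join(orders, customers, n_nodes=3)))
# [(1, 'o1', 'Ann'), (1, 'o3', 'Ann'), (2, 'o2', 'Ben')]
```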

How does HDFS follow the distributed data processing principles covered in this lecture?

When would you use a DBMS, and when MapReduce or Spark/Flink?

Given a Spark program, in which tasks or stages will it be executed?

What is the role of lazy evaluation in Spark?
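
In Spark, transformations such as `map` and `filter` only record a lineage of operations. Nothing is computed until an action such as `collect` or `count` is called. This lets Spark pipeline narrow transformations within a stage and plan the whole job before running it.

The toy class below mimics that behaviour in plain Python. `LazyDataset` is hypothetical and is not the PySpark API.

```python
class LazyDataset:
    def __init__(self, source, ops=()):
        self.source = source
        self.ops = ops                        # recorded plan (lineage)

    def map(self, f):                         # transformation: returns a new plan
        return LazyDataset(self.source, self.ops + (("map", f),))

    def filter(self, p):                      # transformation: returns a new plan
        return LazyDataset(self.source, self.ops + (("filter", p),))

    def collect(self):                        # action: executes the whole plan
        out = []
        for x in self.source:                 # one pass, operators fused per element
            keep = True
            for kind, f in self.ops:
                if kind == "map":
                    x = f(x)
                elif not f(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out


calls = []


def square(x):
    calls.append(x)                           # record when work actually happens
    return x * x


ds = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
print(calls)         # []  no work done yet
print(ds.collect())  # [0, 4, 16]
print(calls)         # [0, 1, 2, 3, 4]
```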

What is the difference between batch and stream processing?
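
A batch job sees its complete, bounded input before it starts. A stream processor handles an unbounded sequence of events as they arrive, so it computes aggregates over windows. The sketch below assigns events to tumbling (fixed-size, non-overlapping) event-time windows and counts events per key. The event data and window size are hypothetical. Because the sketch processes a finite list, it ignores when results are emitted, i.e. watermarks and late events, which stream processors such as Flink must handle.

```python
from collections import Counter


def tumbling_window_counts(events, size):
    """events: iterable of (event_time_seconds, key); size: window length in seconds."""
    counts = Counter()
    for ts, key in events:
        window_start = ts - ts % size          # align to the window boundary
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(1, "click"), (4, "view"), (7, "click"), (12, "click"), (14, "click")]
print(tumbling_window_counts(events, size=10))
# {(0, 'click'): 2, (0, 'view'): 1, (10, 'click'): 2}
```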

What is the difference between Amazon Dynamo, HBase and MongoDB?

What is a key-value store?
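
A key-value store essentially exposes `get(key)` and `put(key, value)`. Dynamo-style systems spread the keys across nodes with consistent hashing, so adding or removing a node only moves the keys in that node's part of the ring.

The minimal ring below has no virtual nodes and no replication. It assumes MD5 for ring positions, and the node names and `HashRing` class are hypothetical.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self.ring, (ring_position(node), node))

    def node_for(self, key):
        # First node at or clockwise after the key's position, wrapping around.
        pos = ring_position(key)
        i = bisect.bisect_right(self.ring, (pos,))
        return self.ring[i % len(self.ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that moved now belongs to the new node.
print(all(after[k] == "node-d" for k in moved))  # True
```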

What kind of queries can you answer with a column store? Which ones with a document store?

How does the NoSQL system X follow the CAP theorem?
