Checklist For DATA3404

The document provides an exam revision checklist covering various topics in database management systems including storage layer mechanisms, indexing techniques, query execution, optimization, distributed data management, big data processing, and NoSQL systems. Specifically, it discusses buffer management in storage, indexing methods like B+ trees and hashing, query processing steps and physical operators, cost-based optimization and statistics, distributed system architectures, MapReduce and Spark frameworks, and various NoSQL data models.

Uploaded by

Abdullah

Exam Revision Checklist:

Storage layer
• DBMS Storage Hierarchy
• Buffer Manager
  ◦ Buffer replacement policies and pinning of pages
• Disk storage organisation
  ◦ Column vs row store; page and record layouts

Indexing
• B+ Tree
• Static and Dynamic hashing
• Bitmap Indexes
• Index classification
• Database tuning using indexing

Query Execution
• Query processing steps
  ◦ Pipelining vs materialization
• Relational algebra expressions and query execution plans
• Physical operator algorithms: join algorithms, external sorting, …

Query Optimization
• Basic query optimization steps
• Heuristic query optimizations (algebraic query transformation, equivalent RA expressions)
• Cost-based query optimization
• Role of statistics

Distributed data management
• Distributed system architectures, CAP Theorem
• Data replication, data partitioning and sharding
• Distributed query processing, distributed join algorithms
• HDFS

Big data processing
• Scale-Agnostic computation: MapReduce Principle
• Distributed Data Processing Frameworks (Apache Spark)
• Lazy evaluation in Apache Spark
• Data Stream Processing, notions of time, window processing (Kafka, Apache Flink)

NoSQL
• NoSQL background and classification
• Key-Value Stores, data model, querying (Dynamo & Cassandra)
• (Distributed) Column Stores, data model, querying (HBase)
• Document Stores, data model, querying (MongoDB)

What is the role of a buffer manager in a DBMS?

Estimate the minimum and maximum storage costs for a given schema.

What are the differences between row and column stores? When would you use which?

When are indexes good, and when are they bad? What role do they play in querying?

Explain the differences between, and the access costs of, B+-tree and hash indexes.

Given a database schema and a workload specification, suggest a set of suitable indexes to improve the performance of the system.

Explain your index choice; classify some indexes.

Determine the number of runs for an n-way sort-merge over X tuples.
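
As a revision aid, the usual textbook calculation for external merge sort can be sketched as below. This is a minimal sketch under stated assumptions, not the course's official solution: the function name, the `tuples_per_page` and `buffer_pages` parameters, and the choice of a merge fan-in of `buffer_pages - 1` (one buffer page reserved for output) are all assumptions introduced here.

```python
def external_sort_stats(num_tuples, tuples_per_page, buffer_pages):
    """Estimate runs, merge passes and page I/Os for external merge sort.

    Assumptions (hypothetical, for illustration only):
      - pass 0 fills all buffer pages and writes one sorted run per fill
      - each merge pass merges (buffer_pages - 1) runs at a time
      - every pass reads and writes every page once
    """
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages to merge")

    # Integer ceiling division avoids floating-point rounding surprises.
    pages = -(-num_tuples // tuples_per_page)
    initial_runs = -(-pages // buffer_pages)

    fan_in = buffer_pages - 1
    runs = initial_runs
    merge_passes = 0
    while runs > 1:
        runs = -(-runs // fan_in)
        merge_passes += 1

    total_io = 2 * pages * (1 + merge_passes)
    return pages, initial_runs, merge_passes, total_io


# 1,000,000 tuples, 100 tuples per page, 11 buffer pages (10-way merge)
print(external_sort_stats(1_000_000, 100, 11))
# -> (10000, 910, 3, 80000)
```

With these numbers the 910 initial runs shrink to 91, then 10, then 1, which is why three merge passes are needed.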

Compare the costs of nested-loops join, sort-merge join and hash join.
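
One way to practise this comparison is to plug page counts into the common textbook cost approximations. The sketch below is illustrative only: the function names are hypothetical, costs are counted in page I/Os, the cost of writing the join result is ignored, and the hash-join formula assumes the Grace variant with partitions that fit in memory.

```python
def _ceil_div(a, b):
    return -(-a // b)

def page_nested_loops_cost(outer_pages, inner_pages):
    # Scan the inner relation once per page of the outer relation.
    return outer_pages + outer_pages * inner_pages

def block_nested_loops_cost(outer_pages, inner_pages, buffer_pages):
    # (buffer_pages - 2) pages hold a block of the outer relation;
    # one page is used for the inner scan and one for output.
    blocks = _ceil_div(outer_pages, buffer_pages - 2)
    return outer_pages + blocks * inner_pages

def _external_sort_cost(pages, buffer_pages):
    runs = _ceil_div(pages, buffer_pages)
    passes = 1  # pass 0 creates the initial sorted runs
    while runs > 1:
        runs = _ceil_div(runs, buffer_pages - 1)
        passes += 1
    return 2 * pages * passes

def sort_merge_join_cost(m_pages, n_pages, buffer_pages):
    # Sort both inputs, then one merging scan over each
    # (assumes no rescans caused by large groups of duplicates).
    return (_external_sort_cost(m_pages, buffer_pages)
            + _external_sort_cost(n_pages, buffer_pages)
            + m_pages + n_pages)

def grace_hash_join_cost(m_pages, n_pages):
    # Partition phase reads and writes both inputs; probe phase reads both.
    return 3 * (m_pages + n_pages)


# Example: relations of 1000 and 500 pages, 52 buffer pages.
print(page_nested_loops_cost(500, 1000))        # 500500
print(block_nested_loops_cost(500, 1000, 52))   # 10500
print(sort_merge_join_cost(1000, 500, 52))      # 7500
print(grace_hash_join_cost(1000, 500))          # 4500
```

The example uses the smaller relation as the outer input for the nested-loops variants, which is the usual choice because the number of inner scans depends on the outer size.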

Which limitations do the different join algorithms have?

What role does sorting play for different query execution algorithms?

What are the costs of a table scan versus a (clustered/unclustered) index scan?
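
A rough way to reason about this question is with simplified I/O estimates. The sketch below is an assumption-laden illustration, not a definitive cost model: the function names and parameters are hypothetical, the index is assumed to be a tree of the given height, the cost of scanning index leaf pages is ignored, and the unclustered case uses the worst case of one page I/O per matching tuple.

```python
def table_scan_cost(data_pages):
    # A full scan reads every data page once.
    return data_pages

def clustered_index_scan_cost(index_height, matching_tuples, tuples_per_page):
    # Matching tuples are stored contiguously, so only the pages
    # that contain them are read after descending the index.
    return index_height + -(-matching_tuples // tuples_per_page)

def unclustered_index_scan_cost(index_height, matching_tuples):
    # Worst case: every matching tuple lives on a different page.
    return index_height + matching_tuples


# 10,000 data pages, 100 tuples per page, 1% of 1,000,000 tuples match.
print(table_scan_cost(10_000))                       # 10000
print(clustered_index_scan_cost(3, 10_000, 100))     # 103
print(unclustered_index_scan_cost(3, 10_000))        # 10003
```

Under these assumptions the unclustered index is already slightly worse than a full table scan at 1% selectivity, which illustrates why an optimizer may ignore an available index.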

Briefly explain cost-based query optimization. What is the goal?

What role do database statistics play in query optimization?

How do they affect, e.g., join orders?

Draw the best (left-deep) query plan for the following SQL query. Why left-deep?

Given an SQL query, find a good query execution plan for it. Explain your choices.

What is the meaning of the CAP theorem?

Suggest a partitioning strategy for a given scenario.

Explain one of the different data replication algorithms.

How is a distributed join algorithm executed for a given data set?

How does HDFS follow the distributed data processing principles covered in this lecture?

When would you use a DBMS, and when MapReduce or Spark/Flink?

Given a Spark program, in which tasks or stages will it be executed?

What is the role of lazy evaluation in Spark?
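
The idea behind lazy evaluation can be illustrated without Spark itself. The toy class below is a plain-Python analogy, not the Spark API: `LazyDataset` and `traced_square` are hypothetical names invented here. Transformations (`map`, `filter`) only record what should happen; nothing runs until an action (`collect`) is called, at which point the recorded steps are applied to each element in a single pass.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not the Spark API)."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)

    def map(self, fn):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Action: run all recorded steps, element by element.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []

def traced_square(x):
    calls.append(x)  # records when the function is actually invoked
    return x * x

ds = LazyDataset(range(5)).map(traced_square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # still 0: nothing has executed
result = ds.collect()             # the action triggers execution
print(calls_before_action, result, len(calls))
# -> 0 [0, 4, 16] 5
```

Because the whole chain is known before anything runs, a real engine can use that knowledge to plan execution, for example by pipelining consecutive steps instead of materializing each intermediate result.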

What is the difference between batch and stream processing?

What is the difference between Amazon Dynamo, HBase and MongoDB?

What is a key-value store?

What kind of queries can you answer with a column store? Which ones with a document store?

How does the NoSQL system X follow the CAP theorem?
