CSE 511

This document outlines a course on scalable data processing. The course covers topics such as efficient query processing, indexing structures, distributed database design, parallel query execution, concurrency control, NoSQL database systems, data management in cloud computing and MapReduce environments. Students will learn to perform queries and analytics tasks in database systems, design distributed and parallel databases, and perform scalable data processing in cloud computing environments. The course consists of lectures, assignments, projects and a final exam. Required skills include programming knowledge and a basic understanding of computer science topics. The course aims to equip students to differentiate data models, apply techniques for distributed databases, and utilize cloud-based systems for specified cases.

Scalable Data Processing

(CSE 511)
Note: The outline below is subject to modifications and updates.

About this Course

Database systems are used to provide convenient access to disk-resident data through efficient
query processing, indexing structures, concurrency control, and recovery. This course delves
into new frameworks for processing and generating large-scale datasets with parallel and
distributed algorithms, covering the design, deployment and use of state-of-the-art data
processing systems, which provide scalable access to data.

Specific topics covered include:

• Efficient query processing
• Indexing structures
• Distributed database design
• Parallel query execution
• Concurrency control in distributed parallel database systems
• Data management in cloud computing environments
• Data management in Map/Reduce-based environments
• NoSQL database systems

Learning Outcomes

Learners completing this course will be able to:

• Differentiate among major data models such as relational, spatial, and NoSQL
• Perform queries (e.g., SQL) and analytics tasks in state-of-the-art database systems
• Apply leading-edge techniques to design/tune distributed and parallel database systems
• Utilize existing NoSQL database systems as appropriate for specified cases
• Perform database operations (e.g., selection, projection, join, and groupby) in state-of-the-art
cluster computing systems such as Hadoop/Spark
• Perform scalable data processing operations (e.g., selection, projection, join, and groupby) in
cloud computing environments, including Amazon AWS
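
For readers unfamiliar with the four operations named above, the following is a minimal sketch in plain Python rather than in Hadoop/Spark or any system used in the course. The sample rows, column names, and function names are all invented for illustration. Note that `project` keeps duplicate rows, as SQL does without DISTINCT, whereas projection in relational algebra would remove them.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration only.
movies = [
    {"movie_id": 1, "title": "A", "genre": "Drama"},
    {"movie_id": 2, "title": "B", "genre": "Comedy"},
]
ratings = [
    {"movie_id": 1, "user_id": 10, "rating": 4.0},
    {"movie_id": 1, "user_id": 11, "rating": 5.0},
    {"movie_id": 2, "user_id": 10, "rating": 3.0},
]


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Projection: keep only the named columns (duplicates are kept)."""
    return [{c: r[c] for c in columns} for r in rows]


def join(left, right, key):
    """Equi-join on a shared column, using a hash index on the right input."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_by_avg(rows, key, value):
    """Group rows by `key` and average `value` within each group."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in rows:
        totals[r[key]][0] += r[value]
        totals[r[key]][1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


if __name__ == "__main__":
    joined = join(movies, ratings, "movie_id")
    print(select(movies, lambda r: r["genre"] == "Drama"))
    print(project(movies, ["title"]))
    print(group_by_avg(joined, "title", "rating"))
```

Cluster computing systems expose the same logical operations but execute them in parallel over partitioned data; this sketch shows only the single-machine semantics.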

Lead: Mohamed Sarwat, Ph.D. | Updated 12/28/2017

Projects
• Project 1: Movie Recommendation Database
• Project 2: Distributed Movie Recommendation Database
• Project 3: Location-Aware Twitter Analytics
• Project 4: Spatial Data Processing using Apache Spark
• Project 5: SQL queries on Amazon EC2

Course Content
Instruction
• Video Lectures
• Other Videos
• Readings
• Interactive Learning Objects
• Live office hours
• Webinars

Assessments
• Practice activities and quizzes (auto-graded)
• Practice assignments (instructor- or peer-reviewed)
• Team and/or individual project(s) (instructor-graded)
• Final exam (graded)

Estimated Workload/Time Commitment Per Week

Approximately 9 hours per week

Required Prior Knowledge and Skills

• Basic statistics and computer science knowledge, including computer organization and
architecture, discrete mathematics, data structures, and algorithms
• Knowledge of high-level programming languages (e.g., C++, Java) and a scripting
language (e.g., Python)

Technology Requirements

Hardware
• Standard computer with a major OS

Software and Other

• To complete course projects, some of the following software may be required: Amazon AWS
Cloud, Hadoop/Spark, GitHub, PostgreSQL, MongoDB, Neo4j.

Course Outline
Unit 1: Basic Data Processing Concepts

Learning Objectives
1.1 Explain data models and data processing concepts
1.2 Utilize the relational model and relational algebra
1.3 Utilize the SQL query language
• Unit Introduction
• Module 1: Big Data and Data Processing
• Introduction to Data and Data Processing
• Database Management Systems
• Data Models
• Module 2: Basic Data Concepts
• Database Systems - What and Why?
• Database Management Systems
• Data Model
• Database Design: Entity Relationship Model to Relational Model
• Entity Relationship Model
• ER to Relational Model
• Assignment: Create a Movie Database
• Relational Model and Relational Algebra
• Relational Data Model
• Relational Algebra: Query Language
• Query Language: Union
• Query Language: Difference
• Query Language: Cartesian Product
• Query Language: Selection
• Query Language: Projection
• Query Language: Intersection
• Query Language: θ-Join
• SQL Query Language:
• Part 1: SQL Query Language
• Part 2: SQL Query Language
• Assignment: SQL Query for Movie Recommendation

Unit 2: Data Storage and Indexing

Learning Objectives
2.1 Recognize major data storage layouts
2.2 Identify major indexing schemes in Database Systems
• Unit Introduction
• Module 1: Major Storage Layouts
• Introduction to Data Storage
• Alternative File Organizations
• Module 2: Major Indexing Schemes in Database Systems
• Hash-based Indexes
• Index Classification

Unit 3: Transactions and Recovery

Learning Objectives
3.1 Examine the ACID properties
3.2 Explain Transactions and Concurrency Control concepts
3.3 Describe how recovery from failures happens in database systems
• Unit Introduction
• Module 1: ACID Properties
• Principles of Transactions: ACID Properties
• Module 2: Concurrency Control Concepts
• Concurrency Control
• Module 3: Lock-based Concurrency Control and Recovery from Failures
• Lock-Based Concurrency Control
• Database Recovery

Unit 4: Principles of Distributed and Parallel Database Systems

Learning Objectives
4.1 Describe data fragmentation and replication models
4.2 Describe the components of a distributed database
4.3 Apply skills learned to complete an assignment using data partitioning
• Unit Introduction
• Module 1: Distributed Databases: Why, What?
• Why Distribution?
• Module 2: Data Fragmentation and Replication Model
• Introduction to Fragmentation
• Introduction to Replication
• Assignment: Data Fragmentation

• Module 3: Advanced Distributed Database Systems
• Query Processing and Optimization in Distributed Databases
• Distributed Query Processing
• Total Cost of Query Execution Plan
• Assignment: Query Processing
• Module 4: Parallel Database Systems
• Parallel Data Architecture
• Introduction to Parallel DBMS
• The Different Types of DBMS Parallelism
• Parallel Sorting and Joins
• Assignment: Parallel Sort and Joins

Unit 5: NoSQL Database Systems

Learning Objectives
• Unit Introduction
• Module 1: NoSQL Database Systems
• Key-Value Stores
• Graph Databases
• Document Databases
• Module 2: Big Data Analytics Systems
• Intro to Map-Reduce / Spark
• Data Analytics in Map-Reduce / Spark
• Graph Processing Engines
• Module 3: Data Processing on Modern Hardware

PROJECT: Distributed Movie Recommendation Database

Unit 6: Big Data Tools

PROJECT: Location-Aware Twitter Analytics

PROJECT: Spatial Data Processing using Apache Spark

Unit 7: Additional Tools Used for Data Visualization

Learning Objectives
7.1 Explain data processing in the cloud
7.2 Evaluate service models
7.3 Evaluate deployment models
• Unit Introduction
• Module 1: Introduction to Cloud Computing
• Introduction to Cloud Computing
• Module 2: Service Models
• Service Models
• Module 3: Deployment Models
• Deployment Models

Unit 8: Cloud-based Data Management

Learning Objectives
8.1 Explain AWS
• Unit Introduction
• Module 1: Amazon Web Services
• Introduction to Amazon Web Services
• AWS Computing
• AWS Storage
• AWS Queueing Services
• Module 2: Build an Elastic Cloud Application
• AWS Interfaces
• Auto-Scaling
• Module 3: Build a MapReduce Cloud Application
• Scalable Data Processing
• AWS Security

PROJECT: SQL queries on Amazon EC2

Creators
Established in Tempe in 1885, Arizona State University (ASU) has developed a new model
for the American Research University, creating an institution that is committed to access,
excellence and impact.

As the prototype for a New American University, ASU pursues research that contributes to the
public good, and ASU assumes major responsibility for the economic, social and cultural vitality
of the communities that surround it. Recognizing the university’s groundbreaking initiatives,
partnerships, programs and research, U.S. News and World Report has named ASU as the
most innovative university all three years it has had the category.

The innovation ranking is due at least in part to a more than 80 percent improvement in ASU’s
graduation rate in the past 15 years, the fact that ASU is the fastest-growing research university
in the country and the emphasis on inclusion and student success that has led to more than 50
percent of the school’s in-state freshmen coming from minority backgrounds.

Mohamed Sarwat is an Assistant Professor of Computer Science and the director of the
Data Systems (DataSys) lab at Arizona State University (ASU). He is also an affiliate member
of the Center for Assured and Scalable Data Engineering (CASCADE). Before joining ASU,
Mohamed obtained his MSc and PhD degrees in computer science from the University of
Minnesota. His research interest lies in the broad area of data management systems.

Ming Zhao is an associate professor of the ASU School of Computing, Informatics, and
Decision Systems Engineering. Before joining ASU, he was an associate professor of the
School of Computing and Information Sciences (SCIS) at Florida International University.
He directs the Research Laboratory for Virtualized Infrastructure, Systems, and Applications
(VISA). His research interests are in distributed/cloud computing, big data, high-performance
computing, autonomic computing, virtualization, storage systems and operating systems.
