Systems for Data Analytics - Regular Handout
Course Description
Course Objectives
CO1 Introduce students to a systems perspective of data analytics: to leverage systems effectively,
understand, measure, and improve performance while performing data analytics tasks
CO2 Enable students to develop a working knowledge of how to use parallel and distributed systems
for data analytics
CO3 Enable students to apply best practices in storing and retrieving data for analytics
CO4 Enable students to leverage commodity infrastructure (such as scale-out clusters, distributed
data stores, and the cloud) for data analytics.
Text Book(s)
T1 Kai Hwang, Geoffrey Fox, and Jack Dongarra. Distributed and Cloud Computing.
Morgan Kaufmann
T2
# Topics
1 Introduction to Data Engineering
1.1 Systems Attributes for Data Analytics - Single System
Storage for Data: Structured Data (Relational Databases) , Semi-structured data (Object
Stores), Unstructured Data (file systems)
Processing: In-memory vs. (from) secondary storage vs. (over the) network
Storage Models and Cost: Memory Hierarchy, Access costs, I/O Costs (i.e. number of disk
blocks accessed);
Locality of Reference: Principle, examples
Impact of Latency: Algorithms and data structures that leverage locality, data organization
on disk for better locality
1.2 Systems Attributes for Data Analytics - Parallel and Distributed Systems
Storing data in parallel and distributed systems: Shared Memory vs. Message Passing
Memory Hierarchy in Parallel Systems: Shared memory access and memory contention;
shared data access and mutual exclusion
Memory Hierarchy in Distributed Systems: In-node vs. over the network latencies, Locality,
Communication Cost
2 Systems Architecture for Data Analytics
2.1 Introduction to Systems Architecture
Parallel Architectures and Programming Models: Flynn’s Taxonomy (SIMD, MISD, MIMD)
and Parallel Programming (SPMD, MPSD, MPMD)
Parallel Processing Models: {Data, Task, and Request}-Parallelism
Mapping: Data Parallel - SPMD, Task Parallel - MPMD, Request Parallel - Services/Cloud
Client-Server vs. Peer-to-Peer models of distributed computing
Parallel vs. Distributed Systems: Shared Memory vs. Distributed Memory (i.e. message
passing)
Motivation for distributed systems (large size, easy scalability, cost-benefit)
Map-reduce model: Examples (of map, reduce, map-reduce combinations, iterative
map-reduce)
Batch processing vs. Online Processing; Streaming - Systems-level understanding
(input-output, memory model, constraints)
Master-Slave Processing: Implications for speedup and communication cost
● Partitioning vs. Replication and Communication vs. Locality for Data Mining
algorithms like k-means, DBSCAN, Nearest Neighbor
● Using data structures (such as kd-trees) for partitioning
● Matrices and Locality - Row-major vs. Column major vs. Blocking in distributed
context
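As an illustration of the map-reduce model listed under topic 2.1, the following is a minimal single-process sketch of a word count. It is not taken from the prescribed text. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, and a real framework would run the map and reduce steps on separate nodes and perform the grouping over the network.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit one (word, 1) pair for every word in a line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    lines = ["data systems data", "systems for data analytics"]
    print(word_count(lines))
```

Because each call to map_phase depends only on its own record, the map step can be partitioned across workers without communication. Only the shuffle step moves data between them, which is where the communication cost discussed in the topics arises.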
Learning Outcomes:
No. Learning Outcomes
LO1 [to be done]
LO2 [to be done]
LO3 [to be done]
LO4 [to be done]
Academic Term
Course Title Systems for Data Analytics
Course No DSE* ZG517
Lead Instructor Prof. Anindya Neogi
Course Contents
# The above contact hours and topics can be adapted for non-specific and specific WILP programs
depending on the requirements and class interests.
Evaluation Scheme
Legend: EC = Evaluation Component
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Assignment-I | Take Home | - | 12 | To be announced
EC-1 | Best out of 2 Quizzes | Take Home | - | 5 | To be announced
EC-1 | Assignment-II | Take Home | - | 13 | To be announced
EC-2 | Mid-Semester Test | Open Book | 90 Min | 30 | To be announced
EC-3 | Comprehensive Exam | Open Book | 120 Min | 40 | To be announced
Note - Evaluation components can be tailored depending on the proposed model.
Important Information
Syllabus for Mid-Semester Test (Open Book): Topics in Weeks 1-7
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study
Evaluation Guidelines:
1. EC-1 consists of two Assignments and the better of two Quizzes. Announcements regarding the same
will be made in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference textbooks, in original (not photocopies), is
permitted. Class notes/slides as reference material in filed or bound form are permitted. However,
loose sheets of paper will not be allowed. Use of calculators is permitted in all exams.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student
should follow the procedure to apply for the Make-Up Test/Exam. The genuineness of the reason for
absence in the Regular Exam shall be assessed prior to giving permission to appear for the Make-up
Exam. Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be
announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as
given in the course handout, attend the lectures, and take all the prescribed evaluation components such as
Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme
provided in the handout.