
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES


COURSE HANDOUT

Part A: Content Design

Course Title Systems for Data Analytics


Course No(s) DSE* ZG517
Credit Units 5
Course Author Prof. Shan Balasubramaniam
Version No 1
Date 26 / April / 2019

Course Description

Course Objectives
CO1 Introduce students to a systems perspective of data analytics: to leverage systems effectively,
understand, measure, and improve performance while performing data analytics tasks
CO2 Enable students to develop a working knowledge of how to use parallel and distributed systems
for data analytics
CO3 Enable students to apply best practices in storing and retrieving data for analytics
CO4 Enable students to leverage commodity infrastructure (such as scale-out clusters, distributed data-
stores, and the cloud) for data analytics.
Text Book(s)
T1 Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra. Distributed and Cloud Computing: From Parallel Processing to the Internet of Things. Morgan Kaufmann.

Reference Book(s) & other resources


R1 Nikolas Roman Herbst, Samuel Kounev, and Ralf Reussner. Elasticity in Cloud Computing: What It Is, and What It Is Not. 10th International Conference on Autonomic Computing (ICAC '13). USENIX Association.
R2 Mohammed Alhamad, Tharam Dillon, and Elizabeth Chang. Conceptual SLA Framework for Cloud Computing. 4th IEEE International Conference on Digital Ecosystems and Technologies. April 2010, Dubai, UAE.
R3 Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. SOSP '03, October 19-22, 2003, Bolton Landing, New York, USA.
R4 Apache CouchDB. Technical Overview. http://docs.couchdb.org/en/stable/intro/overview.html
R5 Apache CouchDB. Eventual Consistency. http://docs.couchdb.org/en/stable/intro/consistency.html
R6 Seth Gilbert and Nancy A. Lynch. Perspectives on the CAP Theorem. IEEE Computer, Vol. 45, Issue 2, Feb. 2012.
R7 Werner Vogels. Eventually Consistent. Communications of the ACM, Vol. 52, No. 1, January 2009.
R8 Eric Brewer. CAP Twelve Years Later: How the "Rules" Have Changed. IEEE Computer, Vol. 45, Issue 2, Feb. 2012.
R9 M. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI '06: Seventh Symposium on Operating System Design and Implementation, USENIX, Seattle, WA, 2006, pp. 335-350.
R10 Matei Zaharia et al. Apache Spark: A Unified Engine for Big Data Processing. Communications of the ACM, Vol. 59, No. 11, November 2016.
R11 Yaser Mansouri, Adel Nadjaran Toosi, and Rajkumar Buyya. Data Storage Management in Cloud Environments: Taxonomy, Survey, and Future Directions. ACM Computing Surveys, Vol. 50, No. 6, Article 91, December 2017.

Modular Content Structure

# Topics
1 Introduction to Data Engineering
1.1 Systems Attributes for Data Analytics - Single System
Storage for Data: Structured Data (Relational Databases), Semi-structured Data (Object Stores), Unstructured Data (File Systems)
Processing: In-memory vs. (from) secondary storage vs. (over the) network

Storage Models and Cost: Memory Hierarchy, Access Costs, I/O Costs (i.e. number of disk blocks accessed)
Locality of Reference: Principle, examples

Impact of Latency: Algorithms and data structures that leverage locality, data organization
on disk for better locality
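
For illustration only (not part of the prescribed reading), a minimal Python sketch of the locality-of-reference idea above: the same summation is done once over consecutive indices and once over a shuffled index order. On most machines the sequential pass is faster because consecutive accesses stay within already-fetched cache lines, the same effect that makes sequential disk-block access cheaper than random access; interpreter overhead dampens the gap in pure Python, but it is usually still visible.

    import random
    import time

    N = 2_000_000
    data = list(range(N))

    # Sequential pass: consecutive indices, good spatial locality.
    start = time.perf_counter()
    total = sum(data[i] for i in range(N))
    seq_time = time.perf_counter() - start

    # Random pass: identical work, but the access pattern has poor locality.
    order = list(range(N))
    random.shuffle(order)
    start = time.perf_counter()
    total = sum(data[i] for i in order)
    rand_time = time.perf_counter() - start

    print(f"sequential: {seq_time:.3f}s   random: {rand_time:.3f}s")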
1.2 Systems Attributes for Data Analytics - Parallel and Distributed Systems

Motivation for Parallel Processing (Size of data and complexity of processing)

Storing data in parallel and distributed systems: Shared Memory vs. Message Passing

Strategies for data access: Partition, Replication, and Messaging

Memory Hierarchy in Parallel Systems: Shared memory access and memory contention;
shared data access and mutual exclusion
Memory Hierarchy in Distributed Systems: In-node vs. over the network latencies, Locality,
Communication Cost
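
As a toy illustration of the partitioning and replication strategies listed above (the node names and the replication factor are invented for the example), the following Python sketch assigns each record to a primary node by hashing its key and places one additional copy on the next node; a read for a key can then be served by any of its replicas.

    import hashlib

    NODES = ["node0", "node1", "node2", "node3"]   # hypothetical 4-node cluster
    REPLICAS = 2                                   # primary copy plus one replica

    def placement(key: str) -> list[str]:
        """Hash-partition: pick a primary node from the key, then the next REPLICAS - 1 nodes."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        primary = h % len(NODES)
        return [NODES[(primary + i) % len(NODES)] for i in range(REPLICAS)]

    for key in ["user:17", "user:42", "order:9", "order:10"]:
        print(key, "->", placement(key))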
2 Systems Architecture for Data Analytics
2.1 Introduction to Systems Architecture

Parallel Architectures and Programming Models: Flynn’s Taxonomy (SIMD, MISD, MIMD)
and Parallel Programming (SPMD, MPSD, MPMD)
Parallel Processing Models: {Data, Task, and Request}-Parallelism;
Mapping: Data Parallel - SPMD, Task Parallel - MPMD, Request Parallel - Services/Cloud
Client-Server vs. Peer-to-Peer models of distributed computing.
Parallel vs. Distributed Systems: Shared Memory vs. Distributed Memory (i.e. message
passing)
Motivation for distributed systems (large size, easy scalability, cost-benefit)

Cluster Computing: Components and Architecture.

2.2 Performance Attributes of Systems


Scalability - Speedup and Amdahl's Law;
How to apply Amdahl's Law?
(Relation to the Gustafson-Barsis Law?)
(See the worked formulas at the end of this module.)
Impact of Memory Hierarchy on Performance:
● Shared Memory and Memory Contention
● Communication Cost
● Locality
Reliability (for distributed systems): MTTF and MTTR, Serial vs. Parallel Connections, Single Point-of-Failure
Building Reliable Systems: Redundancy and Resilience; Failure Models in Distributed Systems: Transient vs. Permanent Failures
Failure Recovery: Fail-over, Active Fail-over, etc.
Process Migration
Availability: Calculating Availability; Service Agreements and SLAs

Elasticity: Resilient Performance and Scenarios; Calculating Elasticity; Achieving elasticity (via resource provisioning and virtualization)
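
For the quantitative topics in this module, the standard formulas (conventional notation, given here as a reference sketch rather than as excerpts from T1) are:

    % Amdahl's Law: serial fraction s, n processors; speedup is bounded by 1/s.
    \[ S_{\text{Amdahl}}(n) = \frac{1}{\,s + \frac{1-s}{n}\,} \]

    % Gustafson-Barsis Law: scale the problem with n instead of fixing it
    % (s is the serial fraction observed on the parallel run).
    \[ S_{\text{Gustafson}}(n) = s + (1 - s)\,n \]

    % Availability from mean time to failure and mean time to repair:
    \[ A = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} \]

    % Composition of components with availabilities A_i:
    \[ A_{\text{serial}} = \prod_i A_i \qquad A_{\text{parallel}} = 1 - \prod_i (1 - A_i) \]

For example, MTTF = 990 hours and MTTR = 10 hours give A = 0.99; two such nodes in parallel (either one is sufficient) give 1 - 0.01^2 = 0.9999.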

3 Data Storage and Organization for Analytics

File Systems vs. Database Systems vs. Object Stores

Distributed File Systems - Basic architecture, Case Studies (GFS/HDFS)

Unstructured Databases - Basic architecture, Case Study and Examples (Google BigTable, CouchDB / MongoDB)
Consistency Models - Weak and Strong Consistency, Eventual Consistency, CAP Theorem - Result and Implications;
Synchronization: Chubby Locking as a case study.
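
To make the notion of eventual consistency concrete, here is a deliberately simplified, single-process Python toy (it is not how CouchDB, BigTable, or any real store is implemented): two replicas accept writes independently and later exchange state, converging through a last-writer-wins rule keyed on a timestamp. Until the merge happens, a read may return either value; that window is exactly what weak and eventual consistency permit.

    class Replica:
        """Toy last-writer-wins register: stores (timestamp, writer_id, value) per key."""
        def __init__(self, rid):
            self.rid = rid
            self.store = {}   # key -> (timestamp, writer_id, value)

        def write(self, key, value, ts):
            self.store[key] = (ts, self.rid, value)

        def read(self, key):
            return self.store[key][2]

        def merge(self, other):
            # Anti-entropy step: for each key keep the entry with the highest
            # (timestamp, writer_id) pair, so every replica picks the same winner.
            for key, entry in other.store.items():
                if key not in self.store or entry > self.store[key]:
                    self.store[key] = entry

    a, b = Replica("A"), Replica("B")
    a.write("x", "red", ts=1)    # one client writes to replica A
    b.write("x", "blue", ts=2)   # another client writes to replica B; replicas now disagree
    a.merge(b); b.merge(a)       # replicas exchange state, in any order
    assert a.read("x") == b.read("x") == "blue"   # both converge to the later write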

4 Distributed Data Processing for Analytics

4.1 (Re-)Designing Algorithms for Distributed Systems

Design Strategy: Divide-and-conquer for Parallel / Distributed Systems - Basic scenarios and Implications
Parallel Programming Pattern: Data-parallel programs, and map as a construct

Parallel Programming Pattern: Tree-parallelism, reduce as a construct

Map-reduce model: Examples (of map, reduce, map-reduce combinations, Iterative map-reduce) (see the sketch following this module)
Batch Processing vs. Online Processing; Streaming - Systems-level understanding (input-output, memory model, constraints)
Master-Slave Processing: Implications for speedup and communication cost
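
A minimal, single-process Python sketch of the map and reduce constructs named above, applied to the classic word-count example (illustrative only; the shuffle step is simulated with a dictionary, and nothing here is actually distributed). The map phase is data-parallel over documents, and because addition is associative the reduce phase could be organized as a tree across workers.

    from collections import defaultdict
    from functools import reduce

    documents = ["the quick brown fox", "the lazy dog", "the fox"]

    # Map phase: each document independently emits (word, 1) pairs (data-parallel).
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle: group intermediate pairs by key (done by the framework in a real system).
    groups = defaultdict(list)
    for word, one in mapped:
        groups[word].append(one)

    # Reduce phase: combine the per-key values; an associative reduction
    # can be evaluated as a tree across workers.
    counts = {word: reduce(lambda a, b: a + b, ones) for word, ones in groups.items()}

    print(counts)   # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}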

4.2 Distributed Data Analytics

● Partitioning vs. Replication and Communication vs. Locality for Data Mining algorithms like k-means, DBSCAN, Nearest Neighbor (see the sketch after this list)
● Using data structures (such as kd-trees) for partitioning
● Matrices and Locality - Row-major vs. Column-major vs. Blocking in a distributed context
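
As a sketch of the partitioning and communication trade-off for k-means referred to above (a toy, single-process simulation, not a distributed implementation): the points are partitioned across simulated workers, each worker computes per-cluster partial sums over its own partition, and only those small aggregates are communicated and combined, so the points themselves never move.

    import random

    def assign_and_sum(points, centroids):
        """Worker-local step: assign each point to its nearest centroid and
        return per-cluster [sum_x, sum_y, count] partial aggregates."""
        partial = [[0.0, 0.0, 0] for _ in centroids]
        for x, y in points:
            i = min(range(len(centroids)),
                    key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            partial[i][0] += x
            partial[i][1] += y
            partial[i][2] += 1
        return partial

    random.seed(0)
    points = [(random.random(), random.random()) for _ in range(10_000)]
    partitions = [points[i::4] for i in range(4)]        # data partitioned over 4 "workers"
    centroids = [(0.2, 0.2), (0.8, 0.8)]

    # Each worker returns only k small aggregates, not its share of the points.
    partials = [assign_and_sum(part, centroids) for part in partitions]

    # The driver combines the partial aggregates and computes the new centroids.
    new_centroids = []
    for c in range(len(centroids)):
        sx = sum(p[c][0] for p in partials)
        sy = sum(p[c][1] for p in partials)
        n = sum(p[c][2] for p in partials)
        new_centroids.append((sx / n, sy / n) if n else centroids[c])
    print(new_centroids)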

Learning Outcomes:
No Learning Outcomes
LO1 [to be done ]
LO2 [to be done ]
LO3 [to be done ]
LO4 [to be done ]

Part B: Contact Session Plan

Academic Term
Course Title Systems for Data Analytics
Course No DSE* ZG517
Lead Instructor Prof. Anindya Neogi

Course Contents

Contact Session # (2 hours per session) | Topic # (from content structure in Part A) | List of Topic Title (from content structure in Part A) | Reading / Reference

Session 1 | Topic 1.1 | Systems Attributes for Data Analytics - Single System
- Storage for Data: Structured Data (Relational Databases), Semi-structured Data (Object Stores), Unstructured Data (File Systems) [Reading: T1 Sec. 1.2.3]
- Processing: In-memory vs. (from) secondary storage vs. (over the) network
- Storage Models and Cost: Memory Hierarchy, Access Costs, I/O Costs (i.e. number of disk blocks accessed)

Session 2 | Topic 1.1
- Locality of Reference: Principle, examples
- Impact of Latency: Algorithms and data structures that leverage locality, data organization on disk for better locality

Session 3 | Topic 1.2 | Systems Attributes for Data Analytics - Parallel and Distributed Systems
- Motivation for Parallel Processing (size of data and complexity of processing) [Reading: T1 Sec. 1.4.3]
- Storing data in parallel and distributed systems: Shared Memory vs. Message Passing

Session 4 | Topic 1.2
- Strategies for data access: Partition, Replication, and Messaging

Session 5 | Topic 1.2
- Memory Hierarchy in Parallel Systems: Shared memory access and memory contention; shared data access and mutual exclusion
- Memory Hierarchy in Distributed Systems: In-node vs. over-the-network latencies, Locality, Communication Cost

Session 6 | Topic 2.1 | Introduction to Systems Architecture [Reading: T1 Sec. 1.4.3]
- Parallel Architectures and Programming Models: Flynn's Taxonomy (SIMD, MISD, MIMD) and Parallel Programming (SPMD, MPSD, MPMD)
- Parallel Processing Models: {Data, Task, and Request}-Parallelism; Mapping: Data Parallel - SPMD, Task Parallel - MPMD, Request Parallel - Services/Cloud
- Client-Server vs. Peer-to-Peer models of distributed computing

Session 7 | Topic 2.1
- Parallel vs. Distributed Systems: Shared Memory vs. Distributed Memory (i.e. message passing) [Reading: T1 Sec. 1.4.3 and Sec. 2.1]
- Motivation for distributed systems (large size, easy scalability, cost-benefit)
- Cluster Computing: Components and Architecture [Reading: T1 Sec. 2.2.1 to 2.2.4, Sec. 2.3]

Session 8 | Topic 2.2
- Impact of Memory Hierarchy on Performance: Shared Memory and Memory Contention; Communication Cost; Locality [Reading: Additional Reading]
- Reliability (for distributed systems): MTTF and MTTR, Serial vs. Parallel Connections, Single Point-of-Failure [Reading: T1 Sec. 1.5.2 and 2.3.3]

Mid-Term Portion - Review

Session 9 | Topic 2.2
- Building Reliable Systems: Redundancy and Resilience; Failure Models in Distributed Systems: Transient vs. Permanent Failures [Reading: T1 Sec. 1.5.2 and 2.3.3]
- Failure Recovery: Fail-over, Active Fail-over, etc. [Reading: T1 Sec. 1.5.2 and 2.3.3]
- Overview of Process Migration
- Availability: Calculating Availability [Reading: T1 Sec. 1.5.2]

Session 10 | Topic 3.1
- File Systems vs. Database Systems vs. Object Stores
- Distributed File Systems - Basic architecture, Case Studies (GFS/HDFS) [Reading: T1 Sec. 6.3.2; AR: Google File System paper]

Session 11 | Topic 3.1
- Unstructured Databases - Basic architecture, Case Study and Examples (Google BigTable, CouchDB / MongoDB) [Reading: T1 Sec. 6.3.3]

Session 12 | Topic 3.1
- Overview of Consistency Models - Weak and Strong Consistency, Eventual Consistency, CAP Theorem - Result and Implications [Reading: AR - papers on consistency and CAP]

Additional Content | Topic 3.1
- Synchronization: Chubby Locking as a case study [Reading: AR - paper on Chubby]
  (Supplementary video to be added; not to be done in class.)

Session 13 | Topic 4.1 | (Re-)Designing Algorithms for Distributed Systems
- Design Strategy: Divide-and-conquer for Parallel / Distributed Systems - Basic scenarios and Implications [Reading: Notes]
- Parallel Programming Pattern: Data-parallel programs, and map as a construct [Reading: T1 Sec. 6.2.1]
- Parallel Programming Pattern: Tree-parallelism, reduce as a construct [Reading: T1 Sec. 6.2.2]

Sessions 14-15 | Topic 4.1
- Map-reduce model: Examples (of map, reduce, map-reduce combinations, Iterative map-reduce) [Reading: T1 Sec. 6.2.2]
- Batch Processing vs. Online Processing; Streaming - Systems-level understanding (input-output, memory model, constraints) [Reading: AR - Spark paper]

Session 16 | Topics 4.1 and 4.2
- Master-Slave Processing: Implications for speedup and communication cost (Topic 4.1) [Reading: Notes]
- Parallelization of Data Mining algorithms like k-means, DBSCAN, Nearest Neighbor, and identifying locality issues (Topic 4.2) [Reading: AR - Notes]
- Matrices and Locality - Row-major vs. Column-major vs. Blocking in a distributed context (Topic 4.2)

# The above contact hours and topics can be adapted for non-specific and specific WILP programs
depending on the requirements and class interests.

Select Topics for experiential learning


Topic No. | Select Topics in Syllabus for Experiential Learning | Resources (need Weka or equivalent software)

1  Programming exercises on map-reduce [Resources: Cloud Infra. Lab in Hyd.]
2  Setting up a simple 3-tier application on the Cloud [Resources: Amazon student license]
3  Synchronization exercise on CouchDB [Resources: Cloud Infra. Lab or Amazon student license]
4  Pen-and-paper exercise on Locality, Memory Contention, and Communication Requirement
5  Pen-and-paper exercise on calculations of speedup, MTTF, and MTTR

Evaluation Scheme
Legend: EC = Evaluation Component
No    Name                    Type       Duration   Weight   Day, Date, Session, Time
EC-1  Assignment-1            Take Home             12       To be announced
      Best out of 2 Quizzes   Take Home             5        To be announced
      Assignment-II           Take Home             13       To be announced
EC-2  Mid-Semester Test       Open Book  90 Min     30       To be announced
EC-3  Comprehensive Exam      Open Book  120 Min    40       To be announced
Note - Evaluation components can be tailored depending on the proposed model.

Important Information
Syllabus for Mid-Semester Test (Open Book): Topics in Weeks 1-7
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study

Evaluation Guidelines:
1. EC-1 consists of two Assignments and a Quiz. Announcements regarding the same will be made in a
timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted. Laptops/mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference textbooks, in original (not photocopies), is permitted. Class notes/slides as reference material in filed or bound form are permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student
should follow the procedure to apply for the Make-Up Test/Exam. The genuineness of the reason for
absence in the Regular Exam shall be assessed prior to giving permission to appear for the Make-up
Exam. Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be
announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as given in the course handout, to attend the lectures, and to take all the prescribed evaluation components such as Assignments/Quizzes, the Mid-Semester Test, and the Comprehensive Exam according to the evaluation scheme provided in the handout.
