BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES

COURSE HANDOUT

Part A: Content Design

Course Title: Systems for Data Analytics
Course No(s): DSE* ZG517
Credit Units: 5
Course Author: Prof. Shan Balasubramaniam
Version No: 1
Date: 26 April 2019

Course Description
Fundamentals of data engineering; basics of systems and techniques for data processing,
comprising relevant database, cloud computing, and distributed computing concepts.

Course Objectives
CO1 Introduce students to a systems perspective of data analytics: to leverage systems effectively, and to
understand, measure, and improve performance while performing data analytics tasks

CO2 Enable students to develop a working knowledge of how to use parallel and distributed systems for
data analytics

CO3 Enable students to apply best practices in storing and retrieving data for analytics

CO4 Enable students to leverage commodity infrastructure (such as scale-out clusters, distributed
data stores, and the cloud) for data analytics.

Text Book(s)
T1 Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra. Distributed and Cloud Computing:
From Parallel Processing to the Internet of Things. Morgan Kaufmann.

Reference Book(s) & other resources

R1 Nikolas Roman Herbst, Samuel Kounev, Ralf Reussner. Elasticity in Cloud Computing:
What It Is, and What It Is Not. 10th International Conference on Autonomic Computing
(ICAC ’13). USENIX Association.
R2 Mohammed Alhamad, Tharam Dillon, Elizabeth Chang. Conceptual SLA Framework
for Cloud Computing. 4th IEEE International Conference on Digital Ecosystems and
Technologies. April 2010, Dubai, UAE.
R3 Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System.
SOSP ’03, October 19–22, 2003, Bolton Landing, New York, USA.
R4 Apache CouchDB. Technical Overview.
http://docs.couchdb.org/en/stable/intro/overview.html
R5 Apache CouchDB. Eventual Consistency.
http://docs.couchdb.org/en/stable/intro/consistency.html
R6 Seth Gilbert and Nancy A. Lynch. Perspectives on the CAP Theorem. IEEE
Computer, vol. 45, issue 2, Feb. 2012.
R7 Werner Vogels. Eventually Consistent. Communications of the ACM, vol. 52, no. 1,
January 2009.
R8 Eric Brewer. CAP Twelve Years Later: How the “Rules” Have Changed. IEEE
Computer, vol. 45, issue 2, Feb. 2012.
R9 M. Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. In:
OSDI ’06: Seventh Symposium on Operating System Design and Implementation,
USENIX, Seattle, WA, 2006, pp. 335–350.
R10 Matei Zaharia et al. Apache Spark: A Unified Engine for Big Data Processing.
Communications of the ACM, vol. 59, no. 11, November 2016.
R11 Yaser Mansouri, Adel Nadjaran Toosi, and Rajkumar Buyya. Data Storage
Management in Cloud Environments: Taxonomy, Survey, and Future Directions. ACM Computing
Surveys, vol. 50, no. 6, article 91, December 2017.
R12 Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Introduction to Parallel
Computing, Second Edition (2003), Addison Wesley (at least Chapters 1, 2, 3 & 5)
R13 George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair. Distributed Systems:
Concepts and Design, Fifth Edition, Pearson (Chapters 1 & 2)

Modular Content Structure

# Topics

1 Introduction to Data Engineering

1.1 Systems Attributes for Data Analytics - Single System

Storage for Data: Structured Data (Relational Databases), Semi-structured Data (Object
Stores), Unstructured Data (file systems)

Processing: In-memory vs. (from) secondary storage vs. (over the) network

Storage Models and Cost: Memory Hierarchy, Access costs, I/O Costs (i.e. number of disk
blocks accessed);

Locality of Reference: Principle, examples

Impact of Latency: Algorithms and data structures that leverage locality, data organization
on disk for better locality

1.2 Systems Attributes for Data Analytics - Parallel and Distributed Systems

Motivation for Parallel Processing (Size of data and complexity of processing)

Storing data in parallel and distributed systems: Shared Memory vs. Message Passing
Strategies for data access: Partition, Replication, and Messaging

Memory Hierarchy in Parallel Systems: Shared memory access and memory contention;
shared data access and mutual exclusion

Memory Hierarchy in Distributed Systems: In-node vs. over the network latencies, Locality,
Communication Cost

2 Systems Architecture for Data Analytics

2.1 Introduction to Systems Architecture

Parallel Architectures and Programming Models: Flynn’s Taxonomy (SIMD, MISD, MIMD)
and Parallel Programming (SPMD, MPSD, MPMD)

Parallel Processing Models: {Data, Task, and Request}-Parallelism

Mapping: Data Parallel - SPMD, Task Parallel - MPMD, Request Parallel - Services/Cloud;
Client-Server vs. Peer-to-Peer models of distributed computing

Parallel vs. Distributed Systems: Shared Memory vs. Distributed Memory (i.e. message
passing)
Motivation for distributed systems (large size, easy scalability, cost-benefit)

Cluster Computing: Components and Architecture.

2.2 Performance Attributes of Systems

Scalability - Speedup and Amdahl’s Law;

How to apply Amdahl’s Law?
(Relation to Barsis-Gustafson Law?)
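
As a hedged illustration of the speedup topic above, the sketch below computes the Amdahl's Law speedup S(n) = 1 / ((1 - p) + p / n) for a fixed-size workload, alongside the Gustafson-Barsis scaled speedup S(n) = (1 - p) + p * n. The function names and the parallel fraction p = 0.9 are illustrative assumptions and are not taken from the course material.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup of a fixed-size workload when a fraction p of it
    runs in parallel on n processors (Amdahl's Law)."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: int) -> float:
    """Scaled speedup when the parallel part (fraction p of the
    parallel run time) grows with n (Gustafson-Barsis Law)."""
    return (1.0 - p) + p * n


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, for illustration only
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

With p = 0.9, the Amdahl speedup can never exceed 1 / (1 - p) = 10, however many processors are added; the serial fraction sets the ceiling.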

Impact of Memory Hierarchy on Performance:

● Shared Memory and Memory Contention
● Communication Cost
● Locality

Reliability (for distributed systems): MTTF and MTTR, Serial vs. Parallel Connections,
Single Point-of-Failure

Building Reliable Systems: Redundancy and Resilience; Failure Models in Distributed
Systems: Transient vs. Permanent Failures

Failure Recovery: Fail-over, Active Fail-over, etc.

Process Migration

Availability: Calculating Availability; Service Agreements and SLAs
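
A minimal sketch of the availability calculation, assuming the common definition Availability = MTTF / (MTTF + MTTR) and assuming that components fail independently. The function names and the numbers in the usage example are hypothetical and are not taken from the course material.

```python
from math import prod


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def serial_availability(avails: list[float]) -> float:
    """Serial connection: the system is up only if every component is up."""
    return prod(avails)


def parallel_availability(avails: list[float]) -> float:
    """Parallel (redundant) connection: the system is down only if
    every component is down at the same time."""
    return 1.0 - prod(1.0 - a for a in avails)


if __name__ == "__main__":
    a = availability(mttf_hours=990.0, mttr_hours=10.0)  # assumed figures
    print(a, serial_availability([a, a]), parallel_availability([a, a]))
```

Under these assumptions, chaining two 99%-available components in series lowers availability to about 98.01%, while placing them in parallel raises it to about 99.99%, which is why a serial component without redundancy acts as a single point of failure.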

Elasticity: Resilient Performance and Scenarios; Calculating Elasticity; Achieving elasticity
(via resource provisioning and virtualization)

3 Data Storage and Organization for Analytics

File systems vs. Database systems vs. Object Stores

Distributed File Systems - Basic architecture, Case Studies (GFS/HDFS)

Unstructured Databases - Basic architecture, Case Study and Examples (Google
BigTable, CouchDB / MongoDB)

Consistency Models - Weak and Strong Consistency, Eventual Consistency, CAP
Theorem - Result and Implications

Synchronization: Chubby Locking as a case study.

4 Distributed Data Processing for Analytics

4.1 (Re-)Designing Algorithms for Distributed Systems

Design Strategy: Divide-and-conquer for Parallel / Distributed Systems - Basic scenarios
and Implications

Parallel Programming Pattern: Data-parallel programs, and map as a construct

Parallel Programming Pattern: Tree-parallelism, reduce as a construct

Map-reduce model: Examples (of map, reduce, map-reduce combinations, Iterative
map-reduce)
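
To make the map and reduce constructs listed above concrete, here is a small single-process sketch of a word count in the map-reduce style. It only imitates the three phases (map, shuffle, reduce) in memory; it is not how any particular framework is implemented, and all function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values of each key with an associative operator."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    # Each document could be mapped on a different node; here it is sequential.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Because addition is associative, the per-key reduction could equally be evaluated as a tree of partial sums, which is what connects this example to the tree-parallelism pattern.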

Batch processing vs. Online Processing; Streaming - Systems-level understanding
(input-output, memory model, constraints)

Master-Slave Processing: Implications for speedup and communication cost

4.2 Distributed Data Analytics

● Partitioning vs. Replication and Communication vs. Locality for Data Mining
algorithms like k-means, DBSCAN, Nearest Neighbor
● Using data structures (such as kd-trees) for partitioning
● Matrices and Locality - Row-major vs. Column major vs. Blocking in distributed
context
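
The row-major vs. column-major point can be illustrated with a short sketch that counts how many fixed-size storage blocks are touched when one matrix row is read under each layout. The matrix dimensions, the block size of 64 elements, and the function names are assumptions chosen for illustration only.

```python
def row_major_offset(i: int, j: int, rows: int, cols: int) -> int:
    """Flat offset of element (i, j) when rows are stored contiguously."""
    return i * cols + j


def col_major_offset(i: int, j: int, rows: int, cols: int) -> int:
    """Flat offset of element (i, j) when columns are stored contiguously."""
    return j * rows + i


def blocks_touched(offsets: list[int], block_size: int) -> int:
    """Number of distinct blocks (cache lines or disk blocks) accessed."""
    return len({offset // block_size for offset in offsets})


# Assumed sizes: a 1024 x 1024 matrix and blocks of 64 elements.
ROWS, COLS, BLOCK = 1024, 1024, 64

row0_in_row_major = [row_major_offset(0, j, ROWS, COLS) for j in range(COLS)]
row0_in_col_major = [col_major_offset(0, j, ROWS, COLS) for j in range(COLS)]

if __name__ == "__main__":
    print(blocks_touched(row0_in_row_major, BLOCK))
    print(blocks_touched(row0_in_col_major, BLOCK))
```

Under these assumptions, reading one row touches 16 blocks in the row-major layout but 1024 blocks in the column-major layout, so the access pattern should match the layout, or the data should be blocked so that each partition is contiguous.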

Learning Outcomes:

No | Learning Outcome
LO1 | Ability to identify the right storage model to use given a dataset
LO2 | Ability to apply the appropriate parallel programming model to a given dataset
LO3 | Ability to identify and tune some common quality attributes of a distributed system
LO4 | Ability to choose the relevant consistency model for data stores based on application
LO5 | Ability to apply data mining algorithms like k-means clustering on an appropriate dataset
LO6 | Ability to design and develop an n-tier data mining system in a cloud environment

Part B: Contact Session Plan

Academic Term: 2019 Second Semester
Course Title: Systems for Data Analytics
Course No: DSE* ZG517
Lead Instructor: Y. R. Sudhakar

Course Contents

Contact Session # (2 hours / session) | Topic # (from content structure in Part A) | List of Topic Title (from content structure in Part A) | Reading / Reference
1 | 1.1 | Systems Attributes for Data Analytics - Single System | Class Slides
1 | 1.1 | Storage for Data: Structured Data (Relational Databases), Semi-structured Data (Object Stores), Unstructured Data (file systems) | Class Slides
1 | 1.1 | Processing: In-memory vs. (from) secondary storage vs. (over the) network | T1 Sec. 1.2.3
2 | 1.1 | Storage Models and Cost: Memory Hierarchy, Access costs, I/O Costs (i.e. number of disk blocks accessed) | Class Slides
2 | 1.1 | Locality of Reference: Principle, examples | Class Slides
2 | 1.1 | Impact of Latency: Algorithms and data structures that leverage locality, data organization on disk for better locality | Class Slides
3-4 | 1.2 | Systems Attributes for Data Analytics - Parallel and Distributed Systems | R12, Class Slides
3-4 | 1.2 | Motivation for Parallel Processing (Size of data and complexity of processing) | R12, Class Slides
3-4 | 1.2 | Storing data in parallel and distributed systems: Shared Memory vs. Message Passing | T1 Sec. 1.4.3, R12, Class Slides
3-4 | 1.2 | Strategies for data access: Partition, Replication, and Messaging | R12, Class Slides
3-4 | 1.2 | Memory Hierarchy in Parallel Systems: Shared memory access and memory contention; shared data access and mutual exclusion | R12, Class Slides
3-4 | 1.2 | Memory Hierarchy in Distributed Systems: In-node vs. over the network latencies, Locality, Communication Cost | R12, Class Slides
5 | 2.1 | Introduction to Systems Architecture |
5 | 2.1 | Parallel Architectures and Programming Models: Flynn’s Taxonomy (SIMD, MISD, MIMD) and Parallel Programming (SPMD, MPSD, MPMD) | T1 Sec. 1.4.3, R12, Class Slides
5 | 2.1 | Parallel Processing Models: {Data, Task, and Request}-Parallelism; Mapping: Data Parallel - SPMD, Task Parallel - MPMD, Request Parallel - Services/Cloud; Client-Server vs. Peer-to-Peer models of distributed computing | T1 Sec. 1.4.3, R12, R13, Class Slides
6 | 2.1 | Parallel vs. Distributed Systems: Shared Memory vs. Distributed Memory (i.e. message passing); Motivation for distributed systems (large size, easy scalability, cost-benefit) | T1 Sec. 1.4.3, T1 Sec. 2.1, R12, Class Slides
6 | 2.1 | Cluster Computing: Components and Architecture | T1 Sec. 2.2.1 to 2.2.4, Sec. 2.3
7-8 | 2.2 | Scalability - Speedup and Amdahl’s Law; How to apply Amdahl’s Law? (Relation to Barsis-Gustafson Law) | T1 Sec. 1.5.1
7-8 | 2.2 | Impact of Memory Hierarchy on Performance: Shared Memory and Memory Contention; Communication Cost; Locality | Additional Reading
7-8 | 2.2 | Reliability (for distributed systems): MTTF and MTTR, Serial vs. Parallel Connections, Single Point-of-Failure | T1 Sec. 1.5.2 and 2.3.3
7-8 | 2.2 | Building Reliable Systems: Redundancy and Resilience; Failure Models in Distributed Systems: Transient vs. Permanent Failures | T1 Sec. 1.5.2 and 2.3.3
7-8 | 2.2 | Failure Recovery: Fail-over, Active Fail-over, etc.; Overview of Process Migration | T1 Sec. 1.5.2 and 2.3.3
9 | 2.2 | Availability: Calculating Availability | T1 Sec. 1.5.2
9 | 2.2 | Review of Topics for Mid Semester Exam (~40 Mins) |
10-12 | 3.1 | File systems vs. Database systems vs. Object Stores | -
10-12 | 3.1 | Distributed File Systems - Basic architecture, Case Studies (GFS/HDFS) | T1 Sec. 6.3.2, R3
10-12 | 3.1 | Unstructured Databases - Basic architecture, Case Study and Examples (Google BigTable, CouchDB / MongoDB) | T1 Sec. 6.3.3
10-12 | 3.1 | Overview of Consistency Models - Weak and Strong Consistency, Eventual Consistency, CAP Theorem - Result and Implications | R6, R7 & R8
10-12 | 3.1 | Synchronization: Chubby Locking as a case study | R9
[additional content] | | [supplementary video to be added. Not to be done in Class] |
13 | 4.1 | (Re-)Designing Algorithms for Distributed Systems |
13 | 4.1 | Design Strategy: Divide-and-conquer for Parallel / Distributed Systems - Basic scenarios and Implications | Notes
13 | 4.1 | Parallel Programming Pattern: Data-parallel programs, and map as a construct | T1 Sec. 6.2.1
13 | 4.1 | Parallel Programming Pattern: Tree-parallelism, reduce as a construct | T1 Sec. 6.2.2
14-15 | 4.1 | Map-reduce model: Examples (of map, reduce, map-reduce combinations, Iterative map-reduce) | T1 Sec. 6.2.2
14-15 | 4.1 | Batch processing vs. Online Processing; Streaming - Systems-level understanding (input-output, memory model, constraints) | R10
14-15 | 4.1 | Master-Slave Processing: Implications for speedup and communication cost | Notes
16 | 4.2 | Parallelization of Data mining algorithms like k-means, DBSCAN, Nearest Neighbor & identifying locality issues; Matrices and Locality - Row-major vs. Column major vs. Blocking in distributed context | AR – Notes

# The above contact hours and topics can be adapted for non-specific and specific WILP programs
depending on the requirements and class interests.
Select Topics for experiential learning [Tutorials]

Topic No. | Select Topics in Syllabus for experiential learning | Resources (Need Weka or equivalent software)
1 | Introduction to Cloud Computing (with AWS as an example) | [Resources: Amazon student license]
2 | Setting up a simple 3-tier application on the Cloud | [Resources: Amazon student license]
3 | Programming exercises on map-reduce | [Resources: Cloud Infra. Lab in Hyd.]
4 | Synchronization exercise on CouchDB | [Resources: Cloud Infra. Lab or Amazon student license]
5 | Pen-and-paper exercise on Locality, Memory Contention, and Communication Requirement |
6 | Pen-and-paper exercise on calculations of speedup, MTTF, and MTTR |

Evaluation Scheme
Legend: EC = Evaluation Component
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Assignment-1 | Take Home | | 12 | To be announced
EC-1 | Quiz-II | Take Home | | 5 | To be announced
EC-1 | Assignment-II | Take Home | | 13 | To be announced
EC-2 | Mid-Semester Test | Closed Book | 90 Min | 30 | To be announced
EC-3 | Comprehensive Exam | Open Book | 120 Min | 40 | To be announced
Note - Evaluation components can be tailored depending on the proposed model.

Important Information
Syllabus for Mid-Semester Test (Closed Book): Topics in Weeks 1-7
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study

Evaluation Guidelines:
1. EC-1 consists of two Assignments and a Quiz. Announcements regarding the same will be made in a
timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. For Open Book exams: Use of prescribed and reference textbooks, in original (not photocopies), is
permitted. Class notes/slides as reference material in filed or bound form are permitted. However,
loose sheets of paper will not be allowed. Use of calculators is permitted in all exams.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student
should follow the procedure to apply for the Make-Up Test/Exam. The genuineness of the reason for
absence in the Regular Exam shall be assessed prior to giving permission to appear for the Make-up
Exam. Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be
announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as
given in the course handout, attend the lectures, and take all the prescribed evaluation components such as
Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme
provided in the handout.
