T1 Intro

ICS 4104: Distributed Systems at Strathmore University covers the design and implementation of distributed systems, requiring prerequisites in data structures, operating systems, and networks. The course includes lectures, programming assignments, and term projects, with a focus on collaborative group work and strict deadlines. Key topics include the definition of distributed systems, design goals, and challenges such as asynchrony, failure detection, and scalability.

ICS 4104: Distributed Systems

T1: Course Introduction

TALO MARTINI HARRISON


[email protected]
Strathmore University
Prerequisites for this course

• Data Structures and Algorithms
• Operating Systems
• Computer Networks
• Database Systems

Course Structure

• A blend of lecture-based and research-based study, with Programming Assignments / Short Projects
• Lectures
  • Discuss the fundamentals: the algorithms and systems aspects of the course
• Term Projects
  • Two types of term projects: research-based and survey-based
  • Research-based: explore a recent research topic and come up with experimental studies, new algorithms, new system designs, etc.
  • Survey-based: read recent research papers on an assigned topic (at least 3N papers for an N-member group) and write a survey paper
  • Group size: 4-6 members
  • Fill in this Google form to submit your group information by March 10, 2024: https://forms.gle/28mSq9yWneqLoBd6A

Course Structure

• Programming Assignments / Short Projects
  • Roughly four assignments during the semester
  • Form a group of 3-4 students. Submit the group information through this Google form by 10th March 2024: https://forms.gle/28mSq9yWneqLoBd6A
  • Create a private GitHub repo; collaborate among the group members to build the assignment solutions.
  • You may discuss with your friends, TAs, and instructors, but the code should be your own.
  • There will be a target deadline for each assignment. You need to make the GitHub repo public by the target deadline and share it with the instructor.
  • You are allowed to edit your code after the target deadline, but we may evaluate it on any day after the deadline; your marks for the assignment will be based on that evaluation only.
  • There will be no extension of the target deadline.

Course Structure

• Programming Assignments / Short Projects
  • Your assignment will be evaluated exactly once. We'll raise an issue on the GitHub repo for any comments that led to marks being deducted. However, reevaluation will not be done.
  • Your code should run on standard Ubuntu 20.04 or later. Include a README file in each submission that clearly states the test environment and sample inputs and outputs.
  • No marks will be awarded on the basis of changes made after the evaluation date, or to make up for incomplete information in your README file.
  • Also include a Makefile to compile and run your code, and one or more sample inputs with which you have tested the code (under an input directory).
  • In case of plagiarism, marks will be given only to the submission with the earliest last-modification timestamp.
  • Every member of the group should collaborate, and this should be reflected in the commit logs.

Grading

• Mid Sem CAT 1: 10%
• CAT 2: 10%
• End Sem Exam: 60%
• Project & Assignments: 20%

What This Course is About

• Sports
• Movies
• Travel to Saturn
• Interviews
• Company Acquisitions
• (Not Kidding)

What This Course is Really About

• Distributed Systems
• How to Design Algorithms for them
• How to Design The Systems
• How they work in real life
• How to build real distributed systems

Many Students are Intimidated by Computer Science…

• … It’s a good sign that you will do well!
• (Ice skating story: Narration)
• Sometimes, one has to unlearn first, before learning.

Our Main Goal Today

To Define the Term Distributed System

Can you name some examples of Distributed Systems?

• Client-Server (NFS)
• The Web
• The Internet
• A wireless network
• DNS
• Gnutella or BitTorrent (peer-to-peer overlays)
• A “cloud”, e.g., Amazon EC2/S3, Microsoft Azure
• A datacenter, e.g., NCSA, a Google datacenter, AWS

What is a Distributed System?

I asked Alexa…

How does Leslie Lamport define a Distributed System?

FOLDOC definition

Textbook definitions

• A distributed system is a collection of independent computers that appear to the users of the system as a single computer. [Andrew Tanenbaum]
• A distributed system is several computers doing something together. Thus, a distributed system has three primary characteristics: multiple computers, interconnections, and shared state. [Michael Schroeder]

Unsatisfactory

Which is a Distributed System – (A) or (B)?

(A) Facebook Social Network Graph among humans
Source: https://www.facebook.com/note.php?note_id=469716398919

(B) Peer-to-peer file-sharing system (Gnutella)

A working definition for us

A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and which communicate through an unreliable communication medium.

• Entity = a process on a device (PC, PDA)
• Communication medium = wired or wireless network
• Our interest in distributed systems involves
  • design and implementation, maintenance, algorithmics

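The working definition can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions: the names `Entity`, `UnreliableChannel` and `count_delivered` are hypothetical, the medium is modelled as dropping each message independently with a fixed probability, and a crash is modelled as a flag that makes the entity ignore input.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """A process on some device; it may crash at any time (failure-prone)."""
    name: str
    crashed: bool = False
    inbox: List[str] = field(default_factory=list)

    def receive(self, message: str) -> None:
        # A crashed entity silently ignores everything sent to it.
        if not self.crashed:
            self.inbox.append(message)


class UnreliableChannel:
    """Communication medium that may lose any message (assumed loss model)."""

    def __init__(self, drop_probability: float, seed: int = 0) -> None:
        self.drop_probability = drop_probability
        self.rng = random.Random(seed)  # seeded so that runs are repeatable

    def send(self, receiver: Entity, message: str) -> bool:
        """Return True if the medium handed the message to the receiver.

        A real sender never learns this; it is returned only so that the
        simulation can be inspected.
        """
        if self.rng.random() < self.drop_probability:
            return False  # lost in the network
        receiver.receive(message)
        return True


def count_delivered(channel: UnreliableChannel, receiver: Entity, n: int) -> int:
    """Send n messages and report how many ended up in the receiver's inbox."""
    for i in range(n):
        channel.send(receiver, f"msg-{i}")
    return len(receiver.inbox)
```

From the sender's side, a dropped message and a crashed receiver look identical: in both cases nothing comes back.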
Gnutella Peer to Peer System

• What are the “entities” (nodes)?
• What is the communication medium (links)?

Source: GnuMap Project

Web Domains

• What are the “entities” (nodes)?
• What is the communication medium (links)?

Source: http://www.vlib.us/web/worldwideweb3d.html

Datacenter

• What are the “entities” (nodes)?
• What is the communication medium (links)?

An Intranet & a distributed system

[Figure: an intranet. Desktop computers, an email server, a Web server, a file server, and print and other servers sit on a local area network, which connects to the rest of the Internet through a router/firewall.]

• Running over this intranet is a distributed file system
• The router/firewall prevents unauthorized messages from leaving/entering; implemented by filtering incoming and outgoing messages via firewall “rules” (configurable)

Does our Working Definition work for the http Web?

A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and that communicate through an unreliable communication medium.

• Entity = a process on a device (PC, PDA)
• Communication medium = wired or wireless network
• Our interest in distributed systems involves
  • design and implementation, maintenance, study, algorithmics

“Important” Distributed Systems Issues

• No global clock; no single global notion of the correct time (asynchrony)
• Unpredictable failures of components: lack of response may be due to either failure of a network component, network path being down, or a computer crash (failure-prone, unreliable)
• Highly variable bandwidth: from 16 Kbps (slow modems or Google Balloon) to Gbps (Internet2) to Tbps (between datacenters of the same big company)
• Possibly large and variable latency: a few ms to several seconds
• Large numbers of hosts: 2 to several million
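
The second issue above, that a lack of response has several possible causes, can be seen in a timeout-based failure detector. This is a minimal sketch, not a production design: `HeartbeatDetector`, its method names, and the node names are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from typing import Dict, Set


class HeartbeatDetector:
    """Suspects a node once no heartbeat has arrived for `timeout` time units."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Record the local arrival time of the latest heartbeat from `node`.
        self.last_seen[node] = now

    def suspected(self, now: float) -> Set[str]:
        # A node is suspected if its last heartbeat is older than the timeout.
        return {n for n, seen in self.last_seen.items() if now - seen > self.timeout}


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("slow", now=0.0)
detector.heartbeat("crashed", now=0.0)

# At time 4 neither node has been heard from for more than 3 time units.
suspects_at_4 = detector.suspected(now=4.0)

# The heartbeat from "slow" was only delayed; it finally arrives at time 5.
detector.heartbeat("slow", now=5.0)
suspects_at_5 = detector.suspected(now=5.0)
```

At time 4 the detector suspects both nodes, although only one of them has actually crashed. With variable latency, no choice of timeout removes this ambiguity; a longer timeout only trades false suspicions for slower detection.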

Many Interesting Design Problems

• Real distributed systems
  • Cloud computing, peer-to-peer systems, Hadoop, key-value stores/NoSQL, distributed file systems, sensor networks, measurements, graph processing, stream processing, …
• Classical Problems
  • Failure detection, Asynchrony, Snapshots, Multicast, Consensus, Mutual Exclusion, Election, …
• Concurrency
  • RPCs, Concurrency Control, Replication Control, Paxos, …
• Security
  • ACLs, Capabilities, …
• Others…

Typical Distributed Systems Design Goals

• Common Goals:
  • Heterogeneity – can the system handle a large variety of types of PCs and devices?
  • Robustness – is the system resilient to host crashes and failures, and to the network dropping messages?
  • Availability – are data+services always there for clients?
  • Transparency – can the system hide its internal workings from the users? (warning: term means the opposite of what the name implies!)
  • Concurrency – can the server handle multiple clients simultaneously?
  • Efficiency – is the service fast enough? Does it utilize 100% of all resources?
  • Scalability – can it handle 100 million nodes without degrading service? (nodes = clients and/or servers) How about 6 B? More?
  • Security – can the system withstand hacker attacks?
  • Openness – is the system extensible?

“Important” Issues

• If you’re already complaining that the list of topics we’ve discussed so far has been perplexing…
  • You’re right!
  • It was meant to be (perplexing)
• The Goal for the Rest of the Course: see enough examples and learn enough concepts so these topics and issues will make sense
  • We will revisit many of these slides in the very last lecture of the course!

“Concepts”?

What is a Distributed System?

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”
-- Leslie Lamport

What is a Distributed System?

• Multiple computers
• That want to talk to each other
• Having a common application goal ...
• But a system may fail ...
• Or may have external attacks ...

Examples of Distributed Systems

• Almost every large system that you use ... and ...
• The Internet is also a Distributed System

You have already learned quite a few distributed algorithms ...

• Internet routing
• TCP congestion control
• Domain Name System
• Peer-to-Peer File Transfer (Have you used DC++ or BitTorrent?)
• …

• The fundamental primitives behind such systems
  • Message passing
  • Shared memory
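
The two primitives can be contrasted on a single machine using threads. This is a rough sketch under simplifying assumptions: the function names are hypothetical, threads stand in for separate processes, and a `queue.Queue` stands in for a network channel. Both functions compute the same sum; they differ only in how the workers combine their partial results.

```python
import queue
import threading
from typing import List, Sequence


def shared_memory_sum(values: Sequence[int], workers: int = 4) -> int:
    """Workers add their partial sums into one shared variable, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk: Sequence[int]) -> None:
        nonlocal total
        partial = sum(chunk)
        with lock:  # without the lock, concurrent updates could be lost
            total += partial

    threads: List[threading.Thread] = [
        threading.Thread(target=work, args=(values[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(values: Sequence[int], workers: int = 4) -> int:
    """Workers share nothing; each sends its partial sum as a message."""
    mailbox: "queue.Queue[int]" = queue.Queue()

    def work(chunk: Sequence[int]) -> None:
        mailbox.put(sum(chunk))  # the only interaction is sending a message

    threads = [
        threading.Thread(target=work, args=(values[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Exactly one message per worker is waiting in the mailbox.
    return sum(mailbox.get() for _ in range(workers))
```

In the shared-memory version correctness depends on every worker respecting the lock; in the message-passing version it depends on every message arriving, which is exactly what an unreliable network does not guarantee.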

What Else Are You Going to Learn Here?

• It is difficult to satisfy certain properties simultaneously in a distributed system ...

[Figure, built up over several slides: Performance, Availability and Scalability, with Replication, Agreement and Caching as the techniques linking them, and a “?” in the middle.]

Books and References

• van Steen and Tanenbaum, Distributed Systems (any edition)
  • Free e-book available: https://www.distributed-systems.net/index.php/books/ds3/
• Bacon and Harris, Operating Systems: Concurrent and Distributed Software Design, Addison-Wesley 2003
• A. D. Kshemkalyani and M. Singhal, Distributed Computing: Principles, Algorithms, and Systems
• We'll also follow various papers and articles, and will refer to them while discussing different topics

Some Conferences and Journals to Follow ...

• PODC
• DISC
• ICDCS
• OSDI/SOSP
• ASPLOS
• Usenix ATC
• IEEE Transactions on Parallel and Distributed Systems
• ACM Transactions on Computer Systems

Some running systems that you might be using … but you don't know about ...

Facebook Shard Manager

• Data from billions of users is stored in many databases
• Sharding: "a way to scale out services to support high throughput"
• Divide the data into shards and allocate servers for individual shards
• Spread the load across different databases
• Failure of shards (hardware or software failure)
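
One common way to divide data into shards is to hash each key and take the result modulo the number of shards. The sketch below illustrates that general idea only; it is an assumption for teaching purposes, not a description of how Shard Manager assigns shards, and the names `shard_for` and `load_per_shard` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in the range [0, num_shards)."""
    # A cryptographic hash is used because, unlike Python's built-in hash(),
    # it gives the same value for the same string in every process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_per_shard(keys: Iterable[str], num_shards: int) -> Counter:
    """Count how many of the given keys land on each shard."""
    return Counter(shard_for(key, num_shards) for key in keys)


user_ids = [f"user-{i}" for i in range(10_000)]
load = load_per_shard(user_ids, num_shards=8)
```

A weakness of plain modulo hashing is that changing `num_shards` moves most keys to a different shard, which is one reason real systems use more elaborate placement schemes.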

Facebook Shard Manager

• Maintain multiple replicas for each shard
  • Why? Requests can be rerouted to another replica when one replica of a shard fails
• Challenge: During a data update, how do you ensure that all the replicas of a shard are consistent?

[Figure: Shard 1 Primary alongside Shard 1 Replica 1, Replica 2 and Replica 3]

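The consistency challenge shows up even in a very small model. The sketch below assumes a simple primary-backup scheme in which the primary pushes each write to every replica and reports success when a majority of all copies hold it. The classes `Replica` and `Primary` are hypothetical, and the scheme deliberately has no recovery or rollback step, so that the problem stays visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    name: str
    alive: bool = True
    data: Dict[str, str] = field(default_factory=dict)

    def apply(self, key: str, value: str) -> bool:
        """Store the value and acknowledge, unless this replica is down."""
        if not self.alive:
            return False
        self.data[key] = value
        return True


class Primary:
    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> bool:
        """Apply locally, push to all replicas, succeed on a majority of copies."""
        self.data[key] = value
        acks = 1 + sum(1 for r in self.replicas if r.apply(key, value))
        copies = 1 + len(self.replicas)
        return 2 * acks > copies

    def stale_replicas(self) -> List[str]:
        """Names of replicas whose contents differ from the primary's."""
        return [r.name for r in self.replicas if r.data != self.data]


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
primary = Primary(replicas)
primary.write("x", "1")      # all four copies agree

replicas[0].alive = False    # r1 fails
ok = primary.write("x", "2")  # still a majority: primary, r2, r3

replicas[0].alive = True     # r1 comes back, but it missed the update
stale = primary.stale_replicas()
```

The write succeeds while `r1` is down, yet when `r1` returns it still holds the old value. Deciding how replicas catch up, and what readers may observe in the meantime, is what replication and agreement protocols address.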
Facebook Shard Manager

[Figure: the same Shard 1 primary and replicas, with one node marked “Failed”]

Facebook Shard Manager

• The classical problem of distributed consensus / agreement
• The shard manager needs to scale to millions of shards per application
• Further reading: https://engineering.fb.com/2020/08/24/production-engineering/scaling-services-with-shard-manager/

Some Other Distributed Systems from Facebook

• Facebook Ordered Queuing Service (FOQS) -- A distributed priority queue to store and process microservice work items and pass them from one microservice to another
  • https://engineering.fb.com/2021/02/22/production-engineering/foqs-scaling-a-distributed-priority-queue/
• Async: Distributed asynchronous computing for Facebook applications
  • https://engineering.fb.com/2020/08/17/production-engineering/async/
• NTP Service for Facebook
  • https://engineering.fb.com/2020/03/18/production-engineering/ntp-service/

Distributed Computing @ Google

• Pathways: Asynchronous Distributed Data Flow for ML
  • https://research.google/pubs/pub51473/
• Debugging incidents in Google's distributed system
  • https://research.google/pubs/pub49291/
• Monarch: Google's Planet-Scale In-Memory Time Series Database
  • https://research.google/pubs/pub50652/
• Sundial: Fault-tolerant Clock Synchronization for Datacenters
  • https://research.google/pubs/pub49716/

Some Other Resources

• Amazon Builders' Library: https://aws.amazon.com/builders-library/
• An interesting collection of materials on distributed systems
  • https://github.com/theanalyst/awesome-distributed-systems