01 en Principles of Distributed Systems

BASICS OF DISTRIBUTED SYSTEMS

INTRODUCTION TO THE COURSE


DEFINITION OF DISTRIBUTED SYSTEM

 A collection of independent computers connected through a communication network that work together to accomplish some goal
 No shared operating system
 No shared memory
 No shared clock
SINGLE SYSTEM IMAGE

 Collection of independent computers that appears as a single system to the user(s)
 Independent = autonomous, self-contained
 Single system = user not aware of distribution
FLYNN’S TAXONOMY (1966)

 Classify computer architectures by looking at the number of instruction streams and number of data streams
 SISD — Single Instruction, Single Data stream
 Traditional uniprocessor systems
 SIMD — Single Instruction, Multiple Data streams
 Array (vector) processors
 Examples:
 GPUs – Graphical Processing Units for computer graphics, GPGPU (General Purpose GPU): AMD/ATI, NVIDIA
 AVX: Intel’s Advanced Vector Extensions (a vectorization sketch follows this list)

 MISD — Multiple Instructions, Single Data stream
 Multi-layer (pipelined) data processing
 MIMD — Multiple Instruction, Multiple Data streams
 Multiple computers, each with a program counter, program (instructions), and data
 Parallel and distributed systems
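As a rough illustration of the SIMD idea (a sketch, not from the slides), the NumPy snippet below applies a single operation across a large array at once; NumPy dispatches such loops to vector instructions (e.g., AVX) where the CPU supports them.

```python
import numpy as np

# One instruction ("multiply by 2") applied to many data elements at once.
data = np.arange(1_000_000, dtype=np.float64)

# SISD-style equivalent: for i in range(len(data)): out[i] = data[i] * 2.0
out = data * 2.0  # single operation, multiple data elements (SIMD-style)
```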
SUBCLASSIFYING MIMD

 Memory
 Shared memory systems: multiprocessors
 No shared memory: networks of computers, multicomputers
 Interconnect
 Bus
 Switch
 Delay/bandwidth
 Tightly coupled systems
 Loosely coupled systems
MULTIPROCESSORS AND MULTICOMPUTERS

 Multiprocessors
 Shared memory
 Shared clock
 Shared operating system
 All-or-nothing failure

 Multicomputers (networks of computers) ⇒ distributed systems
 No shared memory
 No shared clock
 Partial failures
 Inter-computer communication mechanism needed: the network
WHY DO WE WANT DISTRIBUTED SYSTEMS?

 Scale
 Collaboration
 Reduced latency
 Mobility
 High availability & Fault tolerance
 Incremental cost
 Delegated infrastructure & operations
SCALE – INCREASING PERFORMANCE

 Computers are getting faster
 Moore's Law prediction: performance doubles approximately every 18 months because of faster transistors and more transistors per chip
SCALING A SINGLE SYSTEM HAS LIMITS

 Getting harder for technology to keep up with Moore's law
 More cores per chip → requires multithreaded programming
 There are limits to the die size and # of transistors
 Intel Xeon W-3175X CPU: 28 cores per chip ($2,999/chip!)
 8 billion transistors, 255 watts @ 3.1-4.3 GHz
 AMD EPYC 7601 CPU: 32 cores per chip ($4,200/chip)
 19.2 billion transistors, 180 watts
 NVIDIA GeForce RTX 2080 Ti: 4,352 CUDA cores per chip
 Special purpose apps: Graphics rendering, neural networks
OUR COMPUTING NEEDS EXCEED CPU ADVANCES

 Google
 Over 63,000 search queries per second on average
 Over 130 trillion pages indexed
 Uses hundreds of thousands of servers to do this
 In 1999, it took Google one month to crawl and build an index of about 50 million pages
 In 2012, the same task was accomplished in less than one minute.
 16% to 20% of queries that get asked every day have never been asked before
 Every query has to travel on average 1,500 miles to a data center and back to return the answer to the user
 A single Google query uses 1,000 computers in 0.2 seconds to retrieve an answer
 Facebook
 Approximately 100M requests per second with 4B users
COLLABORATION AND CONTENT

 Systems exchange their data and functionality
 Metcalfe’s Law states that the value of a telecommunications network is proportional to the square of the number of connected users of the system
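As a quick arithmetic illustration (not from the slides): n connected users can form n(n−1)/2 distinct pairwise links, which grows on the order of n².

```python
# Metcalfe's law intuition: pairwise links grow roughly as n**2 / 2.
for n in (10, 100, 1000):
    print(n, n * (n - 1) // 2)   # 45, 4950, 499500
```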
REDUCED LATENCY

 Cache data close to where it is needed
 Caching vs. replication
 Replication: multiple copies of data for increased fault tolerance
 Caching: temporary copies of frequently accessed data closer to where it’s needed
 Some caching services:
 Akamai, Cloudflare, Amazon CloudFront, Apache Ignite
MOBILITY

 3.5 billion smartphone users
 Remote sensors
 Cars
 Traffic cameras
 Toll collection
 Shipping containers
 Vending machines
 IoT = Internet of Things
 2017: more IoT devices than humans
HIGH AVAILABILITY

 Redundancy = replicated components
 Service can run even if some systems die
 If P(any one system down) = 5%
 P(two systems down at the same time) = 5% × 5% = 0.25%
 Uptime = 1 – downtime = 1 – 0.0025 = 99.75%
 We get 99.75% uptime instead of 95% because both replicated components must fail instead of just one.
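The slide's arithmetic, spelled out as a minimal sketch (the 5% per-system failure probability is the slide's assumption, and the replicas are assumed to fail independently):

```python
p_down = 0.05                   # assumed per-system failure probability (5%)
p_both_down = p_down * p_down   # independent failures: 0.05 * 0.05 = 0.0025
uptime = 1 - p_both_down
print(f"Redundant pair uptime: {uptime:.2%}")   # 99.75%
```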
HIGH AVAILABILITY?

 No redundancy = dependence on all components
 If we need all systems running to provide a service
 P(any system down) = 1 − P(A is up AND B is up)
 = 1 − (1 − 5%) × (1 − 5%) = 1 − 0.95 × 0.95 = 9.75%
 ⇒ 39× the failure probability of the redundant pair (0.25%)!
 Uptime = 1 – downtime = 1 – 0.0975 = 90.25%

 With a large # of systems, P(any system down) approaches 100%!
 Requiring a lot of components to be up & running is a losing proposition.
 With large enough systems, something is always breaking!
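A small sketch of this case, where all n components must be up (the 95% per-component uptime is the slide's figure; the values of n are arbitrary):

```python
p_up = 0.95   # each component is up 95% of the time
for n in (1, 2, 10, 100):
    print(f"n={n:3d}: uptime = {p_up ** n:.2%}")
# n=  1: 95.00%,  n=  2: 90.25%,  n= 10: 59.87%,  n=100: 0.59%
```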
AVAILABILITY REQUIRES FAULT TOLERANCE

 Fault tolerance
 Identify & recover from component failures

 Recoverability
 Software can restart and function
 May involve restoring state
INCREMENTAL COST

 Version 1 does not have to be the full system
 Add more servers & storage over time
 Scale also implies cost – you don’t need millions of $ for v1.0
DELEGATED OPERATIONS

 Offload responsibility
 Let someone else manage systems
 Use third-party services
 Speed deployment
 Don’t buy & configure your own systems
 Don’t build your own data center
 Modularize services on different systems
 Dedicated systems for storage, email, etc.
 Use cloud, network attached storage
 Let someone else figure out how to expand storage and do backups
TRANSPARENCY

 High level: hide distribution from users
 Low level: hide distribution from software

 Location transparency
 Users don’t care where resources are
 Migration transparency
 Resources move at will
 Replication transparency
 Users cannot tell whether there are copies of resources
 Concurrency transparency
 Users share resources transparently
 Parallelism transparency
 Operations take place in parallel without user’s knowledge
CORE CHALLENGES IN DISTRIBUTED SYSTEMS DESIGN

 Concurrency
 Latency
 Partial Failure
 Security
CONCURRENCY

 Lots of requests may occur at the same time
 Need to deal with concurrent requests
 Need to ensure consistency of all data
 Understand critical sections & mutual exclusion
 Beware: mutual exclusion (locking) can affect performance

 Replication adds complexity
 All operations must appear to occur in the same order on all replicas
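A minimal sketch of a critical section, assuming a shared counter guarded by a threading.Lock (names are illustrative): without the lock, the read-modify-write interleaves and updates are lost, since `counter += 1` is not atomic even under CPython's GIL.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:         # critical section: one thread at a time
            counter += 1   # read-modify-write must not interleave

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000 with the lock; often less without it
```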
LATENCY

 Synchronous network model
 There is some upper bound, T, between when a node sends a message and another node receives it
 Knowing T enables a node to distinguish between a node that has failed and a node that is taking a long time to respond
 Partially synchronous network model
 There’s an upper bound for message communication but the programmer doesn’t know it – it has to be discovered
 Protocols will operate correctly only if all messages are received within some time, T
 We cannot make assumptions on the delay time distribution
 Asynchronous network model
 Messages can take arbitrarily long to reach a peer node
 This is what we get from the Internet
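A hedged sketch of timeout-based failure suspicion (the endpoint and timeout value are illustrative, not from the slides): with a known synchronous bound T, a timeout of T would prove failure; on the Internet, a timeout only means "slow or dead", and we cannot tell which.

```python
import socket

def probe(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if the peer answered within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:      # connection refused, unreachable, or timed out
        return False

# Illustrative use (placeholder endpoint):
# alive = probe("example.com", 80)   # False = failed OR merely slow
```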
LATENCY AND ASYNCHRONOUS NETWORKS

 Asynchronous networks can be a pain
 Messages may take an unpredictable amount of time
 We may think a message is lost but it’s really delayed
 May lead to retransmissions → duplicate messages
 May lead us to assume a service is dead when it isn’t
 May mess with our perception of time
 May cause messages to arrive in a different order … or a different order on different systems
LATENCY

 Speed up data access via caching – temporary copies of data
 Keep data close to where it’s processed to maximize efficiency
 Memory vs. disk
 Local disk vs. remote server
 Remote memory vs. remote disk
 Cache coherence: cached data can become stale
 Underlying data can change → cache needs to be invalidated
 System using the cache may change the data → propagate results
 Write-through cache
 But updates take time ⇒ can lead to inconsistencies (incoherent views)
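A minimal write-through cache sketch (illustrative, dict-backed): writes update the cache and the backing store together, and an invalidate hook covers data changing underneath.

```python
class WriteThroughCache:
    def __init__(self, store: dict):
        self.store = store        # stands in for a slower or remote store
        self.cache = {}

    def read(self, key):
        if key not in self.cache:             # miss: fetch from the store
            self.cache[key] = self.store[key]
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value   # update the cache...
        self.store[key] = value   # ...and the store, synchronously

    def invalidate(self, key):
        self.cache.pop(key, None)  # needed if the store changes behind us
```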
PARTIAL FAILURE

Failure is a fact of life in distributed systems!


 In local systems, failure is usually total (all-or-nothing)
 In distributed systems, we get partial failure
 A component can fail while others continue to work
 Failure of a network link is indistinguishable from a remote server failure
 Send a request but don't get a response ⇒ what happened?
 No global state
 There is no global state that can be examined to determine errors
 There is no agent that can determine which components failed and inform everyone else
 Need to ensure the state of the entire system is consistent after a failure
HANDLING FAILURES

 Handle detection, recovery, and restart
 Availability = fraction of time system is usable
 Achieve with redundancy
 But then consistency is an issue!
 Reliability: data must not get lost
 Includes security
SYSTEM FAILURE TYPES

 Fail-stop
 Failed component stops functioning
 Halting = stop without notice
 Detect failed components via timeouts
 But you can’t count on timeouts in asynchronous networks
 And what if the network isn’t reliable?
 Sometimes we guess
 Fail-restart
 Component stops but then restarts
 Danger: stale state
NETWORK FAILURE TYPES

 Omission
 Failure to send or receive messages
 Due to queue overflow in router, corrupted data, receive buffer overflow

 Timing
 Messages take longer than expected
 We may assume a system is dead when it isn't

 Unsynchronized clocks can alter process coordination

 Partition
 Network fragments into two or more sub-networks that cannot communicate with each other
NETWORK AND SYSTEM FAILURE TYPES

 Fail-silent
 A failed component (process or hardware) does not produce any output

 Byzantine failures
 Instead of stopping, a component produces faulty data
 Due to bad hardware, software, network problems, or malicious interference

 Goal: avoid single points of failure


REDUNDANCY

 We deal with failures by adding redundancy
 Replicated components
 But this means we need to keep the state of those components replicated
STATE, REPLICAS, AND CACHES

 State
 Information about some component that cannot be reconstructed
 Network connection info, process memory, list of clients with open files, lists of which clients finished their tasks
 Replicas
 Redundant copies of data → used to address fault tolerance

 Cache
 Local storage of frequently-accessed data to reduce latency → used to address latency
NO GLOBAL KNOWLEDGE

 Nobody has the true global state of a system
 There is no global state that can be examined to determine errors
 There is no agent that can determine which components failed and inform everyone else
 No shared memory
 A process knows its current state
 It may know the last reported state of other processes
 It may periodically report its state to others
 No foolproof way to detect failure in all cases
SECURITY

 The environment
 Public networks, remotely-managed services, 3rd-party services
 Some issues
 Malicious interference, bad user input, impersonation of users & services
 Protocol attacks, input validation attacks, time-based attacks, replay attacks
 Rely on authentication, cryptography (hashes, encryption) … and good defensive programming!
 Users also want convenience
 Single sign-on, no repeated entering of login credentials
 Controlled access to services
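A hedged sketch of one listed defense: message authentication with a keyed hash (HMAC) plus a timestamp to blunt replay attacks. The key and the 30-second freshness window are illustrative choices, not from the slides.

```python
import hashlib
import hmac
import time

SECRET = b"shared-secret-key"   # illustrative; real systems manage keys carefully

def sign(message: bytes) -> tuple[float, str]:
    ts = time.time()
    tag = hmac.new(SECRET, message + str(ts).encode(), hashlib.sha256).hexdigest()
    return ts, tag

def verify(message: bytes, ts: float, tag: str, max_age_s: float = 30.0) -> bool:
    if time.time() - ts > max_age_s:   # too old: possible replay
        return False
    expected = hmac.new(SECRET, message + str(ts).encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)   # constant-time comparison

ts, tag = sign(b"transfer $10 to alice")
assert verify(b"transfer $10 to alice", ts, tag)
```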
KEY APPROACHES IN DISTRIBUTED SYSTEMS

 Divide & conquer
 Break up data sets (sharding) and have each system work on a small part (see the sketch after this list)
 Merging results is usually the easy & efficient part

 Replication
 For high availability, caching, and sharing data
 Challenge: keep replicas consistent even if systems go down and come up

 Quorum/consensus
 Enable a group to reach agreement
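A minimal sketch combining two of these approaches: hash-based sharding and a majority-quorum check (the shard and replica counts are arbitrary illustrative parameters).

```python
import hashlib

NUM_SHARDS = 8   # illustrative

def shard_for(key: str) -> int:
    """Divide & conquer: map each key to one of NUM_SHARDS partitions."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def has_quorum(acks: int, replicas: int) -> bool:
    """Quorum: commit once a majority of replicas acknowledge."""
    return acks > replicas // 2

print(shard_for("user:42"))                  # deterministic shard assignment
print(has_quorum(2, 3), has_quorum(2, 4))    # True False
```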
QUESTIONS?
NOW, BY E-MAIL, …
