System Design

The document discusses the key principles of reliability, scalability, and maintainability in data-intensive applications, emphasizing the need for combining various tools to meet diverse requirements. It highlights the importance of designing systems that can handle faults, scale with increasing load, and remain maintainable over time while adapting to new use cases. The document also outlines strategies for achieving these principles, including fault tolerance, performance measurement, and simplifying system complexity.

Uploaded by

Divyam Thakur

Reliability Scalability Maintainability

Data-intensive application building blocks:

databases
caches
search indexes
stream processing
batch processing

We may need to combine different tools to find the best match for our application, because each tool provides different characteristics.

Different design decisions must be made by the system designer.

Data system -> e.g. a database and a cache combined and treated as one system

With so many requirements, a single piece of software cannot meet them all; each system focuses on performing one task well, and these pieces of software are stitched together using application code.

e.g. using different software for caching and for searching, and stitching them together

(screenshot omitted)
Keeping the data correct even when something goes wrong is also the system designer's task.

Reliability
The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).

Scalability
As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.

Maintainability
Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively.

Reliability:
1. performs as the user expects
2. handles cases where the user makes unexpected mistakes or uses the software in unexpected ways
3. prevents unauthorized access and abuse

reliability -> continuing to work properly even when things go wrong

The things that can go wrong are called faults; a system that copes with them is fault-tolerant or resilient.

fault -> one component of the system deviating from its spec

failure -> the system as a whole stops providing the required service to the user

Netflix Chaos Monkey -> fault-tolerance testing software that deliberately kills processes and instances to verify the system survives
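The idea can be sketched as a toy fault injector (a minimal sketch; the cluster model and worker names are invented for illustration, not the real Netflix tool):

```python
import random

def kill_random_worker(workers: list[str], rng: random.Random) -> list[str]:
    """Deliberately remove one randomly chosen worker, as a fault-injection drill."""
    victim = rng.choice(workers)
    return [w for w in workers if w != victim]

rng = random.Random(0)
cluster = ["worker-1", "worker-2", "worker-3"]
survivors = kill_random_worker(cluster, rng)
# A fault-tolerant service must keep serving requests from the survivors.
assert len(survivors) == 2
```

Running such drills regularly in production is what forces fault-tolerance machinery to be exercised continuously rather than only on paper.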

Hardware faults

usually reduced by adding redundancy to the individual hardware components (e.g. RAID disks, dual power supplies); for whole-machine failures, software fault-tolerance techniques are also used

if one component breaks, another can take its place
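Under the idealized assumption that hardware failures are independent, redundancy multiplies failure probabilities together (a toy model; the 2% figure is invented for illustration):

```python
# Idealized model: component failures are independent (roughly true for
# hardware faults, unlike correlated software faults). With n redundant
# copies, service is lost only if all n fail at once.
def prob_all_fail(p_single: float, n_copies: int) -> float:
    return p_single ** n_copies

print(prob_all_fail(0.02, 1))  # 0.02    -> single disk
print(prob_all_fail(0.02, 2))  # ~0.0004 -> mirrored pair, 50x better
```

This is why a mirrored pair is dramatically more reliable than one disk, and also why correlated faults (which break the independence assumption) are so dangerous.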

Software errors:

Software errors are hard to anticipate and are often correlated with each other.

These faults often lie dormant: the software makes some assumption about its environment, and the fault is triggered when that assumption stops being true.

There is no quick solution to systematic software faults.

Human error:
We make mistakes even with the best intentions, and most problems are caused by humans, so what can we do to avoid this:

1. design systems with well-designed abstractions, but not too much abstraction
2. decouple the places where most mistakes happen, and provide sandbox environments where people can explore and experiment safely
3. test thoroughly at all levels, from unit tests to whole-system integration tests
4. allow quick and easy recovery from human errors, to minimize the impact of a failure, and make it easy to roll back changes
5. set up clear and detailed monitoring
6. implement good management practices and training.

Importance of reliability:
Reliability matters, but it can be sacrificed according to development needs; we should be very conscious whenever we are cutting corners.

Scalability

Working reliably today doesn't mean working reliably tomorrow; a main reason for degradation is increased load.

scalability -> the system's ability to cope with increased load

If the system grows in a particular way, what are our options for coping with that growth? How can we add computational resources to handle the additional load?

Describing load

load parameters -> numbers that describe the load (e.g. requests per second, ratio of reads to writes, hit rate on a cache)

Describing performance

Two ways to look at performance when load increases:

1. when load increases and system resources are kept the same, see how the performance of the system is affected.

2. when load increases, how much do you need to increase the resources to keep performance unchanged?

throughput: the number of requests that can be processed per second.
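A throughput measurement can be sketched as a toy micro-benchmark (the function name and the no-op handler are illustrative, not a standard API):

```python
import time

def measure_throughput(handle, n_requests: int) -> float:
    """Return requests processed per second for a given handler (toy benchmark)."""
    start = time.perf_counter()
    for _ in range(n_requests):
        handle()
    elapsed = time.perf_counter() - start
    return n_requests / elapsed

# A no-op handler gives an upper bound; a real handler would do I/O and compute.
print(f"{measure_throughput(lambda: None, 10_000):.0f} req/s")
```

In practice throughput is measured against the real system under realistic load, not a loop like this, but the unit (requests per second) is the same.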


We see response time not as a single number but as a distribution of values that you can measure.

Even in a scenario where you'd think all requests should take the same time, you get variation: random additional latency could be introduced by a context switch to a background process, the loss of a network packet and TCP retransmission, a garbage collection pause, a page fault forcing a read from disk, mechanical vibrations in the server rack [18], or many other causes.

We don't use the mean of response times as the metric, because it doesn't tell us how many users actually experienced the delay; we use percentiles instead.

The median is a good measure; now think on your own about why it is a good metric.

hint -> median refers to the 50th percentile
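The difference between mean and percentiles can be sketched with a nearest-rank percentile over a handful of hypothetical response times (the numbers are invented for illustration):

```python
import statistics

# Hypothetical response times in milliseconds; one slow outlier.
response_times_ms = [12, 14, 15, 15, 16, 18, 21, 25, 40, 310]

def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ranked = sorted(samples)
    k = max(0, min(len(ranked) - 1, round(p / 100 * len(ranked)) - 1))
    return ranked[k]

p50 = percentile(response_times_ms, 50)    # 16 ms  -> the typical user
p99 = percentile(response_times_ms, 99)    # 310 ms -> tail latency
mean = statistics.mean(response_times_ms)  # 48.6 ms, skewed by the outlier
```

The single 310 ms outlier triples the mean but leaves the median untouched, which is why percentiles describe what users actually experience.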

high percentiles (p95, p99, p999) are referred to as tail latencies

It is important to measure response time on the client side.

When a user request fans out to several backend calls, one slow call slows down the whole request: an effect known as tail latency amplification.
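Assuming independent backends, the amplification can be computed directly (a toy model; real backend latencies are often correlated, and the 1% figure is invented for illustration):

```python
# Toy model: each backend call exceeds its p99 latency with probability 0.01,
# independently. A user request that fans out to n backends is slow whenever
# ANY backend is slow: P(slow) = 1 - (1 - p)**n.
def prob_slow_request(p_slow: float, n_backends: int) -> float:
    return 1 - (1 - p_slow) ** n_backends

print(prob_slow_request(0.01, 1))    # ~0.01 -> 1% of single-call requests
print(prob_slow_request(0.01, 100))  # ~0.63 -> most fan-out requests hit the tail
```

Even a rare slow backend becomes the common case once a request touches enough of them, which is why tail latencies dominate the end-user experience.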

Approaches for coping with load

How do we maintain good performance when our load parameters increase by some amount?

scaling up -> vertical scaling -> increasing the specs of the current machine
scaling out -> horizontal scaling -> distributing the load across multiple machines; also known as a shared-nothing architecture

good systems -> a pragmatic mixture of both approaches

elastic and manual scaling

elastic -> the system adds resources on its own when it detects increased load
manual -> a human decides to add machines

For stateful systems such as databases, common wisdom has been to scale up until the cost of scaling forces you to make the system distributed.

stateless -> each request is independent of the others, so it can be handled by any server individually, without interfering with or needing assistance from the others.

The architecture of a large-scale application is usually highly specific to that application; there is no such thing as a generic, one-size-fits-all scalable architecture ("magic scaling sauce").

Knowing which load parameters will be common and which will be rare is what leads the development of the architecture.

Maintainability

The majority of software cost goes into ongoing maintenance: fixing bugs, keeping its systems operational, investigating failures, adapting it to new platforms, modifying it for new use cases, repaying technical debt, and adding new features.

legacy system -> an outdated system that is still in use.

We should write code in such a way that the cost of maintaining it is reduced to a minimum.

Three design principles for software systems:


1. Operability: make it easy for operations teams to keep the system running smoothly.

2. Simplicity: make it easy for new engineers to understand the system, by removing as much complexity as possible from the system.

3. Evolvability: make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change; also known as extensibility, modifiability, or plasticity.

Operability: making life easy for operations

“good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations”

Responsibilities of operations:

* monitoring the health of the system and quickly restoring service if it goes into a bad state
* tracking down the cause of problems, such as system failures or degraded performance
* keeping software and platforms up to date, including security patches
* keeping track of how different systems affect each other
* anticipating future problems
* establishing good practices and tools for deployment
* maintaining security
* defining processes that make operations predictable and help keep the production environment stable
* preserving the organization's knowledge, even as individuals come and go

Good operability means making routine tasks easy, so the operations team can focus its effort on high-value activities.

Simplicity: Managing Complexity

A software project mired in complexity is sometimes described as a big ball of mud.

There are various possible symptoms of complexity: explosion of the state space, tight
coupling of modules, tangled dependencies, inconsistent naming and terminology,
hacks aimed at solving performance problems, special-casing to work around issues
elsewhere, and many more.

Making a system simpler does not necessarily mean reducing its functionality; it can also mean removing accidental complexity.

Moseley and Marks define complexity as accidental if it is not inherent in the problem that the software solves, but arises only from the implementation.

One of the best tools for removing accidental complexity is abstraction.


find good abstaction is hard, it is much less clear how we should do it.

extract part of a system into a abstarcted component

Evolvability: Making Change Easy

The Agile community has developed tools and practices for coping with change, such as TDD (test-driven development) and refactoring.

You might also like