System Design

note : nginx documentation

-> http://nginx.org/en/docs/

note : request flow
-> user -> https/http => reverse proxy => server => services on different ports => database
-> admin -> https/http => reverse proxy => server => services on different ports => database

note : diagrammatic representation


-> user = user icon
-> server = rectangle
-> db = cylinder
-> all arrows : non-overlapping and non-intersecting ; horizontal and vertical
-> sync : single direction arrow
-> async : messaging queue

note : gateway protocols : http / https


=> http : if we do not configure certificates in the http server then it is http
=> https : if we configure signed or self-signed certificates in the http server then it is https
=> generally we do not protect services with https ; instead we protect the reverse proxy server with certificates
=> it requires two things to be configured in nginx to make it https : a certificate and a key ; see the sketch below
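
=> a minimal nginx sketch of an https server block (certificate paths are placeholders ; assumes the files were generated as in the openssl notes below) :

    server {
        listen              443 ssl;
        server_name         example.com;

        # the two things needed for https : certificate and key
        ssl_certificate     /etc/nginx/certs/example.com.crt;
        ssl_certificate_key /etc/nginx/certs/example.com.key;
    }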

note : openssl
-> certificates and keys are generated with the help of openssl
-> we can generate ssl certificates with openssl on windows and linux
-> configuring ssl certificates with nginx : http://nginx.org/en/docs/http/configuring_https_servers.html
-> https server optimization with nginx : http://nginx.org/en/docs/http/configuring_https_servers.html

note : openssl installation


-> https://thesecmaster.com/procedure-to-install-openssl-on-the-windows-platform/
-> https://openssl.org/source/gitrepo.html

note : openssl certificate generation


-> https://devopscube.com/create-self-signed-certificates-openssl/
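
=> a typical self-signed certificate command (file names and subject are placeholders ; -nodes skips the key passphrase) :

    openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
        -keyout example.com.key -out example.com.crt -subj "/CN=localhost"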

note : nginx functionalities


-> we can serve the frontend via nginx
-> we can configure self-signed certificates with nginx to use the https protocol
-> we can proxy requests to different servers and services via nginx (see the sketch below)
=> conf file : https://www.nginx.com/resources/wiki/start/topics/examples/full/
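
=> a minimal sketch of a conf combining these functionalities (paths and the port-3000 backend are assumptions) :

    server {
        listen 80;

        # serve the frontend (static public folder)
        location / {
            root  /var/www/frontend;
            index index.html;
        }

        # proxy api requests to a service running on a different port
        location /api/ {
            proxy_pass http://127.0.0.1:3000;
        }
    }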

note : load balancer


-> it helps to balance load between servers : it helps to handle ~1000 requests per second per server
=> different types of load balancing with nginx : http://nginx.org/en/docs/http/load_balancing.html#nginx_load_balancing_methods
=> load balancing : nginx : https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/
=> load balancing through different servers and ports : https://www.youtube.com/watch?v=v81CzSeiQjo&ab_channel=LinuxAcademy

note : types of load balancing strategy (server picking strategy) : round robin ; least connection ; ip_hash (see the sketch below)
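
=> a minimal nginx upstream sketch (backend addresses are placeholders ; round robin is the default, uncomment one directive to switch strategy) :

    upstream backend {
        # least_conn;   # pick the server with the fewest active connections
        # ip_hash;      # the same client ip always hits the same server
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }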

note : types of balancers


-> layer 4 : transport layer (it can access tcp/udp, ip, port)
-> layer 7 : app layer (it can access http headers, cookies, payload)
-> load balancers : nginx, haproxy, traefik

note : benefits of load balancer


-> resilience : if one of the servers fails then the load balancer sends requests to another server
-> scalability : if request volume doubles then
=> increase the number of servers : horizontal scaling
=> increase the capacity of each server : vertical scaling

***********************************************************************************

note : content delivery network (cdn)


-> if a user visits yyy.com then the browser needs to load content on the page
-> if the content is on a nearby server then it will take less time ; else more time
-> if we increase servers then it can become faster, but costly
=> instead of adding servers, we put a cache in each location to store static content (public folder : images, css, html, javascript)
=> this cache is called a cdn server ; it is less costly than an actual server machine

note : types of cdn


-> push cdn : when you upload new static content to the central server, it automatically gets pushed to all cdn nodes irrespective of user requests
-> pull cdn : only when a user requests updated static content is it fetched from the central server and cached on the local cdn

note : benefits
-> places static files closer to users
-> reduces latency and cost
-> drawback : increases complexity of the system

note : amazon : 216 points of presence : ~50 countries


-> cloudfront is the content delivery network of amazon
-> cloudfront : https://www.youtube.com/watch?v=-DDGYzKtNwc&ab_channel=FunnelGarden

***********************************************************************************

note : caching
-> code <-> cache <-> storage
-> improves read performance and reduces load
-> increases complexity and consumes resources
-> caching dbs : redis, memcached, dynamodb

note : caching strategies


-> cache aside : read from cache ; on a miss the code reads the db, puts the value in the cache and returns it (see the sketch below)
-> read through : read from cache ; on a miss the cache itself gets the data from storage, stores it and returns it
-> write through : update the cache and then update the storage
-> write behind : update the cache ; wait for a timeout or more writes ; then update the storage
=> https://www.honeybadger.io/blog/nodejs-caching/
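
=> a minimal cache-aside sketch in nodejs (assuming node-redis v4 ; getUserFromDb is a hypothetical db helper) :

    const { createClient } = require("redis");
    const client = createClient();
    client.connect();                                   // connect once at startup

    async function getUser(id) {
      // 1) read from cache
      const cached = await client.get(`user:${id}`);
      if (cached) return JSON.parse(cached);

      // 2) on a miss, the code reads the db and puts the value in the cache
      const user = await getUserFromDb(id);             // hypothetical db call
      await client.set(`user:${id}`, JSON.stringify(user), { EX: 60 }); // 60s ttl
      return user;
    }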

note : eviction strategies : lru, mru, lfu, mfu

note : cache nginx


-> nginx server level caching : https://www.youtube.com/watch?v=utdkGzmg1-Y&ab_channel=TonyTeachesTech
-> nginx content caching : https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/

note : cache redis


-> in memory
-> key (string) value (string, list, json) pairs
-> limited by ram
-> multiple instances of redis are required to store large data : a single node supports ~100k req per second
=> when we set a key in redis we also set a time to live (ttl) : the amount of time it will remain in redis (see the sketch below)
=> https://www.youtube.com/watch?v=jgpVdJB2sKQ&ab_channel=WebDevSimplified
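
=> setting a key with a ttl in nodejs (node-redis v4, inside an async function ; key and values are examples) :

    await client.set("session:abc", "user123", { EX: 3600 }); // expires in 1 hour
    const remaining = await client.ttl("session:abc");        // seconds left to live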

note : cache memcached


-> https://dev.to/franciscomendes10866/caching-in-node-js-using-memcached-5f9k

note : cache aws : dynamodb


-> https://dynobase.dev/dynamodb-cache/#:~:text=How%20Does%20Caching%20Work%20in,the%20query%20within%20its%20cache.
-> https://www.youtube.com/watch?v=SU4dZ-qgR1Y&ab_channel=CodeSpace
-> https://www.youtube.com/watch?v=J-9rdNeOTNU&ab_channel=MrNick
-> https://www.youtube.com/watch?v=JPQPPLQnyB4&ab_channel=JamesQQuick
-> https://www.youtube.com/watch?v=2k2GINpO308&list=PL9nWRykSBSFi5QD8ssI0W5odL9S0309E2&ab_channel=BeABetterDev

note : different usages of different caching dbs


-> https://db-engines.com/en/system/Amazon+DynamoDB%3BMemcached%3BRedis
-> https://aws.amazon.com/elasticache/redis-vs-memcached/
-> https://dynobase.dev/dynamodb-vs-memcached/
-> https://severalnines.com/blog/redis-vs-dynamodb-comparison/

***********************************************************************************

note : queues
-> suppose you have a pizza delivery system
-> 10 req come to the pizza service for customization
-> then they pay at the payment service (1 payment per sec)
-> then payment sends a response to the pizza service
=> this is a sync system and extremely slow : performance = speed of the slowest component of the system

note : queue : asynchronous system


-> as payment requests come to the pizza service, it puts them in a queue
-> the payment service processes the current payment ; once done, it sends the response for that payment and picks up a new payment req from the queue
-> meanwhile the pizza service keeps taking new customizations and adding payment reqs to the queue
=> it is a producer consumer mechanism : pizza service : producer & payment service : consumer (see the sketch below)
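
=> a minimal in-memory sketch of the producer/consumer mechanism (illustrative only ; a real system would use one of the queueing mechanisms below) :

    const queue = [];                          // the payment queue

    // producer : pizza service keeps taking customizations and enqueues payments
    function takeOrder(order) {
      queue.push(order);                       // enqueue and return immediately
    }

    // consumer : payment service picks up one payment req per second
    setInterval(() => {
      const order = queue.shift();
      if (order) console.log("processed payment for order", order.id);
    }, 1000);

    takeOrder({ id: 1 });
    takeOrder({ id: 2 });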

note : queueing mechanisms


-> kafka
=> https://kafka.apache.org/documentation/#gettingStarted
=> https://developer.confluent.io/get-started/nodejs/
=> https://docs.confluent.io/home/overview.html
=> kafka with nodejs : https://www.youtube.com/watch?v=UhGUaoEEze0&list=PLWkguCWKqN9ODj1BNk5V-aOhjvjPxSb2R&ab_channel=BogdanStashchuk
=> kafka with spring boot : https://www.youtube.com/watch?v=SqVfCyfCJqw&ab_channel=Amigoscode
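
=> a minimal kafkajs sketch (broker address, topic and group names are assumptions) :

    const { Kafka } = require("kafkajs");
    const kafka = new Kafka({ clientId: "pizza-app", brokers: ["localhost:9092"] });

    async function run() {
      // producer : pizza service publishes a payment event
      const producer = kafka.producer();
      await producer.connect();
      await producer.send({
        topic: "payments",
        messages: [{ key: "order-1", value: JSON.stringify({ amount: 500 }) }],
      });

      // consumer : payment service instances share work via a consumer group
      const consumer = kafka.consumer({ groupId: "payment-service" });
      await consumer.connect();
      await consumer.subscribe({ topic: "payments", fromBeginning: true });
      await consumer.run({
        eachMessage: async ({ message }) =>
          console.log(message.key.toString(), message.value.toString()),
      });
    }
    run().catch(console.error);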

-> rabbitMQ : https://www.rabbitmq.com/getstarted.html

-> amazon SQS


=> https://docs.aws.amazon.com/sqs/index.html
=> js : https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/sqs-examples.html
=> spring boot : https://mydeveloperplanet.com/2021/11/23/how-to-use-amazon-sqs-in-a-spring-boot-app/
=> python : https://docs.aws.amazon.com/code-library/latest/ug/python_3_sqs_code_examples.html
=> https://www.youtube.com/watch?v=CyYZ3adwboc&list=RDCMUCraiFqWi0qSIxXxXN4IHFBQ&start_radio=1&rv=CyYZ3adwboc&t=1&ab_channel=BeABetterDev
=> https://www.youtube.com/watch?v=EKaklEUB_yA&ab_channel=CodeSpace

note :
-> pros : scalability ; reliability ; buffering ; durability ; smoothing of request spikes
-> cons : increases system complexity ; increases latency
=> even if the service crashes, the req remains safe in the queue ; once the service restarts it again starts processing the queue

note : messaging paradigms

1) message queue
-> one of the producers (one instance of the pizza service) sends a message to one of the consumers (one instance of the payment service)
-> if that instance of the payment service is unavailable then the request goes to another instance of the consumer (payment service instance)
=> action ; exactly once delivery ; messages can arrive out of order

2) publisher subscriber
-> if the payment service wants to publish an event about a payment to the billing and receipt services
-> then it publishes that event on a channel which in turn is subscribed to by the billing and receipt services
-> notification ; at least once delivery ; messages are always in order

note : rabbitMQ
-> AMQP protocol : it stores messages until the consumer retrieves them
-> offloads heavy tasks
-> distributes tasks

1) routing keys
=> when a producer puts a message, it contains a routing key (information about which queue this message needs to be dropped in, like the payment queue)
=> the actual body then contains information like amount, orderid, which again helps to put the message in a queue related to a particular instance of the running service

2) exchange : router / load balancer


-> it receives all msgs and puts them in the correct queue
=> types
i) direct : it puts the message in the queue according to the routing key of the msg ; if there is more than one consumer of that queue then it works in a round robin manner
ii) topic / header : bifurcation of queues on the basis of metadata or configuration (topic/header) in routing keys
iii) fan out : all the consumers receive the message

=> rabbitMQ acts as a msg queue in direct, topic and header exchanges ; but it works as pub/sub in a fan out exchange

3) channels : concurrency
-> rabbitMQ consumers use a tcp connection ; multiple tcp connections from different threads in the same service would consume msgs faster
-> but tcp connections are expensive : instead rabbitMQ makes multiple channels over only one tcp connection from different threads within the same service

4) acknowledgements : when to delete a msg from the queue : automatic and explicit : reliability
-> the msg should be deleted when the consumer acks to the exchange, not as soon as the msg is delivered (see the sketch below)
-> a service can ack after storing or after processing the msg
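
=> a minimal amqplib sketch of explicit acknowledgement (queue name is an assumption ; processPayment is a hypothetical handler) :

    const amqp = require("amqplib");

    async function consume() {
      const conn = await amqp.connect("amqp://localhost");
      const ch = await conn.createChannel();        // channel over one tcp connection
      await ch.assertQueue("payments");

      ch.consume("payments", async (msg) => {
        await processPayment(msg.content);          // hypothetical handler
        ch.ack(msg);   // explicit ack : only now is the msg deleted from the queue
      }, { noAck: false });                         // noAck:false => explicit acks
    }
    consume().catch(console.error);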

note : kafka : most popular pub/sub system : event streaming platform : msgs stored for a period of time
-> topic : it is basically a queue name and >1 consumers are connected/subscribed to that topic
-> each msg = event ; each event has a key, value, timestamp
-> kafka topics are usually sharded / partitioned : e.g. rules such as (#1 : 0,1,2 ; #2 : 3,4 ; #3 : 5,6,7 ; #4 : 8,9)
=> if a producer sends a key ending with '9', the event will go into the 4th partition (kafka sharding logic)
=> consumers divide partitions among themselves : if we have three instances of a service then one instance can be connected to #1,#2 and the other two can be connected to #3, #4 respectively
=> if there are more consumers than partitions then resources are wasted

-> pros : high throughput ; cons : latency

note :
-> instances of the same service come under the same consumer group
-> two instances of the same group will not both receive the same message from a partition they are connected to (ex : two instances of receipt)
-> but if two instances of different groups are connected to the same topic then one instance of each group will receive the same message (ex : one instance each of receipt and billing)
=> appending at the end ; deleting at the start
=> it can process ~100k events per second

***********************************************************************************

note : protocols
-> tcp and udp
=> tcp > web sockets
=> tcp > http
=> http > rest, gRPC, GraphQL

note : tcp : reliable, ordered, error checked :: slower than other protocols
-> the sender establishes a connection to the receiver
-> the sender breaks the message into pieces (payload) and then sends the packets one by one
-> if the receiver does not ack then the sender will send the packet again
=> udp : not reliable, fast, good if there is a constant stream of data

note : http : hyper text transfer protocol (over tcp) : text with links to other docs
-> the client sends a request and waits for the response
-> it contains status, headers, body
=> status codes :
i) 100 - 199 : informational
ii) 200 - 299 : success (200 : ok ; 201 : created)
iii) 300 - 399 : redirections
iv) 400 - 499 : client error (401 : unauthorized ; 403 : forbidden ; 404 : not found)
v) 500 - 599 : server error (500 : internal server error ; 503 : service unavailable)

note : rest : api structure


-> restfulness : get /users/123/books => returns all books borrowed by user 123
-> nested resources : get /users/123/books/567 => returns book 567 borrowed by user 123
-> state : put /users/567/enable
-> safety : get /users/567/ban (incorrect : a get must never change state)
-> idempotency : put /online (boolean, idempotent) ; post /likes (non idempotent : counter incr)
-> pagination and sorting : get /books?limit=50&offset=100 (***) (see the sketch below)

=> if there is deeper nesting then use graphQL instead of rest


=> never use get to change any entity
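
=> a minimal express sketch of pagination and sorting (route shape and findBooks are hypothetical) :

    const express = require("express");
    const app = express();

    // get /books?limit=50&offset=100&sort=title
    app.get("/books", async (req, res) => {
      const limit  = Math.min(parseInt(req.query.limit || "50", 10), 100); // cap page size
      const offset = parseInt(req.query.offset || "0", 10);
      const sort   = req.query.sort || "title";
      res.json(await findBooks({ limit, offset, sort }));  // hypothetical db call
    });

    app.listen(3000);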

note : web sockets


-> generally the client sends a req and the server responds
-> if we want to send a message from the server to the client without a req, web sockets come into the picture (chat service)
-> adv : the connection is established once, giving real time message delivery to the client
-> disadv : more complicated ; load balancer trouble ; websockets are stateful (if the connection drops we need to reconnect)
=> more efficient than polling over http

note : long polling


-> when we can not use websockets, long polling is used
-> the client opens a connection, the server holds it until data is available, then the client reopens it ; based on http

note : rpc : remote procedure call


-> generated functions : define the function in an abstract manner and generate client/server stubs
-> gRPC can not be used directly from web / mobile clients

note : graphQL
-> it solves the issues of overfetching and underfetching
=> overfetching : a get request fetching multiple columns from different tables when only one gets used in the frontend
-> instead of get, we can use post and send a query (list of required fields) as the body and fetch only those fields

=> QL : query language ; http based ; req & res in json format ; define which fields/nested entities need to be returned
-> get : query
-> post, put, delete => mutation

note : graphQL documentation


-> https://graphql.org/code/
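
=> a sketch of a query that fetches only the required fields (hypothetical schema) :

    query {
      user(id: 123) {
        name          # only the fields the frontend actually uses
        books {       # nested entity fetched in the same round trip
          title
        }
      }
    }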

note : socket.io
-> https://socket.io/
-> https://socket.io/get-started/chat
-> https://socket.io/docs/v4/
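
=> a minimal socket.io server sketch (port and event name are assumptions) :

    const { Server } = require("socket.io");
    const io = new Server(3000, { cors: { origin: "*" } });

    io.on("connection", (socket) => {
      socket.on("chat message", (msg) => {
        io.emit("chat message", msg);  // real-time push to all connected clients
      });
    });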

note : notification
-> https://www.npmjs.com/package/nodemailer
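
=> a minimal nodemailer sketch (smtp host and credentials are placeholders) :

    const nodemailer = require("nodemailer");

    const transporter = nodemailer.createTransport({
      host: "smtp.example.com",                 // placeholder smtp server
      port: 587,
      auth: { user: "user", pass: "pass" },
    });

    transporter.sendMail({
      from: "noreply@example.com",
      to: "customer@example.com",
      subject: "order confirmed",
      text: "your order is on the way",
    }).then((info) => console.log(info.messageId));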

***********************************************************************************

note : concurrency and parallelism


-> parallelism : doing more than one thing at the same time ; two people in parallel & independently making tea and a sandwich ; with 2 cpus we can achieve parallelism
-> concurrency : the illusion of doing more than one thing at the same time ; one person making tea and a sandwich by switching between the tasks ; with 1 cpu we need to switch

note : process
-> each process has a separate memory space
-> if you start a java server then you see only one process
-> if you start a nodejs server then you see multiple processes
=> if 10,000 users need to be cached in memory then nodejs can consume a lot more memory than java, because the processes will replicate the data multiple times (each process has its own copy of the data)
=> check what kind of process model your tech uses against your requirements

note : interprocess communication


-> file : processes do not share in-memory data ; instead it is kept in a file and each process reads it as required ; but concurrent updates are a problem
-> signal : a signal terminates a process : kill -9 kills a process
-> socket : the client process connects at 8080 and the server process listens at 8080
-> pipe : the output of one process becomes the input of another process => ls | grep hello (the output of ls is the input of grep, which prints only the lines containing 'hello')
-> shared memory
-> message passing

note : threads : a thread executes code ; e.g. one thread for code execution, another for garbage collection


note : thread pool : a limit on the number of threads

***********************************************************************************

note : database

note : indexes
=> indexes are made on a column so that searches on that column become much faster
=> if we search something by the name col and there is no index then we need to scan all rows and match the string
=> with an index, each name is mapped to an id and is fast to search (see the sketch below)
=> hashmap / dictionary index : exact-match string queries
=> b-tree (it can have more than 2 children and the leaf nodes are a linked list) : range queries like : where age between 25 and 50
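
=> a minimal sketch of why an index helps (a js Map standing in for a hash index) :

    const rows = [{ id: 1, name: "asha" }, { id: 2, name: "mohit" }];

    // no index : scan every row and match the string
    const slow = rows.find((r) => r.name === "mohit");

    // hash index : name -> row, one lookup instead of a full scan
    const byName = new Map(rows.map((r) => [r.name, r]));
    const fast = byName.get("mohit");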

note : sharding (***)


-> all tables are in the same db on a single instance, but then it grows from 4tb to 34tb
-> split a large db of 16tb => x tb, y tb, z tb : but how will our application know which db to connect to ?
1) tenant based sharding : suppose we have a global app, then we can divide databases on the basis of country : IN, AUS (uneven distribution ; creating a new shard is easy)
2) hash based sharding : to operate on a record, we decide the shard with the help of hashing (even distribution ; adding a new shard needs a change to the func ; hard to join with fkeys)
=> if there are 4 shards then hash(id) % 4 => gives the shard number (see the sketch below)
=> shard router : a locator to locate the shard in which a particular record with a given id can be present
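
=> a minimal sketch of the shard router for hash based sharding (4 shards ; numeric ids assumed) :

    const SHARD_COUNT = 4;

    // locate the shard in which a record with this id lives
    function shardFor(id) {
      return id % SHARD_COUNT;     // hash(id) % 4 => shard number 0..3
    }

    console.log(shardFor(6));      // 2 : record 6 lives on shard 2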

note : consistent hashing


-> problem : if we use the hash approach and need to add a new shard then we need to reshard
=> number of shards incr 4 => 5
-> initial sharding func : 6 % 4 => 2 ; new sharding func : 6 % 5 => 1 (we need to move the record with id = 6 from shard 2 => 1)

=> if hash values range over 0-127, assign each shard a range :


=> 1 : 96-127
=> 2 : 0-31
=> 3 : 32-63
=> 4 : 64-95

=> if we need to add one more shard then we add it in the middle of a range : most of the existing hash-to-shard assignments remain the same
-> 1 : 100-127
-> 2 : 0-24
-> 5 : 25-49
-> 3 : 50-74
-> 4 : 75-99
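
=> a minimal sketch of ring lookup over the ranges above (hash space 0-127 as in the example) :

    const ring = [
      { upTo: 24,  shard: 2 },
      { upTo: 49,  shard: 5 },     // the newly added shard
      { upTo: 74,  shard: 3 },
      { upTo: 99,  shard: 4 },
      { upTo: 127, shard: 1 },
    ];

    // walk the ring and return the first range the hash falls into
    function shardFor(hash) {
      return ring.find((p) => hash <= p.upTo).shard;
    }

    console.log(shardFor(30));     // 5 : 25-49 now belongs to the new shard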

note : partitioning
-> sharding : breaking one large db into smaller dbs
-> partitioning : breaking one large table into smaller tables
=> benefits : smaller files = fast queries ; easy indexing

note : partitioning strategies


1) list of values : amazon (orders table partitioned into three tables : placed orders (small table) ; in-progress orders ; completed orders (large table))
=> most of the users hit the placed-orders table ; if it is smaller then queries will be faster ; improves user experience
=> cons : when an order is completed we need to move its data from one table to another (adding and deleting a record instead of updating)

2) range of dates : partitioning orders : 2020 ; 2021 ; 2022 ; 2023 ; 2024 ; 2025 (more queries will come to 2025 ; the decreased partition size will make those queries faster)
=> cons : uneven key distribution : e.g. if people purchase more in a particular month

3) hash of keys : take the primary key, hash it by the number of tables and put the data in the matching table ; this gives an even distribution ; works well if you access data by key
=> cons : it can be worse than keeping data in a single table, because we would need to query all tables to find an order by date

=> cons
-> complexity ; scanning all partitions is expensive ; hard to maintain uniqueness

note :
-> cap theorem : consistency ; availability ; partition tolerance
-> acid properties : atomicity ; consistency ; isolation ; durability

***********************************************************************************

note : architectural patterns

1) web sessions : session management


-> http is stateless ; each request does not know about another one ; but what if we want to store items in a cart ? a cart is stateful
-> solution 1 : every time the user adds something to the cart, push it to the db : costly
-> solution 2 : every time the user adds something to the cart, push it into client side cookies which travel to the server in http headers : less secure, and cookie size limitations
-> solution 3 : every time the user adds something to the cart : go to the server, save it in the file system, create a session id and set the session id as a cookie in the browser
=> when we need to save the cart in the db, the cookie containing the session id will come to the server ; the server will match it with the fs and save the cart to the db (secure and unlimited size)
-> problem with 3 : if we have a load balancer then the first saved item can go to server 1 and the next item to server 2, which will lead to inconsistency
-> fix for that problem : map the session id to a server so that requests for one session go only to that same server : but this defeats load balancing
-> final solution : use a redis database (k-v store) between client and servers, which resolves all problems : data inconsistency ; load balancing ; secure ; unlimited size (see the sketch below)
=> ways to preserve state across http
1) cookies : scalable ; limited size ; insecure
2) sticky sessions : require a layer 7 load balancer => app layer ; not resilient to scale up / down / crash
3) key-value store : resilient ; increases complexity
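
=> a minimal sketch of the final solution (assuming express-session and connect-redis ; exports differ between connect-redis versions, adjust to yours) :

    const express = require("express");
    const session = require("express-session");
    const RedisStore = require("connect-redis").default;  // v7-style export
    const { createClient } = require("redis");

    const redisClient = createClient();
    redisClient.connect();

    const app = express();
    app.use(session({
      store: new RedisStore({ client: redisClient }),  // session state lives in redis
      secret: "change-me",                             // cookie signing secret
      resave: false,
      saveUninitialized: false,
    }));

    app.post("/cart/add", (req, res) => {
      // any server behind the load balancer can read this session
      req.session.cart = req.session.cart || [];
      req.session.cart.push(req.query.itemId);
      res.json(req.session.cart);
    });

    app.listen(3000);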

2) serialization
-> data serialization != req serialization
-> serialize and deserialize data structures and objects to store them on disk or transfer them over the network
-> use a queue to serialize reqs ; avoid data races ; achieve serializability

3) CQRS
-> suppose there are buyers and sellers
-> the seller adds records to a table and the buyer fetches/updates/deletes records from the table
-> buyer queries can be made faster by increasing the number of indexes on the table
-> but that leads to request timeouts for the seller, as it takes the db time to add a record with indexing
=> solution : there will be two tables (see the sketch below)
-> the seller adds records to a non indexed table
-> at regular intervals, or with the help of events, we copy the added records into the indexed table
-> the buyer reads from the indexed table
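
=> a minimal sketch of the two-table flow (db helpers and table names are hypothetical) :

    // seller write path : plain insert into the non indexed table (fast, no timeout)
    async function addListing(item) {
      await db.insert("listings_write", item);             // hypothetical db helper
    }

    // background sync : periodically move new rows into the indexed read table
    setInterval(async () => {
      const rows = await db.takeNewRows("listings_write"); // hypothetical
      if (rows.length) await db.insertMany("listings_read", rows);
    }, 5000);

    // buyer read path : query only the indexed table
    async function searchListings(filter) {
      return db.query("listings_read", filter);            // fast thanks to indexes
    }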

***********************************************************************************

note : system design


-> if two kinds of users are going to use an app then it is called a marketplace app (demand and supply) : uber
-> if there is only one kind of user, or independent kinds of users, then it is called a retail app : amazon

-> identify the type of system
-> estimate total users ; active users ; and service consumption (demand / supply)
-> db estimates :
i) ~5000 inserts per second for a single node
ii) ~10k reads per second for a single node
-> if we want to scale read operations then we increase db replicas
-> if we want to scale write operations then we use db sharding
-> a server can easily handle ~1000 req per second
=> redis db read : ~100k rps
=> cache read : ~100k rps

=> estimate the number of insert operations and decide the number of shards and the sharding technique
=> estimate db size
=> estimate throughput : number of requests per second
-> number of users : x
-> number of times a user consumes the app per day : y
-> amount of time a user consumes the app each time : z
-> estimate user sessions for the y and z tradeoff
-> estimate the number of api calls a user makes in z time : a
-> find the requests per second : x*y*a / (24*60*60)  (a already covers the z-long session)
=> split the estimated req into reads and writes
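
=> worked example (numbers assumed) : x = 1,000,000 users ; y = 2 sessions per day ; a = 50 api calls per session
=> 1,000,000 * 2 * 50 = 100,000,000 req/day => 100,000,000 / 86,400 ~ 1,160 req per second
=> at ~1000 rps per server that is ~2 servers, plus headroom for spikes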

note : storage size estimates


-> relational database : ~1 TB data : depends on the system hdd/ssd
-> redis : ~10 GB data ~ ram size : up to 64 GB
-> ram : depends on the system, ~16-64 GB
-> cache : ~32 GB data
=> calculate yearly storage requirements : e.g. generalize a url/tweet size to ~100 bytes
=> how long we need to keep old data in storage also helps to estimate storage

note : how to approach design problems ?


-> total number of users : storage requirements
-> number of active users : number of requests per second

note : we place limits on reads and sessions to optimize the system, with the help of pagination

note : db size estimation


-> text / string => up to 64 KB
-> longtext / stringified json => up to 4 GB
-> datetime = 8 bytes
-> uuid = 128 bits ~ 16 bytes

=> services
=> caches
=> queues
=> load balancing
=> database

***********************************************************************************
