Systems Design Study Guide
Background
Oftentimes in an interview we will need to estimate the performance of a system; however, there
are many ways to measure this. We can look at average performance, but this does not describe
the potential variation in calls to our service.
Another important concept is tail latency, which describes the latency at a certain percentile of
requests. For example, a p95 latency of 1 second says that 95% of requests complete in under 1
second, while the remaining 5% take longer.
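As a quick illustration, here is a minimal sketch of computing percentile latencies from a list of
recorded request times using the nearest-rank method (the latency values are made up):

import math

def percentile(latencies, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    ordered = sorted(latencies)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

latencies_ms = [12, 15, 14, 200, 13, 16, 18, 950, 14, 17]
print("p50:", percentile(latencies_ms, 50), "ms")   # typical request
print("p95:", percentile(latencies_ms, 95), "ms")   # tail latency dominated by slow outliers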
Types of Databases
In the majority of the systems we will be designing, there is a need to hold some sort of
persistent data. It would make no sense to hold application data in server memory, as not only
would it be lost if the server crashed, but having many servers means that they would be out of
sync if only one server stored certain information. The solution to this is to store any persistent
data in one or more databases. I will now go into the types of databases, and the pros and cons
of using each type.
Types of databases:
● Relational databases (also known as SQL databases)
○ Consists of tables holding many rows of structured data
○ Rows of one table can have relation to rows of another table if both rows share a
common key
○ Has a built in query optimizer that uses the quickest implementation for a SQL
statement (SQL is a declarative language)
● Non Relational databases (also known as NoSQL databases)
○ Instead of rows, a table holds a collection of keys mapped to values, which do not
need to have any particular structure
● Graph databases (also technically NoSQL)
○ Has vertices and edges, queries involve traversing the graph
○ Good for storing both homogeneous and heterogeneous data (with multiple types of
edges)
Indexes
Imagine a basic database where you want to fetch a record when making a read. If all of the
rows are just stored on a hard drive, every single time a read call is made, you would need to
search for the row on the disk, resulting in a very slow O(n) time complexity. Instead, this
process can be sped up by creating an index, which allows a database to quickly search for rows
based on certain values of the tuple that defines a record.
Note that in the simplest log-structured storage engines, writes (assuming no indexes) are done by
just appending to a log, as sequential writes are the quickest way to write to disk. This also makes
concurrency and crash recovery much easier to deal with, since an existing value is never partially
overwritten.
You can construct secondary indexes (on non-unique fields) from these data structures very
easily, either by making each index entry point to a list of matching rows or by appending the
primary key of the row onto the secondary index key to make it unique.
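To make the idea concrete, here is a minimal sketch of a log-structured store that appends every
write to a file and keeps an in-memory hash index from key to the byte offset of its latest record.
It assumes a single process, string keys and values without commas, and ignores compaction and
crash recovery:

class LogStore:
    def __init__(self, path="data.log"):
        self.index = {}                      # key -> byte offset of the latest value
        self.f = open(path, "a+b")

    def put(self, key, value):
        self.f.seek(0, 2)                    # append: sequential write at the end of the log
        offset = self.f.tell()
        self.f.write(f"{key},{value}\n".encode())
        self.f.flush()
        self.index[key] = offset             # index now points at the newest record

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.f.seek(offset)                  # one seek instead of an O(n) scan of the file
        stored_key, value = self.f.readline().decode().rstrip("\n").split(",", 1)
        return value

store = LogStore()
store.put("user:1", "alice")
store.put("user:1", "alicia")                # an overwrite just appends; the index moves
print(store.get("user:1"))                   # -> "alicia"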
In an index, it is often preferable to store the actual rows in an append-only heap file, with the
key-value mapping pointing to the offset of said row. However, this incurs a read penalty when
trying to get the value from the heap file, and also means that if a value is updated such that it
can no longer fit in its old position in the heap file, it must be moved, and all the indexes holding
the corresponding key must be updated. In some situations you can store the actual values in the
index (a clustered index), but then every clustered index must be updated on each write. One
middle ground is the covering index, which stores only a few columns of the row in the index.
Multicolumn indexes can also be created by concatenating fields, but you have to be careful how
this is done: the concatenated key gives an outer sort order with an inner one, so not all queries
will be able to use the index. Some databases have true multidimensional indexes.
For analytics, data is typically extracted from the transactional databases, transformed into the
proper format, and then loaded into a data warehouse. The index patterns discussed above do not
work as well for analytics, so there is a need for new ways of storing the data.
The data in an analytics warehouse is typically organized via the star and snowflake schemas:
● There is a centralized fact table for important events that we may want to analyze (such
as each sale in a grocery store)
● The fact table has multiple foreign keys to rows of many other tables, known as
dimension tables (in the grocery example possibly a product table or a customer table)
● The snowflake schema builds on the aforementioned star schema, where each dimension
table has its own sub-dimension tables
In order to optimize for analytical speed, data warehouses typically use column oriented storage:
● Traditional approach
○ Most non analytical (transactional) databases use row oriented storage, because
we typically want to access all of the values in a row
○ This means that there is locality in the storage of a row
○ However, for analytics queries most only use a few out of the hundreds of
columns in a given row
● Column oriented storage
○ Store every value from each column together instead, keep the order of the values
in the column the same as what the rows would be in the table
○ Increased locality of these values allows reading them much faster
● Compression
○ Storing all of the column values can be easily compressed in order to reduce the
amount of space that the data takes up
○ If the number of distinct values in a column is small compared to the number of
rows, a bitmap encoding can easily reduce the amount of space needed: for each
possible value, store a 1 at the positions where the column equals that value and a
0 everywhere else
○ Can further compress each bitmap into a run length encoding by storing how many
consecutive 0s and 1s are repeated
○ When trying to find rows that have a value of either A or B for a specific column,
load their bitmap encodings and bitwise OR them to quickly get a result (a minimal
sketch of this follows this list)
○ Using column oriented storage allows better parallelization, and compressing it
allows more of the data to fit in CPU cache
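Below is a minimal sketch of the bitmap and run-length ideas above, using a small made-up
product column:

def bitmap_encode(column):
    """One bitmap per distinct value: 1 where the row holds that value, else 0."""
    return {v: [1 if x == v else 0 for x in column] for v in set(column)}

def run_length(bits):
    """Compress a bitmap into (bit, run_length) pairs."""
    runs, current, count = [], bits[0], 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = b, 1
    runs.append((current, count))
    return runs

products = ["apple", "banana", "apple", "cherry", "banana", "apple"]
bitmaps = bitmap_encode(products)
print(run_length(bitmaps["apple"]))        # [(1, 1), (0, 1), (1, 1), (0, 2), (1, 1)]

# Rows where the product is apple OR cherry: OR the two bitmaps position by position.
apple_or_cherry = [a | c for a, c in zip(bitmaps["apple"], bitmaps["cherry"])]
print(apple_or_cherry)                     # [1, 0, 1, 1, 0, 1]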
Sometimes it makes sense to sort the rows of every column by the same key to help speed up
queries, and also to compress the columns even further (runs of equal values next to one another
compress best). You can even have replicas of the data warehouse sorted in multiple different
ways to make certain queries faster.
Writing to column oriented storage is hard if you want to write in the middle of a sorted list, but
LSM-trees allow doing this efficiently and then rewriting the column segments to propagate the
changes.
Encoding
Encoding allows us to send data from machine to machine, as objects in memory must be
translated to encodings. Generally, we should not use language specific encoders as they are
slow and lock you into a given language. JSON and XML, the most common encodings, are
useful in the sense that they are so ubiquitous. But in some cases, as in dealing with very large
amounts of proprietary data, it can make sense to use a custom binary encoding to reduce the
amount of space used by an encoding and greatly speed up the transfer of data. Some of these
binary encoders reduce space even further by requiring a predefined schema for the data - but in
defining a schema, it is important that our applications are able to stay both forwards and
backwards compatible. To maintain backwards compatibility, if you add a new field to a schema,
it cannot be made required (or it must have a default value). Schemas can also be useful because
keeping a database of them helps keep track of forward and backward compatibility, and allows
for compile time type checking in statically typed languages.
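As a rough illustration of the backward compatibility rule, here is a minimal sketch that decodes
records against a hand-rolled schema where a newly added optional field has a default value.
The JSON format and the schema dictionary are assumptions for illustration; real encoders such
as Protocol Buffers or Avro handle this with field tags or reader/writer schemas, but the rule is
the same:

import json

SCHEMA_V2 = {
    "name":  {"required": True},
    "email": {"required": False, "default": None},   # new optional field added in v2
}

def decode(raw_bytes):
    record = json.loads(raw_bytes)
    for field, rules in SCHEMA_V2.items():
        if field not in record:
            if rules["required"]:
                raise ValueError(f"missing required field {field}")
            record[field] = rules["default"]          # old data stays readable by new code
    return record

old_record = b'{"name": "alice"}'                     # written by v1 code, no email field
print(decode(old_record))                             # {'name': 'alice', 'email': None}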
Replication
Replication is the process of storing multiple copies of the same data on multiple different
computers. It serves three main purposes. Firstly, the redundancy of the data means that a
service can still serve reads and writes if one of the database nodes crashes. Secondly,
replication can actually speed up the process of reading or writing if the operation is performed
by a database node that is geographically closer to the client. Finally, replicating data to many
databases allows the reduction of load on each database. However, there are many different
ways of implementing replication, each with their own benefits and drawbacks, and we will read
about them below.
Leaderless replication:
● Any replica can accept writes from any of the clients
● No such thing as failover, simply set a threshold of the number of nodes that need to
accept the write for the write to be successful, same with reads
○ If an unavailable node comes back online, a client may read from many nodes
simultaneously, realize the previously offline node has an outdated value and
update it accordingly (use version numbers to check which values are out of date),
this process is known as read repair
○ Another way of ensuring up to date data is anti entropy, which is a background
process that looks for data differences in replicas and copies the correct data over,
however the writes are not copied in any particular order
● If we can only write to a fraction of nodes at a time and read from a fraction, we can use a
quorum in order to ensure that we always read from at least one node with the most up to
date copy of the data (see the sketch after this list)
○ This occurs when the number of nodes successfully written to plus the number of
nodes read from is greater than the total number of replicas
○ Typically reads and writes are sent to all replicas in parallel
● There are still cases where quorum reads and writes are not perfect
○ Even if writes do not succeed on the specified number of nodes they will not be
rolled back on the nodes where they have been written
○ In the event that sloppy quorums are used, the writes may end up on different
nodes than reads, such that there is no overlap between them
○ If a node with a new value fails and its data is restored using a node with an old
value, the new value will be lost
● Works well with multi-datacenter operation
○ Send writes to all nodes, but have the acknowledgements from the client’s local
datacenter be sufficient to fulfill a quorum write in order to reduce the high cross
datacenter latency of writes
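Here is a minimal sketch of the quorum condition and the version comparison behind read repair.
The n/w/r parameters and the replica responses are illustrative, not a real client API:

def is_strict_quorum(n, w, r):
    """Reads are guaranteed to overlap writes only if w + r > n (total replicas)."""
    return w + r > n

print(is_strict_quorum(n=3, w=2, r=2))   # True: at least one read hits the newest write
print(is_strict_quorum(n=3, w=1, r=1))   # False: a read can miss the latest value

# Read repair: query several replicas, keep the highest version, push it to stale ones.
responses = [
    {"replica": "A", "version": 7, "value": "new"},
    {"replica": "B", "version": 7, "value": "new"},
    {"replica": "C", "version": 5, "value": "old"},   # was offline during the write
]
newest = max(responses, key=lambda r: r["version"])
stale = [r["replica"] for r in responses if r["version"] < newest["version"]]
print(newest["value"], "-> repair:", stale)           # new -> repair: ['C']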
Sloppy Quorums:
● Sometimes a client can reach certain nodes in a cluster, but not the nodes that the data for
a given write would typically go to
● A sloppy quorum instead writes to and reads from reachable nodes that are not among the
nodes the data would normally live on (still using the predefined thresholds for the number
of read and write nodes as in a normal quorum)
○ Once the network outage is fixed the data is sent to the proper nodes, known as
hinted handoff
○ There are no guarantees that reading from the proper number of nodes will return
the new value, as the reads will probably go to the nodes the data normally lives
on, not the ones where the data was temporarily written
○ Sloppy quorums are therefore mainly useful for assuring the durability of data
A problem that occurs in both multi-leader and leaderless implementations of replication is
detecting concurrent writes. Concurrent writes occur when two writes to the database from
different clients do not know about each other. While it is most important that the database
replicas all converge to a consistent state, there are certain ways of dealing with concurrency that
improve durability by not arbitrarily picking one write to keep and throwing out the others.
Partitioning
When dealing with large systems, a common issue that may occur is that a single database table
actually becomes too big to store on a single machine. As a result, the table must be partitioned,
or split, onto multiple different nodes. How exactly this splitting is done is an implementation
detail, but being able to partition a database greatly increases the scalability of a system by
allowing a given database table to get arbitrarily big, and perhaps even store more relevant data
in nodes closer to the users accessing it. This being said, partitioning, also known as sharding,
comes with many complications.
As discussed previously, secondary indexes are an additional way of storing data in a database
that can speed up certain queries. However, in a partitioned database, not all of the data is
available in each partition and making queries based on a secondary index can take a long time.
As we will see below, there are multiple ways to approach building secondary indexes in this
situation.
Local Indexes:
● Each partition keeps track of every possible secondary index for only its local data
○ Lower overhead on writes since no need to coordinate across multiple partitions
○ Potentially very high overhead on reads, since you need to query all partitions and
combine the data from all of the local indexes (scatter/gather, sketched after this list)
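A minimal sketch of the scatter/gather read against local indexes, with made-up partition
contents:

partitions = [
    {"color_index": {"red": [1, 4]},  "rows": {1: "red sedan", 4: "red truck"}},
    {"color_index": {"blue": [7]},    "rows": {7: "blue coupe"}},
    {"color_index": {"red": [9]},     "rows": {9: "red van"}},
]

def query_by_color(color):
    results = []
    for partition in partitions:                     # scatter: ask every partition
        for row_id in partition["color_index"].get(color, []):
            results.append(partition["rows"][row_id])
    return results                                   # gather: merge the partial answers

print(query_by_color("red"))   # ['red sedan', 'red truck', 'red van']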
Global Indexes:
● Instead of having each node keep track of indexes for only its local data, partition the index
terms themselves among the nodes, with each partition of the index holding all of the ids
corresponding to its terms
○ For example, a car sales website would hold all the ids of red cars on partition 1,
even if an id of one of the red cars was not held in that particular partition
○ The terms can be partitioned either by the term itself (better for range scans if
querying multiple terms), or a hash of the term (better for distributing load)
○ The downside of the global index is that writes may have to go to multiple
partitions (which introduces network latency), where on the other hand reads
become much faster because they only go to one partition
○ Requires a distributed transaction for the index to be completely up to date, which
is quite slow
Rebalancing Partitions
Inevitably, as your system scales, you will have to rebalance the data on each partition, as you
will likely have to add or remove nodes in the cluster at some point. You may also need a new
machine to take over a failed machine. Either way, it is important to know how to gradually add
and remove data from a given shard.
Rebalancing Process:
● Use hash ranges instead of taking hashes mod the number of nodes in the cluster
○ Hash mod N is bad because adding and removing nodes makes almost every key
have to be remapped to another node, which uses a tremendous amount of network
bandwidth (see the sketch after this list)
○ Better to use a scheme that keeps the majority of keys in place and only moves a
few
● Instead it is better to use a fixed number of partitions
○ Create many more partitions than there are nodes to start, assign several small
partitions to each node
○ When adding a new node it can steal a few partitions from each of the existing
nodes, do the opposite if a node is removed
○ While the partition transfer is happening continue to use the old node to accept
reads and writes for it
○ Want to choose a high enough number that allows for future scaling, but also not
one that is too high as each partition has management overhead (also large
partitions take longer to rebalance)
● In certain key range partitions, databases will dynamically partition to help balance load
○ When a partition gets too big, it is split, and if a partition is too small, it is merged
○ Good because dynamic splitting keeps a small overhead when little data, and
stops each partition from becoming too large in size
○ While some databases let you choose a starting partition configuration (which is
prone to human error), others do not, and so you need to start with one partition
which could be a bottleneck
● Can also keep the number of partitions proportional to the number of nodes (a fixed number
of partitions per node), with new nodes splitting existing partitions when they join
● Sometimes it is not good to let the database rebalance automatically because it may
incorrectly detect a failed node and try to rebalance, putting more load on network and
breaking more things
● Use a third party coordination service like ZooKeeper to keep track of which partition lives
on which node, and use this to coordinate with either a routing layer or the client on which
node to access
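The sketch below compares the two rebalancing approaches from the first bullets: it counts how
many keys move when a fifth node is added under hash mod N versus under a fixed number of
partitions. The key set, partition count, and assignment of partitions to the new node are all made
up for illustration:

import hashlib

def stable_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(10_000)]

# Scheme 1: key -> node via hash mod N. Growing from 4 to 5 nodes remaps most keys.
moved_mod = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys)

# Scheme 2: many fixed partitions; only whole partitions are reassigned to the new node.
NUM_PARTITIONS = 256
old_owner = {p: p % 4 for p in range(NUM_PARTITIONS)}       # partitions spread over 4 nodes
new_owner = dict(old_owner)
for p in range(0, NUM_PARTITIONS, 5):                       # new node steals ~1/5 of partitions
    new_owner[p] = 4
moved_fixed = sum(
    old_owner[stable_hash(k) % NUM_PARTITIONS] != new_owner[stable_hash(k) % NUM_PARTITIONS]
    for k in keys
)

print(f"hash mod N: {moved_mod} of {len(keys)} keys moved")          # roughly 80% of keys
print(f"fixed partitions: {moved_fixed} of {len(keys)} keys moved")  # roughly 20% of keys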
Transactions
Transactions are an abstraction used by databases to reduce all writes to either a successful one
that can be committed, or an erroneous one that can be aborted. While transactions are
somewhat hard to implement in distributed systems (we will discuss later), in a single database
they can be rather useful. They hope to provide the safety guarantees outlined by ACID.
In single object writes, almost all database engines provide guarantees about atomicity and
isolation so that the data for an individual key does not become lost or mixed with the previous
value - atomicity can be implemented using a log for crash recovery, and isolation can be done
using a lock on each object.
As mentioned previously, having concurrent writes act completely as if they were sequential can
take up lots of resources and make the database slow. Instead, some databases protect against
only some concurrency issues.
Serializable Isolation
While the above methods allow for a much faster method of providing some guarantees about
concurrency bugs in the data, the best and most bug-free way to deal with concurrency is just to
provide serializable isolation. If you can deal with the performance penalty, you should do so.
Unreliable Clocks
Oftentimes in distributed computing people attempt to use clocks or timestamps as a way of
synchronizing events and receiving an order out of them. However, for a variety of reasons, this
is not a feasible tactic. There is always a question of whether to use the timestamp from when a
message was sent, or received, the difference of which is unbounded due to an asynchronous
network. Additionally, the clocks on the majority of computers are ever so slightly out of sync
with one another, and get more out of sync as time passes.
For measuring elapsed time, do not use time of day clocks (seconds since epoch):
● They ignore leap seconds
● They may jump back to a previous point in time if they become too out of sync with a
clock server
● Instead, it is better to use a monotonic clock, which just measures relative time on a given
computer and is always increasing (see the sketch after this list)
● Synchronization with a time server is only as good as the network delay
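A minimal sketch of the difference in Python:

import time

start_wall = time.time()        # seconds since the epoch; can jump if NTP steps the clock
start_mono = time.monotonic()   # only meaningful as a difference; never goes backwards

work = sum(i * i for i in range(1_000_000))    # some work whose duration we want to measure

elapsed_wall = time.time() - start_wall        # could even be negative after a clock step
elapsed_mono = time.monotonic() - start_mono   # safe for timeouts and latency measurements
print(f"wall: {elapsed_wall:.4f}s, monotonic: {elapsed_mono:.4f}s")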
If you had synchronized clocks, you could potentially use timestamps as transaction IDs for
snapshot isolation in a distributed database (cannot have one monotonically increasing
timestamp across many systems).
Cannot rely on certain parts of code running within some amount of time due to process pauses
(things like garbage collection that stop all running threads, or other context switches that take an
arbitrary length of time).
It is not possible to rely on one single node as a sole source of truth as it is prone to failures and
process pauses. Instead the state of the system as a whole should be voted on by a majority of
nodes.
Byzantine Faults:
● Sometimes nodes may act maliciously on purpose, perhaps by sending a fake fencing
token
● A situation where nodes are lying is known as a Byzantine fault
○ Systems are Byzantine fault tolerant if they can operate correctly if some nodes
are not obeying protocol
○ Generally you only have to worry about these when dealing with multiple parties that
do not trust one another; in most designs we only worry about the servers of one
organization, so Byzantine fault tolerance is not needed
Linearizability
The goal of linearizability is to make it appear as if there were only one copy of the data. Once a
new value has been written or read, all subsequent reads (regardless of replica) see the value that
was written, until it is overwritten again. Consensus algorithms are certainly linearizable, while
the forms of replication discussed previously may not be.
Linearizability has a cost which is mostly seen in the lack of availability (when it comes to
network problems) and speed induced by needing to make up to date reads.
Ordering
In order to achieve linearizability and make it seem as if there is just one copy of the data, we
need to determine some sort of order which every operation on the data occurred in. Having an
ordering allows us to keep track of causality, and see which events depended on others.
However, keeping track of causality alone does not provide a total order - an ordering of every
single operation on the database - which is what is needed for linearizability.
Note that having a total order may be overkill; oftentimes it is sufficient to just preserve causal
consistency (which can be done with the version vectors described earlier) without incurring the
large performance penalties required by linearizability.
We can use Lamport timestamps to generate sequence numbers across multiple machines that
are consistent with causality (a minimal sketch follows the list below):
● Each node has a unique identifier, and keeps a counter of the number of operations it has
processed
○ The timestamp is a tuple of the counter and the node ID, use an arbitrary ordering
between the nodes and the counter to create a total order
● Every node and client keeps track of the maximum counter value it has seen so far, and
includes the maximum on every request
○ When a node receives a request or response with a maximum counter greater than
its own counter value, it increases its own counter to that maximum
● Unlike version vectors, Lamport timestamps cannot show which operations were
concurrent with one another, though they do provide a total ordering
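A minimal sketch of a Lamport clock; the node IDs and message format are illustrative:

class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def local_event(self):
        """Increment the counter for an operation processed on this node."""
        self.counter += 1
        return (self.counter, self.node_id)          # the Lamport timestamp

    def on_receive(self, remote_counter):
        """Fast-forward to the maximum counter seen so far, then count this event."""
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)

a, b = LamportClock("A"), LamportClock("B")
t1 = a.local_event()                                  # (1, 'A')
t2 = b.on_receive(t1[0])                              # (2, 'B') - causally after t1
t3 = a.local_event()                                  # (2, 'A') - concurrent with t2
print(sorted([t1, t2, t3]))                           # total order: counter first, then node ID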
Lamport timestamps are still not totally sufficient, as they can only provide answers to
concurrency problems after the fact - problems that must be decided in the moment, like
enforcing uniqueness constraints across different replicas, cannot be solved with them.
To deal with these problems, we can discuss total order broadcast, which is a protocol for
exchanging messages between nodes. This protocol ensures that no messages are lost, and that
messages are delivered to every node in the same order each time. The reason this is stronger
than timestamp ordering is that a node cannot retroactively insert a message into an earlier
position in the order if later messages have been delivered. Total order broadcast is in this way
like a log.
Both linearizable storage and total order broadcast are equivalent to consensus: you need to get
all of the nodes to agree on a value for every operation.
Although we have now spoken about some problems that can be reduced to consensus, it now
seems best to actually discuss some ways that consensus can be achieved. Firstly, we can talk
about two phase commit, which is somewhat inefficient, but solves the problem of atomic
commit (getting all replicas to agree on whether a transaction should be committed or aborted).
Database internal distributed transactions (transactions using only the same database technology)
can actually be pretty quick and optimized, however when using multiple different types of data
systems (like databases, message brokers, email services), you need a transaction API (such as
XA) which is often quite slow.
Unlike two phase commit, good consensus algorithms reach agreement by using a majority
(quorum) of nodes, in order to improve availability. After new leaders are elected in a
subsequent epoch (monotonically increasing in order to prevent split brain), consensus
algorithms define a recovery process which nodes can use to get into a consistent state.
Coordination services such as ZooKeeper are used internally by many other popular libraries;
they are replicated in-memory key-value stores that provide total order broadcast (among other
primitives) to your database replicas.
Batch Processing
Batch processing is useful for when you want to perform some operations on a potentially very
large, fixed size, input set of files and return an output. Typically these files are in some sort of
distributed file store like Hadoop, and can be transformed using certain batch processing tools
like MapReduce to a different output. However, MapReduce is itself not perfect and there are
many optimizations that can be made to speed up batch processing in different scenarios of
computation.
MapReduce:
● Allows passing in files and returns files to be piped into subsequent MapReduce calls
○ Break each file into records, call the mapper function to extract a key and value
from each input record
○ Sort all of the key-value pairs by key
○ Call the reducer function to iterate over the sorted key-value pairs
○ You only write custom code for the mapper and reducer functions (a minimal word
count sketch follows this list)
● Computation parallelized across many machines automatically
● Designed for frequent faults, if a single map or reduce task fails only it gets restarted
○ This is because they are often run on shared servers which sometimes need to take
back resources and will kill a MapReduce job to do so
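A minimal single-machine sketch of the mapper, sort, and reducer steps using a word count; a
real framework would run the mappers and reducers in parallel across many machines over file
blocks:

from itertools import groupby
from operator import itemgetter

def mapper(record):
    """Extract (key, value) pairs from one input record (here, a line of text)."""
    for word in record.split():
        yield (word.lower(), 1)

def reducer(key, values):
    """Combine all the values that were sorted together under the same key."""
    return (key, sum(values))

records = ["the quick brown fox", "the lazy dog", "the fox"]

pairs = [pair for record in records for pair in mapper(record)]
pairs.sort(key=itemgetter(0))                                  # the framework's sort step
output = [reducer(k, (v for _, v in group)) for k, group in groupby(pairs, key=itemgetter(0))]
print(output)   # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]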
Joins in MapReduce:
● Resolving all occurrences of some association within a dataset
○ Actually calling the database to resolve every join is too expensive, as it requires a
multitude of slow network calls
○ Instead, put a local copy of the database into the distributed filesystem in order to
improve locality
● Sort-merge joins (reduce side join, join logic done in reducer)
○ Have a mapper for both relevant database tables, and then send the result of both
mappers to the same reducer nodes, making sure to sort them in between so that
all of the relevant information is next to one another
○ If there are certain keys which are very popular (hot keys), such that they need to
be on more than one partition, randomly assign the partition for each instance of a
given hotkey and replicate the necessary database information to each partition
○ While these are often slower, do not need to make any assumptions about data
● Broadcast hash joins (map side join, join logic done in mapper)
○ Used when one of the datasets being joined is so small that it can fit in memory on
each of the mappers (sketched after this list)
○ Use an in memory hash table on each mapper to do the joining
● Partitioned hash joins (map side join)
○ Same as broadcast hash join but for when each side of the join is partitioned the
same way, do a broadcast hash join for each partition
○ If they are partitioned and sorted in the same way, one dataset does not even need to
be loaded into memory, because a mapper can do the same merging operation that
would normally be done by a reducer
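A minimal sketch of a broadcast hash join: the small dataset (user profiles) is loaded into an
in-memory hash table on every mapper, and the large dataset (click events) streams past it. The
data below is made up for illustration:

users = {1: {"name": "alice", "country": "US"},       # small enough to fit in memory
         2: {"name": "bob",   "country": "DE"}}

clicks = [{"user_id": 1, "url": "/home"},             # large dataset, processed record by record
          {"user_id": 2, "url": "/cart"},
          {"user_id": 1, "url": "/checkout"}]

def map_join(click_records, user_table):
    for click in click_records:
        profile = user_table.get(click["user_id"])    # hash lookup instead of a network call
        if profile is not None:
            yield {**click, **profile}

for joined in map_join(clicks, users):
    print(joined)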
Alternatives to MapReduce:
● MapReduce is not actually very efficient because when chaining together many
MapReduce jobs, you need to wait for one to completely finish before starting the next,
and there is a lot of time wasted in writing out the intermediate state to disk
○ Bad as certain hotkeys or stragglers can take much longer and delay the whole
thing
● Should be using dataflow engines (like Spark)
○ Parallelize computation to run as quickly as possible over multiple user defined
functions called operators
○ Reduces the amount of unnecessary mappers
○ Data dependencies explicitly declared so that the engine can optimize for these
○ Slightly less fault tolerant because data not materialized to intermediate state, so if
it is lost, it is recomputed from the previous data needed to make the calculations
○ This means that the functions must be deterministic
Batch processing is also frequently used on graph data in order to make things like
recommendation algorithms. MapReduce is not efficient for this because the graph data
infrequently changes, however MapReduce would create an entirely new output dataset. Instead,
it is better to use the Pregel processing model, where one vertex can send a message to another
vertex along an edge of a path. In each iteration of the batch processing, the vertex receives all
the messages sent to it from the prior step, and then sends out new messages. This goes on until
an end condition is met. The nodes remember their state in memory so the entire graph does not
need to be rewritten. By occasionally writing vertex state to disk, a deterministic algorithm
becomes fault tolerant in the event of a crash.
Stream Processing
Stream processing is quite similar to batch processing, but it instead processes an unbounded
set of data. This is useful when you want data to be processed as quickly as possible in an
asynchronous manner, as we will see with message brokers. While messages can in theory be
sent directly from a producer to a consumer, a message broker allows for more fault tolerance
and handles message loss for you.
Message brokers:
● Kind of database optimized for handling message streams, clients can come and go,
handles durability
● Generally queue up messages if there are too many as opposed to just dropping them
● Many message brokers will delete a message once it has been successfully delivered to
consumers, and as a result assume that queues stay fairly short
● When there are multiple consumers, two main patterns of messaging are used
○ Load balancing delivers each message to one consumer to share the work between
them
○ Fan out delivers the message to all of the consumers
● When a consumer finishes processing a message it sends an acknowledgement back to
the queue
● Unless you use a separate queue per consumer (no load balancing), the fact that a consumer
can crash while handling a message and have it redelivered means that messages may be
processed out of order
Event Sourcing:
● Similar to change data capture, but with some subtle differences
○ The database is not the source of truth from which changes are derived
○ Instead there is a central append only log of events, which records not the changes
made to the database but what the user actually did on the site
○ Derive your other data systems from this log of events
● More maintainable because other schemas can be derived from the same logs of events in
the future, and makes things easier to debug (what actually happened on the client side,
not just the state changes that client actions caused)
● Does require asynchronous writes unless you want a large write penalty
Stream Joins:
● Just like batch processing, streams need to be able to perform joins on other datasets
● However it is a bit more challenging because new events can appear anytime on a stream
● Stream-stream join (good for joining two related streams of data)
○ If two streams are sending related events that you may want to join with one
another, both streams must maintain state (probably for all events within a certain
time window)
○ Can use an index to do so on the join key, and each stream can check the index
when events come in
○ On expiration of an event from said time window you can record that there was
no corresponding other event to join it with (useful in certain analytics cases)
● Stream-table join (good for enriching data before sending to another stream)
○ Joining stream events with information from the database table, querying the
database every time is too slow, should keep a local copy
○ However this table needs to be updated over time which can be done by the
stream processor subscribing to the database changes of the actual database and
updating its local copy
● Table-table join (two streams maintaining local copies of tables that have to do joins)
○ Use change data captures to keep two tables up to date for each stream, and
accordingly take the result of the join and send it where it needs to go
● One issue is that these joins become nondeterministic
○ Can maybe be addressed by using a unique ID for the thing being joined with, but
this requires much more storage
Fault tolerance:
● Want to be able to ensure that each message is processed exactly once, no less and no
more
● Can use micro batching or checkpointing to retain state of stream
○ Break stream into small blocks, each of which is treated as a batch process which
can be retried
○ Can also keep state to generate checkpoints of the stream, so that on crash it can
restart from the most recent checkpoint
○ Not useful for stopping things external to the stream (such as sending an email)
from occurring twice
● Can use atomic transactions
● Can use idempotence
○ An operation that can be performed multiple times and has the effect of only being
performed once; use these in processors to avoid processing messages many times
(a minimal sketch follows this list)
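A minimal sketch of an idempotent consumer: processed message IDs are remembered, so a
redelivered message has no additional effect. The message format and in-memory "storage" are
illustrative; in practice the set of processed IDs would live in durable storage:

processed_ids = set()
account_balance = {"alice": 0}

def handle(message):
    if message["id"] in processed_ids:               # duplicate delivery: do nothing
        return
    account_balance[message["account"]] += message["amount"]
    processed_ids.add(message["id"])

deposit = {"id": "msg-42", "account": "alice", "amount": 100}
handle(deposit)
handle(deposit)                                      # broker redelivers after a lost ack
print(account_balance)                               # {'alice': 100}, not 200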
Load Balancing
The load balancer is a necessary component in any distributed system which helps to spread
traffic around a cluster of servers or nodes. It keeps track of the status of said servers so that it
stops sending them requests if they crash. Load balancers can and probably should be added
between any layers where there are multiple nodes running so that traffic can be split up.
Fault tolerance:
● The load balancer exists to help reduce single points of failure but it itself is a single
point of failure
● Can use a cluster of load balancers which send heartbeats to one another such that if one
load balancer fails the other takes over (active and passive load balancer)
Caching
In a distributed system, caching data allows you to store copies of data on either faster hardware,
or just hardware located closer to the end user that will request it. While this can greatly speed
up certain read requests, caching can at times become complicated, as storing multiple copies
of data will inevitably lead to having to deal with stale (outdated) data. Additionally, as they are
built for faster reads, caches often have far less memory than databases, so it is important that
we evict old entries from the cache.
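A minimal sketch of least-recently-used (LRU) eviction, one common policy for removing old
cache entries when capacity is limited:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)                # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)         # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("user:1", "alice")
cache.put("user:2", "bob")
cache.get("user:1")                                  # touch user:1 so it stays warm
cache.put("user:3", "carol")                         # evicts user:2
print(cache.get("user:2"))                           # None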
Types of cache:
● Hardware caches
○ On CPU (L1, L2, L3 cache)
○ Often the computer will use memory to cache disk results
● Application server cache
○ Memory on the actual application server that remembers the results of certain
queries, and returns them back if they are requested again
○ Still no guarantee that all requests will hit these, should instead use a global or
distributed cache
● Content Distribution Network (CDN)
○ Serve large amounts of static media
○ A request asks the CDN for content, and if it is not there the CDN queries the
backend, serves the content, and caches it locally (pull CDN)
○ It is also possible to directly upload content to the CDN (push CDN), which is good
for sites with either low traffic or data that does not change very frequently (like a
newspaper)
○ Requires changing the URLs of static content when it is updated; need to be careful
not to serve stale content
Proxies
Proxies act as an intermediary server between the client and application server and can fulfill a
variety of purposes, such as adding information to a request, checking their own cache, or
encrypting a message.
Types of proxies:
● Open proxy
○ Accessible by any internet user
○ Anonymous proxies reveal their identity as a server but do not disclose the IP
address of the user
○ Transparent proxies identify both themselves and the IP address of the user, and are
good for caching websites
● Reverse proxy
○ Retrieves resources from one or more application servers and then returns the
result to the client as if it were its own
○ Can do things such as encryption and decryption to save application servers from
doing these potentially expensive operations
Consistent Hashing
Consistent hashing is a way of mapping keys to a given partition (or server) such that when
nodes are added or removed, as few keys as possible are remapped, which preserves existing
cache entries and minimizes the network load of moving data.
To do so, create a ring representing the range of a given hash function, and hash each server
partition to it. To find which server a key goes to, take the hash of it, and move clockwise
around the ring until hitting a server hash. If a partition fails, move the keys that were on it to
the next partition in the ring. If a new partition is added, move the existing keys that are just
before it to said partition. Instead of just having one location for each partition on the ring, give
each partition several locations (virtual nodes) in order to further even out the distribution.
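A minimal sketch of a hash ring with a few virtual nodes per server, using bisect to find the first
server hash clockwise from the key's hash (the server names are made up):

import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=3):
        self.ring = []                                           # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):                              # several points per server
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        h = self._hash(key)
        hashes = [point for point, _ in self.ring]
        i = bisect.bisect_right(hashes, h) % len(self.ring)      # wrap around the ring
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.lookup("user:42"), ring.lookup("user:43"))
# Adding a new server only reassigns the keys that fall just before its points on the ring.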
Long Polling, WebSockets, and Server Sent Events
All three of the topics above are methods to provide real time updates to a client from an
application server. However, each has its benefits and drawbacks, which we discuss below.
The worst option is likely simple polling, in which a client repeatedly makes HTTP requests to a
server in order to see if any changes have been made. If many clients are doing so at the same
time, this can overload the server. That being said, it is easy to implement.
Long polling:
● Open an HTTP connection with the server and do not close it until the server has something
to respond with
● Once the server responds, the client makes another long polling request (see the sketch
below)
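A minimal sketch of the client side of long polling using the requests library; the endpoint URL
and the response format are assumptions for illustration:

import requests

def handle_update(update):
    print("received:", update)

def long_poll(url):
    while True:
        try:
            # The server holds this request open until it has new data or its window expires.
            response = requests.get(url, timeout=35)
            if response.status_code == 200:
                handle_update(response.json())
            # On an empty response, simply fall through and reconnect to wait again.
        except requests.exceptions.Timeout:
            pass                                    # no update within the window; poll again

# long_poll("https://example.com/api/updates")      # hypothetical endpoint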
WebSockets:
● Full duplex communication channel between client and server
● Lower message overhead due to not needing to resend headers
● A bit harder to implement
● Can only have as many open WebSocket connections as there are ports for a given
client-server address pair, roughly 65,000
Server Sent Events:
● Only one way events from server to client, over a connection that is kept open
● Best when the server is generating data in a loop and will send multiple updates to clients