0% found this document useful (0 votes)
74 views6 pages

03 Back of The Envelope Estimation

This document discusses back-of-the-envelope estimation, which involves estimating system capacity or performance using common performance numbers and thought experiments. It provides tables explaining data volume units using powers of two and latency numbers for typical computer operations. These include operations like L1 cache reference, branch mispredict, sending data over a network, disk seeks, and sending packets across long distances. Understanding these concepts is important for effectively estimating systems at a high level during design interviews.

Uploaded by

srawat
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
74 views6 pages

03 Back of The Envelope Estimation

This document discusses back-of-the-envelope estimation, which involves estimating system capacity or performance using common performance numbers and thought experiments. It provides tables explaining data volume units using powers of two and latency numbers for typical computer operations. These include operations like L1 cache reference, branch mispredict, sending data over a network, disk seeks, and sending packets across long distances. Understanding these concepts is important for effectively estimating systems at a high level during design interviews.

Uploaded by

srawat
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

ByteByteGo | Technical Interview Prep 31/10/23, 3:40 PM

[yByteByteGo Products Newsletter YouTube Jobs

System Design Interview

00 Foreword 03
01 Join the Community
Back-of-the-envelope Estimation
In a system design interview, sometimes you are asked to estimate system capacity or performance
Scale From Zero To Millions Of requirements using a back-of-the-envelope estimation. According to Jeff Dean, Google Senior Fellow, “back-of-
02
Users the-envelope calculations are estimates you create using a combination of thought experiments and common
performance numbers to get a good feel for which designs will meet your requirements” [1].
Back-of-the-envelope
03 You need to have a good sense of scalability basics to effectively carry out back-of-the-envelope estimation.
Estimation
The following concepts should be well understood: power of two [2], latency numbers every programmer
A Framework For System Design should know, and availability numbers.
04
Interviews

Power of two
05 Design A Rate Limiter
Although data volume can become enormous when dealing with distributed systems, calculation all boils down
to the basics. To obtain correct calculations, it is critical to know the data volume unit using the power of 2. A

06 Design Consistent Hashing byte is a sequence of 8 bits. An ASCII character uses one byte of memory (8 bits). Below is a table explaining
the data volume unit (Table 1).

07 Design A Key-value Store Power Approximate value Full name Short name

10 1 Thousand 1 Kilobyte 1 KB
Design A Unique ID Generator In
08
Distributed Systems 20 1 Million 1 Megabyte 1 MB

30 1 Billion 1 Gigabyte 1 GB
09 Design A URL Shortener
40 1 Trillion 1 Terabyte 1 TB

50 1 Quadrillion 1 Petabyte 1 PB
https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation Page 1 of 6
ByteByteGo | Technical Interview Prep 31/10/23, 3:40 PM

50 1 Quadrillion 1 Petabyte 1 PB

Table 1

Latency numbers every programmer should know


Dr. Dean from Google reveals the length of typical computer operations in 2010 [1]. Some numbers are
outdated as computers become faster and more powerful. However, those numbers should still be able to give
us an idea of the fastness and slowness of different computer operations.

Operation name Time

L1 cache reference 0.5 ns

Branch mispredict 5 ns

L2 cache reference 7 ns

Mutex lock/unlock 100 ns

Main memory reference 100 ns

Compress 1K bytes with Zippy 10,000 ns = 10 µs

Send 2K bytes over 1 Gbps network 20,000 ns = 20 µs

Read 1 MB sequentially from memory 250,000 ns = 250 µs

Round trip within the same datacenter 500,000 ns = 500 µs

Disk seek 10,000,000 ns = 10 ms

Read 1 MB sequentially from the network 10,000,000 ns = 10 ms

Read 1 MB sequentially from disk 30,000,000 ns = 30 ms

Send packet CA (California) ->Netherlands->CA 150,000,000 ns = 150 ms

Table 2

Notes

ns = nanosecond, µs = microsecond, ms = millisecond


https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation Page 2 of 6
ByteByteGo | Technical Interview Prep 31/10/23, 3:40 PM

ns = nanosecond, µs = microsecond, ms = millisecond

1 ns = 10^-9 seconds

1 µs= 10^-6 seconds = 1,000 ns

1 ms = 10^-3 seconds = 1,000 µs = 1,000,000 ns

A Google software engineer built a tool to visualize Dr. Dean’s numbers. The tool also takes the time factor into
consideration. Figures 2-1 shows the visualized latency numbers as of 2020 (source of figures: reference
material [3]).

Figure 1

By analyzing the numbers in Figure 1, we get the following conclusions:

https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation Page 3 of 6
ByteByteGo | Technical Interview Prep 31/10/23, 3:40 PM

Memory is fast but the disk is slow.

Avoid disk seeks if possible.

Simple compression algorithms are fast.


Compress data before sending it over the internet if possible.

Data centers are usually in different regions, and it takes time to send data between them.

Availability numbers
High availability is the ability of a system to be continuously operational for a desirably long period of time. High
availability is measured as a percentage, with 100% means a service that has 0 downtime. Most services fall
between 99% and 100%.

A service level agreement (SLA) is a commonly used term for service providers. This is an agreement between
you (the service provider) and your customer, and this agreement formally defines the level of uptime your
service will deliver. Cloud providers Amazon [4], Google [5] and Microsoft [6] set their SLAs at 99.9% or above.
Uptime is traditionally measured in nines. The more the nines, the better. As shown in Table 3, the number of
nines correlate to the expected system downtime.

Availability % Downtime per day Downtime per week Downtime per month Downtime per year

99% 14.40 minutes 1.68 hours 7.31 hours 3.65 days

99.99% 8.64 seconds 1.01 minutes 4.38 minutes 52.60 minutes

99.999% 864.00 6.05 seconds 26.30 seconds 5.26 minutes

99.9999% 86.40 milliseconds 604.80 2.63 seconds 31.56 seconds

Table 3

Example: Estimate Twitter QPS and storage


requirements
Please note the following numbers are for this exercise only as they are not real numbers from Twitter.

Assumptions:

300 million monthly active users.


50% of users use Twitter daily.
https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation Page 4 of 6
ByteByteGo | Technical Interview Prep 31/10/23, 3:40 PM

50% of users use Twitter daily.

Users post 2 tweets per day on average.

10% of tweets contain media.

Data is stored for 5 years.

Estimations:

Query per second (QPS) estimate:

Daily active users (DAU) = 300 million * 50% = 150 million

Tweets QPS = 150 million * 2 tweets / 24 hour / 3600 seconds = ~3500

Peek QPS = 2 * QPS = ~7000

We will only estimate media storage here.

Average tweet size:

tweet_id 64 bytes
text 140 bytes

media 1 MB

Media storage: 150 million * 2 * 10% * 1 MB = 30 TB per day

5-year media storage: 30 TB * 365 * 5 = ~55 PB

Tips
Back-of-the-envelope estimation is all about the process. Solving the problem is more important than obtaining
results. Interviewers may test your problem-solving skills. Here are a few tips to follow:

Rounding and Approximation. It is difficult to perform complicated math operations during the interview. For
example, what is the result of “99987 / 9.1”? There is no need to spend valuable time to solve complicated
math problems. Precision is not expected. Use round numbers and approximation to your advantage. The
division question can be simplified as follows: “100,000 / 10”.
Write down your assumptions. It is a good idea to write down your assumptions to be referenced later.

Label your units. When you write down “5”, does it mean 5 KB or 5 MB? You might confuse yourself with
this. Write down the units because “5 MB” helps to remove ambiguity.

Commonly asked back-of-the-envelope estimations: QPS, peak QPS, storage, cache, number of servers,
etc. You can practice these calculations when preparing for an interview. Practice makes perfect.

Congratulations on getting this far! Now give yourself a pat on the back. Good job!

https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation Page 5 of 6
ByteByteGo | Technical Interview Prep 31/10/23, 3:40 PM

Reference materials
[1] J. Dean.Google Pro Tip: Use Back-Of-The-Envelope-Calculations To Choose The Best Design:
http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html

[2] System design primer:


https://github.com/donnemartin/system-design-primer

[3] Latency Numbers Every Programmer Should Know:


https://colin-scott.github.io/personal_website/research/interactive_latency.html

[4] Amazon Compute Service Level Agreement:


https://aws.amazon.com/compute/sla/

[5] Compute Engine Service Level Agreement (SLA):


https://cloud.google.com/compute/sla

[6] SLA summary for Azure services:


https://azure.microsoft.com/en-us/support/legal/sla/summary/

Become a Contributor Be an affiliate Suggest a new topic

Our Team

Privacy Policy

Terms of Service

Copyright ©2023 ByteByteGo Inc. All rights reserved.

https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation Page 6 of 6

You might also like