
URL Shortener

Functional Requirements
Given a long URL, generate a unique short URL

Given a short URL, retrieve the long URL

Character set: A-Z, a-z, 0-9

Length of short URL: 7

TTL of short URL

Analytics
Identify microservices from requirements
1. URL shortener microservice
Dissect Microservice
Each microservice is a set of tiers

Tiers

Application/Web server tier

In-memory/cache tier (Memcached)

Storage server tier (if you need source of truth or persistence)

Each tier is a distributed system


Template
First, design the system as if it were a single-server system

Data model and APIs (changes from problem to problem)

Why distributed (changes from problem to problem)

Distributed system

Generic architecture (does not change from problem to problem)

Data distribution / sharding + APIs (change from problem to problem)

Replication (generic)

Consistency (generic)
Single server
Tiers

Application server

Cache tier

Source of truth tier


Source of truth
Data Model + APIs

URL shortner table

K-V: K: short URL / unique id; V: long URL + timestamp + TTL

APIs

create(V), read(K), delete(K), update(K, V)
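
A minimal sketch of this table and its CRUD API, assuming an in-process dict stands in for the storage tier (the names `table` and `counter` are illustrative):

```python
import time

# Illustrative stand-in for the source-of-truth table.
# K: unique id; V: long URL + creation timestamp + TTL.
table = {}
counter = 0  # monotonically increasing unique-id generator

def create(long_url, ttl_seconds=None):
    """Assign a unique id, store the record, and return the key."""
    global counter
    counter += 1
    table[counter] = {"long_url": long_url,
                      "created_at": time.time(),
                      "ttl": ttl_seconds}
    return counter

def read(key):
    return table.get(key)

def update(key, long_url):
    if key in table:
        table[key]["long_url"] = long_url

def delete(key):
    table.pop(key, None)
```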


Source of truth:

Row oriented: pro: write friendly; con: selecting a small number of fields is costly when the value is arbitrarily large

Column oriented: pro: efficient selection of a small number of fields when the value is arbitrarily large; con: not write friendly

Write row-oriented data into a memtable, then merge lazily into column stores (LSM trees), as sketched below

In our case, we will go with row oriented, with a primary-key index on the keys
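
A toy sketch of that memtable idea, assuming an in-memory dict flushed as immutable sorted runs; production LSM engines (e.g. RocksDB) add write-ahead logging, compaction, and bloom filters:

```python
# Toy memtable: absorb row-oriented writes in memory, flush sorted runs lazily.
class MemTable:
    def __init__(self, flush_threshold=4):
        self.buffer = {}            # write-friendly in-memory map
        self.flush_threshold = flush_threshold
        self.sorted_runs = []       # stand-in for on-disk sorted runs (SSTables)

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Emit the buffer as an immutable sorted run; a real LSM engine
        # would also compact overlapping runs in the background.
        self.sorted_runs.append(sorted(self.buffer.items()))
        self.buffer = {}
```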
Cache
Hashmap of keys and values

Memory is byte addressable


Application logic
create(V)

Go to the source of truth

Assign a unique id

Store the unique id / long URL record

Cache the created record

Application server maps the unique id to a short URL


read(K)

Application server maps the short URL back to the unique id

Query the cache with the id

If not found in the cache, query the source of truth with the id (see the sketch after this list)

Map unique id to short URL
With 62 characters and 7 positions, there are 62^7 ≈ 3.5 trillion possible short URLs

Let's say the unique id is 65:

Convert it to base 62:

65 = 0·62^6 + 0·62^5 + … + 1·62^1 + 3·62^0 → digits 0,0,0,0,0,1,3 → 'AAAAABD'

A-Z: 0-25; a-z: 26-51; 0-9: 52-61
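
A sketch of the base-62 conversion using that alphabet, left-padded to 7 characters:

```python
import string

# A-Z -> 0-25, a-z -> 26-51, 0-9 -> 52-61, matching the mapping above.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
LENGTH = 7

def encode_base62(uid):
    """Convert a unique id to a 7-character short URL."""
    chars = []
    while uid > 0:
        uid, rem = divmod(uid, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)).rjust(LENGTH, ALPHABET[0])

def decode_base62(short_url):
    """Invert the mapping back to the unique id."""
    uid = 0
    for ch in short_url:
        uid = uid * 62 + ALPHABET.index(ch)
    return uid
```

encode_base62(65) returns 'AAAAABD', matching the worked example above.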


Why distributed
1. Storage scale out
a. Number of unique K-V * size of K-V
i. Number of unique K-Vs per second; capacity-plan for a year or two
2. Throughput scale out
a. Latency / response time of a single request = x ms (typically p50, p90, p99, SLAs)
b. Throughput required from your system = Y ops/sec
i. My calculation: (30,000 - 60,000)/x ops/sec in a single server
ii. Number of servers = Y/single server throughput
3. Availability - replication factor
4. Geo-location
5. Reduction of latency (problem specific)
a. Not common but will be needed for some problems
Storage: 6,000 URLs/sec × number of seconds in 2 years × size of each record

Throughput: 6,000 writes/sec + 300K reads/sec
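
Plugging the numbers in (the ~500-byte record size is an assumption, not from the estimates above):

```python
writes_per_sec = 6_000
seconds_in_2_years = 2 * 365 * 24 * 3600          # ~63 million seconds
record_size_bytes = 500                           # assumed average record size

total_records = writes_per_sec * seconds_in_2_years   # ~378 billion records
total_bytes = total_records * record_size_bytes       # ~189 TB
print(f"{total_records:,} records, {total_bytes / 1e12:.0f} TB")
```

Note that ~378 billion ids over two years also fits comfortably within the 62^7 ≈ 3.5 trillion short-URL space.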


Distributed System
Cluster of worker nodes that do the actual work

One or more routers/load balancers

Load balancer sends requests based on load

Routers send requests based on state

Cluster manager

To monitor the health of the worker nodes, so that requests can be forwarded to a live worker node

Config store
Mapping of data to partitions
Horizontal sharding

Partitioning by key: Subsets of keys with full values in a single shard or bucket

More common, especially for K-V APIs

Vertical sharding

Partitioning by value: All keys but subsets of values

Less common

Range based: pro: easy to split or merge shards (the weakness of hashing); con: skew from hot key ranges

Hash based: pro: uniform distribution; con: splitting or merging shards is hard (both lookups are contrasted below)
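
A minimal sketch contrasting the two lookups (the shard count and range boundaries are illustrative):

```python
import bisect
import hashlib

NUM_SHARDS = 4
RANGE_BOUNDARIES = [250, 500, 750]   # illustrative upper bounds per range shard

def hash_shard(key):
    # Uniform distribution, but resharding moves most keys.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def range_shard(key):
    # Easy to split or merge a shard, but hot key ranges cause skew.
    return bisect.bisect(RANGE_BOUNDARIES, key)
```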
Mapping of data to partitions for this problem
Horizontal hash is the best

I will use horizontal range for visualization

[0 - 512 million] -> Shard id 0 -> Servers A, C, E

[512 million - 1 billion] -> Shard id 1 -> Servers B, D, F


Mapping of partitions to servers
Consistent hashing
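
A minimal consistent-hash ring sketch (the virtual-node count is illustrative); each partition lands on the first server clockwise from its hash point, so adding or removing a server only moves the partitions adjacent to it:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Place vnodes points per server on the ring, sorted by hash.
        self.ring = sorted(
            (self._hash(f"{s}#{i}"), s)
            for s in servers for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, partition_id):
        # First server clockwise from the partition's hash point.
        idx = bisect.bisect(self.points, self._hash(str(partition_id)))
        return self.ring[idx % len(self.ring)][1]

ring = ConsistentHashRing(["A", "B", "C", "D", "E", "F"])
print(ring.lookup(0), ring.lookup(1))
```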
Partition is the granularity at which data is replicated, moved, rebalanced

Replication brings in challenges of consistency

CAP theorem

Consistency, availability, and partition tolerance cannot all be achieved at once; under a network partition, the system must trade consistency against availability
Document search problem
Requirements
Given a search string of terms, return all document ids that contain all the terms in
the string

We do not have to deal with documents. Some preprocessing is done to create a data model dealing with terms and doc ids

Relevance does not matter

Static data set


Single server approach for in-memory tier
Data model + API

K-V

K: term; V: sorted list of doc ids (inverted index)

O(n·k·log k), where n = size of a posting list and k = number of terms; n is the dominating factor

API: search(string)

Get the list of doc ids for each term and intersect them (see the sketch below)
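
A sketch of the inverted index and intersection (the index contents are illustrative; the simple pairwise merge below runs in O(n·k), while a heap-based k-way merge gives the O(n·k·log k) flavor quoted above):

```python
# Inverted index: term -> sorted list of doc ids (illustrative data).
index = {
    "distributed": [1, 3, 5, 9],
    "systems":     [1, 2, 3, 5],
    "design":      [3, 5, 7],
}

def intersect(a, b):
    """Merge-intersect two sorted doc-id lists in O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search(query):
    """Return doc ids that contain all terms in the query string."""
    lists = [index.get(term, []) for term in query.split()]
    if not lists:
        return []
    result = lists[0]
    for lst in lists[1:]:
        result = intersect(result, lst)
    return result

print(search("distributed systems design"))  # -> [3, 5]
```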


Distributed system
1. Storage scaleout
2. Throughput scaleout
3. Availability
4. Geo location
5. Reduce the latency of an individual request by several orders of magnitude
Sharding
Horizontal sharding

[Aa - af] -> Shard 0 -> Servers A, C, E

[ag - ]
Stream Processing
Problem statement
Imagine a data center having hundreds of servers each emitting thousands of
statistics per second (such as CPU utilization, memory utilization)

We need to build a system that serves a dashboard

1. Given a server id, return min, max, avg of all stats within a time window of 30
minutes
2. Given a statistic id, return min, max, avg of all hosts within a time window of
30 minutes
3. Given a server id and a time range, return min, max, avg of all stats
4. Given a statistic id and a time range, return min, max, avg of all hosts

Data will live for a year


Services
1. Data collection - pub/sub
2. Data aggregation
Single server approach
Tiers: Web tier, in-memory for 30 minutes, storage for a year

Data Model and API

K-V: K: (server id, stat id, timestamp rounded to the minute); V: (min, max, avg)

APIs

Collection (write) and get (read) APIs

Table_1_minute: a time series of per-minute aggregates (see the sketch below)
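
A sketch of that per-minute rollup, assuming each bucket tracks (min, max, sum, count) so averages can be recombined across windows:

```python
from collections import defaultdict

# Per-minute rollups: (server_id, stat_id, minute) -> [min, max, sum, count]
table_1_minute = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])

def ingest(server_id, stat_id, timestamp, value):
    """Fold one raw sample into its minute bucket."""
    minute = int(timestamp // 60)
    agg = table_1_minute[(server_id, stat_id, minute)]
    agg[0] = min(agg[0], value)
    agg[1] = max(agg[1], value)
    agg[2] += value
    agg[3] += 1

def query(server_id, stat_id, start_minute, end_minute):
    """Min, max, avg for one server + stat over a minute range."""
    lo, hi, total, count = float("inf"), float("-inf"), 0.0, 0
    for m in range(start_minute, end_minute + 1):
        agg = table_1_minute.get((server_id, stat_id, m))
        if agg:
            lo, hi = min(lo, agg[0]), max(hi, agg[1])
            total += agg[2]; count += agg[3]
    return (lo, hi, total / count) if count else None
```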
How distributed

Horizontal hash
News Feed, Uber, Netflix
Recommendations
Workflow
OLTP: Simple workloads but with high concurrency

Tweet generation

Uber: vehicle ingestion

Netflix Recommendation system: watching pattern, clicking pattern, endorsements


OLAP:

Traditional applications: reporting, aggregation over a period of time

Twitter: Timeline, but the aggregation is more near-line

Cache of relevant friends

Uber: Vehicle location dashboard

OLTP -> infrequent ETL -> OLAP
