Replit end to end

Beginner to advanced
Giving some context

Prerequisites
1. Basic coding (loops, if/else, variables)
2. Node.js

Good to haves
1. Docker / containerization
2. Kubernetes
3. AWS ASGs (Auto Scaling Groups)

What we'll learn

Basic
1. Backend communication
2. Docker / containerization
3. Isolated environments
4. Remote code execution
5. repl.it system design/architecture

Advanced
1. Kubernetes
2. Pseudo terminals
3. Nix
Before we start - Disclaimer
We'll be taking 3 approaches to solve this problem.

Beginner friendly
1. How I would have implemented it from first principles back in college
2. Highly insecure
3. If you use this approach in an interview, you'll be rejected
4. Good to know what's happening, though
5. Doesn't scale

Cloud specific autoscaling constructs
1. Mid approach
2. Great for a new startup
3. Secure, autoscales
4. Uses cloud dependent constructs (ASG, ECS)

Cloud native approach
1. Good approach
2. Autoscales, secure
3. How I would build replit
How did I think of this architecture?

First principles (~8 years of coding)
Reverse engineering repl.it (~3 hours): going through their blog posts, GitHub and videos on YouTube

All are linked in the description.

What we're building

1. Online IDE for long running backend/frontend apps
2. Ability to start a fresh environment (in Rust, Go, Python, Node, React …)
3. Ability to autoscale servers with users
4. Run code in an isolated environment
What we're NOT building, but are good to haves

1. Authentication (login with Google)
2. Extremely good UI
Requirements
1. User should be able to start a new environment in a selected stack
2. User should be taken to a basic boilerplate repl for that environment
3. User should be able to edit and save their code somewhere
4. User should be able to run code (both long running and short running)
Callout
You can use an external service/codebase that does a lot of this for you. If you're a startup, you probably want to pick one of these services and not build this from scratch.

Examples: https://github.com/coder/code-server, GitHub Codespaces
Beginner friendly

A great way to build intuition for how you can build something like this.
DO NOT use it in production or in an interview.

Downsides
1. Insecure remote code execution
2. Single server setup, doesn't autoscale
3. Port conflicts between two users (every user shares resources on the same server)
4. Terminal is extremely rudimentary
5. Very ugly package management
Why is building repl.it hard?

1. Remote code execution
Your browser can't run C++/Rust/Go/React (not technically true), so the code has to run on a repl.it server.
2. Long running processes
You need to give users vCPU guarantees. An example of a short running process is LeetCode.
3. Shell access inside the browser
4. File storage (not difficult)
An example of something that doesn't require file storage is LeetCode.

Let's see how we can achieve all of these one by one.
Execution service

You will have one big monolith (the Execute service) for this part. It handles remote code execution, long running processes, shell access inside the browser and file storage.
Step 0.1 - Keep a copy of the base projects in AWS S3

Step 0.2 - Bring all the languages you support (Node, Go, Rust) to this machine
On the Mac machine/AWS server running the Execute service:
1. Install Node
2. Install Rust
3. Install Golang
Step 1 - Initialising the repl
The Execute service copies the base image to s3://images/{id}
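A minimal sketch of how the Execute service could decide what to copy when a repl is created. It assumes a hypothetical layout where starter projects live under `base/<language>/` and treats `images/<replId>/` as the destination prefix from the slide; `planReplCopy` and `CopyOp` are illustrative names, not part of any library. In a real service each planned op would become one S3 CopyObject call (for example `CopyObjectCommand` in the AWS SDK for JavaScript v3).

```typescript
// One planned S3 copy: source key -> destination key.
export interface CopyOp {
  from: string;
  to: string;
}

// Hypothetical layout: starter projects live under "base/<language>/" and each
// new repl gets its own copy under "images/<replId>/".
export function planReplCopy(
  baseKeys: string[],
  language: string,
  replId: string,
): CopyOp[] {
  // Reject ids that could escape the prefix (e.g. "../other").
  if (!/^[a-zA-Z0-9_-]+$/.test(replId)) {
    throw new Error(`invalid repl id: ${replId}`);
  }
  const prefix = `base/${language}/`;
  return baseKeys
    .filter((key) => key.startsWith(prefix))
    .map((key) => ({
      from: key,
      to: `images/${replId}/${key.slice(prefix.length)}`,
    }));
}

// Usage: only the Node starter files are copied for a Node repl.
const plan = planReplCopy(
  ["base/node/index.js", "base/node/package.json", "base/rust/src/main.rs"],
  "node",
  "abc123",
);
console.log(plan);
```

Computing the plan separately from doing the copies keeps the id validation in one place and makes it easy to test without touching S3.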
Step 2 - Taking the user to the edit screen

Step 3 - Initialise a websocket (ws) connection between the browser and the Execute service
Step 4 - Bring the user's code to the VM
The Execute service pulls the latest code from S3, then sends the filesystem over to the user lazily through the websocket.
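One way to read "send the filesystem lazily": the browser asks only for the direct children of a folder when the user expands it, instead of receiving the whole tree up front. The sketch below works on a flat list of paths (the shape S3 object keys have); `listChildren` and `Entry` are hypothetical names. S3's ListObjectsV2 can do similar grouping on the server side when you pass `Delimiter: "/"`.

```typescript
export interface Entry {
  name: string;
  type: "file" | "dir";
}

// Return only the direct children of `dir` from a flat list of file paths.
export function listChildren(paths: string[], dir: string): Entry[] {
  const prefix = dir === "" ? "" : dir.replace(/\/+$/, "") + "/";
  const children = new Map<string, Entry>();
  for (const path of paths) {
    if (!path.startsWith(prefix)) continue;
    const rest = path.slice(prefix.length);
    if (rest === "") continue;
    const slash = rest.indexOf("/");
    const name = slash === -1 ? rest : rest.slice(0, slash);
    // A further "/" means `name` is a sub-directory, not a file.
    const type: Entry["type"] = slash === -1 ? "file" : "dir";
    children.set(name, { name, type });
  }
  // Directories first, then files, each group sorted by name.
  return Array.from(children.values()).sort((a, b) => {
    if (a.type !== b.type) return a.type === "dir" ? -1 : 1;
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
}

const files = ["package.json", "src/index.ts", "src/lib/util.ts", "README.md"];
console.log(listChildren(files, ""));    // src/ plus the two root files
console.log(listChildren(files, "src")); // lib/ and index.ts
```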
Step 5 - Let the user edit files
Edits travel from the browser over the websocket to the Execute service, which writes them to its local copy and saves them to S3.

Callouts
1. Debounce these saves
2. You can mount a directory to S3 as well, although you need to make sure node_modules doesn't reach S3
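A minimal sketch of debounced saves, assuming one pending save per file path. Rapid keystrokes keep restarting that file's timer, so only the last content is uploaded, and `flush` saves everything immediately (for example on disconnect). `createDebouncedSaver` and the injectable `Scheduler` are hypothetical. The scheduler is injected so the example can run with a fake clock; in the service you would use `realScheduler` and have `save` upload to S3.

```typescript
// A cancellable timer, so the saver works with real timers or a fake clock.
export interface Timer {
  cancel(): void;
}
export type Scheduler = (fn: () => void, ms: number) => Timer;

export const realScheduler: Scheduler = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return { cancel: () => clearTimeout(handle) };
};

// Coalesce rapid edits: only the last content written to a path within
// `delayMs` is saved.
export function createDebouncedSaver(
  save: (path: string, content: string) => void,
  delayMs: number,
  schedule: Scheduler = realScheduler,
) {
  const pending = new Map<string, { content: string; timer: Timer }>();

  function write(path: string, content: string): void {
    pending.get(path)?.timer.cancel(); // restart the window for this file
    const timer = schedule(() => {
      pending.delete(path);
      save(path, content);
    }, delayMs);
    pending.set(path, { content, timer });
  }

  // Save everything right away, e.g. when the user disconnects.
  function flush(): void {
    pending.forEach(({ content, timer }, path) => {
      timer.cancel();
      save(path, content);
    });
    pending.clear();
  }

  return { write, flush };
}

// Usage with a fake scheduler: timers only fire when we drain the queue.
const queued: Array<() => void> = [];
const fakeScheduler: Scheduler = (fn) => {
  let cancelled = false;
  queued.push(() => {
    if (!cancelled) fn();
  });
  return { cancel: () => { cancelled = true; } };
};
const saved: string[] = [];
const saver = createDebouncedSaver((p, c) => saved.push(`${p}=${c}`), 500, fakeScheduler);
saver.write("index.js", "a");
saver.write("index.js", "ab");
saver.write("index.js", "abc");
queued.splice(0).forEach((run) => run()); // let every timer fire once
console.log(saved); // ["index.js=abc"]
```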
The logic to add and delete files stays the same (browser → websocket → Execute → S3).
Validation of files (file format, size) is something you should take into consideration.
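A sketch of the kind of validation the slide hints at, run before a file reaches disk or S3. The limits, the blocked folders and the "NUL byte means binary" format check are all assumptions chosen for illustration, and `validateUserFile` is a hypothetical name. Blocking `node_modules` here also covers the callout about keeping it out of S3.

```typescript
// Hypothetical limits; pick values that suit your product.
const MAX_FILE_BYTES = 1024 * 1024; // 1 MiB per file
const BLOCKED_DIRS = ["node_modules", ".git"]; // never sync these to S3

export type ValidationResult = { ok: true } | { ok: false; reason: string };

// Check a file the user wants to create or save before it reaches disk or S3.
export function validateUserFile(path: string, content: string): ValidationResult {
  // Only relative, "/"-separated paths inside the repl folder.
  if (path === "" || path.startsWith("/") || path.includes("\\")) {
    return { ok: false, reason: "path must be relative and use '/'" };
  }
  const segments = path.split("/");
  if (segments.some((s) => s === "" || s === "." || s === "..")) {
    return { ok: false, reason: "path must not contain empty, '.' or '..' segments" };
  }
  if (segments.some((s) => BLOCKED_DIRS.indexOf(s) !== -1)) {
    return { ok: false, reason: `files inside ${BLOCKED_DIRS.join(", ")} are not synced` };
  }
  // Size is measured in UTF-8 bytes, which is what ends up in S3.
  const bytes = new TextEncoder().encode(content).length;
  if (bytes > MAX_FILE_BYTES) {
    return { ok: false, reason: `file is ${bytes} bytes, limit is ${MAX_FILE_BYTES}` };
  }
  // A crude "format" check: NUL characters usually mean binary data.
  if (content.includes("\u0000")) {
    return { ok: false, reason: "binary files are not supported by the editor" };
  }
  return { ok: true };
}

console.log(validateUserFile("src/index.js", "console.log('hi')")); // { ok: true }
console.log(validateUserFile("../../etc/passwd", "x").ok);          // false
```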
Step 6 - Running/executing the code
The browser asks the Execute service to run npm run dev. The app starts listening on port 3000, and its logs are streamed back to the user over the websocket.
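One detail that matters when streaming logs: a child process's output arrives in arbitrary chunks, not whole lines. This is a minimal line splitter that buffers until a newline so each log line reaches the browser intact. In Node you would feed it from `spawn("npm", ["run", "dev"], { cwd })` (from `child_process`) by calling `child.stdout.setEncoding("utf8")` and then `push` inside `child.stdout.on("data", ...)`. `LineSplitter` itself is a hypothetical helper.

```typescript
// Buffers chunked output and emits one complete line at a time.
export class LineSplitter {
  private buffer = "";
  private readonly onLine: (line: string) => void;

  constructor(onLine: (line: string) => void) {
    this.onLine = onLine;
  }

  // Feed a chunk of stdout/stderr text.
  push(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      // Drop a trailing "\r" so Windows-style line endings look the same.
      this.onLine(this.buffer.slice(0, newline).replace(/\r$/, ""));
      this.buffer = this.buffer.slice(newline + 1);
      newline = this.buffer.indexOf("\n");
    }
  }

  // Emit whatever is left when the process exits.
  end(): void {
    if (this.buffer !== "") this.onLine(this.buffer);
    this.buffer = "";
  }
}

const lines: string[] = [];
const splitter = new LineSplitter((line) => lines.push(line));
["starting\nServ", "er on port 30", "00\r\nready"].forEach((chunk) => splitter.push(chunk));
splitter.end();
console.log(lines); // ["starting", "Server on port 3000", "ready"]
```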
When the user disconnects, clean up resources:
1. Wait for a bit before removing the folder (the user might come back)
2. Flush to S3
3. Stop any lingering processes
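A sketch of the clean-up with a grace period. When a user's socket closes, the Execute service schedules the clean-up, and it cancels that if the user reconnects in time (for example after a page refresh). `SessionReaper` and the three `CleanupHooks` are hypothetical. The timer is injected so the example runs with a fake clock. Stopping processes before the flush is a design choice so nothing writes to the folder after it has been saved.

```typescript
export interface Timer {
  cancel(): void;
}
export type Scheduler = (fn: () => void, ms: number) => Timer;

// Hypothetical hooks into the Execute service for one repl's resources.
export interface CleanupHooks {
  stopProcesses(replId: string): void; // kill `npm run dev` etc.
  flushToS3(replId: string): void;     // upload any unsaved files
  removeFolder(replId: string): void;  // delete the repl's working directory
}

export class SessionReaper {
  private readonly timers = new Map<string, Timer>();
  private readonly hooks: CleanupHooks;
  private readonly graceMs: number;
  private readonly schedule: Scheduler;

  constructor(hooks: CleanupHooks, graceMs: number, schedule: Scheduler) {
    this.hooks = hooks;
    this.graceMs = graceMs;
    this.schedule = schedule;
  }

  // The user's socket closed: clean up only if they stay away for `graceMs`.
  onDisconnect(replId: string): void {
    this.timers.get(replId)?.cancel();
    const timer = this.schedule(() => {
      this.timers.delete(replId);
      // Stop processes first so nothing writes to the folder after the flush.
      this.hooks.stopProcesses(replId);
      this.hooks.flushToS3(replId);
      this.hooks.removeFolder(replId);
    }, this.graceMs);
    this.timers.set(replId, timer);
  }

  // The user came back (e.g. refreshed the tab): keep everything as it is.
  onReconnect(replId: string): void {
    this.timers.get(replId)?.cancel();
    this.timers.delete(replId);
  }
}

// Usage with a fake scheduler: timers only fire when we drain the queue.
const pendingTimers: Array<() => void> = [];
const manualScheduler: Scheduler = (fn) => {
  let cancelled = false;
  pendingTimers.push(() => {
    if (!cancelled) fn();
  });
  return { cancel: () => { cancelled = true; } };
};
const cleanupLog: string[] = [];
const reaper = new SessionReaper(
  {
    stopProcesses: (id) => cleanupLog.push(`stop ${id}`),
    flushToS3: (id) => cleanupLog.push(`flush ${id}`),
    removeFolder: (id) => cleanupLog.push(`remove ${id}`),
  },
  60_000,
  manualScheduler,
);
reaper.onDisconnect("repl-1"); // gone for good
reaper.onDisconnect("repl-2");
reaper.onReconnect("repl-2");  // came back within the grace period
pendingTimers.splice(0).forEach((run) => run());
console.log(cleanupLog); // ["stop repl-1", "flush repl-1", "remove repl-1"]
```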
Downsides
1. Remote code execution is insecure
2. Single server setup, doesn't autoscale
3. Port conflicts between two users (every user shares resources on the same server)
4. Terminal is very ad hoc / first principles
Code

Disclaimers
1. I'm using Node.js. Keep it simple
2. I'm using socket.io. Keep it simple
3. I'll be writing code in TS, but nothing too strict. Keep it simple
4. I will not be adding any extra fluff that's not needed for this tutorial (eslint, prettier)
5. No monorepos, so expect some code repetition

Should you create a Zig-based, well-linted, 100% tested system with CI/CD?
Maybe.

But we're limiting the scope of this tutorial.
Introducing PTY

If you want a terminal inside the browser (or want to create your own terminal, for example), you can create a `pseudo terminal` on your server that your browser talks to.

xterm.js is a library that renders a terminal in the browser and lets you forward keystrokes to a `pseudo terminal` that you spawn on your server.

https://github.com/xtermjs/xterm.js

https://github.com/replit/ruspty
Old approach: the Execute service runs exec(npm run dev) and streams the logs back.

New approach: the Execute service starts a pseudo terminal (PTY), and input and output are streamed both ways between the browser and the PTY.
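A sketch of the server side of the PTY bridge, assuming a hypothetical JSON protocol where the browser sends `input` messages (keystrokes from xterm.js's `term.onData`) and `resize` messages. The server validates each message and forwards it to the pseudo terminal. `PtyLike` is the small subset of a PTY the code needs. The node-pty package's `IPty` exposes `write(data)` and `resize(columns, rows)` with this shape, and output would flow back through its `onData` callback. The message shapes and function names are assumptions, not replit's protocol.

```typescript
// Messages the browser sends to the server (a hypothetical protocol).
export type ClientMessage =
  | { type: "input"; data: string }
  | { type: "resize"; cols: number; rows: number };

// The part of a pseudo terminal the server needs.
export interface PtyLike {
  write(data: string): void;
  resize(cols: number, rows: number): void;
}

function isPositiveInt(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

// Parse untrusted JSON from the socket; return null for anything unexpected.
export function parseClientMessage(raw: string): ClientMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const { type, data, cols, rows } = parsed as Record<string, unknown>;
  if (type === "input" && typeof data === "string") {
    return { type: "input", data };
  }
  if (type === "resize" && isPositiveInt(cols) && isPositiveInt(rows)) {
    return { type: "resize", cols, rows };
  }
  return null;
}

// Forward one socket message to the PTY. Returns false if it was rejected.
export function handleClientMessage(pty: PtyLike, raw: string): boolean {
  const msg = parseClientMessage(raw);
  if (msg === null) return false;
  if (msg.type === "input") {
    pty.write(msg.data); // keystrokes go to the shell unchanged
  } else {
    pty.resize(msg.cols, msg.rows);
  }
  return true;
}

// Usage with a fake PTY that just records what it receives.
const received: string[] = [];
const fakePty: PtyLike = {
  write: (data) => { received.push("write:" + data); },
  resize: (cols, rows) => { received.push(`resize:${cols}x${rows}`); },
};
handleClientMessage(fakePty, JSON.stringify({ type: "input", data: "ls\r" }));
handleClientMessage(fakePty, '{"type":"resize","cols":120,"rows":30}');
handleClientMessage(fakePty, "rm -rf /"); // not JSON, ignored
console.log(received);
```

Keeping the PTY behind an interface keeps the transport (socket.io or plain ws) and the PTY library (node-pty, ruspty) swappable.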
Part 2 | The good solution

What were the biggest problems in approach #1?
1. Remote code execution isn't safe
2. It doesn't autoscale

There are two approaches you can take:
1. Cloud specific autoscaling: AWS Auto Scaling Groups, ECS
2. Container orchestration: Kubernetes
Cloud specific autoscaling

Every user gets their own server (2 CPU, 10 GB in the diagram): Browser 1 → Server 1, Browser 2 → Server 2, Browser 3 → Server 3. When Browser 4 arrives, Server 4 is started for it.

Upsides
1. Easy to do
2. Provides you a way to securely run code
3. Autoscales
4. No port conflicts

Downsides
1. Boot-up time (not a huge problem)
2. Over-provisioned servers
3. Not cloud agnostic
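A small sketch of the sizing logic behind "one server per user". It computes the group size to request from active users plus a few idle spares that hide boot-up time, clamped to the group's minimum and maximum. The `warmSpare` knob and the function name are assumptions. With AWS you would push the result through the EC2 Auto Scaling SetDesiredCapacity API, or let a scaling policy do an equivalent calculation.

```typescript
// One dedicated server per active user (2 CPU / 10 GB in the slides), plus
// `warmSpare` booted-but-idle servers, clamped like an ASG's MinSize/MaxSize.
export function desiredServerCount(
  activeUsers: number,
  warmSpare: number,
  minServers: number,
  maxServers: number,
): number {
  if (activeUsers < 0 || warmSpare < 0 || minServers > maxServers) {
    throw new Error("invalid capacity inputs");
  }
  const wanted = activeUsers + warmSpare;
  return Math.min(maxServers, Math.max(minServers, wanted));
}

console.log(desiredServerCount(4, 1, 1, 10));  // 5
console.log(desiredServerCount(0, 0, 1, 10));  // 1 (never below the minimum)
console.log(desiredServerCount(50, 2, 1, 20)); // 20 (capped at the maximum)
```

The cap also illustrates the over-provisioning downside: every user holds a full server even while idle, so the maximum is reached quickly.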
Kubernetes

Before we get into why this approach is good, let's understand what
1. Containers are
2. k8s is
Containers

On a single machine (a Mac in the diagram) you can run several containers side by side. Each container gets:
1. Its own filesystem (the JS code, Golang code, React code and Rust code each live in their own container)
2. Its own network (the apps listen on ports 3000, 3001, 3002 and 3004 in the diagram)
Container Orchestration - Kubernetes

What if you want to run multiple such containers, and autoscale your servers?

What if you could have a cluster of machines (Mac, Linux, Windows and Ubuntu in the diagram) and describe in a single file how to run multiple containers (for example a few React apps and a Node app) across them?

Not just a file: you could also add and remove containers by doing a function call.
Some Kubernetes jargon
1. Nodes: the machines in the cluster
2. Pods: the smallest unit Kubernetes runs, one or more containers scheduled together on a node
3. Services: a stable address in front of one or more pods
4. Ingress: rules for routing HTTP traffic from outside the cluster to services
Services and ingress control how your application is exposed.

How can people access your services running inside pods? How does a user who has created a repl reach the container where their code is present?

Services let you expose your pods:
1. You can either expose them on the internet
2. Or you can expose them only inside the cluster, so other pods can reach them
3. Each pod can have an associated service (in the diagram, the React pod sits behind a Service at 44.2.11.3 and the Node pod behind one at 131.44.11.22)

A service can also load balance across pods (one Service at 44.2.11.3 in front of several Node pods).
Let's go through the replit runner and see how they expose services. What do you think they do?

Approach #1: one Service per pod, each with its own IP (44.2.11.3 → Node, 1.33.14.1 → React)
Approach #2: one Service at 44.2.11.3 in front of both the Node and the React pods

You might feel like they do this (a single Service in front of the pods), and they do.
But services don't let you do host-based routing. They will load balance, but they won't let you control where the request should go based on the host URL.
How can we route by host, so that pod1.repl.it reaches the Node pod's Service and pod2.repl.it reaches the React pod's Service?

Ingress Controller and Ingress

An ingress controller sits in front of the Services. For each repl you create an Ingress that tells the controller which host goes to which Service:
pod1.repl.it → Ingress → Service → Node pod
pod2.repl.it → Ingress → Service → React pod

The pods can exist on different nodes as well. As long as they are in the same cluster, the ingress controller should be able to route traffic to them.
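A toy model of the routing decision an ingress controller makes, to show why host-based routing solves the problem: every Ingress contributes a host → Service rule, and each request is matched on its Host header. This is not how ingress-nginx or any real controller is implemented; `HostRule` and `routeByHost` are hypothetical names, and the Service names follow a made-up `repl-<id>` scheme.

```typescript
export interface HostRule {
  host: string;        // e.g. "pod1.repl.it"
  serviceName: string; // the Service in front of that repl's pod
  servicePort: number;
}

export function routeByHost(rules: HostRule[], hostHeader: string): HostRule | undefined {
  // Host headers may carry a port ("pod1.repl.it:443") and are case-insensitive.
  const host = hostHeader.split(":")[0].toLowerCase();
  for (const rule of rules) {
    if (rule.host.toLowerCase() === host) return rule;
  }
  return undefined; // a real controller falls back to its default backend
}

const rules: HostRule[] = [
  { host: "pod1.repl.it", serviceName: "repl-pod1", servicePort: 80 },
  { host: "pod2.repl.it", serviceName: "repl-pod2", servicePort: 80 },
];
console.log(routeByHost(rules, "POD2.repl.it:443")?.serviceName); // "repl-pod2"
console.log(routeByHost(rules, "pod3.repl.it"));                  // undefined
```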
Given all this information, can you guess the final architecture?
Step 1 - Start a k8s cluster and set some autoscaling policies on the nodes

Step 2 - Attach an ingress controller to the cluster (one time)

Step 3 - Point your DNS to the IP of the ingress controller, so that pod1.repl.it, pod2.repl.it and so on all resolve to it
Step 4 - As people start repls, start a pod, service and ingress for them
Each new repl (pod1.repl.it, pod2.repl.it) gets its own Pod, a Service in front of it and an Ingress rule for its host.
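A sketch of the three objects created per repl, built as plain objects in the shape of the Kubernetes `v1` Pod/Service and `networking.k8s.io/v1` Ingress APIs. The naming scheme (`repl-<id>`, `<id>.repl.it`), the image, the `nginx` ingress class and the CPU/memory numbers are assumptions. Only the user's app port (3000) is exposed here; the runner's own websocket server would need a port of its own. `kubectl apply -f` accepts JSON as well as YAML, and the official `@kubernetes/client-node` package has calls such as `createNamespacedPod` and `createNamespacedIngress` for doing this from the orchestrator (check their signatures for your client version).

```typescript
// Kubernetes resource names must be DNS-1123 labels.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

export function replResources(replId: string, image: string) {
  const name = `repl-${replId}`;
  if (!DNS_LABEL.test(replId) || name.length > 63) {
    throw new Error(`repl id must be a lowercase DNS label: ${replId}`);
  }
  const labels = { app: name };

  const pod = {
    apiVersion: "v1",
    kind: "Pod",
    metadata: { name, labels },
    spec: {
      containers: [
        {
          name: "runner",
          image, // hypothetical runner image (ws server + language toolchains)
          ports: [{ containerPort: 3000 }],
          // Limits keep one repl from starving the node (e.g. 2 CPUs).
          resources: {
            requests: { cpu: "1", memory: "1Gi" },
            limits: { cpu: "2", memory: "2Gi" },
          },
        },
      ],
    },
  };

  const service = {
    apiVersion: "v1",
    kind: "Service",
    metadata: { name },
    spec: {
      selector: labels, // selects the pod above by its label
      ports: [{ port: 80, targetPort: 3000 }],
    },
  };

  const ingress = {
    apiVersion: "networking.k8s.io/v1",
    kind: "Ingress",
    metadata: { name },
    spec: {
      ingressClassName: "nginx", // assumption: ingress-nginx is installed
      rules: [
        {
          host: `${replId}.repl.it`,
          http: {
            paths: [
              {
                path: "/",
                pathType: "Prefix",
                backend: { service: { name, port: { number: 80 } } },
              },
            ],
          },
        },
      ],
    },
  };

  return { pod, service, ingress };
}

console.log(JSON.stringify(replResources("pod1", "example/runner:latest").ingress, null, 2));
```

Deleting the same three names when a repl is closed gives you the next step, stopping the pod, service and ingress.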
Step 5 - As people leave repls, stop the respective pod, service and ingress
Step 6 - The cluster will autoscale based on the policies you added
You don't NEED Kubernetes for what we're doing today.

Kubernetes gives you a bunch of things that we don't need today (for example deployments, automatic restarts, recycling containers …).
Part 2 | The good solution
Some more jargon

Basic jargon
1. Containers (Docker)
2. Container orchestration (Kubernetes)

Advanced jargon (not needed for this tutorial)
1. Reproducible builds - Nix
2. Network volumes
3. Caching dependencies
My proposed solution

We have 3 services and a k8s cluster:
1. Simple HTTP API
2. Runner ws server
3. Orchestrator (http or ws)

The cluster in the diagram has four nodes, each with 32 CPUs and 100 GB.
1. Simple HTTP API
Step 1 - Initialising the repl: copy over the base image to s3://images/{id}

2. Runner service
This is the same as the websocket service we built in the last section: it runs the user's code (npm run dev on port 3000) and talks to the browser over a ws connection.
What happens after the user starts the repl?
We need to start an independent runner for them
While it starts, the user sees the loading screen
3. Orchestrator
Step 3 - Tell the orchestrator to start a pod
1. The browser calls the orchestrator (over http or ws) and asks it to start a runner pod
2. The orchestrator starts the runner and tells it to pull the code from S3
3. The orchestrator returns a runner_addr and a token to the browser
4. The browser opens a websocket to the runner through the ingress

Callout
1. Caching is super helpful here
2. You can maintain a warm pool of pods that you can auto-assign immediately
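A sketch of the orchestrator's core bookkeeping, including the warm-pool callout: hand a repl an already-booted pod when one is available, otherwise start one, tell it which repl to load, and return the `runner_addr` and token the browser needs. The hooks (`startPod`, `loadRepl`, `newToken`), the `<pod>.repl.it` address scheme and the class itself are hypothetical. In production `newToken` should return an unguessable random value, and the runner should reject sockets without it.

```typescript
export interface RunnerAssignment {
  replId: string;
  runnerAddr: string;    // host the browser connects its websocket to, via the ingress
  token: string;         // the browser presents this when it connects to the runner
  fromWarmPool: boolean;
}

// Hypothetical hooks into Kubernetes and the runner.
export interface OrchestratorHooks {
  startPod(): string;                              // create pod/service/ingress, return pod name
  loadRepl(podName: string, replId: string): void; // tell the runner to pull code from S3
  newToken(): string;
}

export class Orchestrator {
  private readonly warmPool: string[] = [];
  private readonly assignments = new Map<string, RunnerAssignment>();
  private readonly hooks: OrchestratorHooks;
  private readonly domain: string;

  constructor(hooks: OrchestratorHooks, domain = "repl.it") {
    this.hooks = hooks;
    this.domain = domain;
  }

  // Keep `size` idle pods booted so new repls skip the cold start.
  refillWarmPool(size: number): void {
    while (this.warmPool.length < size) this.warmPool.push(this.hooks.startPod());
  }

  // Called when the user opens a repl.
  assign(replId: string): RunnerAssignment {
    const existing = this.assignments.get(replId);
    if (existing) return existing; // e.g. a second browser tab
    const warm = this.warmPool.shift();
    const podName = warm ?? this.hooks.startPod();
    this.hooks.loadRepl(podName, replId);
    const assignment: RunnerAssignment = {
      replId,
      runnerAddr: `${podName}.${this.domain}`,
      token: this.hooks.newToken(),
      fromWarmPool: warm !== undefined,
    };
    this.assignments.set(replId, assignment);
    return assignment;
  }

  // Called when the runner for this repl goes away.
  release(replId: string): boolean {
    return this.assignments.delete(replId);
  }
}

// Usage with deterministic fakes instead of real pods and tokens.
let podCount = 0;
let tokenCount = 0;
const loaded: string[] = [];
const orchestrator = new Orchestrator({
  startPod: () => `pod${++podCount}`,
  loadRepl: (pod, repl) => { loaded.push(`${pod}:${repl}`); },
  newToken: () => `token-${++tokenCount}`,
});
orchestrator.refillWarmPool(1);
console.log(orchestrator.assign("repl-a")); // served by the warm pod1
console.log(orchestrator.assign("repl-b")); // pool is empty, pod2 is started on demand
```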
Step 4 - Let the user edit a file, and send the diff over the ws layer
The browser sends each change to the runner (at runner_addr, authenticated with the token) over the websocket, and the runner saves the result to S3.
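A sketch of applying those diffs on the runner. It assumes a hypothetical change shape modelled on Monaco editor's content-change events (`rangeOffset`, `rangeLength`, `text`). It also assumes that every offset in a batch refers to the file as it was before the batch. Applying changes from the highest offset down keeps the lower offsets valid, and the bounds check rejects out-of-range or overlapping changes so the client can fall back to sending the whole file.

```typescript
export interface TextChange {
  rangeOffset: number; // where the replaced range starts, in the old text
  rangeLength: number; // how many characters were replaced
  text: string;        // what replaced them
}

// Apply non-overlapping changes whose offsets refer to the original content.
export function applyChanges(content: string, changes: TextChange[]): string {
  const sorted = changes.slice().sort((a, b) => b.rangeOffset - a.rangeOffset);
  let result = content;
  let limit = content.length; // a change must end before the previous one starts
  for (const c of sorted) {
    const end = c.rangeOffset + c.rangeLength;
    if (c.rangeOffset < 0 || c.rangeLength < 0 || end > limit) {
      throw new Error("change out of range or overlapping; ask the client for a full resync");
    }
    result = result.slice(0, c.rangeOffset) + c.text + result.slice(end);
    limit = c.rangeOffset;
  }
  return result;
}

const before = "console.log('hi')\n";
const after = applyChanges(before, [
  { rangeOffset: 13, rangeLength: 2, text: "hello" },   // 'hi' -> 'hello'
  { rangeOffset: 0, rangeLength: 0, text: "// demo\n" }, // insert at the top
]);
console.log(after); // "// demo\nconsole.log('hello')\n"
```

Sending small diffs instead of whole files keeps the websocket traffic proportional to what the user typed. The runner can then reuse a debounced save to push the result to S3.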
Step 5 - Terminal access
1. The browser sends startSession over the websocket
2. The runner starts a pseudo terminal
3. Keystrokes/commands and their output are relayed between the browser and the PTY
Step 6 - Accessing a process
The user's process listens on port 3000 inside the runner pod, and the user reaches it through the ingress.
Step 7 - Destroying the pod
If the runner has had 0 ws connections alive for ~5 minutes, it can kill itself.
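A sketch of the idle check a runner could use to decide when to shut itself down. It counts open websocket connections and remembers when the count last dropped to zero. Timestamps are passed in, so the logic is easy to test. `IdleTracker` is a hypothetical helper. Starting the clock at construction means a pod whose user never connects is reclaimed too. Pods waiting in a warm pool would need to be exempt, and the pod's Service and Ingress still have to be removed after it exits.

```typescript
export class IdleTracker {
  private connections = 0;
  private idleSince: number | null;
  private readonly idleLimitMs: number;

  constructor(now: number, idleLimitMs = 5 * 60 * 1000) {
    this.idleSince = now; // nobody is connected yet
    this.idleLimitMs = idleLimitMs;
  }

  onConnect(): void {
    this.connections += 1;
    this.idleSince = null;
  }

  onDisconnect(now: number): void {
    this.connections = Math.max(0, this.connections - 1);
    if (this.connections === 0) this.idleSince = now;
  }

  shouldShutDown(now: number): boolean {
    return this.idleSince !== null && now - this.idleSince >= this.idleLimitMs;
  }
}

// In the runner you would call onConnect/onDisconnect from the websocket
// server's connection and close handlers, and poll shouldShutDown(Date.now())
// on an interval, exiting the process when it returns true.
const tracker = new IdleTracker(0);
tracker.onConnect();
tracker.onDisconnect(60_000);
console.log(tracker.shouldShutDown(60_000 + 4 * 60 * 1000)); // false, idle for 4 minutes
console.log(tracker.shouldShutDown(60_000 + 5 * 60 * 1000)); // true, idle for 5 minutes
```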
Why is this secure?
1. Remote code runs in a pod
2. The pod has restricted permissions
3. The pod has resource limits (only 2 CPUs, for example)
4. The pod has its own network (no port conflicts)
Why is this scalable?
Starting a new runner is as simple as starting a pod, and every runner can use port 3000 inside its own pod.
As CPU usage goes up and more pods get added, all you have to do is increase the number of nodes in your cluster.
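A back-of-the-envelope sketch of "add nodes as pods get added", using the deck's numbers (2 CPUs per runner, 32-CPU nodes). A pod can't be split across nodes, so each node fits a whole number of runners. Real autoscalers such as the Kubernetes Cluster Autoscaler add nodes when pods can't be scheduled based on their resource requests, and some CPU on each node goes to system pods; the hypothetical `reservedCpuPerNode` parameter stands in for that.

```typescript
export function nodesNeeded(
  runners: number,
  cpuPerRunner: number,
  cpuPerNode: number,
  reservedCpuPerNode = 0,
): number {
  const usable = cpuPerNode - reservedCpuPerNode;
  if (cpuPerRunner <= 0 || usable < cpuPerRunner) {
    throw new Error("a single runner does not fit on a node");
  }
  // Whole runners per node: a pod cannot span two nodes.
  const runnersPerNode = Math.floor(usable / cpuPerRunner);
  return Math.ceil(runners / runnersPerNode);
}

console.log(nodesNeeded(40, 2, 32));     // 16 runners per node -> 3 nodes
console.log(nodesNeeded(40, 2, 32, 2));  // 15 runners per node -> 3 nodes
console.log(nodesNeeded(100, 2, 32, 2)); // 15 runners per node -> 7 nodes
```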
A few callouts that might feel like good to haves
1. The user can inspect your ws codebase (not good), and they can stop this process as well. Fixing this might involve starting a parent process that takes inputs from the user and forwards them to the container through a socket.
2. The user can still reboot your machine (they can do the same on replit)
A few advanced things that repl.it does
1. Nix for reproducible builds
2. Network mounts that give users 250 GB of space on each repl
3. repl.it doesn't let the user connect directly to the pod; it most probably relays information via a relay ws server
