Building a Python web service with Ray
Philipp Moritz
September 30, 2020
What this talk is about
● Show design patterns for building Python web services with Ray using Ray tasks and actors
● Show how we are building Anyscale as a Ray application
● Show how to address practical challenges like type checking, testing, tracing, monitoring and deployment
Requirements for a web service
● Needs to be available 24/7
● Needs to be scalable according to user demand
● Needs to integrate external Python libraries and frameworks, e.g. web serving frameworks or machine learning libraries
Traditional Python web service architecture
[Diagram]
Web logic: Flask server / aiohttp server / fastAPI server
Business logic: Service 1, Service 2, Service 3, wired together with Redis, Celery, Redis Queue and Multiprocessing
Data: Database, Blob store
Challenges: Programming, scaling, monitoring, tracing, fault tolerance
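As a point of reference, this is roughly what the glue in the traditional diagram looks like: the web framework enqueues work into a separate task queue (here Celery backed by Redis). A generic sketch, not code from the talk; names and the endpoint are illustrative.

from celery import Celery
from flask import Flask

celery_app = Celery("tasks", broker="redis://localhost:6379/0")
flask_app = Flask(__name__)

@celery_app.task
def run_business_logic(payload: dict) -> dict:
    # Runs in a separate Celery worker process, not in the web server.
    return {"handled": payload}

@flask_app.route("/execute", methods=["POST"])
def execute():
    # Enqueue the work via Redis and return immediately.
    run_business_logic.delay({"command": "example"})
    return {"status": "queued"}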
Ray web service architecture
[Diagram]
Web logic: Flask server / aiohttp server / fastAPI server
Business logic: Service 1, Service 2, Service 3, each built out of Ray tasks and actors
Data: Database, Blob store
Advantages of Ray:
● Unified programming model
● Automatic scheduling, resource management
● Autoscaling
● Built-in facilities for monitoring
● Great support for ML
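For reference, a minimal sketch (not from the slides) of the two Ray primitives the diagram is built from: tasks for stateless work and actors for stateful services.

import ray

ray.init()

@ray.remote
def stateless_work(x: int) -> int:
    # A Ray task: scheduled automatically on any worker in the cluster.
    return x * 2

@ray.remote
class StatefulService:
    # A Ray actor: a long-lived process that owns its own state.
    def __init__(self) -> None:
        self.count = 0

    def increment(self) -> int:
        self.count += 1
        return self.count

print(ray.get(stateless_work.remote(21)))   # 42
service = StatefulService.remote()
print(ray.get(service.increment.remote()))  # 1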
Reminder: The Anyscale Platform
1. Laptop experience with the power of a cluster
2. Serverless experience without serverless limitations
3. Real-time collaboration
Architecture of Anyscale
[Diagram]
Web logic: fastAPI server
Business logic: Service 1, Service 2 (Ray tasks and actors)
Data: Database
Scaling up with Ray tasks
[Diagram]
Web logic: fastAPI server handling requests such as /api/v2/session/start and /api/v2/session/1/execute
Business logic: Sessions service (Ray tasks) and Websockets actor, talking to Session 1 and Session 2
Data: Database
Each request (e.g. /api/v2/session/1/execute, /api/v2/session/2/execute) spawns its own Ray task, so the service scales with user demand.
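A hedged sketch of this pattern with illustrative names (the real Anyscale handlers differ): each incoming request launches a Ray task, so the work fans out across the cluster instead of blocking the web server.

import ray
from fastapi import FastAPI

ray.init()
app = FastAPI()

@ray.remote
def execute_in_session(session_id: int, command: str) -> str:
    # Long-running business logic runs on a Ray worker, not in the web process.
    return f"session {session_id} ran: {command}"

@app.post("/api/v2/session/{session_id}/execute")
async def execute(session_id: int, command: str) -> dict:
    # .remote() returns immediately; the returned object reference is awaitable.
    result = await execute_in_session.remote(session_id, command)
    return {"result": result}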
Managing state with Ray actors
[Diagram]
Web logic: fastAPI server handling /api/v2/session/start
Business logic: Sessions actor and Notifications actor; a Ray task sends updates to both actors as the session changes state
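A sketch, with hypothetical names, of the stateful-actor pattern shown here: one Ray actor owns the session state in memory, and the web layer or other tasks mutate it through actor method calls.

import ray
from typing import Dict

ray.init()

@ray.remote
class SessionsActor:
    def __init__(self) -> None:
        # All session state lives inside this single actor process.
        self.sessions: Dict[int, str] = {}

    def start_session(self, session_id: int) -> None:
        self.sessions[session_id] = "STARTING"

    def update_status(self, session_id: int, status: str) -> None:
        self.sessions[session_id] = status

    def get_status(self, session_id: int) -> str:
        return self.sessions.get(session_id, "UNKNOWN")

sessions = SessionsActor.remote()
sessions.start_session.remote(1)
sessions.update_status.remote(1, "RUNNING")
print(ray.get(sessions.get_status.remote(1)))  # "RUNNING"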
Writing an API server with fastAPI
● Makes it easy to define a REST API
● Schema validation
● Typing
@router.get("/{command_id}/execution_logs")
async def get_execution_logs(
    command_id: int, ...) -> Response[LogOutput]:
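To make the bullet points concrete, here is a self-contained sketch of the same kind of route; the LogOutput schema below is a stand-in, not Anyscale's actual model. fastAPI validates the typed path parameter and the response schema automatically.

from typing import List
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class LogOutput(BaseModel):
    # Stand-in schema; the real LogOutput model has Anyscale-specific fields.
    lines: List[str]

@router.get("/{command_id}/execution_logs", response_model=LogOutput)
async def get_execution_logs(command_id: int) -> LogOutput:
    # fastAPI checks that command_id is an int and serializes LogOutput to JSON.
    return LogOutput(lines=[f"log line for command {command_id}"])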
Ray asyncio support
Ray object references are awaitable!

async def get_execution_logs(session_record, session_command_id, log_params):
    log = await session_tasks_service.get_execution_log.remote(
        session_record["id"],
        session_command_id,
        log_params,
    )

Ray actors can also be async!

@ray.remote
class WebSocketActor:
    def __init__(self) -> None:
        self.sio = socketio.AsyncServer()

    async def emit(self, message_name: str, data: Dict[str, Any]) -> None:
        await self.sio.emit(message_name, data)
Typing
[Diagram: Web logic (fastAPI server) and Business logic (Service 1, Service 2 built from Ray tasks and actors)]

Frontend (TypeScript):
executeCommand({
  sessionId,
  options: {
    command: input
  }
})

Web logic (Python):
async def execute_command(session_id: int, options: Options):
    command_record = db.create_command(session_id, options)
    execute_command.remote(command_record)

Business logic (Python):
@ray.remote
def execute_command(command: CommandRecord):
    runner = AnyscaleSessionRunner()
    runner.execute_command(command)
Testing
Unit testing: Use Ray local mode:
ray.init(local_mode=True)
Everything runs in a single process -> can mock out interfaces
Integration testing: Use a Ray instance running on the laptop/CI server,
testing web logic, business logic and database
End-to-end testing: Test full functionality in staging environment
Stress testing: Test scalability limits of the system
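A minimal sketch (not the talk's actual test suite) of the unit-testing setup described above, using pytest and Ray local mode so remote calls execute inline in a single process.

import pytest
import ray

@ray.remote
def add(a: int, b: int) -> int:
    return a + b

@pytest.fixture
def ray_local_mode():
    # Local mode runs tasks in-process, which makes interfaces easy to mock.
    ray.init(local_mode=True)
    yield
    ray.shutdown()

def test_add(ray_local_mode):
    assert ray.get(add.remote(1, 2)) == 3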
Metrics and Monitoring
Use Ray's built-in metrics API:

from ray.experimental import metrics

self.create_cluster_stats = metrics.Histogram(
    "Anyscale_create_cluster",
    "Number of seconds it took to create the cluster",
    "second",
    [float(i) for i in range(10, 300, 10)],
    ["step"],
)
Tracing
We use OpenTelemetry for tracing.
It can generate detailed traces automatically for a number of Python libraries, including database clients and web frameworks.

requirements.txt:
opentelemetry-api
opentelemetry-sdk
opentelemetry-ext-asgi
opentelemetry-ext-asyncpg
opentelemetry-ext-botocore
opentelemetry-instrumentation-starlette

Full automatic tracing for Ray tasks and actors
Can also add custom traces:
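For the custom traces mentioned above, a hedged sketch using the standard OpenTelemetry API (the function and span names here are illustrative, not the slide's exact snippet):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def create_cluster(config: dict) -> None:
    # Everything inside this block is recorded as one span in the trace.
    with tracer.start_as_current_span("create_cluster"):
        ...  # cluster-creation business logic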
Deployment
The cloud environment for our web service is set up with Terraform, which makes the setup easily reproducible across:
● Development
● Staging
● Production
The web service is deployed with Docker and Kubernetes, which integrate well with Ray.
Summary
We showed how the Python web serving ecosystem
integrates with Ray
We showed how Ray makes it easy to scale up your web
services and manage their state
We showed how to type, test, monitor and deploy your web
service with Ray
Thanks to the Team @ Anyscale