Module - Distributed LLM

The project aims to deploy a SaaS-based AI assistant for English language learning, requiring the setup of a multi-region infrastructure for a large language model (LLM). Key tasks include configuring AWS services such as VPC, EFS, security groups, and deploying both frontend and backend components using Next.js and JavaScript. The architecture must ensure security, scalability, and efficient access across regions while adhering to specific technical requirements and naming conventions.


Description of project and tasks

This module is allocated eight hours: Day 2, Project 1 - Distributed LLM.


The objective of this project is to deploy a SaaS-based AI assistant for English language learning,
including the necessary infrastructure for a multi-region large language model (LLM). Your role
involves setting up the infrastructure to support a dependable, secure, and cost-effective AI assistant.
You must ensure that the architecture is resilient and efficient, enabling users to easily access the AI
assistant from their chosen region. The goal is to create a stable, secure, and scalable learning
environment that is also economical.

Task
1. Read the documentation thoroughly (outlined below).
2. Please read and understand the application architecture in the Architecture section.
3. Please carefully read the Technical Details section.
4. Please carefully read the Application Details section.
5. Log in to the AWS console.
6. Set up the VPC configuration. The VPC configuration details are in the Network Architecture – Service Details section.
7. Set up storage with EFS. Read more details in Storage – Service Details.
8. Set up the security groups. You can read more details in Security – Service Details.
9. Set up the LLM Worker and Load Balancer instances. You can read more in LLM – Service Details.
10. Set up the relational database. Read more in Database – Service Details.
11. Set up API Gateway and Lambda for the backend. You can read more in API Gateway and Lambda – Service Details.
12. Set up the client application. You can read more in Client App – Service Details.
13. Conduct thorough testing of the entire infrastructure and ensure that everything operates as desired.
14. Configure the necessary application monitoring and metrics in CloudWatch.

Technical Details
1. The coverage regions available for this project are us-east-1 (N. Virginia) and us-west-2 (Oregon).
2. Use LabRole for all IAM role needs across every service; for EC2 instances, you can use LabInstanceProfile.
3. All the necessary source code is available in the GitHub repository at https://github.com/betuah/lks-llm.
4. Any service that requires security should not be exposed directly to 'anywhere'. Implement appropriate security measures based on the requirements, as this will improve your score.
5. The operating system allowed for this project is Ubuntu, with a minimum version of 20.04.
6. Any configuration that is covered by a playbook must not be applied manually via SSH.
7. Every service you create should follow the naming format with the prefix 'lks-', for example, 'lks-
vpc-zone-a', 'lks-api-gateway', and so on. Judges will only review services that use this naming
convention.
8. When running an LLM, you might experience slower responses because the largest instance type available is Large, whereas LLM computation performs better with GPUs or more vCPUs. This is not a problem; just ensure that the LLM operates correctly and remains accessible, even if it is slow.
9. Ensure that you label each AWS service you create, except for those that were automatically
generated. Paying attention to these details will contribute to earning more points.
10. Remember to always provide clear descriptions so that your work can be easily understood. This
might earn you additional points.
11. Before the project ends, review your work and delete all unnecessary services to avoid confusion
with the results of your work. This will help you avoid losing many points.
12. The programming language used in this project is JavaScript, running on Node.js version 18 or later.

Architecture

The architecture diagram above illustrates one possible architectural design for the English AI Assistant apps. It is not a final architecture that you are required to follow exactly; it shows the design built by the application development team to make it easier to comprehend how the application operates. Please read the Application Details section.

Application Details
This project involves deploying a SaaS application designed for an English AI Assistant. The application
aims to enhance English language learning through interactive dialogues with an AI. It will identify
language mistakes, suggest more accurate words, and provide usage tips during conversations. The
application is built using Next.js 14 and an open-source LLM model.

You will be responsible for deploying both the frontend and backend components of the application.
The frontend, or client application, should be deployed using AWS Amplify to ensure proper
functionality and seamless access to the backend endpoints. Additionally, you need to deploy the
essential LLM infrastructure, including LLM Workers, Scoring Workers, and Chat Workers, across the
N. Virginia and Oregon regions. These workers will serve as the core engines for LLM processing.
Furthermore, you must set up API endpoints for the LLM and create an endpoint for storing
conversation data, ensuring these endpoints are publicly accessible by client applications (front-end).

The architecture diagram is provided in the Architecture section. The source code can be accessed at
the following repository link: https://github.com/betuah/lks-llm.

Service Details
Client App
The client application for LLM uses Next.js version 14 and will connect to AWS Cognito for
authentication and to obtain an ID token for authorization with the backend API. Set up a Cognito User
Pool with the following specifications:

• Use email as the attribute for sign-in.
• Do not use temporary passwords in the password policy.
• Implement single authentication factor.
• Required attributes during sign-up are name and email.
• Allow the application to use refresh tokens and user passwords for authentication.

You must deploy the client app to AWS Amplify as the platform for this client application; all installation and environment setup requirements are detailed in the README.md of the client repository.
Note: When a user has already signed up, you may need to manually confirm their registration in Cognito.
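As a quick, hedged sanity check of the User Pool configuration (not part of the required deliverables), you can sign in a confirmed user with the AWS SDK for JavaScript v3 and inspect the ID token that the client app will later send to the backend API. The region, client ID, and credentials below are placeholders.

    // cognito-check.mjs - run with Node.js 18+ as an ES module (top-level await)
    import {
      CognitoIdentityProviderClient,
      InitiateAuthCommand,
    } from "@aws-sdk/client-cognito-identity-provider";

    const client = new CognitoIdentityProviderClient({ region: "us-east-1" });

    // USER_PASSWORD_AUTH corresponds to the 'user password' auth flow required above.
    const res = await client.send(new InitiateAuthCommand({
      AuthFlow: "USER_PASSWORD_AUTH",
      ClientId: "YOUR_APP_CLIENT_ID",              // placeholder app client ID
      AuthParameters: {
        USERNAME: "student@example.com",           // placeholder confirmed user
        PASSWORD: "YourPassword123!",              // placeholder password
      },
    }));

    console.log(res.AuthenticationResult?.IdToken);  // token later used in the Authorization header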

Network Architecture
In this project, you are required to create a multi-region network with two zones: lks-zone-a (172.32.0.0/23) and lks-zone-b (10.10.0.0/23). Each zone is represented by its own VPC. Here are the details:
• Zone A is located in the us-east-1 region, while Zone B is located in the us-west-2 region.
• Ensure VPC A can connect to VPC B and vice versa (a peering sketch follows this list).
• Each Zone should have 3 Subnets: 1 Public Subnet and 2 Private Subnets.
• For the Public Subnet, use the first network in the range, sized to provide up to 200 hosts.
• Use the second and third networks in the range as the Private Subnets.
• Each Public Subnet should be in Availability Zone 1a, and Private Subnets should be in Availability Zones 1a and 1b.
• Use only two route tables per VPC to manage the private (lks-rtb-private) and public (lks-rtb-public) subnets.
• Ensure each private subnet can access the internet. To minimize costs, consider using a NAT instance rather than a NAT Gateway. You can read how to configure an instance as a NAT in the NAT Instance Details below.
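As a hedged sketch of how the cross-region VPC connectivity could be wired up with the AWS SDK for JavaScript v3 (the console or CLI works just as well), the snippet below requests a peering connection from Zone A to Zone B, accepts it in us-west-2, and adds a route on each side. All resource IDs are placeholders; the cross-region request may need a short wait before it can be accepted.

    // peer-vpcs.mjs - Node.js 18+ (ESM)
    import {
      EC2Client,
      CreateVpcPeeringConnectionCommand,
      AcceptVpcPeeringConnectionCommand,
      CreateRouteCommand,
    } from "@aws-sdk/client-ec2";

    const east = new EC2Client({ region: "us-east-1" });
    const west = new EC2Client({ region: "us-west-2" });

    // Request peering from Zone A (us-east-1) to Zone B (us-west-2).
    const { VpcPeeringConnection } = await east.send(new CreateVpcPeeringConnectionCommand({
      VpcId: "vpc-aaaa",               // placeholder: lks-zone-a VPC
      PeerVpcId: "vpc-bbbb",           // placeholder: lks-zone-b VPC
      PeerRegion: "us-west-2",
    }));
    const pcxId = VpcPeeringConnection.VpcPeeringConnectionId;

    // Accept the request on the Zone B side (it may take a moment to propagate cross-region).
    await west.send(new AcceptVpcPeeringConnectionCommand({ VpcPeeringConnectionId: pcxId }));

    // Route each VPC's traffic for the other zone's CIDR through the peering connection.
    await east.send(new CreateRouteCommand({
      RouteTableId: "rtb-aaaa",        // placeholder: a Zone A route table (repeat for both tables)
      DestinationCidrBlock: "10.10.0.0/23",
      VpcPeeringConnectionId: pcxId,
    }));
    await west.send(new CreateRouteCommand({
      RouteTableId: "rtb-bbbb",        // placeholder: a Zone B route table
      DestinationCidrBlock: "172.32.0.0/23",
      VpcPeeringConnectionId: pcxId,
    }));
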
NAT Instance
You can use an Ubuntu instance of type `t2.micro` as the NAT Instance; here are the configuration details:
1. Enable IP forwarding by editing the `sysctl.conf` file and adding or uncommenting the following line:
net.ipv4.ip_forward=1
2. Apply the changes:
sudo sysctl -p
3. Install iptables-persistent to ensure your NAT rules persist after reboots:
sudo apt install iptables-persistent -y
4. Set up the NAT rules using `iptables`:
iptables -t nat -A POSTROUTING -o [interface] -j MASQUERADE
iptables -F FORWARD
Notes: All EC2 instances and network interfaces have the source/destination check enabled by default. This means that instances can only send or receive traffic for their own IP address and do not support transitive routing.
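Because of the note above, the NAT Instance itself must have its source/destination check disabled before it can forward traffic for the private subnets. Below is a minimal sketch using the AWS SDK for JavaScript v3; the instance ID is a placeholder, and the same change can be made from the console or CLI.

    // disable-src-dst-check.mjs - Node.js 18+ (ESM)
    import { EC2Client, ModifyInstanceAttributeCommand } from "@aws-sdk/client-ec2";

    const ec2 = new EC2Client({ region: "us-east-1" });

    // Turn off the source/destination check so the instance can forward traffic it does not own.
    await ec2.send(new ModifyInstanceAttributeCommand({
      InstanceId: "i-0123456789abcdef0",       // placeholder: your lks NAT instance ID
      SourceDestCheck: { Value: false },
    }));
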

Storage
You will utilize Elastic File System (EFS) as shared storage for storing LLM models. You are required to
create and deploy the EFS within a VPC in the us-east-1 region. Configure the performance settings to
enable bursting, and make sure automatic backups are enabled. Mount the EFS on each LLM Worker,
setting the mount point to /share. Remember, you are not permitted to create EFS in any other region;
you are only allowed to create it in the specified region.
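A hedged sketch of creating the file system with bursting throughput and automatic backups via the AWS SDK for JavaScript v3 follows; the subnet ID, security group ID, and Name tag value are placeholder assumptions, and the console works just as well. Mounting the file system at /share on each worker is an OS-level step (for example via /etc/fstab or the provisioning playbook).

    // create-efs.mjs - Node.js 18+ (ESM)
    import { EFSClient, CreateFileSystemCommand, CreateMountTargetCommand } from "@aws-sdk/client-efs";

    const efs = new EFSClient({ region: "us-east-1" });       // EFS must live in us-east-1 only

    const fileSystem = await efs.send(new CreateFileSystemCommand({
      PerformanceMode: "generalPurpose",
      ThroughputMode: "bursting",                             // bursting performance, as required
      Backup: true,                                           // enable automatic backups
      Tags: [{ Key: "Name", Value: "lks-efs" }],              // assumed name following the lks- prefix
    }));

    // One mount target per private subnet that hosts LLM Workers (IDs are placeholders).
    await efs.send(new CreateMountTargetCommand({
      FileSystemId: fileSystem.FileSystemId,
      SubnetId: "subnet-aaaa",                                // placeholder private subnet
      SecurityGroups: ["sg-efs-placeholder"],                 // placeholder ID of lks-sg-efs
    }));
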

Security
Security is a crucial part of this project. Do not expose any service requiring a security group to
'anywhere.' Here are the security details to observe:

• Ensure the database is accessible only from Lambda functions (lks-sg-db).
• Ensure LLM Worker instances are accessible only on port 11434 from the Load Balancer (lks-sg-llm); a rule sketch for this case appears at the end of this section.
• Ensure the Load Balancer is accessible only through port 80 (lks-sg-lb).
• Ensure the NAT Instance can only perform traffic forwarding; remember not to allow all traffic from the internet (lks-sg-nat).
• Ensure EFS is accessible only from the private subnets or LLM security group of each VPC zone (lks-
sg-efs).
Remember that granting excessive access will create security vulnerabilities in your architecture.
There should be no more than 5 security groups in the us-east-1 region and 2 security groups in the
us-west-2 region.
Notes: Make sure to clean up any unused security groups in each region
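As one hedged example of expressing these rules (the lks-sg-llm case), an ingress rule can reference another security group as its source instead of a CIDR, which keeps port 11434 reachable only from the load balancer and avoids any 'anywhere' rule. The group IDs below are placeholders.

    // allow-lb-to-llm.mjs - Node.js 18+ (ESM)
    import { EC2Client, AuthorizeSecurityGroupIngressCommand } from "@aws-sdk/client-ec2";

    const ec2 = new EC2Client({ region: "us-east-1" });

    // lks-sg-llm: allow TCP 11434 only from members of lks-sg-lb.
    await ec2.send(new AuthorizeSecurityGroupIngressCommand({
      GroupId: "sg-llm-placeholder",               // placeholder ID of lks-sg-llm
      IpPermissions: [{
        IpProtocol: "tcp",
        FromPort: 11434,
        ToPort: 11434,
        UserIdGroupPairs: [{ GroupId: "sg-lb-placeholder" }],   // placeholder ID of lks-sg-lb
      }],
    }));
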
LLM (Large Language Model)
There will be three types of workers: LLM Workers, Scoring Workers, and Chat Workers.
• LLM Workers: These are responsible for embedding, pulling models, retrieving the list of models, and other LLM tasks.
• Scoring Workers: These are dedicated to generating scoring feedback for conversations within the
application.
• Chat Workers: These are dedicated to handling streaming chat conversations.
You will deploy these workers in each VPC Zone within the private subnets, ensuring that the
computation for LLM, scoring, and chat handling is separated within each VPC Zone. Use the t2.large
instance type for each Worker instance.
Configure each worker in each VPC Zone using the Ansible playbook provided in the source repository
for configuration provisioning. Here are the details:
• Each worker will use the same model, which will be stored on EFS. Update the EFS Vars in the
playbook and enter the address of your EFS.
• Execute the configuration playbook with State Manager in AWS Systems Manager (SSM).
• Specify the targets only based on tags in State Manager (a tag-targeted association sketch follows this list).
• You may need to install Ansible as an agent on each worker.
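A hedged sketch of creating a tag-targeted State Manager association with the AWS SDK for JavaScript v3; the association name and the tag key/value are assumptions, and the document parameters are deliberately omitted. Fill in the AWS-ApplyAnsiblePlaybooks parameters (playbook source, playbook file, and so on) from the document schema in SSM and the repository instructions.

    // create-association.mjs - Node.js 18+ (ESM)
    import { SSMClient, CreateAssociationCommand } from "@aws-sdk/client-ssm";

    const ssm = new SSMClient({ region: "us-east-1" });

    await ssm.send(new CreateAssociationCommand({
      Name: "AWS-ApplyAnsiblePlaybooks",                   // AWS-managed document for Ansible playbooks
      AssociationName: "lks-llm-worker-config",            // assumed name using the lks- prefix
      Targets: [
        { Key: "tag:Role", Values: ["lks-llm-worker"] },   // assumed tag; target only by tags
      ],
      // Parameters: { ... }  // supply the document's parameters (playbook source, playbook file, ...)
    }));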
After completing the configuration successfully, verify it by accessing the endpoint of one of the
workers on port 11434. For example, use the following command:
curl -i http://WORKER_IP:11434/api/tags
If the response is HTTP 200, the Worker is functioning correctly. You can pull the model using:
curl http://WORKER_IP:11434/api/pull -d '{"name": "orca-mini"}'
If the download is successful, check the downloaded model by accessing:
http://WORKER_IP:11434/api/tags
Do this for each worker. If all workers have the same model, it indicates that your configuration is
correct, and the storage was mounted successfully. You need to configure a private application load
balancer listening on port 80 and route the endpoints according to the following rules:
• All `/api` endpoints will be routed to the LLM Workers except `/api/generate` and `/api/chat`.
• The `/api/generate` endpoint will be redirected to the Scoring Worker.
• The `/api/chat` endpoint will be redirected to the Chat Worker.
• The root path must return the message 'its worked!' with an HTTP response status of 200 for health checks.
Ensure that you have backup instances in different Availability Zones for each worker.
Notes:
- You can read the LLM API Endpoint detail in the GitHub Repository.
- Make sure the orca-mini, llama3, and nomic-embed-text models have been downloaded; see the repository docs for how to pull the models.
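As a hedged verification sketch using Node.js 18's built-in fetch (LB_DNS is a placeholder for the private load balancer's DNS name; run it from an instance inside the VPC, since the load balancer is private, and check the repository docs for the exact response shape):

    // check-lb.mjs - Node.js 18+ (ESM)
    const base = "http://LB_DNS";                            // placeholder: private ALB DNS name

    // Health check: the root path should return HTTP 200 with the body 'its worked!'.
    const health = await fetch(`${base}/`);
    console.log(health.status, await health.text());

    // /api/tags should be routed to an LLM Worker and list the pulled models.
    const tags = await fetch(`${base}/api/tags`);
    const body = await tags.json();
    console.log(tags.status, body.models?.map((m) => m.name));
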

Relational Database
In this project, the client application will store all conversation histories in a relational database and
use them as vectors. You may use PostgreSQL as the database with the pgvector extension to store history data along with its vector representations. The Lambda function may handle enabling the pgvector extension for you. These conversation vectors will be used for real-time feedback evaluation of the ongoing conversations. The vectors will be updated as conversations grow. Configure the database to be as secure as possible.
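A hedged sketch of enabling the extension from Node.js 18 with the `pg` client (connection values follow the same DB_* environment variables described in the Lambda Function section below; the repository's Lambda code may already do this for you):

    // enable-pgvector.mjs - Node.js 18+ (ESM); requires: npm install pg
    import pg from "pg";

    const client = new pg.Client({
      host: process.env.DB_HOST,
      port: Number(process.env.DB_PORT),
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: process.env.DB_NAME,
    });

    await client.connect();
    await client.query("CREATE EXTENSION IF NOT EXISTS vector;");   // pgvector extension
    await client.end();
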
Lambda Function
In this project, a Lambda function is required to handle requests from the API Gateway related to the
/conversations endpoint. You can either use a single Lambda function or separate them based on the
method. Although you might need to refactor the existing code, your decision will impact cost-effectiveness and performance, and it will definitely affect your score. The code for the necessary Lambda functions can be found in the source code repository (refer to the Technical Details) under the path `/serverless/src/function`. The Lambda function will process API requests for
the /conversations endpoint, performing CRUD (Create, Read, Update, Delete) operations on
conversation data stored in the database. The Lambda function will need to interact with a database
to store and retrieve conversation data. To facilitate this, the function will use the following environment variables for database configuration (a handler sketch follows this list):
• DB_USER: The username used to authenticate and access the database. This should be a valid user
account with the necessary permissions to interact with the database.
• DB_PASSWORD: The password associated with the DB_USER. This is used in conjunction with the
username to securely authenticate and access the database.
• DB_HOST: The hostname or IP address of the database server. This specifies the location where
the database is hosted and must be reachable from the Lambda function.
• DB_PORT: The port number on which the database server is listening. This allows the Lambda
function to connect to the correct port on the database server.
• DB_NAME: The name of the database to which the Lambda function will connect. This specifies
which database within the server the function will interact with.
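A minimal, hedged sketch of what one of these handlers could look like for GET /conversations/(uid); the table and column names here are illustrative assumptions, and the real handlers and schema live in `/serverless/src/function` in the repository.

    // getConversations.mjs - Node.js 18 Lambda handler sketch; requires the 'pg' package in the bundle
    import pg from "pg";

    export const handler = async (event) => {
      const client = new pg.Client({
        host: process.env.DB_HOST,
        port: Number(process.env.DB_PORT),
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        database: process.env.DB_NAME,
      });
      await client.connect();
      try {
        const uid = event.pathParameters?.uid;                 // path parameter from API Gateway
        const { rows } = await client.query(
          "SELECT id, title, conversation FROM conversations WHERE uid = $1",  // assumed schema
          [uid]
        );
        return { statusCode: 200, body: JSON.stringify(rows) };
      } finally {
        await client.end();
      }
    };
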

API Gateway
API Gateway is a crucial component in this project. You should use a public API Gateway that will be
accessible by client apps from any region. In this project, you are advised to create a REST API Gateway
and only allow users who are registered with Cognito to access all endpoints on the API Gateway. The
client application will use the Authorization header as credentials. Here are the API endpoint
requirements needed for the client LLM application:
Endpoint                     Method   Path Parameters   Body Payload (JSON Format)
/conversations/(uid)         GET      uid               None
/conversations/(uid)/(id)    GET      uid, id           None
/conversations/(uid)         POST     uid               id (required), title (required), conversation (required), embedding (not required)
/conversations/(uid)/(id)    PUT      uid, id           title (not required), conversation (not required), embedding (not required)
/conversations/(uid)         DELETE   uid               None
/conversations/(uid)/(id)    DELETE   uid, id           None

For the LLM endpoint, you are required to create two endpoints: `/us-east-1` and `/us-west-2`,
each targeting the LLM Load Balancer in the respective region. Note that the client application will
only access the LLM endpoint without the `/api` prefix (Refer to the LLM Section Detail for endpoint
access). For example, to access the LLM endpoint in the `us-east-1` region, the endpoint would be
`https://api_gateway_endpoint_url/us-east-1/tags` rather than `https://api_gateway_endpoint_url/us-east-1/api/tags`, and the
allowed methods are POST and GET. You may need VPC Link with NLB (Network Load Balancer) to
connect the API Gateway and LLM Load Balancer.

Endpoint       Method   Target
/us-east-1/*   GET      LLM LB Region N. Virginia
/us-east-1/*   POST     LLM LB Region N. Virginia
/us-west-2/*   GET      LLM LB Region Oregon
/us-west-2/*   POST     LLM LB Region Oregon
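A hedged example of how the client application might call these endpoints with the Cognito ID token in the Authorization header; the API URL, uid, and token are placeholders, and whether a 'Bearer ' prefix is needed depends on how you configure the authorizer.

    // call-api.mjs - Node.js 18+ (ESM)
    const apiUrl = "https://API_GATEWAY_URL";                 // placeholder invoke URL
    const idToken = process.env.ID_TOKEN;                     // ID token obtained from Cognito sign-in

    // List a user's conversations (USER_UID is a placeholder).
    const conversations = await fetch(`${apiUrl}/conversations/USER_UID`, {
      headers: { Authorization: idToken },
    });
    console.log(conversations.status, await conversations.json());

    // List the models served via the us-east-1 LLM load balancer (note: no /api prefix).
    const models = await fetch(`${apiUrl}/us-east-1/tags`, {
      headers: { Authorization: idToken },
    });
    console.log(models.status, await models.json());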
