Backend Engineering Take-Home Assignment

The assignment requires building a Retrieval Augmented Generation (RAG) system using Weaviate as a vector database, focusing on document ingestion, embedding generation, and efficient retrieval. Key tasks include implementing an ingestion pipeline for various document formats, creating a question-answer API, optimizing performance, and deploying the application on a cloud platform. Deliverables include a code repository, live deployment, comprehensive documentation, and optional design write-up, with a completion timeline of 5-7 days.

Uploaded by

saketh reddy


Overview
In this assignment, you will build a Retrieval Augmented Generation (RAG)
system that leverages Weaviate as its vector database. The primary goal is to
design a performant system that efficiently retrieves answers from uploaded
documents in various formats. This project will assess your ability to develop a
robust backend system focused on data ingestion, embedding generation,
indexing, and retrieval.

Project Requirements
1. Document Ingestion & Embedding Generation

● Supported Formats:
○ PDF
○ DOCX
○ JSON
○ TXT

● Ingestion Pipeline:
○ Document Upload: Implement functionality to upload documents
in any of the above formats.
○ Embedding Creation: For each uploaded document, generate
embeddings using an appropriate model (e.g., OpenAI’s
text-embedding, Hugging Face models, etc.).
○ Note: Uploading a new document with the same name as an existing
one must clear the embeddings stored for the earlier version and
replace them with the new document's embeddings.
○ Storage: Store the generated embeddings within Weaviate.
○ Automation: Develop an automated pipeline that:
1. Monitors for new document uploads.
2. Processes and generates embeddings.
3. Indexes the embeddings in Weaviate.
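
The replace-on-re-upload rule in the note above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: `InMemoryStore` is a hypothetical stand-in for a Weaviate collection, and `toy_embed` is a placeholder that hashes text instead of calling a real embedding model. In a real pipeline the two store methods would presumably map to a delete filtered by document name, followed by an insert of the new chunk vectors.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a Weaviate collection, keyed by document name."""

    chunks: dict = field(default_factory=dict)

    def delete_document(self, doc_name: str) -> None:
        # Drop every chunk stored under this name; a no-op if the name is new.
        self.chunks.pop(doc_name, None)

    def insert_chunks(self, doc_name: str, items: list) -> None:
        self.chunks.setdefault(doc_name, []).extend(items)


def toy_embed(text: str, dim: int = 8) -> list:
    """Placeholder embedding (assumption): hash bytes scaled to [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dim]]


def ingest(store: InMemoryStore, doc_name: str, chunks: list) -> int:
    """Replace whatever is stored for doc_name with freshly embedded chunks."""
    store.delete_document(doc_name)  # clear the earlier version first
    store.insert_chunks(doc_name, [(c, toy_embed(c)) for c in chunks])
    return len(chunks)
```

Deleting before inserting keeps the operation idempotent: ingesting the same file twice leaves exactly one copy of its chunks in the store.
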
2. Question-Answer API Endpoint

● Functionality:
○ APIs to ingest documents and update a document. Note: An update
is equivalent to re-uploading the entire document with the changes.
Doing this via the API is sufficient; a UI is not required, though
welcome if you can spin one up quickly.
○ Create an API endpoint that accepts queries against individual
documents.
○ The system should retrieve the most relevant text snippet(s) from
the queried document stored in Weaviate.
● Response Requirements:
○ Return the answer with associated metadata such as:
■ Snippet of the retrieved text.
■ Document ID or other relevant identifiers.
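
One way to picture the per-document query and its response is the sketch below. It assumes each stored chunk is a dict with the hypothetical keys `doc_id`, `chunk_index`, `text` and `vector`, and it ranks chunks by cosine similarity in plain Python. A real implementation would delegate the filtered vector search to Weaviate. The response field names are illustrative only.

```python
import math
from typing import Any


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_document(index: list, doc_id: str, query_vector: list, top_k: int = 3) -> dict:
    """Return the top_k most similar chunks, restricted to a single document."""
    scored = [
        (cosine(chunk["vector"], query_vector), chunk)
        for chunk in index
        if chunk["doc_id"] == doc_id  # never search other documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results: list[dict[str, Any]] = [
        {"snippet": chunk["text"], "chunk_index": chunk["chunk_index"], "score": score}
        for score, chunk in scored[:top_k]
    ]
    return {"doc_id": doc_id, "results": results}
```

Returning the chunk index alongside the snippet lets a client locate the passage in the source document.
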

3. Performance Optimization

● Efficient Retrieval:
○ Ensure that the RAG system is optimized for quick and accurate
retrieval.
○ Implement best practices such as:
■ Document chunking for handling large documents.
■ Precomputed embeddings to reduce latency during query
time.
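
Document chunking can be sketched as a sliding window over words with some overlap between neighbouring chunks. The function name and the default sizes below are assumptions for illustration. Suitable values depend on the embedding model's input limit and on the documents themselves.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Because each chunk is embedded once at ingestion time, the only embedding computed at query time is the one for the question itself.
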

4. Deployment

● Platform:
○ Deploy the application on a cloud platform of your choice (e.g.,
AWS, GCP, Azure, Render, Railway, etc.).
● Accessibility:
○ Provide publicly accessible endpoints for testing.
○ Include clear deployment instructions in your documentation.

5. JSON Data RAG Extension (Bonus)

● Enhanced Functionality (Optional):

○ Extend the system to support structured queries for JSON data.
○ Capabilities may include:
■ Retrieving the maximum or minimum value of a specified
field.
■ Performing aggregations (e.g., sum, average) on numerical
fields.
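
The structured queries listed above do not need vector search at all. A minimal sketch follows, assuming the JSON document parses to a list of flat objects. The `aggregate` helper and the `salary` field in the usage line are hypothetical.

```python
from statistics import mean


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def aggregate(records: list, field_name: str, op: str):
    """Apply max, min, sum or avg to a numeric field, skipping records without it."""
    ops = {"max": max, "min": min, "sum": sum, "avg": mean}
    if op not in ops:
        raise ValueError(f"unsupported operation: {op}")
    values = [r[field_name] for r in records if _is_number(r.get(field_name))]
    if not values:
        raise ValueError(f"no numeric values found for field: {field_name}")
    return ops[op](values)
```

For example, `aggregate(records, "salary", "max")` returns the largest `salary` value among the records that have one.
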

Deliverables
1. Code Repository:

○ The repository should contain:

■ The document ingestion pipeline.
■ API endpoint implementation.
■ Deployment scripts.
○ Include a README.md file with comprehensive setup and usage
instructions.
○ You’re free to use GenAI tools like GPT/Claude.
2. Live Deployment:
○ Provide a publicly accessible URL for testing the system.
3. Documentation:
○ A detailed explanation of the system architecture and workflow.
○ API documentation that explains the endpoints and their usage.
4. Design Write-Up (Optional):

○ Summarize your design choices, including any trade-offs.

○ Highlight potential improvements and challenges encountered
along with their solutions.

Evaluation Criteria
● Correctness & Completeness:
○ Does the system correctly ingest documents and answer queries?
● Code Quality & API Design:
○ Is the code well-organized, maintainable, and documented?
● Deployment:
○ Is the application successfully deployed and easily testable via
public endpoints?
● Bonus Features:
○ Are the extended JSON data aggregation capabilities
implemented effectively?
● Clarity & Documentation:
○ Are the provided instructions clear, comprehensive, and easy to
follow?

Timeline & Submission

● Timeframe: Please complete the assignment within 5-7 days.
● Submission:
○ Provide a link to your GitHub/GitLab repository (or a ZIP file if
necessary).
○ Include deployment details and API documentation to facilitate
testing.

Thank you for taking on this assignment. It is designed to evaluate your skills
in backend system development, data ingestion, indexing, and retrieval.
Should you have any questions during the process, please do not hesitate to
reach out.

Good luck!

Note: Below are the sample files that can be used to generate embeddings.

whistleblower-policy-ba-revised.pdf
example.json
