Assignment
Summary of task:
You need to build a user-facing, chat-style RAG agentic system in which the user asks queries of an LLM and the LLM provides the answer using the given tools (tool calls/function calls). The condition is that the data must be kept up to date daily, i.e. you will also need to build a data pipeline that updates the data daily in your local MySQL or vector database (a vector DB is out of scope for this task, but it's good if you do it).
When querying the database, avoid ORMs and use raw SQL queries. You do not need to know each and every query; simple SELECTs, data filtering, and basic table joins are sufficient (see the raw-SQL sketch at the end of this document).
3. Creating basic APIs in Python (FastAPI will be the simplest choice here) or a WebSocket-based interface with the LLM.
For this task, an agent is simply an LLM with predetermined tools/functions at its disposal to solve tasks [e.g. fetch_result_from_google(query: str), a function the agent can use to query Google by giving us a query argument].
If you are a good developer and are familiar with everything except this agentic system, just understand it like this: in the LLM's system prompt we instruct the LLM to use functions based on the user's query, and we give the LLM the function schemas (a function schema is just the definition of a function: basically the function's name, the arguments it takes, and the types of those arguments). The LLM returns which function to use, and with what arguments, as JSON; we parse the JSON, execute the function (e.g. with Python's eval()), and return the result back to the LLM, which then summarizes the results and gives the answer to the user.
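A minimal sketch of that schema-and-dispatch idea, assuming the common OpenAI/Ollama "tools" schema shape; the function is the illustrative fetch_result_from_google from above, and the LLM reply shown is made up, not real model output. A registry lookup stands in for a bare eval(), in line with the safety note later in this document.

import json

def fetch_result_from_google(query: str) -> str:
    # Placeholder implementation; a real tool would call a search API.
    return f"search results for {query!r}"

# Function schema the LLM is given: name, arguments, argument types.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "fetch_result_from_google",
        "description": "Search Google and return the result text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

# Only functions registered here may run -- safer than a bare eval().
REGISTRY = {"fetch_result_from_google": fetch_result_from_google}

llm_reply = '{"name": "fetch_result_from_google", "arguments": {"query": "news today"}}'
call = json.loads(llm_reply)                          # parse the LLM's JSON
result = REGISTRY[call["name"]](**call["arguments"])  # whitelisted dispatch
print(result)  # this result would be sent back to the LLM to summarize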
What is a data pipeline? In the simplest definition, it is nothing but the automation of data processing, from downloading the data, to cleaning it, to pushing it into the database, with close to 0% manual input once the pipeline has been created.
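The shape of such a pipeline, as a minimal sketch; the endpoint URL, the cleaning rule, and the documents table are placeholders for whatever data source you pick (sync code is fine here, per the compulsory conditions below).

import requests
import mysql.connector

def download() -> list[dict]:
    # Placeholder endpoint; swap in your real data source's API.
    return requests.get("https://example.com/api/records.json", timeout=30).json()

def clean(records: list[dict]) -> list[tuple]:
    # Keep only the fields the table needs; drop rows without a title.
    return [(r["title"], r["date"]) for r in records if r.get("title")]

def push(rows: list[tuple]) -> None:
    conn = mysql.connector.connect(user="root", password="secret", database="rag_db")
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO documents (title, publication_date) VALUES (%s, %s)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    push(clean(download()))  # download -> clean -> push, no manual steps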
You do not need to set up cron jobs/schedulers for this task; during the demo we will ask you to just run the pipeline and will wait for the data to update in the DB.
6. Ollama for running a local LLM (use qwen2.5 0.5b or 1.5b) if you don't have a GPT key. We advise you to use a local LLM for testing purposes, as the OpenAI APIs will be costly.
Note: you do not need a fancy, good-looking UI; the simplest plain white HTML page with a result box and an input box will be perfect, as long as we can chat with the agent (LLM) through the API. You can use AI for this portion. Streamlit, Gradio, and Solara all work.
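For example, a minimal Streamlit sketch, assuming your API runs on localhost:8000 and exposes a hypothetical POST /chat endpoint returning {"answer": ...}:

import requests
import streamlit as st

st.title("RAG Agent Chat")
query = st.text_input("Ask a question")  # input box
if st.button("Send") and query:
    resp = requests.post("http://localhost:8000/chat",
                         json={"query": query}, timeout=120)
    st.write(resp.json().get("answer", "no answer"))  # result box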
Conditions
Compulsory Conditions
1. Everything must be async and non-blocking in the agent and in the user-to-agent communication interface (for the pipeline it is OK to go sync, as this is just a demo). See the sketch below.
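What non-blocking means in practice, as a minimal sketch: inside async code, always await async clients (aiohttp, httpx, aiomysql) instead of calling blocking libraries like requests, which would stall the event loop. This one talks to Ollama's local /api/chat endpoint.

import asyncio
import aiohttp

async def ask_ollama(session: aiohttp.ClientSession, prompt: str) -> str:
    payload = {"model": "qwen2.5:0.5b",
               "messages": [{"role": "user", "content": prompt}],
               "stream": False}
    # Awaiting here yields the event loop instead of blocking it.
    async with session.post("http://localhost:11434/api/chat", json=payload) as resp:
        data = await resp.json()
    return data["message"]["content"]

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Two requests run concurrently rather than one blocking the other.
        answers = await asyncio.gather(
            ask_ollama(session, "hello"),
            ask_ollama(session, "what is a data pipeline?"))
        print(answers)

asyncio.run(main())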
Task Breakdown
The task is divided mainly into 4 parts.
1. Data pipeline
2. Agent Creation with tool calls functionality
3. API/Communication interface with Agent
4. Basic UI.
1. Data pipeline
Your choice of data will determine your system and end project. Here are a few data sources that meet all the conditions on the required data. Of course, you can use whatever data you see fit, as long as it meets the conditions.
1. Federal Register
This is the US Federal Register data API, which contains executive documents and other register-related data. If you use this data, a user query would be, for example: "what are the new executive orders by president donald trump this month, and summarize them for me".
Note: queries won't be based on semantic similarity, as we have suggested not setting up a vector database. Example: "what are the executive documents related to artificial intelligence and security in all the past years".
This data updates daily, you can query the API to fetch updates published after or before a specific date, and it is free. You only need to get the 2025 dataset.
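A minimal sketch of pulling the 2025 documents, assuming the public Federal Register API at federalregister.gov; the conditions[...] parameter names follow its documented query format as I recall it, so verify them against the API docs before relying on this:

import requests

def fetch_2025_documents(page: int = 1) -> dict:
    params = {
        "per_page": 100,
        "page": page,
        "order": "newest",
        # Only documents published on or after Jan 1, 2025.
        "conditions[publication_date][gte]": "2025-01-01",
    }
    resp = requests.get("https://www.federalregister.gov/api/v1/documents.json",
                        params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()  # {"results": [...]} plus paging info

first_page = fetch_2025_documents()
for doc in first_page["results"][:3]:
    print(doc["publication_date"], doc["title"])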
You can use any dataset from them, e.g. related to adverse effects of drugs, reported incidents, or diseases.
2. Agent Creation with tool calls functionality
You will find many tutorials for this, and you can use any framework.
If you plan to use the barebones OpenAI Python client or direct aiohttp requests to the OpenAI/local Ollama endpoint, you will need to build a system to render/execute the LLM's generated tool calls. This might be a little extra work, but it is not that hard: all you need to do is parse the tool-call request, execute the function, and return the result.
Make sure to only execute functions you have defined yourself; otherwise eval() will be very dangerous to use.
The basic overview is:
User query -> LLM; the LLM thinks and decides whether it needs a tool and, if yes, which one -> tool-call response -> your system executes the tool -> returns the result back to the LLM -> the LLM again decides whether to respond (here meaning: answer the user's query) or execute another tool; say it decides to respond -> returns the response -> user.
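That loop, as a minimal sketch against Ollama's local /api/chat endpoint with its "tools" field. TOOLS and REGISTRY are the schema and whitelist from the earlier sketch; the tool-call field names follow Ollama's response format as I recall it, so double-check them against the Ollama docs.

import aiohttp

OLLAMA_URL = "http://localhost:11434/api/chat"

async def run_agent(session: aiohttp.ClientSession, user_query: str) -> str:
    # TOOLS and REGISTRY come from the schema-and-dispatch sketch above.
    messages = [{"role": "user", "content": user_query}]
    while True:
        payload = {"model": "qwen2.5:0.5b", "messages": messages,
                   "tools": TOOLS, "stream": False}
        async with session.post(OLLAMA_URL, json=payload) as resp:
            message = (await resp.json())["message"]
        messages.append(message)  # keep the assistant turn in the history
        tool_calls = message.get("tool_calls")
        if not tool_calls:
            return message["content"]  # the LLM chose to answer the user
        for call in tool_calls:
            fn = call["function"]
            # Execute only whitelisted functions, then feed the result back.
            result = REGISTRY[fn["name"]](**fn["arguments"])
            messages.append({"role": "tool", "content": str(result)})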
If you plan to use an agentic library, that is also OK.
3. API/Communication interface with Agent
Pretty simple and straightforward: you just need to connect users with the LLM using an API or a WebSocket.
In simple terms: the user enters a query in the UI -> the UI calls your API -> the API gives the user's query to the LLM -> the same process happens as in (2) -> the API returns the result to the UI -> the UI renders it and shows the response to the user.
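A minimal FastAPI sketch of that hop, reusing run_agent from the loop above; the /chat route name and the response shape are my own choices, not fixed by the task:

import aiohttp
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    query: str

@app.post("/chat")
async def chat(req: ChatRequest) -> dict:
    # run_agent is the async tool-call loop from section 2's sketch.
    async with aiohttp.ClientSession() as session:
        answer = await run_agent(session, req.query)
    return {"answer": answer}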
Overall
Data pipeline -> MySQL: fills in the data daily.
User query -> YOUR INTERFACE API -> LLM -> tool call that uses MySQL to get the user-requested data -> LLM summary -> response -> the API returns the result to the UI.
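For the MySQL tool itself, a minimal raw-SQL sketch (no ORM), kept async with aiomysql; the documents/agencies tables and columns are made-up placeholders for whatever schema your pipeline creates:

import asyncio
import aiomysql

async def fetch_recent_documents(title_term: str) -> list[dict]:
    conn = await aiomysql.connect(host="localhost", port=3306, user="root",
                                  password="secret", db="rag_db")
    try:
        async with conn.cursor(aiomysql.DictCursor) as cur:
            # Plain SELECT with a filter and a basic join; %s placeholders
            # keep user-supplied terms out of the SQL string itself.
            await cur.execute(
                """
                SELECT d.title, d.publication_date, a.name AS agency
                FROM documents AS d
                JOIN agencies AS a ON a.id = d.agency_id
                WHERE d.title LIKE %s
                ORDER BY d.publication_date DESC
                LIMIT 10
                """,
                (f"%{title_term}%",))
            return await cur.fetchall()
    finally:
        conn.close()

print(asyncio.run(fetch_recent_documents("artificial intelligence")))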