Python + AI
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series
Python + AI
🧠 Large Language Models
Pamela Fox
Python Cloud Advocate
www.pamelafox.org
Today we'll cover...
• How LLMs work
• Model options
• Using LLMs from Python
• Improving LLM output
• LLM libraries
• Building LLM-powered apps
How LLMs work
You've probably used an LLM...
ChatGPT, GitHub Copilot, Bing Copilot, and many other products are
powered by LLMs.
LLMs: Large Language Models
An LLM is a machine learning model that is so large that it achieves general-purpose language understanding & generation.
[Diagram: input → pre-processing → model → probability distribution → decoding + post-processing → natural language output → get results]
Pre-processing input: Tokenization
Try tokenizing text yourself: platform.openai.com/tokenizer
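To see tokenization from Python, you can use OpenAI's tiktoken package (not shown in the slides; pip install tiktoken):

import tiktoken

# Look up the tokenizer used by a given model
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Monarch butterflies lay eggs on milkweed")
print(tokens)  # integer token IDs
print([enc.decode([t]) for t in tokens])  # the text piece behind each token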
Model inputs and outputs
[Figure: the prompt "Monarch butterflies lay eggs on..." goes into the model as n input tokens; the model outputs a probability distribution over the single next token, with plausible candidates like "plants", "leaves", and "the ground" scoring far higher than nonsense like "zux". n tokens in, 1 token out.]
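To make "probability distribution → decoding" concrete, here is a toy sketch of greedy decoding versus sampling (candidate tokens taken from the figure; the probabilities are made up):

import random

# Hypothetical next-token distribution (illustrative numbers only)
next_token_probs = {" plants": 0.55, " leaves": 0.30, " the": 0.14, " zux": 0.01}

# Greedy decoding: always pick the most likely token
greedy = max(next_token_probs, key=next_token_probs.get)

# Sampling: pick a token at random, weighted by probability
# (roughly what happens when temperature > 0)
sampled = random.choices(
    list(next_token_probs), weights=list(next_token_probs.values()))[0]
print(greedy, sampled)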
Model options
Hosted models
Hosted LLMs can be accessed via API, from a company
hosting the model and infrastructure for you.
Host         Models
OpenAI.com   GPT-3.5, GPT-4, GPT-4o
Azure AI     OpenAI GPT models, Meta models, Cohere, Mistral, DeepSeek, ...

github.com/marketplace/models
Local models
Local models have open weights and can run on personal machines.
SLMs ("Small Language Models") are smaller models, < 100B parameters.
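A local model can be called with the same openai package, pointed at a local server. A minimal sketch, assuming Ollama is running locally (e.g. after "ollama run phi3"; Ollama serves an OpenAI-compatible API):

import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama endpoint
    api_key="nokeyneeded",  # local servers don't check the key
)
response = client.chat.completions.create(
    model="phi3",
    messages=[{"role": "user", "content": "Say hello from a local SLM!"}],
)
print(response.choices[0].message.content)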
github.com/pamelafox/python-openai-demos
For GitHub models, set the base URL to the GitHub models host and set the key to a PAT (personal access token):

import os
import openai

client = openai.OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"])

completion = client.chat.completions.create(
    model=MODEL_NAME,  # e.g. "gpt-4o"
    stream=True,
    messages=[
        {"role": "system", "content": "You are a helpful assistant "
            "that makes lots of cat references and uses emojis."},
        {"role": "user", "content": "please write a haiku about a "
            "hungry cat that wants tuna"},
    ])
# With stream=True, print each chunk's delta as it arrives
# (a non-streaming call would read response.choices[0].message.content):
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Cons:
• Creative 😖
• Makes stuff up (unknowingly)
• Limited context window (4K-128K)
• More tokens = more $, more time
Improving LLM output
Ways to improve LLM output
• Prompt engineering: Request a specific tone and format
• Few-shot examples: Demonstrate desired output format (see the sketch below)
• Chained calls: Get the LLM to reflect, slow down, break it down
😊 Covering today!
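A few-shot prompt simply adds example user/assistant turns before the real question. A minimal sketch, reusing the client and MODEL_NAME from earlier:

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": "You classify movie review sentiment."},
        # Two demonstrations of the desired output format:
        {"role": "user", "content": "Loved every minute of it!"},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "A total waste of two hours."},
        {"role": "assistant", "content": "negative"},
        # The actual input:
        {"role": "user", "content": "The plot dragged, but the acting was superb."},
    ])
print(response.choices[0].message.content)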
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content":
        f"You are an experienced editor. Review the following explanation "
        f"and provide detailed feedback on clarity, coherence, and "
        f"engagement.\nExplanation:\n{explanation}"}])
feedback = response.choices[0].message.content

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content":
        f"Revise the article using the following feedback, but keep it to "
        f"a single paragraph.\nExplanation:\n{explanation}\n\n"
        f"Feedback:\n{feedback}"}])
final_article = response.choices[0].message.content
Full example: chained_calls.py
Popular LLM libraries
Many Python packages offer a layer of abstraction for working with LLMs:
• LangChain: Orchestration
• LlamaIndex: Orchestration for RAG and agents
• PydanticAI: Production-grade AI application development
• Semantic Kernel: Orchestration
• AutoGen: Orchestration for agentic flows
• LiteLLM: Wrapper over multiple hosts (see the sketch below)
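LiteLLM exposes a single completion() function that can target many hosts. A minimal sketch pointing it at the GitHub models endpoint used above (assumes pip install litellm):

import os
from litellm import completion

response = completion(
    model="openai/gpt-4o",  # "openai/" prefix = any OpenAI-compatible host
    api_base="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
    messages=[{"role": "user", "content": "Write a haiku about a hungry cat"}],
)
print(response.choices[0].message.content)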
LangChain
Connecting to GitHub models with LangChain:

import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model_name=os.environ["GITHUB_MODEL"],
    openai_api_base="https://models.inference.ai.azure.com",
    openai_api_key=os.environ["GITHUB_TOKEN"],
)
prompt = ChatPromptTemplate.from_messages(
    [("system", "You're a helpful assistant that makes cat references."),
     ("user", "{input}")]
)
chain = prompt | llm
response = chain.invoke(
    {"input": "write a haiku about a hungry cat that wants tuna"})
Full example: chat_langchain.py
LlamaIndex
Connecting to GitHub models with LlamaIndex:

import os
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model=os.environ["GITHUB_MODEL"],
    api_base="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
    is_chat_model=True,
)
chat_msgs = [
    ChatMessage(role=MessageRole.SYSTEM,
                content="You are a helpful assistant that makes cat references."),
    ChatMessage(role=MessageRole.USER,
                content="Write a haiku about a hungry cat who wants tuna"),
]
response = llm.chat(chat_msgs)
PydanticAI
Connecting to GitHub models with PydanticAI:

import os
from pydantic_ai.models.openai import OpenAIModel

llm = OpenAIModel(
    "gpt-4o",
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)
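The model object is then passed to a PydanticAI Agent. A sketch per pydantic-ai's docs; the exact API may differ between versions:

from pydantic_ai import Agent

agent = Agent(
    llm,  # the OpenAIModel created above
    system_prompt="You are a helpful assistant that makes cat references.")
result = agent.run_sync("Write a haiku about a hungry cat that wants tuna")
print(result.data)  # .output in newer pydantic-ai versions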
Building LLM-powered apps
Features:
• Streaming
• Speech I/O
• Image upload*
Code: aka.ms/chat-vision-app
[Diagram: the user question is POSTed to @bp.post("/chat/stream") async def chat_handler(), which calls the model and streams the response back with Transfer-Encoding: chunked, as JSON lines: {"content": "He"} {"content": "llo"} {"content": "It's"} {"content": "me"}]
Streaming responses in Quart
import json
import os

from quart import Response, request, stream_with_context

@bp.post("/chat/stream")
async def chat_handler():
    request_json = await request.get_json()

    @stream_with_context
    async def response_stream():
        chat_coroutine = bp.openai_client.chat.completions.create(
            model=os.environ["OPENAI_MODEL"],
            messages=request_json["messages"],
            stream=True,
        )
        async for event in await chat_coroutine:
            event_dict = event.model_dump()
            yield json.dumps(event_dict["choices"][0], ensure_ascii=False) + "\n"

    return Response(response_stream())
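On the client side, that stream can be consumed line by line. A minimal sketch with the requests package (the URL and port are hypothetical):

import json
import requests

with requests.post(
        "http://localhost:50505/chat/stream",  # hypothetical local app URL
        json={"messages": [{"role": "user", "content": "Hello!"}]},
        stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            choice = json.loads(line)  # one streamed choice per line
            print(choice["delta"].get("content") or "", end="")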
blog.pamelafox.org/2023/09/best-practices-for-openai-chat-apps_16.html
Q: Why does the backend use Quart?
Many developers start with Flask, the most popular Python web framework.
blog.pamelafox.org/2023/09/best-practices-for-openai-chat-apps.html

A: Quart is the async version of Flask.
An async framework can handle new requests while waiting on I/O:
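A toy illustration of that (not from the slides): while one handler awaits slow I/O, the event loop serves another request.

import asyncio

async def handle(request_id: int) -> str:
    await asyncio.sleep(1)  # stands in for a slow LLM or database call
    return f"response {request_id}"

async def main():
    # Both "requests" complete in about 1 second total, not 2,
    # because the event loop interleaves them while each awaits.
    results = await asyncio.gather(handle(1), handle(2))
    print(results)

asyncio.run(main())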
Async frameworks for Python
There are several popular options:

Framework           Example apps for Azure AI
Quart               aka.ms/azai/chat, aka.ms/chat-vision-app, aka.ms/ragchat
FastAPI             aka.ms/azai/fastapi, aka.ms/rag-postgres
Aiohttp             aka.ms/voicerag/repo
Django with async

https://blog.pamelafox.org/2024/07/should-you-use-quart-or-fastapi-for-ai.html
Next steps
Join upcoming streams! →
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series

Come to office hours on Thursdays in Discord: aka.ms/pythonai/oh
Get more Python AI resources: aka.ms/thesource/Python_AI
Thank you!