The document provides an overview of using Large Language Models (LLMs) with Python, covering how LLMs work, model options, and methods to improve LLM output. It discusses various libraries and tools for integrating LLMs into applications, as well as API authentication and practical examples. Upcoming sessions on related topics are also mentioned, encouraging further exploration of Python and AI integration.

Python + AI
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series
Python + AI
🧠 Large Language Models
Pamela Fox
Python Cloud Advocate
www.pamelafox.org
Today we'll cover...
• How LLMs work
• Model options
• Using LLMs from Python
• Improving LLM output
• LLM libraries
• Building LLM-powered apps
How LLMs work
You've probably used an LLM...

ChatGPT, GitHub Copilot, Bing Copilot, and many other products are
powered by LLMs.
LLMs: Large Language Models
An LLM is a machine learning model that is so large that it
achieves general-purpose language understanding &
generation.

Source: "Characterizing Emergent Phenomena in LLMs"


blog.research.google/2022/11/characterizing-emergent-phenomena-in.html
How language models work
Natural language input
→ Pre-processing (Tokenization) → Tokens
→ Model → Probability distribution
→ Get results: Decoding + Post-processing → Natural language output
Pre-processing input: Tokenization

Input: "Monarch butterflies lay eggs on "

Token IDs: 7088 1417 110255 15634 27226 402 220

platform.openai.com/tokenizer
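You can reproduce the tokenization step locally. A minimal sketch, assuming the tiktoken package is installed (pip install tiktoken); the exact token IDs depend on the encoding used by the model, so they may not match the IDs above.

import tiktoken  # pip install tiktoken

# Tokenize the same example string and show the IDs and their text pieces.
enc = tiktoken.encoding_for_model("gpt-4o")
token_ids = enc.encode("Monarch butterflies lay eggs on ")
print(token_ids)
print([enc.decode([token_id]) for token_id in token_ids])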
Model inputs and outputs

Input tokens: "Mon" "arch" "butterflies" "lay" "eggs" "on" → Output token: "milk"

n tokens in → 1 token out
Expanding context window
Each generated token is appended to the input, and the model runs again on the longer context:
Monarch butterflies lay eggs on the
Monarch butterflies lay eggs on the leaves
Monarch butterflies lay eggs on the leaves of
Monarch butterflies lay eggs on the leaves of milk
Monarch butterflies lay eggs on the leaves of milkweed
Monarch butterflies lay eggs on the leaves of milkweed plants
Monarch butterflies lay eggs on the leaves of milkweed plants.
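The same loop as a toy Python sketch. The lookup table and predict_next_token function are hypothetical stand-ins for a real model, just to mimic the slide's example.

# Toy stand-in for a real model: maps the last token to a "most likely" next token.
TOY_MODEL = {
    "on": "the",
    "the": "leaves",
    "leaves": "of",
    "of": "milkweed",
    "milkweed": "plants",
    "plants": ".",
}

def predict_next_token(context: list[str]) -> str:
    # Hypothetical model call: look at the last token only.
    return TOY_MODEL.get(context[-1], ".")

def generate(prompt_tokens: list[str], max_new_tokens: int = 10) -> list[str]:
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(context)  # ask the "model" for one token
        context.append(next_token)                # the context window grows each step
        if next_token == ".":                     # simple stop condition
            break
    return context

print(" ".join(generate(["Monarch", "butterflies", "lay", "eggs", "on"])))
# Monarch butterflies lay eggs on the leaves of milkweed plants .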
Token selection
Given the input tokens "Mon" "arch" "butterflies" "lay" "eggs" "on", the model assigns a probability to every token in its vocabulary, for example "the", "milk", "plants", "leaves", "ground", and even nonsense tokens like "zux". One token is then selected from that probability distribution: n tokens in → 1 token out.

Try it yourself with "The Transformer Explainer":
poloclub.github.io/transformer-explainer/
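A minimal sketch of how a token can be sampled from such a distribution; the candidate tokens and probabilities below are illustrative numbers, not real model output. Lowering the temperature sharpens the distribution (more deterministic), raising it flattens it (more variety).

import math
import random

# Illustrative candidates and probabilities (hypothetical, not from a real model).
candidates = {"milk": 0.45, "the": 0.20, "leaves": 0.15, "ground": 0.10, "plants": 0.05, "zux": 0.05}

def sample_next_token(probs: dict[str, float], temperature: float = 1.0) -> str:
    # Re-weight each probability by temperature: p ** (1 / T).
    weights = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    cumulative = 0.0
    for tok, w in weights.items():
        cumulative += w
        if r <= cumulative:
            return tok
    return max(probs, key=probs.get)

print(sample_next_token(candidates, temperature=0.2))  # usually "milk"
print(sample_next_token(candidates, temperature=1.5))  # more variety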
LLMs are next token predictors
Natural language input
→ Pre-processing (Tokenization) → Tokens
→ Model → Probability distribution
→ Get results: Decoding + Post-processing → Natural language output
Model options
Hosted models
Hosted LLMs can be accessed via API, from a company hosting the model and infrastructure for you.

Host: Models
OpenAI.com: GPT-3.5, GPT-4, GPT-4o
Azure AI: OpenAI GPT models, Meta models, Cohere, Mistral, DeepSeek, ...
GitHub Models (FREE): Azure AI models
Google: Gemini 1, 1.5
Anthropic: Claude 3 family
Demo: GitHub Models
Any GitHub user can use the models and playgrounds,
for free!

github.com/marketplace/models
Local models
Local models have open weights and can run on personal machines.
SLMs ("Small Language Models") are smaller models, < 100B parameters.

Local model runners:
• Ollama
• Llamafile
• LM Studio

Popular SLMs:
• Llama 3 series
• Llava
• Phi4 series
• Gemma
• Mistral
• DeepSeek-R1
Demo: Ollama
Ollama is a tool for easily running local LLMs on your computer.
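A minimal sketch of calling a local model from Python, assuming the ollama Python package is installed (pip install ollama), the Ollama server is running, and a model such as llama3.2 has already been pulled with ollama pull.

import ollama  # pip install ollama; assumes the Ollama server is running locally

# Ask a locally running model for a completion (model name is an assumption).
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a haiku about a hungry cat who wants tuna"}],
)
print(response["message"]["content"])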
Using LLMs from Python
Python packages for Azure AI models
These packages directly support models hosted on Azure/GitHub:

Package: Supported models
openai: OpenAI.com models, Azure OpenAI models, OpenAI-compatible models
azure-ai-inference: Azure OpenAI, GitHub Models, Azure AI models
azure-ai-agents: Azure AI Agents service models

We'll discuss wrapper/orchestrator packages later.
OpenAI package
The official openai Python package is available on PyPI:

pip install openai

It includes support for chat completions, embeddings, and more.

The openai package is compatible with models hosted in many places:
• OpenAI.com account
• Azure OpenAI account
• GitHub Models
• Local model with OpenAI-compatible API (Ollama/llamafile)
OpenAI demos repo
We'll use this repo today for demos:

github.com/pamelafox/python-openai-demos

Use these links to open it with your OpenAI host of choice:
• aka.ms/python-openai-github (FREE!)
• aka.ms/python-openai-openai
• aka.ms/python-openai-azure
• aka.ms/python-openai-ollama
API authentication for OpenAI hosts
For openai.com OpenAI, set your API key:

client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])

For Azure OpenAI (keyless auth), use Azure default credentials:

token_provider = azure.identity.get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)
client = openai.AzureOpenAI(
    api_version=os.environ["AZURE_OPENAI_VERSION"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
)

learn.microsoft.com/azure/developer/ai/keyless-connections
API authentication for OpenAI-like APIs
For Ollama, set the base URL to localhost and the key to any value:

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",
)

For GitHub Models, set the base URL to the GitHub Models host and set the key to a PAT (personal access token):

client = openai.OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"])

👉🏽 In GitHub Codespaces, GITHUB_TOKEN will always store a PAT. If running locally, create a new PAT.
Call the Chat Completion API

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.7,
    max_tokens=30,
    n=1,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that makes lots of cat references and uses emojis."},
        {"role": "user", "content": "Write a haiku about a hungry cat who wants tuna"},
    ])

print(response.choices[0].message.content)

Full example: chat.py
Stream the response

completion = client.chat.completions.create(
    model=MODEL_NAME,
    stream=True,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that makes lots of cat references and uses emojis."},
        {"role": "user", "content": "please write a haiku about a hungry cat that wants tuna"},
    ])

for event in completion:
    print(event.choices[0].delta.content, end="", flush=True)

Full example: chat_stream.py
LLMs: Pros and Cons
Pros:
• Creative 😊
• Great with patterns
• Good at syntax (natural and programming)

Cons:
• Creative 😖
• Makes stuff up (unknowingly)
• Limited context window (4K-128K)
• More tokens = more $, more time
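Since tokens drive both cost and latency, and context windows are limited, it can help to estimate the token count before sending a request. A rough sketch using tiktoken; the per-message overhead value is an assumption and varies by model, so treat the result as an estimate.

import tiktoken

def estimate_tokens(messages: list[dict], model: str = "gpt-4o") -> int:
    enc = tiktoken.encoding_for_model(model)
    TOKENS_PER_MESSAGE = 4  # assumed per-message overhead; actual value varies by model
    total = 0
    for message in messages:
        total += TOKENS_PER_MESSAGE
        total += len(enc.encode(message["content"]))
    return total

messages = [
    {"role": "system", "content": "You are a helpful assistant that makes lots of cat references and uses emojis."},
    {"role": "user", "content": "Write a haiku about a hungry cat who wants tuna"},
]
print(estimate_tokens(messages))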
Improving LLM output
Ways to improve LLM output
• Prompt engineering: Request a specific tone and format
• Few-shot examples: Demonstrate desired output format
• Chained calls: Get the LLM to reflect, slow down, break it down
  😊 Covering today!
• Retrieval Augmented Generation (RAG): Supply just-in-time context
  🔍 Covering on 3/18!
• Function calling & structured outputs
  Covering on 3/25!
• Fine tuning: Teach the LLM new facts/syntax by permanently altering its weights
Prompt engineering
The first message sent to the model is called the "system message" or "system prompt". Use it for overall guidance and formatting rules.

response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    messages=[
        {"role": "system", "content": "Respond like Yoda"},
        {"role": "user", "content": "What is an LLM?"},
    ]
)

Full example: prompt_engineering.py
Few-shot examples
Another way to guide a language model is to provide "few shots", a sequence of example question/answers that demonstrate how it should respond.

response = client.chat.completions.create(model=MODEL_NAME,
    messages=[
        {"role": "system", "content": "You're a tutor that gives clues, not answers."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Can you think of a city known for the Eiffel Tower?"},
        {"role": "user", "content": "What is the square root of 144?"},
        {"role": "assistant", "content": "What number multiplied by itself equals 144?"},
        {"role": "user", "content": "What is the atomic number of oxygen?"},
        {"role": "assistant", "content": "How many protons does an oxygen atom have?"},
        {"role": "user", "content": "What's the largest planet in the solar system?"},
    ])

Full example: few_shot_examples.py
Chained calls

response = client.chat.completions.create(model=MODEL_NAME,
    messages=[{"role": "user",
        "content": "Explain how LLMs work in a single paragraph."}])
explanation = response.choices[0].message.content

response = client.chat.completions.create(model=MODEL_NAME,
    messages=[{"role": "user",
        "content": f"You are an experienced editor. Review the following explanation and provide detailed feedback on clarity, coherence, and engagement.\n Explanation: \n {explanation}"}])
feedback = response.choices[0].message.content

response = client.chat.completions.create(model=MODEL_NAME,
    messages=[{"role": "user",
        "content": f"Revise the article using the following feedback, but keep it to a single paragraph. \n Explanation:\n {explanation} \n\n Feedback:\n{feedback}"}])
final_article = response.choices[0].message.content

Full example: chained_calls.py
Popular LLM libraries
LLM libraries
Many Python packages offer a layer of abstraction for working with LLMs:
• Langchain: Orchestration
• Llamaindex: Orchestration for RAG and agents
• PydanticAI: Production-grade AI application development
• Semantic Kernel: Orchestration
• Autogen: Orchestration for agentic flows
• Litellm: Wrapper over multiple hosts (see the sketch below)

...and many more!
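Litellm isn't shown in the following slides, so here is a minimal sketch of its unified API. It assumes Ollama is running locally with a llama3.2 model pulled; the provider prefix in the model name ("ollama/", "azure/", ...) is how litellm routes the same call to different hosts.

from litellm import completion  # pip install litellm

# Minimal litellm sketch, assuming a local Ollama server with llama3.2 pulled.
response = completion(
    model="ollama/llama3.2",
    api_base="http://localhost:11434",
    messages=[
        {"role": "system", "content": "You're a helpful assistant that makes cat references."},
        {"role": "user", "content": "Write a haiku about a hungry cat who wants tuna"},
    ],
)
print(response.choices[0].message.content)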


Langchain
Connecting to GitHub Models with Langchain:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model_name=os.environ["GITHUB_MODEL"],
    openai_api_base="https://models.inference.ai.azure.com",
    openai_api_key=os.environ["GITHUB_TOKEN"]
)
prompt = ChatPromptTemplate.from_messages(
    [("system", "You're a helpful assistant that makes cat references."),
     ("user", "{input}")]
)
chain = prompt | llm
response = chain.invoke(
    {"input": "write a haiku about a hungry cat that wants tuna"})

Full example: chat_langchain.py
Llamaindex
Connecting to GitHub Models with Llamaindex:

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model=os.environ["GITHUB_MODEL"],
    api_base="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
    is_chat_model=True,
)
chat_msgs = [
    ChatMessage(role=MessageRole.SYSTEM,
        content="You are a helpful assistant that makes cat references."),
    ChatMessage(role=MessageRole.USER,
        content="Write a haiku about a hungry cat who wants tuna"),
]
response = llm.chat(chat_msgs)

Full example: chat_llamaindex.py

Pydantic AI
Connecting to GitHub Models with Pydantic AI:

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

model = OpenAIModel(
    "gpt-4o",
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"]
)

agent = Agent(model, system_prompt="Be concise: 1 sentence only.")

result = agent.run_sync("Where does 'hello world' come from?")

Full example: chat_pydanticai.py
Choosing an LLM library
Consider:
• Does it support every LLM/host that you need?
• Does it support the features you need (streaming, tools, etc)?
• How easy is it to use and debug?
• Are there any security issues, like SQL injection or secret leaking?
• Is it actively maintained?
• Are the trade-offs worth the benefit?
Building LLM-powered apps
Chat + vision app
Supported model hosts:
• Azure OpenAI
• GitHub Models

Features:
• Streaming
• Speech I/O
• Image upload*

Code: aka.ms/chat-vision-app

We'll dive deep into vision models in the 3/20 session!
App architecture
The frontend (HTML, JavaScript) sends the user question to a Python backend (Quart, Uvicorn), which calls the model:

@bp.post("/chat/stream")
async def chat_handler()

The backend streams the response back to the frontend using Transfer-Encoding: Chunked, one JSON object per chunk:

{"content": "He"}
{"content": "llo"}
{"content": "It's"}
{"content": "me"}
Streaming responses in Quart

@bp.post("/chat/stream")
async def chat_handler():
    request_json = await request.get_json()

    @stream_with_context
    async def response_stream():
        chat_coroutine = bp.openai_client.chat.completions.create(
            model=os.environ["OPENAI_MODEL"],
            messages=request_json["messages"],
            stream=True,
        )
        async for event in await chat_coroutine:
            event_dict = event.model_dump()
            yield json.dumps(event_dict["choices"][0], ensure_ascii=False) + "\n"

    return Response(response_stream())

blog.pamelafox.org/2023/09/best-practices-for-openai-chat-apps_16.html
Q: Why does the backend use Quart?
Many developers start with Flask, the most popular web framework.

A synchronous framework (like Flask) can only handle 1 request per worker.

blog.pamelafox.org/2023/09/best-practices-for-openai-chat-apps.html
A: Why does the backend use Quart?
Quart is the async version of Flask.
An async framework can handle new requests while waiting on I/O:
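A minimal illustration of that point: while one request is waiting on the model (simulated here with asyncio.sleep), the event loop can start handling other requests, so three overlapping "requests" finish in about one second instead of three.

import asyncio
import time

async def handle_request(i: int) -> None:
    print(f"request {i}: started")
    await asyncio.sleep(1)  # stand-in for waiting on the LLM API (I/O)
    print(f"request {i}: finished")

async def main() -> None:
    start = time.perf_counter()
    # Three "requests" overlap their waiting time instead of running one after another.
    await asyncio.gather(*(handle_request(i) for i in range(3)))
    print(f"elapsed: {time.perf_counter() - start:.1f}s")  # ~1s, not ~3s

asyncio.run(main())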
Async frameworks for Python
There are several popular options:

Framework: Example apps for Azure AI
Quart: aka.ms/azai/chat, aka.ms/chat-vision-app, aka.ms/ragchat
FastAPI: aka.ms/azai/fastapi, aka.ms/rag-postgres
Aiohttp: aka.ms/voicerag/repo
Django with async

blog.pamelafox.org/2024/07/should-you-use-quart-or-fastapi-for-ai.html
Next steps
Join upcoming streams! →
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series

Come to office hours on Thursdays in Discord: aka.ms/pythonai/oh

Get more Python AI resources: aka.ms/thesource/Python_AI
Thank you!
