Python + AI: Vision Models

The document discusses the integration of Python with AI, focusing on multimodal large language models (LLMs) that can process text and images. It covers popular use cases, methods for sending images, and the implementation of vision models in applications. Additionally, it provides resources for building apps and using Azure AI services for tasks like vector embeddings and retrieval-augmented generation (RAG).


Python + AI
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register: aka.ms/PythonAI/series
Catch up: aka.ms/PythonAI/recordings
Python + AI
Vision Models
Pamela Fox
Python Cloud Advocate
www.pamelafox.org
Today we'll cover...
• Multimodal LLMs
• Popular use cases
• Chat with uploaded images
• Multimodal embedding models
• RAG with vision models
Want to follow along?
1. Open this GitHub repository:
https://github.com/Azure-Samples/openai-chat-vision-quickstart
2. Use the "Code" button to create a GitHub Codespace:

3. Wait a few minutes for Codespace to start up


Multimodal LLMs
What's a multimodal LLM?
In addition to text, a multimodal LLM can accept images, and sometimes video or audio.

[Diagram: natural language input goes through tokenization to produce tokens, while image input goes through encoding to produce encodings; the model takes both, outputs a probability distribution, and decoding plus post-processing turns that into natural language output.]
https://magazine.sebastianraschka.com/p/understanding-multimodal-ll
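To make the tokenization step above concrete, here is a toy sketch with a made-up vocabulary (the token ids are illustrative only; real LLMs use learned subword tokenizers such as byte-pair encoding):

```python
# Toy tokenizer with a made-up vocabulary, just to illustrate that text
# becomes a sequence of integer token ids before reaching the model.
# Real LLMs use learned subword tokenizers (e.g. byte-pair encoding).
VOCAB = {"what": 7088, "animal": 1417, "is": 110255, "pictured": 15634, "?": 402}

def toy_tokenize(text):
    # Split the question into words, treating "?" as its own token
    return [VOCAB[word] for word in text.lower().replace("?", " ?").split()]

tokens = toy_tokenize("What animal is pictured?")
```

Image input takes a different path: instead of a token id lookup, an image encoder turns pixels into embedding vectors that the model consumes alongside the text tokens.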
Multimodal LLMs on Azure/GitHub
Creator   | Models                                  | How to access?
OpenAI    | GPT-4o, GPT-4o-mini                     | Azure OpenAI, GitHub Models
Microsoft | Phi3.5-vision, Phi4-multimodal-instruct | Azure AI Services, GitHub Models
Meta      | Llama-3.2-Vision-instruct               | Azure AI Services, GitHub Models

https://azure.microsoft.com/products/ai-services/openai-service
https://github.com/marketplace/models
https://ai.azure.com/
Sending images with OpenAI SDK in Python

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Who is pictured in this thumbnail?"},
            {"type": "image_url",
             "image_url": {"url": "https://i.ytimg.com/vi/toR644E--w8/hq720.jpg"}},
        ],
    }
]
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

https://aka.ms/chat-vision-app : notebooks/chat_vision.ipynb
Sending images with a base64-encoded URI

import base64

def open_image_as_base64(filename):
    with open(filename, "rb") as image_file:
        image_data = image_file.read()
    image_base64 = base64.b64encode(image_data).decode("utf-8")
    return f"data:image/png;base64,{image_base64}"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What animal is pictured?"},
            {"type": "image_url",
             "image_url": {"url": open_image_as_base64("ur.png")}},
        ],
    }
]

https://aka.ms/chat-vision-app : notebooks/chat_vision.ipynb
Popular use cases
Accessibility
• Suggest alternative text for images
  Example: Accessibility checker in PowerPoint
• Provide assistance for vision-impaired users
  Example: Be My Eyes mobile app

More efficient business processes
• Insurance claims: Flag suspicious claims
• Data analysis: Generate insights based on graphs and tables
• Customer support: Answer questions about product images

https://aka.ms/chat-vision-app
Sending documents that aren't images (yet)
Most vision models can only handle JPEG, PNG, and static GIF files, but you may have visuals in non-image documents like PDFs.

Approaches:
1. Identify the image part of the document, crop it, and save it as an image.
   https://aka.ms/document-intelligence-figure-extraction
2. Convert the entire document to an image (see next slide).

You can use either Python libraries or hosted Azure services.
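Approach 1 can be sketched with Pillow; the bounding box here is a hypothetical example (in practice it would come from a figure-extraction service such as Azure Document Intelligence):

```python
from PIL import Image

def crop_figure(image_path: str, bbox: tuple, out_path: str) -> None:
    """Crop a figure region out of a page image.

    bbox is (left, top, right, bottom) in pixels, the box format Pillow's
    Image.crop expects.
    """
    with Image.open(image_path) as page:
        figure = page.crop(bbox)
        figure.save(out_path)

# Example call with a made-up bounding box:
# crop_figure("page_0.png", (50, 120, 400, 360), "figure_0.png")
```

The cropped PNG can then be sent to the vision model like any other image.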


Converting PDFs to images

import pymupdf
from PIL import Image

filename = "plants.pdf"
doc = pymupdf.open(filename)
for i in range(doc.page_count):
    page = doc.load_page(i)
    pix = page.get_pixmap()
    original_img = Image.frombytes("RGB",
        [pix.width, pix.height],
        pix.samples)
    original_img.save(f"page_{i}.png")

https://aka.ms/chat-vision-app
Multimodal LLMs for OCR?
OCR = "Optical Character Recognition"

An OCR tool can extract text from an image, including handwritten text, using a machine learning model trained specifically for the task.

Many multimodal LLMs can be used for OCR, but:
• A multimodal LLM is generative, so it can hallucinate text that isn't there
• An OCR tool is only extractive, so it can make mistakes but not hallucinate

For OCR, also consider Azure AI OCR and Azure Document Intelligence.
Building apps with vision models

Open-source template: chat with vision
Azure OpenAI with gpt-4o
Python backend (Quart)

Repo: https://aka.ms/chat-vision-app
Demo: https://aka.ms/chat-vision-app/demo
Chat with vision: Flow

User question: "Alligators or crocodiles?" → Backend → LLM → "Those appear to be crocodiles, based on their V-shaped snouts."
Chat with vision: App architecture

The frontend (HTML, JavaScript) sends the base64 image plus the user question to the Python backend (Quart, Uvicorn):

@bp.post("/chat/stream")
async def chat_handler()

The backend calls the model and streams the response back using Transfer-Encoding: Chunked, one JSON object per chunk:

{"content": "He"}
{"content": "llo"}
{"content": "It's"}
{"content": "me"}
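The streamed chunks above are reassembled on the client; the template's frontend does this in JavaScript, but the idea fits in a few lines of Python:

```python
import json

def reassemble(chunks):
    """Join streamed JSON lines (one {"content": ...} object per chunk)
    back into the full response text."""
    return "".join(json.loads(chunk)["content"] for chunk in chunks)

stream = ['{"content": "He"}', '{"content": "llo"}']
full_text = reassemble(stream)
```

This is a minimal sketch: a real client also has to handle partial lines arriving mid-chunk and any error objects the backend emits.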
Encoding images in frontend
(Simplified)
const toBase64 = file => new Promise((resolve, reject) => {
const reader = new FileReader();
reader.readAsDataURL(file);
reader.onload = () => resolve(reader.result);
reader.onerror = reject;
});

form.addEventListener("submit", async function(e) {
  const file = document.getElementById("file").files[0];
  const fileData = file ? await toBase64(file) : null;

  // Get all messages and send with file to backend
  const result = await client.getStreamedCompletion(messages, {
context: {file: fileData, file_name: file ? file.name : null}
});

// ...

https://aka.ms/chat-vision-app :
Handling images in backend (Simplified)

@bp.post("/chat/stream")
async def chat_handler():
    request_json = await request.get_json()
    request_messages = request_json["messages"]
    image = request_json["context"]["file"]
    all_messages = request_messages[0:-1]
    if image:
        all_messages.append(
            {"role": "user",
             "content": [
                 {"text": request_messages[-1]["content"], "type": "text"},
                 {"image_url": {"url": image, "detail": "auto"}, "type": "image_url"}]})
    else:
        all_messages.append(request_messages[-1])

    chat_coroutine = bp.openai_client.chat.completions.create(
        model=os.environ["OPENAI_MODEL"], messages=all_messages)

https://aka.ms/chat-vision-app : src/quartapp/chat.py
Alternate ways to handle image upload
• Send a POST request from the frontend using multipart form data: https://aka.ms/chat-vision-multipart
• Upload images separately:
  1. Store the image in file storage (e.g. Azure Blob Storage)
  2. Send down a file identifier to the frontend
  3. Frontend sends the file identifier back to the backend
  ⚠️ Make sure users can't access each other's files!
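The separate-upload flow, including the ownership check the warning calls for, can be sketched with a hypothetical in-memory store standing in for a real service like Azure Blob Storage:

```python
import uuid

# Hypothetical in-memory stand-in for real file storage (e.g. Azure Blob Storage):
# maps file_id -> (owner_user_id, image bytes).
_store = {}

def upload_image(user_id, data):
    """Steps 1-2: store the image and return a file identifier for the frontend."""
    file_id = str(uuid.uuid4())
    _store[file_id] = (user_id, data)
    return file_id

def fetch_image(user_id, file_id):
    """Step 3: the backend resolves the identifier, checking ownership first."""
    owner, data = _store[file_id]
    if owner != user_id:
        raise PermissionError("users must not access each other's files")
    return data
```

With real blob storage, the same ownership check would typically be enforced with per-user containers or short-lived, scoped SAS tokens rather than an application-level dict.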
Multimodal embedding
models
Want to follow along?
1. Open this GitHub repository:
https://github.com/pamelafox/vector-embeddings-demos
2. Use "Code" button to create a GitHub Codespace:

3. Wait a few minutes for Codespace to start up


Azure AI Vision: Multimodal embeddings API
Use the Azure AI Vision API to generate embeddings from the Florence model.

Image → /vectorizeImage → [3.3652344, 0.8413086, 1.2783203, ...]
"a beach-themed tealight candle holder" → /vectorizeText → [-0.027022313, -0.011945606, 0.019690325, ...]
Azure AI Vision: Calling API from Python

def get_image_embedding(image_file):
    mimetype = mimetypes.guess_type(image_file)[0]
    url = f"{AZURE_AI_VISION_URL}:vectorizeImage"
    headers = get_auth_headers()
    headers["Content-Type"] = mimetype
    response = requests.post(url, headers=headers,
        params=get_model_params(), data=open(image_file, "rb"))
    return response.json()["vector"]

def get_text_embedding(text):
    url = f"{AZURE_AI_VISION_URL}:vectorizeText"
    response = requests.post(url, headers=get_auth_headers(),
        params=get_model_params(), json={"text": text})
    return response.json()["vector"]

Notebook: multimodal_vectors.ipynb
Vector search with multimodal embeddings

Image → /vectorizeImage → vector search
"alligator" → /vectorizeText → vector search
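Because text and image embeddings from the same model share a vector space, a text query can rank images directly by similarity. A minimal sketch with toy 3-dimensional vectors (real multimodal embeddings have on the order of 1024 dimensions, and the values here are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: dot product over the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for /vectorizeText and /vectorizeImage outputs:
text_vec = [0.1, 0.9, 0.2]  # embedding of the query "alligator"
image_vecs = {
    "alligator.png": [0.12, 0.85, 0.25],
    "beach.png": [0.9, 0.1, 0.05],
}
best_match = max(image_vecs, key=lambda name: cosine_similarity(text_vec, image_vecs[name]))
```

A hosted vector index like Azure AI Search performs this ranking at scale with approximate nearest-neighbor search instead of a brute-force loop.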
Open-source template: Image search
Azure AI Vision + Azure AI Search

Code: https://aka.ms/aisearch-images-app
Demo: https://aka.ms/aisearch-images-app/demo
RAG with vision models

Open-source template: RAG with vision support
Azure OpenAI + Azure AI Search + Azure AI Vision

Main repo: https://aka.ms/ragchat
Setup guide: https://aka.ms/ragchat/vision
Demo: https://aka.ms/ragchat/vision/demo
Enable "GPT vision" in Settings
RAG with vision: Flow

User question: "Is there any correlation between oil prices and stock market trends?"

1. An OpenAI text embedding model turns the question into a text vector ([0.0014615238, -0.015594152, -0.0072768144, -0.012787478, ...]).
2. The Azure AI Vision API (/vectorizeText) turns the same question into an image-space vector ([0.0021338, -0.01123152, -0.0238144, -0.0123478, ...]).
3. Azure AI Search uses both vectors to retrieve matching text ("This section examines the correlations between stock indices, cryptocurrency prices...") and matching page images ("Financial Market Analysis 2023-6.png").
4. gpt-4o answers using both sources: "Yes, there is a correlation between oil prices and stock market trends."
Data ingestion with vision models
These steps are done in addition to the standard textual ingestion steps.

1. Python: split PDFs into pages and generate an image of each page (necessary for rendering image citations in the app UI)
2. Azure Blob Storage: upload the page images
3. Azure AI Vision: vectorize the images
4. Azure AI Search: index the images
   • Store the embedding in the imageEmbedding field
   • Store the image filename in the sourcepage field
Example ingestion: Generate page images
In order for the model to provide an answer with citations, we must bake the filename into the image:

draw = ImageDraw.Draw(new_img)
text = f"SourceFileName: {blob_name}"
draw.text((10, 10), text)

https://aka.ms/ragchat: blobmanager.py
Example ingestion: Store page images

output = io.BytesIO()
new_img.save(output, format="PNG")
output.seek(0)

container_client.upload_blob(
    blob_name, output)

Azure Blob Storage stores each page image at a URL like:
https://stfvid7hrxoifmi.blob.core.windows.net/content/Keystone-Plant-Signs-Sunflower-8.5x11-1.png?<SAS token>

https://aka.ms/ragchat: blobmanager.py
Example ingestion: Vectorize image

The blob URL (with its SAS token) is sent to Azure AI Vision, which returns a vector like [-3.203125, 1.5576172 ... +1022 more]:

endpoint = urljoin(self.endpoint, "computervision/retrieval:vectorizeImage")
embeddings: List[List[float]] = []
async with aiohttp.ClientSession(headers=headers) as session:
    for blob_url in blob_urls:
        body = {"url": blob_url}
        async with session.post(url=endpoint, params=params, json=body) as resp:
            resp_json = await resp.json()
            embeddings.append(resp_json["vector"])

https://aka.ms/ragchat: embeddings.py
Example ingestion: Index image & text

The document indexed in Azure AI Search:

{
  "id": "file-Financial_Market_Analysis_Report_2023_pdf-46696E616E6369616C204D61726B657420416E616C79736973205265706F727420323032332E706466-page-2",
  "content": "Cryptocurrency Market Dynamics\nPrice Fluctuations of Bitcoin and Ethereum (Last 12 Months)\n47500\n45000\n42500\n40000\n37500\n35000\n32500\n30000\n27500\n25000\n22500\n20000\n17500\n15000\n12500\n10000\n7500\n5000\n2500\n0\nJan\nFeb\nMar\nApr\nMay\nJun\nJul\nAug\nSep\nOct\nNov\nDec\n-Bitcoin - Ethereum\nCryptocurrencies have emerged as a new asset class, captivating investors with their potential for high returns and their role in the future of finance. This section explores the price dynamics of major cryptocurrencies like Bitcoin and Ethereum...",
  "embedding": "[-0.009356536, -0.035459142 ...+1534 more]",
  "imageEmbedding": "[-0.94384766, 1.5185547 ...+1022 more]",
  "sourcepage": "Financial Market Analysis Report 2023-4.png",
  "sourcefile": "Financial Market Analysis Report 2023.pdf"
}

https://aka.ms/ragchat: searchmanager.py
RAG with vision: Flow (revisited)

User question: "Is there any correlation between oil prices and stock market trends?"

1. An OpenAI text embedding model turns the question into a text vector ([0.0014615238, -0.015594152, -0.0072768144, -0.012787478, ...]).
2. The Azure AI Vision API (/vectorizeText) turns the same question into an image-space vector ([0.0021338, -0.01123152, -0.0238144, -0.0123478, ...]).
3. Azure AI Search uses both vectors to retrieve matching text ("This section examines the correlations between stock indices, cryptocurrency prices...") and matching page images ("Financial Market Analysis 2023-6.png").
4. gpt-4o answers using both sources: "Yes, there is a correlation between oil prices and stock market trends."
RAG with vision: System prompt
You are an intelligent assistant helping analyze the Annual Financial Report
of Contoso Ltd. The documents contain text, graphs, tables and images.
Each image source has the file name in the top left corner of the image with
coordinates (10,10) pixels and is in the format SourceFileName:<file_name>
Each text source starts in a new line and has the file name followed by
colon and the actual information
Always include the source name from the image or text for each fact you use
in the response in the format: [filename]
Answer the following question using only the data provided in the sources
below.
If asking a clarifying question to the user would help, ask the question.
Be brief in your answers.
The text and image source can be the same file name, don't use the image
title when citing the image source, only use the file name as mentioned.
If you cannot answer using the sources below, say you don't know. Return
just the answer without any input texts.
RAG with vision: Chat completion messages
Send a multi-part user message with both text and images:

{ "role": "user",
  "content": [
    { "type": "text", "text": "how have bitcoin and ethereum been fluctuating?" },
    { "type": "image_url",
      "image_url": { "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgA..."} },
    { "type": "image_url",
      "image_url": { "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."} },
    { "type": "image_url",
      "image_url": { "url": "data:image/png;base64,iVBORw0KGgoAAAAA8AAAAJECAI..."} },
    { "type": "text",
      "text": "Sources:\n\nFinancial Market Analysis Report 2023-3.png: Commodities The global financial market is a vast and intricate network of exchanges, instruments, and assets, ranging from traditional stocks and bonds
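A message with that shape can be assembled programmatically; this helper is a hypothetical sketch (the function name and parameters are illustrative, not the template's actual API):

```python
def build_user_message(question, image_data_uris, text_sources):
    """Assemble a multi-part chat message: the question, the retrieved page
    images (as data: URIs), then the retrieved text sources in one block."""
    content = [{"type": "text", "text": question}]
    for uri in image_data_uris:  # e.g. "data:image/png;base64,iVBORw0..."
        content.append({"type": "image_url", "image_url": {"url": uri}})
    sources_block = "Sources:\n\n" + "\n".join(text_sources)
    content.append({"type": "text", "text": sources_block})
    return {"role": "user", "content": content}
```

The returned dict drops straight into the messages list passed to chat.completions.create, like the earlier examples.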
RAG with vision: Considerations
👍🏼 Benefits
• The search step will find anything that semantically matches either the text of the document or the images in the document.
• The model will have access to the full image at inference time, so it can reference details of the image that aren't in the text.

👎🏼 Drawbacks
• Increased latency and cost during the RAG flow (extra vector search, more tokens)
• Limits your model choice to only those with multimodal support

💡 Learn more
• Read Pamela's blog post on the process: https://aka.ms/ragchat/vision/blog
• Watch Pamela's talk on Multimedia RAG: https://aka.ms/ragdeepdive/watch
Next steps
Join upcoming streams! Register: aka.ms/PythonAI/series
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Come to office hours on Thursdays in Discord: aka.ms/pythonai/oh
Sign up for the AI Agents Hackathon: aka.ms/agentshack
Get more Python AI resources: aka.ms/thesource/Python_AI
Catch up: aka.ms/PythonAI/recordings
Thank you!
