
Python + AI
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series
Python + AI
↖️Vector embeddings
Pamela Fox
Python Cloud Advocate
www.pamelafox.org
Today we'll cover...
• What are vector embeddings?
• Vector similarity space
• Vector search
• Vector distance metrics
• Vector quantization
• Dimension reduction
Vector embeddings 101
Want to follow along?
1. Open this GitHub repository:
   https://github.com/pamelafox/vector-embeddings-demos
2. Use the "Code" button to create a GitHub Codespace
3. Wait a few minutes for the Codespace to start up


Vector embeddings
An embedding encodes an input as a list of floating-point numbers.
"dog" → [0.017198, -0.007493, -0.057982,…]

Different embedding models output different embeddings, with varying lengths.
Embedding model                  Encodes                    Vector length   MTEB Avg.
word2vec                         words                      300
SBERT (Sentence-Transformers)    text (up to ~400 words)    768
OpenAI text-embedding-ada-002    text (up to 8191 tokens)   1536            61.0%
OpenAI text-embedding-3-small    text (up to 8191 tokens)   256 - 1536      62.3%
OpenAI text-embedding-3-large    text (up to 8191 tokens)   256 - 3072      64.6%

MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Generating an embedding with the OpenAI SDK
Use the OpenAI SDK with OpenAI.com, Azure, Ollama, or GitHub Models:

import os
import openai

openai_client = openai.OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"]
)

Generate embeddings for single or multiple inputs:

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    dimensions=1536,
    input="hello world"
)
print(embeddings_response.data[0].embedding)

Notebook: generate_embedding.ipynb
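
A minimal sketch of embedding several inputs in one call (the word list is illustrative; assumes the openai_client configured above):

# Embed several inputs in one request; the SDK accepts a list of strings
# as well as a single string.
words = ["dog", "cat", "tortoise"]

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=words
)

# response.data preserves the order of the inputs
for word, item in zip(words, embeddings_response.data):
    print(word, len(item.embedding), item.embedding[:3])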
Vector embeddings vary across models
The same input ("queen") produces different vectors in different models:

word2vec-google-news-300 (300 dimensions):
  [0.0052490234375, -0.1435546875, -0.0693359375, ...]
text-embedding-ada-002 (1536 dimensions):
  [-0.00449855113402009, -0.006737332791090012, -0.002418933203443885, ...]
text-embedding-3-small (1536 dimensions):
  [0.04379640519618988, -0.03982372209429741, 0.044741131365299225, ...]

Notebook: comparison.ipynb
Vector similarity
We compute embeddings so that we can calculate similarity between inputs.
The most common distance measurement is cosine similarity.

def cosine_similarity(v1, v2):
    # Dot product of the two vectors
    dot_product = sum([a * b for a, b in zip(v1, v2)])
    # Product of the two vector magnitudes
    magnitude = (
        sum([a**2 for a in v1]) *
        sum([a**2 for a in v2])) ** 0.5
    return dot_product / magnitude

Notebook: similarity.ipynb
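
A small usage sketch comparing two words with the function above (the embed() helper and word choices are illustrative, reusing the openai_client configured earlier):

def embed(text):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text)
    return response.data[0].embedding

print(cosine_similarity(embed("dog"), embed("cat")))    # relatively high
print(cosine_similarity(embed("dog"), embed("pizza")))  # lower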
Similarity space varies across models
Cosine similarity of other words to "dog":

text-embedding-ada-002          text-embedding-3-small (1536)
word      cosine                word      cosine
dog       1.0000                dog       1.0000
animal    0.8855                animal    0.6619
god       0.8660                cat       0.6502
cat       0.8635                car       0.6185
fish      0.8566                horse     0.5927
bird      0.8555                boat      0.5737
...       ...                   ...       ...
Similarity values range across models
(Chart: distribution of cosine similarity of "dog" to 1000 other words, for text-embedding-ada-002 vs. text-embedding-3-small (1536).)

Business uses for vector similarity
Recommendation system:
https://learn.microsoft.com/azure/postgresql/flexible-server/generative-ai-recommendation-system

Fraud detection:
https://www.redpanda.com/blog/fraud-detection-pipeline-redpanda-pinecone
Vector search
Vector search
1. Compute the embedding vector for the query
2. Find the K closest vectors for the query vector
   • Search exhaustively or using approximations

Example flow for the query "tortoise":
Query ("tortoise")
  → Compute embedding vector (OpenAI create embedding)
  → Query vector: [-0.003335318, -0.0176891904, …]
  → Search existing vectors
  → K closest vectors: [["snake", [-0.122, ..]], ["frog", [-0.045, ..]]]
Exhaustive vector search in Python
An exhaustive search checks every single vector for the closest one.

def exhaustive_search(query_vector, vectors):
    similarities = []
    for title, vector in vectors.items():
        # Compare the query to each stored vector
        similarity = cosine_similarity(query_vector, vector)
        similarities.append((title, similarity))
    # Sort from most to least similar
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities
Notebook: search.ipynb
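
A usage sketch (the movies dict mapping titles to embedding vectors is an assumed stand-in for the notebook's data; embed() is the helper sketched earlier):

# Find the stored vectors most similar to a query
query_vector = embed("a movie about the ocean")

results = exhaustive_search(query_vector, movies)
for title, similarity in results[:5]:
    print(f"{title}: {similarity:.4f}")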
ANN (Approximate Nearest Neighbor) search
There are multiple ANN search algorithms that can speed up search time.

Algorithm   Python package   Example database support
HNSW        hnswlib          PostgreSQL pgvector extension, Azure AI Search, Chromadb, Weaviate
DiskANN     diskannpy        Cosmos DB
IVFFlat     faiss            PostgreSQL pgvector extension
Faiss       faiss            None, in-memory index only
HNSW: Hierarchical Navigable Small Worlds
The HNSW algorithm is great for situations where your index may be frequently updated, and it scales logarithmically even with large indexes.

import hnswlib

# Create an index that uses cosine distance on 1536-dimensional vectors
p = hnswlib.Index(space='cosine', dim=1536)
p.init_index(
    max_elements=len(movies),
    ef_construction=200,
    M=16)

# Add all movie vectors, using their positions as integer IDs
vectors = list(movies.values())
ids = list(range(len(vectors)))
p.add_items(vectors, ids)

# ef controls the accuracy/speed trade-off at query time
p.set_ef(50)
Diagram from the HNSW research paper. Library: https://github.com/nmslib/hnswlib
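
Once the index is built, it can answer approximate nearest-neighbor queries. A minimal sketch (k is illustrative, reusing the query_vector from the exhaustive-search sketch):

# Find the 5 approximate nearest neighbors of a query vector.
# hnswlib returns (labels, distances); with space='cosine' the
# distances are cosine distances (1 - cosine similarity).
labels, distances = p.knn_query(query_vector, k=5)

titles = list(movies.keys())
for label, distance in zip(labels[0], distances[0]):
    print(titles[label], 1 - distance)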
Business use: Retrieval Augmented Generation
Vector search can greatly improve the retrieval step in RAG.

Sample app stack: Azure OpenAI + Azure AI Search + Azure AI Vision + Azure App Service
Code: aka.ms/ragchat
Demo: aka.ms/ragchat/demo

Join upcoming stream on RAG on 3/18! aka.ms/PythonAI/series


Vector distance metrics
Common distance metrics
Four common distance metrics between two vectors are:

1. Euclidean distance
2. Manhattan distance
3. Inner product
4. Cosine distance

The metric that we pick may depend on whether the vectors are unit vectors.
Notebook: distance_metrics.ipynb
Unit vectors
A unit vector is a vector with a magnitude of 1.
def magnitude(vector):
return sum([a**2 for a in vector]) ** 0.5

Two vectors with the same magnitude of 3.7416573867739413:
[1, 2, 3]
[3, 1, 2]

After normalization, the same two vectors with a magnitude of 1:
[0.26726124, 0.53452248, 0.80178373]
[0.80178373, 0.26726124, 0.53452248]
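
To normalize a vector, divide each component by its magnitude; a minimal sketch using the magnitude function above:

def normalize(vector):
    # Scale every component so the resulting vector has magnitude 1
    mag = magnitude(vector)
    return [a / mag for a in vector]

print(normalize([1, 2, 3]))             # [0.2672..., 0.5345..., 0.8017...]
print(magnitude(normalize([1, 2, 3])))  # ~1.0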
Euclidean distance
The straight-line distance between two points in Euclidean space.

import numpy as np

def euclidean(v1, v2):
    # v1 - v2 is elementwise subtraction, so v1 and v2 are numpy arrays here
    return magnitude(v1 - v2)

euclidean(
    np.array([0.26726124, 0.53452248, 0.80178373]),
    np.array([0.80178373, 0.26726124, 0.53452248])
)
# 0.655
Manhattan distance
The "taxicab" distance between two points in Euclidean space.

def manhattan(v1, v2):
    return sum(abs(a - b)
               for a, b in zip(v1, v2))

manhattan(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
# 1.07
Dot product
The sum of products of corresponding vector elements.

def dot_product(v1, v2):
    return sum(a * b
               for a, b in zip(v1, v2))

# x1*y1 + x2*y2 + x3*y3

dot_product(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
# 0.786
Cosine distance
The complement of the cosine of the angle between two vectors in Euclidean space.

def cosine_similarity(v1, v2):
    return dot_product(v1, v2) / (
        magnitude(v1) * magnitude(v2))

def cosine_distance(v1, v2):
    return 1 - cosine_similarity(v1, v2)

cosine_distance(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
# 0.21
Cosine similarity vs. Dot product
For unit vectors, the cosine similarity is the same as the dot product.

>>> cosine_similarity(v1, v2) == dot_product(v1, v2)

True

>>> 1 - cosine_distance(v1, v2) == dot_product(v1, v2)

True

In some vector databases, the dot product operator will be slightly faster than cosine distance operators, since it does not need to calculate the magnitude.

If your embeddings are unit vectors, consider using dot product as the metric.
OpenAI embedding models currently all output unit vectors!
Vector quantization
Vector quantization
Most vector embeddings are stored as floating-point numbers (64-bit in Python). We can use quantization to reduce the size of the embeddings.

• Scalar quantization: Reduce each number to an integer
  [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [53, 40, 20, ...]

• Binary quantization: Reduce each number to a single bit
  [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [1, 1, 0, ...]

Notebook: quantization.ipynb
Scalar quantization: The process
float32 → int8
[0.03265173360705376, 0.01370371412485838, ...]      → [53, 40, ...]
[-0.00786194484680891, -0.018985141068696976, ...]   → [27, 19, ...]
[-0.0039056178648024797, 0.019039113074541092, ...]  → [29, 44, ...]

1. Calculate the min/max of all the embeddings
2. Normalize each embedding's values to the [0, 1] range
3. Map normalized values into integer buckets from -128 to +127
   (the ~min observed float maps to -128, the ~max observed float maps to +127)
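
A minimal sketch of this process with numpy (the variable names, sample values, and bucket math are illustrative, not the notebook's exact code):

import numpy as np

def scalar_quantize(embeddings):
    # embeddings: 2D numpy array of float values
    min_val, max_val = embeddings.min(), embeddings.max()
    # Normalize all values to the [0, 1] range...
    normalized = (embeddings - min_val) / (max_val - min_val)
    # ...then map them into int8 buckets from -128 to +127
    return np.round(normalized * 255 - 128).astype(np.int8)

vectors = np.array([[0.0326, 0.0137, -0.0177],
                    [-0.0078, -0.0189, 0.0190]], dtype=np.float32)
print(scalar_quantize(vectors))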
Scalar quantization: Before & after
"Moana" embedding, float32 → int8 after quantization:
[0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [53, 40, 20, ...]
Scalar quantization: Effects on similarity
Similarity rankings for "Moana", before (float32) and after (int8) quantization:

float32                                   int8
movie                       similarity    movie                       similarity
Moana                       1.000000      Moana                       1.000000
Mulan                       0.546800      ✅ Mulan                     0.903532
Lilo & Stitch               0.502114      The Little Mermaid          0.894227
The Little Mermaid          0.498209      Lilo & Stitch               0.893718
Big Hero 6                  0.491800      ✅ Big Hero 6                0.890959
Monsters University         0.484857      Monsters University         0.890915
The Princess and the Frog   0.471984      ✅ The Princess and the Frog 0.889009
Finding Dory                0.471386      ✅ Finding Dory              0.888350
Maleficent                  0.461029      Ice Princess                0.885539
Ice Princess                0.457817      Maleficent                  0.885364
Binary quantization: The process
float32 → bit
[0.03265173360705376, 0.01370371412485838, ...]      → [1, 1, ...]
[-0.00786194484680891, -0.018985141068696976, ...]   → [0, 0, ...]
[-0.0039056178648024797, 0.019039113074541092, ...]  → [0, 1, ...]

1. Pick a center C based on an average, a sample, or offline knowledge
2. If a value is >= C, map it to 1; otherwise map it to 0
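
A minimal sketch, assuming the center C is taken as the mean of all values (one of the options listed above; sample values are illustrative):

import numpy as np

def binary_quantize(embeddings):
    # embeddings: 2D numpy array of float values
    center = embeddings.mean()  # pick C as the overall average
    return (embeddings >= center).astype(np.uint8)

vectors = np.array([[0.0326, 0.0137, -0.0177],
                    [-0.0078, -0.0189, 0.0190]], dtype=np.float32)
print(binary_quantize(vectors))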
Binary quantization: Before & after
"Moana" embedding, float32 → bit after quantization:
[0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [1, 1, 0, ...]
Binary quantization: Effects on similarity
Similarity rankings for "Moana", before (float32) and after (bit) quantization:

float32                                   bit
movie                       similarity    movie                       similarity
Moana                       1.000000      Moana                       1.000000
Mulan                       0.546800      ✅ Mulan                     0.686634
Lilo & Stitch               0.502114      The Little Mermaid          0.666260
The Little Mermaid          0.498209      The Princess and the Frog   0.659825
Big Hero 6                  0.491800      Lilo & Stitch               0.657599
Monsters University         0.484857      ❌ Big Hero 6                0.655869
The Princess and the Frog   0.471984      Ice Princess                0.648046
Finding Dory                0.471386      ✅ Finding Dory              0.643830
Maleficent                  0.461029      The Lion King               0.643088
Ice Princess                0.457817      Maleficent                  0.642270
Quantization: effects on storage size
float32                        int8            bit
[0.03265173360705, ...]        [53, 40, ...]   [1, 1, ...]
[-0.00786194484680891, ...]    [27, 19, ...]   [0, 0, ...]
[-0.00390561786480247, ...]    [29, 44, ...]   [0, 1, ...]

Storage size (bytes):
Python built-in number type    12728           12728           12728
numpy typed arrays             12400           1648            1648

Databases with vector storage support can often save more space with bits, using techniques such as bit packing.
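
A rough illustration of where the numpy savings come from (exact byte counts depend on how they are measured; this sketch only looks at the raw data buffer, and the vector is illustrative):

import numpy as np

embedding = np.random.rand(1536)  # illustrative 1536-dimension embedding

# Size of the underlying data buffer for different element types
print(embedding.astype(np.float64).nbytes)  # 1536 * 8 bytes = 12288
print(embedding.astype(np.int8).nbytes)     # 1536 * 1 byte  = 1536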
Quantization: effects on index size in AI Search
Azure AI Search supports quantization as a way to reduce the vector storage space needed.

                         float32    int8                  bit
Vector index size (MB)   1177.12    298.519               41.8636
                                    (74.64% reduction!)   (96.44% reduction!)

AI Search has two storage locations for vectors: the HNSW index used for searching, and the actual data storage. The stats above are for index size.

Learn more in the RAG Time series: https://aka.ms/rag-time/journey3
MRL dimension reduction
MRL: Matryoshka Representation Learning
MRL is a technique that lets you reduce the dimensions of a vector while still retaining much of the original semantic representation.

The OpenAI text-embedding-3-large model has default dimensions of 3072, but can be truncated all the way down to 256 (3072 → 1024 → 512 → 256).

⚠️ Only some models support MRL!

You can truncate either:
• when first generating embeddings
• or when storing in a database (if supported)
Dimension reduction with OpenAI SDK
Specify dimensions when generating an embedding:
embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input="hello world",
    dimensions=256
)

print(embeddings_response.data[0].embedding)

Notebook: dimension_reduction.ipynb
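
If you already have full-size embeddings, you can also truncate them yourself; a minimal sketch, assuming an MRL-capable model, which re-normalizes so dot product and cosine similarity still behave as expected:

import numpy as np

def truncate_embedding(embedding, dimensions=256):
    # Keep only the first `dimensions` values, then re-normalize to unit length
    truncated = np.array(embedding[:dimensions])
    return truncated / np.linalg.norm(truncated)

# full_embedding is assumed to be a full-length embedding from an
# MRL-capable model (e.g. 1536 values from text-embedding-3-small)
short_vector = truncate_embedding(full_embedding, 256)
print(len(short_vector))  # 256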
Dimension reduction: Before & after
"Moana" embedding with dimensions=1536 vs. dimensions=256:

dimensions=1536: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
dimensions=256:  [0.06316128373146057, 0.02650836855173111, -0.03433343395590782, ...]
Dimension reduction: Effects on similarity
Similarity rankings for "Moana" at dimensions=1536 vs. dimensions=256:

dimensions=1536                           dimensions=256
movie                       similarity    movie                       similarity
Moana                       1.000000      Moana                       1.000000
Mulan                       0.546800      The Little Mermaid          0.587367
Lilo & Stitch               0.502114      Mulan                       0.583428
The Little Mermaid          0.498209      Lilo & Stitch               0.575990
Big Hero 6                  0.491800      ✅ Big Hero 6                0.574590
Monsters University         0.484857      The Princess and the Frog   0.568726
The Princess and the Frog   0.471984      Finding Dory                0.549391
Finding Dory                0.471386      The Lion King               0.521125
Maleficent                  0.461029      Tangled                     0.513131
Ice Princess                0.457817      Maleficent                  0.511412

Dimension reduction plus quantization
For maximum vector compression, combine both techniques!

1. MRL dimension reduction
2. Scalar or binary quantization

To keep high accuracy, only compress the vectors in the index, oversample when retrieving, and rescore using the originals. That's how Azure AI Search can handle billions of vectors.

Learn more in the RAG Time series: https://aka.ms/rag-time/journey3
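
Putting the pieces together, an illustrative sketch that reuses the truncate_embedding and binary_quantize helpers sketched earlier (not how Azure AI Search implements this internally):

# Compress one full embedding: first MRL truncation to 256 dimensions,
# then binary quantization of the truncated vector.
compressed = binary_quantize(
    np.array([truncate_embedding(full_embedding, 256)]))
print(compressed.shape)  # (1, 256)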
Dive even deeper into vector embeddings!

Vector embeddings 101:
• Embedding projector
• Why are Cosine Similarities of Text embeddings almost always positive?
• Expected Angular Differences in Embedding Random Text?
• Embeddings: What they are and why they matter

ANN algorithms:
• HNSW tutorial
• Video: HNSW for Vector Search Explained

Distance metrics:
• Two Forms of the Dot Product
• Is Cosine-Similarity of Embeddings Really About Similarity?

Quantization:
• Scalar quantization 101
• Product quantization 101
• Binary and scalar quantization

MRL dimension reduction:
• Unboxing Nomic Embed v1.5: Resizable Production Embeddings with MRL
• MRL from the Ground Up
Next steps
Join upcoming streams!
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series

Come to office hours on Thursdays in Discord: aka.ms/pythonai/oh

Get more Python AI resources: aka.ms/thesource/Python_AI
Thank you!
