Python + AI
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series
Python + AI
↖️ Vector embeddings
Pamela Fox
Python Cloud Advocate
www.pamelafox.org
Today we'll cover...
• What are vector embeddings?
• Vector similarity space
• Vector search
• Vector distance metrics
• Vector quantization
• Dimension reduction
Vector embeddings 101
Want to follow along?
1. Open this GitHub repository:
https://github.com/pamelafox/vector-embeddings-demos
2. Use the "Code" button to create a GitHub Codespace.
Notebook: generate_embedding.ipynb
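Generating an embedding in that notebook looks roughly like this; a minimal sketch assuming the openai package and an OPENAI_API_KEY in your environment (the notebook may configure the client differently):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="queen",
)
embedding = response.data[0].embedding
print(len(embedding))  # 1536 dimensions for text-embedding-3-small
print(embedding[:3])   # the first few floats of the vector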
Vector embeddings vary across models

Embedding of "queen":

word2vec-google-news-300 (300 dimensions):
  [0.0052490234375, -0.1435546875, -0.0693359375, ...]
text-embedding-ada-002 (1536 dimensions):
  [-0.00449855113402009, -0.006737332791090012, -0.044741131365299225, ...]
text-embedding-3-small (1536 dimensions):
  [0.04379640519618988, -0.03982372209429741, 0.002418933203443885, ...]
Notebook: comparison.ipynb
Vector similarity
We compute embeddings so that we can calculate similarity between inputs.
The most common distance measurement is cosine similarity.
dot_product = sum([a * b for a, b in zip(v1, v2)])
magnitude = (
    sum([a**2 for a in v1]) *
    sum([a**2 for a in v2])) ** 0.5
cosine_similarity = dot_product / magnitude  # ranges from -1 (opposite) to 1 (identical)
Notebook: similarity.ipynb
Similarity space varies across models

Words most similar to "dog", by cosine similarity, for two different models:

word      cosine        word      cosine
dog       1.0000        dog       1.0000
animal    0.8855        animal    0.6619
god       0.8660        cat       0.6502
cat       0.8635        car       0.6185
fish      0.8566        horse     0.5927
bird      0.8555        boat      0.5737
Similarity values range across models

(Chart: cosine similarity of "dog" to 1000 other words, compared across two models.)

Business use: Recommendation systems
https://learn.microsoft.com/azure/postgresql/flexible-server/generative-ai-recommendation-system

Business use: Fraud detection
https://www.redpanda.com/blog/fraud-detection-pipeline-redpanda-pinecone
Vector search
1. Compute the embedding vector for the query
2. Find the K closest vectors for the query vector
   • Search exhaustively or using approximations

def exhaustive_search(query_vector, vectors):
    # Score the query against every stored vector (exhaustive search).
    similarities = []
    for title, vector in vectors.items():
        similarity = cosine_similarity(query_vector, vector)
        similarities.append((title, similarity))
    return similarities
Notebook: search.ipynb
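Step 2's "K closest" then falls out of sorting those scores; a small sketch building on exhaustive_search above (the find_k_closest name is mine, not from the notebook):

def find_k_closest(query_vector, vectors, k=5):
    similarities = exhaustive_search(query_vector, vectors)
    # Highest cosine similarity first; keep only the top K.
    similarities.sort(key=lambda pair: pair[1], reverse=True)
    return similarities[:k]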
ANN (Approximate Nearest Neighbor) search

There are multiple ANN search algorithms that can speed up search time.

Algorithm | Python package | Example database support
import hnswlib

# Build an HNSW index over 1536-dimensional vectors, using cosine distance.
p = hnswlib.Index(space='cosine', dim=1536)
p.init_index(
    max_elements=len(movies),
    ef_construction=200,  # build-time accuracy/speed trade-off
    M=16)                 # max links per node in the graph
vectors = list(movies.values())
ids = list(range(len(vectors)))
p.add_items(vectors, ids)
p.set_ef(50)  # query-time accuracy/speed trade-off
(Diagram from the HNSW research paper.)
https://github.com/nmslib/hnswlib
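Querying the index built above might look like this; a sketch assuming the p index and movies dict from the previous snippet, and a query_vector embedding:

# Find the 5 approximate nearest neighbors of the query embedding.
# With space='cosine', hnswlib returns distances equal to 1 - cosine similarity.
labels, distances = p.knn_query(query_vector, k=5)
titles = list(movies.keys())
for label, distance in zip(labels[0], distances[0]):
    print(titles[label], 1 - distance)  # convert distance back to similarity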
Business use: Retrieval Augmented Generation
Vector search can greatly improve the retrieval step in RAG.
Azure OpenAI + Azure AI Search + Azure AI Vision + Azure App Service

Code: aka.ms/ragchat
Demo: aka.ms/ragchat/demo
Vector distance metrics

1. Euclidean distance
2. Manhattan distance
3. Inner product
4. Cosine distance

The metric that we pick may depend on whether the vectors are unit vectors.
Notebook: distance_metrics.ipynb
Unit vectors
A unit vector is a vector with a magnitude of 1.
def magnitude(vector):
    return sum([a**2 for a in vector]) ** 0.5

Euclidean distance

The straight-line distance between two points in Euclidean space.

euclidean(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
0.655
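One way that euclidean helper might be implemented (a sketch; the notebook's version may differ):

def euclidean(v1, v2):
    # Straight-line distance: square root of the sum of squared differences.
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

print(euclidean([0.26726124, 0.53452248, 0.80178373],
                [0.80178373, 0.26726124, 0.53452248]))  # ~0.655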
Manhattan distance
The "taxicab" distance between two points in Euclidean space.
manhattan(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
1.07
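A matching sketch for the Manhattan metric (again, the notebook's version may differ):

def manhattan(v1, v2):
    # "Taxicab" distance: sum of the absolute differences per dimension.
    return sum(abs(a - b) for a, b in zip(v1, v2))

print(manhattan([0.26726124, 0.53452248, 0.80178373],
                [0.80178373, 0.26726124, 0.53452248]))  # ~1.07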
Dot product

The sum of products of corresponding vector elements.

dot_product(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
0.786
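The cosine_similarity function on the next slide relies on a dot_product helper like this one (a sketch matching the comprehension shown earlier):

def dot_product(v1, v2):
    # Sum of the products of corresponding elements.
    return sum(a * b for a, b in zip(v1, v2))

print(dot_product([0.26726124, 0.53452248, 0.80178373],
                  [0.80178373, 0.26726124, 0.53452248]))  # ~0.786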
Cosine distance
The complement of the cosine of the angle between two vectors in
Euclidean space.
def cosine_similarity(v1, v2):
    return dot_product(v1, v2) / (
        magnitude(v1) * magnitude(v2))

Cosine distance is then 1 - cosine_similarity:

cosine_distance(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
0.21
Cosine similarity vs. Dot product

For unit vectors, the cosine similarity is the same as the dot product (a quick check of this follows below).

In some vector databases, the dot product operator will be slightly faster than cosine distance operators, since it does not need to calculate the magnitude.

If your embeddings are unit vectors, consider using dot product as the metric. OpenAI embedding models currently all output unit vectors!
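A quick way to check both claims with the helpers defined above (rel_tol is loosened because the example values are rounded to 8 decimal places):

import math

v1 = [0.26726124, 0.53452248, 0.80178373]
v2 = [0.80178373, 0.26726124, 0.53452248]

# Both example vectors are unit vectors...
print(math.isclose(magnitude(v1), 1.0, rel_tol=1e-6))  # True
# ...so their cosine similarity equals their dot product.
print(math.isclose(cosine_similarity(v1, v2), dot_product(v1, v2), rel_tol=1e-6))  # True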
Vector quantization
Most vector embeddings are stored as floating point numbers (64-bit in
Python). We can use quantization to reduce the size of the embeddings.
Notebook: quantization.ipynb
Scalar quantization: The process

float32                                               int8
[0.03265173360705376, 0.01370371412485838, ...]       [53, 40, ...]
[-0.00786194484680891, -0.018985141068696976, ...]    [27, 19, ...]
[-0.0039056178648024797, 0.019039113074541092, ...]   [29, 44, ...]

1. Calculate the min/max of all the embeddings
2. Normalize each embedding's values to the [0, 1] range
3. Map normalized values into integer buckets from -128 to +127, so the minimum observed float lands near -128 and the maximum near +127
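A numpy sketch of those three steps (quantization.ipynb may differ in details such as how bucket edges are handled):

import numpy as np

def scalar_quantize(embeddings: np.ndarray) -> np.ndarray:
    # 1. Calculate the min/max across all the embeddings.
    min_val, max_val = embeddings.min(), embeddings.max()
    # 2. Normalize each value into the [0, 1] range.
    normalized = (embeddings - min_val) / (max_val - min_val)
    # 3. Map normalized values into the 256 int8 buckets from -128 to +127.
    return np.clip(np.round(normalized * 255) - 128, -128, 127).astype(np.int8)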
Scalar quantization: Before & after

"Moana"
float32: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
    → scalar quantization →
int8:    [53, 40, 20, ...]
Scalar quantization: Effects on similarity

float32                                  int8
[0.03265173360705, 0.013703...]          [53, 40, ...]
[-0.00786194484680891, -0.0189...]       [27, 19, ...]
[-0.0039056178648024797, 0.0190...]      [29, 44, ...]

Movies most similar to "Moana", float32 vs. int8:

float32:                                 int8:
Moana                      1.000000      Moana                      1.000000
Mulan                      0.546800      Mulan                      0.903532  ✅
Lilo & Stitch              0.502114      The Little Mermaid         0.894227
The Little Mermaid         0.498209      Lilo & Stitch              0.893718
Big Hero 6                 0.491800      Big Hero 6                 0.890959  ✅
Monsters University        0.484857      Monsters University        0.890915  ✅
The Princess and the Frog  0.471984      The Princess and the Frog  0.889009  ✅

(✅ = same rank as in the float32 results; only The Little Mermaid and Lilo & Stitch swap places.)
Binary quantization: Before & after

"Moana"
float32: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
    → binary quantization →
bit:     [1, 1, 0, ...]
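Binary quantization can be as simple as keeping each value's sign; a numpy sketch:

import numpy as np

def binary_quantize(embedding: np.ndarray) -> np.ndarray:
    # One bit of information per dimension: 1 for positive values, 0 otherwise.
    return (embedding > 0).astype(np.uint8)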
Binary quantization: Effects on similarity

float32                                  bit
[0.03265173360705, 0.013703...]          [1, 1, ...]
[-0.00786194484680891, -0.0189...]       [0, 0, ...]
[-0.0039056178648024797, 0.0190...]      [0, 1, ...]

Movies most similar to "Moana", float32 vs. bit:

float32:                                 bit:
Moana                      1.000000      Moana                      1.000000
Mulan                      0.546800      Mulan                      0.686634  ✅
Lilo & Stitch              0.502114      The Little Mermaid         0.666260
The Little Mermaid         0.498209      The Princess and the Frog  0.659825
Big Hero 6                 0.491800      Lilo & Stitch              0.657599
Monsters University        0.484857      Big Hero 6                 0.655869  ❌
Databases with vector storage support can often save more space with bits, using techniques such as bit packing.
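For example, numpy's packbits can store eight of those 0/1 dimensions per byte (a sketch with a stand-in random vector, not a real embedding):

import numpy as np

rng = np.random.default_rng(0)
embedding = rng.standard_normal(1536)    # stand-in for a 1536-dim embedding
bits = (embedding > 0).astype(np.uint8)  # binary quantization: one byte per dimension
packed = np.packbits(bits)               # bit packing: eight dimensions per byte
print(bits.nbytes, "->", packed.nbytes)  # 1536 -> 192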
Quantization: Effects on index size in AI Search

Azure AI Search supports quantization as a way to reduce the vector storage space needed.

                         float32                      int8            bit
                         [0.03265173360705, ...]      [53, 40, ...]   [1, 1, ...]
                         [-0.00786194484680891, ...]  [27, 19, ...]   [0, 0, ...]
                         [-0.00390561786480247, ...]  [29, 44, ...]   [0, 1, ...]
Vector index size (MB):  1177.12                      298.519         41.8636
AI Search has two storage locations for vectors: the HNSW index used for
searching, and the actual data storage. The stats above are for index size.
Dimension reduction

print(embeddings_response.data[0].embedding)

Notebook: dimension_reduction.ipynb
Dimension reduction: Before & after

"Moana"
dimensions=1536: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
dimensions=256:  [0.06316128373146057, 0.02650836855173111, -0.03433343395590782, ...]
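With the text-embedding-3 models, the dimensions parameter of the embeddings API performs this reduction server-side; a sketch (the notebook may pass different values):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Moana",
    dimensions=256,  # request a reduced MRL embedding instead of the full 1536
)
print(len(response.data[0].embedding))  # 256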
Dimension reduction: Effects on similarity

dimensions=1536: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
dimensions=256:  [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
1. MRL Dimension Reduction
2. Scalar or Binary Quantization
...

To keep high accuracy, compress only the vectors in the index, oversample when retrieving, and rescore using the originals. That's how Azure AI Search can handle billions of vectors.

Learn more in RAG Time: https://aka.ms/rag-time/journey3
Dive even deeper into vector embeddings!

Vector embeddings 101:
• Embedding projector
• Why are Cosine Similarities of Text embeddings almost always positive?
• Expected Angular Differences in Embedding Random Text?
• Embeddings: What they are and why they matter

ANN algorithms:
• HNSW tutorial
• Video: HNSW for Vector Search Explained

Distance metrics:
• Two Forms of the Dot Product
• Is Cosine-Similarity of Embeddings Really About Similarity?

Quantization:
• Scalar quantization 101
• Product quantization 101
• Binary and scalar quantization

MRL dimension reduction:
• Unboxing Nomic Embed v1.5: Resizable Production Embeddings with MRL
• MRL from the Ground Up
Next steps

Join upcoming streams!
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series

Come to office hours on Thursdays in Discord: aka.ms/pythonai/oh

Get more Python AI resources: aka.ms/thesource/Python_AI
Thank you!